Could music fingerprinting be copyright infringement?
Music fingerprinting is the process of extracting essential
information from digitized audio recordings (e.g. CD tracks) so that
it can be searched in a database. Typically, a large database of
fingerprints of known recordings is compiled first, and fingerprints
of unknown audio samples are then looked up in that database. When a
match is found, the previously unknown audio sample can be identified
reliably.
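A minimal sketch of this compile-then-search idea follows, in Python
3.9+. It is loosely modelled on the peak-pair ("landmark") hashing
that Shazam has described publicly, but it is a toy and not the method
of any particular product. The names FingerprintDB, fingerprint and
FAN_OUT, the hash layout and the sample data are all hypothetical, and
picking peaks out of a spectrogram is assumed to have been done
already. Each hash combines two peaks and their time gap; matching
counts how many hashes of a sample agree on the same time offset into
a known song.

```python
from __future__ import annotations

from collections import Counter, defaultdict

# Hypothetical representation: a spectral peak is (time_frame, freq_bin).
# Peak picking from a spectrogram is assumed to happen elsewhere.
Peak = tuple[int, int]
Hash = tuple[int, int, int]  # (anchor freq, target freq, time delta)

FAN_OUT = 3  # hypothetical: each anchor peak is paired with the next 3 peaks


def fingerprint(peaks: list[Peak]) -> list[tuple[Hash, int]]:
    """Pair each anchor peak with the next FAN_OUT peaks; return (hash, anchor_time)."""
    peaks = sorted(peaks)
    pairs = []
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1 : i + 1 + FAN_OUT]:
            pairs.append(((f1, f2, t2 - t1), t1))
    return pairs


class FingerprintDB:
    """Toy inverted index: hash -> list of (song_id, anchor_time in the song)."""

    def __init__(self) -> None:
        self.index: defaultdict[Hash, list[tuple[str, int]]] = defaultdict(list)

    def add(self, song_id: str, peaks: list[Peak]) -> None:
        # "Compiling the database": store every hash of a known recording.
        for h, t in fingerprint(peaks):
            self.index[h].append((song_id, t))

    def match(self, sample_peaks: list[Peak]) -> tuple[str, int] | None:
        """Return (best song_id, number of agreeing hashes), or None if nothing matched."""
        votes: Counter[tuple[str, int]] = Counter()
        for h, t_sample in fingerprint(sample_peaks):
            for song_id, t_song in self.index.get(h, []):
                # Vote for (song, time offset of the sample within that song).
                votes[(song_id, t_song - t_sample)] += 1
        if not votes:
            return None
        (song_id, _offset), count = votes.most_common(1)[0]
        return song_id, count


db = FingerprintDB()
song_a = [(0, 10), (2, 30), (5, 12), (7, 40), (9, 22), (12, 35)]
db.add("song A", song_a)
db.add("song B", [(0, 50), (3, 8), (4, 60), (8, 15), (11, 70)])

# Simulate a short recording of song A that starts at frame 5.
sample = [(t - 5, f) for t, f in song_a if t >= 5]
print(db.match(sample))  # → ('song A', 6)
```

Voting on the time offset, rather than just counting shared hashes, is
what makes such a match robust: unrelated songs may share a few hashes
by chance, but rarely at one consistent offset.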
Music fingerprinting has many applications, including:
- Song identification services like Shazam. The user records a short
sample of a song (played on the radio or as background music in a
shop) with a mobile phone, and the sample is then fingerprinted. The
server matches this fingerprint against a database of fingerprints of
known songs and returns the artist and song title, so the user can
buy a CD of that song.
- Automatic royalty collection from radio stations. Songs played by
a radio station are fingerprinted and then identified automatically.
- Filtering of infringing audio from file sharing services or video
sites like YouTube.
Music fingerprinting is big business, but entering the market with
independently developed software (rather than just buying one of the
existing solutions) is not simple, and that is a gross understatement.
There are several reasons why the market is so hard to enter:
- Music fingerprinting algorithms are patented, and the optimizations
of these algorithms in existing applications are protected as trade
secrets. One must first negotiate a license for all relevant patents
and then spend a lot of time developing the software, including
tuning the algorithm for the best matching performance, because that
tuning is exactly what existing vendors keep secret. It may be
possible to write an algorithm that does not use any of the patented
technologies.
- Real-life applications of audio fingerprinting require a huge
database of fingerprints of (nearly) all recorded music. Shazam has a
database of 8 million songs (the equivalent of more than half a
million CDs). Before you can compile such a database, you need access
to the recordings in the first place. Although it is certainly
possible to fingerprint recordings without making intermediate
copies, it is much easier when you can make them, especially if the
fingerprinting has to be repeated later. In most countries, however,
such copies cannot be made without permission.
- When you decide to improve the fingerprinting method, you have to
repeat the entire fingerprinting process for all recordings.
- The format of fingerprints is not standardized (not even publicly
documented). Therefore you cannot buy just the database of
fingerprints and use it with your own application.
- The copyright status of music fingerprints themselves is unclear.
This risk can be mitigated, though.
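Part of why fingerprints from one application cannot simply be reused
by another, and why changing the method means re-fingerprinting every
recording, is that even the bit layout of a hash is an implementation
choice. The Python sketch below packs the same (f1, f2, dt) landmarks
into 32-bit keys using two hypothetical layouts; the function pack and
both layouts are invented for illustration and are not any real
product's format.

```python
# Two hypothetical 32-bit layouts: (bits per frequency, bits for time delta).
LAYOUT_V1 = (10, 12)  # f1:10 | f2:10 | dt:12
LAYOUT_V2 = (9, 14)   # f1:9  | f2:9  | dt:14, e.g. after changing the resolution


def pack(f1: int, f2: int, dt: int, f_bits: int, dt_bits: int) -> int:
    """Pack one landmark into an integer key: f1 in the high bits, dt in the low bits."""
    if not (0 <= f1 < 1 << f_bits and 0 <= f2 < 1 << f_bits and 0 <= dt < 1 << dt_bits):
        raise ValueError("landmark does not fit this layout")
    return (f1 << (f_bits + dt_bits)) | (f2 << dt_bits) | dt


landmarks = [(12, 40, 2), (40, 22, 2), (22, 35, 3)]

# A database built with layout V1 ...
index_v1 = {pack(*lm, *LAYOUT_V1) for lm in landmarks}

# ... is queried by an application that uses layout V2 for the very same landmarks.
hits = sum(pack(*lm, *LAYOUT_V2) in index_v1 for lm in landmarks)
print(hits)  # → 0
```

Identical audio features produce different keys, so the V1 database is
useless to a V2 client until every recording is fingerprinted again.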
This article focuses on the copyright implications of music
fingerprinting.
How music fingerprinting works
Landmark Digital Services LLC owns several patents on music
fingerprinting, two of which are:
This article by Bryan Jacobs describes how the matching algorithm in
Shazam works. Here is an explanation of the music fingerprinting and
matching algorithm, accompanied by a Matlab implementation. Finally,
Roy van Rijn in the Netherlands created a rough implementation of a
music fingerprinting program in Java as described