Perhaps obviously this is the same technique that enables ACR on TVs.
It occurs to me that Shazam has a much better reputation online because the intent and consent of the user are honored.
It makes me wonder if there couldn’t be an implementation on TVs that is similar and actually is a net positive for consumers. Basically would customers actually like TV ACR if the data wasn’t just going to sell more ads?
So the value-add would be that the consumer gets to find out the name of the show or movie that’s playing, the same info that also pops up if they hit the pause button?
> On the flipside, this "fingerprint" approach is also what makes Shazam work poorly if you just sing into it. You're likely to generate different hashes than the original song, even if you are a very good singer! This is why newer, machine-learning-based systems are built to handle humming and singing, by matching on melody rather than exact frequencies.
So this is why singing/whistling a song to my phone never worked! I've always imagined the tech as some sort of wave pattern matching but the DFT is obviously more efficient for many scenarios. Cool article!
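For anyone who wants to see why, here is a rough sketch of the idea, not Shazam's actual code. It assumes hashes are built from pairs of spectrogram peaks as (first frequency bin, second frequency bin, time gap), which is the general shape described in the 2003 paper. The function name `pair_hashes`, the `fan_out` parameter, and all the peak values are made up for illustration.

```python
def pair_hashes(peaks, fan_out=3):
    """Build (f1, f2, dt) hashes from time-sorted (time_frame, freq_bin) peaks.

    Each peak is paired with up to `fan_out` peaks that follow it.
    """
    hashes = set()
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1 : i + 1 + fan_out]:
            hashes.add((f1, f2, t2 - t1))
    return hashes


# Hypothetical peaks from the original recording.
original = [(0, 40), (2, 50), (4, 45), (6, 60), (8, 50)]

# The same recording heard 10 frames later: only absolute time changes.
replay = [(t + 10, f) for t, f in original]

# A good singer who is one frequency bin sharp with slightly loose timing.
sung = [(0, 41), (2, 51), (5, 46), (6, 61), (9, 51)]

same = pair_hashes(original) & pair_hashes(replay)
overlap = pair_hashes(original) & pair_hashes(sung)
print(len(same), len(overlap))
```

The replay shares every hash with the original because the hash only stores time gaps, not absolute times. The sung version shares none, since every frequency bin is off by one, and an exact-match lookup has no notion of "close".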
It's a brilliant algo, and it also works for multi-dimensional data. You can choose different distance functions and it still works. Perhaps Dijkstra-shortest-path-level significant for the robotics/AI era.
Recognizing a recording isn't hard to do, because, for the same recording, the chords follow each other with precisely repeatable timing. That's been around for well over a decade. Recognizing a different recording, say, a cover version of the same song, is much more work.
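The "precisely repeatable timing" part can be sketched as offset voting: every hash shared by the query and a stored track votes for the time difference between them, and a true match piles its votes onto a single offset. This is a minimal illustration under assumed data structures (dicts mapping a hash to the frames where it occurs); `best_offset` and the sample hashes are hypothetical names, not anything from a real library.

```python
from collections import Counter


def best_offset(db_hashes, query_hashes):
    """Return (offset, votes) for the most common db_time - query_time.

    Both arguments map a hash to the list of frames where it occurs.
    """
    votes = Counter()
    for h, query_times in query_hashes.items():
        for db_time in db_hashes.get(h, []):
            for query_time in query_times:
                votes[db_time - query_time] += 1
    if not votes:
        return (None, 0)
    return votes.most_common(1)[0]


# Hypothetical stored track: hash "a" happens to occur twice.
track = {"a": [10, 50], "b": [12], "c": [15], "d": [20]}

# Hypothetical query clip recorded starting at frame 10 of the track.
# Hash "x" is a spurious hash that is not in the track at all.
clip = {"a": [0], "b": [2], "c": [5], "x": [7]}

print(best_offset(track, clip))
```

Three of the hashes agree on an offset of 10 frames, while the repeated "a" at frame 50 gets a lone stray vote. A cover version would produce few matching hashes in the first place, and any that matched by chance would not line up on one offset.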
Audible Magic claims to be able to recognize multiple performances of the same songs, and even parodies.[1] Using, of course, "AI technology" and much more compute.
"Isn't hard to do" is doing some heavy lifting. Obviously on a society level it's simple tech we managed ages ago. But I would bet if you tasked individual devs with building it without looking up the answer, very few could do it.
It has been working "fine" for me generally for popular music. But then I was at an ice skating competition where there was some really nice synth-y music playing in the pauses, and I used Shazam on several of the songs, and I tried several times on each. It did not find a single one correctly.
Either this was unreleased music or very niche music or something, or Shazam totally failed?
Yeah, Shazam is mostly useless for songs that aren’t in the streaming apps, I’ve found. But not entirely useless! It sometimes matches me with stuff that’s only on YouTube.
Forgive my ignorance, but what does SCP mean in this context? (my normal go-to of 'secure copy' doesn't fit).
Thanks for the other links, the question in this title is one I've day-dreamily thought about on occasion, but never dug into. Will have a read of all three.
I feel like it does not work well. Shazam struggles to recognize music in real-life environments that have some background noise, even with a lot of time. It’s much worse than the built-in music recognition Google’s phones have, for example.
Might be the best visual explainer of Shazam's original audio fingerprinting algorithm from the 2003 paper (I guess they've switched to ML models at some point?)
Not unless your noise is louder at certain dominant frequencies than the source. The article gives examples, but the algorithm basically throws away everything except frequency peaks, in order to make the lookups faster.
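A toy version of that, to make it concrete. This is only a sketch: it uses a naive DFT on a single frame and keeps the two strongest bins, whereas the real system picks local maxima across a whole spectrogram. The tones, amplitudes, and the helper names `magnitude_spectrum` and `top_peaks` are all assumptions made for the example.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Naive DFT magnitudes for bins 0 .. n/2 - 1 (fine for tiny frames)."""
    n = len(samples)
    return [
        abs(sum(s * cmath.exp(-2j * math.pi * k * t / n)
                for t, s in enumerate(samples)))
        for k in range(n // 2)
    ]


def top_peaks(spectrum, count):
    """Indices of the `count` strongest bins, in ascending bin order."""
    strongest = sorted(range(len(spectrum)),
                       key=lambda k: spectrum[k], reverse=True)
    return sorted(strongest[:count])


def tone(bin_index, amplitude, n):
    """A sine wave that lands exactly on one DFT bin."""
    return [amplitude * math.sin(2 * math.pi * bin_index * t / n)
            for t in range(n)]


n = 64
music = [a + b for a, b in zip(tone(5, 1.0, n), tone(12, 0.8, n))]

# Quiet background noise at other frequencies.
quiet = [a + b for a, b in zip(tone(20, 0.3, n), tone(27, 0.2, n))]
noisy = [m + q for m, q in zip(music, quiet)]

# Noise that is louder than the music at one frequency.
drowned = [m + x for m, x in zip(music, tone(20, 2.0, n))]

print(top_peaks(magnitude_spectrum(music), 2))
print(top_peaks(magnitude_spectrum(noisy), 2))
print(top_peaks(magnitude_spectrum(drowned), 2))
```

The quiet noise never makes it into the kept peaks, so the clean and noisy frames yield the same bins. Once the noise is louder than one of the music's peaks, it pushes that peak out, which is the failure case described above.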
- OG Shazam paper https://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf (he has a talk on YouTube btw, look it up if you really care)
- https://hackernews.hn/item?id=18069968 shazam employee blogpost
- https://hackernews.hn/item?id=38538996 shazam cofounder endorsed explainer
- go algo repro https://hackernews.hn/item?id=41127726
As with all ML things... the code is a much smaller share of the value than the data...