Detecting derivatives at planetary scale — Sound Post

Why classical fingerprinting fails

The Shazam-era audio fingerprint, a constellation of spectral peaks extracted from a few seconds of audio and matched against a reference database, solves one problem well. It finds exact copies of known references in noisy environments: a song playing on a club system, a track underneath a radio voice-over, a record in the background of a phone call. That problem is essentially solved: classical fingerprinting's recall is near-perfect, and its false-positive rate is comfortably below one in ten million.
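For readers who haven't seen the constellation idea spelled out, here is a minimal Python sketch. It assumes peak-picking has already turned the spectrogram into (frame, frequency-bin) pairs. Each peak is paired with up to a few later peaks, each pair becomes an (f1, f2, Δt) hash, and a query matches a reference when many shared hashes agree on a single time offset. The function names, fan-out and window size are illustrative assumptions, not Nomad Listen's or Shazam's actual parameters.

```python
from collections import Counter, defaultdict

# Illustrative parameters (assumptions, not production values).
FAN_OUT = 3   # how many later peaks each anchor peak is paired with
MAX_DT = 64   # largest frame gap allowed inside one pair


def pair_hashes(peaks, fan_out=FAN_OUT, max_dt=MAX_DT):
    """Yield ((f1, f2, dt), anchor_frame) for each anchor peak and up to
    fan_out later peaks that fall within max_dt frames of it."""
    ordered = sorted(peaks)  # (frame, bin) pairs, sorted by frame
    for i, (t1, f1) in enumerate(ordered):
        paired = 0
        for t2, f2 in ordered[i + 1:]:
            dt = t2 - t1
            if dt > max_dt:
                break  # frames only increase from here on
            if dt == 0:
                continue  # skip peaks in the same frame
            yield (f1, f2, dt), t1
            paired += 1
            if paired == fan_out:
                break


def build_index(references):
    """Map each hash to the (track_id, anchor_frame) pairs that produce it."""
    index = defaultdict(list)
    for track_id, peaks in references.items():
        for h, t in pair_hashes(peaks):
            index[h].append((track_id, t))
    return index


def best_match(index, query_peaks):
    """Vote for (track_id, offset); a true copy piles its votes on one offset."""
    votes = Counter()
    for h, tq in pair_hashes(query_peaks):
        for track_id, tr in index.get(h, []):
            votes[(track_id, tr - tq)] += 1
    if not votes:
        return None
    (track_id, offset), count = votes.most_common(1)[0]
    return track_id, offset, count


# Hypothetical toy references: lists of (frame, frequency-bin) peaks.
references = {
    "track_a": [(0, 10), (5, 40), (9, 22), (14, 31), (20, 12), (27, 45)],
    "track_b": [(0, 33), (4, 8), (11, 50), (18, 19)],
}
index = build_index(references)
# A query clipped from track_a starting at frame 5, re-timed to start at 0.
query = [(t - 5, f) for t, f in references["track_a"] if t >= 5]
print(best_match(index, query))  # ('track_a', 5, 9)
```

All nine query hashes land on track_a at offset 5. This is the exact-copy case that classical fingerprinting handles well.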

It does not, however, find a 130-BPM sped-up edit of a 100-BPM original. It does not find an acoustic fan cover where the instrumentation is completely different from the master. It does not find a 9-second short-form mashup where two-thirds of the audio is your track but it's been pitched up half a semitone and layered under a vocal that isn't yours. The constellation that classical fingerprinting looks for moves, distorts, or dissolves entirely under these transformations.
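A small sketch makes the failure concrete. It assumes the 130-BPM edit is a pitch-preserving time-stretch of the 100-BPM original, so every peak's frame index shrinks by a factor of 1.3 while its frequency bin stays put. A plain resampling speed-up would move the bins as well, which only makes things worse. The peak list and the helper names are invented for illustration.

```python
def pair_hashes(peaks, fan_out=3, max_dt=64):
    """Set of (f1, f2, dt) triples from each peak and up to fan_out successors.
    fan_out and max_dt are illustrative assumptions."""
    ordered = sorted(peaks)
    hashes = set()
    for i, (t1, f1) in enumerate(ordered):
        for t2, f2 in ordered[i + 1:i + 1 + fan_out]:
            if 0 < t2 - t1 <= max_dt:
                hashes.add((f1, f2, t2 - t1))
    return hashes


def time_stretch(peaks, speed):
    """Hypothetical pitch-preserving speed-up: frames shrink by 1/speed, bins stay put."""
    return [(round(t / speed), f) for t, f in peaks]


def surviving_fraction(peaks, speed):
    """Share of the original constellation hashes that still exist after the stretch."""
    original = pair_hashes(peaks)
    stretched = pair_hashes(time_stretch(peaks, speed))
    return len(original & stretched) / len(original)


# Invented (frame, frequency-bin) peaks for illustration.
peaks = [(0, 10), (5, 40), (9, 22), (14, 31), (20, 12), (27, 45)]
print(surviving_fraction(peaks, 1.0))        # 1.0: an exact copy keeps every hash
print(surviving_fraction(peaks, 130 / 100))  # 0.0: the 130-BPM edit keeps none
```

Because each hash encodes an absolute frame gap, a uniform tempo change alters almost every Δt, and the offset vote in a classical matcher has nothing left to count.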

This isn't a flaw in the algorithm; it's a fundamental limit of what a single timbre-based representation can resolve. To find what classical fingerprinting cannot, you need different representations of the same audio, and you need them in parallel.

Multi-scale matching

Nomad Listen runs four detectors in parallel, each operating on a different representation of the audio. A track and a candidate derivative match only when at least two detectors agree, each scoring above its own threshold. That agreement rule is what keeps the system useful in production instead of drowning the rights team in false positives.
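The agreement rule is simple to state in code. The sketch below assumes each detector emits a similarity score alongside its own operating threshold. The detector names mirror the text, but the scores, the thresholds and the `DetectorResult` type are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectorResult:
    """Hypothetical per-detector output for one (reference, candidate) pair."""
    name: str
    score: float      # detector-specific similarity; higher means more similar
    threshold: float  # that detector's own operating threshold


def agreeing(results):
    """Names of detectors whose score is above their own threshold."""
    return [r.name for r in results if r.score > r.threshold]


def is_match(results, min_agree=2):
    """Declare a match only when at least min_agree detectors agree."""
    return len(agreeing(results)) >= min_agree


# Illustrative scores for a sped-up edit; the numbers are made up.
results = [
    DetectorResult("melodic", 0.91, 0.80),
    DetectorResult("harmonic", 0.62, 0.75),
    DetectorResult("lyric_aware", 0.84, 0.70),
    DetectorResult("timbre_only", 0.12, 0.85),
]
print(is_match(results), agreeing(results))  # True ['melodic', 'lyric_aware']
```

A single confident detector is not enough on its own: dropping the lyric-aware result in this example leaves only the melodic detector above threshold, and the pair is not claimed.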

The four detectors work as follows. The tempo-invariant melodic detector extracts the melodic contour and maps it onto a tempo-normalised pitch curve; it is what catches sped-up edits and pitch-shifted remixes. The harmonic-progression detector models the sequence of chord changes at the bar level and catches acoustic covers in different instrumentation. The lyric-aware detector runs an audio-to-phoneme encoder and matches phoneme runs against transcribed reference lyrics, catching lyric-swapped mashups in which most of the words are still the original's. The timbre-only detector is the classical Shazam-class fingerprint, kept as a reliable baseline for noisy exact copies.
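A rough sketch of tempo normalisation for the melodic detector follows. It assumes a pitch tracker has already produced (time, MIDI pitch) samples and a beat tracker has estimated the BPM. Converting seconds to beats removes the tempo change, and subtracting the median pitch removes a uniform transposition such as a half-semitone pitch-up. Sample-and-hold resampling and mean absolute difference are stand-ins chosen for brevity, not Nomad Listen's production method, and the function names are hypothetical.

```python
import bisect
from statistics import median


def normalised_contour(times_s, pitches_midi, bpm, n_beats, steps_per_beat=4):
    """Resample a pitch track onto a beat grid and remove its median pitch.

    times_s must be sorted ascending; pitches_midi may be fractional.
    bpm is assumed to come from a separate beat tracker.
    """
    beats = [t * bpm / 60.0 for t in times_s]  # seconds -> beats removes tempo
    grid = []
    for k in range(n_beats * steps_per_beat):
        # Sample mid-step so float noise at exact beat boundaries cannot flip cells.
        b = (k + 0.5) / steps_per_beat
        i = max(bisect.bisect_right(beats, b) - 1, 0)  # sample-and-hold
        grid.append(pitches_midi[i])
    centre = median(grid)  # key offset; subtracting it removes transposition
    return [p - centre for p in grid]


def contour_distance(a, b):
    """Mean absolute difference in semitones over the shared length."""
    n = min(len(a), len(b))
    return sum(abs(x - y) for x, y in zip(a, b)) / n


melody = [60, 62, 64, 65, 67, 65, 64, 62]  # invented melody, one note per beat
original = normalised_contour([i * 60 / 100 for i in range(8)], melody,
                              bpm=100, n_beats=8)
# A 130-BPM edit of the same melody, pitched up half a semitone.
edit = normalised_contour([i * 60 / 130 for i in range(8)],
                          [p + 0.5 for p in melody], bpm=130, n_beats=8)
print(contour_distance(original, edit))  # 0.0
```

The sped-up, pitched-up copy lands on the same normalised curve, while a different melody does not. A real detector would also have to cope with beat-tracking errors and partial overlaps, which this sketch ignores.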

The detectors are deliberately heterogeneous, so they fail in different ways. A sped-up edit fools timbre-only, but the melodic detector is invariant to it. An acoustic cover fools both timbre-only and melodic, but the harmonic-progression detector catches it. A short-form mashup might confuse the melodic and harmonic detectors individually, but the lyric-aware detector reads the original words underneath. Agreement between any two of these is a much higher-confidence signal than the score from any one alone.
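To illustrate why a harmonic-progression detector is blind to instrumentation and key, here is a sketch that assumes a chord recogniser has already produced one chord per bar as (root pitch class, quality). Encoding each chord change as the semitone step between successive roots makes the sequence independent of key, and Levenshtein edit distance tolerates an inserted or dropped bar. Both choices, and the helper names, are assumptions for illustration; the text does not say how Nomad Listen models the sequence.

```python
def interval_tokens(chords):
    """Turn bar-level chords into key-independent tokens.

    chords: list of (root_pitch_class 0-11, quality) tuples, one per bar.
    Each token is (semitone step from the previous root mod 12, new quality).
    """
    return [((r2 - r1) % 12, q2) for (r1, _), (r2, q2) in zip(chords, chords[1:])]


def edit_distance(a, b):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute, or match for free
        prev = cur
    return prev[-1]


# I-V-vi-IV in C, and a hypothetical acoustic cover of it transposed to G.
master = [(0, "maj"), (7, "maj"), (9, "min"), (5, "maj")]
cover = [(7, "maj"), (2, "maj"), (4, "min"), (0, "maj")]
print(edit_distance(interval_tokens(master), interval_tokens(cover)))  # 0
```

Nothing in the tokens depends on the instrument that played the chords, which is why an acoustic arrangement that defeats the timbre-only detector can still score as an exact harmonic match.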

Why this matters commercially

Derivatives driven by short-form content now account for a measurable share of catalogue streams on every major platform. Internal data from the eight catalogues we reconciled in our pilot suggests that, depending on genre mix, typically between 6% and 14% of total catalogue play activity originates from derivative content: sped-up edits, fan covers, short-form mashups, snippet remixes. For some catalogues with a strong short-form presence, the figure is north of 20%.

Without derivative detection, rights holders cannot evidence the chain of attribution that entitles them to payment. The plays exist, the audience is real, and the platforms collect (and often distribute) money, but that money flows on the basis of metadata the original rights holder never gets to influence. A derivative uploaded by a third party with imprecise or incorrect attribution is, structurally, lost revenue from the moment it goes live.

Multi-scale matching changes the economics of this directly. By surfacing the derivatives a rights team can evidence, we make it possible to claim them — through Content ID, through MLC dispute, through neighbouring-rights agencies, or through direct platform conversations. The pilot data shows that catalogues running continuous derivative detection recover derivative-driven revenue at 2–5× the rate of those running quarterly snapshots, and at roughly the same triage cost.

Production numbers

The current production index contains 78 million reference tracks. Match latency on an 8-second query window (what a typical short-form clip provides) runs comfortably under one second across the full index. The false-positive rate at the conservative confidence band, which is the band most rights teams use for automated claiming, is 1.6 × 10⁻⁷ per match attempt. At a query volume of 200 million attempts per month, roughly what an indie label with major distribution sees, that works out to about 32 false positives surfaced per month before human triage.
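The false-positive arithmetic can be checked directly. Treating each match attempt as independent is an assumption made here for the sketch: if real false positives cluster (for example on generic loops), the month-to-month spread would be wider than this. Under that assumption the monthly count is roughly Poisson, with mean equal to rate × volume.

```python
import math


def expected_false_positives(fp_rate_per_attempt, attempts):
    """Mean of a Poisson count, assuming independent match attempts."""
    return fp_rate_per_attempt * attempts


# Figures from the text: 1.6e-7 per attempt, 200 million attempts per month.
mean = expected_false_positives(1.6e-7, 200_000_000)
spread = math.sqrt(mean)  # Poisson standard deviation
print(f"{mean:.1f} expected per month, typical spread ±{spread:.1f}")
# 32.0 expected per month, typical spread ±5.7
```
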

Recall is more interesting because there's no single number that captures it. On our internal cover-test corpus — 4,200 confirmed covers across 16 genres — top-band recall is 94%. On the short-form derivative corpus — 11,000 confirmed snippet remixes scraped from platform takedown notices — top-band recall is 87%. On lyric-swapped mashups, which are the hardest single case, recall drops to 71%, and the agreement rule sometimes requires the lyric-aware detector to push past its default threshold. We're transparent about that band; teams using Nomad Listen know to expect it.
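Translating the recall figures back into counts shows how many confirmed derivatives a rights team would still need to find by other means. The helper name is hypothetical, and the lyric-swapped corpus is left out because its size isn't given here.

```python
def found_and_missed(corpus_size, recall):
    """Convert a recall figure back into counts of derivatives found and missed."""
    found = round(corpus_size * recall)
    return found, corpus_size - found


# Corpus sizes and top-band recall figures from the text.
for name, size, recall in [("cover-test", 4_200, 0.94),
                           ("short-form derivative", 11_000, 0.87)]:
    found, missed = found_and_missed(size, recall)
    print(f"{name}: {found:,} found, {missed:,} missed")
# cover-test: 3,948 found, 252 missed
# short-form derivative: 9,570 found, 1,430 missed
```
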

Where this is going

The next two years of work on Nomad Listen are about closing the loop with the platforms themselves. Direct integration with the major short-form platforms, surfacing derivatives at upload time before they accumulate plays under the wrong attribution, is technically feasible and commercially overdue. The platforms have an obvious incentive to support this, and rights holders have an obvious incentive to demand it; the missing piece has been a detection layer that produces output a platform can actually act on.

Labels begin to recover royalties they did not previously know were owed. Publishers begin to evidence sync claims they could not previously substantiate. Managers begin to surface derivative-driven audience patterns that their artists' campaigns can lean into. None of this is hypothetical; it's what the pilot catalogues are already doing. The platform work is the multiplier: it closes the loop between detection and payment without waiting for the next quarterly cycle.

If you run a catalogue and want to estimate your own derivative coverage gap, send us a 1,000-track sample. We'll come back with the number, three specific tracks where derivatives are running uncovered, and an honest answer about whether Nomad Listen is the right tool for your particular case.


