{"id":13396938,"url":"https://github.com/worldveil/dejavu","last_synced_at":"2025-05-13T21:10:26.661Z","repository":{"id":11942433,"uuid":"14512299","full_name":"worldveil/dejavu","owner":"worldveil","description":"Audio fingerprinting and recognition in Python","archived":false,"fork":false,"pushed_at":"2024-04-22T19:23:00.000Z","size":76344,"stargazers_count":6538,"open_issues_count":133,"forks_count":1450,"subscribers_count":261,"default_branch":"master","last_synced_at":"2025-04-28T17:09:54.189Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/worldveil.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2013-11-19T02:50:35.000Z","updated_at":"2025-04-26T08:34:59.000Z","dependencies_parsed_at":"2022-06-26T01:35:28.214Z","dependency_job_id":"18e4b4d5-c0c8-492d-8862-ac291900d4e8","html_url":"https://github.com/worldveil/dejavu","commit_stats":{"total_commits":124,"total_committers":24,"mean_commits":5.166666666666667,"dds":0.7741935483870968,"last_synced_commit":"e56a4a221ad204654a191d217f92aebf3f058b62"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/worldveil%2Fdejavu","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/worldveil%2Fdejavu/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/worldveil%2Fdejavu/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories
/worldveil%2Fdejavu/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/worldveil","download_url":"https://codeload.github.com/worldveil/dejavu/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254029002,"owners_count":22002283,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-30T18:01:08.153Z","updated_at":"2025-05-13T21:10:21.639Z","avatar_url":"https://github.com/worldveil.png","language":"Python","funding_links":[],"categories":["Audio","Python","资源列表","HarmonyOS","音频","音频处理","\u003ca id=\"170048b7d8668c50681c0ab1e92c679a\"\u003e\u003c/a\u003e工具","Audio automation","Audio [🔝](#readme)","Awesome Python","Uncategorized"],"sub_categories":["音频","Windows Manager","\u003ca id=\"016bb6bd00f1e0f8451f779fe09766db\"\u003e\u003c/a\u003e指纹\u0026\u0026Fingerprinting","Software","Drone Frames","Audio","Uncategorized"],"readme":"dejavu\n==========\n\nAudio fingerprinting and recognition algorithm implemented in Python. See the explanation here:  \n[How it works](http://willdrevo.com/fingerprinting-and-audio-recognition-with-python/)\n\nDejavu can memorize audio by listening to it once and fingerprinting it. Then, by playing a song and recording microphone input, or by reading from disk, Dejavu attempts to match the audio against the fingerprints held in the database, returning the song being played. 
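The memorize-then-match loop described above can be illustrated with a toy, in-memory sketch of the general Shazam-style idea. Pairs of spectrogram peaks become hashes, and a query votes on the time offset between its hashes and the stored ones. This is not Dejavu's code. The helpers `peak_hashes`, `memorize` and `match`, the tuple hash `(f1, f2, dt)`, the `FAN_VALUE` stand-in and the dict used in place of a database are all illustrative assumptions. The peaks are made-up (time bin, frequency bin) pairs, not output from a real spectrogram.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for a fan value: how many later peaks each anchor peak is paired with.
FAN_VALUE = 3


def peak_hashes(peaks):
    # peaks: list of (time_bin, freq_bin) tuples. Pair each peak with the next
    # FAN_VALUE peaks in time and yield ((f1, f2, dt), anchor_time).
    peaks = sorted(peaks)
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + FAN_VALUE]:
            yield (f1, f2, t2 - t1), t1


def memorize(index, song_name, peaks):
    # Store every hash together with the song and the time it occurs at in that song.
    for h, t in peak_hashes(peaks):
        index[h].append((song_name, t))


def match(index, query_peaks):
    # Vote for (song, stored_time - query_time). A true match yields many hashes
    # sharing one offset, while accidental collisions scatter across offsets.
    votes = Counter()
    for h, t_query in peak_hashes(query_peaks):
        for song_name, t_db in index.get(h, []):
            votes[(song_name, t_db - t_query)] += 1
    if not votes:
        return None
    (song_name, offset), count = votes.most_common(1)[0]
    return song_name, offset, count


index = defaultdict(list)
song_a = [(0, 10), (1, 40), (2, 25), (3, 60), (4, 15), (5, 33)]
song_b = [(0, 70), (1, 12), (2, 90), (3, 45), (4, 70), (5, 8)]
memorize(index, 'song_a', song_a)
memorize(index, 'song_b', song_b)

# A clip of song_a that starts at time bin 2, with its times re-based to 0.
clip = [(t - 2, f) for t, f in song_a[2:]]
print(match(index, clip))  # ('song_a', 2, 6)
```

The returned offset says where in the stored song the clip began. The vote count plays the role of a match confidence.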
\n\nNote: *Dejavu is not the right tool for voice recognition!* Dejavu excels at recognizing exact signals with reasonable amounts of noise.\n\n## Quickstart with Docker\n\nFirst, install [Docker](https://docs.docker.com/get-docker/).\n\n```shell\n# build and then run our containers\n$ docker-compose build\n$ docker-compose up -d\n\n# get a shell inside the container\n$ docker-compose run python /bin/bash\nStarting dejavu_db_1 ... done\nroot@f9ea95ce5cea:/code# python example_docker_postgres.py \nFingerprinting channel 1/2 for test/woodward_43s.wav\nFingerprinting channel 1/2 for test/sean_secs.wav\n...\n\n# connect to the database and poke around\nroot@f9ea95ce5cea:/code# psql -h db -U postgres dejavu\nPassword for user postgres:  # type \"password\", as specified in docker-compose.yml!\npsql (11.7 (Debian 11.7-0+deb10u1), server 10.7)\nType \"help\" for help.\n\ndejavu=# \\dt\n            List of relations\n Schema |     Name     | Type  |  Owner   \n--------+--------------+-------+----------\n public | fingerprints | table | postgres\n public | songs        | table | postgres\n(2 rows)\n\ndejavu=# select * from fingerprints limit 5;\n          hash          | song_id | offset |        date_created        |       date_modified        \n------------------------+---------+--------+----------------------------+----------------------------\n \\x71ffcb900d06fe642a18 |       1 |    137 | 2020-06-03 05:14:19.400153 | 2020-06-03 05:14:19.400153\n \\xf731d792977330e6cc9f |       1 |    148 | 2020-06-03 05:14:19.400153 | 2020-06-03 05:14:19.400153\n \\x71ff24aaeeb55d7b60c4 |       1 |    146 | 2020-06-03 05:14:19.400153 | 2020-06-03 05:14:19.400153\n \\x29349c79b317d45a45a8 |       1 |    101 | 2020-06-03 05:14:19.400153 | 2020-06-03 05:14:19.400153\n \\x5a052144e67d2248ccf4 |       1 |    123 | 2020-06-03 05:14:19.400153 | 2020-06-03 05:14:19.400153\n(5 rows)\n\n# then to shut it all down...\n$ docker-compose down\n```\n\nIf you want to be able to use the 
microphone with the Docker container, you'll need to do a [little extra work](https://stackoverflow.com/questions/43312975/record-sound-on-ubuntu-docker-image). I haven't had the time to write this up, but if anyone wants to make a PR, I'll happily merge.\n\n## Docker alternative on local machine\n\nFollow the instructions in [INSTALLATION.md](INSTALLATION.md).\n\nNext, you'll need to create a MySQL database where Dejavu can store fingerprints. For example, on your local setup:\n\t\n\t$ mysql -u root -p\n\tEnter password: **********\n\tmysql\u003e CREATE DATABASE IF NOT EXISTS dejavu;\n\nNow you're ready to start fingerprinting your audio collection!\n\nYou may also use Postgres, of course. The same method applies.\n\n## Fingerprinting\n\nLet's say we want to fingerprint all of July 2013's VA US Top 40 hits.\n\nStart by creating a Dejavu object with your configuration settings (Dejavu takes an ordinary Python dictionary for the settings).\n\n```python\n\u003e\u003e\u003e from dejavu import Dejavu\n\u003e\u003e\u003e config = {\n...     \"database\": {\n...         \"host\": \"127.0.0.1\",\n...         \"user\": \"root\",\n...         \"password\": \"\u003cpassword above\u003e\",  # replace with your MySQL password\n...         \"database\": \"\u003cname of the database you created above\u003e\",  # e.g. \"dejavu\"\n...     }\n... }\n\u003e\u003e\u003e djv = Dejavu(config)\n```\n\nNext, give the `fingerprint_directory` method three arguments:\n* input directory to look for audio files\n* audio extensions to look for in the input directory\n* number of processes (optional)\n\n```python\n\u003e\u003e\u003e djv.fingerprint_directory(\"va_us_top_40/mp3\", [\".mp3\"], 3)\n```\n\nFor a large number of files, this will take a while. However, Dejavu is robust enough that you can kill and restart it without losing progress: Dejavu remembers which songs it has fingerprinted and converted and which it hasn't, so it won't repeat itself. 
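One way to get this skip-on-restart behavior is to identify each file by a hash of its contents and skip any hash that is already stored. The sketch below is an illustration under assumptions, not Dejavu's implementation. `file_sha1`, `files_to_fingerprint` and the in-memory `already_done` set are hypothetical stand-ins for whatever the database records. SHA-1 is assumed only because the recognition results shown later report a `file_sha1` field.

```python
import hashlib
import os


def file_sha1(path, block_size=1 << 20):
    # Hash the file contents in 1 MiB blocks so large audio files are not read
    # into memory at once; returns an uppercase hex digest.
    digest = hashlib.sha1()
    with open(path, 'rb') as fh:
        for block in iter(lambda: fh.read(block_size), b''):
            digest.update(block)
    return digest.hexdigest().upper()


def files_to_fingerprint(directory, extensions, already_done):
    # Walk `directory`, keep files whose lowercase extension is in `extensions`
    # (e.g. ['.mp3']), and skip any whose content hash is in `already_done`,
    # a hypothetical set of hashes for songs fingerprinted on earlier runs.
    todo = []
    for root, _dirs, names in os.walk(directory):
        for name in sorted(names):
            if os.path.splitext(name)[1].lower() not in extensions:
                continue
            path = os.path.join(root, name)
            if file_sha1(path) in already_done:
                continue
            todo.append(path)
    return todo
```

After each file finishes, its hash would be added to the stored set. A rerun after a crash then returns only the unfinished files. Hashing contents rather than filenames also means a renamed copy of a song is not fingerprinted twice.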
\n\nYou'll have a lot of fingerprints once it completes a large folder of mp3s:\n\n```python\n\u003e\u003e\u003e print(djv.db.get_num_fingerprints())\n5442376\n```\n\nAlso, any subsequent calls to `fingerprint_file` or `fingerprint_directory` will fingerprint those songs and add them to the database as well. This simulates a system in which new songs are fingerprinted and added to the database seamlessly as they are released, without stopping the system.\n\n## Configuration options\n\nThe configuration object passed to the Dejavu constructor must be a dictionary.\n\nThe following keys are mandatory:\n\n* `database`, whose value is a dictionary of connection keys that the database you are using will accept. For example, with MySQL, the keys can be anything that the [`MySQLdb.connect()`](http://mysql-python.sourceforge.net/MySQLdb.html) function will accept.\n\nThe following keys are optional:\n\n* `fingerprint_limit`: how many seconds of each audio file to fingerprint. Leaving out this key, or setting it to `-1` or `None`, causes Dejavu to fingerprint the entire audio file. The default value is `None`.\n* `database_type`: `mysql` (the default value) and `postgres` are supported. If you'd like to support another database by writing a new subclass of `BaseDatabase`, please fork and send a pull request!\n\nAn example configuration is as follows:\n\n```python\n\u003e\u003e\u003e from dejavu import Dejavu\n\u003e\u003e\u003e config = {\n...     \"database\": {\n...         \"host\": \"127.0.0.1\",\n...         \"user\": \"root\",\n...         \"password\": \"Password123\",\n...         \"database\": \"dejavu_db\",\n...     },\n...     \"database_type\": \"mysql\",\n...     \"fingerprint_limit\": 10\n... 
}\n\u003e\u003e\u003e djv = Dejavu(config)\n```\n\n## Tuning\n\nInside `config/settings.py`, you may want to adjust the following parameters (example values are shown below).\n\n    FINGERPRINT_REDUCTION = 30\n    PEAK_SORT = False\n    DEFAULT_OVERLAP_RATIO = 0.4\n    DEFAULT_FAN_VALUE = 5\n    DEFAULT_AMP_MIN = 10\n    PEAK_NEIGHBORHOOD_SIZE = 10\n    \nThese parameters are described in detail within the file. Read it to understand the impact of changing these values.\n\n## Recognizing\n\nThere are two ways to recognize audio using Dejavu: by reading and processing files on disk, or by listening through your computer's microphone.\n\n### Recognizing: On Disk\n\nThrough the terminal:\n\n```bash\n$ python dejavu.py --recognize file sometrack.wav \n{'total_time': 2.863781690597534, 'fingerprint_time': 2.4306554794311523, 'query_time': 0.4067542552947998, 'align_time': 0.007731199264526367, 'results': [{'song_id': 1, 'song_name': 'Taylor Swift - Shake It Off', 'input_total_hashes': 76168, 'fingerprinted_hashes_in_db': 4919, 'hashes_matched_in_input': 794, 'input_confidence': 0.01, 'fingerprinted_confidence': 0.16, 'offset': -924, 'offset_seconds': -30.00018, 'file_sha1': b'3DC269DF7B8DB9B30D2604DA80783155912593E8'}, {...}, ...]}\n```\n\nor, from a script, assuming you've already instantiated a Dejavu object:\n\n```python\n\u003e\u003e\u003e from dejavu.logic.recognizer.file_recognizer import FileRecognizer\n\u003e\u003e\u003e song = djv.recognize(FileRecognizer, \"va_us_top_40/wav/Mirrors - Justin Timberlake.wav\")\n```\n\n### Recognizing: Through a Microphone\n\nFrom a script:\n\n```python\n\u003e\u003e\u003e from dejavu.logic.recognizer.microphone_recognizer import MicrophoneRecognizer\n\u003e\u003e\u003e song = djv.recognize(MicrophoneRecognizer, seconds=10)  # Defaults to 10 seconds.\n```\n\nWith the command-line script, you specify the number of seconds to listen:\n\n```bash\n$ python dejavu.py --recognize mic 10\n```\n\n## Testing\n\nTesting out 
different parameterizations of the fingerprinting algorithm is often useful as the corpus grows larger and the inevitable trade-offs between speed and accuracy come into play.\n\n![Confidence](plots/confidence.png)\n\nTest your Dejavu settings on a corpus of audio files against several metrics:\n\n* Confidence of match (number of fingerprints aligned)\n* Offset matching accuracy\n* Song matching accuracy\n* Time to match\n\n![Accuracy](plots/matching_graph.png)\n\nAn example script is given in `test_dejavu.sh`, shown below:\n\n```bash\n#####################################\n### Dejavu example testing script ###\n#####################################\n\n###########\n# Clear out previous results\nrm -rf ./results ./temp_audio\n\n###########\n# Fingerprint files of extension mp3 in the ./mp3 folder\npython dejavu.py --fingerprint ./mp3/ mp3\n\n##########\n# Run a test suite on the ./mp3 folder by extracting 1, 2, 3, 4, and 5 \n# second clips sampled randomly from within each song 8 seconds \n# away from start or end, sampling offset with random seed = 42, and finally, \n# store results in ./results and log to ./results/dejavu-test.log\npython run_tests.py \\\n    --secs 5 \\\n    --temp ./temp_audio \\\n    --log-file ./results/dejavu-test.log \\\n    --padding 8 \\\n    --seed 42 \\\n    --results ./results \\\n    ./mp3\n```\n\nThe testing scripts are currently a bit rough and could certainly use some love and attention, if you're interested in submitting a PR! For example, underscores in audio filenames currently [break](https://github.com/worldveil/dejavu/issues/63) the test scripts. 
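The clip selection described in the script comments can be sketched in a few lines. Clips are random and kept at least the padding distance from the start and end of each song, with a fixed seed so runs are reproducible. This is an illustration of the idea, not `run_tests.py` itself. `sample_clip_starts` and its parameters are hypothetical names, and song durations are assumed to be known in seconds.

```python
import random


def sample_clip_starts(duration, clip_lengths, padding, seed):
    # For each clip length, pick a start time so that the whole clip lies at
    # least `padding` seconds away from both ends of a `duration`-second song.
    rng = random.Random(seed)  # a private, seeded generator keeps runs reproducible
    starts = {}
    for length in clip_lengths:
        latest = duration - padding - length
        if latest < padding:
            raise ValueError(f'song too short for a {length}s clip with {padding}s padding')
        starts[length] = rng.uniform(padding, latest)
    return starts


# Mirrors --secs 5 --padding 8 --seed 42 for a hypothetical 200-second song.
starts = sample_clip_starts(200.0, range(1, 6), padding=8, seed=42)
for length, start in starts.items():
    print(f'{length}s clip: {start:.2f}s to {start + length:.2f}s')
```

Using a dedicated `random.Random(seed)` instance, rather than seeding the global generator, keeps the sampled offsets stable even if other code also draws random numbers.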
\n\n## How does it work?\n\nThe algorithm uses a fingerprint-based approach, much like:\n\n* [Shazam](http://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf)\n* [MusicRetrieval](http://www.cs.cmu.edu/~yke/musicretrieval/)\n* [Chromaprint](https://oxygene.sk/2011/01/how-does-chromaprint-work/)\n\nThe \"fingerprints\" are locality-sensitive hashes computed from the spectrogram of the audio. This is done by taking the FFT of the signal over overlapping windows of the song and identifying peaks. A very robust peak-finding algorithm is needed; otherwise you'll have a terrible signal-to-noise ratio.\n\nHere I've taken the spectrogram over the first few seconds of \"Blurred Lines\". The spectrogram is a 2D plot that shows amplitude as a function of time (a particular window, actually) and frequency, binned logarithmically, just as the human ear perceives it. In the plot below you can see where local maxima occur in the amplitude space:\n\n![Spectrogram](plots/spectrogram_peaks.png)\n\nFinding these local maxima combines a high-pass filter (a threshold in amplitude space) with some image-processing techniques. A concept of a \"neighborhood\" is needed: a local maximum judged only against its directly adjacent pixels is a poor peak, one that will not survive the noise of coming through speakers and a microphone.\n\nIf we zoom in even closer, we can begin to imagine how to bin and discretize these peaks. Finding the peaks is the most computationally intensive part, but it's not the end. 
Peaks are combined using their discrete time and frequency bins to create a unique hash for that particular moment in the song, creating a fingerprint.\n\n![Spectrogram zoomed](plots/spectrogram_zoomed.png)\n\nFor a more detailed look at the making of Dejavu, see my blog post [here](https://willdrevo.com/fingerprinting-and-audio-recognition-with-python/).\n\n## How well it works\n\nTo truly get the benefit of an audio fingerprinting system, it can't take a long time to fingerprint. A slow match is a bad user experience. Furthermore, a user may decide to match a song with only a few precious seconds of audio left before the radio station goes to a commercial break.\n\nTo test Dejavu's speed and accuracy, I fingerprinted a list of 45 songs from the US VA Top 40 from July 2013 (I know, their counting is off somewhere). I tested in three ways:\n\n1. Reading the raw mp3 -\u003e wav data from disk\n1. Playing the song over the speakers with Dejavu listening on the laptop microphone\n1. Playing compressed streamed music on my iPhone\n\nBelow are the results.\n\n### 1. Reading from Disk\n\nReading from disk gave an overwhelming 100% recall: no mistakes were made over the 45 songs I fingerprinted. Since Dejavu gets all of the samples from the song (without noise), it would be a nasty surprise if reading the same file from disk didn't work every time!\n\n### 2. Audio over laptop microphone\n\nHere I wrote a script to randomly choose `n` seconds of audio from the original mp3 file to play and have Dejavu listen over the microphone. To be fair, I only allowed segments of audio that were more than 10 seconds from the start or end of the track, to avoid listening to silence.\n\nAdditionally, my friend was talking and I was humming along a bit during the whole process, just to throw in some noise.\n\nHere are the results for different values of listening time (`n`):\n\n![Matching accuracy](plots/accuracy.png)\n\nThis is pretty rad. 
For the percentages:\n\nNumber of Seconds | Number Correct | Percentage Accuracy\n----|----|----\n1 | 27 / 45 | 60.0%\n2 | 43 / 45 | 95.6%\n3 | 44 / 45 | 97.8%\n4 | 44 / 45 | 97.8%\n5 | 45 / 45 | 100.0%\n6 | 45 / 45 | 100.0%\n\nEven with only a single second, randomly chosen from within the song, Dejavu gets 60% right! One extra second (2 seconds total) gets us to around 96%, while a perfect score took 5 seconds or more. Honestly, when I was testing this myself, I found Dejavu beat me: identifying a song from only 1-2 seconds heard out of context is pretty hard. I had even been listening to these same songs for two days straight while debugging...\n\nIn conclusion, Dejavu works amazingly well, even with next to nothing to work with.\n\n### 3. Compressed streamed music played on my iPhone\n\nJust to try it out, I played music from my Spotify account (160 kbit/s compressed) through my iPhone's speakers, with Dejavu again listening on my MacBook mic. I saw no degradation in performance; 1-2 seconds were enough to recognize any of the songs.\n\n## Performance\n\n### Speed\n\nOn my MacBook Pro, matching ran at 3x listening speed with a small constant overhead. To test, I tried different recording times and plotted the recording time plus the time to match. Since the speed is largely independent of the particular song and depends mostly on the length of the spectrogram created, I tested on a single song, \"Get Lucky\" by Daft Punk:\n\n![Matching time](plots/matching_time.png)\n\nAs you can see, the relationship is quite linear. The line you see is a least-squares linear regression fit to the data, with the corresponding line equation:\n\n    1.364757 * record_time - 0.034373 = time_to_match\n    \nNote that since the matching itself is single-threaded, the matching time includes the recording time. 
This makes sense given the 3x speed of matching alone, as:\n    \n    1 (recording) + 1/3 (matching) = 4/3 ~= 1.33\n    \nwhich is close to the fitted slope of 1.364757 if we disregard the minuscule constant term.\n\nThe overhead of peak finding is the bottleneck. I experimented with multithreading and real-time matching, and alas, it wasn't meant to be in Python. An equivalent Java or C/C++ implementation would most likely have little trouble keeping up, applying the FFT and peak finding in real time.\n\nAn important caveat, of course, is the round-trip time (RTT) for making matches. Since my MySQL instance was local, I didn't have to deal with the latency penalty of transferring fingerprint matches over the network. This would add the RTT to the constant term in the overall calculation, but would not affect the matching process.\n\n### Storage\n\nFor the 45 songs I fingerprinted, the database used 377 MB of space for 5.4 million fingerprints. In comparison, the disk usage is given below:\n\nAudio Information Type | Storage in MB\n----|----\nmp3 | 339\nwav | 1885\nfingerprints | 377\n\nThere's a pretty direct trade-off between the necessary record time and the amount of storage needed. Lowering the amplitude threshold for peaks or raising the fan value for fingerprinting adds more fingerprints and bolsters accuracy at the expense of more space. ","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fworldveil%2Fdejavu","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fworldveil%2Fdejavu","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fworldveil%2Fdejavu/lists"}