{"id":17054203,"url":"https://github.com/f0k/birdclef2018","last_synced_at":"2025-04-12T17:00:28.229Z","repository":{"id":140235012,"uuid":"139725677","full_name":"f0k/birdclef2018","owner":"f0k","description":"BirdCLEF 2018 implementation","archived":false,"fork":false,"pushed_at":"2019-05-03T15:46:40.000Z","size":71,"stargazers_count":15,"open_issues_count":0,"forks_count":4,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-26T11:21:19.689Z","etag":null,"topics":["audio-classification","bioacoustics","deep-learning"],"latest_commit_sha":null,"homepage":null,"language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/f0k.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2018-07-04T13:26:09.000Z","updated_at":"2024-03-29T19:23:45.000Z","dependencies_parsed_at":"2023-11-27T00:10:30.177Z","dependency_job_id":"a603554c-9a07-4a05-8d86-b5a30d3d5b31","html_url":"https://github.com/f0k/birdclef2018","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/f0k%2Fbirdclef2018","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/f0k%2Fbirdclef2018/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/f0k%2Fbirdclef2018/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/f0k%2Fbirdclef2018/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/f0k","download_url":"https://codeload.github.com/f0k/birdclef2018/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https:/
/github.com","kind":"github","repositories_count":248602273,"owners_count":21131613,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["audio-classification","bioacoustics","deep-learning"],"created_at":"2024-10-14T10:14:16.985Z","updated_at":"2025-04-12T17:00:28.197Z","avatar_url":"https://github.com/f0k.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"Bird Identification from Timestamped, Geotagged Audio Recordings\n================================================================\n\nThis is the implementation of the\n[BirdCLEF 2018](http://www.imageclef.org/node/230) submission by\n[OFAI](http://www.ofai.at) within the [aMOBY](http://amoby.ofai.at) project.\n\nIt allows training an ensemble of neural networks to recognize 1500 South\nAmerican bird species in audio recordings, with an option to factor in\nmetadata about the recording date, time and location.\n\nIt contains the code for preparing the dataset (converting audio files and\nparsing metadata), for training a set of different models on audio recordings\nand/or metadata, for finding weights to form an ensemble of those models, and\nfor producing the predictions on the test set for submission to the challenge.\n\nFor a detailed description of the approach, please refer to the paper\n\"Bird Identification from Timestamped, Geotagged Audio Recordings\"\nby Jan Schlüter included in the CLEF Working Notes 2018.\n[[Paper](http://ofai.at/~jan.schlueter/pubs/2018_birdclef.pdf),\n [BibTeX](http://ofai.at/~jan.schlueter/pubs/2018_birdclef.bib)]\n\n\nPreliminaries\n-------------\n\nThe 
code requires the following software:\n* Python 2.7+ or 3.4+\n* Python packages: numpy, scipy, Theano, Lasagne\n* bash or a compatible shell\n* ffmpeg\n\nFor better performance, the following Python packages are recommended:\n* pyfftw (for much faster spectrogram computation)\n\nBefore installing the dependencies, if desired, create and activate an\nenvironment using `pyenv` and/or `virtualenv`/`venv`, or using `conda`.\n\nInstall the bleeding-edge versions of Theano and Lasagne from github:\n```bash\npip install --upgrade --no-deps https://github.com/Theano/Theano/archive/master.zip\npip install --upgrade --no-deps https://github.com/Lasagne/Lasagne/archive/master.zip\n```\n(If not in an environment, add `--user` to install in your home directory, or\n`sudo` to install globally.)\n\nFor GPU support, also install libgpuarray, following its [installation\ninstructions](http://deeplearning.net/software/libgpuarray/installation.html).\nFor a more complete guide including CUDA and cuDNN, please refer to the [From\nZero to Lasagne](https://github.com/Lasagne/Lasagne/wiki/From-Zero-to-Lasagne)\nguides.\n\nFor faster FFTs, install libfftw3 and pyfftw. On Ubuntu, this can be done with:\n```bash\nsudo apt-get install libfftw3-dev\npip install pyfftw\n```\nUnder conda, it would be:\n```bash\nconda install -c conda-forge pyfftw\n```\n\n\nSetup\n-----\n\nFor preparing the experiments, clone the repository somewhere:\n```bash\ngit clone https://github.com/f0k/birdclef2018.git\n```\nIf you do not have `git` available, download the code from\nhttps://github.com/f0k/birdclef2018/archive/master.zip and extract it.\n\nThe experiments rely on the BirdCLEF 2018 dataset. First download the files\n(specifically, BirdCLEF2017TrainingSetPart1.tar.gz,\nBirdCLEF2017TrainingSetPart2.tar.gz, BirdCLEF2018MonophoneTest.tar.gz,\nBirdCLEF2018SoundscapesTest.tar.gz, BirdCLEF2018SoundscapesValidation.tar.gz)\nand extract them to a common directory. 
If you were not a BirdCLEF participant,\nask the organizers if they are willing to share the URLs.\n\nThen open the cloned or extracted repository in a bash terminal and execute the\nfollowing:\n```bash\n./datasets/birdclef/recreate.sh\n```\nIt will tell you that you need to specify the path to the extracted files, but\nit will also display some useful hints on how to organize the placement of the\nconverted audio files.\nThis script will call other scripts to convert the audio to 22 kHz mono files\n(this saves time during training), build the file lists for training and\ntesting, and extract the ground truth and metadata from the XML files.\n\nFinally, for all following commands, go into the `experiments` directory:\n```bash\ncd experiments\n```\n\n\nTraining\n--------\n\nTo train all models for the ensemble, simply run:\n```bash\n./train_all.sh\n```\nTo use a GPU, either set up a `.theanorc` file in your home directory, or run:\n```bash\nTHEANO_FLAGS=device=cuda,floatX=float32,gpuarray.preallocate=11000 ./train_all.sh\n```\nThis will train 17 audio, 19 metadata and one combined network. On an Nvidia\nTitan X Pascal GPU, a single training run will take up to 10 hours for audio,\nand 50 minutes for metadata networks. If your GPU does not have enough memory,\nreduce or remove the `gpuarray.preallocate=11000` setting, and reduce the batch\nsize that is set in the `defaults.vars` file.\n\nIf you have multiple GPUs, you can distribute runs over these GPUs by running\nthe script multiple times in multiple terminals with different target devices,\ne.g., `THEANO_FLAGS=device=cuda1 ./train_all.sh`. If you have multiple servers\nthat can access the same directory via NFS, you can also run the script on\neach server for further distribution of runs (runs are blocked with lockfiles).\n\nThe script will also compute network predictions after each training run. 
If\nthis failed for some jobs for any reason, run:\n```bash\n./predict_missing.sh\n```\nThis will compute any missing network predictions (if none are missing, nothing\nhappens).\n\n\nEvaluation\n----------\n\nTo obtain results for all networks trained so far, run:\n```bash\n./eval_all.sh\n```\nThis will print the Mean Average Precision (MAP) against the foreground species,\nthe MAP against the background species, and the top-*k* accuracy for the\nforeground species for *k* between 1 and 5, all on the validation set (the test\nset is kept secret by the organizers of the BirdCLEF challenge).\n\n\nEnsembling\n----------\n\nAfter all models have been trained, you can run `hyperopt` to find an optimal\nlinear combination of models based on the validation set performance. Install\nit with:\n```bash\npip install hyperopt\n```\nWe can now run `blender.py` to do the actual optimization. The commands are\ndocumented in comments in `submit_all.sh`. For example, for the audio-only\nensemble, run:\n```bash\n./blender.py --dataset=birdclef --labelfile-background=bg.tsv --strategy=hyperopt \\\n  birdclef/{dummy,resnet1}_{lme1,att16,att64}_fdrop05_fM10k_powlearn_shift5_fs1024_mc2cgr1dgr1{,_mixfgbg}{,_ban1}.pred.pkl\n```\nIn the end, it will produce a list of selected models and combination weights\nthat can be directly copied to `submit_all.sh`, preceded by `submit` and a name\nfor the ensemble. It can also be used directly as arguments to `./eval.py` to\nevaluate the ensemble.\n\n\nSubmission\n----------\n\nFinally, to create the CSV files for submission, run:\n```bash\n./submit_all.sh\n```\nPrefix the command with a `THEANO_FLAGS=...` setting if needed.\nThis will compute predictions on the test set for all models participating in\nany of the ensembles, combine the predictions according to the weights, and\nproduce a CSV file for each ensemble.\n\n\nReusing\n-------\n\n### ... 
for different datasets\n\nDatasets can be added to the `datasets` directory and their names passed as\nthe `--dataset` argument of `train.py`, `predict.py`, `eval.py` (and\n`blender.py`, if needed). Each dataset directory must contain:\n* an `audio` subdirectory with `.wav` files (this is a strict requirement, since\n  they are accessed as memory maps),\n* a `filelists` directory with at least a `train` and `valid` file listing the\n  file names relative to the `audio` directory, and\n* a `labels` directory with a `fg.tsv` file listing the training and validation\n  file names along with their class labels, with a tab character in between, and\n  a `labelset` file listing all class names to give them a fixed order.\n\n### ... for different frameworks\n\nThe implementation makes some use of features unique to Lasagne, so it is not\ntrivial to port completely to another framework. Some parts may be interesting\nto take out, though:\n* `audio.py` contains code for fast spectrogram computation, and a `WavFile`\n  class for masquerading a `.wav` file as a numpy array that is lazily mapped\n  to memory when needed.\n* `augment.py` contains `grab_random_excerpts()`, which provides a way to yield\n  random excerpts from a set of audio files with wildly different lengths. 
Each\n  mini-batch will have same-length excerpts, with the length bounded between a\n  given minimum and maximum length, and files drawn from buckets to avoid\n  excessive cropping or padding.\n* `model.py` contains a learnable mel filterbank, a learnable magnitude\n  transformation, PCEN, and log-mean-exp pooling\n* `model_to_fcn.py` implements a conversion of a CNN that classifies excerpts to\n  a fully-convolutional network with dilated convolutions and dilated max-pooling\n  that efficiently processes a full recording, keeping the full output resolution\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ff0k%2Fbirdclef2018","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ff0k%2Fbirdclef2018","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ff0k%2Fbirdclef2018/lists"}