{"id":37686483,"url":"https://github.com/kahst/birdclef-baseline","last_synced_at":"2026-01-16T12:34:11.717Z","repository":{"id":31598560,"uuid":"128376331","full_name":"kahst/BirdCLEF-Baseline","owner":"kahst","description":"The 2018 LifeCLEF bird identification task baseline system.","archived":false,"fork":false,"pushed_at":"2021-12-30T10:04:21.000Z","size":1217,"stargazers_count":47,"open_issues_count":2,"forks_count":15,"subscribers_count":8,"default_branch":"master","last_synced_at":"2024-01-27T17:36:16.988Z","etag":null,"topics":["bioaccoustics","birds","birdsong","convolutional-neural-networks","lasagne","lifeclef","sound","spectrogram"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kahst.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-04-06T09:39:34.000Z","updated_at":"2024-01-09T06:04:48.000Z","dependencies_parsed_at":"2022-08-24T22:50:17.774Z","dependency_job_id":null,"html_url":"https://github.com/kahst/BirdCLEF-Baseline","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/kahst/BirdCLEF-Baseline","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kahst%2FBirdCLEF-Baseline","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kahst%2FBirdCLEF-Baseline/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kahst%2FBirdCLEF-Baseline/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kahst%2FBirdCLEF-Baseline/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/ow
ners/kahst","download_url":"https://codeload.github.com/kahst/BirdCLEF-Baseline/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kahst%2FBirdCLEF-Baseline/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28478683,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-16T11:59:17.896Z","status":"ssl_error","status_checked_at":"2026-01-16T11:55:55.838Z","response_time":107,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioaccoustics","birds","birdsong","convolutional-neural-networks","lasagne","lifeclef","sound","spectrogram"],"created_at":"2026-01-16T12:34:11.626Z","updated_at":"2026-01-16T12:34:11.709Z","avatar_url":"https://github.com/kahst.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Recognizing Birds from Sound - The 2018 BirdCLEF Baseline System\nBy [Stefan Kahl](http://medien.informatik.tu-chemnitz.de/skahl/about/), [Thomas Wilhelm-Stein](https://www.tu-chemnitz.de/informatik/HomePages/Medieninformatik/team.php.en), [Holger Klinck](http://www.birds.cornell.edu/page.aspx?pid=1735\u0026id=489), [Danny Kowerko](https://www.tu-chemnitz.de/informatik/mc/staff.php.en), and [Maximilian Eibl](https://www.tu-chemnitz.de/informatik/HomePages/Medieninformatik/team.php.en)\n\n## Introduction\nWe provide a baseline system for the LifeCLEF bird 
identification task BirdCLEF2018. We encourage participants to build upon the code base and share their results for future reference. We will keep the repository updated and will add improvements and submission boilerplate in the future. \n\n\u003cb\u003eIf you have any questions or problems running the scripts, don't hesitate to contact us.\u003c/b\u003e\n\nContact: [Stefan Kahl](http://medien.informatik.tu-chemnitz.de/skahl/about/), [Technische Universität Chemnitz](https://www.tu-chemnitz.de/index.html.en), [Media Informatics](https://www.tu-chemnitz.de/informatik/Medieninformatik/index.php.en)\n\nE-Mail: stefan.kahl@informatik.tu-chemnitz.de\n\nThis project is licensed under the terms of the MIT license.\n\n## Citation\n\nPlease cite the paper in your publications if the repository helps your research.\n\n```\n@article{kahl2018recognizing,\n  title={Recognizing Birds from Sound - The 2018 BirdCLEF Baseline System},\n  author={Kahl, Stefan and Wilhelm-Stein, Thomas and Klinck, Holger and Kowerko, Danny and Eibl, Maximilian},\n  journal={arXiv preprint arXiv:1804.07177},\n  year={2018}\n}\n```\n\n\u003cb\u003eYou can download our paper here:\u003c/b\u003e [https://arxiv.org/abs/1804.07177](https://arxiv.org/abs/1804.07177)\n\n\u003ci\u003eScore update: With the latest updates, which use different architectures (e.g. ResNet) and dataset compilations, we now achieve an MLRAP score of 0.61 for a single model and 0.72 for an ensemble on the local validation set.\u003c/i\u003e\n\n## Installation\nThis is a Theano/Lasagne implementation in Python 2.7 for the identification of hundreds of bird species based on their vocalizations. The code has been tested on Ubuntu 16.04 LTS but should work with other distributions as well.\n\nBefore cloning the repository, you need to install CUDA, cuDNN, OpenCV, Libav, Theano and Lasagne. You can find more detailed instructions [below](#installation-details). 
After that, you can clone the repository and use pip to install the remaining dependencies:\n\n```\ngit clone https://github.com/kahst/BirdCLEF-Baseline.git\ncd BirdCLEF-Baseline\nsudo pip install -r requirements.txt\n```\n\n## Docker\nOn your host system, you need to:\n1. Install [Docker Engine Utility for NVIDIA GPUs](https://github.com/NVIDIA/nvidia-docker)\n2. Clone the repository: `git clone https://github.com/kahst/BirdCLEF-Baseline.git`\n3. Run `./docker-run.sh \u003cpath_to_datasets\u003e`\n\nThe `docker-run.sh` script takes care of all required tasks (see [Workflow](#workflow)).\n\n## Dataset\nYou can download the BirdCLEF training and test data via https://www.crowdai.org. \n\nYou need to register for the challenges to access the data. After downloading, unpack the two archives and set the path to the resulting directory, which contains the `wav` and `xml` folders, in the `config.py` script.\n\n\u003ci\u003e\u003cb\u003eNote:\u003c/b\u003e The dataset is quite large; you will need \u003cb\u003e~250 GB\u003c/b\u003e for the training data.\u003c/i\u003e\n\n## Workflow\n\nOur workflow consists of four main phases. First, we need to \u003cb\u003esort the BirdCLEF training data\u003c/b\u003e. Secondly, we \u003cb\u003eextract spectrograms\u003c/b\u003e from audio recordings. Thirdly, we \u003cb\u003etrain a deep neural net\u003c/b\u003e based on the resulting spectrograms - we treat the audio classification task as an image processing problem. Finally, we \u003cb\u003etest the trained net\u003c/b\u003e on a local validation set of unseen audio recordings. We also support the final submission format - you can build a valid submission after training.\n\n### Sorting the data\n\nWe want to divide the dataset into a training split and a validation split. The validation split should comprise 10% of the entire dataset and should contain at least one sample per bird species. 
Additionally, we want to copy the samples into folders named after the class they represent. The training script uses subfolder names as class names (labels), so the sorted dataset should look like this:\n\n```\ndataset\n¦\n+--train   \n¦  ¦\n¦  +---species1\n¦  ¦      file011.wav\n¦  ¦      file012.wav\n¦  ¦      ...\n¦  ¦   \n¦  +---species2\n¦  ¦      file021.wav\n¦  ¦      file022.wav\n¦  ¦      ...\n¦  ¦    \n¦  +---...\n¦\n+--val\n¦  ¦\n¦  +---species1\n¦  ¦      file013.wav\n¦  ¦      ...\n¦  ¦\n¦  +---species2\n¦  ¦      file023.wav\n¦  ¦      ...\n¦  ¦\n¦  +---...\n¦\n+--metadata\n      file011.json\n      file012.json\n      ...\n\n```\nBefore running the script `sort_data.py`, you need to point `TRAINSET_PATH` in `config.py` to the extracted wav and xml files of the BirdCLEF training data. We use the scientific name of each species as the label, which makes it easier to include background species in the evaluation metric. However, you can use any class name you want - the class ID provided with the xml files would work equally well.\n\nThe `metadata` directory contains JSON files that store additional information, most importantly the list of background species for each recording.\n\n\u003ci\u003e\u003cb\u003eNote:\u003c/b\u003e You can use any other dataset for training, as long as you organize it in the same way. Simply adjust the sorting script accordingly.\u003c/i\u003e\n\n### Spectrogram Extraction\n\nExtracting spectrograms from audio recordings is a vital part of our system. We decided to use mel-scale log-amplitude spectrograms, each of which represents one second of a recording. We use \u003cb\u003elibrosa\u003c/b\u003e for all of the audio processing. The script `utils/audio.py` contains all the logic. 
You can run that script standalone with the provided example WAV file.\n\nRun the script `spec.py` to start the extraction - this might take a while, depending on your CPU.\n\nThe `config.py` contains a section with all important settings, such as sample rate, chunk length and cut-off frequencies. We use these settings as defaults:\n\n```\nSPEC_TYPE = 'melspec'\nSAMPLE_RATE = 44100\nSPEC_FMIN = 500\nSPEC_FMAX = 15000\nSPEC_LENGTH = 1.0\nSPEC_OVERLAP = 0.25\nSPEC_MINLEN = 1.0\nSPEC_SIGNAL_THRESHOLD = 0.001\n```\n\nMost monophonic recordings in the BirdCLEF dataset are sampled at `44.1 kHz`; we apply a high-pass filter at `500 Hz` and a low-pass filter at `15 kHz`. Our signal chunks are `1 s` long - you can use any other chunk length if you like. The `SPEC_OVERLAP` value defines the step width of the extraction: consecutive spectrograms overlap by the specified amount. The `SPEC_MINLEN` value excludes all chunks shorter than `1 s` from the extraction. We support both linear and mel-scale spectrograms.\n\nOur rule-based spectrogram analysis rejects samples that do not contain any bird sounds. It also estimates the signal-to-noise ratio based on some simple calculations. The rejection threshold is set through the `SPEC_SIGNAL_THRESHOLD` value, and the estimate is preserved in the filename of the saved spectrogram file.\n\n### Training\n\nOnce your dataset is sorted and all specs have been extracted, you can start training your own CNN. If you changed any of the paths, make sure to adjust the settings in `config.py` accordingly.\n\nThere are numerous settings that you can change to adjust the net itself and the training process. 
Many of them can have a significant impact on training time, memory consumption and result quality.\n\nEach option is preceded by a comment explaining the impact of changes - if you still have questions or run into any trouble, please do not hesitate to contact us.\n\nTo start the training, simply run the script `train.py`. This will automatically perform the following steps:\n\n- parsing the dataset for samples\n- building a neural net\n- compiling Theano test and train functions\n- generating batches of samples (incl. augmentation)\n- training the net for some epochs\n- validating the net after each epoch\n- saving snapshots at certain points during training\n- saving the best snapshot after training has completed\n\nWhen training has finished (this might take a looooong time), you can find the best model in the `snapshot/` directory, named after the run name specified in `config.py`.\n\n\u003ci\u003e\u003cb\u003eNote:\u003c/b\u003e If you run out of GPU memory, consider lowering the batch size and/or the input size of the net, or reducing the parameter count of the net (But hey: Who wants to do that?).\u003c/i\u003e\n\n### Testing\n\nWe have already created a local validation split with `sort_data.py`. We now use those unseen recordings to assess the performance of the best snapshot from training (e.g. `TEST_MODEL = 'BirdCLEF_TUC_CLO_EXAMPLE_model_epoch_50.pkl'`). \n\nTesting includes the spectrogram extraction for each test recording (specify how many specs to use with `MAX_SPECS_PER_FILE`) and the prediction of class scores for each segment. Finally, we calculate the global score for the entire recording by pooling the individual scores of all specs. 
We use \u003cb\u003eMedian Filtered Pooling\u003c/b\u003e for that - you can change the pooling strategy in `test.py` by adjusting these lines: \n\n```\n# p: class scores, one row per spectrogram and one column per class\nrow_median = np.median(p, axis=1, keepdims=True)\n# discard scores below 1.5x the median score of their spectrogram\np[p \u003c row_median * 1.5] = 0.0\n# average the amplified scores over all spectrograms of the recording\np_pool = np.mean((p * 2) ** 2, axis=0)\n# shift to a minimum of 0 and rescale so that the maximum does not exceed 1\np_pool -= p_pool.min()\nif p_pool.max() \u003e 1.0:\n    p_pool /= p_pool.max()\n```\n\nThe local validation split from our baseline approach contains 4399 recordings - 10% of the entire training set, but at least one recording per species. The metric we use is called \u003cb\u003eMean Label Ranking Average Precision\u003c/b\u003e (MLRAP), and our best net achieves an MLRAP of 0.612 including background species (`TEST_WITH_BG_SPECIES = True`).\n\n\u003cb\u003eThe results are competitive, but still - there is a lot of room for improvement :)\u003c/b\u003e\n\n### Evaluation\n\nIf you want to experiment with the system and evaluate different settings or CNN layouts, you can simply change some values in `config.py` and run the script `evaluate.py`. This will automatically run the training, save a snapshot, and test the trained model on the local validation split. All you have to do is sit and wait for a couple of hours :)\n\n### Model Distillation\n\nWe support model distillation, which allows you to compress ('distill') large models or ensembles into smaller models with fewer parameters. All you need to do is define a `TEACHER` model (or a list of models); we will then use the teacher predictions as ground truth during training instead of the binary, one-hot targets.\n\n### Submission\n\nYou can use our code to build a valid BirdCLEF submission for both tasks, monophone and soundscape. Use the scripts `submission_monophone.py` and `submission_soundscape.py` after training. 
You need to specify one or more `TEST_MODELS` and set `TESTSET_PATH` to the monophone or soundscape test data, respectively.\n\n\u003ci\u003e\u003cb\u003eNote:\u003c/b\u003e You will need to download the test data from \u003cb\u003ecrowdai.org\u003c/b\u003e first.\u003c/i\u003e\n\n## Installation Details\n\nThe versions you need depend on your OS and GPU. The installation process listed below should work with Ubuntu 16.04 LTS and any CUDA-capable NVIDIA GPU.\n\nFirst of all, you should update your system:\n\n```\nsudo apt-get update\nsudo apt-get upgrade\n```\n\n### CUDA\n\nDownload CUDA 9.1 (you might want to use a newer version, if available): \n\nhttps://developer.nvidia.com/cuda-downloads?target_os=Linux\u0026target_arch=x86_64\u0026target_distro=Ubuntu\u0026target_version=1604\u0026target_type=debnetwork\n\nInstall CUDA:\n\n```\nsudo dpkg -i cuda-repo-ubuntu1604_9.1.85-1_amd64.deb\nsudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub\nsudo apt-get update\nsudo apt-get install cuda\n```\n\nAdd the paths to your `~/.bashrc`:\n\n```\nexport PATH=/usr/local/cuda-9.1/bin${PATH:+:${PATH}}\nexport LD_LIBRARY_PATH=/usr/local/cuda-9.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}\n```\n\n\u003ci\u003e\u003cb\u003eNote:\u003c/b\u003e You should be able to run the `nvidia-smi` command and see some details about your GPU. If not, the proper drivers are probably missing. You can install the drivers for your GPU with e.g. 
`sudo apt-get install nvidia-390`.\u003c/i\u003e\n\n### cuDNN\n\nDownload cuDNN (registration required):\n\nhttps://developer.nvidia.com/cudnn\n\nInstalling from a tar file:\n\nNavigate to the `\u003ccudnnpath\u003e` directory containing the cuDNN tar file and unpack the cuDNN package (the file name depends on the cuDNN and CUDA versions you downloaded):\n\n```\ntar -xzvf cudnn-9.0-linux-x64-v7.tgz\n```\n\nCopy the following files into the CUDA Toolkit directory:\n\n```\nsudo cp cuda/include/cudnn.h /usr/local/cuda/include\nsudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64\nsudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*\n```\n\n### Theano\n\nPrerequisites (incl. Python):\n\n```\nsudo apt-get install python-dev python-pip libblas-dev liblapack-dev cmake\nsudo pip install numpy scipy cython\n```\n\nInstall gpuarray:\n\nhttp://deeplearning.net/software/libgpuarray/installation.html\n\n```\ngit clone https://github.com/Theano/libgpuarray.git\ncd libgpuarray\nmkdir Build\ncd Build\ncmake .. -DCMAKE_BUILD_TYPE=Release # or Debug if you are investigating a crash\nmake\nsudo make install\ncd ..\n\nsudo python setup.py build\nsudo python setup.py install\n\nsudo ldconfig\n```\n\nInstall Theano:\n\n```\ngit clone https://github.com/Theano/Theano.git\ncd Theano\nsudo pip install -e .\n```\n\n### .theanorc\n\nAdjust the .theanorc file in your home directory to select the GPU and to enforce deterministic cuDNN convolution algorithms for reproducible results:\n\n```\n[global]\ndevice=cuda\nfloatX=float32\noptimizer_excluding=low_memory\nmode=FAST_RUN\n\n[dnn.conv]\nalgo_bwd_filter=deterministic\nalgo_bwd_data=deterministic\n\n[gpuarray]\npreallocate=0\n```\n\n### Lasagne\n\nInstall the latest Lasagne version directly from GitHub:\n\n```\nsudo pip install https://github.com/Lasagne/Lasagne/archive/master.zip\n```\n\n### OpenCV\n\nWe use OpenCV for image processing; you can install the cv2 package for Python by running this command:\n\n```\nsudo apt-get install python-opencv\n```\n\n### Libav\n\nThe audio processing library Librosa uses the Libav tools:\n\n```\nsudo apt-get install libav-tools\n```\n\nIf 
you have trouble with any of the installation steps, you can open an issue or contact us. Theano and Lasagne offer comprehensive installation guides, too; consult them for further information.\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkahst%2Fbirdclef-baseline","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkahst%2Fbirdclef-baseline","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkahst%2Fbirdclef-baseline/lists"}