{"id":18829821,"url":"https://github.com/gojibjib/voice-grabber","last_synced_at":"2026-05-10T05:56:26.323Z","repository":{"id":198259788,"uuid":"131483262","full_name":"gojibjib/voice-grabber","owner":"gojibjib","description":"Collection of scripts to gather training (meta) data for the ML model","archived":false,"fork":false,"pushed_at":"2018-09-19T18:18:12.000Z","size":3421,"stargazers_count":3,"open_issues_count":1,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-12-30T06:27:17.484Z","etag":null,"topics":["dataprocessing","dataset","europeana","golang"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gojibjib.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2018-04-29T09:56:02.000Z","updated_at":"2020-05-11T16:08:05.000Z","dependencies_parsed_at":"2023-10-05T03:40:56.003Z","dependency_job_id":"c84ebb24-476e-4c57-8c8c-35d3bf100a8c","html_url":"https://github.com/gojibjib/voice-grabber","commit_stats":null,"previous_names":["gojibjib/voice-grabber"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gojibjib%2Fvoice-grabber","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gojibjib%2Fvoice-grabber/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gojibjib%2Fvoice-grabber/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gojibjib%2Fvoice-grabber/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gojibjib","down
load_url":"https://codeload.github.com/gojibjib/voice-grabber/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239765465,"owners_count":19693141,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dataprocessing","dataset","europeana","golang"],"created_at":"2024-11-08T01:46:40.626Z","updated_at":"2026-01-23T20:30:15.233Z","avatar_url":"https://github.com/gojibjib.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# [voice-grabber](https://github.com/gojibjib/voice-grabber)\nThis repo is a collection of scripts to download the dataset necessary to train the [jibjib-model](https://github.com/gojibjib/jibjib-model)\n\n## Repo layout\nThe complete list of JibJib repos is:\n\n- [jibjib](https://github.com/gojibjib/jibjib): Our Android app. 
Records sounds and looks fantastic.\n- [deploy](https://github.com/gojibjib/deploy): Instructions to deploy the JibJib stack.\n- [jibjib-model](https://github.com/gojibjib/jibjib-model): Code for training the machine learning model for bird classification\n- [jibjib-api](https://github.com/gojibjib/jibjib-api): Main API to receive database requests \u0026 audio files.\n- [jibjib-data](https://github.com/gojibjib/jibjib-data): A MongoDB instance holding information about detectable birds.\n- [jibjib-query](https://github.com/gojibjib/jibjib-query): A thin Python Flask API that handles communication with the [TensorFlow Serving](https://www.tensorflow.org/serving/) instance.\n- [gopeana](https://github.com/gojibjib/gopeana): An API client for [Europeana](https://europeana.eu), written in Go.\n- [voice-grabber](https://github.com/gojibjib/voice-grabber): A collection of scripts to construct the dataset required for model training\n\n## Scripts\nIn the top level of this repo, there are several helper scripts to create/change JSON and CSV files, as well as `converter.py` to convert audio files from `mp3` to `wav`.\n\n### [data_grabber/](https://github.com/gojibjib/voice-grabber/tree/master/data_grabber)\nThis Go script uses [gopeana](https://github.com/gojibjib/gopeana) to populate both a JSON and a CSV file with information about the bird voices published on Europeana from the [Tierstimmenarchiv](www.tierstimmenarchiv.de) ([open dataset](https://www.europeana.eu/portal/de/search?f[REUSABILITY][]=open\u0026q=tierstimmenarchiv) of the [Museum für Naturkunde Berlin](https://www.museumfuernaturkunde.berlin/))\n\n### [file_grabber/](https://github.com/gojibjib/voice-grabber/tree/master/file_grabber)\nThis Go script uses the output of [data_grabber/](https://github.com/gojibjib/voice-grabber/tree/master/data_grabber) to follow the links provided on Europeana and download the audio files.\n\n### 
[wiki_grabber/](https://github.com/gojibjib/voice-grabber/tree/master/wiki_grabber)\nThis Python script takes input from a CSV file and uses the Wikipedia API to extract summaries about birds, then saves them in a separate CSV file.\n\n### [xeno_grabber/](https://github.com/gojibjib/voice-grabber/tree/master/xeno_grabber)\nThis is a collection of scripts to:\n\n- clean the files directory (in our case, only birds with a German Wikipedia entry were kept, to bring down the total number of classes)\n- nicely crawl [Xeno Canto](www.xeno-canto.org) for audio files of birds\n- download the audio files from Xeno Canto\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgojibjib%2Fvoice-grabber","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgojibjib%2Fvoice-grabber","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgojibjib%2Fvoice-grabber/lists"}