https://github.com/gojibjib/voice-grabber

Collection of scripts to gather training (meta) data for the ML model
Host: GitHub
URL: https://github.com/gojibjib/voice-grabber
Owner: gojibjib
License: gpl-3.0
Created: 2018-04-29T09:56:02.000Z (about 8 years ago)
Default Branch: master
Last Pushed: 2018-09-19T18:18:12.000Z (over 7 years ago)
Last Synced: 2024-12-30T06:27:17.484Z (over 1 year ago)
Topics: dataprocessing, dataset, europeana, golang
Language: Python
Homepage:
Size: 3.26 MB
Stars: 3
Watchers: 3
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

# [voice-grabber](https://github.com/gojibjib/voice-grabber)

This repo is a collection of scripts that download the dataset needed to train the [jibjib-model](https://github.com/gojibjib/jibjib-model).

## Repo layout

The complete list of JibJib repos is:

- [jibjib](https://github.com/gojibjib/jibjib): Our Android app. Records sounds and looks fantastic.

- [deploy](https://github.com/gojibjib/deploy): Instructions to deploy the JibJib stack.

- [jibjib-model](https://github.com/gojibjib/jibjib-model): Code for training the machine learning model for bird classification.

- [jibjib-api](https://github.com/gojibjib/jibjib-api): Main API to receive database requests & audio files.

- [jibjib-data](https://github.com/gojibjib/jibjib-data): A MongoDB instance holding information about detectable birds.

- [jibjib-query](https://github.com/gojibjib/jibjib-query): A thin Python Flask API that handles communication with the [TensorFlow Serving](https://www.tensorflow.org/serving/) instance.

- [gopeana](https://github.com/gojibjib/gopeana): An API client for [Europeana](https://europeana.eu), written in Go.

- [voice-grabber](https://github.com/gojibjib/voice-grabber): A collection of scripts to construct the dataset required for model training.

## Scripts

In the top level of this repo, there are several helper scripts to create/change JSON and CSV files, as well as `converter.py` to convert audio files from `mp3` to `wav`.
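As an illustration of the `mp3` to `wav` step, here is a minimal sketch that shells out to `ffmpeg`. It is not the code in `converter.py`: the function names, the use of `ffmpeg`, the 44100 Hz sample rate and the mono downmix are all assumptions made for this example.

```python
import subprocess
from pathlib import Path
from typing import List


def wav_path_for(mp3_path: Path, out_dir: Path) -> Path:
    """Map e.g. in/robin.mp3 to out_dir/robin.wav."""
    return out_dir / (mp3_path.stem + ".wav")


def build_ffmpeg_command(src: Path, dst: Path, sample_rate: int = 44100) -> List[str]:
    # Assumed settings: -y overwrites existing output, -ar sets the
    # sample rate, -ac 1 downmixes to a single (mono) channel.
    return ["ffmpeg", "-y", "-i", str(src), "-ar", str(sample_rate), "-ac", "1", str(dst)]


def convert_directory(in_dir: Path, out_dir: Path) -> None:
    """Convert every *.mp3 in in_dir to a .wav file in out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for mp3 in sorted(in_dir.glob("*.mp3")):
        command = build_ffmpeg_command(mp3, wav_path_for(mp3, out_dir))
        # check=True raises CalledProcessError if ffmpeg exits with an error.
        subprocess.run(command, check=True)
```

Running `convert_directory(Path("mp3"), Path("wav"))` requires `ffmpeg` to be installed and on the `PATH`; the directory names are placeholders.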

### [data_grabber/](https://github.com/gojibjib/voice-grabber/tree/master/data_grabber)

This Go script uses [gopeana](https://github.com/gojibjib/gopeana) to populate a JSON file and a CSV file with information about the bird voices that the [Tierstimmenarchiv](https://www.tierstimmenarchiv.de) has published on Europeana ([open dataset](https://www.europeana.eu/portal/de/search?f[REUSABILITY][]=open&q=tierstimmenarchiv) of the [Museum für Naturkunde Berlin](https://www.museumfuernaturkunde.berlin/)).
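The idea of writing the same records to both JSON and CSV can be sketched as follows. The actual script is written in Go and talks to Europeana through gopeana; this Python sketch only shows the dual-output step, and the field names `id`, `title` and `link` are hypothetical, not the real schema.

```python
import csv
import io
import json
from typing import Dict, List

# Hypothetical columns; the real output schema is defined by data_grabber.
FIELDS = ["id", "title", "link"]


def to_json(records: List[Dict[str, str]]) -> str:
    # ensure_ascii=False keeps umlauts in German titles readable.
    return json.dumps(records, ensure_ascii=False, indent=2)


def to_csv(records: List[Dict[str, str]]) -> str:
    buffer = io.StringIO()
    # extrasaction="ignore" drops keys that are not listed in FIELDS.
    writer = csv.DictWriter(
        buffer, fieldnames=FIELDS, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Both functions return text, so the caller decides where the two files are written.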

### [file_grabber/](https://github.com/gojibjib/voice-grabber/tree/master/file_grabber)

This Go script uses the output of [data_grabber/](https://github.com/gojibjib/voice-grabber/tree/master/data_grabber) to follow the links provided on Europeana and download the audio files.
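A simplified sketch of the download step, in Python rather than Go, might look like this. It assumes the CSV already holds direct file URLs in a column named `link`; both that column name and the helper names are hypothetical, and the real script additionally has to follow the Europeana links to reach the audio files.

```python
import csv
import urllib.request
from pathlib import Path
from urllib.parse import unquote, urlparse


def filename_from_url(url: str) -> str:
    """Use the last path segment of the URL as the local file name."""
    name = Path(unquote(urlparse(url).path)).name
    return name or "download.bin"


def download_all(csv_path: Path, out_dir: Path, column: str = "link") -> int:
    """Download every URL in the given CSV column; return the number fetched."""
    out_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    with csv_path.open(newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            url = (row.get(column) or "").strip()
            if not url:
                continue
            target = out_dir / filename_from_url(url)
            if not target.exists():  # skip files left over from earlier runs
                urllib.request.urlretrieve(url, str(target))
                count += 1
    return count
```

Skipping files that already exist lets an interrupted run be restarted without fetching everything again.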

### [wiki_grabber/](https://github.com/gojibjib/voice-grabber/tree/master/wiki_grabber)

This Python script reads bird names from a CSV file, uses the Wikipedia API to extract a summary for each bird, and saves the summaries in a separate CSV file.
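One way to fetch such a summary is the MediaWiki action API with `prop=extracts`, sketched below. This is not necessarily how the script in `wiki_grabber/` does it: the function names, the default language `de` and the `User-Agent` string are assumptions for this example.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import urlencode


def summary_url(title: str, lang: str = "de") -> str:
    params = {
        "action": "query",
        "prop": "extracts",
        "exintro": 1,       # only the intro section
        "explaintext": 1,   # plain text instead of HTML
        "redirects": 1,     # resolve redirects to the target article
        "format": "json",
        "titles": title,
    }
    return f"https://{lang}.wikipedia.org/w/api.php?{urlencode(params)}"


def extract_summary(response: dict) -> Optional[str]:
    """Return the first extract in an API response, or None if the page is missing."""
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        if "extract" in page:
            return page["extract"]
    return None


def fetch_summary(title: str, lang: str = "de") -> Optional[str]:
    request = urllib.request.Request(
        summary_url(title, lang),
        headers={"User-Agent": "voice-grabber-example/0.1"},  # placeholder value
    )
    with urllib.request.urlopen(request, timeout=30) as resp:
        return extract_summary(json.load(resp))
```

A caller would loop over the bird names from the input CSV, call `fetch_summary` for each, and write the name together with the returned text to the output CSV.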

### [xeno_grabber/](https://github.com/gojibjib/voice-grabber/tree/master/xeno_grabber)

This is a collection of scripts to:

- clean the files directory (in our case, only birds with a German Wikipedia entry were kept, in order to bring down the total number of classes)

- nicely crawl [Xeno Canto](https://www.xeno-canto.org) for audio files of birds

- download the audio files from Xeno Canto
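If "nicely" is read as "without overloading the server", the crawl can be sketched as a loop that pauses between requests. This is an assumption about the intent, not the repo's actual code; `polite_fetch`, its parameters and the one-second default delay are hypothetical.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def polite_fetch(
    urls: Iterable[str],
    fetch: Callable[[str], bytes],
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Tuple[str, bytes]]:
    """Yield (url, content) pairs, waiting delay_seconds between requests."""
    first = True
    for url in urls:
        if not first:
            sleep(delay_seconds)  # pause before every request except the first
        first = False
        yield url, fetch(url)
```

The `fetch` callable is injected so that the pacing logic stays independent of how a page or audio file is actually retrieved, and the `sleep` parameter makes the delay easy to replace in tests.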
