https://github.com/tandembank/data-science.dataset-labeller
Web-based tool for labelling datasets
- Host: GitHub
- URL: https://github.com/tandembank/data-science.dataset-labeller
- Owner: tandembank
- License: mit
- Created: 2018-10-23T16:22:35.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2019-04-24T17:44:26.000Z (over 5 years ago)
- Last Synced: 2024-07-09T15:56:55.035Z (2 months ago)
- Topics: datascience, dataset-generation, datasets, django, javascript, python, react
- Language: JavaScript
- Size: 414 KB
- Stars: 3
- Watchers: 2
- Forks: 0
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
# Dataset labeller
[![](https://img.shields.io/travis/tandembank/data-science.dataset-labeller.svg)](https://travis-ci.org/tandembank/data-science.dataset-labeller) [![](https://img.shields.io/codecov/c/gh/tandembank/data-science.dataset-labeller.svg)](https://codecov.io/gh/tandembank/data-science.dataset-labeller) [![](https://img.shields.io/github/license/tandembank/data-science.dataset-labeller.svg)](https://github.com/tandembank/data-science.dataset-labeller/blob/master/LICENSE)
This is a web-based tool we developed at [Tandem](https://tandem.co.uk) to label datasets quickly. It is built with Python, Django and React, and runs through Docker Compose.
![Creating a new dataset and starting to label](https://epixstudios.co.uk/uploads/filer_public/7f/62/7f62f0ad-9cf3-47ba-9ad6-79282f456c7f/dataset_labeller_demo.gif)
## Installation and Running
The easiest way to get this application running is via [Docker Compose](https://docs.docker.com/compose/install/). Once you have this working, run the following commands to install.

    git clone https://github.com/tandembank/data-science.dataset-labeller.git
    cd data-science.dataset-labeller
    cp docker-compose.example.yml docker-compose.yml
    docker-compose build

You should now have the Docker image built. To run it, along with its database server, run this from the same location:

    docker-compose up

After a few seconds you should be able to access it via http://localhost:8080/ in your browser.
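
If you are scripting the setup, "after a few seconds" can be made explicit by polling the address until it answers. The following is only a minimal sketch using the Python standard library; the helper names `is_up` and `wait_for_server`, the 60-second timeout and the 1-second interval are assumptions for illustration and are not part of this project.

```python
import http.client
import time
import urllib.error
import urllib.request


def is_up(url):
    """Return True if anything answers at `url` with an HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=2):
            return True
    except urllib.error.HTTPError:
        # The server replied, just with an error status, so it is running.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, reset, or a malformed reply: not ready yet.
        return False


def wait_for_server(url, timeout=60.0, interval=1.0, probe=is_up, sleep=time.sleep):
    """Poll `url` until `probe` succeeds or `timeout` seconds have passed.

    `probe` and `sleep` are injectable so the loop can be tested without
    a running server.
    """
    deadline = time.monotonic() + timeout
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

Calling `wait_for_server("http://localhost:8080/")` from a second terminal while `docker-compose up` is running returns `True` once the application responds, or `False` if nothing answers within the timeout.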
## Usage
This is the process for labelling a new dataset:
1. Upload a CSV file containing rows that you want to label.
2. Give it a name.
3. Select the columns that should be displayed to a person labelling.
4. Define the possible category labels and keyboard shortcuts to make things faster.
5. Decide how many people need to label each row – this is useful if you want to get a consensus.
6. Save the dataset.
7. Get your team to log in and label it.
8. View job progress on the dashboard.
9. Download the labelled dataset as a CSV – it'll have an extra column with the labels.

## Features
* Import and export data in the format that you're comfortable with – no need to pre-process data, just select the columns to display for labelling.
* Each user has their own account so you can see who labelled what.
* Labellers can access the tool remotely or within a corporate network using just their web browser.
* Slick and quick user interface while you're labelling – the next few datapoints are already loaded in your browser so they're ready to show as soon as you've labelled the current one.
* Multiple users can be labelling at once as we use locks to avoid collisions.
* If some datapoints are tricky to label or your team are going at breakneck speed, you can choose to get a consensus from an odd number of users, say 3 or 5.
* Database included and configured in the Docker Compose file.
* Cell content such as JSON lists gets displayed nicely formatted. We aim to extend this to identify other formats and image URLs.
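
To illustrate why an odd number of labellers helps with consensus, here is a minimal sketch of majority voting and of appending the result as an extra CSV column. It is not taken from this project's code: the function names `consensus_label` and `add_label_column`, the `votes_by_row` structure and the `label` column name are all hypothetical, and the tool's own consensus rule may differ.

```python
import csv
import io
from collections import Counter


def consensus_label(votes):
    """Return the label chosen by a strict majority of votes, else None."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


def add_label_column(csv_text, votes_by_row, column="label"):
    """Return `csv_text` with one extra column holding the consensus label.

    `votes_by_row` maps a zero-based data row index (header excluded) to
    the list of labels submitted for that row. This structure is an
    assumption made for the sketch. Rows without a consensus get an empty cell.
    """
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header = next(reader)
    writer.writerow(header + [column])
    for index, row in enumerate(reader):
        label = consensus_label(votes_by_row.get(index, []))
        writer.writerow(row + [label or ""])
    return out.getvalue()


if __name__ == "__main__":
    data = "id,text\n1,great service\n2,card declined\n"
    votes = {
        0: ["positive", "positive", "negative"],
        1: ["negative", "negative", "negative"],
    }
    print(add_label_column(data, votes))
```

With two possible labels, an odd number of voters always produces a strict majority. With three or more labels a tie is still possible (for example, three voters choosing three different labels), which this sketch reports as no consensus.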