

Open Source Data Annotation & Labeling Tools

List: awesome-open-data-annotation






# 🏷 Open Source Data Annotation & Labeling Tools


At ZenML, we believe that annotation and labeling workflows are a core part of
the machine learning lifecycle. As maintainers of an open-source tool ourselves,
we wanted to highlight and recognize the variety of tools available to help your
workflows become more data-centric. We used three core criteria to decide
whether a particular tool could make it into the list:

- The tool has an open-source licence.
- The tool is actively maintained.
- The tool is functional and fit for purpose.

We welcome contributions to this list, so if you know of a tool that
we've missed or if you've built one yourself, please do create a PR!

🔥 **Do you use these tools or do you want to add one to your MLOps stack?** At
ZenML, we are looking for design partnerships and collaboration to develop the
integrations and workflows around using annotation within the MLOps lifecycle.
If you'd like to learn more, please join our Slack and leave us a message!

## Contents

- [Multi Modal / Multi Domain](#multi-modal--multi-domain)
- [Text](#text)
- [Images](#images)
- [Audio](#audio)
- [Video](#video)
- [Time Series](#time-series)
- [Other](#other)

# Multi Modal / Multi Domain

| Name | Description | License |
| ---- | ----------- | ------- |
| Acharya | A Data Centric MLOps tool for your Named Entity Recognition projects | ? |
| Adala | An Autonomous Data (Labeling) Agent framework. | Apache-2 |
| Classifai | A comprehensive open-source data annotation platform | Apache-2 |
| Computer Vision Annotation Tool (CVAT) | A free, online, interactive video and image annotation tool for computer vision | MIT |
| Data Annotator for Machine Learning (DAML) | An application that helps machine learning teams create and manage annotations | Apache-2 |
| DataGym | Open source annotation and labeling tool for image and video assets | MIT |
| Diffgram | Training Data (Data Labeling, Annotation, Workflow) for all Data Types (Image, Video, 3D, Text, Geo, Audio, more) at scale | ELv2 |
| Hover | Explore and label on a map of raw data. Handles text, audio and images. | MIT |
| Label Studio | A multi-type data labeling and annotation tool with standardized output format | Apache-2 |
| Pigeon | A simple widget that lets you quickly annotate a dataset of unlabeled examples from the comfort of your Jupyter notebook | Apache-2 |
| Tator | Video analytics web platform | AGPL-3 |
| TornadoAi | A human-in-the-loop machine learning framework | AGPL-3 |
| Universal Data Tool | A web/desktop app for editing and annotating images, text, audio and documents, and for viewing and editing any data defined in the extensible .udt.json and .udt.csv standard | MIT |
| VGG Image Annotator (VIA) | A standalone image annotator application packaged as a single HTML file (< 400 KB) that runs on most modern web browsers | BSD-2 |
| VIAME | Video and Image Analytics for Multiple Environments | Custom |

# Text

| Name | Description | License |
| ---- | ----------- | ------- |
| Annotation Lab | An NLP annotation tool included in `spark-nlp` | Apache-2 |
| Argilla | A production-ready Python framework for exploring, annotating, and managing data in NLP projects | Apache-2 |
| bulk | Bulk is a quick developer tool to apply some bulk labels | MIT |
| CoreNLP | A Java suite of core NLP tools | GPL-3 |
| DataQA | Labeling platform for text using weak supervision | GPL-3 |
| doccano | An open source text annotation tool supporting text classification, sequence labeling and sequence to sequence tasks | MIT |
| FLAT - FoLiA Linguistic Annotation Tool | A web-based linguistic annotation environment based around the FoLiA format, an XML-based format for linguistic annotation | GPL-3 |
| INCEpTION | A semantic annotation platform offering intelligent annotation assistance and knowledge management | Apache-2 |
| knodle | Knodle (Knowledge-supervised Deep Learning Framework) | Apache-2 |
| NER Annotator for Spacy | NER Annotator for spaCy lets you build training data for a custom NER model with custom tags. | MIT |
| NPLM | Noisy Partial Label Model (NPLM) | N/A |
| Potato | An annotation framework with 20+ templates, editable UI, quality control, data management and an option to add a survey for crowdsourcing | PolyForm Shield |
| refinery | The data scientist's open-source choice to scale, assess and maintain natural language data. | Apache-2 |
| SMART | A tool for building labeled training datasets for supervised machine learning tasks in NLP | MIT |
| SpaCy annotator | spaCy NER annotator using ipywidgets | N/A |
| Small-Text | Active Learning for Text Classification | MIT |
| Snorkel | Programmatically Build and Manage Training Data | Apache-2 |
| skweak | skweak: Weak supervision for NLP | MIT |
| TALEN | A way to do annotations for NER | Custom |
| YEDDA | A lightweight collaborative text span annotation tool | Apache-2 |
| WeaSEL | WeaSEL: Weakly Supervised End-to-end Learning | Apache-2 |

# Images

| Name | Description | License |
| ---- | ----------- | ------- |
| 3D Slicer | Visualization, processing, segmentation, registration, and analysis of medical, biomedical, and other 3D images and meshes | BSD |
| Annotorious | A JavaScript library for image annotation | BSD-3 |
| CATMAID | The Collaborative Annotation Toolkit for Massive Amounts of Image Data | GPL-3 |
| COCO Annotator | A web-based image segmentation tool for object detection, localization, and keypoints | MIT |
| DeepLabel | A cross-platform desktop image annotation tool for machine learning | MIT |
| ilastik | Segment, classify, track and count your cells or other experimental data | Custom |
| ImageTagger | An open source online platform for collaborative image labeling | MIT |
| imglab | A web-based tool for labeling objects in images, usable for training dlib or other object detectors | MIT |
| KNOSSOS | A software tool for the visualization and annotation of 3D image data, developed for the rapid reconstruction of neural morphology and connectivity | GPL-2 |
| LabelFlow | An open platform for image labeling | Custom |
| labelme | Image Polygonal Annotation with Python (polygon, rectangle, circle, line, point and image-level flag annotation) | Custom |
| LabelImg | A graphical image annotation tool for labeling object bounding boxes in images | MIT |
| LOST | A flexible web-based framework for semi-automatic image annotation | MIT |
| Make Sense | A free-to-use online tool for labeling photos | GPL-3 |
| OHIF Medical Imaging Viewer | OHIF zero-footprint DICOM viewer and oncology specific Lesion Tracker | MIT |
| OpenLabeler | An open source desktop application for annotating objects for AI applications | Apache-2 |
| Pixano | A web-based smart-annotation tool for computer vision applications | CeCILL-C |
| Scalabel | A web-based visual data annotation tool, supporting both 2D and 3D data labeling | Apache-2 |
| webKnossos | A fully cloud- and browser-based 3D annotation tool for distributed large-scale data analysis in light- and electron-microscopy based Connectomics | AGPL-3 |

# Video

| Name | Description | License |
| ---- | ----------- | ------- |
| DIVE | Media annotation and analysis tools for web and desktop | Apache-2 |

# Audio

| Name | Description | License |
| ---- | ----------- | ------- |
| aubio | A library for audio and music analysis | GPL-3 |
| audino | Open source audio annotation tool | MIT |
| Praat | Annotation tool for phonetics analysis | GPL-3 |
| Peaks.js | JavaScript UI component for interacting with audio waveforms | LGPL-3 |
| Wavesurfer.js | Navigable waveform built on Web Audio and Canvas | BSD-3 |

# Time Series

| Name | Description | License |
| ---- | ----------- | ------- |
| sktime | A framework for machine learning with time series | BSD-3 |

# Other

| Name | Description | License |
| ---- | ----------- | ------- |
| Encord Active | Toolkit to test, validate, and evaluate your models and surface, curate, and prioritize the most valuable data for labeling | Apache-2 |
| OpenCRAVAT | A modular annotation tool for genomic variants | MIT |
| Personal Cancer Genome Reporter (PCGR) | A stand-alone software package for translation of individual tumor genomes for precision cancer medicine | MIT |
| Quepid | Gather Human Judgements (aka Explicit Ratings) for Search Quality. Also a safe space to play with your search algorithm. | Apache-2 |

# Acknowledgements

Thanks to the creators of the repositories that got us going down the path of
creating our own. We used these efforts to get started in our survey of the
space before adding, updating and pruning according to the open-source and
other criteria specified above.