https://github.com/zenml-io/awesome-open-data-annotation
Open Source Data Annotation & Labeling Tools
- Host: GitHub
- URL: https://github.com/zenml-io/awesome-open-data-annotation
- Owner: zenml-io
- License: mit
- Created: 2022-06-05T14:12:37.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2024-12-31T09:49:16.000Z (4 months ago)
- Last Synced: 2025-03-01T15:41:19.954Z (about 2 months ago)
- Topics: ai, annotation, datacentric, labelled-data, labelling, machine-learning, mlops
- Size: 102 KB
- Stars: 554
- Watchers: 14
- Forks: 45
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# 🏷 Open Source Data Annotation & Labeling Tools
At [ZenML](https://github.com/zenml-io/zenml) we believe that annotation and labeling workflows are a core part of
the machine learning lifecycle. As an open-source tool, we wanted to highlight
and recognize the variety of tools that are available to help your workflows
become more data-centric. We had three core criteria to decide whether a
particular tool could make it into the list:

- The tool has an open-source licence.
- The tool is actively maintained.
- The tool is functional and fit for purpose.

We welcome contributions to this list, so if you know of a tool that
we've missed or if you've built one yourself, please do create a PR!

🔥 **Do you use these tools or do you want to add one to your MLOps stack?** At
ZenML, we are looking for design partnerships and collaboration to develop the
integrations and workflows around using annotation within the MLOps lifecycle.
If you'd like to learn more, please [join our
Slack](https://zenml.io/slack-invite/) and leave us a message!

## Contents
- [Multi Modal / Multi Domain](#multi-modal-multi-domain)
- [Text](#text)
- [Images](#images)
- [Audio](#audio)
- [Video](#video)
- [Time Series](#time-series)
- [Other](#other)

# Multi Modal / Multi Domain
| Name | Description | License |
| ---- | ----------- | ------- |
| [Acharya](https://github.com/astutic/Acharya) | A Data Centric MLOps tool for your Named Entity Recognition projects  | ? |
| [Adala](https://github.com/HumanSignal/Adala) | An Autonomous Data (Labeling) Agent framework.  | Apache-2 |
| [Classifai](https://github.com/CertifaiAI/classifai) | A comprehensive open-source data annotation platform  | Apache-2 |
| [Computer Vision Annotation Tool (CVAT)](https://github.com/openvinotoolkit/cvat) | A free, online, interactive video and image annotation tool for computer vision  | MIT |
| [Data Annotator for Machine Learning (DAML)](https://github.com/vmware/data-annotator-for-machine-learning) | An application that helps machine learning teams create and manage annotations  | Apache-2 |
| [DataGym](https://github.com/datagym-ai/datagym-core) | Open source annotation and labeling tool for image and video assets  | MIT |
| [Diffgram](https://github.com/diffgram/diffgram) | Training Data (Data Labeling, Annotation, Workflow) for all Data Types (Image, Video, 3D, Text, Geo, Audio, more) at scale  | ELv2 |
| [Hover](https://github.com/phurwicz/hover) | Explore and label on a map of raw data. Handles text, audio and images.  | MIT |
| [Label Studio](https://github.com/heartexlabs/label-studio) | A multi-type data labeling and annotation tool with standardized output format  | Apache-2 |
| [Pigeon](https://github.com/agermanidis/pigeon) | A simple widget that lets you quickly annotate a dataset of unlabeled examples from the comfort of your Jupyter notebook  | Apache-2 |
| [QSL: Quick and Simple Labeler](https://github.com/faustomorales/qsl) | A quick and simple tool for labeling images, videos and time series data, right from Jupyter  | MIT |
| [Shoonya](https://github.com/AI4Bharat/Shoonya) | Platform to annotate and label data at scale  | MIT |
| [Tator](https://github.com/cvisionai/tator) | Video analytics web platform  | AGPL-3 |
| [TornadoAi](https://github.com/slrbl/human-in-the-loop-machine-learning-tool-tornado) | A human-in-the-loop machine learning framework  | AGPL-3 |
| [Universal Data Tool](https://github.com/UniversalDataTool/universal-data-tool) | A web/desktop app for editing and annotating images, text, audio and documents, and for viewing and editing any data defined in the extensible .udt.json and .udt.csv standard  | MIT |
| [VGG Image Annotator (VIA)](https://gitlab.com/vgg/via) | A standalone image annotator application packaged as a single HTML file (< 400 KB) that runs on most modern web browsers | BSD-2 |
| [VIAME](https://github.com/VIAME/VIAME) | Video and Image Analytics for Multiple Environments  | Custom |
| [Xtreme1](https://github.com/xtreme1-io/xtreme1) | An all-in-one data labeling and annotation platform for multimodal data training that supports 3D LiDAR point cloud, image, and LLM  | Apache-2 |

# Text
| Name | Description | License |
| ---- | ----------- | ------- |
| [Annotation Lab](https://nlp.johnsnowlabs.com/docs/en/alab/quickstart) | An NLP annotation tool included in `spark-nlp`  | Apache-2 |
| [Argilla](https://github.com/argilla-io/argilla) | A production-ready Python framework for exploring, annotating, and managing data in NLP projects  | Apache-2 |
| [bulk](https://github.com/koaning/bulk) | Bulk is a quick developer tool to apply some bulk labels | MIT |
| [CoreNLP](https://github.com/stanfordnlp/CoreNLP) | A Java suite of core NLP tools  | GPL-3 |
| [DataQA](https://github.com/dataqa/nlp-labelling) | Labeling platform for text using weak supervision  | GPL-3 |
| [doccano](https://github.com/doccano/doccano) | An open source text annotation tool supporting text classification, sequence labeling and sequence to sequence tasks  | MIT |
| [FLAT - FoLiA Linguistic Annotation Tool](https://github.com/proycon/flat) | A web-based linguistic annotation environment based around the FoLiA format, an XML-based format for linguistic annotation  | GPL-3 |
| [INCEpTION](https://github.com/inception-project/inception) | A semantic annotation platform offering intelligent annotation assistance and knowledge management  | Apache-2 |
| [knodle](https://github.com/knodle/knodle) | Knodle (Knowledge-supervised Deep Learning Framework) | Apache-2 |
| [Markup](https://github.com/samueldobbie/markup) | A web-based document annotation tool, powered by GPT-4 | Unknown |
| [NER Annotator for Spacy](https://github.com/tecoholic/ner-annotator) | NER Annotator for spaCy lets you create training data for a custom NER model with custom tags.  | MIT |
| [NPLM](https://github.com/BatsResearch/nplm) | Noisy Partial Label Model (NPLM) | N/A |
| [Potato](https://github.com/davidjurgens/potato) | An annotation framework with 20+ templates, editable UI, quality control, data management and an option to add a survey for crowdsourcing  | PolyForm Shield |
| [refinery](https://github.com/code-kern-ai/refinery) | The data scientist's open-source choice to scale, assess and maintain natural language data.  | Apache-2 |
| [Slate](https://github.com/jkkummerfeld/slate) | A Super-Lightweight Annotation Tool for Experts: Label text in a terminal with just Python  | ISC |
| [SMART](https://github.com/RTIInternational/SMART) | A tool for building labeled training datasets for supervised machine learning tasks in NLP  | MIT |
| [SpaCy annotator](https://github.com/ieriii/spacy-annotator) | Spacy NER annotator using ipywidgets  | N/A |
| [Small-Text](https://github.com/webis-de/small-text) | Active Learning for Text Classification  | MIT |
| [Snorkel](https://github.com/snorkel-team/snorkel) | Programmatically Build and Manage Training Data  | Apache-2 |
| [skweak](https://github.com/NorskRegnesentral/skweak) | skweak: Weak supervision for NLP | MIT |
| [TALEN](https://github.com/CogComp/talen) | A way to do annotations for NER  | Custom |
| [Theme](https://github.com/Oxid15/theme) | Minimalistic CLI labeling tool for text classification  | MIT |
| [YEDDA](https://github.com/jiesutd/YEDDA) | A lightweight collaborative text span annotation tool  | Apache-2 |
| [WeaSEL](https://github.com/autonlab/weasel) | WeaSEL: Weakly Supervised End-to-end Learning | Apache-2 |

# Images
| Name | Description | License |
| ---- | ----------- | ------- |
| [3D Slicer](https://www.slicer.org) | Visualization, processing, segmentation, registration, and analysis of medical, biomedical, and other 3D images and meshes  | BSD |
| [Annotate Lab](https://github.com/sumn2u/annotate-lab) | Simplifying Image Annotation  | MIT |
| [Annotorious](https://github.com/recogito/annotorious) | A JavaScript library for image annotation  | BSD-3 |
| [AnyLabeling](https://github.com/vietanhdev/anylabeling) | Effortless AI-assisted data labeling with AI support from YOLO, Segment Anything, MobileSAM  | GPL-3 |
| [autodistill](https://github.com/autodistill/autodistill) | Images to inference with no labeling (use foundation models to train supervised models)  | Apache-2 |
| [bbox-visualizer](https://github.com/shoumikchow/bbox-visualizer) | Make drawing and labeling bounding boxes easy as cake  | MIT |
| [Bounding Box Editor](https://github.com/mfl28/BoundingBoxEditor) | A JavaFX desktop application for creating image-object-annotations with bounding boxes  | GPL-3 |
| [CATMAID](https://github.com/catmaid/CATMAID) | The Collaborative Annotation Toolkit for Massive Amounts of Image Data  | GPL-3 |
| [COCO Annotator](https://github.com/jsbroks/coco-annotator) | A web-based image segmentation tool for object detection, localization, and keypoints  | MIT |
| [DeepLabel](https://github.com/jveitchmichaelis/deeplabel) | A cross-platform desktop image annotation tool for machine learning  | MIT |
| [Etichetta](https://github.com/trikko/etichetta) | A YOLO annotator, for human beings  | MIT |
| [ilastik](https://github.com/ilastik/ilastik) | Segment, classify, track and count your cells or other experimental data  | Custom |
| [ImageTagger](https://github.com/bit-bots/imagetagger) | An open source online platform for collaborative image labeling  | MIT |
| [imglab](https://github.com/NaturalIntelligence/imglab) | A web based tool to label images for objects that can be used to train dlib or other object detectors  | MIT |
| [KNOSSOS](https://github.com/knossos-project/knossos) | A software tool for the visualization and annotation of 3D image data, developed for the rapid reconstruction of neural morphology and connectivity  | GPL-2 |
| [labelCloud](https://github.com/ch-sa/labelCloud) | A lightweight tool for labeling 3D bounding boxes in point clouds  | GPL-3 |
| [LabelFlow](https://github.com/labelflow/labelflow) | An open platform for image labeling  | Custom |
| [labelme](https://github.com/wkentaro/labelme) | Image Polygonal Annotation with Python (polygon, rectangle, circle, line, point and image-level flag annotation)  | Custom |
| [LabelImg](https://github.com/tzutalin/labelImg) | A graphical image annotation tool for labeling object bounding boxes in images  | MIT |
| [LOST](https://github.com/l3p-cv/lost) | A flexible web-based framework for semi-automatic image annotation  | MIT |
| [Make Sense](https://github.com/SkalskiP/make-sense) | A free-to-use online tool for labeling photos  | GPL-3 |
| [MyVision](https://github.com/OvidijusParsiunas/myvision) | Computer vision based ML training data generation tool  | GPL-3 |
| [OHIF Medical Imaging Viewer](https://github.com/OHIF/Viewers) | OHIF zero-footprint DICOM viewer and oncology specific Lesion Tracker  | MIT |
| [OpenLabeler](https://github.com/kinhong/OpenLabeler) | An open source desktop application for annotating objects for AI applications  | Apache-2 |
| [Pixano](https://github.com/pixano/pixano-app) | A web-based smart-annotation tool for computer vision applications  | CeCILL-C |
| [Scalabel](https://github.com/scalabel/scalabel) | A web-based visual data annotation tool, supporting both 2D and 3D data labeling  | Apache-2 |
| [webKnossos](https://github.com/scalableminds/webknossos) | A fully cloud- and browser-based 3D annotation tool for distributed large-scale data analysis in light- and electron-microscopy based Connectomics  | AGPL-3 |
| [Yolo_Label](https://github.com/developer0hye/Yolo_Label) | GUI for marking bounding boxes of objects in images for training YOLO neural networks  | MIT |

# Video
| Name | Description | License |
| ---- | ----------- | ------- |
| [DIVE](https://github.com/Kitware/dive) | Media annotation and analysis tools for web and desktop  | Apache-2 |
| [UltimateLabeling](https://github.com/alexandre01/UltimateLabeling) | A multi-purpose Video Labeling GUI in Python with integrated SOTA detector and tracker  | MIT |

# Audio
| Name | Description | License |
| ---- | ----------- | ------- |
| [aubio](https://aubio.org) | A library for audio and music analysis  | GPL-3 |
| [audino](https://github.com/midas-research/audino) | Open source audio annotation tool  | MIT |
| [Praat](https://github.com/praat/praat) | Annotation tool for phonetics analysis  | GPL-3 |
| [Peaks.js](https://github.com/bbc/peaks.js) | JavaScript UI component for interacting with audio waveforms  | LGPL-3 |
| [Wavesurfer.js](https://github.com/katspaugh/wavesurfer.js) | Navigable waveform built on Web Audio and Canvas  | BSD-3 |

# Time Series
| Name | Description | License |
| ---- | ----------- | ------- |
| [sktime](https://github.com/sktime/sktime) | A framework for machine learning with time series  | BSD-3 |

# Other
| Name | Description | License |
| ---- | ----------- | ------- |
| [Compose](https://github.com/alteryx/compose) | Automated prediction engineering. Allows you to easily structure prediction problems and generate labels for supervised learning  | BSD-3 |
| [Encord Active](https://github.com/encord-team/encord-active/) | Toolkit to test, validate, and evaluate your models and surface, curate, and prioritize the most valuable data for labeling  | Apache-2 |
| [Label App](https://github.com/javserjod/label-app) | Application designed to assist in manually editing, visualizing and labelling your moderate-sized datasets  | Apache-2 |
| [NeuroTrALE](https://github.com/mit-ll/NeuroTrALE-data-manager) | Annotation software for brain mapping, supporting 3D imaging and annotation  | BSD-2 |
| [OpenCRAVAT](https://github.com/KarchinLab/open-cravat) | A modular annotation tool for genomic variants  | MIT |
| [PatchSorter](https://github.com/choosehappy/PatchSorter) | An open-source digital pathology tool for histologic object labeling  | BSD-3 |
| [Personal Cancer Genome Reporter (PCGR)](https://github.com/sigven/pcgr) | A stand-alone software package for translation of individual tumor genomes for precision cancer medicine  | MIT |
| [Quepid](https://github.com/o19s/quepid) | Gather Human Judgements (aka Explicit Ratings) for Search Quality. Also a safe space to play with your search algorithm.  | Apache-2 |

# Acknowledgements
Thanks to the creators of
[these](https://github.com/jsbroks/awesome-dataset-tools)
[other](https://github.com/doccano/awesome-annotation-tools)
[repositories](https://github.com/taivop/awesome-data-annotation) (and [this
one](https://github.com/heartexlabs/awesome-data-labeling)!) for getting us
going down the path of creating our own. I used these efforts to get started in
my survey of the space before adding, updating and pruning as per the
open-source and other criteria specified above.