{"id":13687704,"url":"https://github.com/explosion/prodigy-recipes","last_synced_at":"2025-04-04T07:06:51.258Z","repository":{"id":33824406,"uuid":"113669374","full_name":"explosion/prodigy-recipes","owner":"explosion","description":"🍳 Recipes for the Prodigy, our fully scriptable annotation tool","archived":false,"fork":false,"pushed_at":"2024-08-04T07:30:50.000Z","size":16000,"stargazers_count":490,"open_issues_count":6,"forks_count":117,"subscribers_count":24,"default_branch":"master","last_synced_at":"2025-03-28T06:05:25.357Z","etag":null,"topics":["active-learning","annotation","annotation-tool","artificial-intelligence","computer-vision","data-annotation","data-science","labeling-tool","machine-learning","machine-teaching","natural-language-processing","nlp","prodigy","spacy"],"latest_commit_sha":null,"homepage":"https://prodi.gy","language":"Jupyter Notebook","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/explosion.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-12-09T12:57:34.000Z","updated_at":"2025-03-08T17:42:17.000Z","dependencies_parsed_at":"2023-01-16T22:45:27.990Z","dependency_job_id":"7b65f27c-662d-4c82-8c12-a9d10b304d98","html_url":"https://github.com/explosion/prodigy-recipes","commit_stats":{"total_commits":170,"total_committers":18,"mean_commits":9.444444444444445,"dds":0.7588235294117647,"last_synced_commit":"428bccb5a04ecfe6211049824aaf735a5907861c"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/e
xplosion%2Fprodigy-recipes","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/explosion%2Fprodigy-recipes/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/explosion%2Fprodigy-recipes/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/explosion%2Fprodigy-recipes/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/explosion","download_url":"https://codeload.github.com/explosion/prodigy-recipes/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247135144,"owners_count":20889421,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["active-learning","annotation","annotation-tool","artificial-intelligence","computer-vision","data-annotation","data-science","labeling-tool","machine-learning","machine-teaching","natural-language-processing","nlp","prodigy","spacy"],"created_at":"2024-08-02T15:00:59.086Z","updated_at":"2025-04-04T07:06:51.228Z","avatar_url":"https://github.com/explosion.png","language":"Jupyter Notebook","funding_links":[],"categories":["Jupyter Notebook"],"sub_categories":[],"readme":"\u003ca href=\"https://explosion.ai\"\u003e\u003cimg src=\"https://explosion.ai/assets/img/logo.svg\" width=\"125\" height=\"125\" align=\"right\" /\u003e\u003c/a\u003e\n\n# Prodigy Recipes\n\nThis repository contains a collection of recipes for\n[Prodigy](https://prodi.gy), our scriptable annotation tool for text, images and\nother data. 
In order to use this repo, you'll need a license for Prodigy –\n[see this page](https://prodi.gy/buy) for more details. For questions and bug\nreports, please use the [Prodigy Support Forum](https://support.prodi.gy). If\nyou've found a mistake or bug, feel free to submit a\n[pull request](https://github.com/explosion/prodigy-recipes/pulls).\n\n\u003e ✨ **Important note:** The recipes in this repository aren't 100% identical to\n\u003e the built-in recipes shipped with Prodigy. They've been edited to include\n\u003e comments and more information, and some of them have been simplified to make\n\u003e it easier to follow what's going on, and to use them as the basis for a custom\n\u003e recipe.\n\n## 📋 Usage\n\nOnce Prodigy is installed, you should be able to run the `prodigy` command from\nyour terminal, either directly or via `python -m`:\n\n```bash\npython -m prodigy\n```\n\nThe `prodigy` command lists the built-in recipes. To use a custom recipe script,\nsimply pass the path to the file using the `-F` argument:\n\n```bash\npython -m prodigy ner.teach your_dataset en_core_web_sm ./data.jsonl --label PERSON -F prodigy-recipes/ner/ner_teach.py\n```\n\nYou can also use the `--help` flag for an overview of the available arguments of a recipe, e.g. `prodigy ner.teach -F prodigy-recipes/ner/ner_teach.py --help`.\n\n### Some things to try\n\nYou can edit the code in the recipe script to customize how Prodigy behaves.\n\n- Try replacing `prefer_uncertain()` with `prefer_high_scores()`.\n- Try writing a custom sorting function. It just needs to be a generator that\n  yields a sequence of `example` dicts, given a sequence of `(score, example)`\n  tuples.\n- Try adding a filter that drops some questions from the stream. 
For instance,\n  try writing a filter that only asks you questions where the entity is two\n  words long.\n- Try customizing the `update()` callback to include extra logging or extra\n  functionality.\n\n## 🍳 Recipes\n\n### Named Entity Recognition\n\n| Recipe                                               | Description                                                                                                                                                                                                                                               |\n| ---------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| [`ner.teach`](ner/ner_teach.py)                      | Collect the best possible training data for a named entity recognition model with the model in the loop. Based on your annotations, Prodigy will decide which questions to ask next.                                                                      |\n| [`ner.match`](ner/ner_match.py)                      | Suggest phrases that match a given patterns file, and mark whether they are examples of the entity you're interested in. The patterns file can include exact strings or token patterns for use with spaCy's `Matcher`.                                    |\n| [`ner.manual`](ner/ner_manual.py)                    | Mark spans manually by token. Requires only a tokenizer and no entity recognizer, and doesn't do any active learning. Optionally, pre-highlight spans based on patterns. |\n| [`ner.fuzzy_manual`](ner/ner_fuzzy_manual.py)                    | Like `ner.manual`, but uses the `FuzzyMatcher` from the [`spaczz`](https://github.com/gandersen101/spaczz) library to pre-highlight candidates.   
|\n| [`ner.manual.bert`](other/transformers_tokenizers.py) | Use a BERT word-piece tokenizer for efficient manual NER annotation for transformer models.                                                                                                                                                                 |\n| [`ner.correct`](ner/ner_correct.py)              | Create gold-standard data by correcting a model's predictions manually. This recipe used to be called [`ner.make_gold`](ner/ner_make_gold.py).                                                                                                                                                                                  |\n| [`ner.silver-to-gold`](ner/ner_silver_to_gold.py)    | Take an existing \"silver\" dataset with binary accept/reject annotations, merge the annotations to find the best possible analysis given the constraints defined in the annotations, and manually edit it to create a perfect and complete \"gold\" dataset. |\n| [`ner.eval_ab`](ner/ner_eval_ab.py)    | Evaluate two NER models by comparing their predictions and building an evaluation set from the stream. 
|\n| [`ner_fuzzy_manual`](ner/ner_fuzzy_manual.py) | Mark spans manually by token with suggestions from [`spaczz fuzzy`](https://spacy.io/universe/project/spaczz) matcher pre-highlighted.\n\n### Text Classification\n\n| Recipe                                                    | Description                                                                                                                                                                                                                                                                                                                         |\n| --------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| [`textcat.manual`](textcat/textcat_manual.py)             | Manually annotate categories that apply to a text. Supports annotation tasks with single and multiple labels. Multiple labels can optionally be flagged as exclusive.|\n| [`textcat.correct`](textcat/textcat_correct.py)           | Correct the textcat model's predictions manually. Predictions above the acceptance threshold will be automatically preselected (0.5 by default). Prodigy will infer whether the categories should be mutualy exclusive based on the component configuration. |\n| [`textcat.teach`](textcat/textcat_teach.py)               | Collect the best possible training data for a text classification model with the model in the loop. Based on your annotations, Prodigy will decide which questions to ask next.|\n| [`textcat.custom-model`](textcat/textcat_custom_model.py) | Use active learning-powered text classification with a custom model. 
To demonstrate how it works, this demo recipe uses a simple dummy model that \"predicts\" random scores. You can swap it out for any model of your choice, for example a text classification model implemented in PyTorch, TensorFlow or scikit-learn. |\n\n### Terminology\n\n| Recipe                                | Description                                                                                                                                                             |\n| ------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| [`terms.teach`](terms/terms_teach.py) | Bootstrap a terminology list with word vectors and seed terms. Prodigy will suggest similar terms based on the word vectors, and update the target vector accordingly. |\n\n### Image\n\n| Recipe                                                      | Description                                                                                                                                                                                                                         |\n| ----------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| [`image.manual`](image/image_manual.py)                     | Manually annotate images by drawing rectangular bounding boxes or polygon shapes on the image.                                                                                                                                      
|\n| [`image-caption`](image/image_caption/image_caption.py)     | Annotate images with captions, pre-populate captions with an image captioning model implemented in PyTorch, and perform error analysis.                                                                                                 |\n| [`image.frozenmodel`](image/tf_odapi/image_frozen_model.py) | Model-in-the-loop manual annotation using [TensorFlow's Object Detection API](https://github.com/tensorflow/models/tree/master/research/object_detection).                                                                              |\n| [`image.servingmodel`](image/tf_odapi/image_tf_serving.py)  | Model-in-the-loop manual annotation using [TensorFlow's Object Detection API](https://github.com/tensorflow/models/tree/master/research/object_detection). This uses [TensorFlow Serving](https://www.tensorflow.org/tfx/guide/serving). |\n| [`image.trainmodel`](image/tf_odapi/image_train.py)         | Model-in-the-loop manual annotation and training using [TensorFlow's Object Detection API](https://github.com/tensorflow/models/tree/master/research/object_detection).                                                                 |\n\n### Other\n\n| Recipe                                              | Description                                                                                                                                                        |\n| --------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------ |\n| [`mark`](other/mark.py)                             | Click through pre-prepared examples, with no model in the loop.                                                                                                    |\n| [`choice`](other/choice.py)                         | Annotate data with multiple-choice options. 
The annotated examples will have an additional property `\"accept\": []` mapping to the ID(s) of the selected option(s). |\n| [`question_answering`](other/question_answering.py) | Annotate question/answer pairs with a custom HTML interface.                                                                                                       |\n\n### Community recipes\n\n| Recipe                           | Author     | Description                                                                                             |\n| -------------------------------- | ---------- | ------------------------------------------------------------------------------------------------------- |\n| `phrases.teach`                  | @kabirkhan | Now part of [`sense2vec`](https://github.com/explosion/sense2vec).                                      |\n| `phrases.to-patterns`            | @kabirkhan | Now part of [`sense2vec`](https://github.com/explosion/sense2vec).                                      |\n| [`records.link`](contrib/dedupe) | @kabirkhan | Link records across multiple datasets using the [`dedupe`](https://github.com/dedupeio/dedupe) library. |\n\n### Tutorial recipes\n\nThese recipes have made an appearance in one of our tutorials. \n\n| Recipe                                                      | Description                                                                                   |\n| ----------------------------------------------------------- | --------------------------------------------------------------------------------------------- |\n| [`span-and-textcat`](tutorials/span-and-textcat/)           | Do both spancat and textcat annotations at the same time. Great for chatbots!                 |\n| [`terms.from-ner`](tutorials/terms-from-ner/)               | Generate terms from previous NER annotations.                                                 
|\n| [`audio-with-transcript`](tutorials/audio-with-transcript/) | Handles both manual audio annotation and transcription.                                |\n| [`progress`](tutorials/progress-update)                     | Demo of an `update` callback that tracks annotation speed.                                    |\n\n## 📚 Example Datasets and Patterns\n\nTo make it even easier to get started, we've also included a few\n[`example-datasets`](example-datasets), both raw data and data containing\nannotations created with Prodigy. For examples of token-based match patterns to\nuse with recipes like `ner.teach` or `ner.match`, see the\n[`example-patterns`](example-patterns) directory.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fexplosion%2Fprodigy-recipes","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fexplosion%2Fprodigy-recipes","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fexplosion%2Fprodigy-recipes/lists"}