{"id":21009532,"url":"https://github.com/jwinman91/ai-ner","last_synced_at":"2026-03-06T11:02:28.713Z","repository":{"id":204304847,"uuid":"709988995","full_name":"jWinman91/AI-NER","owner":"jWinman91","description":"An AI-powered, but model-agnostic name-entity recognition toolkit.","archived":false,"fork":false,"pushed_at":"2024-07-12T23:09:26.000Z","size":520,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-12-21T19:48:33.664Z","etag":null,"topics":["anonymization","de-identification","name-entity-recognition","ner","nlp-machine-learning","pii","pii-anonymization","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jWinman91.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-10-25T19:26:01.000Z","updated_at":"2024-07-12T23:09:29.000Z","dependencies_parsed_at":null,"dependency_job_id":"b048025c-2c59-46f4-8990-17c9cc43f63c","html_url":"https://github.com/jWinman91/AI-NER","commit_stats":null,"previous_names":["jwinman91/ai-extractor"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/jWinman91/AI-NER","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jWinman91%2FAI-NER","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jWinman91%2FAI-NER/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jWinman91%2FAI-NER/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub
/repositories/jWinman91%2FAI-NER/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jWinman91","download_url":"https://codeload.github.com/jWinman91/AI-NER/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jWinman91%2FAI-NER/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30173348,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-06T07:56:45.623Z","status":"ssl_error","status_checked_at":"2026-03-06T07:55:55.621Z","response_time":250,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anonymization","de-identification","name-entity-recognition","ner","nlp-machine-learning","pii","pii-anonymization","python"],"created_at":"2024-11-19T09:17:08.764Z","updated_at":"2026-03-06T11:02:28.685Z","avatar_url":"https://github.com/jWinman91.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# AI-Name-Entity-Recognizer (AI-NER): Text Editing with Language Models\n\n\nThis repository is designed for editing input text using a Language Model.\nIt allows users to apply various editing prompts and various models defined in configuration files to modify the input text.\n\nCurrently, the editing prompts are written to recognize and replace name entities such as names or locations from free 
text\nand replace all occurrences with a placeholder defined in the prompt config file.\n\nThis project aims to stay model-agnostic (i.e. it can be used with a model of the user's choice) and therefore avoid any vendor lock-in.\n\nThis software functions somewhat like a smart editor.\nFor example, it can anonymize names in a text or exchange name entities in a batch of emails.\n\n## Table of Contents\n\n- [Installation](#installation)\n- [Configuration](#configuration)\n- [Usage](#usage)\n- [Example](#example)\n- [License](#license)\n\n## Installation\n\nTo use AI-NER, follow these steps:\n\n1. Clone the repository:\n```bash\ngit clone https://github.com/jWinman91/AI-NER.git\ncd AI-NER\n```\n2. Install the required dependencies:\n```bash\npip install -r requirements.txt\n```\n\n3. Download a model of your choice into `models`. I recommend the following models from [Hugging Face](https://huggingface.co/) for German text:\n   * [flair/ner-german-large](https://huggingface.co/flair/ner-german-large)\n   * [roberta-large-NER](https://huggingface.co/51la5/roberta-large-NER)\n   * [sauerkraut-7b](https://huggingface.co/TheBloke/SauerkrautLM-7B-v1-mistral-GGUF).\n   \nEach model can be downloaded using wget, e.g.:\n```bash\nwget https://huggingface.co/TheBloke/SauerkrautLM-7B-v1-mistral-GGUF/resolve/main/sauerkrautlm-7b-v1-mistral.Q4_0.gguf\n```\n\n## Configuration\n\nTo use this repository, several configurations need to be set for the model as well as for the NER tasks that extract name entities.\nThese are set in two types of configuration files, `config_model` and `config_prompt`.\n\n- `config_model` sets all configurations necessary for the respective model.\n- `config_prompt` sets the configurations for the NER tasks (e.g. 
which model to use and what to replace the identified name entities with).\n\n**TODO**\n\n## Usage\n\nAfter setting the configuration and downloading one (or more) of the models, you can run AI-NER with:\n\n```bash\npython main.py $PATH_TO_INPUT $PATH_TO_OUTPUT\n```\n\n## Example\n\nAn example text file, a self-written email in German, is provided in `data/input/email_example_de.txt`.\nThere are also pre-defined `config_model` and `config_prompt` files.\nBy running AI-NER with the `anonymize_example_email.yaml` prompt configuration and the `german_mistral.yaml` and `flair.yaml` model configurations,\nwe can anonymize certain entities in the example email.\n\nBelow are images of the email before and after running `python main.py` on it using the `anonymize_emails-NER.yaml` config file.\n\n\u003cdiv style=\"display: flex; justify-content: space-between;\"\u003e\n  \u003cimg src=\"data/images_examples/email_before_de.png\" alt=\"Email before\" width=\"85%\" /\u003e\n  \u003cimg src=\"data/images_examples/email_after_de.png\" alt=\"Email after\" width=\"85%\" /\u003e\n\u003c/div\u003e\n\n## License\n\nThis project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.\n\n## Acknowledgments\n\n- [NLTK](https://www.nltk.org/) - Natural Language Toolkit used for sentence tokenization.\n- [Hugging Face](https://huggingface.co/) - Framework for working with state-of-the-art natural language processing models.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjwinman91%2Fai-ner","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjwinman91%2Fai-ner","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjwinman91%2Fai-ner/lists"}