{"id":13595506,"url":"https://github.com/guillaumegenthial/sequence_tagging","last_synced_at":"2025-05-15T16:04:16.402Z","repository":{"id":113442230,"uuid":"86601523","full_name":"guillaumegenthial/sequence_tagging","owner":"guillaumegenthial","description":"Named Entity Recognition (LSTM + CRF) - Tensorflow","archived":false,"fork":false,"pushed_at":"2020-10-16T09:18:22.000Z","size":56,"stargazers_count":1944,"open_issues_count":22,"forks_count":701,"subscribers_count":72,"default_branch":"master","last_synced_at":"2025-04-07T21:12:43.597Z","etag":null,"topics":["bi-lstm","characters-embeddings","conditional-random-fields","crf","glove","named-entity-recognition","ner","state-of-art","tensorflow"],"latest_commit_sha":null,"homepage":"https://guillaumegenthial.github.io/sequence-tagging-with-tensorflow.html","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/guillaumegenthial.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2017-03-29T15:53:29.000Z","updated_at":"2025-03-27T15:46:34.000Z","dependencies_parsed_at":null,"dependency_job_id":"7affc5d0-521a-4446-bc30-f90463d960a2","html_url":"https://github.com/guillaumegenthial/sequence_tagging","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/guillaumegenthial%2Fsequence_tagging","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/guillaumegenthial%2Fsequence_tagging/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/guillaumegenthial%2Fsequence_tagging/relea
ses","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/guillaumegenthial%2Fsequence_tagging/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/guillaumegenthial","download_url":"https://codeload.github.com/guillaumegenthial/sequence_tagging/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254374404,"owners_count":22060609,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bi-lstm","characters-embeddings","conditional-random-fields","crf","glove","named-entity-recognition","ner","state-of-art","tensorflow"],"created_at":"2024-08-01T16:01:51.346Z","updated_at":"2025-05-15T16:04:16.384Z","avatar_url":"https://github.com/guillaumegenthial.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# Named Entity Recognition with Tensorflow\n\nThis repo implements a NER model using Tensorflow (LSTM + CRF + chars embeddings).\n\n__A [better implementation is available here, using `tf.data` and `tf.estimator`, and achieves an F1 of 91.21](https://github.com/guillaumegenthial/tf_ner)__\n\nState-of-the-art performance (F1 score between 90 and 91).\n\nCheck the [blog post](https://guillaumegenthial.github.io/sequence-tagging-with-tensorflow.html)\n\n## Task\n\nGiven a sentence, give a tag to each word. A classical application is Named Entity Recognition (NER). 
Here is an example:\n\n```\nJohn   lives in New   York\nB-PER  O     O  B-LOC I-LOC\n```\n\n\n## Model\n\nSimilar to [Lample et al.](https://arxiv.org/abs/1603.01360) and [Ma and Hovy](https://arxiv.org/pdf/1603.01354.pdf).\n\n- concatenate final states of a bi-lstm on character embeddings to get a character-based representation of each word\n- concatenate this representation to a standard word vector representation (GloVe here)\n- run a bi-lstm on each sentence to extract a contextual representation of each word\n- decode with a linear chain CRF\n\n\n\n## Getting started\n\n\n1. Download the GloVe vectors with\n\n```\nmake glove\n```\n\nAlternatively, you can download them manually [here](https://nlp.stanford.edu/projects/glove/) and update the `glove_filename` entry in `model/config.py`. You can also choose not to load pretrained word vectors by changing the entry `use_pretrained` to `False` in `model/config.py`.\n\n2. Build the training data, train and evaluate the model with\n```\nmake run\n```\n\n\n## Details\n\n\nHere is the breakdown of the commands executed in `make run`:\n\n1. [DO NOT MISS THIS STEP] Build the vocab from the data and extract trimmed GloVe vectors according to the config in `model/config.py`.\n\n```\npython build_data.py\n```\n\n2. Train the model with\n\n```\npython train.py\n```\n\n\n3. Evaluate and interact with the model with\n```\npython evaluate.py\n```\n\n\nData iterators and utils are in `model/data_utils.py`, and the model with its training/test procedures is in `model/ner_model.py`.\n\nTraining time on an NVIDIA Tesla K80 is 110 seconds per epoch on the CoNLL train set, using character embeddings and the CRF.\n\n\n\n## Training Data\n\n\nThe training data must be in the following format (identical to the CoNLL2003 dataset).\n\nA default test file is provided to help you get started.\n\n\n```\nJohn B-PER\nlives O\nin O\nNew B-LOC\nYork I-LOC\n. 
O\n\nThis O\nis O\nanother O\nsentence O\n```\n\n\nOnce you have produced your data files, update the dataset paths in `model/config.py`, for example:\n\n```\n# dataset\ndev_filename = \"data/coNLL/eng/eng.testa.iob\"\ntest_filename = \"data/coNLL/eng/eng.testb.iob\"\ntrain_filename = \"data/coNLL/eng/eng.train.iob\"\n```\n\n\n\n\n## License\n\nThis project is licensed under the terms of the Apache 2.0 license (like TensorFlow and its derivatives). If used for research, a citation would be appreciated.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fguillaumegenthial%2Fsequence_tagging","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fguillaumegenthial%2Fsequence_tagging","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fguillaumegenthial%2Fsequence_tagging/lists"}