{"id":16689238,"url":"https://github.com/sskorol/ner-spacy-doccano","last_synced_at":"2025-04-10T00:44:25.233Z","repository":{"id":86918430,"uuid":"262411327","full_name":"sskorol/ner-spacy-doccano","owner":"sskorol","description":"NER using Doccano / Spacy EN","archived":false,"fork":false,"pushed_at":"2020-05-09T07:58:36.000Z","size":5,"stargazers_count":8,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-10T00:44:20.668Z","etag":null,"topics":["doccano","ner","python","spacy"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sskorol.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-05-08T19:29:11.000Z","updated_at":"2023-12-22T14:30:30.000Z","dependencies_parsed_at":"2023-03-13T19:49:14.873Z","dependency_job_id":null,"html_url":"https://github.com/sskorol/ner-spacy-doccano","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sskorol%2Fner-spacy-doccano","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sskorol%2Fner-spacy-doccano/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sskorol%2Fner-spacy-doccano/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sskorol%2Fner-spacy-doccano/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sskorol","download_url":"https://codeload.github.
com/sskorol/ner-spacy-doccano/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248137997,"owners_count":21053775,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["doccano","ner","python","spacy"],"created_at":"2024-10-12T15:47:37.148Z","updated_at":"2025-04-10T00:44:25.221Z","avatar_url":"https://github.com/sskorol.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## NER using Doccano / Spacy EN  \n\nThis repository provides a basic example of a named entity recognition (NER) task using an English model.\n\n### Preparation\n\nMake sure you have Python \u003e= 3.6 and Docker installed.\n\nSet up [Doccano](https://github.com/doccano/doccano):\n```shell script\ngit clone https://github.com/doccano/doccano.git\ncd doccano\ndocker-compose -f docker-compose.prod.yml up -d\n```\n\nSet up a sample project based on [Spacy](https://spacy.io/):\n```shell script\ngit clone https://github.com/sskorol/ner-spacy-doccano.git\ncd ner-spacy-doccano\npip install -r requirements.txt\npython3 -m spacy download en_core_web_sm\n```\n\n### Annotating Text\n\n- Open the Doccano UI: http://localhost\n- Sign in with the default credentials: admin / password\n- Create a new project\n- Watch the tutorial provided on the home screen\n- Import `./data_to_be_annotated.txt`\n- Create a new label, e.g. 
`CATEGORY`\n- Start annotating the imported data by marking the following words as `CATEGORY`: product, process, service\n- When all the sentences are labeled, export the annotations as JSON with text labels\n- Save the file in the project root as `categories.json`\n\nNote that the exported file contains one JSON object per line rather than a single JSON array. You have to convert the content into an array: wrap it in `[]` and add a comma after every line except the last.\n\n### Model Training\n\nThe following command runs a script that extends an existing English model with a new `CATEGORY` label and trains it on the annotated data.\n\n```shell script\npython3 ./ner_train.py\n```\n\nYou should see output similar to the following:\n```text\nLoaded model 'en_core_web_sm'\nLosses {'ner': 846.7723659528049}\nLosses {'ner': 623.0931596025007}\nLosses {'ner': 689.6105882608678}\n------------------------\nEntities in 'I'm thinking about several categories. Let me start with the service one.'\nCATEGORY service\n------------------------\nEntities in 'Let’s choose a product'\nCATEGORY product\n------------------------\nEntities in 'It is quite a well-known service'\nCATEGORY service\n------------------------\n```\n\nThe verification output should give you some confidence that your model is accurate.\n\nIn addition, a new `./model` folder containing the trained NER model should be created.\n\n### Model Testing\n\nRun the following command to test the generated model on custom data:\n\n```shell script\npython3 net_test.py\n```\n\nYou should see similar output, which gives you a sense of the model's accuracy:\n\n```text\nLoading from ./model\n------------------------\nEntities in 'Sounds interesting. Let's say it will be a product!'\nCATEGORY product\n------------------------\nEntities in 'Very nice idea. 
I'll pick a service probably.'\nCATEGORY service\n------------------------\nEntities in 'Process'\nCATEGORY Process\n------------------------\n``` ","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsskorol%2Fner-spacy-doccano","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsskorol%2Fner-spacy-doccano","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsskorol%2Fner-spacy-doccano/lists"}