{"id":21761951,"url":"https://github.com/ashutoshdongare/softskill-ner","last_synced_at":"2025-03-21T04:41:17.230Z","repository":{"id":201566007,"uuid":"484416364","full_name":"AshutoshDongare/softskill-NER","owner":"AshutoshDongare","description":"Fine tuning 🤗 transformer model for softskill NER task","archived":false,"fork":false,"pushed_at":"2022-06-12T09:46:34.000Z","size":67,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-26T01:19:42.810Z","etag":null,"topics":["bert-fine-tuning","dataset","distilbert","huggingface","ner","softskills","token-classification","training-data","transfer-learning","transformers"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AshutoshDongare.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2022-04-22T11:57:36.000Z","updated_at":"2023-03-27T07:30:24.000Z","dependencies_parsed_at":null,"dependency_job_id":"293072b2-fda5-44f3-ab34-98c0731402f5","html_url":"https://github.com/AshutoshDongare/softskill-NER","commit_stats":null,"previous_names":["ashutoshdongare/softskill-ner"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AshutoshDongare%2Fsoftskill-NER","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AshutoshDongare%2Fsoftskill-NER/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AshutoshDongare%2Fsoftskill-NER/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitH
ub/repositories/AshutoshDongare%2Fsoftskill-NER/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AshutoshDongare","download_url":"https://codeload.github.com/AshutoshDongare/softskill-NER/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244739947,"owners_count":20501990,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bert-fine-tuning","dataset","distilbert","huggingface","ner","softskills","token-classification","training-data","transfer-learning","transformers"],"created_at":"2024-11-26T12:10:25.014Z","updated_at":"2025-03-21T04:41:17.204Z","avatar_url":"https://github.com/AshutoshDongare.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":" [![Mailing list : test](http://img.shields.io/badge/Email-gray.svg?style=for-the-badge\u0026logo=gmail)](mailto:ashutosh.dongare@gmail.com) [![License: CC BY-NC 4.0](https://img.shields.io/badge/License-GNU%20AGPL%203.0-lightgrey.svg?style=for-the-badge)](https://github.com/AshutoshDongare/convo/blob/main/LICENSE)\n\n\n# Fine-tune a pretrained 🤗 model for SoftSkill NER\n\n![header](https://user-images.githubusercontent.com/18417621/164710324-54f54dbc-797b-4419-823e-3706d60a011f.png)\n\nThis repo shows how to fine-tune a custom NER model to classify softskills using the 🤗 Huggingface pretrained model [distilbert](https://huggingface.co/distilbert-base-uncased). 
The custom training data covers some typical softskills such as \"positive attitude\", \"leadership\", and \"customer focus\".\n\nWe will fine-tune the model for softskill NER using the 🤗 Transformers [Trainer](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.Trainer).\n\nThis is the simplest way to fine-tune a 🤗 Transformers model. You can, however, choose to fine-tune in native PyTorch or TensorFlow, which gives you the flexibility to write your own custom training loop if you need specific training behavior.\n\nThe custom dataset has around 119 sentences, tokenized and annotated in the format the Huggingface model requires for fine-tuning.\n(Please drop me a line if you want to know how to prepare a tokenized and annotated dataset for NER training.)\n\nThe trained model still provides decent performance despite such a low number of training samples. It is also resilient enough to identify softskills that are not in the training data.\n\nFor production use cases, it is recommended to compile a few hundred to a few thousand training samples.\n\nThe sample dataset format for the token classification task is shown below.\n\n```\n{\n \"id\":\"101\", \n \"ner_tags\" :[0,0,0,0,0,0,0,1,0,0,0,0,0], \n \"tokens\":[\"a\",\"good\",\"project\",\"manager\",\"is\",\"able\",\"to\",\"prioritize\",\"from\",\"the\",\"list\",\"of\",\"tasks\"]\n }\n ```\nYou may want to take a look at ```/data/train_ner.json``` to check which softskills have been annotated in the training data.\n\n### Below are the metrics for this fine-tuning run\n![Metrics](https://user-images.githubusercontent.com/18417621/164762441-2c3103c3-7dfd-4386-add5-b0315ba336d2.png)\n\n\n## Inference\nThe training script takes a sample sentence and runs inference on it to check whether the NER model is trained properly and can perform softskill NER classification. The model is able to classify unseen softskills such as \"composed\" and \"Professional\". 
\n\n- NER =  composed\n- NER =  confident\n- NER =  professional\n- NER =  leadership\n\n### Load saved model for inference\nYou may also load the saved model the same way you would use any pretrained 🤗 Transformers model with a pipeline.\n\nBelow is part of the code showing how you can load the saved model and run inference on it. Note that ```.from_pretrained()``` loads from the directory containing the custom-trained model.\n\n```\nimport torch\nfrom transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline\n\n# Use a GPU when available (assumed here; the original snippet does not define `device`)\ndevice = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n\n# Load the fine-tuned model and its tokenizer from the saved directory\nmodel = AutoModelForTokenClassification.from_pretrained(\"./skillner_model/\")\ntokenizer = AutoTokenizer.from_pretrained(\"./skillner_model/\")\n\nNER_INFERENCE = pipeline(\"ner\", model=model.to(device), tokenizer=tokenizer)\n\nner_results = NER_INFERENCE(\"your sentence for softskill NER inference\")\n```\n\n## Citations\n\nThis repo is based on [Huggingface](https://huggingface.co/) and compiled for custom NER fine-tuning.\n\n\n## Future enhancements\n- Compile and annotate more training data for NER. This can be achieved by scraping the web or a Wiki dump for relevant data.\n- Implement B-SOFTSKILL/I-SOFTSKILL chunk tagging to mark whether a token begins or is inside a softskill entity.\n \n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fashutoshdongare%2Fsoftskill-ner","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fashutoshdongare%2Fsoftskill-ner","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fashutoshdongare%2Fsoftskill-ner/lists"}