{"id":22367530,"url":"https://github.com/vishalrk1/skimlit","last_synced_at":"2025-07-09T09:15:16.575Z","repository":{"id":40521061,"uuid":"416314787","full_name":"vishalrk1/SkimLit","owner":"vishalrk1","description":"An NLP model to classify abstract sentences into the role they play (e.g. objective, methods, results, etc..) to enable researchers to skim through the literature and dive deeper when necessary.","archived":false,"fork":false,"pushed_at":"2022-02-03T17:26:21.000Z","size":2195,"stargazers_count":8,"open_issues_count":0,"forks_count":3,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-06T11:48:43.076Z","etag":null,"topics":["machine-learning","nlp","pytorch","streamlit","tensorflow"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vishalrk1.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-10-12T11:48:11.000Z","updated_at":"2025-03-20T21:10:08.000Z","dependencies_parsed_at":"2022-06-29T22:31:08.171Z","dependency_job_id":null,"html_url":"https://github.com/vishalrk1/SkimLit","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/vishalrk1/SkimLit","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vishalrk1%2FSkimLit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vishalrk1%2FSkimLit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vishalrk1%2FSkimLit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vishalrk1%2FSkimLit/manifests","owner_url":"https
://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vishalrk1","download_url":"https://codeload.github.com/vishalrk1/SkimLit/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vishalrk1%2FSkimLit/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":264428788,"owners_count":23606692,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["machine-learning","nlp","pytorch","streamlit","tensorflow"],"created_at":"2024-12-04T18:18:38.415Z","updated_at":"2025-07-09T09:15:16.541Z","avatar_url":"https://github.com/vishalrk1.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# SkimLit\nAn NLP model to classify abstract sentences into the role they play (e.g. objective, methods, results, etc.) to enable researchers to skim through the literature and dive deeper when necessary.\n\nTry the demo: **[WEB APP](https://huggingface.co/spaces/Vrk/SkimLit)**\n\n\u003cimg src=\"images/app.png\" width=80% height=80%\u003e\n\n* **More specifically, I am going to replicate the deep learning model behind the 2017 paper [*PubMed 200k RCT: a Dataset for Sequential Sentence Classification in Medical Abstracts*](https://arxiv.org/abs/1710.06071).**\n\n## Dataset Used\n[PubMed 200k RCT dataset](https://github.com/Franck-Dernoncourt/pubmed-rct)\n\n* The PubMed 200k RCT dataset is described in *Franck Dernoncourt, Ji Young Lee. [PubMed 200k RCT: a Dataset for Sequential Sentence Classification in Medical Abstracts](https://arxiv.org/abs/1710.06071). 
International Joint Conference on Natural Language Processing (IJCNLP). 2017.*\n\nSome miscellaneous information:\n- PubMed 20k is a subset of PubMed 200k. I.e., any abstract present in PubMed 20k is also present in PubMed 200k. \n- `PubMed_200k_RCT` is the same as `PubMed_200k_RCT_numbers_replaced_with_at_sign`, except that in the latter all numbers have been replaced by `@` (same for `PubMed_20k_RCT` vs. `PubMed_20k_RCT_numbers_replaced_with_at_sign`).\n\n- **Count Plot**\n\u003cimg src=\"https://user-images.githubusercontent.com/59719046/138639626-48336732-ca8f-4bfe-8063-0e1f7a7c6ae6.png\" width=50% height=50%\u003e\n\n## Models Tried\nAll the notebooks are available [here](https://github.com/vishalrk1/SkimLit/tree/main/Notebooks)\n\n- Naive Bayes Model -\u003e 72% Accuracy\n- Conv1D Model -\u003e 78% Accuracy\n- Model using pretrained token embedding ( Universal sentence embedding ) -\u003e 75% Accuracy\n- Conv1D Model using character level embedding -\u003e 73% Accuracy \n- Model with both token and character level embedding -\u003e 76% Accuracy\n- Model with token, character and position level embedding ( https://arxiv.org/pdf/1612.05251.pdf ) -\u003e 81% Accuracy\n\u003c!--         \u003cimg src=\"https://user-images.githubusercontent.com/59719046/138639849-8bb0dcb4-f307-45cf-82ec-671530680863.png\" width=40% height=40%\u003e --\u003e\n- Model described in [this](https://arxiv.org/pdf/1612.05251.pdf) paper with BERT embedding -\u003e 88% Accuracy\n## Final Results\n\n### **Results of all Models**\n\u003cimg src=\"images/modeling results.png\" width=80% height=80%\u003e\n\n### **Best Performing Model**\n\n\u003cimg src=\"images/bert model.png\" width=100% height=100%\u003e\n\n### **Final Outputs** \n\n\u003cimg src=\"images/Streamlit.png\" width=100% height=100%\u003e\n\n\n## Packages Used\n- Tensorflow\n- tensorflow_text\n- tensorflow_hub\n- sklearn\n- Matplotlib\n- numpy\n- pandas\n- spaCy\n\n\n## Contact Me\n\n\n\u003cp align=\"start\"\u003e\n    
\u003ca href=\"https://github.com/vishalrk1\" target=\"_blank\"\u003e\n        \u003cimg alt=\"Github\" src=\"https://img.shields.io/badge/Github-%23F37626.svg?style=for-the-badge\u0026logo=github\u0026logoColor=white\" /\u003e\u0026nbsp;\n    \u003c/a\u003e\n\u003c!--     \u003ca href=\"https://twitter.com/ArizArmeidi\" target=\"_blank\"\u003e\n        \u003cimg src=\"https://img.shields.io/badge/-Twitter-2CA5E0?logo=twitter\u0026style=for-the-badge\u0026logoColor=white\u0026color=black\" alt=\"Twitter\" /\u003e\n    \u003c/a\u003e --\u003e\n    \u003ca href=\"https://www.linkedin.com/in/vishal-karangale-126492216/\" target=\"_blank\"\u003e\n        \u003cimg alt=\"LinkedIn\" src=\"https://img.shields.io/badge/LinkedIn-%23F37626.svg?style=for-the-badge\u0026logo=linkedin\u0026logoColor=white\" /\u003e\u0026nbsp;\n    \u003c/a\u003e\n     \u003ca href=\"https://www.instagram.com/vishal_rk1/\" target=\"_blank\"\u003e\n       \u003cimg alt=\"Instagram\" src=\"https://img.shields.io/badge/Instagram-%23F37626.svg?style=for-the-badge\u0026logo=instagram\u0026logoColor=white\" /\u003e\u0026nbsp;\n    \u003c/a\u003e\n\u003c/p\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvishalrk1%2Fskimlit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvishalrk1%2Fskimlit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvishalrk1%2Fskimlit/lists"}