https://github.com/nadhirfr/medical_transcript_keyword_extract
A medical transcription keyword extractor. Extracts the possible keywords from a doctor's medical transcript.
keyword-extraction machine-learning medical-text-mining medical-transcript multilabel-classification
- Host: GitHub
- URL: https://github.com/nadhirfr/medical_transcript_keyword_extract
- Owner: nadhirfr
- Created: 2021-03-26T09:30:42.000Z (about 5 years ago)
- Default Branch: main
- Last Pushed: 2021-04-04T19:33:05.000Z (about 5 years ago)
- Last Synced: 2025-02-15T02:15:38.975Z (over 1 year ago)
- Topics: keyword-extraction, machine-learning, medical-text-mining, medical-transcript, multilabel-classification
- Language: Jupyter Notebook
- Homepage:
- Size: 37.4 MB
- Stars: 1
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
# Medical Transcription Keywords Extraction
#### _The aim of this repository is mainly to extract keywords from medical transcriptions. The dataset was obtained from an open medical transcription dataset._
## Preprocessing
Remove symbols, stopwords, empty spaces after commas, multiple spaces, etc. Basically, it keeps only the words, separated by single spaces. The cleaned dataset is available [here](https://github.com/nadhirfr/medical_transcript_keyword_extract/blob/main/datasets.csv).
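The cleaning steps described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function name `clean_text`, the small `STOPWORDS` set, and the lowercasing step are all assumptions made for the example. The project credits NLTK, whose full English stopword list would likely be used instead of the short set here.

```python
import re

# Illustrative stopword subset (assumption); NLTK's English stopword
# list could be substituted for a realistic run.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "was", "with", "for", "on"}


def clean_text(text: str) -> str:
    """Drop symbols and stopwords, leaving words separated by single spaces."""
    # Lowercase (assumption) so stopword matching is case-insensitive,
    # then replace anything that is not a letter, digit, or whitespace.
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text.lower())
    # str.split() with no arguments also collapses runs of whitespace.
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)


print(clean_text("The patient,  was seen for chest-pain!"))
```

Replacing symbols with a space rather than deleting them keeps hyphenated or comma-joined words from being fused together.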
## The model
- Pipeline
  - Vectorize the words
  - TF-IDF Transformer
  - OneVsRestClassifier with SGD Classifier
- Input: a list of strings, e.g. ```['a string sentences', 'another string sentences']```
- Output: a list of strings, each holding the extracted keywords separated by single spaces, e.g. ```['keywords separated by a single space', 'another extracted keywords']```
- The serialized model is available [here](https://github.com/nadhirfr/medical_transcript_keyword_extract/blob/main/sgd_pipeline1.pkl)
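A pipeline of the shape listed above might look like the following scikit-learn sketch. It is built on assumptions rather than taken from the repository: the toy `texts` and `keywords` are made up, the step names (`vect`, `tfidf`, `clf`) and the helper `extract_keywords` are hypothetical, and using `MultiLabelBinarizer` to turn keyword lists into a multilabel target is one plausible way to frame the task, not necessarily the author's.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Toy training data, made up for illustration only.
texts = [
    "patient reports chest pain shortness breath",
    "chest pain radiating left arm",
    "knee swelling after fall",
    "knee pain swelling limited motion",
]
keywords = [
    ["chest", "pain"],
    ["chest", "pain"],
    ["knee", "swelling"],
    ["knee", "pain", "swelling"],
]

# Encode each keyword list as a row of 0/1 indicators (one column per keyword).
binarizer = MultiLabelBinarizer()
y = binarizer.fit_transform(keywords)

pipeline = Pipeline([
    ("vect", CountVectorizer()),      # word counts
    ("tfidf", TfidfTransformer()),    # reweight counts by TF-IDF
    ("clf", OneVsRestClassifier(SGDClassifier(random_state=0))),  # one binary model per keyword
])
pipeline.fit(texts, y)


def extract_keywords(sentences):
    """Map a list of sentences to a list of space-separated keyword strings."""
    predicted = pipeline.predict(sentences)
    return [" ".join(labels) for labels in binarizer.inverse_transform(predicted)]


print(extract_keywords(["sharp chest pain at night", "swelling around the knee"]))
```

With a training set this small the predicted keywords are not meaningful; the point is only the input and output shapes, which match the list-of-strings contract described above.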
## Credits
This project relies on a number of open source projects to work properly:
- [Datasets] - Where the story begins!
- [Sklearn] - A widely used machine learning framework
- [NLTK] - Linguistic library.
- [Pandas], [Keras], [Numpy], and many others
## License
MIT
**Free Software, Hell Yeah!**
[//]: # (These are reference links used in the body of this note and get stripped out when the markdown processor does its job. There is no need to format nicely because it shouldn't be seen. Thanks SO - http://stackoverflow.com/questions/4823468/store-comments-in-markdown-syntax)
[Datasets]:
[Sklearn]:
[NLTK]:
[Pandas]:
[Keras]:
[Numpy]: