https://github.com/nadhirfr/medical_transcript_keyword_extract

A medical transcription keyword extractor. Extracts likely keywords from a doctor's medical transcript.
keyword-extraction machine-learning medical-text-mining medical-transcript multilabel-classification

Host: GitHub
URL: https://github.com/nadhirfr/medical_transcript_keyword_extract
Owner: nadhirfr
Created: 2021-03-26T09:30:42.000Z (about 5 years ago)
Default Branch: main
Last Pushed: 2021-04-04T19:33:05.000Z (about 5 years ago)
Last Synced: 2025-02-15T02:15:38.975Z (over 1 year ago)
Topics: keyword-extraction, machine-learning, medical-text-mining, medical-transcript, multilabel-classification
Language: Jupyter Notebook
Homepage:
Size: 37.4 MB
Stars: 1
Watchers: 2
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml

README

# Medical Transcription Keywords Extraction

#### _This repository extracts keywords from medical transcriptions. The data comes from an open medical transcription dataset._

#

[![ko-fi](https://ko-fi.com/img/githubbutton_sm.svg)](https://ko-fi.com/H2H146AUD)

#

## Preprocessing

Remove symbols, stopwords, empty spaces after commas, multiple spaces, etc. Basically, only the words are kept, separated by single spaces. The clean dataset is [here](https://github.com/nadhirfr/medical_transcript_keyword_extract/blob/main/datasets.csv).
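The cleaning steps described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the `clean_text` name, the lowercasing step, and the tiny `STOPWORDS` set are assumptions made here (the project credits NLTK, whose English stopword list would be a natural substitute).

```python
import re

# Hypothetical, deliberately small stopword set for illustration only.
STOPWORDS = {"a", "an", "the", "of", "and", "with", "was", "is", "to", "in", "for", "on"}


def clean_text(text: str) -> str:
    """Drop symbols and stopwords, leaving words separated by single spaces."""
    # Lowercasing is an assumption; the README does not state it.
    text = text.lower()
    # Replace anything that is not a letter, digit, or whitespace with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # split() with no arguments also collapses runs of whitespace.
    words = [word for word in text.split() if word not in STOPWORDS]
    return " ".join(words)


print(clean_text("The patient, a 45-year-old male,  presented with chest pain."))
# -> patient 45 year old male presented chest pain
```

Splitting on whitespace and re-joining handles both the "multiple spaces" and "empty spaces after comma" cases in one step.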

## The model

- Pipeline

  - Vectorize the words

  - TF-IDF Transformer

  - OneVsRestClassifier with SGD Classifier

- Input: ```['a string sentence', 'another string sentence']```

- Output: ```['keywords separated by a single space', 'more extracted keywords']```

- Serialized model: [here](https://github.com/nadhirfr/medical_transcript_keyword_extract/blob/main/sgd_pipeline1.pkl)
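A pipeline with the three stages listed above might look like the sketch below, assuming scikit-learn's `CountVectorizer`, `TfidfTransformer`, `OneVsRestClassifier`, and `SGDClassifier`. The toy training data, the `MultiLabelBinarizer` step, and the `extract_keywords` helper are assumptions added for illustration; the repository's notebook and hyperparameters may differ.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Made-up training examples for illustration; not taken from the project's dataset.
texts = [
    "patient presented chest pain shortness breath",
    "chest pain radiating left arm",
    "knee swelling after fall",
    "swelling knee limited range motion",
]
keywords = [
    ["chest", "pain"],
    ["chest", "pain"],
    ["knee", "swelling"],
    ["knee", "swelling"],
]

# Turn each keyword list into a row of 0/1 flags, one column per keyword.
binarizer = MultiLabelBinarizer()
y = binarizer.fit_transform(keywords)

pipeline = Pipeline([
    ("vect", CountVectorizer()),    # word counts
    ("tfidf", TfidfTransformer()),  # reweight counts by TF-IDF
    ("clf", OneVsRestClassifier(SGDClassifier(random_state=0))),  # one classifier per keyword
])
pipeline.fit(texts, y)


def extract_keywords(sentences):
    """Return one space-separated keyword string per input sentence."""
    predicted = pipeline.predict(sentences)
    return [" ".join(labels) for labels in binarizer.inverse_transform(predicted)]


print(extract_keywords(["chest pain on exertion", "swollen knee"]))
```

Treating each keyword as its own binary label is what makes this a multilabel problem: a transcript can receive any number of keywords, including none.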

## Credits

This project relies on a number of open source projects:

- [Datasets] - Where the story begins!

- [Sklearn] - The most widely used machine learning framework

- [NLTK] - Linguistic library

- [Pandas], [Keras], [Numpy], and many others

## License

MIT

**Free Software, Hell Yeah!**

[//]: # (These are reference links used in the body of this note and get stripped out when the markdown processor does its job. There is no need to format nicely because it shouldn't be seen. Thanks SO - http://stackoverflow.com/questions/4823468/store-comments-in-markdown-syntax)

   [Datasets]: 

   [Sklearn]: 

   [NLTK]: 

   [Pandas]: 

   [Keras]: 

   [Numpy]:
