https://github.com/shreyaskarnik/pubmedlda

Making it easy to perform LDA on PubMed abstracts.

This project provides code for performing topic modeling on PubMed abstracts.
It depends on the gensim and nltk packages.
The package contains two parts:
1. Retrieve PubMed abstracts using the script getpmAbstracts.py:
Usage: python getpmAbstracts.py -q [query] -o [output] -s [flag for stemming]

Options:
  -h, --help            show this help message and exit
  -q QUERY, --query=QUERY
                        Enter the PubMed query (PubMed style queries
                        supported)
  -o OFILE, --output=OFILE
                        Enter the output file name to store result
  -s, --stem            To stem result
2. Once you have the PubMed abstracts in a file, run LDA over them:

Usage: python gensim_lda_pubmed.py -i [inputfile] -k [number of topics to extract] -v [verbose output TRUE/FALSE (default FALSE)] -t [TFIDF weights TRUE/FALSE] -r [return topics per document TRUE/FALSE (default FALSE)]

Options:
  -h, --help            show this help message and exit
  -i IFILE, --inputfile=IFILE
                        Enter the file containing PubMed abstracts
  -k NTOP, --numtopics=NTOP
                        Number of topics
  -t TFIDF, --tfidf=TFIDF
                        TFIDF weighting (default TRUE)
  -v VERBOSE            Verbose Output TRUE/FALSE (default FALSE)
  -r FIT                Return topics per document TRUE/FALSE (default FALSE)
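
For orientation, the LDA step might look roughly like the sketch below. This is not the code in gensim_lda_pubmed.py. It is a minimal illustration that assumes the abstracts are already loaded as a list of strings, and the helper names `tokenize` and `run_lda` are hypothetical. The parameters `num_topics`, `use_tfidf`, and `per_document` loosely mirror the -k, -t, and -r options above. Only standard gensim calls are used (`corpora.Dictionary`, `doc2bow`, `models.TfidfModel`, `models.LdaModel`).

```python
import re


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens longer than two characters."""
    return [tok for tok in re.findall(r"[a-z]+", text.lower()) if len(tok) > 2]


def run_lda(abstracts, num_topics=10, use_tfidf=True, per_document=False):
    """Fit an LDA model over a list of abstract strings (hypothetical helper)."""
    # gensim is imported lazily so tokenize() stays usable without it installed.
    from gensim import corpora, models

    texts = [tokenize(a) for a in abstracts]
    dictionary = corpora.Dictionary(texts)                # token <-> integer id map
    corpus = [dictionary.doc2bow(t) for t in texts]       # bag-of-words vectors

    if use_tfidf:
        # Re-weight raw counts with TF-IDF before fitting (compare the -t option).
        corpus = list(models.TfidfModel(corpus)[corpus])

    lda = models.LdaModel(corpus, id2word=dictionary, num_topics=num_topics)
    topics = lda.print_topics(num_topics=num_topics)

    if per_document:
        # Topic distribution for each abstract (compare the -r option).
        return topics, [lda[bow] for bow in corpus]
    return topics, None
```

Calling `run_lda(abstracts, num_topics=5, per_document=True)` would return the topic descriptions together with one topic distribution per abstract. The actual script may tokenize, filter stop words, or stem differently.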

All code in this project is released under a Creative Commons license.