https://github.com/shreyaskarnik/pubmedlda
Making it easy to perform LDA on PubMed abstracts.
- Host: GitHub
- URL: https://github.com/shreyaskarnik/pubmedlda
- Owner: shreyaskarnik
- Created: 2011-07-03T00:02:10.000Z (almost 15 years ago)
- Default Branch: master
- Last Pushed: 2013-06-01T02:03:54.000Z (about 13 years ago)
- Last Synced: 2025-01-05T03:19:37.576Z (over 1 year ago)
- Language: Python
- Size: 120 KB
- Stars: 5
- Watchers: 3
- Forks: 2
- Open Issues: 2
README
This is code for performing topic modeling on PubMed abstracts.
Depends on the gensim and nltk packages.
The package contains two parts:
1. Retrieve PubMed abstracts using the script getpmAbstracts.py
Usage: python getpmAbstracts.py -q [query] -o [output] -s [flag for stemming]
Options:
  -h, --help                show this help message and exit
  -q QUERY, --query=QUERY   Enter the PubMed query (PubMed style queries supported)
  -o OFILE, --output=OFILE  Enter the output file name to store result
  -s, --stem                Stem the result
2. Once you have the PubMed abstracts in a file, run LDA over them using the script gensim_lda_pubmed.py
Usage: python gensim_lda_pubmed.py -i [inputfile] -k [number of topics to extract] -v [verbose output TRUE/FALSE (default FALSE)] -t [TFIDF weighting TRUE/FALSE (default TRUE)] -r [return topics per document TRUE/FALSE (default FALSE)]
Options:
  -h, --help                   show this help message and exit
  -i IFILE, --inputfile=IFILE  Enter the file containing PubMed abstracts
  -k NTOP, --numtopics=NTOP    Number of topics
  -t TFIDF, --tfidf=TFIDF      TFIDF weighting (default TRUE)
  -v VERBOSE                   Verbose output TRUE/FALSE (default FALSE)
  -r FIT                       Return topics per document TRUE/FALSE (default FALSE)
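
For step 1, this README does not say how getpmAbstracts.py talks to PubMed. One plausible route is NCBI's public E-utilities endpoints: esearch to resolve a query into PMIDs, then efetch to download the abstracts. The sketch below assumes that route. The function names and the retmax default are illustrative and are not taken from the repository.

```python
import json
import urllib.parse
import urllib.request

# NCBI E-utilities base URL (an assumption; the README does not confirm
# that getpmAbstracts.py uses this API).
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(query, retmax=20):
    """URL that resolves a PubMed-style query into a list of PMIDs."""
    params = {"db": "pubmed", "term": query, "retmax": retmax, "retmode": "json"}
    return EUTILS + "/esearch.fcgi?" + urllib.parse.urlencode(params)


def build_efetch_url(pmids):
    """URL that downloads plain-text abstracts for the given PMIDs."""
    params = {
        "db": "pubmed",
        "id": ",".join(pmids),
        "rettype": "abstract",
        "retmode": "text",
    }
    return EUTILS + "/efetch.fcgi?" + urllib.parse.urlencode(params)


def fetch_abstracts(query, retmax=20):
    """Run the search, then fetch the matching abstracts (needs network access)."""
    with urllib.request.urlopen(build_esearch_url(query, retmax)) as resp:
        pmids = json.load(resp)["esearchresult"]["idlist"]
    if not pmids:
        return ""
    with urllib.request.urlopen(build_efetch_url(pmids)) as resp:
        return resp.read().decode("utf-8")
```

Calling fetch_abstracts with a PubMed-style query and writing the returned text to a file would roughly correspond to the -q and -o options. Stemming (-s) is not covered by this sketch.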
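
For step 2, the following is a minimal sketch of what an LDA run with gensim might look like, including the optional TF-IDF weighting (-t) and per-document topics (-r). It is not the code from gensim_lda_pubmed.py: the tokenizer, the function names, and the assumption that the abstracts arrive as a list of strings are all illustrative, since the input file format is not documented here.

```python
import re


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens of three or more letters."""
    return re.findall(r"[a-z]{3,}", text.lower())


def fit_lda(abstracts, num_topics=10, use_tfidf=True):
    """Fit an LDA model; return the model and the corpus it was trained on."""
    # Imported here so that tokenize() stays usable without gensim installed.
    from gensim import corpora, models

    texts = [tokenize(a) for a in abstracts]
    dictionary = corpora.Dictionary(texts)
    corpus = [dictionary.doc2bow(t) for t in texts]
    if use_tfidf:
        # Reweight raw term counts by TF-IDF before training.
        corpus = list(models.TfidfModel(corpus)[corpus])
    lda = models.LdaModel(corpus, id2word=dictionary, num_topics=num_topics)
    return lda, corpus


def topics_per_document(lda, corpus):
    """Topic mixture of each document as (topic_id, probability) pairs."""
    return [lda.get_document_topics(bow) for bow in corpus]
```

After fitting, lda.print_topics(num_topics) lists the top words of each topic, and topics_per_document(lda, corpus) gives the per-document mixtures.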
All the code in this project is released under a Creative Commons license.