https://github.com/shreyaskarnik/pubmedlda

Making it easy to perform LDA on PubMed abstracts.



Host: GitHub
URL: https://github.com/shreyaskarnik/pubmedlda
Owner: shreyaskarnik
Created: 2011-07-03T00:02:10.000Z (about 15 years ago)
Default Branch: master
Last Pushed: 2013-06-01T02:03:54.000Z (about 13 years ago)
Last Synced: 2025-01-05T03:19:37.576Z (over 1 year ago)
Language: Python
Homepage:
Size: 120 KB
Stars: 5
Watchers: 3
Forks: 2
Open Issues: 2
Metadata Files:
- Readme: README

README

This is code for performing topic modeling on PubMed abstracts.

Depends on the gensim and nltk packages.

The package contains two parts:

1. Retrieve PubMed abstracts using the script getpmAbstracts.py

Usage: python getpmAbstracts.py -q [query] -o [output] -s [flag for stemming]

Options:

  -h, --help            show this help message and exit
  -q QUERY, --query=QUERY
                        Enter the PubMed query (PubMed style queries
                        supported)
  -o OFILE, --output=OFILE
                        Enter the output file name to store result
  -s, --stem            To stem result
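
The README does not say how getpmAbstracts.py talks to PubMed. As a rough illustration only, the sketch below assumes the public NCBI E-utilities endpoints (ESearch to turn a query into PubMed IDs, EFetch to download the abstracts as text). The function names and the retmax default are hypothetical and are not taken from the script; stemming (the -s flag) is not shown.

```python
import json
import urllib.parse
import urllib.request

# Base URL of the NCBI E-utilities service (an assumption about the backend;
# the original script may retrieve abstracts differently).
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(query, retmax=20):
    """Build the ESearch URL that returns PubMed IDs matching a query."""
    params = {"db": "pubmed", "term": query, "retmax": retmax, "retmode": "json"}
    return EUTILS + "/esearch.fcgi?" + urllib.parse.urlencode(params)


def efetch_url(pmids):
    """Build the EFetch URL that returns plain-text abstracts for the given IDs."""
    params = {
        "db": "pubmed",
        "id": ",".join(pmids),
        "rettype": "abstract",
        "retmode": "text",
    }
    return EUTILS + "/efetch.fcgi?" + urllib.parse.urlencode(params)


def fetch_abstracts(query, retmax=20):
    """Run the query and return the abstracts as one string (needs network access)."""
    with urllib.request.urlopen(esearch_url(query, retmax)) as resp:
        pmids = json.load(resp)["esearchresult"]["idlist"]
    if not pmids:
        return ""
    with urllib.request.urlopen(efetch_url(pmids)) as resp:
        return resp.read().decode("utf-8")
```

Calling fetch_abstracts("asthma[Title]") and writing the result to a file would roughly correspond to the -q and -o options above.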

2. Once you have the PubMed abstracts in a file, run LDA over them:

Usage: python gensim_lda_pubmed.py -i [inputfile] -k [number of topics to extract] -v [verbose output FALSE by default] -t [TRUE/FALSE for TFIDF weights] -r [return topics per document TRUE/FALSE (default FALSE)]

Options:

  -h, --help            show this help message and exit
  -i IFILE, --inputfile=IFILE
                        Enter the file containing PubMed abstracts
  -k NTOP, --numtopics=NTOP
                        Number of topics
  -t TFIDF, --tfidf=TFIDF
                        TFIDF weighting (default TRUE)
  -v VERBOSE            Verbose Output TRUE/FALSE (default FALSE)
  -r FIT                Return topics per document TRUE/FALSE (default FALSE)

All the code in this project is under a Creative Commons License.
