https://github.com/smartcat-labs/topicmodels
- Host: GitHub
- URL: https://github.com/smartcat-labs/topicmodels
- Owner: smartcat-labs
- Created: 2017-02-14T21:24:55.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2017-02-20T10:18:26.000Z (almost 9 years ago)
- Last Synced: 2025-02-07T11:14:36.043Z (11 months ago)
- Language: R
- Size: 15.3 MB
- Stars: 0
- Watchers: 13
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# TopicModels
Description: using {maptpx} to perform fast topic modeling in R.
=================================================================
Files:
**Data sets**
+ *DataSciSearch.csv*
Google News search queries.
+ *DataSciTermModel.csv*
A terminological model.
+ *tdMatrix_RAW.csv*
Raw term-document matrix.
+ *tdMatrix.csv*
Clean term-document matrix.
+ *dsCorpus.Rds*
{tm} Corpus as Rds file.
**R**
+ *SC_TopicModelsR_Part1.R*
Scrapes Google News results by querying through {tm.plugin.webmining} with the terms in DataSciSearch.csv.
The corpus is built here.
+ *SC_TopicModelsR_Part2.R*
Text pre-processing w. {tm} and routines to adjust the counts for embedded terms.
Uses the following terminological model: DataSciTermModel.csv
+ *SC_TopicModelsR_Part3.R*
Topic models for k = 2, ..., 20 w. {maptpx}.
Visualizations w. {ggplot2} and {igraph}
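
The count adjustment for embedded terms mentioned under *SC_TopicModelsR_Part2.R* can be illustrated with a minimal base-R sketch. It assumes that "embedded" means a shorter term occurring inside a longer one (for example, "data" inside "data science"); the README does not spell this out, and the script may do it differently. The terms, documents, and helper names (`count_term`, `adjusted_counts`) are hypothetical and are not taken from DataSciTermModel.csv.

```r
# Hypothetical terminological model: some terms are embedded in longer ones.
# Assumption: terms contain no regex metacharacters.
terms <- c("data", "data science", "machine learning", "learning")

# Toy documents standing in for the scraped news texts.
docs <- c(
  d1 = "data science needs data and machine learning",
  d2 = "learning about data science is learning"
)

# Count whole-word occurrences of `term` in a single string `text`.
count_term <- function(term, text) {
  hits <- gregexpr(paste0("\\b", term, "\\b"), text, perl = TRUE)[[1]]
  if (hits[1] == -1) 0L else length(hits)
}

# Count the longest terms first and mask each match, so that a shorter
# term is not counted again where it sits inside a longer one.
adjusted_counts <- function(text, terms) {
  counts <- setNames(integer(length(terms)), terms)
  for (term in terms[order(nchar(terms), decreasing = TRUE)]) {
    counts[term] <- count_term(term, text)
    text <- gsub(paste0("\\b", term, "\\b"), " _ ", text, perl = TRUE)
  }
  counts
}

# Raw counts (embedded occurrences included) vs. adjusted counts,
# both as term-document matrices (terms in rows, documents in columns).
raw <- sapply(docs, function(d) sapply(terms, count_term, text = d))
td  <- sapply(docs, adjusted_counts, terms = terms)

raw["data", "d1"]  # 2: one standalone "data" plus one inside "data science"
td["data", "d1"]   # 1: only the standalone occurrence remains
```

Processing terms from longest to shortest is one simple way to resolve the overlap; it keeps the multi-word term's count intact and removes only the double-counted shorter term.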
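
The topic-modeling step in *SC_TopicModelsR_Part3.R* can be sketched as follows. This is an illustration on simulated counts, not the script itself: the data, the object names, and the choice to pass the whole range of K in a single call are assumptions. `maptpx::topics()` takes counts with documents in rows and terms in columns, so a term-document matrix such as tdMatrix.csv would presumably need `t()` first.

```r
# Requires the maptpx package: install.packages("maptpx")
library(maptpx)

set.seed(42)

# Simulated corpus (an assumption, not the repository data):
# 60 documents over 30 terms, drawn from two distinct term distributions.
theta_a <- c(rep(5, 15), rep(1, 15))
theta_a <- theta_a / sum(theta_a)
theta_b <- rev(theta_a)

# rmultinom() returns terms x documents; transpose to documents x terms.
counts <- t(cbind(
  rmultinom(30, size = 200, prob = theta_a),
  rmultinom(30, size = 200, prob = theta_b)
))
dimnames(counts) <- list(paste0("doc", 1:60), paste0("term", 1:30))

# With a vector K, topics() compares the candidates by Bayes factor and
# returns the estimates for the selected number of topics.
fit <- topics(counts, K = 2:20, verb = 0)

fit$K           # selected number of topics
dim(fit$theta)  # terms x topics: term probabilities within each topic
dim(fit$omega)  # documents x topics: topic weights for each document

# Five highest-probability terms for each topic.
top_terms <- apply(fit$theta, 2, function(p) {
  colnames(counts)[order(p, decreasing = TRUE)[1:5]]
})
top_terms
```

The `omega` matrix is the natural input for document-level plots, and `theta` for term-level ones, if the fit is passed on to {ggplot2} or {igraph}.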