https://github.com/ahmedbesbes/how-to-mine-newsfeed-data-and-extract-interactive-insights-in-python
A practical guide to topic mining and interactive visualizations
bokeh crontab gensim kmeans latent-dirichlet-allocation natural-language-processing newsapi newsapi-python nlp nlp-keywords-extraction nlp-machine-learning plots sklearn text-mining tf-idf topic-modeling tsne-algorithm tsne-plot
- Host: GitHub
- URL: https://github.com/ahmedbesbes/how-to-mine-newsfeed-data-and-extract-interactive-insights-in-python
- Owner: ahmedbesbes
- Created: 2017-03-11T09:03:06.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2018-04-29T11:06:37.000Z (over 7 years ago)
- Last Synced: 2024-05-08T00:26:05.822Z (over 1 year ago)
- Topics: bokeh, crontab, gensim, kmeans, latent-dirichlet-allocation, natural-language-processing, newsapi, newsapi-python, nlp, nlp-keywords-extraction, nlp-machine-learning, plots, sklearn, text-mining, tf-idf, topic-modeling, tsne-algorithm, tsne-plot
- Language: HTML
- Homepage: https://ahmedbesbes.com/how-to-mine-newsfeed-data-and-extract-interactive-insights-in-python.html
- Size: 17.2 MB
- Stars: 75
- Watchers: 8
- Forks: 50
- Open Issues: 0
Metadata Files:
- Readme: README.md
# End-to-end tutorial on topic mining and interactive visualizations in Python
In this tutorial we'll dive into topic mining. We'll analyze a dataset of newsfeeds extracted from more than 60 sources through a web service called newsapi.org.
We'll show how to process this text data, analyze it and automatically extract visual clusters of topics from it.
We'll put into practice some great Python tools for interactive visualization, topic mining and text analytics: **scikit-learn** and **gensim** for the modeling, **Bokeh** and **pyLDAvis** for the plots.
All the code is available to you to run and test.
You can view the notebook either on GitHub or on my website.
### Environment setup
In this tutorial, I'll be using Python 2.7.
I recommend downloading the Anaconda distribution for Python 2.7. This distribution bundles Python with the packages commonly used in data science, such as NumPy, pandas, SciPy and scikit-learn.
The tutorial also relies on a few additional packages, which can be installed as follows:
```shell
pip install tqdm
conda install -c anaconda nltk=3.2.2
conda install bokeh
pip install --upgrade gensim
pip install pyldavis
pip install wordcloud
```
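
Once the packages are installed, it can help to confirm that they import correctly before opening the notebook. The snippet below is a minimal sketch and not part of the original tutorial: `check_packages` is a hypothetical helper, and the import names in `required` are assumed to match the packages installed above (for instance, `pyldavis` is imported as `pyLDAvis`).

```python
import importlib


def check_packages(module_names):
    """Return {module_name: version string, or None if the import fails}."""
    report = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
        else:
            # Not every module exposes __version__, so fall back to "unknown".
            report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    # Assumed import names for the packages installed above.
    required = ["tqdm", "nltk", "bokeh", "gensim", "pyLDAvis", "wordcloud"]
    for name, version in sorted(check_packages(required).items()):
        print("{:<10} {}".format(name, version if version else "MISSING"))
```

Any package reported as `MISSING` most likely needs to be installed again in the active environment. The helper only catches `ImportError`, so a package that fails for another reason will still raise its own exception.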
If you have any questions or recommendations regarding the content of this article, please use the comment section on the website.