https://github.com/rasbt/datacollect
A collection of tools to collect and download various data.
- Host: GitHub
- URL: https://github.com/rasbt/datacollect
- Owner: rasbt
- License: gpl-3.0
- Created: 2014-10-19T00:13:00.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2017-10-16T18:09:14.000Z (over 7 years ago)
- Last Synced: 2024-12-26T19:07:28.629Z (25 days ago)
- Topics: collect-lyrics, python, twitter-timeline
- Language: Jupyter Notebook
- Size: 3.29 MB
- Stars: 210
- Watchers: 27
- Forks: 95
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# datacollect
**A collection of tools to collect and download various data.**
Often, I write simple scripts and tools to collect data for various "data science" tasks. I thought that it might be worthwhile to collect them in a central repository since they might be useful to others!
#### Contents
- [Collect Lyrics](./collect_lyrics)
- [Twitter Timeline](./twitter_timeline)
- [Collect Popular Music Tags](./collect_music_tags)
- [PDB Info Table](./pdb_infotable)
- [ZINC Molecule Downloader](./zinc_downloader)
- [Collect English Premier League Soccer Data](./collect_fantasysoccer)
**Important Note**
Please note that I developed and tested these tools in Python 3.x. The scripts may not work flawlessly in Python 2.7.x because Unicode handling is more challenging there.
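To illustrate the Unicode point: one common way to keep text handling consistent across Python 2.7 and 3.x is to read and write files through `io.open` with an explicit encoding. The following is a minimal sketch under that assumption, not code from this repository; the helper names `save_text` and `load_text` are hypothetical.

```python
import io


def save_text(path, text):
    """Write unicode text to `path` as UTF-8 (hypothetical helper)."""
    # io.open behaves the same in Python 2.7 and 3.x: it expects unicode
    # text and encodes it with the encoding given here.
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(text)


def load_text(path):
    """Read UTF-8 text from `path` and return it as a unicode string."""
    with io.open(path, "r", encoding="utf-8") as f:
        return f.read()
```

In Python 2.7, the `text` argument would need to be a `unicode` object (for example a `u"..."` literal) rather than a byte string.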
## [Collect Lyrics](./collect_lyrics)
[[back to top](#contents)]
A [command line tool](./collect_lyrics) to download song lyrics given artist names and song titles.
![](./collect_lyrics/images/example_out.png)
## [Twitter Timeline](./twitter_timeline)
[[back to top](#contents)]
A [command line tool](./twitter_timeline) that downloads your personal Twitter timeline in CSV format, with an optional keyword filter.
![](./twitter_timeline/images/python_tweets.png)
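The idea of a keyword filter followed by CSV export can be sketched as below. This is only an illustration, not the tool's actual implementation: it does not call the Twitter API, and the function names `filter_tweets` and `write_csv`, the `text` and `date` column names, and the sample rows are all assumptions.

```python
import csv
import sys


def filter_tweets(rows, keywords=None):
    """Yield rows whose 'text' field contains any keyword (case-insensitive).

    `rows` is an iterable of dicts; 'text' is an assumed column name.
    With no keywords, every row passes through unchanged.
    """
    if not keywords:
        for row in rows:
            yield row
        return
    lowered = [k.lower() for k in keywords]
    for row in rows:
        text = row.get("text", "").lower()
        if any(k in text for k in lowered):
            yield row


def write_csv(rows, fieldnames, out):
    """Write dict rows to the text stream `out` in CSV format."""
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


if __name__ == "__main__":
    # Made-up rows standing in for a downloaded timeline.
    sample = [
        {"date": "2015-01-01", "text": "Learning Python today"},
        {"date": "2015-01-02", "text": "Coffee time"},
    ]
    write_csv(filter_tweets(sample, ["python"]), ["date", "text"], sys.stdout)
```

Keeping the filter as a generator means the rows can be streamed to the CSV writer without holding the whole timeline in memory.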
[Tutorial](http://nbviewer.ipython.org/github/rasbt/datacollect/blob/master/dataviz/twitter_cloud/twitter_wordcloud.ipynb) for turning your twitter timeline into a word cloud.
![](./dataviz/twitter_cloud/my_twitter_wordcloud_2_lowres.jpg)
## [Collect Popular Music Tags](./collect_music_tags)
[[back to top](#contents)]
A [command line tool](./collect_music_tags) to download popular tags for a list of songs from [last.fm](http://www.last.fm), e.g., for various data mining projects.
![](./collect_music_tags/images/example.png)
## [PDB Info Table](./pdb_infotable)
[[back to top](#contents)]
A [command line tool](./pdb_infotable) that creates an info table from a list of PDB files.
![](./pdb_infotable/images/example.png)
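As a rough sketch of how such a table could be assembled, the code below reads the fixed-width `HEADER` record of each PDB file and counts its `ATOM` and `HETATM` records. The choice of columns and the function names `pdb_summary` and `info_table` are assumptions made for illustration; the actual tool may report different fields.

```python
def pdb_summary(lines):
    """Summarize one PDB file given as an iterable of text lines.

    Column positions follow the fixed-width HEADER record of the PDB format:
    classification (cols 11-50), deposition date (51-59), ID code (63-66).
    """
    info = {"id": "", "classification": "", "date": "", "atoms": 0, "hetatms": 0}
    for line in lines:
        record = line[:6]
        if record == "HEADER":
            info["classification"] = line[10:50].strip()
            info["date"] = line[50:59].strip()
            info["id"] = line[62:66].strip()
        elif record == "ATOM  ":
            info["atoms"] += 1
        elif record == "HETATM":
            info["hetatms"] += 1
    return info


def info_table(named_files):
    """Build tab-separated table rows from (filename, lines) pairs."""
    columns = ["id", "classification", "date", "atoms", "hetatms"]
    rows = ["\t".join(["file"] + columns)]
    for name, lines in named_files:
        info = pdb_summary(lines)
        rows.append("\t".join([name] + [str(info[c]) for c in columns]))
    return rows
```

Note that this simple count sums atoms over all models in a multi-model file; a real tool would probably need to handle `MODEL` records separately.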
## [ZINC Molecule Downloader](./zinc_downloader)
[[back to top](#contents)]
A [command line tool](./zinc_downloader) for downloading 3D structures of small chemical molecules from http://zinc.docking.org.
![](./zinc_downloader/images/example-1.png)
## [Collect English Premier League Soccer Data](./collect_fantasysoccer)
[[back to top](#contents)]
A [command line tool](./collect_fantasysoccer) to collect fantasy soccer data from the Premier League.
![](./collect_fantasysoccer/images/example_table.png)