{"id":13935188,"url":"https://github.com/rasbt/datacollect","last_synced_at":"2025-05-12T04:08:04.189Z","repository":{"id":22083994,"uuid":"25413533","full_name":"rasbt/datacollect","owner":"rasbt","description":"A collection of tools to collect and download various data.","archived":false,"fork":false,"pushed_at":"2017-10-16T18:09:14.000Z","size":3450,"stargazers_count":212,"open_issues_count":1,"forks_count":94,"subscribers_count":26,"default_branch":"master","last_synced_at":"2025-05-12T04:07:55.538Z","etag":null,"topics":["collect-lyrics","python","twitter-timeline"],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rasbt.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-10-19T00:13:00.000Z","updated_at":"2025-04-26T12:22:23.000Z","dependencies_parsed_at":"2022-08-20T20:10:11.787Z","dependency_job_id":null,"html_url":"https://github.com/rasbt/datacollect","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rasbt%2Fdatacollect","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rasbt%2Fdatacollect/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rasbt%2Fdatacollect/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rasbt%2Fdatacollect/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rasbt","download_url":"https://codeload.github.com/rasbt/datacollect/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https
://github.com","kind":"github","repositories_count":253672707,"owners_count":21945481,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["collect-lyrics","python","twitter-timeline"],"created_at":"2024-08-07T23:01:27.479Z","updated_at":"2025-05-12T04:08:04.142Z","avatar_url":"https://github.com/rasbt.png","language":"Jupyter Notebook","funding_links":[],"categories":["Jupyter Notebook"],"sub_categories":[],"readme":"# datacollect\n\n\n**A collection of tools to collect and download various data.**\n\nOften, I write simple scripts and tools to collect data for various \"data science\" tasks. I thought that it might be worthwhile to collect them in a central repository since they might be useful to others!\n\n#### Contents\n- [Collect Lyrics](./collect_lyrics)\n- [Twitter Timeline](./twitter_timeline)\n- [Collect Popular Music Tags](./collect_music_tags)\n- [PDB Info Table](./pdb_infotable)\n- [ZINC Molecule Downloader](./zinc_downloader)\n- [Collect English Premier League Soccer Data](./collect_fantasysoccer)\n\n\u003cbr\u003e\n\n**Important Note**  \nPlease note that I developed and tested these tools in Python 3.x, and it could be possible that the scripts do not work flawlessly in Python 2.7.x due to the more challenging unicode handling.\n\n\u003cbr\u003e\n\u003cbr\u003e\n\n## [Collect Lyrics](./collect_lyrics)\n\n[[back to top](#contents)]\n\nA [command line tool](./collect_lyrics) to download song lyrics given artist names and song titles.\n\n![](./collect_lyrics/images/example_out.png)\n\n\u003cbr\u003e\n\u003cbr\u003e\n\n## [Twitter 
Timeline](./twitter_timeline)\n\n[[back to top](#contents)]\n\nA [command line tool](./twitter_timeline) that downloads your personal Twitter timeline in CSV format, with an optional keyword filter.\n\n![](./twitter_timeline/images/python_tweets.png)\n\n[Tutorial](http://nbviewer.ipython.org/github/rasbt/datacollect/blob/master/dataviz/twitter_cloud/twitter_wordcloud.ipynb) for turning your Twitter timeline into a word cloud.\n![](./dataviz/twitter_cloud/my_twitter_wordcloud_2_lowres.jpg)\n\n\u003cbr\u003e\n\u003cbr\u003e\n\n## [Collect Popular Music Tags](./collect_music_tags)\n\n[[back to top](#contents)]\n\nA [command line tool](./collect_music_tags) to download popular tags for a list of songs from [last.fm](http://www.last.fm), e.g., for various data mining projects.\n\n![](./collect_music_tags/images/example.png)\n\n\u003cbr\u003e\n\u003cbr\u003e\n\n## [PDB Info Table](./pdb_infotable)\n[[back to top](#contents)]\n\nA [command line tool](./pdb_infotable) that creates an info table from a list of PDB files.\n\n![](./pdb_infotable/images/example.png)\n\n## [ZINC Molecule Downloader](./zinc_downloader)\n\n[[back to top](#contents)]\n\nA [command line tool](./zinc_downloader) for downloading 3D structures of small chemical molecules from http://zinc.docking.org.\n\n![](./zinc_downloader/images/example-1.png)\n\n\n\u003cbr\u003e\n\u003cbr\u003e\n\n## [Collect English Premier League Soccer Data](./collect_fantasysoccer)\n[[back to top](#contents)]\n\nA [command line tool](./collect_fantasysoccer) to collect fantasy soccer data from the Premier League.\n![](./collect_fantasysoccer/images/example_table.png)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frasbt%2Fdatacollect","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frasbt%2Fdatacollect","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frasbt%2Fdatacollect/lists"}