https://github.com/bact/socialmedia
Play with data from social media
- Host: GitHub
- URL: https://github.com/bact/socialmedia
- Owner: bact
- License: gpl-3.0
- Created: 2017-12-06T13:20:05.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2019-03-17T11:13:13.000Z (almost 6 years ago)
- Last Synced: 2025-01-12T05:44:28.008Z (18 days ago)
- Topics: social-media, twitter
- Language: Python
- Homepage:
- Size: 149 KB
- Stars: 1
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# get-stream.py
A script to collect tweets from the Twitter stream. It uses Twitter's Streaming API to get tweets in a specified language, filtered by the 400 most common words in that language.
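As a rough illustration of the filtering idea, the sketch below builds the request parameters for a keyword-filtered stream. It assumes the v1.1 `statuses/filter` endpoint, which takes a comma-separated `track` list and a `language` code. The function name `build_filter_params` and the `max_terms` parameter are hypothetical and not taken from `get-stream.py`.

```python
def build_filter_params(words, language, max_terms=400):
    """Build stream filter parameters from a common-words list.

    Hypothetical helper: deduplicates the words, drops blanks and any
    term containing a comma (the comma is the separator in `track`),
    and keeps at most `max_terms` terms.
    """
    seen = set()
    terms = []
    for word in words:
        term = word.strip()
        if not term or "," in term or term in seen:
            continue
        seen.add(term)
        terms.append(term)
        if len(terms) == max_terms:
            break
    return {"language": language, "track": ",".join(terms)}


params = build_filter_params(["the", "of", "the", " and "], "en")
print(params)  # {'language': 'en', 'track': 'the,of,and'}
```

The resulting dictionary could then be passed to whichever streaming client is in use. The cap of 400 mirrors the 400-word lists described above.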
## Common words
- Burmese http://1000mostcommonwords.com/tag/burmese-words/
- English http://world-english.org/english500.htm
- Vietnamese http://www.101languages.net/vietnamese/vietnamese-word-list/
- Thai common words list, partially drawn from Chulalongkorn University's 400 most used Thai words. http://womenlearnthai.com/index.php/thai-frequency-lists-with-english-definitions/
## Plan
- Check if it's a retweet or not
- If it is a retweet, does it have additional text (check id_str)
- Have to get the full tweet, no truncation
- Check truncated=True
- Keep these attributes: id, retweet_count, favorite_count, retweeted_status(id, favorite_count, retweet_count)
# make-train-data.py
Convert tweets in JSON format to ```__label__X text text text text``` format as required by fastText.
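A minimal sketch of this conversion follows; it is not the actual contents of `make-train-data.py`. It assumes the v1.1 streaming payload shape, where a tweet with `truncated` set to true carries its complete text in `extended_tweet.full_text`. The README does not say where the label `X` comes from, so here the caller supplies it. The helper names `full_text` and `to_fasttext_line` are hypothetical.

```python
import json


def full_text(tweet):
    """Return the untruncated text of a tweet dict (assumed v1.1 shape)."""
    if tweet.get("truncated") and "extended_tweet" in tweet:
        return tweet["extended_tweet"].get("full_text", tweet.get("text", ""))
    return tweet.get("full_text") or tweet.get("text", "")


def to_fasttext_line(tweet, label):
    """Format one tweet as a single fastText training line.

    Whitespace, including newlines inside the tweet, is collapsed so
    that each example stays on one line.
    """
    text = " ".join(full_text(tweet).split())
    return f"__label__{label} {text}"


tweet = json.loads('{"id": 1, "truncated": false, "text": "hello\\n  world"}')
print(to_fasttext_line(tweet, "en"))  # __label__en hello world
```

Writing one such line per tweet to a text file gives the one-example-per-line input that fastText's supervised mode reads.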