https://github.com/pratikpakhale/tweets-scrapper

These are simple scripts for scraping tweets and running some analysis on them. The example here searches for IPL tweets and analyses them with the Gemini LLM. You can reuse the JS snippets for scraping and then do your own analysis.


Host: GitHub
URL: https://github.com/pratikpakhale/tweets-scrapper
Owner: pratikpakhale
Created: 2024-05-03T21:06:50.000Z (8 months ago)
Default Branch: main
Last Pushed: 2024-05-03T21:07:22.000Z (8 months ago)
Last Synced: 2024-05-03T22:25:23.792Z (8 months ago)
Topics: js, scrape, tweet, twitter
Language: Python
Homepage:
Size: 3.24 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


# Manual Tweets Scrapper

## Description

## Steps

1. Go to the Twitter [Explore](https://twitter.com/explore) section.

2. Add `scrape.js` and `auto_scroll.js` to Chrome DevTools as snippets, under the `Sources` panel.

3. Search for your query. Using Twitter advanced search to filter out spam tweets and apply other filters is highly recommended.

4. Run the `scrape` snippet.

5. Run the `auto_scroll` snippet.

6. Wait until you are satisfied with the number of tweets scraped. You can follow the logs in the console.

7. Once you get rate limited or the search stops returning new tweets, log the variable `tweets` in the console. You can then right-click the logged object and choose "Copy object".

8. In the `data/` folder, create a new JSON file and paste the copied object into it.

9. Merge all the files into one by running the `merge.py` script.

10. Enter your Gemini API key in `run_genai.py`, then run it. It goes through the tweets and creates a file `analysed.json` in the `results/` directory.

11. Use `preprocess.py` to make sure the results data is in a consistent format.
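
For readers who want to see what the merge in step 9 might look like, here is a minimal sketch. It is not the repository's `merge.py`, whose source is not shown here. The function name `merge_tweet_files`, the output file name, and the assumption that each file in `data/` holds a JSON array are all hypothetical. Dropping exact duplicates is an added convenience, not something the original script is known to do.

```python
import json
from pathlib import Path


def merge_tweet_files(data_dir, output_path):
    """Merge every *.json file in data_dir into one list and write it out.

    Hypothetical helper: assumes each file holds a JSON array of tweet
    objects, as pasted from the browser console.
    """
    merged = []
    seen = set()
    for path in sorted(Path(data_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            tweets = json.load(f)
        # Tolerate a file that holds a single object instead of an array.
        if not isinstance(tweets, list):
            tweets = [tweets]
        for tweet in tweets:
            # Serialise with sorted keys so identical tweets compare equal.
            key = json.dumps(tweet, sort_keys=True, ensure_ascii=False)
            if key not in seen:
                seen.add(key)
                merged.append(tweet)

    output = Path(output_path)
    output.parent.mkdir(parents=True, exist_ok=True)
    with output.open("w", encoding="utf-8") as f:
        json.dump(merged, f, ensure_ascii=False, indent=2)
    return merged
```

Calling `merge_tweet_files("data", "merged.json")` from the repository root would read every file in `data/` and write the combined list to `merged.json`. Keeping the output file outside `data/` avoids merging it back in on the next run.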