{"id":18156341,"url":"https://github.com/thedivtagguy/boycott-scraper","last_synced_at":"2025-04-07T01:52:27.290Z","repository":{"id":240157510,"uuid":"582548799","full_name":"thedivtagguy/boycott-scraper","owner":"thedivtagguy","description":"🔋 Batteries-included scraper and analysis tool for Twitter. Works without Twitter API. No upper-limits and a bunch of nifty tools.","archived":false,"fork":false,"pushed_at":"2022-12-27T18:46:00.000Z","size":8560,"stargazers_count":0,"open_issues_count":2,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-01T16:13:10.701Z","etag":null,"topics":["scraper","snscrape","twitter"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/thedivtagguy.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-12-27T07:10:07.000Z","updated_at":"2023-04-18T09:58:50.000Z","dependencies_parsed_at":"2024-05-17T03:06:09.226Z","dependency_job_id":null,"html_url":"https://github.com/thedivtagguy/boycott-scraper","commit_stats":null,"previous_names":["thedivtagguy/boycott-scraper"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thedivtagguy%2Fboycott-scraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thedivtagguy%2Fboycott-scraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thedivtagguy%2Fboycott-scraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repos
itories/thedivtagguy%2Fboycott-scraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/thedivtagguy","download_url":"https://codeload.github.com/thedivtagguy/boycott-scraper/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247578623,"owners_count":20961270,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["scraper","snscrape","twitter"],"created_at":"2024-11-02T05:06:09.772Z","updated_at":"2025-04-07T01:52:27.271Z","avatar_url":"https://github.com/thedivtagguy.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 😡 Twitter Boycott Tracker\n\nThere's always something to be outraged about. What is it this week? \n\nThis scraper searches for tweets containing the term 'boycott' and returns 1000 results for each week. It also cleans the data and returns a csv file with the following columns:\n\n- `user`: Contains a JSON object with the user's information. 
Includes the user's name, screen name, location, description, followers, friends, and favourites count.\n- `rawContent`: Contains the raw text of the tweet.\n- `created`: Contains the date and time the tweet was created.\n- `named_entities`: Contains a comma-separated list of named entities found in the tweet.\n- `hashtags`: Contains a comma-separated list of hashtags found in the tweet.\n- `urls`: Contains a comma-separated list of URLs found in the tweet.\n- `mentions`: Contains a comma-separated list of mentions found in the tweet.\n- `followers`: Contains the number of followers the user has.\n\n## ✂️ Customization \n\nDon't care about boycotts? I gotchu. This scraper can be easily modified to search for any term you want, with no upper limit on the number of results, and to search for tweets from any location.\n\n## 🖨️ Usage\n\nOnce downloaded and set up (installation instructions below), there are two ways to use this scraper. You can either run the Python script with the settings in `config.json`, or pass arguments on the command line.\n\n### 📋 Script\n\nTo run the scraper, simply run the following command:\n```bash\n    python3 scraper/runner.py\n```\n\nAll settings, including the term to search for and the number of results to return, are set in the `config.json` file.\n\n```js\n{\n    \"coordinates\": {\n        \"long\": 78.9629, // longitude\n        \"lat\": 20.5937,  // latitude\n        \"radius\": 10000  // radius, in kilometers\n    },\n    \"search_term\": \"boycott\", // term to search for\n    \"clean\": \"true\",          // clean the data? (makes lowercase, removes punctuation, removes stopwords, etc.)\n    \"twitterExtract\": \"true\", // extract named entities, hashtags, urls, and mentions from the tweets? (slows down the process just a bit)\n    \"limit\": 1000,           // number of results to return per search\n    \"analysis\": \"true\",      // perform sentiment analysis on the tweets? 
(slows down the process just a bit)\n    \"scrape_type\": \"weekly\", // type of search to perform. Options: weekly, fullYear, custom\n    \"year\": 2016,           // year to start searching from, if the above is set to fullYear\n    \"start_date\": \"2016-01-01\", // date to start searching from, if the above is set to custom\n    \"end_date\": \"2016-12-31\"    // date to end searching at, if the above is set to custom\n}\n```\n\n### 🖍️ CLI Access\n\nScript arguments can be passed to the script using the CLI. The following arguments are available:\n\n```js\n    --help            show this help message and exit\n    --search_term     Search term to use\n    --limit           Number of results to return per search\n    --radius          Radius to search around, in kilometers\n    --lat             Latitude to search around\n    --long            Longitude to search around\n    --type            Type of search to perform. Options: weekly, fullYear, custom\n    --year            Year to start searching from, if the above is set to fullYear\n    --start_date      Date to start searching from, if the above is set to custom\n    --end_date        Date to end searching at, if the above is set to custom\n```\n\nAny of these arguments override the settings in the `config.json` file. \n\nFor example, to search around London (51.5074° N, 0.1278° W) for the term 'monty python' and return 100 results per search, run the following command:\n\n```bash\n    python3 scraper/runner.py --search_term \"monty python\" --limit 100 --lat 51.5074 --long -0.1278\n```\n\nIf you want to search for the term 'boycott' every week, starting from the current week, but limit to just 5 results per search, run the following command:\n\n```bash\n    python3 scraper/runner.py --search_term \"boycott\" --limit 5 --type weekly\n```\n\n## 📁 Data\n\nAfter each run, the updates of the week are saved in the `data` folder. 
For the latest data, see the `data/current_week` folder.\nDetails about the run are logged in `data/logger.json`.\n\nAfter the first run, the old data is saved in the `data/YEAR` folder. The old data is overwritten. Filenames follow the format\n    \n        tweets-YEAR-WEEK.csv\n\n## 📦 Installation\n\nThis project uses Python 3.6. Here's how to get it up and running:\n\n1. Clone the repository:\n\n```bash\ngit clone https://github.com/thedivtagguy/boycott-scraper.git\n```\n2. Change into the repository directory:\n\n```bash\ncd boycott-scraper\n```\n3. (Optional) Create and activate a virtual environment.\n\n```bash\n# create a virtual environment\npython -m venv env\n# activate the virtual environment (Linux/macOS)\nsource env/bin/activate\n# activate the virtual environment (Windows)\nenv\\Scripts\\activate\n```\n4. Install the required packages:\n```bash\npip install -r requirements.txt\n```\n\nDone! You're ready to go. Refer to the Usage section above for more information.\n\n## 🧠 Contributing\n\nIf you'd like to contribute, please fork the repository and use a feature branch. Pull requests are warmly welcome.\n\n## 🧾 Licensing\n\nThe code in this project is licensed under the MIT license.\n\n## 📮 Contact\n\nIf you have any questions, please contact me at [my email](mailto:amanbhargava2001@gmail.com).\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthedivtagguy%2Fboycott-scraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthedivtagguy%2Fboycott-scraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthedivtagguy%2Fboycott-scraper/lists"}