{"id":17726536,"url":"https://github.com/nullqwertyuiop/tweet-crawler","last_synced_at":"2025-06-19T08:37:15.191Z","repository":{"id":258672469,"uuid":"871138888","full_name":"nullqwertyuiop/tweet-crawler","owner":"nullqwertyuiop","description":"Python tool using Playwright to intercept Twitter responses and parse tweets.","archived":false,"fork":false,"pushed_at":"2024-10-19T14:29:07.000Z","size":39,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-10-21T04:23:53.380Z","etag":null,"topics":["playwright","playwright-python","twitter"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nullqwertyuiop.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-11T10:53:17.000Z","updated_at":"2024-10-19T14:29:11.000Z","dependencies_parsed_at":"2024-10-20T03:52:45.802Z","dependency_job_id":null,"html_url":"https://github.com/nullqwertyuiop/tweet-crawler","commit_stats":null,"previous_names":["nullqwertyuiop/tweet-crawler"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/nullqwertyuiop/tweet-crawler","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nullqwertyuiop%2Ftweet-crawler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nullqwertyuiop%2Ftweet-crawler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nullqwertyuiop%2Ftweet-crawler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/repositories/nullqwertyuiop%2Ftweet-crawler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nullqwertyuiop","download_url":"https://codeload.github.com/nullqwertyuiop/tweet-crawler/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nullqwertyuiop%2Ftweet-crawler/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260717270,"owners_count":23051624,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["playwright","playwright-python","twitter"],"created_at":"2024-10-25T17:05:43.673Z","updated_at":"2025-06-19T08:37:10.173Z","avatar_url":"https://github.com/nullqwertyuiop.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Tweet Crawler\n\n![Code Style](https://img.shields.io/badge/code%20style-black-000000.svg)\n![Imports](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat\u0026labelColor=ef8336)\n![GitHub License](https://img.shields.io/github/license/nullqwertyuiop/tweet-crawler)\n![Python Version from PEP 621 TOML](https://img.shields.io/python/required-version-toml?tomlFilePath=https%3A%2F%2Fraw.githubusercontent.com%2Fnullqwertyuiop%2Ftweet-crawler%2Frefs%2Fheads%2Fmain%2Fpyproject.toml)\n[![Codecov](https://img.shields.io/codecov/c/github/nullqwertyuiop/tweet-crawler)](https://codecov.io/gh/nullqwertyuiop/tweet-crawler)\n\nTweet Crawler is a Python-based web scraping tool that leverages Playwright to intercept responses from Twitter and parse them into manipulable dataclasses. 
This project allows users to extract comprehensive tweet data either in guest mode or authenticated mode (via cookies).\n\n## Features\n\n- **Guest Mode**:\n    - Fetch basic tweet details from a given status link without authentication.\n    - Extract data such as tweet content, user details, media, and reaction statistics.\n\n- **Authenticated Mode** (requires cookies):\n    - Access additional tweet details including reply threads.\n    - Provides a more extensive dataset by using user-specific cookie information.\n\n## Installation\n\n### Install as a VCS Dependency\n\nTweet Crawler can be installed as a VCS dependency in your project.\n\nHere is how you can add it to your project using [PDM](https://pdm-project.org/):\n\n1. **Install Dependencies**\n\n   Ensure you have Python (version 3.10 or higher) installed and [PDM](https://pdm-project.org/). Then, run:\n\n   ```bash\n   pdm add \"git+https://github.com/nullqwertyuiop/tweet-crawler.git@main\"\n   ```\n\n2. **Set Up Playwright**\n\n   Initialize Playwright by running:\n\n   ```bash\n   pdm run playwright install\n   ```\n\n### Clone directly from GitHub\n\n1. **Clone the Repository**\n\n   ```bash\n   git clone https://github.com/nullqwertyuiop/tweet-crawler.git\n   cd tweet-crawler\n   ```\n\n2. **Install Dependencies**\n\n   Ensure you have Python (version 3.10 or higher) installed and [PDM](https://pdm-project.org/). Then, run:\n\n   ```bash\n   pdm install\n   ```\n\n3. **Set Up Playwright**\n\n   Initialize Playwright by running:\n\n   ```bash\n   pdm run playwright install\n   ```\n\n## Usage\n\n### Spinning Up an Async Playwright Instance\n\nTweet Crawler needs an instance of async playwright to interact with the browser.\n\nHere's an example of how to create one:\n\n```python\nfrom playwright.async_api import async_playwright\n\nurl: str = ...  
# URL of the tweet to crawl\n\nasync with async_playwright() as p:\n    browser = await p.chromium.launch()\n    context = await browser.new_context()\n    # Create the page from the context so cookies added to the context apply to this page\n    page = await context.new_page()\n    crawler = TwitterStatusCrawler(page, url)\n```\n\n### Running in Guest Mode\n\nTo crawl tweets as a guest (without replies), simply run:\n\n```python\nawait crawler.run()\n```\n\n### Running with Cookies\n\nTo fetch replies and extended information, you need to provide your Twitter cookies.\n\nHere is an example of how to add cookies to the browser context from environment variables:\n\n\u003e [!CAUTION]\n\u003e Never hardcode your cookies directly in the code. Doing so can expose your sensitive information.\n\u003e Use environment variables or a secure method to store them.\n\n```python\nimport os\n\nfrom playwright.async_api import BrowserContext\n\ncontext: BrowserContext  # the context created above\n\nawait context.add_cookies(\n    [\n        {\n            \"name\": \"auth_token\",\n            \"value\": os.environ[\"AUTH_TOKEN\"],\n            \"domain\": \".x.com\",\n            \"path\": \"/\",\n            \"expires\": float(os.environ[\"AUTH_TOKEN_EXPIRES\"]),\n            \"httpOnly\": True,\n            \"sameSite\": \"None\",\n            \"secure\": True,\n        },\n        {\n            \"name\": \"ct0\",\n            \"value\": os.environ[\"CT0\"],\n            \"domain\": \".x.com\",\n            \"path\": \"/\",\n            \"expires\": float(os.environ[\"CT0_EXPIRES\"]),\n            \"httpOnly\": False,\n            \"sameSite\": \"Lax\",\n            \"secure\": True,\n        },\n    ]\n)\n```\n\nThen, you can run the crawler as usual:\n\n```python\nawait crawler.run()\n```\n\n## Data Output\n\nThe data is parsed into Python dataclasses for easy handling and manipulation. The following information can be extracted:\n\n- **Tweet Content**: The text of the tweet.\n- **User Information**: Username and profile details of the tweet author.\n- **Media**: Links to any media (images, videos, etc.) 
included in the tweet.\n- **Statistics**: Number of likes, retweets, and other reaction metrics.\n- **Replies**: (Authenticated mode only) Full threads of replies to the tweet.\n\n## Contributing\n\nContributions are welcome! Feel free to open issues or submit pull requests with improvements. For major changes, please open an issue first to discuss what you would like to change.\n\n1. Fork the Project\n2. Create your Feature Branch (`git checkout -b feature/YourFeature`)\n3. Commit your Changes (`git commit -m 'Add some feature'`)\n4. Push to the Branch (`git push origin feature/YourFeature`)\n5. Open a Pull Request\n\n## License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n## Disclaimer\n\nThis tool is intended for educational and research purposes only. Please ensure you comply with Twitter's terms of service and any applicable laws before using this tool to scrape data from their platform. Use responsibly.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnullqwertyuiop%2Ftweet-crawler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnullqwertyuiop%2Ftweet-crawler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnullqwertyuiop%2Ftweet-crawler/lists"}