https://github.com/muneeb1030/webscrapper_politifact
This initiative seeks to extract and analyze fact-checking data from Politifact.com, providing valuable insights into political statements, rulings, and the evolving information landscape.
- Host: GitHub
- URL: https://github.com/muneeb1030/webscrapper_politifact
- Owner: Muneeb1030
- Created: 2024-02-06T19:34:09.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-07-28T09:23:57.000Z (10 months ago)
- Last Synced: 2025-01-11T14:48:39.961Z (4 months ago)
- Topics: data, data-collection, dataanalysis, python3, scrapy, scrapy-spider, webscraping
- Language: Python
- Homepage:
- Size: 33.2 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: Readme.md
# Politifact Web Scraping Project
## Overview
The Politifact Web Scraping Project is a Python scraper built on the Scrapy framework. It extracts fact-checking data from [Politifact.com](https://politifact.com) so that political statements, their rulings, and the evolving information landscape can be analyzed.
## Key Features
1. **Data Extraction:** Scrapes author names, statement dates, headlines, rulings, publishers, and article URLs for a comprehensive dataset.
2. **File Management:** Dynamically creates directories for organized storage of scraped data, ensuring a systematic approach from the project's outset.
3. **Image Downloads:** Utilizes Scrapy's image pipeline for downloading header images, enhancing the visual context of each article.
4. **Efficient CSV Handling:** Implements regular write intervals to prevent data loss and alleviate memory burden during asynchronous requests.
## Requirements
- **Python 3.x**
- **Scrapy**
- **Requests**
- **Pandas**
## Getting Started
1. **Clone the Repository:**
```
git clone https://github.com/Muneeb1030/WebScrapper_Politifact.git
```
2. **Install Dependencies:**
```
pip install scrapy pandas requests
```
3. **Run the Scraper:**
```
scrapy crawl politifact
```
## Additional Information
- **Customization:**
  - Tailor the scraper to your needs by modifying the Scrapy spiders.
- **GitHub Repository:**
  - Explore, contribute, and stay updated on the [GitHub repository](https://github.com/Muneeb1030/WebScrapper_Politifact.git).
## Disclaimer
This project is intended for educational purposes and strictly adheres to Politifact's terms of service. Users are advised to deploy the scraper responsibly and in compliance with platform policies.
## Additional Resources
Explore the project in detail through my [Medium blog](https://medium.com/@m.muneeb.ur.rehman.2000/fact-checking-the-fact-checkers-scraping-politifact-com-for-political-truths-with-pythons-scrapy-fcfa42f5bcf2), where I share insights, motivation, and in-depth explanations about the Politifact Scraper.
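
For readers who want a concrete starting point before opening the blog post, the directory creation and interval-based CSV writes described under Key Features could look roughly like the sketch below. It is not taken from the repository: the class name `BufferedCsvWriter`, the column names in `FIELDS`, and the count-based `flush_every` threshold are all assumptions made for illustration, and the project may well use a different trigger (for example, a time interval).

```
import csv
from pathlib import Path

# Hypothetical column names, modeled on the fields listed under Key Features.
FIELDS = ["author", "statement_date", "headline", "ruling", "publisher", "url"]


class BufferedCsvWriter:
    """Collects rows in memory and appends them to a CSV file in batches."""

    def __init__(self, path, fieldnames=FIELDS, flush_every=50):
        self.path = Path(path)
        self.fieldnames = list(fieldnames)
        self.flush_every = flush_every  # assumed: flush after this many rows
        self.buffer = []
        # Create the output directory up front so later writes do not fail on it.
        self.path.parent.mkdir(parents=True, exist_ok=True)

    def add(self, row):
        """Queue one row (a dict) and flush once the batch size is reached."""
        self.buffer.append(row)
        if len(self.buffer) >= self.flush_every:
            self.flush()

    def flush(self):
        """Append buffered rows to disk, writing the header only for a new file."""
        if not self.buffer:
            return
        write_header = not self.path.exists() or self.path.stat().st_size == 0
        with self.path.open("a", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=self.fieldnames)
            if write_header:
                writer.writeheader()
            writer.writerows(self.buffer)
        self.buffer.clear()
```

In a Scrapy item pipeline, one plausible wiring is to call `add()` from `process_item` and `flush()` from `close_spider`, so that at most one partial batch is held in memory and nothing is lost when the crawl ends normally.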
## Contributors
- M Muneeb ur Rehman

Feel free to fork, contribute, and enhance the capabilities of this Politifact scraper. Happy scraping! 🌐💻