https://github.com/dms-codes/scrape_dprgoid

Host: GitHub
URL: https://github.com/dms-codes/scrape_dprgoid
Owner: dms-codes
Created: 2023-10-11T16:01:46.000Z (about 2 years ago)
Default Branch: main
Last Pushed: 2023-10-11T16:03:04.000Z (about 2 years ago)
Last Synced: 2025-01-18T21:20:05.036Z (10 months ago)
Topics: beautifulsoup4, parliament, python, requests, webscraper, webscraping
Language: Python
Homepage: https://github.com/dms-codes/scrape_dprgoid
Size: 288 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Indonesian Parliament Member Data Scraper

This Python script is a web scraper that collects information about members of the Indonesian Parliament (DPR) from the DPR's official website. It uses the requests library to fetch web pages and BeautifulSoup to parse the HTML, then writes the collected data to a CSV file.

## Prerequisites

Before running this script, ensure you have the following:

- Python installed on your system.
- The required Python libraries, which can be installed with pip:
```bash
pip install requests beautifulsoup4
```
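
To confirm the libraries are available before running the scraper, a small check like the one below can help. It is only a sketch: the `missing_packages` helper and the `REQUIRED` mapping are illustrative names, not part of the script. Note that the `beautifulsoup4` package is imported under the name `bs4`.

```python
from importlib.util import find_spec

# pip package name -> module name used in "import ..." statements.
# beautifulsoup4 is imported as bs4, so the two names differ.
REQUIRED = {"requests": "requests", "beautifulsoup4": "bs4"}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: pip install " + " ".join(missing))
    else:
        print("All required packages are installed.")
```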

## Usage

1. Clone this repository or download the script.

2. Open the script in a text editor or IDE.

3. Customize the script as needed, such as changing the output filename, headers, or other settings.

4. Run the script using Python:
```bash
python script_name.py
```

## Description

- The script starts by sending an HTTP GET request to the Indonesian Parliament's official member listing page for each province.

- It uses a `requests` session so that connections are reused across the many requests it makes, which improves performance.

- The `fetch_details` function extracts the details of each parliament member from their individual page, including name, email, birthplace, religion, membership number, faction (fraksi), electoral district, and various biographical data.

- The script iterates through the pages for each province and scrapes data for all parliament members.

- The collected data is written to a CSV file named 'data_anggota_dpr.csv' with predefined field names.
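
The final step above, writing records to a CSV file with predefined field names, could look roughly like the following. This is a minimal sketch rather than the script's actual code: the field names are a hypothetical subset, the two records are made up, and `write_members` is an illustrative helper name.

```python
import csv
import io

# Hypothetical subset of the field names; the script's real list may differ.
FIELDNAMES = ["name", "email", "birthplace", "electoral_district"]


def write_members(rows, out):
    """Write member dicts to `out` as CSV with a fixed header row."""
    # extrasaction="ignore" drops keys that are not listed in FIELDNAMES.
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES, extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        # Keys missing from a row are written as empty cells.
        writer.writerow(row)


# Made-up records standing in for what a detail-page parser might return.
members = [
    {
        "name": "Example Member",
        "email": "member@example.org",
        "birthplace": "Example City",
        "electoral_district": "Example District I",
    },
    {"name": "Another Member", "birthplace": "Other City"},
]

# An in-memory buffer keeps the sketch free of side effects.
buffer = io.StringIO()
write_members(members, buffer)
csv_text = buffer.getvalue()
```

To write to disk instead, pass a file opened with `open("data_anggota_dpr.csv", "w", newline="", encoding="utf-8")` in place of the buffer; `newline=""` is what the `csv` module expects for output files.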

## Customization

Customize the script by adjusting the following variables:

- `BASE_URL`: The base URL of the Indonesian Parliament's member listing page.
- `TIMEOUT`: The timeout value for HTTP requests.
- `HEADERS`: The user-agent header for HTTP requests.
- `fieldnames`: The list of CSV column names, which determines the structure of the CSV output.
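
As a rough illustration of how these settings might fit together, the sketch below defines them at module level and passes them to every request. All values are assumptions: the URL is a placeholder rather than the real listing page, the timeout and user-agent string are arbitrary, and `fetch_page` is a hypothetical helper that is not taken from the script.

```python
# Placeholder URL, not the real member listing page.
BASE_URL = "https://example.org/members"

# Seconds to wait for a response before giving up (assumed value).
TIMEOUT = 30

# User-agent header sent with every request (assumed value).
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; example-scraper)"}

# Hypothetical subset of the CSV column names.
fieldnames = ["name", "email"]


def fetch_page(session, path=""):
    """GET BASE_URL (plus an optional path) using the shared settings.

    `session` is any object with a requests-style `get` method,
    for example a `requests.Session` instance.
    """
    if path:
        url = f"{BASE_URL.rstrip('/')}/{path.lstrip('/')}"
    else:
        url = BASE_URL
    return session.get(url, headers=HEADERS, timeout=TIMEOUT)
```

With requests installed, `fetch_page(requests.Session())` would issue the request. Keeping the settings in one place means a change to `TIMEOUT` or `HEADERS` applies to every call.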

## License

This code is provided under the MIT License. You can find the full license details in the `LICENSE` file.

## Disclaimer

This web scraping script is intended for educational and personal use. Ensure that you respect the website's terms of service and privacy policy. Unauthorized scraping may be against the website's terms of use. Always comply with copyright, privacy, and website usage regulations.

Feel free to check out, use, and provide feedback on this scraper. Happy scraping! 🚀 #Python #WebScraping #DataCollection #Parliament #Indonesia #GitHub
