https://github.com/dms-codes/scrape-stei-itb-ac-id

Web Scraping with Python. This Python script performs web scraping on a website to extract links, emails, and WhatsApp links from the specified domain (stei.itb.ac.id). It uses the requests library to fetch web pages and BeautifulSoup for parsing HTML content.

itb python stei webscrape webscraper


Host: GitHub
URL: https://github.com/dms-codes/scrape-stei-itb-ac-id
Owner: dms-codes
Created: 2022-10-24T08:54:54.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2023-10-01T10:26:59.000Z (almost 3 years ago)
Last Synced: 2025-12-26T05:02:08.497Z (7 months ago)
Topics: itb, python, stei, webscrape, webscraper
Language: Python
Homepage: https://github.com/dms-codes/scrape-stei-itb-ac-id
Size: 6.84 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Web Scraping with Python

This Python script performs web scraping on a website to extract links, emails, and WhatsApp links from the specified domain (stei.itb.ac.id). It uses the `requests` library to fetch web pages and `BeautifulSoup` for parsing HTML content.

## Usage

1. Ensure you have the required libraries installed:

```bash
pip install requests beautifulsoup4
```

2. Modify the script to specify the target domain (`DOMAIN`), home URL (`HOME_URL`), and other settings as needed.

3. Run the script:

```bash
python script.py
```

4. The script will perform the following actions:

- Visit the home URL (`HOME_URL`) and extract all links from the specified domain (`DOMAIN`).
- Collect email addresses (`mailto:` links) and WhatsApp links (`api.whatsapp.com`).
- Save the extracted data to separate log files (`scrape-links-stei.log`, `scrape-email-stei.log`, `scrape-whatsapp-stei.log`).

5. The script then recursively follows links within the specified domain to gather additional URLs, repeating the same extraction on each page it visits.
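
The steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names `classify` and `crawl`, the `get_hrefs` callable, and the `max_pages` limit are hypothetical, and the crawl uses an explicit queue instead of recursion. Page fetching is injected as a callable so the logic can be tried without network access.

```python
from collections import deque
from urllib.parse import urljoin, urlparse

DOMAIN = "stei.itb.ac.id"  # target domain named in this README


def classify(base_url, hrefs, domain=DOMAIN):
    """Sort raw href values into same-domain links, emails, and WhatsApp links."""
    links, emails, whatsapp = set(), set(), set()
    for href in hrefs:
        href = href.strip()
        if href.lower().startswith("mailto:"):
            # Drop the scheme and any "?subject=..." suffix.
            address = href[len("mailto:"):].split("?", 1)[0]
            if address:
                emails.add(address)
        elif "api.whatsapp.com" in href:
            whatsapp.add(href)
        else:
            # Resolve relative links, then keep only http(s) URLs on the domain.
            absolute = urljoin(base_url, href).split("#", 1)[0]
            parsed = urlparse(absolute)
            host = parsed.hostname or ""
            on_domain = host == domain or host.endswith("." + domain)
            if parsed.scheme in ("http", "https") and on_domain:
                links.add(absolute)
    return links, emails, whatsapp


def crawl(home_url, get_hrefs, max_pages=100):
    """Visit pages breadth-first, starting at home_url.

    get_hrefs(url) is a hypothetical callable returning the href strings
    found on that page. max_pages is an assumed safety limit.
    """
    seen, queue = {home_url}, deque([home_url])
    all_emails, all_whatsapp = set(), set()
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        visited += 1
        links, emails, whatsapp = classify(url, get_hrefs(url))
        all_emails |= emails
        all_whatsapp |= whatsapp
        for link in links - seen:  # queue only URLs not seen before
            seen.add(link)
            queue.append(link)
    return seen, all_emails, all_whatsapp
```

With `requests` and `BeautifulSoup`, one way to supply `get_hrefs` would be a function that returns `[a["href"] for a in BeautifulSoup(requests.get(url, timeout=TIMEOUT).text, "html.parser").find_all("a", href=True)]`.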

## Customization

- You can modify `HOME_URL` (the starting URL), `DOMAIN`, `TIMEOUT`, or other settings in the script to target a different website or adjust the scraping behavior.
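
As an illustration, the settings might look something like the block below. Only the names `HOME_URL`, `DOMAIN`, and `TIMEOUT` and the domain `stei.itb.ac.id` come from this README. The values for `HOME_URL` and `TIMEOUT` are assumptions, and `check_settings` is a hypothetical helper for catching a mismatched URL and domain before a crawl starts.

```python
from urllib.parse import urlparse

DOMAIN = "stei.itb.ac.id"            # domain the crawl is restricted to
HOME_URL = "https://stei.itb.ac.id/"  # assumed starting URL
TIMEOUT = 10                          # assumed per-request timeout, in seconds


def check_settings(home_url, domain, timeout):
    """Raise ValueError if the settings are inconsistent (hypothetical helper)."""
    host = urlparse(home_url).hostname or ""
    if host != domain and not host.endswith("." + domain):
        raise ValueError(f"{home_url!r} is not inside domain {domain!r}")
    if timeout <= 0:
        raise ValueError("TIMEOUT must be a positive number of seconds")
    return True


if __name__ == "__main__":
    check_settings(HOME_URL, DOMAIN, TIMEOUT)
    print("settings look consistent")
```

When pointing the script at another site, `HOME_URL` and `DOMAIN` generally need to change together, otherwise no link on the start page would count as being inside the domain.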

## Output

- Extracted links from the specified domain are saved in `scrape-links-stei.log`.
- Extracted email addresses are saved in `scrape-email-stei.log`.
- Extracted WhatsApp links are saved in `scrape-whatsapp-stei.log`.
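
A minimal sketch of how results could be written to and read back from these files is shown below. The file names are the ones listed above. The one-entry-per-line format, the sorting, and the helper names `save_results` and `load_log` are assumptions, since this README does not describe the log format.

```python
from pathlib import Path

# File names as listed in this README.
LOG_FILES = {
    "links": "scrape-links-stei.log",
    "emails": "scrape-email-stei.log",
    "whatsapp": "scrape-whatsapp-stei.log",
}


def save_results(results, out_dir="."):
    """Write each result set to its log file, one entry per line (assumed format)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = {}
    for kind, filename in LOG_FILES.items():
        path = out / filename
        entries = sorted(results.get(kind, ()))  # sorted for stable output
        path.write_text("".join(entry + "\n" for entry in entries), encoding="utf-8")
        written[kind] = path
    return written


def load_log(path):
    """Read a log file back into a list, skipping blank lines."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]
```

For example, `load_log("scrape-email-stei.log")` would return the collected addresses as a list for further processing, provided the file uses the one-entry-per-line layout assumed here.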

## License

This script is provided under the [MIT License](LICENSE).
