https://github.com/chathumiamarasinghe/web-scraping

A versatile Python script for scraping data from websites. This script automates data extraction, processes the information, and saves it in a structured format like CSV. Ideal for data collection, research, and analysis tasks.

# Academic Staff Scraper

This Python script scrapes academic staff information from the staff details page of the Faculty of Science website at the University of Kelaniya. For each staff member it retrieves the name, position, room number, phone, fax, email, and specialization (if available), then exports the information to a CSV file.

## Prerequisites

Make sure you have the following Python packages installed before running the script:

- `requests`: For sending HTTP requests to fetch the webpage.
- `beautifulsoup4`: For parsing the HTML content of the webpage.
- `csv`: For writing the extracted data to a CSV file. This module is part of the Python standard library, so it needs no separate installation.
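
To see which of these modules are already available, a small check along the following lines may help. It is only a sketch: the helper name `missing_modules` is made up for illustration, and note that the `beautifulsoup4` package is imported under the name `bs4`.

```python
import importlib.util


def missing_modules(names=("requests", "bs4", "csv")):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing modules:", ", ".join(missing) if missing else "none")
```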

You can install the two third-party packages with `pip` by running `pip install requests beautifulsoup4`.

## How It Works

- **Extract Data from URL**: The script sends a request to the webpage containing the academic staff details.
- **Parse HTML**: It uses BeautifulSoup to parse the HTML and identify the relevant sections for staff data.
- **Retrieve Staff Information**: For each academic staff member, the script extracts:
  - Name
  - Position
  - Room number
  - Phone number
  - Fax
  - Email
  - Specialization (scraped from a link if available)
- **CSV Output**: The data is written to a CSV file named `academic_staff.csv`.
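
The steps above can be sketched roughly as follows. This is not the repository's actual code: the container selector `div.staff-member`, the CSS classes in `SELECTORS`, and the function names are all hypothetical, since the real markup of the staff page is not shown here. The sketch also reads the specialization from the same element rather than following a per-member link, and it assumes missing fields are recorded as `Not available`.

```python
import csv

from bs4 import BeautifulSoup

FIELDS = ["Name", "Position", "Room", "Phone", "Fax", "Email", "Specialization"]

# Hypothetical CSS classes: the real staff page will use different markup.
SELECTORS = {
    "Name": ".name",
    "Position": ".position",
    "Room": ".room",
    "Phone": ".phone",
    "Fax": ".fax",
    "Email": ".email",
    "Specialization": ".specialization",
}


def fetch_html(url):
    """Download the page and return its HTML text."""
    import requests  # imported here so the parsing step can be tried offline

    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.text


def parse_staff(html):
    """Return one dict per staff member found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    staff = []
    for card in soup.select("div.staff-member"):  # assumed container element
        row = {}
        for field, selector in SELECTORS.items():
            node = card.select_one(selector)
            # Assumed placeholder for fields that are absent on the page.
            row[field] = node.get_text(strip=True) if node else "Not available"
        staff.append(row)
    return staff


def write_csv(rows, path="academic_staff.csv"):
    """Write the extracted rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

With these helpers, a full run would amount to `write_csv(parse_staff(fetch_html(url)))`, where `url` is the address of the staff details page.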

## Example Output

| Name | Position | Room | Phone | Fax | Email | Specialization |
| --- | --- | --- | --- | --- | --- | --- |
| Prof. Janaka Wijanayake | Professor | Room 201 | 011-2233445 | 011-2233446 | janaka@stu.kln.ac.lk | Computer Science |
| Dr. Thilini Mahanama | Senior Lecturer | Room 202 | 011-1234567 | Not available | thilinie@uni.lk | Physics |
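
Once the file has been written, it can be loaded back for a quick sanity check. The following is a minimal sketch using only the standard library. It assumes the column headers shown above and that missing values appear as `Not available`, as in the example; the helper names `load_staff` and `missing_field` are illustrative.

```python
import csv


def load_staff(path="academic_staff.csv"):
    """Read the CSV produced by the scraper into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def missing_field(rows, field, placeholder="Not available"):
    """Return the names of staff whose given field holds the placeholder."""
    return [row["Name"] for row in rows if row.get(field) == placeholder]
```

For instance, `missing_field(load_staff(), "Fax")` would list the staff members for whom no fax number was found.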

## Usage

1. Clone or download the repository containing this script.
2. Make sure you have Python installed on your system.
3. Install the required Python libraries using the following command:
```bash
pip install requests beautifulsoup4