https://github.com/chathumiamarasinghe/web-scraping
A versatile Python script for scraping data from websites. This script automates data extraction, processes the information, and saves it in a structured format like CSV. Ideal for data collection, research, and analysis tasks.
beautifulsoup csv-export dataextraction phyton pythonwebscraper webscraping
Last synced: 6 months ago
- Host: GitHub
- URL: https://github.com/chathumiamarasinghe/web-scraping
- Owner: chathumiamarasinghe
- Created: 2024-09-16T18:57:48.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-09-17T02:46:47.000Z (about 1 year ago)
- Last Synced: 2025-04-12T19:07:41.629Z (6 months ago)
- Topics: beautifulsoup, csv-export, dataextraction, phyton, pythonwebscraper, webscraping
- Language: Python
- Homepage:
- Size: 9.77 KB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Academic Staff Scraper
This Python script scrapes academic staff information from the Faculty of Science, University of Kelaniya's website, specifically the staff details page. The script retrieves each staff member's name, position, room number, phone, fax, email, and specialization (if available) and exports the information into a CSV file.
## Prerequisites
Make sure you have the following Python packages installed before running the script:
- `requests`: For sending HTTP requests to fetch the webpage.
- `beautifulsoup4`: For parsing the HTML content of the webpage.
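- `csv`: For writing the extracted data to a CSV file. This module is part of the Python standard library, so it does not need to be installed.

Install the third-party packages with `pip install requests beautifulsoup4`.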
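As a rough illustration of how `requests` and `beautifulsoup4` fit together, the sketch below fetches a page and parses staff blocks out of its HTML. It is not the repository's actual code: the function names, the `div.staff-member` container, and the `.name` / `.position` class names are hypothetical placeholders, since the real page's markup is not shown in this README. Missing fields fall back to `Not available`, mirroring the placeholder that appears in the example output.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its HTML, raising on HTTP errors."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


def parse_staff(html: str) -> list[dict[str, str]]:
    """Extract one record per staff block from the given HTML.

    The CSS selectors are assumptions; adjust them to the real markup.
    """
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for block in soup.select("div.staff-member"):
        name = block.select_one(".name")
        position = block.select_one(".position")
        records.append({
            "Name": name.get_text(strip=True) if name else "Not available",
            "Position": position.get_text(strip=True) if position else "Not available",
        })
    return records


# Small inline sample so the parsing step can be tried without network access.
SAMPLE_HTML = """
<div class="staff-member">
  <span class="name">Dr. Example Person</span>
  <span class="position">Senior Lecturer</span>
</div>
<div class="staff-member">
  <span class="name">Prof. Another Person</span>
</div>
"""

staff = parse_staff(SAMPLE_HTML)
```

For a live page, `parse_staff(fetch_html(url))` would chain the two steps, with `url` set to the staff details page.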
## How It Works
- **Extract Data from URL**: The script sends an HTTP request to the webpage containing the academic staff details.
- **Parse HTML**: It uses BeautifulSoup to parse the HTML and identify the relevant sections for staff data.
- **Retrieve Staff Information**: For each academic staff member, the script extracts:
  - Name
  - Position
  - Room number
  - Phone number
  - Fax
  - Email
  - Specialization (scraped from a link if available)
- **CSV Output**: The data is written to a CSV file named `academic_staff.csv`.

## Example Output

| Name | Position | Room | Phone | Fax | Email | Specialization |
|------|----------|------|-------|-----|-------|----------------|
| Prof. Janaka Wijanayake | Professor | Room 201 | 011-2233445 | 011-2233446 | janaka@stu.kln.ac.lk | Computer Science |
| Dr. Thilini Mahanama | Senior Lecturer | Room 202 | 011-1234567 | Not available | thilinie@uni.lk | Physics |
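Output in this shape could be produced with the standard-library `csv` module. The following is a minimal sketch, not the script's actual code: it assumes the seven fields named in the introduction as column headers, and the `write_staff_csv` helper and the sample record are made up for illustration. `restval` fills any field missing from a record with `Not available`.

```python
import csv
import io

# Column order assumed from the fields listed in the introduction.
FIELDNAMES = ["Name", "Position", "Room", "Phone", "Fax", "Email", "Specialization"]


def write_staff_csv(records, file_obj):
    """Write staff records (a list of dicts) to an open text file as CSV."""
    writer = csv.DictWriter(file_obj, fieldnames=FIELDNAMES, restval="Not available")
    writer.writeheader()
    writer.writerows(records)


# Illustrative record with no "Fax" key, to show the fallback value.
records = [
    {
        "Name": "Dr. Example Person",
        "Position": "Senior Lecturer",
        "Room": "Room 101",
        "Phone": "000-0000000",
        "Email": "person@example.org",
        "Specialization": "Statistics",
    },
]

# Write to an in-memory buffer so the result can be inspected directly.
buffer = io.StringIO()
write_staff_csv(records, buffer)
csv_text = buffer.getvalue()

# To write the real file instead:
# with open("academic_staff.csv", "w", newline="", encoding="utf-8") as f:
#     write_staff_csv(records, f)
```

Opening the file with `newline=""` is what the `csv` documentation recommends, as it avoids blank lines between rows on Windows.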
## Usage
1. Clone or download the repository containing this script.
2. Make sure you have Python installed on your system.
3. Install the required Python libraries using the following command:
```bash
pip install requests beautifulsoup4