https://github.com/relostar-devil/web-scraping-using-python
Wikipedia Web Scraper: Python script to extract and display Asian countries by area. Demonstrates web scraping techniques using Beautiful Soup and Pandas.
https://github.com/relostar-devil/web-scraping-using-python
beautifulsoup4 pandas python
Last synced: about 2 months ago
JSON representation
Wikipedia Web Scraper: Python script to extract and display Asian countries by area. Demonstrates web scraping techniques using Beautiful Soup and Pandas.
- Host: GitHub
- URL: https://github.com/relostar-devil/web-scraping-using-python
- Owner: Relostar-Devil
- Created: 2025-02-11T07:09:26.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-02-11T07:15:04.000Z (over 1 year ago)
- Last Synced: 2025-04-05T18:13:17.629Z (about 1 year ago)
- Topics: beautifulsoup4, pandas, python
- Language: Jupyter Notebook
- Homepage:
- Size: 128 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Web Scraping Top Asian Countries by Area from Wikipedia
## Overview
This repository contains a Python script that performs web scraping to extract a list of Asian countries by area from a Wikipedia page. The script retrieves the data, cleans and formats it, and presents the top countries in a user-friendly format.
## Project Goals
* Scrape data of Asian countries area from the Wikipedia page.
* Clean and process the extracted data.
* Present the data in a clear and organized manner.
## Code Description
The Python script `Web-Scraping-Using-Python.ipynb` uses the following libraries:
* **requests:** To send HTTP requests to the Wikipedia page.
* **Beautiful Soup:** To parse the HTML content and extract the table data.
* **Pandas:** To create and manipulate dataframes for cleaning and analysis.
## Methodology
1. **Send Request:** The script sends an HTTP request to the specified Wikipedia page URL.
2. **Parse HTML:** Beautiful Soup is used to parse the HTML content of the page.
3. **Extract Table:** The script locates the table containing the list of Asian countries by area.
4. **Create DataFrame:** The extracted table data is converted into a Pandas DataFrame.
5. **Data Cleaning:** The data is cleaned by removing unwanted characters, handling missing values, and converting data types.
6. **Display Results:** The cleaned and processed data is displayed in a tabular format.
## Usage
1. Make sure you have Python installed.
2. Install the required libraries:
```
pip install requests beautifulsoup4 pandas
```
3. Run the script:
```
jupyter notebook Web-Scraping-Using-Python.ipynb
```
4. Follow the instructions in the Jupyter Notebook to execute the code and view the results.
## Sample Output
The script outputs a list of country names by area present in asia
## Skills Demonstrated
* Web Scraping
* Data Extraction
* HTML Parsing
* Data Cleaning
* Pandas
* Python Programming