Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/nisch-mhrzn/scraping
This project scrapes data from Wikipedia about the largest U.S. companies by revenue using Python's requests and BeautifulSoup libraries.
beautifulsoup python requests webscrapping
- Host: GitHub
- URL: https://github.com/nisch-mhrzn/scraping
- Owner: nisch-mhrzn
- Created: 2024-12-04T12:37:57.000Z (29 days ago)
- Default Branch: main
- Last Pushed: 2024-12-04T14:37:57.000Z (29 days ago)
- Last Synced: 2024-12-04T15:34:37.570Z (29 days ago)
- Topics: beautifulsoup, python, requests, webscrapping
- Language: Jupyter Notebook
- Homepage:
- Size: 43 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Web Scraping Project: Largest Companies in the United States by Revenue

## Overview
This project involves scraping data from the Wikipedia page listing the largest companies in the United States by revenue. The data can be used for analysis, visualization, or other purposes.

## Requirements
To run this project, you'll need the following Python libraries:
- `requests`
- `beautifulsoup4`

You can install these libraries using pip:
```bash
pip install requests beautifulsoup4
```

## Usage
1. **Clone the repository** (if applicable):
```bash
git clone https://github.com/nisch-mhrzn/Scraping.git
cd Scraping
```
2. **Data Extraction**:
The script fetches the HTML content from the specified Wikipedia URL and parses it to extract relevant information about the largest companies.

## Code Explanation
The main components of the script include:

- **Fetching the page**:
```python
import requests

url = 'https://en.wikipedia.org/wiki/List_of_largest_companies_in_the_United_States_by_revenue'
page = requests.get(url)
```

- **Parsing the HTML**:
```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(page.text, 'html.parser')
```
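The README stops at the parsing step; the table itself still has to be located and flattened into rows. A minimal sketch of that follow-on step, assuming the ranking is the first `wikitable` on the page and using `pandas` purely for convenience (neither detail is taken from the repository's notebook), might look like this:

```python
import requests
from bs4 import BeautifulSoup
import pandas as pd  # assumed here for tabular output; not listed in the README's requirements

url = 'https://en.wikipedia.org/wiki/List_of_largest_companies_in_the_United_States_by_revenue'
page = requests.get(url)
page.raise_for_status()  # fail early if Wikipedia is unreachable

soup = BeautifulSoup(page.text, 'html.parser')

# Assumption: the revenue ranking is the first wikitable on the page.
table = soup.find('table', class_='wikitable')

# Column names come from the header row; each following row is one company record.
headers = [th.get_text(strip=True) for th in table.find('tr').find_all('th')]
rows = []
for tr in table.find_all('tr')[1:]:
    cells = [td.get_text(strip=True) for td in tr.find_all('td')]
    if cells:
        rows.append(cells)

df = pd.DataFrame(rows)
if rows and len(headers) == len(rows[0]):
    df.columns = headers  # only label columns when the header and data widths agree

df.to_csv('largest_us_companies.csv', index=False)  # optional: persist for later analysis
```

Reading the column labels from the table's own header row keeps the output in sync with Wikipedia even if the column order changes, and guarding the `df.columns` assignment avoids a shape mismatch if the page layout differs from this assumption.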