https://github.com/ankman007/web-scraping-practice



# Web Scraping Practice Resources

This repository lists websites that are well suited to practicing web scraping. They range from simple static sandboxes to large dynamic sites, so you can build your skills progressively. Each entry below gives a link and a short description of what you can scrape.

---

## Table of Contents
1. [Books to Scrape](#1-books-to-scrape)
2. [Quotes to Scrape](#2-quotes-to-scrape)
3. [Real Python](#3-real-python)
4. [IMDb](#4-imdb)
5. [Indeed](#5-indeed)
6. [Weather.com](#6-weathercom)
7. [Wikipedia](#7-wikipedia)
8. [News Websites](#8-news-websites)
9. [GitHub](#9-github)
10. [E-commerce Websites](#10-e-commerce-websites)

---

## 1. Books to Scrape
- **Website**: [Books to Scrape](http://books.toscrape.com/)
- **Description**: A simple sandbox site for practicing scraping book details, including titles, prices, ratings, and availability.
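
As a starting point, here is a minimal sketch of pulling titles, prices, and ratings out of a listing page using only the Python standard library. It assumes each book sits in an `<article class="product_pod">` element, with the full title in the link's `title` attribute, the price in `<p class="price_color">`, and the rating encoded as a second class name next to `star-rating`. Check the live page source before relying on these selectors. `SAMPLE_HTML`, `BookParser`, and `parse_books` are hypothetical names made up for this example.

```python
from html.parser import HTMLParser

# Hypothetical snippet shaped like one listing entry (assumed markup).
SAMPLE_HTML = """
<article class="product_pod">
  <p class="star-rating Three"></p>
  <h3><a href="catalogue/a-light-in-the-attic_1000/index.html"
         title="A Light in the Attic">A Light in the ...</a></h3>
  <div class="product_price"><p class="price_color">£51.77</p></div>
</article>
"""


class BookParser(HTMLParser):
    """Collect title, price, and rating from product_pod-style markup."""

    def __init__(self):
        super().__init__()
        self.books = []
        self._current = None   # the book dict currently being filled in
        self._in_h3 = False
        self._in_price = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "article" and "product_pod" in classes:
            self._current = {"title": None, "price": None, "rating": None}
        elif self._current is None:
            return  # ignore everything outside a book entry
        elif tag == "p" and "star-rating" in classes:
            # The rating word is the class that is not "star-rating".
            rest = [c for c in classes if c != "star-rating"]
            self._current["rating"] = rest[0] if rest else None
        elif tag == "p" and "price_color" in classes:
            self._in_price = True
        elif tag == "h3":
            self._in_h3 = True
        elif tag == "a" and self._in_h3:
            # The link text may be truncated, so read the title attribute.
            self._current["title"] = attrs.get("title")

    def handle_data(self, data):
        if self._in_price and self._current is not None:
            self._current["price"] = (self._current["price"] or "") + data.strip()

    def handle_endtag(self, tag):
        if tag == "h3":
            self._in_h3 = False
        elif tag == "p":
            self._in_price = False
        elif tag == "article" and self._current is not None:
            self.books.append(self._current)
            self._current = None


def parse_books(html):
    """Return a list of dicts with 'title', 'price', and 'rating' keys."""
    parser = BookParser()
    parser.feed(html)
    return parser.books


books = parse_books(SAMPLE_HTML)
```

To run this against the real site, download a page first (for example with `urllib.request.urlopen`) and pass the decoded HTML to `parse_books`. Many people use a third-party parser such as Beautiful Soup for this instead; the standard library version is used here only to keep the sketch dependency-free.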

## 2. Quotes to Scrape
- **Website**: [Quotes to Scrape](http://quotes.toscrape.com/)
- **Description**: A sandbox site of quotes where you can scrape each quote's text, author, and tags. The listing spans multiple pages, which makes it good practice for pagination.
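
Pagination usually comes down to finding the "next" link on each page and following it until there is none. The sketch below shows one way to do that with the standard library. It assumes the next link is an `<a>` inside `<li class="next">`, which should be verified against the page source. The names `NextLinkParser`, `find_next_url`, `fetch`, and `crawl` are hypothetical, and `crawl` takes the download function as a parameter so the loop can be tried offline.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class NextLinkParser(HTMLParser):
    """Remember the href of the first link inside <li class="next">."""

    def __init__(self):
        super().__init__()
        self.next_href = None
        self._in_next = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and "next" in (attrs.get("class") or "").split():
            self._in_next = True
        elif tag == "a" and self._in_next and self.next_href is None:
            self.next_href = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_next = False


def find_next_url(html, base_url):
    """Return the absolute URL of the next page, or None on the last page."""
    parser = NextLinkParser()
    parser.feed(html)
    return urljoin(base_url, parser.next_href) if parser.next_href else None


def fetch(url):
    """Download a page and decode it as UTF-8 (an assumption about the site)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


def crawl(start_url, fetch_page=fetch, max_pages=50):
    """Yield (url, html) for each page, following 'next' links in order."""
    url, seen = start_url, set()
    # Stop on the last page, on a loop back to a seen URL, or at max_pages.
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        html = fetch_page(url)
        yield url, html
        url = find_next_url(html, url)
```

A typical call would be `for url, html in crawl("http://quotes.toscrape.com/"): ...`, with the quote extraction done on each `html` string. The `max_pages` cap and the `seen` set are there so a mistake in the selector cannot turn into an endless loop.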

## 3. Real Python
- **Website**: [Real Python](https://realpython.com/)
- **Description**: A Python-focused site that publishes tutorials and articles. You can practice scraping article titles and links.
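
Collecting titles and links is mostly a matter of walking the `<a>` tags and resolving relative URLs. The following is a generic sketch that makes no assumptions about the site's specific markup: it returns every link with an `href`, so filtering down to article links is left to you. `LinkParser`, `extract_links`, and the sample HTML are hypothetical and only for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collect (text, absolute_url) pairs for every <a href=...> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # URL of the link currently being read
        self._text = []     # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links such as "/some-page/" against the base.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace so multi-line link text reads cleanly.
            text = " ".join("".join(self._text).split())
            self.links.append((text, self._href))
            self._href = None


def extract_links(html, base_url):
    """Return a list of (link text, absolute URL) tuples found in html."""
    parser = LinkParser(base_url)
    parser.feed(html)
    return parser.links


# Made-up markup; real pages will be structured differently.
SAMPLE_HTML = (
    '<h2><a href="/example-tutorial/">An Example\n   Tutorial</a></h2>'
    '<a name="top">not a link</a>'
)
links = extract_links(SAMPLE_HTML, "https://realpython.com/")
```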

## 4. IMDb
- **Website**: [IMDb](https://www.imdb.com/)
- **Description**: A popular website with a vast database of movies and TV shows. Practice scraping movie titles, ratings, reviews, and actor details.

## 5. Indeed
- **Website**: [Indeed](https://www.indeed.com/)
- **Description**: A job listing website. You can scrape job titles, companies, locations, and descriptions. Search results span many pages, so it is also useful for practicing pagination over large result sets.

## 6. Weather.com
- **Website**: [Weather.com](https://weather.com/)
- **Description**: Offers weather data like current conditions, forecasts, and more. You can practice scraping weather reports for different locations.

## 7. Wikipedia
- **Website**: [Wikipedia](https://www.wikipedia.org/)
- **Description**: The world’s largest online encyclopedia, perfect for scraping data such as text, tables, and infoboxes.
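
Tables are a good first target because the row and column structure maps directly onto records. Below is a minimal sketch that reads the first table carrying the class `wikitable` and turns it into a list of dictionaries keyed by the header row. It assumes a simple table with one header row and no nested tables, `rowspan`, or `colspan`; real articles often break those assumptions. `TableParser`, `parse_wikitable`, and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the cell text of the first table with class 'wikitable'."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_table = False
        self._done = False   # set once the first matching table has closed
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "table" and not self._done and "wikitable" in classes:
            self._in_table = True
        elif not self._in_table:
            return
        elif tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags (links, bold) also lands in the open cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if not self._in_table:
            return
        if tag in ("td", "th") and self._cell is not None:
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag == "table":
            self._in_table = False
            self._done = True


def parse_wikitable(html):
    """Return the table body as a list of dicts keyed by the header row."""
    parser = TableParser()
    parser.feed(html)
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Made-up markup in the general shape of a simple wiki table.
SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr><th>Planet</th><th>Moons</th></tr>
  <tr><td><a href="/wiki/Earth">Earth</a></td><td>1</td></tr>
  <tr><td><a href="/wiki/Mars">Mars</a></td><td>2</td></tr>
</table>
"""
records = parse_wikitable(SAMPLE_HTML)
```

Every value comes back as a string, so numeric columns need converting before analysis.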

## 8. News Websites
- **Website**: [BBC News](https://www.bbc.com/news)
- **Description**: A dynamic news website. Scrape headlines, article content, and images from multiple news articles.
- **Other Examples**: [CNN](https://edition.cnn.com/), [The New York Times](https://www.nytimes.com/)

## 9. GitHub
- **Website**: [GitHub](https://github.com/)
- **Description**: A platform for developers to share code. You can scrape data about repositories, users, stars, and forks.

## 10. E-commerce Websites
- **Website**: [Amazon](https://www.amazon.com/)
- **Description**: A huge e-commerce website. Scrape product data like prices, ratings, and reviews from product pages.
- **Other Examples**: [eBay](https://www.ebay.com/), [Walmart](https://www.walmart.com/)

---

## Summary of Website Examples
1. **Books to Scrape**: [books.toscrape.com](http://books.toscrape.com/)
2. **Quotes to Scrape**: [quotes.toscrape.com](http://quotes.toscrape.com/)
3. **Real Python**: [realpython.com](https://realpython.com/)
4. **IMDb**: [imdb.com](https://www.imdb.com/)
5. **Indeed**: [indeed.com](https://www.indeed.com/)
6. **Weather.com**: [weather.com](https://weather.com/)
7. **Wikipedia**: [wikipedia.org](https://www.wikipedia.org/)
8. **News Websites**: [bbc.com/news](https://www.bbc.com/news), [cnn.com](https://edition.cnn.com/)
9. **GitHub**: [github.com](https://github.com/)
10. **E-commerce Websites**: [amazon.com](https://www.amazon.com/)

---

## Conclusion
By practicing on these websites, you will gain hands-on experience with different types of data extraction techniques. Start with simpler static websites and gradually move to more complex dynamic sites as you improve your scraping skills.