https://github.com/prajwalsrinvas/learn_scrapy

Last synced: 4 months ago

Host: GitHub
URL: https://github.com/prajwalsrinvas/learn_scrapy
Owner: Prajwalsrinvas
Created: 2024-09-06T18:16:21.000Z (9 months ago)
Default Branch: main
Last Pushed: 2024-09-06T18:54:19.000Z (9 months ago)
Last Synced: 2024-09-06T22:14:08.551Z (9 months ago)
Language: Python
Size: 332 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

# Learn Scrapy 🕷️

Practice code I wrote while working through this course:
[Scrapy Unleashed: Master Python Web Scraping & Data Pipeline](https://www.udemy.com/course/scrapy-masterclass-python-web-scraping-and-data-pipelines)

## Project Overview

This collection of projects demonstrates web scraping techniques using Scrapy, a Python framework for extracting data from websites. Each project focuses on a different aspect of web scraping, from basic concepts to advanced techniques.

## Projects

1. [Real Estate Scraper](./1_real_estate)
   - A basic Scrapy spider for scraping real estate listings.
   - Demonstrates fundamental Scrapy concepts and XPath usage.

2. [Quotes Login](./2_quotes_login)
   - Scraper that handles website login before scraping quotes.
   - Showcases how to deal with authentication in web scraping.

3. [Naukri Job Scraper](./3_naukri)
   - Scrapes job listings from Naukri.com.
   - Demonstrates handling of AJAX requests in web scraping.

4. [Free Images Downloader](./4_free_images)
   - Spider for downloading free images from a stock photo website.
   - Illustrates image harvesting and storage techniques.

5. [Classifieds Scraper](./5_classifieds)
   - Scrapes classified ads from a website.
   - Focuses on data transformation using Scrapy Pipelines.

6. [Phone Models Scraper](./6_phone_models)
   - Scrapes information about various phone models.
   - Implements rate limiting and other middleware techniques to avoid bans.

7. [HTTPbin Tester](./7_httpbin)
   - A project for testing various HTTP scenarios using httpbin.org.
   - Useful for understanding HTTP interactions in web scraping.

8. [Wikipedia Scraper](./8_wikipedia)
   - Scrapes data from Wikipedia pages.
   - Demonstrates techniques for handling large-scale scraping projects.

9. [Splash Quotes Scraper](./9_splash_quotes)
   - Uses Splash to scrape a JavaScript-rendered quotes website.
   - Shows integration of Scrapy with Splash for handling dynamic content.

10. [Medium Article Scraper](./10_medium)
    - Scrapes articles from Medium.com using Selenium.
    - Demonstrates handling of infinitely scrolling pages.

11. [Yahoo Finance Scraper](./11_yahoofinance)
    - Scrapes financial data from Yahoo Finance.
    - Showcases advanced Selenium usage for interacting with web elements.
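
To give a feel for the pipeline-based data transformation mentioned for the Classifieds Scraper, here is a minimal sketch of a Scrapy item pipeline. It is not taken from the repository: the class name `CleanClassifiedPipeline` and the item fields `title` and `price` are hypothetical, chosen only for illustration. A pipeline is a plain Python class exposing `process_item`, so the sketch needs no Scrapy import to run on its own.

```python
import re


class CleanClassifiedPipeline:
    """Hypothetical pipeline: tidies a title and parses a price string."""

    # Matches the first number in a string, allowing thousands separators
    # and an optional decimal part, e.g. "1,250.50".
    _PRICE_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")

    def process_item(self, item, spider):
        # Scrapy passes each scraped item through this method.
        title = item.get("title")
        if isinstance(title, str):
            # Collapse runs of whitespace and trim both ends.
            item["title"] = " ".join(title.split())

        price = item.get("price")
        if isinstance(price, str):
            match = self._PRICE_RE.search(price)
            # Store a float when a number is found, otherwise None.
            item["price"] = float(match.group().replace(",", "")) if match else None

        # Returning the item hands it to the next pipeline stage.
        return item
```

In a real project, a pipeline like this would be enabled through the `ITEM_PIPELINES` setting, with the dotted path of the class as the key and an integer priority as the value.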

## Skills Demonstrated

- Basic and advanced Scrapy usage
- XPath and CSS selectors for data extraction
- Handling website login and authentication
- Working with AJAX requests
- Image downloading and processing
- Data transformation and cleaning
- Rate limiting and ban avoidance techniques
- Integration with Splash for JavaScript rendering
- Using Selenium for browser automation
- Scraping infinitely scrolling pages
- Extracting data from complex financial websites
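
As a rough illustration of the rate limiting listed above, the following `settings.py` fragment shows one way to slow a crawl down using Scrapy's built-in settings. The setting names are real Scrapy settings, but the values are assumptions picked for the example and are not taken from the repository.

```python
# settings.py (sketch; values are illustrative assumptions)

# Respect the target site's robots.txt rules.
ROBOTSTXT_OBEY = True

# Base delay in seconds between requests to the same site.
DOWNLOAD_DELAY = 1.0
# Vary the actual wait between 0.5x and 1.5x of DOWNLOAD_DELAY.
RANDOMIZE_DOWNLOAD_DELAY = True

# Cap the number of parallel requests per domain.
CONCURRENT_REQUESTS_PER_DOMAIN = 2

# Let the AutoThrottle extension adapt the delay to server latency.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1.0
AUTOTHROTTLE_MAX_DELAY = 30.0
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0

# Retry failed requests a limited number of times.
RETRY_TIMES = 3
```

Lower concurrency and longer delays make a crawl slower but reduce the chance of being blocked; suitable values depend on the target site.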

## Certificate

![certificate](https://udemy-certificate.s3.amazonaws.com/image/UC-bfbfce76-bb95-4ed1-8b74-de212e142318.jpg?v=1705862980000)

Note: All websites in this course are used for educational purposes.
