https://github.com/prajwalsrinvas/learn_scrapy
- Host: GitHub
- URL: https://github.com/prajwalsrinvas/learn_scrapy
- Owner: Prajwalsrinvas
- Created: 2024-09-06T18:16:21.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2024-09-06T18:54:19.000Z (9 months ago)
- Last Synced: 2024-09-06T22:14:08.551Z (9 months ago)
- Language: Python
- Size: 332 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Learn Scrapy 🕷️
My practice code while doing this course:
[Scrapy Unleashed: Master Python Web Scraping & Data Pipeline](https://www.udemy.com/course/scrapy-masterclass-python-web-scraping-and-data-pipelines)

## Project Overview
This collection of projects demonstrates various web scraping techniques using Scrapy, a powerful Python framework for extracting data from websites. Each project focuses on different aspects of web scraping, from basic concepts to advanced techniques.
## Projects
1. [Real Estate Scraper](./1_real_estate)
   - A basic Scrapy spider for scraping real estate listings.
   - Demonstrates fundamental Scrapy concepts and XPath usage.

2. [Quotes Login](./2_quotes_login)
   - Scraper that handles website login before scraping quotes.
   - Showcases how to deal with authentication in web scraping.

3. [Naukri Job Scraper](./3_naukri)
   - Scrapes job listings from Naukri.com.
   - Demonstrates handling of AJAX requests in web scraping.

4. [Free Images Downloader](./4_free_images)
   - Spider for downloading free images from a stock photo website.
   - Illustrates image harvesting and storage techniques.

5. [Classifieds Scraper](./5_classifieds)
   - Scrapes classified ads from a website.
   - Focuses on data transformation using Scrapy Pipelines.

6. [Phone Models Scraper](./6_phone_models)
   - Scrapes information about various phone models.
   - Implements rate limiting and other middleware techniques to avoid bans.

7. [HTTPbin Tester](./7_httpbin)
   - A project for testing various HTTP scenarios using httpbin.org.
   - Useful for understanding HTTP interactions in web scraping.

8. [Wikipedia Scraper](./8_wikipedia)
   - Scrapes data from Wikipedia pages.
   - Demonstrates techniques for handling large-scale scraping projects.

9. [Splash Quotes Scraper](./9_splash_quotes)
   - Uses Splash to scrape a JavaScript-rendered quotes website.
   - Shows integration of Scrapy with Splash for handling dynamic content.

10. [Medium Article Scraper](./10_medium)
    - Scrapes articles from Medium.com using Selenium.
    - Demonstrates handling of infinitely scrolling pages.

11. [Yahoo Finance Scraper](./11_yahoofinance)
    - Scrapes financial data from Yahoo Finance.
    - Showcases advanced Selenium usage for interacting with web elements.

## Skills Demonstrated
- Basic and advanced Scrapy usage
- XPath and CSS selectors for data extraction
- Handling website login and authentication
- Working with AJAX requests
- Image downloading and processing
- Data transformation and cleaning
- Rate limiting and ban avoidance techniques
- Integration with Splash for JavaScript rendering
- Using Selenium for browser automation
- Scraping infinitely scrolling pages
- Extracting data from complex financial websites

## Certificate

Note: All websites in this course are used for educational purposes.
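
The Classifieds Scraper entry above mentions data transformation using Scrapy Pipelines. As a rough illustration of that idea, and not the code from this repository, here is a minimal sketch of an item pipeline. The class name `CleanClassifiedPipeline` and the `title` and `price` fields are hypothetical. The only Scrapy-specific convention it relies on is the `process_item(self, item, spider)` method that Scrapy calls for each scraped item.

```python
import re


class CleanClassifiedPipeline:
    """Hypothetical pipeline that tidies a scraped classified-ad item."""

    def process_item(self, item, spider):
        # Collapse runs of whitespace in the (assumed) "title" field.
        title = item.get("title") or ""
        item["title"] = " ".join(title.split())

        # Convert an (assumed) price string such as "$1,250.50" to a float.
        price = item.get("price")
        if isinstance(price, str):
            match = re.search(r"\d+(?:\.\d+)?", price.replace(",", ""))
            item["price"] = float(match.group()) if match else None

        # Returning the item passes it on to the next pipeline stage.
        return item
```

In a Scrapy project, a pipeline like this would typically be enabled through the `ITEM_PIPELINES` setting in `settings.py`, for example `ITEM_PIPELINES = {"myproject.pipelines.CleanClassifiedPipeline": 300}`, where `myproject.pipelines` is a placeholder module path.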