Projects in Awesome Lists tagged with python-crawler

https://github.com/xishandong/crawlproject

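A collection of Python crawler projects, from the basics to JS reverse engineering, organized into Basics, Automation, Advanced, and CAPTCHA sections. The examples cover major websites (xhs douyin weibo ins boss job, jd...), and you will learn about many aspects of crawling and anti-crawling, automation, and CAPTCHAs.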

captcha ddddocr javascript playwright python python-crawler reverse-engineering

Last synced: 06 Apr 2025

https://github.com/zhuozhuocrayon/pythoncrawler

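Python 3 web crawler notes and hands-on source code. Records notes, reference materials, and common errors from the whole process of learning Python crawling, with about 40 crawling examples and a walkthrough of the approach behind each, covering common libraries such as urllib, requests, bs4, jsonpath, re, pytesseract, and PIL.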

python-crawler python3

Last synced: 13 Apr 2025

https://github.com/elliotxx/zhihu-crawler-people

A simple distributed crawler for Zhihu, plus data analysis.

crawler python python-crawler spider web-crawler web-spider

Last synced: 13 Apr 2025

https://github.com/ityouknow/python-crawler

Python Crawler

crawler python python-crawler

Last synced: 04 Jul 2025

https://github.com/thewebscraping/tls-requests

TLS Requests is a powerful Python library for secure HTTP requests, offering browser-like TLS client, fingerprinting, anti-bot page bypass, and high performance.

anti-bot anti-bot-detection anti-bot-page cf-clearance cloudflare-bypass cloudflare-scraper crawling-python python-crawler python-scraper python-spider python-tls-client python-web-crawler python-web-scraper python-web-scraping scraping-python tls-client web-crawler-python web-scraping-api web-scraping-python web-spider

Last synced: 24 Jan 2026

https://github.com/taseikyo/crawler

🐍 A collection of simple Python crawlers.

baidu-tieba bilibili bing crawler douban pixiv python-crawler python3 youku

Last synced: 19 Oct 2025

https://github.com/omkarcloud/botasaurus-starter

🚀 OFFICIAL STARTER TEMPLATE FOR BOTASAURUS SCRAPING FRAMEWORK 🤖

beautifulsoup crawler crawling crawling-framework crawling-python crawling-tool headless node-crawler python-crawler scraper scraping scraping-framework scraping-python scraping-tool selenium web-crawler web-crawling web-scraper web-scraping webscraping

Last synced: 23 Apr 2025

https://github.com/superbrucejia/dynamic-web-crawlering-python

This repo is mainly for crawling dynamic (Ajax-based) websites with Python, using China's NSTL websites as an example.

dynamic-web-crawler dynamic-website nstl python python-crawler web-crawler-python web-crawling

Last synced: 21 Apr 2025

https://github.com/password123456/huntr-com-bug-bounties-collector

Keeps watch for new bug bounty (vulnerability) postings.

bug-bounty bug-bounty-crawling chrome-webdriver huntr python-crawler selenium-python

Last synced: 16 Apr 2025

https://github.com/xishandong/weibo_crawler

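Supports multiple crawling modes, such as downloading user albums, crawling user posts, and crawling real-time search posts. You are welcome to download and use it and to contribute new features.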

python-crawler weibo weibo-spider

Last synced: 09 Apr 2025

https://github.com/imarvinle/douban_movie_crawler

Douban movie crawler: movie information + full reviews + short reviews

douban douban-crawler douban-movie douban-movie-spider python-crawler python3 spider

Last synced: 28 Jul 2025

https://github.com/charles-hsiao/python-flightradar

Python airline/flights data crawler

airlines crawler flightradar flightradar24 flights python python-crawler python3

Last synced: 08 Jul 2025

https://github.com/xishandong/data_visualization

A simple data visualization website.

data-visualization flask flask-sqlalchemy python-crawler

Last synced: 23 Apr 2025

https://github.com/basemax/stackoverflowcrawler

A web crawler which crawls the stackoverflow website.

crawler crawler-detector crawler-python crawler-testing crawlers crawling python-crawler stackoverflow stackoverflow-analyse stackoverflow-answer stackoverflow-api stackoverflow-crawler stackoverflow-get stackoverflow-questions stackoverflow-tags test-crawler text-processing text-processor web-crawler web-crawler-python

Last synced: 15 Sep 2025

https://github.com/eugen1j/aioscrapy

Python asynchronous library for web scraping.
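For readers new to asynchronous scraping, the general asyncio pattern looks roughly like this. It is a minimal sketch using only the standard library, not aioscrapy's own API: `crawl_all` and the injected `fetch` coroutine are hypothetical names, and a real `fetch` would perform the HTTP request (for example with aiohttp).

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, Tuple


# Hypothetical helper illustrating bounded concurrency; not aioscrapy's API.
async def crawl_all(
    urls: Iterable[str],
    fetch: Callable[[str], Awaitable[str]],
    max_concurrency: int = 5,
) -> Dict[str, str]:
    """Fetch every URL concurrently, with at most `max_concurrency` requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(url: str) -> Tuple[str, str]:
        # Wait for a free slot before starting the request.
        async with semaphore:
            return url, await fetch(url)

    # gather() runs all tasks concurrently and returns results in input order.
    results = await asyncio.gather(*(bounded_fetch(u) for u in urls))
    return dict(results)


async def fake_fetch(url: str) -> str:
    # Stand-in for a real HTTP request.
    await asyncio.sleep(0.01)
    return f"<html>{url}</html>"


if __name__ == "__main__":
    pages = asyncio.run(
        crawl_all(["https://example.com/a", "https://example.com/b"], fake_fetch, max_concurrency=2)
    )
    print(pages)
```

The semaphore caps how many requests run at once, which keeps the crawler polite to the target server while still overlapping network waits.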

asyncio crawler python-crawler python37 webscraper

Last synced: 09 Oct 2025

https://github.com/xishandong/music_player

A music player built on tkinter

python-crawler tkinter tkinter-python wangyiyunmusic

Last synced: 12 Sep 2025

https://github.com/omkarcloud/web-scraping-template

🚀 THIS WEB SCRAPING TEMPLATE PROVIDES YOU WITH A GREAT STARTING POINT WHEN CREATING WEB SCRAPING BOTS. 🤖

beautifulsoup crawler crawling crawling-framework crawling-python crawling-tool headless node-crawler python-crawler scraper scraping scraping-framework scraping-python scraping-tool selenium web-crawler web-crawling web-scraper web-scraping webscraping

Last synced: 24 Oct 2025

https://github.com/oldkingcone/pbandj

Pastebin crawler that crawls https://pastebin.com/archive.

crawler headless headless-chrome python python-crawler selenium-python selenium-webdriver

Last synced: 26 Sep 2025

https://github.com/schbenedikt/web-crawler

A simple web crawler using Python that stores the metadata of each web page in a database.
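A minimal sketch of storing per-page metadata, assuming "metadata" means the `<title>` and `<meta name="description">` values. It swaps in the standard library's sqlite3 for the MariaDB/MySQL database named in the tags, and `MetadataParser`, `save_metadata`, and the `pages` table are hypothetical names, not the project's code.

```python
import sqlite3
from html.parser import HTMLParser
from typing import Optional


class MetadataParser(HTMLParser):
    """Hypothetical parser: collects <title> text and <meta name="description"> content."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.description: Optional[str] = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def save_metadata(conn: sqlite3.Connection, url: str, html: str) -> None:
    """Parse one page and upsert its metadata (sqlite3 stands in for MariaDB/MySQL)."""
    parser = MetadataParser()
    parser.feed(html)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, title TEXT, description TEXT)"
    )
    # INSERT OR REPLACE keeps one row per URL if a page is crawled again.
    conn.execute(
        "INSERT OR REPLACE INTO pages (url, title, description) VALUES (?, ?, ?)",
        (url, parser.title.strip(), parser.description),
    )
    conn.commit()
```

With a MySQL driver the SQL would change (for example `%s` placeholders and `REPLACE INTO`), but the parse-then-store structure stays the same.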

crawler database mariadb mysql python python-crawler web

Last synced: 14 Apr 2025

https://github.com/zebbern/reconx

🕷️ ReconX is a live-website crawler made to gather critical information, with an option to take a screenshot of each crawled site.

crawler hacking information-gathering information-retrieval information-security livedata opsec osint osint-tool pentest python python-crawler search-engine security security-tools website website-crawler website-scraper website-security

Last synced: 03 Jul 2025

https://github.com/basemax/jadi-net-blog

This Python script is used to extract posts from a WordPress blog (https://jadi.net/) and save them in HTML format. The script fetches the RSS feed, parses the posts, and saves each post as an individual HTML file.
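The fetch, parse, save pipeline described above might look like this in outline. This is a sketch, not the script itself: it assumes the feed text has already been downloaded (for example with `urllib.request.urlopen`), assumes the common WordPress convention of carrying the full post body in `<content:encoded>`, and uses a hypothetical `post-NNN.html` naming scheme.

```python
import html
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List

# WordPress RSS 2.0 feeds typically put the full post body in <content:encoded>.
CONTENT_ENCODED = "{http://purl.org/rss/1.0/modules/content/}encoded"


def save_posts(feed_xml: str, out_dir: Path) -> List[Path]:
    """Hypothetical helper: write each RSS <item> to its own HTML file."""
    root = ET.fromstring(feed_xml)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for i, item in enumerate(root.iter("item"), start=1):
        title = item.findtext("title", default="untitled")
        # Prefer the full body; fall back to the short description.
        body = item.findtext(CONTENT_ENCODED) or item.findtext("description", default="")
        path = out_dir / f"post-{i:03d}.html"
        path.write_text(
            '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
            f"<title>{html.escape(title)}</title></head>\n"
            f"<body><h1>{html.escape(title)}</h1>\n{body}\n</body></html>\n",
            encoding="utf-8",
        )
        saved.append(path)
    return saved
```

The title is HTML-escaped because it is plain text in the feed, while the body is inserted as-is because it is already HTML.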

blog-copier copier crawler crawler-python crawlers jadi-blog jadi-clone jadi-net-blog jadi-net-clone jadinet-blog py python python-crawler wordpress wp

Last synced: 13 Oct 2025

https://github.com/iampukar/url_crawler

A Python library to crawl the details of a URL.

page-crawler python-crawler python-webcrawler url-crawler webpage-crawler

Last synced: 12 Apr 2025

https://github.com/yjg30737/onepiece-database

View One Piece character information from the ONE PIECE WIKI (Fandom) in a PyQt GUI.

anime database fandom one-piece onepiece onepiece-database onepiece-db pandas pyqt pyqt-examples pyqt-tutorial pyqt-web-crawler pyqt5 pyqt5-gui python python-crawler scrapy wiki-crawler wikia

Last synced: 07 Apr 2026

https://github.com/xishandong/music_web

A simple web-based music system.

flask flask-sqlalchemy python-crawler wangyiyunmusic

Last synced: 10 Mar 2026

https://github.com/sreejoy/crawlerfriend

A lightweight crawler that, given URLs and keywords, returns search results as HTML or as a dictionary.
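A sketch of the keyword-search idea, assuming the pages have already been fetched into a `{url: text}` dictionary. `search_pages` and `results_to_html` are hypothetical names, and whole-word, case-insensitive matching is an assumed choice; the project's own matching rules may differ.

```python
import html
import re
from typing import Dict, List


def search_pages(pages: Dict[str, str], keywords: List[str]) -> Dict[str, List[str]]:
    """Hypothetical helper: map each keyword to the URLs whose text contains it."""
    results: Dict[str, List[str]] = {}
    for kw in keywords:
        # Whole-word, case-insensitive match; re.escape guards special characters.
        pattern = re.compile(rf"\b{re.escape(kw)}\b", re.IGNORECASE)
        results[kw] = [url for url, text in pages.items() if pattern.search(text)]
    return results


def results_to_html(results: Dict[str, List[str]]) -> str:
    """Hypothetical helper: render the dictionary form as an HTML list."""
    parts = ["<ul>"]
    for kw, urls in results.items():
        links = ", ".join(
            f'<a href="{html.escape(u)}">{html.escape(u)}</a>' for u in urls
        ) or "no matches"
        parts.append(f"<li>{html.escape(kw)}: {links}</li>")
    parts.append("</ul>")
    return "\n".join(parts)
```

Keeping the dictionary as the primary result and rendering HTML from it means both output forms always agree.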

crawler python-crawler python-scraper python27 scrapper

Last synced: 12 Jun 2025

https://github.com/basemax/my-site-url-finders

A simple Python-based web crawler that extracts and filters URLs from a given website while avoiding unwanted paths and file types. The crawler follows links recursively within the same domain and provides a clean list of URLs found across the website.
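The same-domain link discovery described above can be sketched with the standard library. This is an illustration under stated assumptions, not the repository's code: it uses an iterative breadth-first queue in place of literal recursion, the skip lists `SKIP_EXTENSIONS` and `SKIP_PATH_PREFIXES` are hypothetical examples of "unwanted paths and file types", and `fetch` is an injected function that returns a page's HTML.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, List, Set
from urllib.parse import urldefrag, urljoin, urlparse

# Hypothetical filters standing in for "unwanted paths and file types".
SKIP_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".pdf", ".zip", ".css", ".js")
SKIP_PATH_PREFIXES = ("/wp-admin", "/login")


class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def is_wanted(url: str, domain: str) -> bool:
    parsed = urlparse(url)
    path = parsed.path.lower()
    return (
        parsed.scheme in ("http", "https")
        and parsed.netloc == domain
        and not path.endswith(SKIP_EXTENSIONS)
        and not path.startswith(SKIP_PATH_PREFIXES)
    )


def find_urls(start_url: str, fetch: Callable[[str], str], max_pages: int = 100) -> List[str]:
    """Hypothetical helper: breadth-first crawl limited to the start URL's domain."""
    domain = urlparse(start_url).netloc
    seen: Set[str] = {start_url}
    queue = deque([start_url])
    found: List[str] = []
    while queue and len(found) < max_pages:
        url = queue.popleft()
        found.append(url)
        parser = LinkParser()
        parser.feed(fetch(url))
        for href in parser.hrefs:
            # Resolve relative links and drop #fragments so duplicates collapse.
            link, _ = urldefrag(urljoin(url, href))
            if link not in seen and is_wanted(link, domain):
                seen.add(link)
                queue.append(link)
    return sorted(found)
```

A queue avoids Python's recursion limit on large sites, and the `seen` set guarantees each URL is fetched at most once.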

crawler find-url py py-crawler python python-crawler sitemap sitemap-generator url-find url-finder

Last synced: 15 Oct 2025

https://github.com/simonpierreboucher/crawler

A robust, modular web crawler built in Python for extracting and saving content from websites. This crawler is specifically designed to extract text content from both HTML and PDF files, saving them in a structured format with metadata.
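A minimal sketch of the HTML half of this pipeline: extract visible text and save it as a JSON record with metadata. PDF extraction would need a third-party library and is not shown. `TextExtractor`, `save_page_text`, the record fields, and the hashed-filename scheme are all assumptions for illustration, not the project's design.

```python
import hashlib
import json
from datetime import datetime, timezone
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Hypothetical extractor: collects visible text, skipping <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def save_page_text(url: str, page_html: str, out_dir: Path) -> Path:
    """Hypothetical helper: store extracted text plus metadata as one JSON file."""
    extractor = TextExtractor()
    extractor.feed(page_html)
    record = {
        "url": url,
        "content_type": "text/html",
        "fetched_at": datetime.now(timezone.utc).isoformat(),
        "text": "\n".join(extractor.chunks),
    }
    out_dir.mkdir(parents=True, exist_ok=True)
    # Name the file after a hash of the URL so any URL maps to a safe filename.
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    path = out_dir / f"{digest}.json"
    path.write_text(json.dumps(record, ensure_ascii=False, indent=2), encoding="utf-8")
    return path
```

Hashing the URL keeps filenames filesystem-safe and stable across runs, so re-crawling a page overwrites its previous record.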

concurrent-crawling content-extraction data-collection data-extraction-pipeline data-preservation-and-recovery data-scraping error-handling html-parsing http-requests metadata-storage modular-design pdf-text-extraction python-crawler rate-limiting structured-data-storage text-processing url-normalization web-crawling yaml-configuration