Projects in Awesome Lists tagged with python-crawler
A curated list of projects in awesome lists tagged with python-crawler.
https://github.com/xishandong/crawlproject
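A collection of Python crawler projects, from the basics to JavaScript reverse engineering, organized into basic, automation, advanced, and CAPTCHA sections. The examples cover major websites (xhs douyin weibo ins boss job, jd...), and you will learn about all aspects of crawling, anti-crawling, automation, and CAPTCHAs.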
captcha ddddocr javascript playwright python python-crawler reverse-engineering
Last synced: 06 Apr 2025
https://github.com/zhuozhuocrayon/pythoncrawler
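Python 3 web crawler notes and hands-on source code. Records complete study notes, reference materials, and common errors from learning Python crawling, with about 40 crawling examples and explanations of the approach, covering commonly used libraries such as urllib, requests, bs4, jsonpath, re, pytesseract, and PIL.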
Last synced: 13 Apr 2025
https://github.com/elliotxx/zhihu-crawler-people
A simple distributed crawler for Zhihu, plus data analysis
crawler python python-crawler spider web-crawler web-spider
Last synced: 13 Apr 2025
https://github.com/thewebscraping/tls-requests
TLS Requests is a powerful Python library for secure HTTP requests, offering a browser-like TLS client, fingerprinting, anti-bot page bypass, and high performance.
anti-bot anti-bot-detection anti-bot-page cf-clearance cloudflare-bypass cloudflare-scraper crawling-python python-crawler python-scraper python-spider python-tls-client python-web-crawler python-web-scraper python-web-scraping scraping-python tls-client web-crawler-python web-scraping-api web-scraping-python web-spider
Last synced: 09 Apr 2025
https://github.com/taseikyo/crawler
🐍 A collection of simple Python crawlers.
baidu-tieba bilibili bing crawler douban pixiv python-crawler python3 youku
Last synced: 19 Oct 2025
https://github.com/omkarcloud/botasaurus-starter
🚀 OFFICIAL STARTER TEMPLATE FOR BOTASAURUS SCRAPING FRAMEWORK 🤖
beautifulsoup crawler crawling crawling-framework crawling-python crawling-tool headless node-crawler python-crawler scraper scraping scraping-framework scraping-python scraping-tool selenium web-crawler web-crawling web-scraper web-scraping webscraping
Last synced: 23 Apr 2025
https://github.com/superbrucejia/dynamic-web-crawlering-python
This repo is mainly for dynamic web (Ajax Tech) crawling using Python, taking China's NSTL websites as an example.
dynamic-web-crawler dynamic-website nstl python python-crawler web-crawler-python web-crawling
Last synced: 21 Apr 2025
https://github.com/password123456/huntr-com-bug-bounties-collector
Keeps watching for new bug bounty (vulnerability) postings.
bug-bounty bug-bounty-crawling chrome-webdriver huntr python-crawler selenium-python
Last synced: 16 Apr 2025
https://github.com/xishandong/weibo_crawler
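Supports multiple crawling modes: downloading user albums, crawling user posts, crawling real-time search posts, and more. You are welcome to download it, use it, and contribute features.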
python-crawler weibo weibo-spider
Last synced: 09 Apr 2025
https://github.com/imarvinle/douban_movie_crawler
Douban movie crawler: movie information + reviews + short comments
douban douban-crawler douban-movie douban-movie-spider python-crawler python3 spider
Last synced: 28 Jul 2025
https://github.com/charles-hsiao/python-flightradar
Python airline/flights data crawler
airlines crawler flightradar flightradar24 flights python python-crawler python3
Last synced: 08 Jul 2025
https://github.com/xishandong/data_visualization
A simple web application for data visualization
data-visualization flask flask-sqlalchemy python-crawler
Last synced: 23 Apr 2025
https://github.com/eugen1j/aioscrapy
Python asynchronous library for web scraping
asyncio crawler python-crawler python37 webscraper
Last synced: 09 Oct 2025
https://github.com/basemax/stackoverflowcrawler
A web crawler that crawls the Stack Overflow website.
crawler crawler-detector crawler-python crawler-testing crawlers crawling python-crawler stackoverflow stackoverflow-analyse stackoverflow-answer stackoverflow-api stackoverflow-crawler stackoverflow-get stackoverflow-questions stackoverflow-tags test-crawler text-processing text-processor web-crawler web-crawler-python
Last synced: 15 Sep 2025
https://github.com/xishandong/music_player
A music player based on tkinter
python-crawler tkinter tkinter-python wangyiyunmusic
Last synced: 12 Sep 2025
https://github.com/omkarcloud/web-scraping-template
🚀 THIS WEB SCRAPING TEMPLATE PROVIDES YOU WITH A GREAT STARTING POINT WHEN CREATING WEB SCRAPING BOTS. 🤖
beautifulsoup crawler crawling crawling-framework crawling-python crawling-tool headless node-crawler python-crawler scraper scraping scraping-framework scraping-python scraping-tool selenium web-crawler web-crawling web-scraper web-scraping webscraping
Last synced: 24 Oct 2025
https://github.com/oldkingcone/pbandj
Pastebin crawler that crawls the URL https://pastebin.com/archive
crawler headless headless-chrome python python-crawler selenium-python selenium-webdriver
Last synced: 26 Sep 2025
https://github.com/schbenedikt/web-crawler
A simple web crawler using Python that stores the metadata of each web page in a database.
crawler database mariadb mysql python python-crawler web
Last synced: 14 Apr 2025
https://github.com/zebbern/reconx
🕷️ | ReconX is a Live-Website Crawler made to gather critical information with an option to take a picture of each site crawled!
crawler hacking information-gathering information-retrieval information-security livedata opsec osint osint-tool pentest python python-crawler search-engine security security-tools website website-crawler website-scraper website-security
Last synced: 03 Jul 2025
https://github.com/basemax/jadi-net-blog
This Python script is used to extract posts from a WordPress blog (https://jadi.net/) and save them in HTML format. The script fetches the RSS feed, parses the posts, and saves each post as an individual HTML file.
blog-copier copier crawler crawler-python crawlers jadi-blog jadi-clone jadi-net-blog jadi-net-clone jadinet-blog py python python-crawler wordpress wp
Last synced: 13 Oct 2025
https://github.com/xishandong/music_web
A simple music web system
flask flask-sqlalchemy python-crawler wangyiyunmusic
Last synced: 31 Oct 2025
https://github.com/omkarcloud/multiple-account-generation-template
🚀 THIS WEB SCRAPING TEMPLATE PROVIDES YOU WITH A GREAT STARTING POINT WHEN CREATING MULTIPLE ACCOUNTS ON A WEBSITE. 🤖
beautifulsoup crawler crawling crawling-framework crawling-python crawling-tool headless node-crawler python-crawler scraper scraping scraping-framework scraping-python scraping-tool selenium web-crawler web-crawling web-scraper web-scraping webscraping
Last synced: 21 Feb 2025
https://github.com/iampukar/url_crawler
A Python library to crawl the details of a URL.
page-crawler python-crawler python-webcrawler url-crawler webpage-crawler
Last synced: 12 Apr 2025
https://github.com/yjg30737/onepiece-database
View One Piece character info from the ONE PIECE WIKI (FANDOM) with a PyQt GUI
anime database fandom one-piece onepiece onepiece-database onepiece-db pandas pyqt pyqt-examples pyqt-tutorial pyqt-web-crawler pyqt5 pyqt5-gui python python-crawler scrapy wiki-crawler wikia
Last synced: 20 Jul 2025
https://github.com/basemax/my-site-url-finders
A simple Python-based web crawler that extracts and filters URLs from a given website while avoiding unwanted paths and file types. The crawler follows links recursively within the same domain and provides a clean list of URLs found across the website.
crawler find-url py py-crawler python python-crawler sitemap sitemap-generator url-find url-finder
Last synced: 15 Oct 2025
https://github.com/sreejoy/crawlerfriend
A lightweight crawler that, given URLs and keywords, returns search results in HTML form or as a dictionary.
crawler python-crawler python-scraper python27 scrapper
Last synced: 12 Jun 2025
https://github.com/moe131/webcrawler
Python web crawler designed to scrape websites
crawler crawling-python python python-crawler scraping simhash web-crawler
Last synced: 09 Apr 2025
https://github.com/simonpierreboucher/crawler
A robust, modular web crawler built in Python for extracting and saving content from websites. This crawler is specifically designed to extract text content from both HTML and PDF files, saving them in a structured format with metadata.
concurrent-crawling content-extraction data-collection data-extraction-pipeline data-preservation-and-recovery data-scraping error-handling html-parsing http-requests metadata-storage modular-design pdf-text-extraction python-crawler rate-limiting structured-data-storage text-processing url-normalization web-crawling yaml-configuration
Last synced: 30 Mar 2025
https://github.com/zhanziyuan/webdownloader
Download elements from the specified website.
crawler downloader image image-downloader python python-crawler web
Last synced: 25 Feb 2025
https://github.com/viper373/gsc-kit
🚀 GSC-Kit aims to automate data extraction from Google Search Console (GSC), helping to efficiently collect and organize website performance metrics.
chrome-extension google-console gsc-script javascript python python-crawler
Last synced: 04 Apr 2025