An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with python-crawler

A curated list of projects in awesome lists tagged with python-crawler.

https://github.com/xishandong/crawlproject

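A collection of Python crawler projects, from the basics to JS reverse engineering, organized into basics, automation, advanced, and CAPTCHA sections. The examples cover major websites (xhs douyin weibo ins boss job, jd...), and you will learn about crawling, anti-crawling measures, automation, and CAPTCHAs.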

captcha ddddocr javascript playwright python python-crawler reverse-engineering

Last synced: 06 Apr 2025

https://github.com/zhuozhuocrayon/pythoncrawler

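Python 3 web crawler notes and hands-on source code. Records the full set of notes from learning Python crawling, reference materials, and common errors, plus about 40 crawling examples with walkthroughs of the approach, covering commonly used libraries such as urllib, requests, bs4, jsonpath, re, pytesseract, and PIL.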

python-crawler python3

Last synced: 13 Apr 2025

https://github.com/elliotxx/zhihu-crawler-people

A simple distributed crawler for zhihu && data analysis

crawler python python-crawler spider web-crawler web-spider

Last synced: 13 Apr 2025

https://github.com/taseikyo/crawler

🐍 A collection of simple Python crawlers.

baidu-tieba bilibili bing crawler douban pixiv python-crawler python3 youku

Last synced: 19 Oct 2025

https://github.com/superbrucejia/dynamic-web-crawlering-python

This repo is mainly for dynamic web (Ajax Tech) crawling using Python, taking China's NSTL websites as an example.

dynamic-web-crawler dynamic-website nstl python python-crawler web-crawler-python web-crawling

Last synced: 21 Apr 2025

https://github.com/xishandong/weibo_crawler

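Supports multiple crawling modes: downloading user albums, crawling user posts, crawling real-time search posts, and more. You are welcome to download it, use it, and contribute features.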

python-crawler weibo weibo-spider

Last synced: 09 Apr 2025

https://github.com/eugen1j/aioscrapy

Python asynchronous library for web scraping

asyncio crawler python-crawler python37 webscraper

Last synced: 09 Oct 2025

https://github.com/xishandong/music_player

A music player built with tkinter

python-crawler tkinter tkinter-python wangyiyunmusic

Last synced: 12 Sep 2025

https://github.com/oldkingcone/pbandj

PasteBin Crawler, crawls the url https://pastebin.com/archive

crawler headless headless-chrome python python-crawler selenium-python selenium-webdriver

Last synced: 26 Sep 2025

https://github.com/schbenedikt/web-crawler

A simple web crawler using Python that stores the metadata of each web page in a database.

crawler database mariadb mysql python python-crawler web

Last synced: 14 Apr 2025

https://github.com/zebbern/reconx

🕷️ | ReconX is a Live-Website Crawler made to gather critical information with an option to take a picture of each site crawled!

crawler hacking information-gathering information-retrieval information-security livedata opsec osint osint-tool pentest python python-crawler search-engine security security-tools website website-crawler website-scraper website-security

Last synced: 03 Jul 2025

https://github.com/basemax/jadi-net-blog

This Python script is used to extract posts from a WordPress blog (https://jadi.net/) and save them in HTML format. The script fetches the RSS feed, parses the posts, and saves each post as an individual HTML file.

blog-copier copier crawler crawler-python crawlers jadi-blog jadi-clone jadi-net-blog jadi-net-clone jadinet-blog py python python-crawler wordpress wp

Last synced: 13 Oct 2025
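
The RSS-to-HTML flow described for jadi-net-blog (fetch the feed, parse the posts, save each as its own file) can be sketched with the standard library alone. This is not the repository's code: the function names `parse_rss_items`, `filename_for`, and `save_posts` are hypothetical, and the sketch assumes the feed is RSS 2.0 with the `content:encoded` element that WordPress feeds commonly include. Fetching is left out so the example runs offline.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import Path

# Namespace commonly used by WordPress feeds for the full post body (assumption
# about the feed; the sketch falls back to <description> when it is absent).
CONTENT_NS = {"content": "http://purl.org/rss/1.0/modules/content/"}


def parse_rss_items(xml_text):
    """Return a list of {"title", "link", "html"} dicts from an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    posts = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        # Prefer the full body; fall back to the summary.
        html = item.findtext("content:encoded", default=None, namespaces=CONTENT_NS)
        if html is None:
            html = item.findtext("description", default="")
        posts.append({"title": title, "link": link, "html": html})
    return posts


def filename_for(title):
    """Turn a post title into a file name such as 'hello-world.html'."""
    slug = re.sub(r"[^\w]+", "-", title.lower()).strip("-")
    return (slug or "untitled") + ".html"


def save_posts(posts, out_dir):
    """Write each post body to its own HTML file inside out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for post in posts:
        (out / filename_for(post["title"])).write_text(post["html"], encoding="utf-8")
```

To use it against a live feed, pass the downloaded feed text to `parse_rss_items` and hand the result to `save_posts`; posts with identical titles would overwrite each other in this sketch.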

https://github.com/iampukar/url_crawler

A Python library to crawl the details of a URL.

page-crawler python-crawler python-webcrawler url-crawler webpage-crawler

Last synced: 12 Apr 2025

https://github.com/basemax/my-site-url-finders

A simple Python-based web crawler that extracts and filters URLs from a given website while avoiding unwanted paths and file types. The crawler follows links recursively within the same domain and provides a clean list of URLs found across the website.

crawler find-url py py-crawler python python-crawler sitemap sitemap-generator url-find url-finder

Last synced: 15 Oct 2025
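
The same-domain URL discovery described for my-site-url-finders can be sketched as follows. This is an illustration, not the repository's implementation: `LinkParser`, `is_wanted`, `crawl`, and the `SKIP_EXTENSIONS` and `SKIP_PATHS` lists are hypothetical names and values, and the sketch walks links with a queue rather than literal recursion. The `fetch` callable is injected so the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse

# Hypothetical filter lists; a real crawler would tune these per site.
SKIP_EXTENSIONS = (".jpg", ".png", ".gif", ".pdf", ".zip", ".css", ".js")
SKIP_PATHS = ("/wp-admin", "/tag/")


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def is_wanted(url, netloc):
    """Keep http(s) URLs on the same host that pass the path and type filters."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or parts.netloc != netloc:
        return False
    path = parts.path.lower()
    if path.endswith(SKIP_EXTENSIONS):
        return False
    return not any(path.startswith(prefix) for prefix in SKIP_PATHS)


def crawl(start_url, fetch, max_pages=100):
    """Return the sorted URLs reachable from start_url within its own host.

    fetch(url) must return the page's HTML as a string.
    """
    netloc = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    fetched = 0
    while queue and fetched < max_pages:
        page = queue.popleft()
        fetched += 1
        parser = LinkParser()
        parser.feed(fetch(page))
        for href in parser.hrefs:
            # Resolve relative links and drop #fragments before comparing.
            url = urldefrag(urljoin(page, href)).url
            if url not in seen and is_wanted(url, netloc):
                seen.add(url)
                queue.append(url)
    return sorted(seen)
```

For a live site, `fetch` could wrap `urllib.request.urlopen` and decode the response body; the `seen` set is what keeps pages that link to each other from being visited twice.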

https://github.com/sreejoy/crawlerfriend

A lightweight crawler that, given URLs and keywords, returns search results as HTML or as a dictionary.

crawler python-crawler python-scraper python27 scrapper

Last synced: 12 Jun 2025

https://github.com/moe131/webcrawler

Python web crawler designed to scrape websites

crawler crawling-python python python-crawler scraping simhash web-crawler

Last synced: 09 Apr 2025

https://github.com/simonpierreboucher/crawler

A robust, modular web crawler built in Python for extracting and saving content from websites. This crawler is specifically designed to extract text content from both HTML and PDF files, saving them in a structured format with metadata.

concurrent-crawling content-extraction data-collection data-extraction-pipeline data-preservation-and-recovery data-scraping error-handling html-parsing http-requests metadata-storage modular-design pdf-text-extraction python-crawler rate-limiting structured-data-storage text-processing url-normalization web-crawling yaml-configuration

Last synced: 30 Mar 2025

https://github.com/zhanziyuan/webdownloader

Download elements from the specified website.

crawler downloader image image-downloader python python-crawler web

Last synced: 25 Feb 2025

https://github.com/viper373/gsc-kit

🚀 GSC-Kit aims to automate data extraction from Google Search Console (GSC), helping you efficiently collect and organize website performance metrics.

chrome-extension google-console gsc-script javascript python python-crawler

Last synced: 04 Apr 2025