Projects in Awesome Lists tagged with crawler-python
A curated list of projects in awesome lists tagged with crawler-python.
https://github.com/lorey/mlscraper
🤖 Scrape data from HTML websites automatically by just providing examples
crawler crawler-python crawling extraction-engine html machine-learning scraper scraping
Last synced: 15 May 2025
https://github.com/wonderfulsuccess/weixin_crawler
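A WeChat official account crawler that has run stably for 4 years. Based on Python and Vue.js. WeChat official account collection, Python crawler, official account crawler, official account backup.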
crawler-python python vuejs weixin-crawler
Last synced: 26 Mar 2025
https://github.com/6677-ai/tap4-ai-crawler
The crawler open-sourced by tap4.ai
aitoolkit aitools crawler crawler-engine crawler-python
Last synced: 16 May 2025
https://github.com/amerkurev/scrapper
Web scraper with a simple REST API, living in Docker and using a headless browser and Readability.js for parsing.
crawler crawler-python crawling headless readability scraper scraping web-parsers web-parsing web-scraping
Last synced: 08 May 2025
https://github.com/nuhmanpk/webscrapper
Powerful Telegram bot for web scraping and crawling. Fast, easy, and loved by thousands!
beautifulsoup4 crawler crawler-engine crawler-python hacktoberfest hacktoberfest-accepted hacktoberfest2023 pyrogram pyrogram-bot requests scraper scraping selenium telegram telegram-bot web-scraping webscraping webscrapper webscrapping webscrapping-python
Last synced: 12 Apr 2025
https://github.com/nuhmanpk/WebScrapper
Simple and powerful all-in-one Telegram bot to scrape/crawl webpages using Requests, html5lib, and BeautifulSoup
beautifulsoup4 crawler crawler-engine crawler-python hacktoberfest hacktoberfest-accepted hacktoberfest2023 pyrogram pyrogram-bot requests scraper scraping selenium telegram telegram-bot web-scraping webscraping webscrapper webscrapping webscrapping-python
Last synced: 29 Nov 2024
https://github.com/wwwwwydev/crawlist
A universal solution for crawling lists on web pages.
crawl crawler crawler-python crawling-python crawlist python reptile
Last synced: 01 May 2025
https://github.com/jimouchen/bing-chat-fxxk
New Bing API via Playwright
bing-api crawler crawler-python gpt
Last synced: 16 Jun 2025
https://github.com/viper373/jd-comments
Crawls JD.com product review data
crawler-python data-analysis python spider
Last synced: 15 Apr 2025
https://github.com/basemax/stackoverflowcrawler
A web crawler that crawls the Stack Overflow website.
crawler crawler-detector crawler-python crawler-testing crawlers crawling python-crawler stackoverflow stackoverflow-analyse stackoverflow-answer stackoverflow-api stackoverflow-crawler stackoverflow-get stackoverflow-questions stackoverflow-tags test-crawler text-processing text-processor web-crawler web-crawler-python
Last synced: 05 May 2025
https://github.com/michaelradu/web-crawler
A Web Crawler developed in Python.
crawler crawler-python crawlers python python-3 python-script python3 script scripting scripting-language scripts web web-crawler web-crawler-python web-crawlers web-crawling webcrawl webcrawler webcrawling
Last synced: 01 Dec 2024
https://github.com/xunzhuo/airspider
A Fast and Light Python Spider Framework 🕷️
asynchronous crawler crawler-python distributed python3 redis spider spider-framework web
Last synced: 23 Mar 2025
https://github.com/gabfl/sitecrawl
Simple Python module to crawl a website and extract URLs
crawl crawler crawler-python crawling-sites
Last synced: 10 Apr 2025
https://github.com/zebbern/dezcrwl
🕷️ | dezcrwl is a website history crawler that gathers hidden information, checks extracted .js endpoints for vulnerabilities, and much more!
crawl crawler crawler-python crawlers ctf-tools hacking historical-data information information-gathering information-retrieval information-security infosec osint osint-tool pentesting-tools python reconnaissance tool web website
Last synced: 14 Apr 2025
https://github.com/itszeeshan/crawlinit
A web crawler written in Python 3
appsec bugbounty bugbounty-tool bugbountytips crawler crawler-python enumeration infosec python recon reconnaissance scanner url web
Last synced: 13 Jun 2025
https://github.com/basemax/instagramseleniumhashtagimagepython
Instagram Selenium Python: A Selenium-based crawler to extract images from specific hashtags on Instagram.
crawler crawler-python crawlers instagram python python-selenium selenium selenium-python
Last synced: 03 Apr 2025
https://github.com/chenmozhijin/mediawikiextractor
A Python script for extracting data from a MediaWiki website and saving it as JSON.
crawler crawler-python crawling extractor json mediawiki python regex web-crawler
Last synced: 15 Apr 2025
https://github.com/1970mr/link-crawler
Web Link Crawler: A Python script to crawl websites and collect links based on a regex pattern. Efficient and customizable.
clawler crawler crawler-python link-crawler link-crawler-python link-scraper link-scraper-python links python scraper scraper-python website-crawler website-scraper
Last synced: 24 Mar 2025
https://github.com/truethari/fcrawler
Python application that can be used to copy files of a given file type from a folder directory.
copy copy-files crawl crawler crawler-python file files
Last synced: 25 Feb 2025
https://github.com/viper373/lol-deepwinpredictor
Predicts League of Legends match win rates using a two-layer bidirectional LSTM with an attention mechanism.
attention-mechanism crawler-python deep-learning flask lol lstm mongodb prediction python rocketmq spider
Last synced: 30 Mar 2025
https://github.com/nb-group/the-earth
Real-time top-down view of the Earth!
crawler-python remote-sensing-satellite satellite-images web-page
Last synced: 13 Mar 2025
https://github.com/noarche/crawler
URL crawler spider
crawl crawler-python noisy python-script python3 spider
Last synced: 25 Mar 2025
https://github.com/viper373/163-buff
Crawls CS:GO weapon skin trading data from the NetEase BUFF platform
163 arima crawler-python csgo data-analysis prediction python
Last synced: 30 Mar 2025
https://github.com/sinipelto/repo-license-crawler
Collects and summarizes license information on Python and NPM packages into output files.
crawler crawler-python license license-checker license-checking license-crawler license-management licenses licensing nodejs npm npm-license-crawler npm-license-tracker npm-licenses python python-script python3
Last synced: 30 Mar 2025
https://github.com/opda0887/bahamut-crawler-to-gmail
Idea: use a Python crawler to fetch the latest forum posts from the Bahamut boards, and send them to myself via Gmail.
Last synced: 21 Mar 2025
https://github.com/dan3002/tiktok-crawler
This is a simple TikTok crawler that can be used to download videos from TikTok. It uses the TikTok API to get the video URL and then downloads the video using the requests library. It can download videos from multiple hashtags or by sound.
crawler-python data-engineer playwright python tiktok
Last synced: 03 Mar 2025
https://github.com/h4r5h1t/crawlytics
A Python-based web crawling tool for data extraction and security analysis that supports various arguments for efficient crawling and outputs results in JSON format.
appsec crawler crawler-python mechanicalsoup security security-tools webcrawler
Last synced: 19 Feb 2025
https://github.com/captain-woof/zhi-zhu
Zhi-Zhu is a multithreaded spidering script that recursively searches base webpages, and all URLs appearing in them, for specific words (regex).
crawler crawler-python crawling-python python3
Last synced: 20 Feb 2025
https://github.com/dhchenx/quick-crawler
A toolkit for quickly performing crawler functions
Last synced: 23 Mar 2025
https://github.com/aj-tap/cyclops
Python scripts including network enumeration, scanning, tracerouting, probing, and simple attacks like SSH brute forcing.
crawler-python enumeration osint-python webapplication
Last synced: 21 Feb 2025
https://github.com/saketh7382/smartcrawler
Package for crawling items from webpages and storing them as a JSON file
crawler crawler-python open-source pip python3 scraper selenium selenium-webdriver webdriver-manager
Last synced: 28 Mar 2025
https://github.com/raphaelalmeidamartins/python-tech-news
Python data science project developed at the end of Unit 35 (Computer Science Module) of Trybe's Web Development course
crawler crawler-python data-science pytest python
Last synced: 12 Mar 2025
https://github.com/rflcnunes/crawler_email_py
In this project I'm creating a web crawler to check email boxes and handle incoming messages.
aws-bucket aws-bucket-s3 aws-s3 crawler crawler-python email python rabbitmq
Last synced: 26 Mar 2025
https://github.com/probro27/vision-search
A search engine built from scratch, including a crawler, an indexer, and a ranker. Please check out the code here :)
crawler-python express parser search-engine
Last synced: 23 Feb 2025
https://github.com/pawsanie/steam_statistics_etl
This pipeline can be used to collect statistical information about all games distributed through the Steam platform.
crawler-python data-crawler etl etl-pipeline extract-transform-load games luigi python python-3 python3 scraper scraping scraping-websites statistics steam steam-games steam-store steam-web-api
Last synced: 22 Feb 2025
https://github.com/filsuin/linkedin-crawler
A Python tool for automating job searches on LinkedIn based on user-defined keywords.
crawler crawler-python linkedin offer
Last synced: 16 Jun 2025
https://github.com/floressek/web-crawler-gc
Uses a Python web crawler to gather academic schedule data and import it into Google Calendar
Last synced: 05 Apr 2025
https://github.com/basemax/jadi-net-blog
This Python script is used to extract posts from a WordPress blog (https://jadi.net/) and save them in HTML format. The script fetches the RSS feed, parses the posts, and saves each post as an individual HTML file.
blog-copier copier crawler crawler-python crawlers jadi-blog jadi-clone jadi-net-blog jadi-net-clone jadinet-blog py python python-crawler wordpress wp
Last synced: 30 Jan 2025
https://github.com/viper373/lol-dataanalytics
Tencent Games: comprehensive analysis and prediction of League of Legends tournament data from 2020, 2021, and 2022
crawler-python data-analysis jupyter-notebook lol python spider
Last synced: 30 Mar 2025
https://github.com/zepolimer/python-crawler
Python crawler - implementing Google and Bing browsers
crawler-python playwright playwright-python python
Last synced: 05 Apr 2025
https://github.com/zezs/ice-breaker-powered-by-llm
Ice Breaker is a comprehensive full-stack app leveraging generative AI and LangChain to find LinkedIn profiles and generate engaging ice breakers. LangChain ReAct agents ensure accurate URL retrieval and JSON cleaning, identifying a summary, facts, topics, and ice breakers. The frontend is built with HTML/CSS, and Flask powers the backend.
agent chainofthought chains crawler-python css flask generative-ai html langchain langsmith llm prompt-engineering proxy-scraper python react tools web
Last synced: 09 Apr 2025
https://github.com/waived/google-drive-crawler
Proxy-based crawler to expose public (shared) Google Drive links
crawler crawler-python file-crawler google-drive-api shared-folders web-spider
Last synced: 27 Mar 2025
https://github.com/viper373/chengdu-emotion
Text clustering and sentiment analysis of comments on the song 《成都》 (Chengdu) on NetEase Cloud Music
163music chengdu crawler-python emotion-analysis python text-classification text-clustering
Last synced: 15 Jun 2025
https://github.com/eesunmoon/spam_review_detection
[Project] Capstone Design - Spam Detection
crawler-python data-analysis konlpy natural-language-processing python sorting-algorithms spam-detection
Last synced: 04 Mar 2025
https://github.com/stylepatrick/point-staking-monitor
Point staking monitor with Telegram notifications, implemented with a web crawler. Can be used with every cryptocurrency explorer.
chromdriver crawler-python cryptocurrency docker point-network python3
Last synced: 21 Mar 2025
https://github.com/moj124/web_crawler
web_crawler is an asynchronous gevent link crawler that maps all associated local links, constrained by the input webpage URL.
crawler crawler-python links-spider
Last synced: 13 Mar 2025