An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with crawler-python

A curated list of projects in awesome lists tagged with crawler-python.

https://github.com/lorey/mlscraper

🤖 Scrape data from HTML websites automatically by just providing examples

crawler crawler-python crawling extraction-engine html machine-learning scraper scraping

Last synced: 15 May 2025

https://github.com/wonderfulsuccess/weixin_crawler

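A WeChat official account crawler that has run stably for 4 years. Based on Python and Vue.js. WeChat official account collection, Python crawler, official account crawler, official account backup.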

crawler-python python vuejs weixin-crawler

Last synced: 26 Mar 2025

https://github.com/6677-ai/tap4-ai-crawler

The crawler open-sourced by tap4.ai

aitoolkit aitools crawler crawler-engine crawler-python

Last synced: 16 May 2025

https://github.com/amerkurev/scrapper

Web scraper with a simple REST API living in Docker and using a Headless browser and Readability.js for parsing.

crawler crawler-python crawling headless readability scraper scraping web-parsers web-parsing web-scraping

Last synced: 08 May 2025

https://github.com/wwwwwydev/crawlist

A universal solution for crawling lists on web pages.

crawl crawler crawler-python crawling-python crawlist python reptile

Last synced: 01 May 2025

https://github.com/jimouchen/bing-chat-fxxk

New Bing API via Playwright

bing-api crawler crawler-python gpt

Last synced: 16 Jun 2025

https://github.com/viper373/jd-comments

Crawls product review data from JD.com.

crawler-python data-analysis python spider

Last synced: 15 Apr 2025

https://github.com/xunzhuo/airspider

A Fast and Light Python Spider Framework 🕷️

asynchronous crawler crawler-python distributed python3 redis spider spider-framework web

Last synced: 23 Mar 2025

https://github.com/gabfl/sitecrawl

Simple Python module to crawl a website and extract URLs

crawl crawler crawler-python crawling-sites

Last synced: 10 Apr 2025

https://github.com/zebbern/dezcrwl

🕷️ | dezcrwl is a website history crawler that gathers hidden information, checks extracted .js endpoints for vulnerabilities, and much more!

crawl crawler crawler-python crawlers ctf-tools hacking historical-data information information-gathering information-retrieval information-security infosec osint osint-tool pentesting-tools python reconnaissance tool web website

Last synced: 14 Apr 2025

https://github.com/basemax/instagramseleniumhashtagimagepython

Instagram Selenium Python: A Selenium-based crawler to extract images from specific hashtags on Instagram.

crawler crawler-python crawlers instagram python python-selenium selenium selenium-python

Last synced: 03 Apr 2025

https://github.com/chenmozhijin/mediawikiextractor

A Python script for extracting data from a MediaWiki website and saving it as JSON.

crawler crawler-python crawling extractor json mediawiki python regex web-crawler

Last synced: 15 Apr 2025

https://github.com/1970mr/link-crawler

Web Link Crawler: A Python script to crawl websites and collect links based on a regex pattern. Efficient and customizable.

clawler crawler crawler-python link-crawler link-crawler-python link-scraper link-scraper-python links python scraper scraper-python website-crawler website-scraper

Last synced: 24 Mar 2025

https://github.com/truethari/fcrawler

Python application that copies files of a given file type from a directory.

copy copy-files crawl crawler crawler-python file files

Last synced: 25 Feb 2025

https://github.com/viper373/lol-deepwinpredictor

Predicts League of Legends match win rates using a two-layer bidirectional LSTM with an attention mechanism.

attention-mechanism crawler-python deep-learning flask lol lstm mongodb prediction python rocketmq spider

Last synced: 30 Mar 2025

https://github.com/nb-group/the-earth

Real-time viewing of the Earth from above!

crawler-python remote-sensing-satellite satellite-images web-page

Last synced: 13 Mar 2025

https://github.com/viper373/163-buff

Crawls CS:GO weapon skin trading data from the NetEase BUFF platform.

163 arima crawler-python csgo data-analysis prediction python

Last synced: 30 Mar 2025

https://github.com/opda0887/bahamut-crawler-to-gmail

Idea: use a Python crawler to fetch the latest forum posts from Bahamut boards and send them to myself via Gmail.

crawler crawler-python

Last synced: 21 Mar 2025

https://github.com/dan3002/tiktok-crawler

This is a simple TikTok crawler that can be used to download videos from TikTok. It uses the TikTok API to get the video URL and then downloads the video using the requests library. It can download videos from multiple hashtags or by sound.

crawler-python data-engineer playwright python tiktok

Last synced: 03 Mar 2025

https://github.com/h4r5h1t/crawlytics

A Python-based web crawling tool for data extraction and security analysis that supports various arguments for efficient crawling and outputs results in JSON format.

appsec crawler crawler-python mechanicalsoup security security-tools webcrawler

Last synced: 19 Feb 2025

https://github.com/captain-woof/zhi-zhu

Zhi-Zhu is a multithreaded spidering script that recursively searches base webpages, and all URLs appearing in them, for specific (regex) words.

crawler crawler-python crawling-python python3

Last synced: 20 Feb 2025

https://github.com/dhchenx/quick-crawler

A toolkit for quickly performing crawler functions

crawler crawler-python

Last synced: 23 Mar 2025

https://github.com/aj-tap/cyclops

Python scripts for network enumeration, scanning, tracerouting, probing, and simple attacks such as SSH brute forcing.

crawler-python enumeration osint-python webapplication

Last synced: 21 Feb 2025

https://github.com/saketh7382/smartcrawler

Package for crawling items from webpages and storing them as a JSON file

crawler crawler-python open-source pip python3 scraper selenium selenium-webdriver webdriver-manager

Last synced: 28 Mar 2025

https://github.com/raphaelalmeidamartins/python-tech-news

Python data science project developed at the end of Unit 35 (Computer Science Module) of Trybe's Web Development course

crawler crawler-python data-science pytest python

Last synced: 12 Mar 2025

https://github.com/rflcnunes/crawler_email_py

In this project I'm creating a web crawler to check email boxes and handle incoming messages.

aws-bucket aws-bucket-s3 aws-s3 crawler crawler-python email python rabbitmq

Last synced: 26 Mar 2025

https://github.com/probro27/vision-search

A search engine built from scratch, including a crawler, an indexer, and a ranker. Please check out the code here :)

crawler-python express parser search-engine

Last synced: 23 Feb 2025

https://github.com/pawsanie/steam_statistics_etl

This pipeline can be used to collect statistical information about all games distributed through the Steam platform.

crawler-python data-crawler etl etl-pipeline extract-transform-load games luigi python python-3 python3 scraper scraping scraping-websites statistics steam steam-games steam-store steam-web-api

Last synced: 22 Feb 2025

https://github.com/filsuin/linkedin-crawler

A Python tool for automating job searches on LinkedIn based on user-defined keywords.

crawler crawler-python linkedin offer

Last synced: 16 Jun 2025

https://github.com/floressek/web-crawler-gc

Uses a Python web crawler to gather academic schedule data and import it into Google Calendar

crawler-python

Last synced: 05 Apr 2025

https://github.com/basemax/jadi-net-blog

This Python script is used to extract posts from a WordPress blog (https://jadi.net/) and save them in HTML format. The script fetches the RSS feed, parses the posts, and saves each post as an individual HTML file.

blog-copier copier crawler crawler-python crawlers jadi-blog jadi-clone jadi-net-blog jadi-net-clone jadinet-blog py python python-crawler wordpress wp

Last synced: 30 Jan 2025

https://github.com/viper373/lol-dataanalytics

Tencent Games: comprehensive analysis and prediction of League of Legends tournament data for 2020, 2021, and 2022

crawler-python data-analysis jupyter-notebook lol python spider

Last synced: 30 Mar 2025

https://github.com/zepolimer/python-crawler

Python crawler - implementing Google and Bing browsers

crawler-python playwright playwright-python python

Last synced: 05 Apr 2025

https://github.com/zezs/ice-breaker-powered-by-llm

Ice Breaker is a comprehensive full-stack app leveraging generative AI and LangChain to find LinkedIn profiles and generate engaging ice breakers. LangChain ReAct agents ensure accurate URL retrieval and JSON cleaning, identifying a summary, facts, topics, and ice breakers. The frontend is built with HTML/CSS, and Flask powers the backend.

agent chainofthought chains crawler-python css flask generative-ai html langchain langsmith llm prompt-engineering proxy-scraper python react tools web

Last synced: 09 Apr 2025

https://github.com/waived/google-drive-crawler

Proxy-based crawler to expose public (shared) Google Drive links

crawler crawler-python file-crawler google-drive-api shared-folders web-spider

Last synced: 27 Mar 2025

https://github.com/viper373/chengdu-emotion

Text clustering and sentiment analysis of comments on the song "Chengdu" (成都) on NetEase Cloud Music

163music chengdu crawler-python emotion-analysis python text-classification text-clustering

Last synced: 15 Jun 2025

https://github.com/stylepatrick/point-staking-monitor

Point Staking Monitor with Telegram notifications, implemented with a web crawler. Can be used for every cryptocurrency explorer.

chromdriver crawler-python cryptocurrency docker point-network python3

Last synced: 21 Mar 2025

https://github.com/moj124/web_crawler

web_crawler is an asynchronous gevent link crawler that maps all associated local links constrained by the input webpage URL.

crawler crawler-python links-spider

Last synced: 13 Mar 2025