Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2025-01-10 00:06:02 UTC
https://github.com/josepedrodias/naivebot
An attempt to mimic Googlebot behaviour in Node.js with Nightmare.js
crawler googlebot nightmarejs nodejs robots
Last synced: 20 Nov 2024
https://github.com/spider-rs/spider-clients
Clients to use with the hosted spider service - spider.cloud
ai ai-agents ai-scraping crawler html-to-markdown llm-webcrawler scraper spider web-scraping
Last synced: 05 Nov 2024
https://github.com/brianmacintosh/wikicrawler
Sandbox project for manipulating Wikimedia wikis
c-sharp crawler mediawiki-bot wikipedia-bot
Last synced: 30 Dec 2024
https://github.com/appliedsoul/headless-screenshot
High-level library for taking screenshots of websites, based on headless Chrome (Puppeteer)
crawler headless-chromium javascript nodejs scrapper screenshot testing
Last synced: 18 Nov 2024
https://github.com/beckkramer/puppeteer-traverse
Puppeteer utility to easily run a function you define per route on a set of routes.
crawler crawling nodejs puppeteer
Last synced: 18 Nov 2024
https://github.com/yuchenq/comp90055-project
This is the latest version of my project for Comp90055.
couchdb crawler data-visualization python3 textblob tweepy
Last synced: 18 Nov 2024
https://github.com/thejoin95/free-proxies.info
API service for getting anonymous and non-anonymous proxies, filterable by latency, country, update time, and more
api crawler http-proxy proxy proxy-list python scraper
Last synced: 06 Jan 2025
https://github.com/kehiy/prawler
Pactus P2P Network Crawler
crawler crawling metrics networking p2p pactus
Last synced: 28 Dec 2024
https://github.com/eneax/web-crawler
A web crawler built in Node.js
crawler javascript nodejs web-crawler
Last synced: 22 Dec 2024
https://github.com/amirsorouri00/crawler
Page-Rank: public Python 2 projects which have been converted to Python 3.
Last synced: 18 Nov 2024
https://github.com/mg98/ipfs-replicate
Replicate IPFS' distributed data structure locally, based on network traces.
crawler dag ipfs redisgraph scraper
Last synced: 30 Nov 2024
https://github.com/zigai/crawlwright
Web crawling framework powered by Playwright
crawler crawling playwright python scraping wrighter
Last synced: 07 Dec 2024
https://github.com/bujosa/aldebaran
Example of using App Engine with Python 3, ThreadPool, and web scraping
appengine crawler flask gcp python3 thread-pool
Last synced: 20 Nov 2024
https://github.com/xiangronglin/novel2go
Android app that creates a PDF from a website and sends it to your Kindle
android crawler jetpack kotlin pdf-generation readability
Last synced: 21 Dec 2024
https://github.com/semoal/pythoncrawler
Python crawler with XML-RPC & BeautifulSoup
beautifulsoup crawler python wordpress xmlrpc
Last synced: 15 Dec 2024
https://github.com/tatamiya/gas-new-books-crawler
Crawls new book information from 版元ドットコム (https://www.hanmoto.com/)
Last synced: 20 Nov 2024
https://github.com/bramtenhove/issue-crawler
Crawls Drupal issues and keeps stats
Last synced: 29 Dec 2024
https://github.com/yukihirai0505/streamcrawler
akka stream × crawler
akka-streams crawler elasticsearch instagram sbt scala
Last synced: 14 Nov 2024
https://github.com/robin98sun/structured-web-data-crawler
crawler multi-thread structured-web-data
Last synced: 22 Nov 2024
https://github.com/bwh1270/allrecipes-scraper
crawler food-computing scraper scraping scrapy
Last synced: 24 Nov 2024
https://github.com/sanskar107/c-subject-predictor
Predicts the topic of a piece of code.
Last synced: 20 Nov 2024
https://github.com/lin-jun-xiang/python-crawler
Uses CloudScraper, Requests, APIs, threads, async... to scrape data
async cloudscraper crawler multithreading python requests scraper selenium
Last synced: 21 Dec 2024
https://github.com/liebki/githubnet
This library allows you to retrieve several things from GitHub, things like trending repositories, profiles of users, the repositories of users and related information.
crawler crawling github github-trending htmlagilitypack microsoft
Last synced: 24 Nov 2024
https://github.com/dpbm/opendatasus-crawler
A simple crawler using puppeteer
brazil chrome crawler csv datasus nodejs opendatasus pdf puppeteer screenshot sus
Last synced: 18 Nov 2024
https://github.com/lilchen96/pokemon-crawler
Crawl JSON-formatted data for Pokémon, based on the PokeAPI.
Last synced: 18 Nov 2024
https://github.com/briangershon/crawlee-playwright
Browser-based automations with Crawlee and Playwright using Vite tooling and TypeScript
crawlee crawler playwright starter-template typescript vite
Last synced: 20 Dec 2024
https://github.com/leonardopinho/instagramfeed
Image list based on a tag for the Instagram feed.
Last synced: 07 Dec 2024
https://github.com/fscotto/noahcrawler
A simple web crawler written in Java to support a database of Italian regions.
Last synced: 21 Nov 2024
https://github.com/hackthedev/botnet
Tool to find IPs on the Web, check SSH availability, and brute-force logins with a wordlist. For educational use only!
botnet bruteforce crawler education educational ip malicious proof-of-concept ssh testing web
Last synced: 23 Nov 2024
https://github.com/roc41d/http-web-crawler
HTTP web crawler with Node.js + TDD
crawler http javascript jest jest-test nodejs webcrawler
Last synced: 21 Nov 2024
https://github.com/jamesponddotco/wikiextract
[READ-ONLY] A word extractor for Wikipedia articles.
crawler crawling diceware go wikipedia wikipedia-crawler word-extraction
Last synced: 21 Nov 2024
https://github.com/qzcool/uscis-case-status-estimation-system-stat-ez
Estimates time of case results arrival, for applicants who are waiting for their USCIS case results with the receipt numbers at hand.
beautifulsoup crawler immigration web
Last synced: 21 Nov 2024
https://github.com/stephanebruckert/gocrawl
Crawl every page and asset of a web domain
Last synced: 21 Dec 2024
https://github.com/homuchen/instagram-crawler
Instagram crawler
crawler instagram nodejs-crawler
Last synced: 01 Dec 2024
https://github.com/joyceannie/moviespider
This project is used to crawl movie data from IMDb. Scrapy framework is used to extract relevant information like movie title, datePublished, summary, genres, director etc.
crawler datascience python scrapy spider webscraper
Last synced: 01 Dec 2024
https://github.com/tetreum/xupopter_client
Simple interface to manage Xupopter recipes as well as its runners.
crawler scrapper scrapping webscraper
Last synced: 17 Dec 2024
https://github.com/tssujt/async-crawler-sample
A simple crawler sample based on asyncio
Last synced: 21 Nov 2024
https://github.com/tetreum/xupopter_runner
Executes crawling recipes coming from Xupopter Chrome Extension.
crawler scrapper scrapping webscraper
Last synced: 17 Dec 2024
https://github.com/mirusu400/berryz-dl
Batch download berryz webshare files recursively!
berryz berryz-webshare crawler downloader scraper
Last synced: 26 Dec 2024
https://github.com/bingxyz/btcethcrawler
Telegram broadcast channel for Bitcoin and Ethereum
bash bash-script crawler telegram-bot
Last synced: 21 Nov 2024
https://github.com/kyagara/lol-match-crawler
Very simple crawler for League of Legends matches.
crawler league-of-legends pgx postgres riot-games sql
Last synced: 01 Dec 2024
https://github.com/vishaalpkumar/skysift
A distributed search engine from scratch
aws crawler css distributed-systems html java search-engine
Last synced: 22 Dec 2024
https://github.com/fritz-c/itunes-stats
Fetch info on podcasts, etc. from iTunes RSS data
Last synced: 02 Jan 2025
https://github.com/edumucelli/rubybikes
A set of Bike Sharing System parsers in Ruby
Last synced: 24 Dec 2024
https://github.com/istador/mediaindexer
Software for a cronjob to crawl the ViMP media center and generate an index for it as a static website.
Last synced: 22 Nov 2024
https://github.com/agucova/needs-seeding
🌱 A script that downloads a list of .torrent files from a website, checks their health and lists the ones that need more seeding.
Last synced: 09 Jan 2025
https://github.com/seanghay/wpget
⚡️wpget - A tool for downloading all posts from a WordPress website via public JSON API
Last synced: 22 Nov 2024
https://github.com/bradsec/gomine
A Go CLI tool to quickly crawl and mine (download) specific file types from websites.
cli crawler golang terminal-based
Last synced: 22 Dec 2024
https://github.com/lucasbotang/project_financial_markets_text_mining
Predict stock market movement based on news
crawler data-science natural-language-processing python
Last synced: 25 Nov 2024
https://github.com/shaoxiongdu/skyeye
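A Spring Boot-based crawler project for trending topics across the web. Raw trending-search data is saved to the database, and word-segmentation statistics are stored in Redis, to make later data analysis easier.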
crawler crawlers mysql redis spring spring-boot
Last synced: 16 Nov 2024
https://github.com/iyowei/fs-deep-walk
Focused on deep scanning of a specified disk location.
crawler directory file folder folder-tooling fs nodejs recursively-search scan scandir scandir-recursive scanner walker
Last synced: 29 Dec 2024
https://github.com/seart-group/github-keyword-crawler
A simple and easy-to-deploy script for mining mentions of keywords across various GitHub API endpoints
api-mining crawler dockerized github-api miner mongodb-database python-script
Last synced: 07 Dec 2024
https://github.com/mohitk05/drstrange
A simple breadth-first search web crawler
Last synced: 05 Dec 2024
https://github.com/tigercosmos/web-crawler
Web Crawler in Java Maven Project
Last synced: 05 Dec 2024
https://github.com/juangesino/ah-bonus-crawler
React + Express application that crawls Albert Heijn's promotions.
crawler crawling express expressjs headless-chrome nodejs react reactjs
Last synced: 22 Nov 2024
https://github.com/thecloer/crawler-himym
How I Met Your Mother script PDF generator for learning English
crawler pdf pdf-generation typescript web-scraping webscraping
Last synced: 10 Dec 2024
https://github.com/im-perativa/public_crawler
A collection of crawler projects for Indonesian datasets
crawler indonesia indonesia-api scrapy
Last synced: 25 Nov 2024
https://github.com/machinecyc/lotteryinsight
Use crawler to collect Taiwan Lotto data, and save data into local MySQL server.
crawler data docker lottery mysql-database python3 taiwan
Last synced: 05 Dec 2024
https://github.com/g-ongenae/morphalou-crawler
A Crawler for CNRTL's Morphologie words
crawler french lexical-databases list-of-words words
Last synced: 15 Oct 2024
https://github.com/ariefrahmansyah/crawler
Simple website crawler using Go programming language.
Last synced: 05 Dec 2024
https://github.com/gesiscss/github_traffic_crawler
Retrieves data from repositories (insights, usage, commits)
Last synced: 03 Jan 2025
https://github.com/ecklf/reddit-clawler
A command-line tool written in Rust that crawls Reddit posts from a user or subreddit
cli crawler downloader downloader-for-reddit reddit
Last synced: 22 Dec 2024
https://github.com/apexcaptain/allergy-alert
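Crawls the cafeteria information published on a certain university's website for today's date, and reports the menu items and the allergenic ingredients in each item, grouped by hall and menu category.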
crawler docker expressjs puppeteer reactjs sqlite typescript
Last synced: 01 Dec 2024
https://github.com/thamindur/ir-project
Search Engine for Sri Lankan MPs
crawler elasticsearch python scraping search-engine
Last synced: 17 Dec 2024
https://github.com/bytejoseph/osintgit
GitHub to email
crawler email github github-to-email hacking hacking-tool latest open-source-intelligence osint osint-python osint-tool pentesting pentesting-tools python python3 script streamlit streamlit-webapp
Last synced: 26 Nov 2024
https://github.com/moj124/web_crawler
web_crawler is an asynchronous gevent link crawler that maps all associated local links, constrained by the input webpage URL.
crawler crawler-python links-spider
Last synced: 19 Nov 2024
https://github.com/dmarcosl/upshelf-technical-test
Technical test for Upshelf
crawler interview python scraping scrapy spider technical-test web-scraping
Last synced: 22 Dec 2024
https://github.com/vaenow/crawler-chromeless
A chromeless crawler for coursera
chromeless coursera crawler puppeteer
Last synced: 13 Dec 2024
https://github.com/rsheremeta/web-crawler
A tiny web crawler that looks for links, extracts them, and prints them concurrently to the terminal
crawler go golang web-crawler webcrawler
Last synced: 09 Jan 2025
https://github.com/oleksandr-moik/spring-boot-web-crawler
Web crawler app on Spring Boot. Gets categories and the relevant news for each category.
crawler gradle java spring-boot
Last synced: 08 Dec 2024
https://github.com/pranavj1001/webcrawler
A simple Web Crawler
crawler java javascript nodejs web-crawler
Last synced: 08 Dec 2024
https://github.com/vaenow/chromeless-coursera-caption
Chromeless crawler for Coursera video captions / subtitles
caption chromeless coursera crawler crx subtitle
Last synced: 13 Dec 2024
https://github.com/tylpk1216/new-taipei-parkinfo
Find the available parking in New Taipei, Taiwan.
Last synced: 27 Nov 2024
https://github.com/tylpk1216/favorite-youtube-to-video
Download your favorite YouTube videos in PHP
Last synced: 27 Nov 2024
https://github.com/pinpox/go-random-downloader
Download HTML using "Random Page"
Last synced: 29 Nov 2024
https://github.com/krishpranav/gozap
⚡️ Multi-target ZAP scanning, written in Go
cli crawler go go-crawler golang zap
Last synced: 06 Dec 2024
https://github.com/tomfran/crawler
A web crawler written in Rust
bloom-filter crawler rust simhash
Last synced: 06 Jan 2025