Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2024-11-18 00:06:04 UTC
- JSON Representation
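As an illustration of the "systematically browses" idea in the definition above, here is a minimal breadth-first crawler sketch in Python. It is not taken from any project listed on this page. The `crawl` function, its `fetch` parameter, the `LinkExtractor` class, and the `example.test` pages are all hypothetical names chosen for the example. `fetch` is assumed to be any callable that maps a URL to an HTML string (or `None` on failure), which lets the sketch run against an in-memory site instead of the network.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl from start_url, staying on the same host.

    fetch is a hypothetical callable: URL -> HTML string, or None on failure.
    Returns the list of URLs visited, in visit order.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}  # URLs already queued, so each page is fetched once
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# In-memory stand-in for a website, so the sketch runs without network access.
SITE = {
    "https://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "https://example.test/a": '<a href="/b">B</a> <a href="https://other.test/">X</a>',
    "https://example.test/b": '<a href="/#top">Home</a>',
}

pages = crawl("https://example.test/", SITE.get)
print(pages)
```

A real crawler would swap `SITE.get` for an HTTP client and would normally also honor `robots.txt` and rate limits, which this sketch leaves out.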
https://github.com/vishaalpkumar/skysift
A distributed search engine from scratch
aws crawler css distributed-systems html java search-engine
Last synced: 05 Nov 2024
https://github.com/allancapistrano/anime-sheets
Crawler that collects anime information and saves it to a spreadsheet.
anime crawler google-sheets google-sheets-api
Last synced: 13 Oct 2024
https://github.com/mg98/ipfs-replicate
Replicate IPFS' distributed data structure locally, based on network traces.
crawler dag ipfs redisgraph scraper
Last synced: 14 Oct 2024
https://github.com/daviddavo/blogspot-crawler
Crawler for blogspot and blogger with beautifulsoup
Last synced: 13 Oct 2024
https://github.com/zenoyang/webcrawler
Some crawler code
crawler scrapy spider web-crawler
Last synced: 16 Nov 2024
https://github.com/sbstjn/tatort
Query information for upcoming Tatort shows
Last synced: 09 Nov 2024
https://github.com/stephanebruckert/gocrawl
Crawl every page and asset of a web domain
Last synced: 03 Nov 2024
https://github.com/kehiy/prawler
Pactus P2P Network Crawler
crawler crawling metrics networking p2p pactus
Last synced: 07 Nov 2024
https://github.com/luickk/vulnerability-crawler
Small Python program meant to analyze random sites found on Google for vulnerabilities.
Last synced: 07 Nov 2024
https://github.com/ma-pony/playwright-spider-utils
Playwright Spider Utils is a utility library for engineers using the Playwright framework to build web crawlers. This project provides common web scraping functions, simplifying the process of crawler development and enhancing productivity.
crawl crawler playwright python scrapy selenium spider spiderman
Last synced: 09 Oct 2024
https://github.com/rayspock/go-web-crawler
A web crawler that fetches all the links from a given website using goroutines.
concurrency crawler golang goroutine
Last synced: 14 Nov 2024
https://github.com/hsiehbocheng/usa-tourist-recommend
crawler mongodb python tableau
Last synced: 14 Nov 2024
https://github.com/pourmand1376/crawler
Simple Crawler, Indexer and Search Engine Web Application
crawler csharp csharp-code dotnet mvc
Last synced: 14 Nov 2024
https://github.com/mohammadrezaamani/squirrel
Squirrel is a web crawler designed to collect all pages from Iranian websites, enabling you to download and store web page content in a structured format.
Last synced: 04 Nov 2024
https://github.com/bockstaller/europarl-crawler
Crawler for the documents published by the European Parliament
crawler datamining elasticsearch europarl-crawler european european-parliament opendata parliament union
Last synced: 10 Nov 2024
https://github.com/timzatko/fiit-vinf-1
School project - data crawling, storing using ElasticSearch and visualisation.
Last synced: 28 Oct 2024
https://github.com/genfuture/cryptocurrency-scraper
Cryptocurrency Data Crawler 🚀 High-performance Node.js crawler that fetches comprehensive data for 1500+ cryptocurrencies from CoinGecko API. Collects market data, social metrics, and blockchain details with built-in rate limiting and resume capability. Perfect for crypto analysis, research, and building market intelligence tools.
binance bitcoin bsc coingecko coingecko-api crawler crypto-bot cryptocurrencies cryptocurrency ethereum scraper
Last synced: 16 Nov 2024
https://github.com/joeri-abbo/python-credly-scraper
This project is a set of Python scripts designed to crawl and extract data from the Credly platform, focusing on skills, organizations, and badges. The scripts allow users to perform searches using command-line arguments, predefined search terms, or skills listed in a JSON file. The collected data is then saved to JSON files for further analysis an
badges crawler credly data-extraction json organizations python python3 requests-library skills web-crawling
Last synced: 15 Nov 2024
https://github.com/snwfdhmp/3gm-bot
Bot for the online French indie game 3gm.fr, implemented in Ruby. Mostly website crawling and task automation.
3gm-bot crawler game-bot task-automation web-crawling
Last synced: 15 Nov 2024
https://github.com/yukihirai0505/streamcrawler
akka stream × crawler
akka-streams crawler elasticsearch instagram sbt scala
Last synced: 14 Nov 2024
https://github.com/lin-jun-xiang/python-crawler
Using CloudScraper, Requests, APIs, threads, async... to scrape data
async cloudscraper crawler multithreading python requests scraper selenium
Last synced: 03 Nov 2024
https://github.com/kernelerr/pixivurls
An awesome tool to get Pixiv image URLs.
Last synced: 12 Oct 2024
https://github.com/lysagxra/eromedownloader
Erome albums and profile downloader
bulk bulk-downloader concurrent-processing crawler downloader erome erome-downloader parallel-processing profile-downloader python python3
Last synced: 16 Nov 2024
https://github.com/appliedsoul/headless-screenshot
High-level library for taking screenshots of websites, based on headless Chrome (Puppeteer)
crawler headless-chromium javascript nodejs scrapper screenshot testing
Last synced: 18 Nov 2024
https://github.com/beckkramer/puppeteer-traverse
Puppeteer utility to easily run a function you define per route on a set of routes.
crawler crawling nodejs puppeteer
Last synced: 18 Nov 2024
https://github.com/fritz-c/itunes-stats
Fetch info on podcasts, etc. from iTunes RSS data
Last synced: 08 Nov 2024
https://github.com/briangershon/crawlee-playwright
Browser-based automations with Crawlee and Playwright using Vite tooling and TypeScript
crawlee crawler playwright starter-template typescript vite
Last synced: 02 Nov 2024
https://github.com/yuchenq/comp90055-project
This is the latest version of my project for Comp90055.
couchdb crawler data-visualization python3 textblob tweepy
Last synced: 18 Nov 2024
https://github.com/wingkwong/daily_weather_temperature_in_hong_kong
Crawling daily weather temperature in Hong Kong
crawler hongkong python temperature
Last synced: 06 Nov 2024
https://github.com/matheusfelipeog/google-doodles
Map and download Google Doodles.
crawler google google-doodle python web-scraping
Last synced: 13 Oct 2024
https://github.com/zhaotianff/crawler-line
C# command-line crawler
command-line command-line-tool crawler csharp dotnet-core
Last synced: 15 Nov 2024
https://github.com/viko16/hatcher
🐣[WIP] Provides APIs by simple configuration.
api api-server cli crawler koa-middleware nodejs spider
Last synced: 01 Oct 2024
https://github.com/vaenow/crawler-chromeless
A Chromeless crawler for Coursera
chromeless coursera crawler puppeteer
Last synced: 25 Oct 2024
https://github.com/brianbruggeman/vax
A vaccination signup tool
covid-19 crawler signup vaccination
Last synced: 15 Nov 2024
https://github.com/vaenow/chromeless-coursera-caption
Chromeless crawler for Coursera video captions / subtitles
caption chromeless coursera crawler crx subtitle
Last synced: 25 Oct 2024
https://github.com/devindon/movie-crawler
Movie crawler for douban.com, pianku.tv, etc.
Last synced: 16 Oct 2024
https://github.com/madret/selenium_crawler
Selenium Webcrawler based on the chromedriver.
chromedriver crawler human-like selenium selenium-webdriver webcrawler
Last synced: 15 Nov 2024
https://github.com/jonasrenault/pubchem-api-crawler
Python client for PubChem's API to crawl compounds and their properties using a molecular formula search query.
chemistry crawler molecular-formula pubchem python
Last synced: 14 Oct 2024
https://github.com/miiraak/scrapc
C# WinForms crawler and scraper for web content
crawler csharp html scraper url web windows-forms
Last synced: 13 Oct 2024
https://github.com/rcmilan/ex-web-scraping
Web scraping with F#
crawler f-sharp fsharp fsharp-data scraper web-scraping xplot
Last synced: 17 Nov 2024
https://github.com/cls1991/gank.io-go
A simple crawler for fetching pictures from http://gank.io, implemented in golang.
crawler gankio goquery pictures
Last synced: 11 Nov 2024
https://github.com/khanof89/twitter_scraper
Scrape tweet details from a user profile using Selenium
crawler scraper selenium twitter twitter-bot
Last synced: 11 Nov 2024
https://github.com/zaneh/ocw-crawler
Crawl MIT OpenCourseWare courses with Kimurai. Not affiliated.
crawler kimurai mit ocw opencourseware spider
Last synced: 15 Nov 2024
https://github.com/vhdm/twitter-hashtag-crawler
Twitter hashtag crawler by selenium, without using the Twitter API ;)
Last synced: 09 Nov 2024
https://github.com/mikiw/reactweb3
Ethereum transaction crawler in ReactJs.
Last synced: 12 Nov 2024
https://github.com/jlenon7/sef_automation
📑 Crawler that automatically enrolls in open vacancies on the SEF website.
athenna crawler esm nodejs playwright portugal residence sef typescript
Last synced: 26 Oct 2024
https://github.com/jofaval/open-graph-visualizer
Web scraping showcase of how crawlers retrieve a site's details through the Open Graph protocol
crawler javascript opengraph scraping web web-scraping
Last synced: 21 Oct 2024
https://github.com/ecklf/reddit-clawler
A command-line tool written in Rust that crawls Reddit posts from a user or subreddit
cli crawler downloader downloader-for-reddit reddit
Last synced: 25 Oct 2024
https://github.com/luciopaiva/dicio-crawler
Node.js crawler for dicio.com.br.
Last synced: 14 Oct 2024
https://github.com/abdus/scrape-web
A simple web scraper for Node.js
crawler web-scraping web-scrapper
Last synced: 15 Oct 2024
https://github.com/kenanbek/tutorial-python-crawler
Crawling website data using Python with requests and Beautiful Soup libraries
beautifulsoup crawler crawling miner parser python python-requests requests
Last synced: 23 Oct 2024
https://github.com/amirsorouri00/crawler
Page-Rank: public Python 2 projects which have been converted to Python 3.
Last synced: 18 Nov 2024
https://github.com/hanifdwyputras/se-scraper
Search Engine scraper with PHP
crawler scraper seo seo-crawler
Last synced: 15 Oct 2024
https://github.com/edumucelli/rubybikes
A set of Bike Sharing System parsers in Ruby
Last synced: 06 Nov 2024
https://github.com/krishpranav/gozap
⚡️ Multiple target ZAP Scanning made in go
cli crawler go go-crawler golang zap
Last synced: 15 Oct 2024
https://github.com/iamtonmoy0/sitemap-crawler
Sitemap crawler with Golang and goquery
Last synced: 09 Nov 2024
https://github.com/lesterrry/campfire
Shock-drop watching utility
crawler parser web-crawler web-parser
Last synced: 10 Nov 2024
https://github.com/krishealty/whoknows
All in One Advanced and Detailed Web Scanner with over 1000 plug-ins.
bug-bounty bypass crawler enumeration ethical-hacking footprinting hacking hacking-tool intelligence-gathering javascript offensive-security osint pentesting pentesting-tools security-tools subdomain-enumeration vulnerability-analysis vulnerability-detection web-application-security web-reconnaissance
Last synced: 10 Nov 2024
https://github.com/tri613/nespresso
A mobile version of the Nespresso coffee website ☕
Last synced: 09 Nov 2024
https://github.com/jeanluc162/prnt-sc-crawler
Crawler for the website prnt.sc
crawler net5 net50 prntsc screenshots
Last synced: 15 Nov 2024
https://github.com/webdevcave/directory-crawler-php
Directory Crawler PHP is a simple PHP library for recursively crawling through directories and listing files and directories.
crawler crawling directory path php php-library
Last synced: 09 Nov 2024
https://github.com/artemnikitin/crawler
Example of web crawler implemented in Go
Last synced: 10 Nov 2024
https://github.com/datvodinh/laptop-price-prediction
An End to End Data Science Project about Laptop Price Prediction
crawler ensemble-learning scrapy selenium xgboost
Last synced: 17 Nov 2024
https://github.com/capturr/json-deep-equal
Check whether JSON objects contain the same values (ignoring array order).
array compare comparison crawler crawling deep equal equality equality-check equals javascript json object recursive scraper scraping spider test tree typescript
Last synced: 10 Nov 2024
https://github.com/fredcodee/pexel.com-image-scrapper
Download images from pexel.com
Last synced: 10 Nov 2024
https://github.com/wangyihang/acw-sc-v2-py
Python requests.HTTPAdapter for `acw_sc__v2`
Last synced: 09 Nov 2024
https://github.com/tisfeng/bing-dict
A command-line Bing dictionary that obtains query results from Bing Dictionary via a crawler.
bing-dictionary command-line crawler nodejs
Last synced: 09 Nov 2024
https://github.com/tech-espm/misc-webbot
This project aims to create personal assistants that reply to messages about specific issues.
classification-model crawler nlp
Last synced: 12 Nov 2024
https://github.com/vaibhavyadav-dev/codeforces-problemset-scrapper
Web scraper that scrapes the entire Codeforces problemset into a CSV or JSON file.
codeforces competative competative-programming crawler problemset programming python scrapy-crawler scrapy-spider
Last synced: 15 Nov 2024
https://github.com/shaoxiongdu/skyeye
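A SpringBoot-based crawler project for trending topics across the web. Raw trending-search data is stored in the database, and word-segmentation statistics are stored in Redis, which makes later data analysis easier.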
crawler crawlers mysql redis spring spring-boot
Last synced: 16 Nov 2024