Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2024-11-07 00:05:58 UTC
https://github.com/gnuns/raspa
data mining stuff
crawler robot scraper web-scraper web-scraping web-spider
Last synced: 30 Oct 2024
https://github.com/uzsoftic/ecommerce-web-crawler
WebCrawler for ecommerce sites
bot crawler crawler-php ecommerce laravel parser php php8
Last synced: 06 Nov 2024
https://github.com/saketh7382/smartcrawler
Package for crawling items from web pages and storing them as a JSON file
crawler crawler-python open-source pip python3 scraper selenium selenium-webdriver webdriver-manager
Last synced: 20 Oct 2024
https://github.com/schbenedikt/web-crawler
A simple web crawler using Python that stores the metadata of each web page in a database.
crawler database mariadb mysql python python-crawler web
Last synced: 11 Oct 2024
https://github.com/baerwang/sec_craw
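A crawler that helps security researchers collect daily security digests. It currently covers 90sec, the Kanxue forum, v2ex, the Jingyi forum, the 52pojie forum, and other lab blogs, and is updated continuously.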
crawler security security-tools threat threat-intelligence
Last synced: 15 Oct 2024
https://github.com/moontai0724/auto-notify-pu-courses-quota
A small crawler that fetches the remaining quota for a list of courses at Providence University every 2 to 10 minutes and sends a webhook when it changes.
Last synced: 15 Oct 2024
https://github.com/antoinegagne/treewalker
A web crawler in Erlang that respects `robots.txt`.
Last synced: 24 Oct 2024
https://github.com/gnaneshkunal/book-miner
Web crawler for book reviews (Goodreads)
Last synced: 29 Oct 2024
https://github.com/jfcherng/wiki-cgroup-crawler
This script fetches Wikipedia's public conversion group (CGroup) word lists and saves the results to external files.
crawler php-71 wiki-cgroup-crawler wikipedia
Last synced: 28 Sep 2024
https://github.com/scrwdrv/siege-crawler
This CLI tool finds same-domain URLs in a web page and requests them to find even more URLs until the server crashes (or the benchmark ends). It is used to test the maximum capacity of a server or to find glitches that users might encounter.
benchmark cli crawler ddos debug siege tool
Last synced: 12 Oct 2024
https://github.com/jonasrenault/cprex
Chemical Properties Relation Extraction
chemistry crawler deep-learning information-extraction machine-learning named-entity-recognition nlp pubchem relation-extraction scientific-articles spacy transformers
Last synced: 14 Oct 2024
https://github.com/princed/specht
Check links found in html or js files by pattern
cli crawler html javascript streams
Last synced: 12 Oct 2024
https://github.com/liyun-li/meh-bot
Just a bot that clicks an image
bot crawler docker headless-firefox meh python python3 selenium twilio-sms-api
Last synced: 13 Oct 2024
https://github.com/dingpingzhang/papermedia
A scrapy-based crawler for crawling paper media.
Last synced: 04 Nov 2024
https://github.com/maxgio92/package-crawler
A package crawler for most known Linux distros
Last synced: 13 Oct 2024
https://github.com/buren/stupid_crawler
Stupid crawler that looks for URLs on a given site
Last synced: 12 Oct 2024
https://github.com/kokseen1/chii
A minimal marketplace bot maker.
auction automation bidding bot carousell crawler ecommerce marketplace python python-telegram-bot scraper telegram telegram-bot web-scraping yahoo yahoo-auction
Last synced: 11 Oct 2024
https://github.com/dimo414/pycrawl
Simple Python web crawler, primarily designed for inspecting and diagnosing your own website
Last synced: 12 Oct 2024
https://github.com/yjg30737/pyqt-google-image-crawler
Crawling image files from Google search results with Python and icrawler
beautifulsoup4 crawler icrawler image-crawler pyqt pyqt5 pyqt5-desktop-application
Last synced: 22 Oct 2024
https://github.com/yjg30737/pyqt-wikipedia-crawler
Crawling Wikipedia with Python, powered by BeautifulSoup4, with GUI and CUI support
beautifulsoup4 crawler pyqt pyqt5 wikipedia
Last synced: 22 Oct 2024
https://github.com/skylightqp/namu2csv
A Namuwiki crawler that converts headers to a CSV file for the KartRider wiki
Last synced: 19 Oct 2024
https://github.com/fa7ad/aiub-notes-dl
Download all notes from AIUB's portal
Last synced: 24 Oct 2024
https://github.com/buren/site_health
Crawl a site and check various health indicators
Last synced: 28 Oct 2024
https://github.com/arshadkazmi42/gh-crawl
Crawler for GitHub repositories. Finds all the broken links in the repositories.
bug-bounty-recon crawl crawler gh-crawler github github-crawler githubcrawler python
Last synced: 28 Oct 2024
https://github.com/mohabmes/matool
A collection of various custom tools. { Antesh, CITerm, INetSC, KADManga, Tomado }
cli codeigniter-terminal crawler mangareader markd markdown markdown-to-html parser readme scan-tool scanner-web
Last synced: 11 Oct 2024
https://github.com/e73b025/simple-python-url-crawler
Super simple Python3 website URL scraper/crawler. Multi-threaded.
crawler googlebot lightweight link-collection multi-threaded python python3 scraper simple
Last synced: 11 Oct 2024
https://github.com/carloocchiena/python_url_crawler
A script that, starting from a web page, iterates through all its links and appends them to a list. A sort of proxy for getting all the pages in a website.
beautifulsoup crawler python python3
Last synced: 14 Oct 2024
https://github.com/khadkarajesh/aptoide
Aptoide app crawler using beautifulsoup
beautifulsoup4 crawler flask python3 web-application
Last synced: 11 Oct 2024
https://github.com/snuzi/devblogs-aggregator
The backend aggregator project of DevBlogs.net
aggregator blog crawler engineering engineering-blogs tech tech-blogs tech-companies tech-news
Last synced: 02 Aug 2024
https://github.com/duaraghav8/larry-crawler
Kayako Twitter challenge
crawler fetch-tweets hashtag nodejs pagination tweets twitter-api
Last synced: 13 Oct 2024
https://github.com/spider-rs/web-crawling-guides
How-to guides on web crawling or scraping
agents ai-agents ai-scraping clean-markdown crawler fast-webcrawler html-to-markdown llm-webcrawler scraper web-scraping
Last synced: 05 Nov 2024
https://github.com/marzzzello/gplaycrawler
(mirror) Discover apps by different methods. Mass download app packages and metadata.
crawler google-play google-play-store googleplay googleplaystore playstore playstoreapi scraper
Last synced: 05 Nov 2024
https://github.com/1970mr/link-crawler
Web Link Crawler: A Python script to crawl websites and collect links based on a regex pattern. Efficient and customizable.
clawler crawler crawler-python link-crawler link-crawler-python link-scraper link-scraper-python links python scraper scraper-python website-crawler website-scraper
Last synced: 11 Oct 2024
https://github.com/ging-dev/sitemap-crawler
Collect links through the sitemap.xml or robots.txt
crawler php php8 sitemap sitemap-crawler
Last synced: 12 Oct 2024
https://github.com/geoffreybauduin/website-checker
Performs useful checks against a website, such as 404 error reporting, structured data validation...
crawler seo structured-data web-spider website
Last synced: 06 Nov 2024
https://github.com/akashrajpurohit/node-crawler
Node.js crawler that scrapes a website on a live domain and crawls it to find all URLs of the domain
crawler node-crawler nodejs url
Last synced: 06 Nov 2024
https://github.com/pourmand1376/crawler
Simple Crawler, Indexer and Search Engine Web Application
crawler csharp csharp-code dotnet mvc
Last synced: 11 Oct 2024
https://github.com/matheusfelipeog/google-doodles
Map and download Google Doodles.
crawler google google-doodle python web-scraping
Last synced: 13 Oct 2024
https://github.com/tri613/nespresso
A mobile version of the Nespresso coffee website ☕
Last synced: 11 Oct 2024
https://github.com/brianbruggeman/vax
A vaccination signup tool
covid-19 crawler signup vaccination
Last synced: 11 Oct 2024
https://github.com/cak/foot
Foot is a library that fetches a list of URLs and silly walks through each site to gather information.
Last synced: 11 Oct 2024
https://github.com/miiraak/scrapc
C# WinForms crawler and scraper for web content
crawler csharp html scraper url web windows-forms
Last synced: 13 Oct 2024
https://github.com/zhanymkanov/marketplace_parser
Products and Reviews Crawler
Last synced: 11 Oct 2024
https://github.com/luciopaiva/dicio-crawler
Node.js crawler for dicio.com.br.
Last synced: 14 Oct 2024
https://github.com/fritz-c/itunes-stats
Fetch info on podcasts, etc. from iTunes RSS data
Last synced: 11 Oct 2024
https://github.com/lillyschramm/spiegel.de-miner
A bot that automatically saves any posts created at Spiegel.de
Last synced: 11 Oct 2024
https://github.com/appliedsoul/headless-screenshot
High-level library for taking screenshots of websites, based on headless Chrome (Puppeteer)
crawler headless-chromium javascript nodejs scrapper screenshot testing
Last synced: 12 Oct 2024
https://github.com/beckkramer/puppeteer-traverse
Puppeteer utility to easily run a function you define per route on a set of routes.
crawler crawling nodejs puppeteer
Last synced: 12 Oct 2024
https://github.com/tonystrawberry/tcj-nihongo-crawler
🤖 Scraper for personal usage
crawler scraper selenium selenium-webdriver
Last synced: 11 Oct 2024
https://github.com/jyasskin/pbot-crawler
Crawler for PBOT's website to show what has changed.
Last synced: 14 Oct 2024
https://github.com/sirius-mhlee/naver-cafe-crawler
NAVER Cafe Crawler using pandas, tqdm, Selenium, BeautifulSoup4
beautifulsoup4 crawler pandas selenium tqdm
Last synced: 11 Oct 2024
https://github.com/tom-draper/wiki-crawl
A game of path finding through Wikipedia topics.
api crawler crawlers crawling crawling-python game pathfinding python requests wiki wikipedia wikipedia-api wikipedia-search
Last synced: 11 Oct 2024
https://github.com/juangesino/ah-bonus-crawler
React + Express application that crawls Albert Heijn's promotions.
crawler crawling express expressjs headless-chrome nodejs react reactjs
Last synced: 13 Oct 2024
https://github.com/pmuens/crawler
Multi-threaded Web crawler with support for custom fetching and persisting logic
crawler crawler-engine rust rust-lang web-crawler web-crawling
Last synced: 17 Oct 2024
https://github.com/allancapistrano/anime-sheets
Crawler that collects information about anime and saves it to a spreadsheet.
anime crawler google-sheets google-sheets-api
Last synced: 13 Oct 2024
https://github.com/bennettdams/vace-it-crawler
Python (Scrapy) crawler to access data of FACEIT.com
Last synced: 11 Oct 2024
https://github.com/abdus/scrape-web
A simple web scraper for Node.js
crawler web-scraping web-scrapper
Last synced: 15 Oct 2024
https://github.com/daviddavo/blogspot-crawler
Crawler for blogspot and blogger with beautifulsoup
Last synced: 13 Oct 2024
https://github.com/jonasrenault/pubchem-api-crawler
Python client for PubChem's API to crawl compounds and their properties using a molecular formula search query.
chemistry crawler molecular-formula pubchem python
Last synced: 14 Oct 2024
https://github.com/timzatko/fiit-vinf-1
School project: data crawling, storage with Elasticsearch, and visualisation.
Last synced: 28 Oct 2024
https://github.com/manikantasanjay/stackoverflow_tag_generator_webcrawler
Stack Overflow tag generator using a web crawler.
Last synced: 05 Nov 2024
https://github.com/ryoii/hook
A declarative Java crawler framework
crawler declarative java java-crawler-framework jdk11
Last synced: 13 Oct 2024
https://github.com/xiangronglin/novel2go
Android app to create pdf from website and send to your kindle
android crawler jetpack kotlin pdf-generation readability
Last synced: 28 Oct 2024
https://github.com/jayzhan211/python-crawler-startups
python crawler learning
Last synced: 13 Oct 2024
https://github.com/stephanebruckert/gocrawl
Crawl every page and asset of a web domain
Last synced: 03 Nov 2024
https://github.com/iarsham/scrapify
Scrapify is a golang library that automates the process of bypassing CAPTCHAs, enabling efficient web scraping and data acquisition.
403-bypass arkose cloudflare crawler golang http-client scraper
Last synced: 24 Oct 2024
https://github.com/becky-dai/flower-knowledge-graph-visualization
A full-stack knowledge graph visualization project
crawler css django echarts html js knowledge-graph neo4j python
Last synced: 03 Nov 2024
https://github.com/gnehs/twse-financial-ratios-crawler
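Automatically fetches financial ratio data from the Market Observation Post System for a given list of stock symbols and computes the averages.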
Last synced: 06 Nov 2024
https://github.com/allancapistrano/steam.py
An API wrapper for Steam written in Python.
Last synced: 13 Oct 2024
https://github.com/vaenow/chromeless-coursera-caption
Chromeless crawler for Coursera video captions and subtitles
caption chromeless coursera crawler crx subtitle
Last synced: 25 Oct 2024
https://github.com/vaenow/crawler-chromeless
A chromeless crawler for coursera
chromeless coursera crawler puppeteer
Last synced: 25 Oct 2024
https://github.com/bramtenhove/issue-crawler
Crawls Drupal issues and keeps stats
Last synced: 11 Oct 2024
https://github.com/sahaavi/web-scraping
Learn web scraping using BeautifulSoup, Selenium and Scrapy with hands-on projects!
beautifulsoup4 crawler headless-mode pagination scrapy selenium spider splash web-scraper web-scraping
Last synced: 07 Nov 2024
https://github.com/mg98/ipfs-replicate
Replicate IPFS' distributed data structure locally, based on network traces.
crawler dag ipfs redisgraph scraper
Last synced: 14 Oct 2024
https://github.com/engageintellect/scrapers
A repository of web scrapers using Python & Scrapy
Last synced: 25 Oct 2024
https://github.com/cristiangreco/gcrawler
A simple (not concurrent) web crawler written in Java.
Last synced: 05 Nov 2024