Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Crawler
![](https://explore-feed.github.com/topics/crawler/crawler.png)
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2025-02-10 00:06:28 UTC
https://github.com/k0nxt3d/web-scrapers
Web Scraping Scripts in PHP and Bash
bash bot clone cloning crawler curl curlphp download mirroring scraping scraping-websites seo seo-optimization shell-script spider wget
Last synced: 12 Jan 2025
https://github.com/hoan02/novel-crawler
Tool for scraping novel data to serve doctruyen.io.vn
Last synced: 20 Jan 2025
https://github.com/wcygan/crawler
web crawler
crawler crawling tokio tokio-rs web-crawler
Last synced: 12 Jan 2025
https://github.com/shamsher31/crawler
Simple site crawler that extracts all the URL links from the given website
Last synced: 12 Jan 2025
https://github.com/yosh1/mio-crawler
A crawler that retrieves data usage from IIJmio.
Last synced: 12 Jan 2025
https://github.com/949886/pixiv-crawler
Crawler that saves Pixiv illustration info to a local MySQL database.
Last synced: 28 Dec 2024
https://github.com/jurooravec/knwldg
Datasets, scrapers, pipelines
companies crawler data dataset non-profit-organizations scraper scrapy
Last synced: 12 Jan 2025
https://github.com/ndoolan360/go-crawler
A simple web crawling program written in Go in an afternoon. 🕷️🕸️
afternoon-project crawler scraper
Last synced: 18 Jan 2025
https://github.com/spider-rs/spider-clients
Clients to use with the hosted spider service - spider.cloud
ai ai-agents ai-scraping crawler html-to-markdown llm-webcrawler scraper spider web-scraping
Last synced: 05 Nov 2024
https://github.com/ssv445/js-rendering-proxy-docker
JS Rendering Proxy API to Handle JS Website in Your Crawler.
Last synced: 18 Jan 2025
https://github.com/brianmacintosh/wikicrawler
Sandbox project for manipulating Wikimedia wikis
c-sharp crawler mediawiki-bot wikipedia-bot
Last synced: 30 Dec 2024
https://github.com/ronierisonmaciel/crawler
A crawler built with BeautifulSoup that aims to extract information from websites efficiently and in a structured way. BeautifulSoup is a Python library that makes it easier to parse and extract data from HTML and XML pages. The project collects and organizes relevant information.
beautifulsoup4 crawler crawling python python3
Last synced: 30 Jan 2025
https://github.com/tatamiya/gas-new-books-crawler
Crawls new book information from 版元ドットコム (https://www.hanmoto.com/)
Last synced: 21 Jan 2025
https://github.com/edumucelli/rubybikes
A set of Bike Sharing System parsers in Ruby
Last synced: 24 Dec 2024
https://github.com/beckkramer/puppeteer-traverse
Puppeteer utility to easily run a function you define per route on a set of routes.
crawler crawling nodejs puppeteer
Last synced: 19 Jan 2025
https://github.com/jplitza/urlsearch
Index typical webserver directory listings and then search for arbitrary terms
Last synced: 24 Jan 2025
https://github.com/amirsorouri00/crawler
Page-Rank: public Python 2 projects which have been converted to Python 3.
Last synced: 19 Jan 2025
https://github.com/mattmoony/webcrawler.py
A very simple Python web crawler. This is just a fun little side project, which I used to gather some valuable experience with advanced Python and web techniques. 🐍
beautifulsoup crawler indexing mongodb multithreading pymongo python spider web webcrawler
Last synced: 19 Jan 2025
https://github.com/hackthedev/botnet
Tool to find IPs on the Web, check SSH availability, and brute-force logins with a wordlist. For educational purposes only!
botnet bruteforce crawler education educational ip malicious proof-of-concept ssh testing web
Last synced: 23 Jan 2025
https://github.com/onetail/crawler-with-kafka-docker
Homework on crawling and analysis
Last synced: 24 Jan 2025
https://github.com/indrasaputra/sulong
Simple application that crawls a specific fundraising website and notifies users if there is a new project
bot crawler go golang telegram telegram-bot
Last synced: 19 Jan 2025
https://github.com/dpbm/opendatasus-crawler
A simple crawler using puppeteer
brazil chrome crawler csv datasus nodejs opendatasus pdf puppeteer screenshot sus
Last synced: 19 Jan 2025
https://github.com/lilchen96/pokemon-crawler
Crawl JSON-formatted data for Pokémon, based on the PokeAPI.
Last synced: 19 Jan 2025
https://github.com/avsbharadwaj/web_crawler
A basic web crawler that recursively prints out the links and descriptions present on a website
Last synced: 19 Jan 2025
https://github.com/lightbeem3296/scrap-www.floridabar.org
automation crawler csv playwriht python scraper selenium xlsx
Last synced: 19 Jan 2025
https://github.com/triekai/review-radar
An intelligent tool that analyzes Google Maps reviews to detect potential fake reviews and suspicious patterns.
crawler google-maps nextjs openai react
Last synced: 24 Jan 2025
https://github.com/tsaohucn/crawler_fb_user_group
A crawler that uses Selenium for Facebook user groups
crawler facebook-user-groups rails ruby
Last synced: 20 Jan 2025
https://github.com/khanof89/twitter_scraper
Scrape tweet details from user profile using selenium
crawler scraper selenium twitter twitter-bot
Last synced: 10 Jan 2025
https://github.com/thejoin95/free-proxies.info
API service for getting anonymous and non-anonymous proxies, filterable by latency, country, update time, and more
api crawler http-proxy proxy proxy-list python scraper
Last synced: 06 Jan 2025
https://github.com/fritz-c/itunes-stats
Fetch info on podcasts, etc. from iTunes RSS data
Last synced: 02 Jan 2025
https://github.com/usethisname1419/connectioncrawler
Crawls a website and checks for connections
connection crawler http-headers reporting website-analyzer
Last synced: 26 Jan 2025
https://github.com/bennettdams/vace-it-crawler
Python (Scrapy) crawler to access data of FACEIT.com
Last synced: 13 Jan 2025
https://github.com/alphabs/navercafeclient
.NET library for crawling Naver Cafe post lists
crawler crawling dotnet naver naver-api naver-cafe web-scraper web-scraping
Last synced: 28 Jan 2025
https://github.com/zenoyang/webcrawler
Some crawler code
crawler scrapy spider web-crawler
Last synced: 17 Jan 2025
https://github.com/nextlevelshit/node-crawl
Web crawler for Node.js
crawl crawler javascript nodejs
Last synced: 20 Jan 2025
https://github.com/amazingcoderpro/pythonup
Have fun with Python! For improving Python skills
Last synced: 28 Jan 2025
https://github.com/isaqueveras/scrape-google-results
Scrape Google Results in Golang
crawler golang google scraper webcrawler
Last synced: 26 Jan 2025
https://github.com/rayspock/go-web-crawler
A web crawler to fetch all the links from a given website via goroutines.
concurrency crawler golang goroutine
Last synced: 14 Jan 2025
https://github.com/hsiehbocheng/usa-tourist-recommend
crawler mongodb python tableau
Last synced: 14 Jan 2025
https://github.com/pourmand1376/crawler
Simple Crawler, Indexer and Search Engine Web Application
crawler csharp csharp-code dotnet mvc
Last synced: 14 Jan 2025
https://github.com/eghuro/crawlcheck
Extensible web crawler
configuration crawler http plugin python robots-txt sitemap
Last synced: 12 Jan 2025
https://github.com/leegeunhyeok/python-gongucrawler
Python 3 crawler for images and detailed information from 공유마당 (Gongu Madang)
Last synced: 22 Dec 2024
https://github.com/mstephen19/apify-click-events
Like TypeScript, but for clicking ;) Manage automated clicks, and ensure your Apify web-crawler is only clicking exactly what you allow it to
apify apify-sdk crawler scraper web-automation
Last synced: 04 Feb 2025
https://github.com/jayzhan211/python-crawler-startups
python crawler learning
Last synced: 25 Jan 2025
https://github.com/mahdijamebozorg/cryptonewscrawler
An end-to-end AI pipeline that performs technical and fundamental analysis of different cryptocurrencies.
crawler crypto cryptocurrency data-mining datamining information-retrieval llm python
Last synced: 16 Jan 2025
https://github.com/kestarumper/imagecrawler
Downloads images from a given URL
Last synced: 06 Jan 2025
https://github.com/aminehsan/datamining-divar.ir
Analyzing and Extracting Insights from Ads on 'divar.ir'
crawler data-mining data-science divar-ir scraping
Last synced: 31 Jan 2025
https://github.com/istador/mediaindexer
Software for a cronjob to crawl the ViMP media center and generate an index for it as a static website.
Last synced: 22 Jan 2025
https://github.com/miiraak/scrapc
C# WinForms crawler and scraper for web content
crawler csharp html scraper url web windows-forms
Last synced: 13 Oct 2024
https://github.com/humbertodias/go-nie-crawler
Simple crawler that extracts some useful information from sede.administracionespublicas.gob.es.
Last synced: 13 Jan 2025
https://github.com/gxjansen/website-to-pdf
Creates a PDF based on the content of a website/subdomain
claude-3-sonnet crawler python3
Last synced: 05 Feb 2025
https://github.com/phanletrunghieu/webcrawler
A web crawler with Spring MVC
crawler java servlet spring-mvc springframework
Last synced: 28 Jan 2025
https://github.com/kehiy/prawler
Pactus P2P Network Crawler
crawler crawling metrics networking p2p pactus
Last synced: 28 Dec 2024
https://github.com/jyasskin/pbot-crawler
Crawler for PBOT's website to show what has changed.
Last synced: 28 Jan 2025
https://github.com/matheusfaustino/phrawl
Phrawl: A web crawling framework in PHP (or it seems so)
crawler crawling crawling-framework php scraper wip
Last synced: 28 Dec 2024
https://github.com/matheusfaustino/jazzmaster_crawler
A crawler for getting the audio programs of a specific radio program called Jazzmaster
Last synced: 28 Dec 2024
https://github.com/licoy/win4000-images-crawler
Scrapy-based crawler that scrapes and downloads image wallpapers from win4000.com
Last synced: 02 Feb 2025
https://github.com/lucasfogliarini/minhaentradacrawler.consoleapp
Web crawler in C# that uses the AngleSharp library to extract event details from the site "https://minhaentrada.com.br". It parses the page's HTML and retrieves information such as the title, date, venue, and links of the events.
anglesharp crawler minhaentrada
Last synced: 31 Dec 2024
https://github.com/sedrubal/webcrawler
Crawl sites and search for security issues.
crawler script security website-auditing
Last synced: 24 Jan 2025
https://github.com/sssshefer/web-crawler-http
Basic web crawler which represents the linking structure of the website
Last synced: 12 Jan 2025
https://github.com/ahsouza/iquizz-api
RESTful API developed in Node.js with MongoDB
animations cluster crawler docker docker-compose ejs-templates es8 font-awesome grunt-task helmet-detection heroku javascript jquery material-design mongodb nodejs passport-strategy passportjs pusher token-authetication
Last synced: 05 Feb 2025
https://github.com/nagilum/focus
Simple CLI tool, written in C#, to crawl a site and log the responses.
cli crawl crawler csharp playwright
Last synced: 16 Jan 2025
https://github.com/rcmilan/ex-web-scraping
Web scraping with F#
crawler f-sharp fsharp fsharp-data scraper web-scraping xplot
Last synced: 17 Jan 2025
https://github.com/m1/smap
smap is a site-mapping engine written in Go.
crawler go go-library go-package golang golang-library golang-package golang-tools sitemap sitemap-generator web-crawler web-crawling
Last synced: 05 Feb 2025
https://github.com/surister/scrupy
Python library to create web Crawlers which aims to be powerful yet simple.
crawler crawling-framework crawling-python http library python scraping
Last synced: 18 Jan 2025
https://github.com/jackfsuia/chats-crawler
Crawls Discourse chat data and parses it on the fly, ready for LLM instruction fine-tuning.
crawler fine-tuning finetune-llm gpt html-css-javascript instruction-tuning llm llm-training llms nlp nlp-parsing parser
Last synced: 13 Jan 2025
https://github.com/ilovebacteria/digikala-api
This Python package sends requests to the Digikala API and retrieves product details.
Last synced: 14 Nov 2024
https://github.com/nextlevelshit/adonis-crawler
A free web crawler on top of the incredible AdonisJS Framework
adonisjs crawler javascript nodejs regex spider websocket
Last synced: 20 Jan 2025
https://github.com/bingxyz/btcethcrawler
Telegram broadcast channel for Bitcoin and Ethereum
bash bash-script crawler telegram-bot
Last synced: 22 Jan 2025
https://github.com/vaibhavyadav-dev/codeforces-problemset-scrapper
Web scraper that scrapes the entire Codeforces problemset into a CSV or JSON file.
codeforces competative competative-programming crawler problemset programming python scrapy-crawler scrapy-spider
Last synced: 16 Jan 2025
https://github.com/der3318/daily-pixiv
Integrated Flow - Line Notification of Top Ranked Pixiv Illustrations
crawler line-notify pixiv workflow
Last synced: 13 Jan 2025