Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2025-01-24 00:06:40 UTC
- JSON Representation
https://github.com/webdevcave/directory-crawler-php
Directory Crawler PHP is a simple PHP library for recursively crawling through directories and listing files and directories.
crawler crawling directory path php php-library
Last synced: 09 Nov 2024
https://github.com/madret/selenium_crawler
Selenium Webcrawler based on the chromedriver.
chromedriver crawler human-like selenium selenium-webdriver webcrawler
Last synced: 15 Jan 2025
https://github.com/phanletrunghieu/webcrawler
A web crawler with Spring MVC
crawler java servlet spring-mvc springframework
Last synced: 30 Nov 2024
https://github.com/iamtonmoy0/sitemap-crawler
site map crawler with golang and goquery
Last synced: 05 Jan 2025
https://github.com/fredcodee/pexel.com-image-scrapper
download images from pexel.com
Last synced: 08 Jan 2025
https://github.com/artemnikitin/crawler
Example of web crawler implemented in Go
Last synced: 08 Jan 2025
https://github.com/murilobsd/icrop-csv
Icrop-csv para automatizar o processo do download dos relatórios.
Last synced: 28 Dec 2024
https://github.com/iomarmochtar/imagecrawler
Simple image crawler by follow the links recursively, no dependency needed, for python 2.7+
Last synced: 25 Dec 2024
https://github.com/zaneh/ocw-crawler
Crawl MIT OpenCourseWare courses with Kimurai. Not affiliated.
crawler kimurai mit ocw opencourseware spider
Last synced: 15 Jan 2025
https://github.com/raspi/scrapy-kuntavaalit2021-keskisuomalainen
Fetch Keskisuomalainen kuntavaalit 2021 data
crawler mirror python scrapy spider webcrawler
Last synced: 10 Nov 2024
https://github.com/raspi/scrapy-kuntavaalit2021-sanoma
Fetch Sanoma kuntavaalit 2021 data
crawler mirror python scrapy spider webcrawler
Last synced: 10 Nov 2024
https://github.com/raspi/scrapy-kuntavaalit2021-almamedia
Fetch Almamedia kuntavaalit 2021 data
crawler mirror python scrapy spider webcrawler
Last synced: 10 Nov 2024
https://github.com/zhanziyuan/webdownloader
Download elements from the specified website.
crawler downloader image image-downloader python python-crawler web
Last synced: 08 Jan 2025
https://github.com/orshahar91/crawler
Simple Web Crawler
crawler crawling-websites image-crawler java servlets webcrawler
Last synced: 28 Dec 2024
https://github.com/lesterrry/campfire
Shock-drop watching utility
crawler parser web-crawler web-parser
Last synced: 07 Jan 2025
https://github.com/r3c0ger/douban-movie-top250-crawler
Crawl the movie information of Douban Movie Top-250, including movie name, movie link, director, starring, release time, production country/region, type, rating, number of reviews and introduction.
beautifulsoup4 crawler lxml python3 spider
Last synced: 09 Jan 2025
https://github.com/zawlinnnaing/my-wiki-crawler
A simple program for crawling Burmese wikipedia using Media wiki API.
crawler myanmar-tools python wikipedia-api
Last synced: 25 Dec 2024
https://github.com/949886/pixiv-crawler
Pixiv illustration info crawler to local MySQL database.
Last synced: 28 Dec 2024
https://github.com/tinoco/ticapsoriginal_website_score_overview
Ticapsoriginal website sitemaps checker score overview
advertools beautifulsoup behave bs4 chart crawler linkbuilding matplotlib metrics metrics-visualization parser python requests score sitemaps ticapsoriginal tqdm unittesting urllib
Last synced: 09 Jan 2025
https://github.com/zahraarshia/cti_crawl
This cyber threat intelligence crawler can be used to gather information from various sources, including open-source and commercial feeds.
crawler cti cyber-news-bot cyber-threat-intelligence mongodb python scrapy sqlite3 web-scraper
Last synced: 09 Jan 2025
https://github.com/tonystrawberry/tcj-nihongo-crawler
🤖 Scraper for personal usage
crawler scraper selenium selenium-webdriver
Last synced: 14 Jan 2025
https://github.com/jeanluc162/prnt-sc-crawler
Crawler for the Website prnt.sc
crawler net5 net50 prntsc screenshots
Last synced: 16 Jan 2025
https://github.com/spider-rs/spider-clients
Clients to use with the hosted spider service - spider.cloud
ai ai-agents ai-scraping crawler html-to-markdown llm-webcrawler scraper spider web-scraping
Last synced: 05 Nov 2024
https://github.com/kartikmehta8/pycrawler
PyCrawler is a web scraper that takes a link as input and returns all the links connected to the page(s). Goes beyond recursion. Threaded.
Last synced: 16 Jan 2025
https://github.com/shentengtu/cht-yp-crawler
Simple Crawler of www.iyp.com.tw.
crawler node-js nodejs yellow-pages yellowpages
Last synced: 11 Jan 2025
https://github.com/brianmacintosh/wikicrawler
Sandbox project for manipulating Wikimedia wikis
c-sharp crawler mediawiki-bot wikipedia-bot
Last synced: 30 Dec 2024
https://github.com/ymdarake/otenki-crawler
Yet another weather data scraper.
Last synced: 16 Jan 2025
https://github.com/tssujt/async-crawler-sample
A simple crawler sample based on asyncio~
Last synced: 22 Jan 2025
https://github.com/tatamiya/gas-new-books-crawler
Crawling new book information from 版元ドットコム(https://www.hanmoto.com/)
Last synced: 21 Jan 2025
https://github.com/mustafadalga/website-crawler
Hedef web sitesini tarayarak linklerini listeleyen bir web crawler scripti || A web crawler script that lists links by scanning the target website.
crawl crawler crawling-sites hacking hacking-tool web-crawler web-crawler-python web-crawling
Last synced: 18 Jan 2025
https://github.com/thejoin95/free-proxies.info
API service for get anonymous and non proxy, filter by latency, country, updatetime and more
api crawler http-proxy proxy proxy-list python scraper
Last synced: 06 Jan 2025
https://github.com/kehiy/prawler
Pactus P2P Network Crawler
crawler crawling metrics networking p2p pactus
Last synced: 28 Dec 2024
https://github.com/notreeceharris/webstalker
🕸 A Powerful Relational Web Crawler
Last synced: 14 Jan 2025
https://github.com/brianbruggeman/vax
A vaccination signup tool
covid-19 crawler signup vaccination
Last synced: 16 Jan 2025
https://github.com/bruce-lee-ly/crawler
Several fun crawler cases implemented in Python.
Last synced: 16 Jan 2025
https://github.com/thiiagoms/car-stealth
REST API to all cars that were stolen
Last synced: 16 Jan 2025
https://github.com/erickj3/strike-api
this is a web scraping api with nestsj
api crawler flow nestjs scraping typescript
Last synced: 24 Jan 2025
https://github.com/xiangronglin/novel2go
Android app to create pdf from website and send to your kindle
android crawler jetpack kotlin pdf-generation readability
Last synced: 21 Dec 2024
https://github.com/vaibhavyadav-dev/codeforces-problemset-scrapper
Web Scrapper that scrap the whole problemset of Codeforces into csv or json file.
codeforces competative competative-programming crawler problemset programming python scrapy-crawler scrapy-spider
Last synced: 16 Jan 2025
https://github.com/bingxyz/btcethcrawler
telegram 比特幣、乙太幣廣播頻道
bash bash-script crawler telegram-bot
Last synced: 22 Jan 2025
https://github.com/ilovebacteria/digikala-api
This python package requests to Digikala API and gets a product detail.
Last synced: 14 Nov 2024
https://github.com/amazingcoderpro/pythonup
玩转Python!for improving python skills
Last synced: 30 Nov 2024
https://github.com/timpletin/comming-soon
Coming Soon Page - Simple and clean design fully responsive on all screen, Count the days, hours, minutes and seconds for coming event
crawler css java javaweb nextjs nextjs-boilerplate nextjs-typescript nextjs14-typescript object-detection paypal python tailwindui tensorflow typescript
Last synced: 21 Jan 2025
https://github.com/nagilum/focus
Simple CLI tool, written in C#, to crawl a site and log the responses.
cli crawl crawler csharp playwright
Last synced: 16 Jan 2025
https://github.com/bramtenhove/issue-crawler
Crawls Drupal issues and keeps stats
Last synced: 29 Dec 2024
https://github.com/sssshefer/web-crawler-http
Basic web crawler which represents the linking structure of the website
Last synced: 12 Jan 2025
https://github.com/yukihirai0505/streamcrawler
akka stream × crawler
akka-streams crawler elasticsearch instagram sbt scala
Last synced: 13 Jan 2025
https://github.com/lin-jun-xiang/python-crawler
Using CloudScraper, Requests, API, Thread, Async... for scrape the data
async cloudscraper crawler multithreading python requests scraper selenium
Last synced: 21 Dec 2024
https://github.com/mahdijamebozorg/cryptonewscrawler
An end-to-end AI pipeline that performs technical and fundamental analysis of different cryptocurrencies.
crawler crypto cryptocurrency data-mining datamining information-retrieval llm python
Last synced: 16 Jan 2025
https://github.com/leegeunhyeok/python-gongucrawler
파이썬3 공유마당 이미지 및 상세정보 크롤러
Last synced: 22 Dec 2024
https://github.com/briangershon/crawlee-playwright
Browser-based automations with Crawlee and Playwright using Vite tooling and TypeScript
crawlee crawler playwright starter-template typescript vite
Last synced: 20 Dec 2024
https://github.com/zenoyang/webcrawler
一些爬虫代码
crawler scrapy spider web-crawler
Last synced: 17 Jan 2025
https://github.com/nextlevelshit/node-crawl
Webcrawler for nodejs
crawl crawler javascript nodejs
Last synced: 20 Jan 2025
https://github.com/hackthedev/botnet
Tool to find IP's on the Web and check SSH availability and brute force login with a wordlist. Educationally only !!!
botnet bruteforce crawler education educational ip malicious proof-of-concept ssh testing web
Last synced: 23 Jan 2025
https://github.com/rayspock/go-web-crawler
A web crawler to fetch all the links from a given website via go routines.
concurrency crawler golang goroutine
Last synced: 14 Jan 2025
https://github.com/hsiehbocheng/usa-tourist-recommend
crawler mongodb python tableau
Last synced: 14 Jan 2025
https://github.com/pourmand1376/crawler
Simple Crawler, Indexer and Search Engine Web Application
crawler csharp csharp-code dotnet mvc
Last synced: 14 Jan 2025
https://github.com/josepedrodias/naivebot
attempt to mimic googlebot behaviour in nodejs with nightmarejs
crawler googlebot nightmarejs nodejs robots
Last synced: 21 Jan 2025
https://github.com/qqxs/usda_pomological_watercolors
爬取美国农业部果树水彩的数据
crawler koa2 nodejs watercolors
Last synced: 18 Jan 2025
https://github.com/istador/mediaindexer
Software for a cronjob to crawl the ViMP media center and generate an index for it as a static website.
Last synced: 22 Jan 2025
https://github.com/stephanebruckert/gocrawl
Crawl every pages and assets of a web domain
Last synced: 21 Dec 2024
https://github.com/billy0402/python-application
A learning project from the book 'Python 技術者們'.
course crawler matplotlib opencv pandas python requests selenium sklearn
Last synced: 14 Jan 2025
https://github.com/billy0402/tibame-python-data-analysis
A learning project from TibaMe Python data analysis course.
ai course crawler jupyter-notebook matplotlib pandas python requests
Last synced: 14 Jan 2025
https://github.com/tetreum/xupopter_client
Simple interface to manage Xupopter recipes aswell as it's runners.
crawler scrapper scrapping webscraper
Last synced: 17 Dec 2024
https://github.com/tetreum/xupopter_runner
Executes crawling recipes coming from Xupopter Chrome Extension.
crawler scrapper scrapping webscraper
Last synced: 17 Dec 2024
https://github.com/mirusu400/berryz-dl
Batch download berryz webshare files recursively!
berryz berryz-webshare crawler downloader scraper
Last synced: 26 Dec 2024
https://github.com/lucasfogliarini/minhaentradacrawler.consoleapp
Web crawler em C# que usa a biblioteca AngleSharp para extrair detalhes de eventos do site "https://minhaentrada.com.br". Ele analisa o HTML da página e recupera informações como título, data, local e links dos eventos.
anglesharp crawler minhaentrada
Last synced: 31 Dec 2024
https://github.com/sedrubal/webcrawler
Crawl sites and search for security issues.
crawler script security website-auditing
Last synced: 24 Jan 2025
https://github.com/abx123/crawler
Simple lambda function to crawl daily web novel updates.
crawler firebase-database golang lambda-functions
Last synced: 07 Dec 2024
https://github.com/abx123/coronachan
Simple lambda function to crawl MKN twitter account for daily Malaysia COVID-19 updates.
crawler lambda-functions python
Last synced: 07 Dec 2024
https://github.com/rcmilan/ex-web-scraping
Web Scraping com F#
crawler f-sharp fsharp fsharp-data scraper web-scraping xplot
Last synced: 17 Jan 2025
https://github.com/fritz-c/itunes-stats
Fetch info on podcasts, etc. from iTunes RSS data
Last synced: 02 Jan 2025