Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Crawler
![](https://explore-feed.github.com/topics/crawler/crawler.png)
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2025-02-11 00:06:38 UTC
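The definition above describes a crawler as a bot that systematically browses the Web. The Python sketch below shows that traversal under a few assumptions: it uses a breadth-first frontier, stays on a single host, and takes an injected `fetch` callable so it can run without network access. `crawl`, `LinkExtractor`, and `fetch` are hypothetical names chosen for illustration. None of them come from the projects listed below. A production crawler would also need to honor robots.txt, rate limits, and non-HTML content.

```python
from __future__ import annotations

from collections import deque
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect the href values of <a> tags in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url: str, fetch: Callable[[str], str], max_pages: int = 100) -> list[str]:
    """Breadth-first crawl restricted to start_url's host.

    `fetch` is a hypothetical caller-supplied function mapping a URL to its
    HTML; it stands in for real HTTP requests. Returns URLs in visit order.
    """
    host = urlparse(start_url).netloc
    frontier = deque([start_url])  # URLs waiting to be visited (FIFO = breadth-first)
    seen = {start_url}             # every URL ever enqueued, so none is queued twice
    visited: list[str] = []

    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue               # skip pages that fail to load
        visited.append(url)

        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments so each page has one URL.
            link, _ = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


if __name__ == "__main__":
    # A tiny in-memory "site" used in place of real HTTP fetches.
    site = {
        "https://example.com/": '<a href="/a">A</a> <a href="https://other.org/">ext</a>',
        "https://example.com/a": '<a href="/b#top">B</a> <a href="/">home</a>',
        "https://example.com/b": "no links here",
    }
    print(crawl("https://example.com/", site.__getitem__))
    # ['https://example.com/', 'https://example.com/a', 'https://example.com/b']
```

The queue is what makes this crawl breadth-first: pages closest to the seed are visited first. If `popleft()` were replaced with `pop()`, the frontier would become a stack and the crawl would be depth-first.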
https://github.com/mirocow/yii2-crawler
HTTP concurrent crawler for Yii2
concurrency crawler guzzle yii2-extension
Last synced: 17 Jan 2025
https://github.com/karambir/ugc-colleges
Python Script to extract college names from UGC, India website.
college crawler extract html-parser python python-script ugc
Last synced: 12 Dec 2024
https://github.com/zain-ul-din/lgu-crawler
LGU timetable Crawler
contribute crawler lahore-garrison-university lahore-garrison-university-timetable open-source
Last synced: 10 Dec 2024
https://github.com/haxzie-xx/crode.js-node-web-crawler
Node.js Crawler built for open FTP sites for movie link collection.
Last synced: 19 Dec 2024
https://github.com/doroudi/imdb-crawler
imdb.com movie crawler built with Scrapy
crawler data-mining python scrapy
Last synced: 06 Feb 2025
https://github.com/xcrypt0r/hyacinth
🌸 Dcinside image crawler with a dead simple structure
beautifulsoup4 crawler dcinside parsing pyqt5 pyside2
Last synced: 09 Jan 2025
https://github.com/natlee/myanimelist-comment-crawler
Crawls all reviews and information about anime works on MyAnimeList. ;)
anime crawler data-analysis data-mining data-science kaggle kaggle-dataset myanimelist python requests scrapy-crawler sqlite
Last synced: 21 Jan 2025
https://github.com/coghost/iparse
Extracts HTML/JSON content identified by CSS selectors (with bs4), with YAML config support
crawler parser parser-library python xkcd yaml
Last synced: 09 Nov 2024
https://github.com/omerdogan3/kitapp-crawler
Web crawler application for KitApp: gets data from booksellers and inserts it into a database.
book bookseller crawler mysql nodejs puppeteer scrapper-script web-crawler
Last synced: 06 Feb 2025
https://github.com/sayakie/pixiv-crawler
Crawls images from Pixiv 🚀
crawler nodejs pixiv typescript
Last synced: 28 Oct 2024
https://github.com/vinouno/BilibiliDanmuCrawler
A Python project that scrapes danmaku (bullet comments) from bilibili.com and generates a word cloud
Last synced: 27 Oct 2024
https://github.com/hxr16f/ss-grabber
Automation script for downloading user screenshots.
automation crawler downloader grabber lightshot screenshot script
Last synced: 27 Nov 2024
https://github.com/trudi-group/mc-crawler
A MobileCoin network crawler. Corresponding preprint available on arXiv (https://arxiv.org/pdf/2111.12364.pdf).
Last synced: 02 Dec 2024
https://github.com/liyifeng1994/go-crawler
A distributed crawler project based on Golang
crawler elastic elasticsearch golang
Last synced: 12 Nov 2024
https://github.com/basemax/firstselenium
Some sample code for using Selenium in Python, just for fun.
crawl crawler crawlers crawling python python-selenium python3 selenium selenium-example selenium-py selenium-python selenium-sample selenium-tests selenium-website
Last synced: 09 Feb 2025
https://github.com/holmofy/spring-spider
Spring Spider App Utility Library.
crawler java spider spring spring-spider
Last synced: 27 Oct 2024
https://github.com/basemax/instagramseleniumhashtagimagepython
Instagram Selenium Python: a Selenium-based crawler to extract images from specific hashtags on Instagram.
crawler crawler-python crawlers instagram python python-selenium selenium selenium-python
Last synced: 09 Feb 2025
https://github.com/eished/tujigu_crawler
tujigu.com (图集谷) multithreaded Node.js crawler
Last synced: 29 Jan 2025
https://github.com/zurdi15/nbz
Bot to automate internet browsing
automation bot browser-automation browsermob-proxy crawler selenium testing web
Last synced: 15 Oct 2024
https://github.com/franjid/filmaffinity-crawler
Crawl and scrape films from filmaffinity.com (with nodejs)
crawler filmaffinity javascript node nodejs scraper
Last synced: 27 Jan 2025
https://github.com/mcstreetguy/crawler
An advanced web-crawler written in PHP.
composer composer-library crawler crawler-engine guzzle http-requests php php-7 php-library web-crawler webcrawler
Last synced: 12 Oct 2024
https://github.com/alishahbazi81/jobcrawler
A job crawler bot that finds jobs on job board platforms such as LinkedIn, Glassdoor, and Indeed based on their posting time and sends them to a Telegram channel
asp-net-core crawler jobs jobsearch telegram telegram-bot
Last synced: 11 Nov 2024
https://github.com/robmch/mindfactory_crawling
A Python 3 Crawler for Mindfactory.de
crawler crawling data webcrawler webcrawling
Last synced: 17 Nov 2024
https://github.com/hktalent/scrapysite
ScrapySite: a Go web crawler (spider) for scraping and intelligence gathering
crawler elasticsearch go scraping site spider web
Last synced: 19 Nov 2024
https://github.com/yjyoon-dev/nara-crawler
Crawler for National Archives Catalog
Last synced: 20 Nov 2024
https://github.com/wenyalintw/job-scraper-bot
A fun Telegram bot built for a friend, deployed to Heroku
amazon-web-services aws-s3 boto3 crawler google-drive google-drive-api heroku heroku-deployment python-telegram-bot scraper scraping scrapy telegram telegram-bot telegram-bot-api web-scraping
Last synced: 11 Nov 2024
https://github.com/librecodecoop/querido-diario-php
Brazilian government gazettes, accessible to everyone.
civic-tech crawler data-science gazette-crawler governments-gazettes govtech hacktoberfest open-data php php7 politics spider
Last synced: 29 Nov 2024
https://github.com/cr0hn/feed-to-exporter
Fetches an RSS feed and exports it as WordPress posts
Last synced: 07 Nov 2024
https://github.com/frectonz/rampilo
A Telegram crawler
crawler rust telegram telegram-crawler
Last synced: 14 Nov 2024
https://github.com/manuel-lang/autonomous-semantic-search-engine
Submission for HackDataKIBots 2018 - Web crawler combined with document analysis
crawler hackathon machine-learning mannheim microsoft natural-language-processing natural-language-understanding nextiteration rnv semantic-search textract
Last synced: 13 Nov 2024
https://github.com/chenmozhijin/mediawikiextractor
A Python script for extracting data from a MediaWiki website and saving it as JSON.
crawler crawler-python crawling extractor json mediawiki python regex web-crawler
Last synced: 22 Jan 2025
https://github.com/achannarasappa/locust-cli
Developer tools to accelerate development of Locust jobs
cli crawler headless-chrome puppeteer scraper
Last synced: 19 Jan 2025
https://github.com/hrvadl/goweekly
Application that queries top articles from https://golangweekly.com/, translates them into Ukrainian, and sends them to a Telegram channel
article chatgpt crawler go golang openai-api telegram telegram-bot
Last synced: 13 Oct 2024
https://github.com/capturr/price-extract
Performant way to extract the price amount and metadata (currency, decimal and thousands separators) from any string.
amount crawler crawling currencies currency extract extractor javascript nodejs parser parsing price scraper scraping spider typescript
Last synced: 07 Jan 2025
https://github.com/dnlzrgz/winzig
A tiny search engine for personal use.
async cli crawler feeds lofi python python3 rss-feed rss-reader sqlalchemy sqlite sqlite3
Last synced: 05 Nov 2024
https://github.com/joshuaquek/docusite-to-pdf
Provide a URL and this will generate multiple PDF documents of the whole site within the bounds of the URL path. This code repo is for educational purposes only.
crawler documentation-generator html2pdf pdf pdf-converter pdf-document pdf-generation scraper
Last synced: 12 Jan 2025
https://github.com/elektrostudios/fhm-crawler-freehardmusic.com
Crawls album download URLs from the freehardmusic.com website
albums crawl crawler crawling desktop-app desktop-application dotnet music web-crawler web-crawling web-scraper web-scraping webcrawler webcrawling webscraper webscraping windows windows-app windowsapp winforms
Last synced: 29 Jan 2025
https://github.com/xdk78/grabbi
grabbi: a simple web scraper/crawler
crawler html scraper web-scraper
Last synced: 13 Jan 2025
https://github.com/ribeirogab/technology-insights
A program that uses data from Stack Overflow Insights 2020 to generate informative graphs.
crawler python scraping typescript
Last synced: 19 Nov 2024
https://github.com/tokenmill/crawling-framework-example
Demonstration of how to use the Crawling Framework to set up a simple science news crawler and store the results in Elasticsearch. Use this configuration to set up your own crawler.
crawler crawling-framework elasticsearch storm-crawler
Last synced: 06 Jan 2025
https://github.com/YektaDev/Krawler
A configurable HTML Crawler written in Kotlin (JVM), powered by Coroutines, Kotlin Serialization (JSON), Ktor Client, Exposed, and SQLite.
crawl crawler crawlers crawling
Last synced: 06 Feb 2025
https://github.com/arshadkazmi42/github-scanner-local
Locally scans all the repositories of a GitHub organization
bounty bug bug-bounty crawler github local no-api scanner
Last synced: 28 Oct 2024
https://github.com/basemax/film2serial-api-service-crawler
Crawls content and movies from a Persian site using PHP.
crawler crawler-movie crawler-php crawlers movie-crawler movie-database php php-crawler php7 php74
Last synced: 23 Jan 2025
https://github.com/simoninithomas/news-crawler-parse-backend
A crawler built with Scrapy that crawls French news articles and sends them to your Parse.com backend
Last synced: 17 Jan 2025
https://github.com/alexmili/reachable
Check if a URL exists and is reachable
crawler health-check monitoring reachability webscraping
Last synced: 10 Dec 2024
https://github.com/agmmnn/nis-scraper
Scrapy script to scrape nisanyansozluk.com
Last synced: 21 Dec 2024
https://github.com/zhoudaxia233/unilogo
A visually striking assembly of the top 1000 universities' logos from ARWU, sorted by color into a vibrant spectrum.
Last synced: 08 Feb 2025
https://github.com/obaskly/kikfriender.com-bot
A multifunctional bot that increases your likes and hotness points and adds positive feedback. It can also flag an account of your choice as fake and add negative feedback. Moreover, it can check a given wordlist, print out Kik usernames, and store them in a new text file.
ai artificial-intelligence bot checker chrome crawl crawler crawling kik proxies proxy scraper scraping selenium wordlist
Last synced: 08 Jan 2025
https://github.com/igeligel/TeamFortressOutpostApi
🔁 An API wrapper for the TF2 Outpost platform. A platform to find great deals for your Team Fortress 2, Counter-Strike: Global Offensive and Dota 2 items with zero hassle.
bot bot-framework crawler steam steam-api steambot teamfortress2
Last synced: 13 Nov 2024
https://github.com/spa5k/quick-scraper
An easy, lightweight scraper built using typescript for good developer experience.
crawler dx easy-to-use esbuild scraper typescript
Last synced: 13 Nov 2024
https://github.com/testica/a3hrgo-sdk
a3HRgo SDK to automate your reports
a3hrgo crawler javascript puppeteer
Last synced: 10 Feb 2025
https://github.com/mikirasora/osuplayedbeatmapscrawler
A crawler that fetches and downloads the osu! beatmaps you have played
Last synced: 01 Jan 2025
https://github.com/pyaesoneaungrgn/2d-crawler
2D crawler for set.or.th
2d 2d-crawler crawler myanmar php
Last synced: 09 Nov 2024
https://github.com/rimiti/ping-urls
🏓 Ping URLs by batch.
cache crawler ping prerender prerendering seo
Last synced: 28 Dec 2024
https://github.com/fanyong920/crawlitem-puppeteer
An example of scraping products with Puppeteer
chromnium crawler javascript nodejs puppeteer scrapy
Last synced: 23 Dec 2024
https://github.com/georgea93/crawley
Node.js web crawler
crawler depth es6 javascript node nodejs nodejs-web-crawler npm npm-module npm-package robots-txt sitemap web yarn
Last synced: 21 Jan 2025
https://github.com/indatawetrust/reporter
Crawler queue creation tool for paging
Last synced: 13 Dec 2024
https://github.com/ktont/curlas
A Node.js spider tool
chrome-extension crawler spider
Last synced: 13 Jan 2025
https://github.com/hyeockjinkim/baekjoon-management
A management program for BOJ (Baekjoon Online Judge)
Last synced: 25 Jan 2025
https://github.com/rodyherrera/cdrake-se
✨ Search the internet for free and without limits, with no APIs involved. Find videos, images, sites, books, and more using the different engines provided by the library, such as Bing, Google, Yahoo, Wikipedia, YouTube... Browse safely and privately with the CodexDrake Search Engine =).
bing crawler engine google images javascript metasearch metasearch-engine news nodejs privacy search-engine searx videos webscraping websearch websearchengine whoogle wikipedia youtube
Last synced: 25 Dec 2024
https://github.com/qin2dim/istockphoto-go
📸 Gracefully download dataset from iStockPhoto.
Last synced: 10 Feb 2025
https://github.com/roccomuso/is-baidu
Verify that a request is from Baidu crawlers using DNS verification
baidu crawler dns ip js nodejs verification
Last synced: 22 Jan 2025
https://github.com/waynechang65/baha-crawler
baha-crawler is a web crawler module designed to scrape data from the Bahamut Forum.
bahamut crawler javascript nodejs scraper spider webcrawler
Last synced: 19 Oct 2024
https://github.com/roccomuso/is-duckduck
Verify that a request is from DuckDuckBot, the Web crawler for DuckDuckGo
crawler duckduck duckduckbot duckduckgo ip js nodejs verify web
Last synced: 07 Jan 2025
https://github.com/sergioburdisso/solidscraper
Easy-to-use jQuery-like API for web scraping/crawling.
crawler crawling crawling-python jquery python scraper scraping tweets twitter web web-crawler web-scraping webscraping
Last synced: 23 Nov 2024
https://github.com/huzecong/film-spider
Spiders for crawling film listing websites.
Last synced: 11 Jan 2025
https://github.com/jmkim/stock-crawler
Universal Stock Crawler
crawler stock stock-market yahoo-finance
Last synced: 26 Jan 2025
https://github.com/feliz-szk/berserk
Berserk: a crawler to increase web traffic (based on Tor and Privoxy)
anonymizer anonymous-proxy command-line-tool crawler linux privoxy python scraping-websites tor webtraffic-increaser
Last synced: 12 Jan 2025
https://github.com/yakuza8/coronavirus-timeseries-predictor
Time-series analyzer for coronavirus data using a recurrent neural network
asyncio beautifulsoup4 corona coronavirus coronavirus-analysis coronavirus-crawler coronavirus-dataset covid covid-19 covid19-data crawler python-3-6 python3 python36 rnn web-scrapper
Last synced: 24 Jan 2025
https://github.com/mmqnym/etherscan_tracker
Shows how to track a wallet on etherscan.io
Last synced: 18 Jan 2025
https://github.com/sauerbraten/chef
Cube 2: Sauerbraten spy bot: collects IP-name combinations from extinfo and provides a web interface to search them.
crawler extinfo go sauerbraten spy stalker
Last synced: 14 Nov 2024
https://github.com/mrmarble/mineseek
Minecraft server scanner
crawler minecraft minecraft-server scanner slp
Last synced: 17 Jan 2025
https://github.com/bitebait/curry
🍛 Curry is a web crawler written in Golang that checks the US dollar to Brazilian real exchange rate (USDxBRL) at some stores in Paraguay.
api brasil crawler currency-exchange-rates go golang paraguay webcrawler
Last synced: 14 Nov 2024
https://github.com/hctilg/pinterest-crawler
Downloads all images matching a search
Last synced: 07 Nov 2024
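The is-baidu entry above describes using DNS verification to confirm that a request comes from a search engine's crawler. A common way to do this has two steps. First, do a reverse DNS lookup on the client IP and check that the hostname belongs to the crawler's domain. Second, do a forward lookup on that hostname and confirm it resolves back to the same IP. The Python sketch below follows that approach. The `.baidu.com`/`.baidu.jp` suffixes are an assumption and should be checked against Baidu's own documentation. `is_verified_crawler` is a hypothetical helper, not the is-baidu package's API.

```python
from __future__ import annotations

import socket
from typing import Callable, Iterable

# Assumed hostname suffixes for Baidu's crawlers; confirm against Baidu's documentation.
BAIDU_SUFFIXES = (".baidu.com", ".baidu.jp")


def reverse_lookup(ip: str) -> str:
    """Return the PTR hostname for an IP address (raises OSError on failure)."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> list[str]:
    """Return the IPv4 addresses a hostname resolves to (raises OSError on failure)."""
    return socket.gethostbyname_ex(hostname)[2]


def is_verified_crawler(
    ip: str,
    suffixes: Iterable[str] = BAIDU_SUFFIXES,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], list[str]] = forward_lookup,
) -> bool:
    """Reverse-then-forward DNS check of a client IP.

    1. Reverse lookup: the IP's PTR hostname must end in a trusted suffix.
    2. Forward lookup: that hostname must resolve back to the same IP, because
       whoever controls an IP block can publish an arbitrary PTR record.
    """
    try:
        hostname = reverse(ip).rstrip(".").lower()
    except OSError:
        return False  # no PTR record: cannot be verified
    if not any(hostname.endswith(s) for s in suffixes):
        return False
    try:
        return ip in forward(hostname)
    except OSError:
        return False  # hostname does not resolve
```

The `reverse` and `forward` parameters exist so the check can be tested with fake resolvers. In practice, results would usually be cached, because two DNS round trips per request are expensive.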