Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2024-11-05 00:06:41 UTC
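The definition above describes a crawler as a bot that systematically browses the Web. One common way to make "systematically" concrete is a breadth-first traversal: a frontier queue of URLs still to visit, plus a seen-set so that no URL is enqueued twice. The Python sketch below is a minimal illustration of that loop under stated assumptions, not the code of any project listed here. The `crawl` function and its `fetch` parameter are hypothetical: `fetch` stands in for whatever HTTP client is used and returns a page's HTML or `None`. Real crawlers also honor robots.txt, rate limits and politeness delays, and those are omitted here.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags in one HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(
    seed: str,
    fetch: Callable[[str], Optional[str]],  # hypothetical: URL -> HTML or None
    max_pages: int = 100,
) -> list[str]:
    """Breadth-first crawl from `seed`; returns URLs in the order visited."""
    frontier = deque([seed])  # URLs discovered but not yet visited
    seen = {seed}             # every URL ever enqueued, to avoid revisits
    visited: list[str] = []

    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:      # unreachable page: skip it
            continue
        visited.append(url)

        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments so duplicates collapse.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute.startswith(("http://", "https://")) and absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)

    return visited
```

Passing `pages.get` for a small in-memory dict that maps URLs to HTML strings lets the traversal be tried without network access. A FIFO queue gives breadth-first order. Swapping in a priority queue is how frontier implementations usually add per-host scheduling.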
https://github.com/adileo/MicroFrontier
A lightweight crawler frontier implementation in TypeScript using Redis.
crawler frontier microservice redis robots-txt spider
Last synced: 03 Aug 2024
https://github.com/trungdq88/movie-showtimes
Web Service & Android Application to look up movie showtimes in Vietnam
crawler java movie-showtimes theater
Last synced: 31 Oct 2024
https://github.com/amirhoseinsb/Cloud_Player_V2
Use the Cloud Player tool to listen to music by any singer you want, very quickly and without visiting a specific website.
cloud-player crawler crawling music music-player programming python url-player
Last synced: 04 Aug 2024
https://github.com/omilab/internet-archive-link-extractor
Tool for extracting external links of a URL from Internet Archive snapshots
Last synced: 07 Aug 2024
https://github.com/yerkopalma/bash-crawler
💻 Get a site's links with Bash
Last synced: 13 Oct 2024
https://github.com/thesp0nge/nightcrawler
A Python program that crawls a website and stress-tests it by polluting forms with bogus data
crawler offensive-scripts offensive-security stress-test web-crawler web-crawling
Last synced: 12 Oct 2024
https://github.com/bfwg/node-tinycrawler
A tiny web crawler in a nutshell for Node.js
Last synced: 11 Oct 2024
https://github.com/the1812/bingwallpapers
A tool for downloading wallpapers from Bing.
Last synced: 04 Nov 2024
https://github.com/fedebotu/neurips2022-openreviewdata
Crawl & Visualize NeurIPS 2022 Data from OpenReview
crawler dataset neurips neurips-2022 openreview peer-review review scraper
Last synced: 06 Nov 2024
https://github.com/nakabonne/webcrawlerforserps
Web crawler that scrapes Google search results
Last synced: 24 Oct 2024
https://github.com/windfarer/biu
biubiubiu~~ I'm a tiny web crawler framework
crawler python spider spider-framework web-crawler
Last synced: 28 Oct 2024
https://github.com/softmarshmallow/inked-news-crawler
🕷 Korean news source crawler (real-time & bulk)
crawler naver-news python3 scrapy
Last synced: 11 Oct 2024
https://github.com/feedeo/youtube-channel-crawler
YouTube Channel 📺 Crawler
crawler youtube youtube-channel
Last synced: 11 Oct 2024
https://github.com/ajcerejeira/base.gov.pt
A crawler that fetches data from base.gov.pt
Last synced: 06 Nov 2024
https://github.com/arshadkazmi42/blc
Broken link checker
blc broken-link-checker broken-link-finder bug-bounty bugbounty crawler python
Last synced: 28 Oct 2024
https://github.com/poyea/coronaflight-hkg
😷 Crawler and history manager for dangerous, coronavirus-infected flights to Hong Kong (VHHH)
corona coronaflight-hkg coronavirus coronavirus-analysis coronavirus-info coronavirus-tracker coronavirus-tracking crawl crawler crawlers crawling hacktoberfest hong-kong hongkong javascript json json-api node node-js nodejs
Last synced: 28 Oct 2024
https://github.com/luizppa/web-crawler
A web crawler that collects and indexes web pages. Made with Chilkat and the Gumbo parser.
chilkat cpp crawler webcrawler
Last synced: 28 Oct 2024
https://github.com/yggverse/yggstate
Yggdrasil Network Explorer
analytics crawler explorer geo-ip geo-location geolite2 mysql php search-engine sphinx spider yggdrasil yggdrasil-api yggdrasil-network yggdrasil-php-api yggdrasilctl yggstate
Last synced: 06 Nov 2024
https://github.com/capturr/jsonld-extract
A damn simple tool to extract JSON-LD metadata from a web page using a jQuery-like API (jQuery, Cheerio, CashDom...).
cashdom cheerio crawler crawling data extract extractor javascript jquery json jsonld metadata nodejs parser scraper scraping spider typescript
Last synced: 28 Oct 2024
https://github.com/simin75simin/libgencrawl
Crawl all books from a Library Genesis search
crawler free-software libgen python3 scraper
Last synced: 05 Nov 2024
https://github.com/gabfl/sitecrawl
Simple Python module to crawl a website and extract URLs
crawl crawler crawler-python crawling-sites
Last synced: 13 Oct 2024
https://github.com/karambir/ugc-colleges
Python script to extract college names from the UGC (India) website.
college crawler extract html-parser python python-script ugc
Last synced: 24 Oct 2024
https://github.com/dotenorio/freeloader-of-data
A simple crawler or scraper to get open graph and other meta data from any website.
crawler graph hacktoberfest meta-data open-graph scraper
Last synced: 25 Oct 2024
https://github.com/jean-baptiste-camps/iiif-crawler
Interrogate IIIF servers and get images of manuscripts
crawler iiif iiif-image manuscripts
Last synced: 11 Oct 2024
https://github.com/bugfishtm/bugfish-image-downloader
💾 Bugfish Image Downloader: Effortless web image downloads, subsite exploration, and HD selection. Windows app, .NET 4.5, no registry usage. Download now!
bugfish bugfish-software bugfishtm crawler downloader downloadmanager downloadtool gplv3 image imagedownloader imagedownloadertool imageprocessing portable-executable portableapps software utilityapp webscraping windows windows-desktop
Last synced: 06 Nov 2024
https://github.com/sayyid5416/links-extractor
Extract links from any file or website.
crawler extract-links extractor links-extraction scraper web-crawler web-scraper
Last synced: 28 Oct 2024
https://github.com/AmirAref/Torobot
An inline Telegram bot for easily accessing and searching torob.com products from Telegram.
crawler python python-telegram-bot scraper telegtam-bot
Last synced: 05 Aug 2024
https://github.com/AmirAref/DivarCrawler
A script to crawl divar.ir and extract phone numbers
Last synced: 05 Aug 2024
https://github.com/integralist/go-web-crawler
A web crawler built in the Go programming language
concurrency crawler go golang web-crawler
Last synced: 11 Oct 2024
https://github.com/haxzie-xx/crode.js-node-web-crawler
Node.js crawler built to collect movie links from open FTP sites.
Last synced: 01 Nov 2024
https://github.com/birkhofflee/blizzard_forum.js
An unofficial Node.js API for the Blizzard Forums (working as of 2019).
Last synced: 08 Oct 2024
https://github.com/mrrfv/webarchive
Crawls websites and saves found URLs to a file.
archive archiveteam archiving crawler crawling ia internet-archive scraper web-archiving web-scraping
Last synced: 27 Oct 2024
https://github.com/kernelerr/pixivsync
Pixiv image download and sync tool
crawler pixiv pixiv-crawler python
Last synced: 12 Oct 2024
https://github.com/mcstreetguy/crawler
An advanced web-crawler written in PHP.
composer composer-library crawler crawler-engine guzzle http-requests php php-7 php-library web-crawler webcrawler
Last synced: 12 Oct 2024
https://github.com/stopka/fedicrawl
Collect feeds to follow on Fediverse nodes.
crawler docker fediverse nodejs prisma typescript
Last synced: 05 Nov 2024
https://github.com/juliandavidmr/raptor
Lightweight tool for scanning websites that works as a spider. Once started, it scans pages for further websites to visit and indexes them automatically.
Last synced: 11 Oct 2024
https://github.com/itszeeshan/crawlinit
A web crawler written in python3
appsec bugbounty bugbounty-tool bugbountytips crawler crawler-python enumeration infosec python recon reconnaissance scanner url web
Last synced: 12 Oct 2024
https://github.com/zurdi15/nbz
Bot to automate internet browsing
automation bot browser-automation browsermob-proxy crawler selenium testing web
Last synced: 15 Oct 2024
https://github.com/holmofy/spring-spider
Spring Spider App Utility Library.
crawler java spider spring spring-spider
Last synced: 27 Oct 2024
https://github.com/leomaurodesenv/smm-course-search
A package for searching Super Mario Maker courses
bookmark-site crawler javascript json mario-game mario-maker nodejs
Last synced: 02 Nov 2024
https://github.com/vinouno/BilibiliDanmuCrawler
A Python project that crawls danmaku (bullet comments) from bilibili.com and generates word clouds
Last synced: 27 Oct 2024
https://github.com/sayakie/pixiv-crawler
Crawls images from Pixiv 🚀
crawler nodejs pixiv typescript
Last synced: 28 Oct 2024
https://github.com/ivan-alone/instastories-saver-cpp
Program for saving Instagram Stories, rewritten in C++
api backup crawler grambler gramblr insta instagram instagram-stories instastories-saver instastory stories
Last synced: 31 Oct 2024
https://github.com/moehmeni/ezweb
Easy to use web page analyzer
analyzer crawler scraper text-analysis text-classification text-mining webcrawler webcrawling webpage webscraper webscraping www
Last synced: 05 Nov 2024
https://github.com/danielmorell/se_bot_checker
Validate search engine user agents and IP addresses.
crawler googlebot python search-engine spider
Last synced: 15 Oct 2024
https://github.com/jmkim/stock-crawler
Universal Stock Crawler
crawler stock stock-market yahoo-finance
Last synced: 13 Oct 2024
https://github.com/waynechang65/baha-crawler
baha-crawler is a web crawler module designed to scrape data from the Bahamut Forum.
bahamut crawler javascript nodejs scraper spider webcrawler
Last synced: 19 Oct 2024
https://github.com/ozansz/github-crawler
A basic utility for crawling users and their e-mail addresses
Last synced: 16 Oct 2024
https://github.com/arshadkazmi42/github-scanner-local
Locally scan all the repositories of a GitHub organization
bounty bug bug-bounty crawler github local no-api scanner
Last synced: 28 Oct 2024
https://github.com/xdk78/grabbi
grabbi: a simple web scraper/crawler
crawler html scraper web-scraper
Last synced: 23 Oct 2024
https://github.com/ruedigervoigt/salted
Smart, Asynchronous Link Tester with Database backend: works with HTML, Markdown and TeX files
asyncio crawler html-files hyperlinks latex linkchecker markdown pandoc python
Last synced: 11 Oct 2024
https://github.com/agmmnn/nis-scraper
Scrapy script to scrape nisanyansozluk.com
Last synced: 04 Nov 2024
https://github.com/dnlzrgz/winzig
A tiny search engine for personal use.
async cli crawler feeds lofi python python3 rss-feed rss-reader sqlalchemy sqlite sqlite3
Last synced: 05 Nov 2024
https://github.com/fanyong920/crawlitem-puppeteer
An example of scraping product listings with Puppeteer
chromnium crawler javascript nodejs puppeteer scrapy
Last synced: 05 Nov 2024
https://github.com/chenmozhijin/mediawikiextractor
A Python script for extracting data from a MediaWiki website and saving it as JSON.
crawler crawler-python crawling extractor json mediawiki python regex web-crawler
Last synced: 09 Oct 2024
https://github.com/vivekg13186/easy_web_crawler
Web crawler built around Puppeteer to crawl AJAX/JavaScript-enabled pages.
Last synced: 28 Oct 2024
https://github.com/sauerbraten/chef
Cube 2: Sauerbraten spy bot: collects IP-name combinations from extinfo and provides a web interface to search them.
crawler extinfo go sauerbraten spy stalker
Last synced: 02 Aug 2024
https://github.com/bitebait/curry
🍛 Curry is a web crawler written in Golang that checks the US dollar to Brazilian real exchange rate (USDxBRL) at several stores in Paraguay.
api brasil crawler currency-exchange-rates go golang paraguay webcrawler
Last synced: 02 Aug 2024
https://github.com/hrvadl/goweekly
Application that queries the top articles from https://golangweekly.com/, translates them into Ukrainian and sends them to a Telegram channel
article chatgpt crawler go golang openai-api telegram telegram-bot
Last synced: 13 Oct 2024
https://github.com/testica/a3hrgo-sdk
a3HRgo SDK to automate your reports
a3hrgo crawler javascript puppeteer
Last synced: 10 Oct 2024
https://github.com/igeligel/TeamFortressOutpostApi
🔁 An API wrapper for the TF2 Outpost platform. A platform to find great deals for your Team Fortress 2, Counter-Strike: Global Offensive and Dota 2 items with zero hassle.
bot bot-framework crawler steam steam-api steambot teamfortress2
Last synced: 02 Aug 2024
https://github.com/qin2dim/istockphoto-go
📸 Gracefully download dataset from iStockPhoto.
Last synced: 31 Oct 2024
https://github.com/yakuza8/coronavirus-timeseries-predictor
Time-series analyzer for coronavirus data using a recurrent neural network
asyncio beautifulsoup4 corona coronavirus coronavirus-analysis coronavirus-crawler coronavirus-dataset covid covid-19 covid19-data crawler python-3-6 python3 python36 rnn web-scrapper
Last synced: 12 Oct 2024
https://github.com/roccomuso/is-baidu
Verify that a request is from Baidu crawlers using DNS verification
baidu crawler dns ip js nodejs verification
Last synced: 17 Oct 2024
https://github.com/rodyherrera/cdrake-se
✨ Search the internet for free, without limits and without APIs. Find videos, images, sites, books and more using the engines provided by the library, such as Bing, Google, Yahoo, Wikipedia, YouTube... Browse safely and privately with the CodexDrake Search Engine =).
bing crawler engine google images javascript metasearch metasearch-engine news nodejs privacy search-engine searx videos webscraping websearch websearchengine whoogle wikipedia youtube
Last synced: 06 Nov 2024
https://github.com/roccomuso/is-duckduck
Verify that a request is from DuckDuckBot, the Web crawler for DuckDuckGo
crawler duckduck duckduckbot duckduckgo ip js nodejs verify web
Last synced: 17 Oct 2024
https://github.com/roccomuso/is-twitter
Verify that a request is from Twitter crawlers using DNS verification steps
bot crawler dns ip js nodejs twitter verification
Last synced: 17 Oct 2024
https://github.com/runnin-n-gunnin/geckofxinterceptrequestcaptureresponse
[GeckoFX/Firefox]: Shows how to intercept requests, capture responses, customize GeckoPreferences, handle certificate errors, change the user agent, and more.
browser cefsharp controls crawler crawling firefox gecko geckofx geckofx60 scraping webbrowser windows windowsforms winforms
Last synced: 13 Oct 2024
https://github.com/nazanin1369/searchengine
Implementing a search engine using Java, AngularJS and Elasticsearch
angularjs crawler elasticsearch java search-engine
Last synced: 11 Oct 2024
https://github.com/mwoss/mors
Application of topic models for information retrieval and search engine optimization.
common-crawl crawler django doc2vec gensim hacktoberfest lda python scrapy search search-engine tfidf
Last synced: 13 Oct 2024
https://github.com/brunojppb/airport-crawler
Simple and powerful CLI app to get worldwide airport information in JSON format
Last synced: 11 Oct 2024
https://github.com/santhoshse7en/alcoholics-anonymous
Research project analysing public knowledge about Alcoholics Anonymous
aa-meetings alcoholics alcoholics-anonymous anonymous bs4 crawler data-extraction-and-pre-processing google-search-using-python news-crawler newspaper3k python the-hindu web-scraping without-api
Last synced: 11 Oct 2024
https://github.com/tikazyq/colly-crawlers
Crawlers built with Colly, a Golang-based web crawling framework
Last synced: 11 Oct 2024
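The is-baidu and is-twitter entries above describe verifying that a request really comes from a search engine's crawler through DNS. A widely used form of this check is forward-confirmed reverse DNS. It works in three steps: a PTR lookup on the client IP, a check that the hostname lies under the crawler's domain, then a forward lookup of that hostname that must return the same IP. The Python sketch below illustrates that technique under stated assumptions. It is not the code of those packages. The function name `is_verified_crawler` is hypothetical and `example.com` is a placeholder domain. The real hostname suffixes for each crawler should be taken from that search engine's own documentation. `socket.gethostbyname_ex` only resolves IPv4, so IPv6 clients would need `socket.getaddrinfo` instead.

```python
import socket
from typing import Callable, Iterable, Tuple

# (hostname, aliases, addresses), the shape returned by the socket resolvers.
Resolver = Callable[[str], Tuple[str, list, list]]


def is_verified_crawler(
    ip: str,
    allowed_suffixes: Iterable[str],  # assumption: crawler domains, e.g. from vendor docs
    reverse_lookup: Resolver = socket.gethostbyaddr,
    forward_lookup: Resolver = socket.gethostbyname_ex,
) -> bool:
    """Forward-confirmed reverse DNS: IP -> hostname -> IP must round-trip."""
    # Step 1: reverse (PTR) lookup of the requesting IP address.
    try:
        hostname, _aliases, _addrs = reverse_lookup(ip)
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return False

    # Step 2: the hostname must fall under one of the crawler's domains.
    # Matching on a dot boundary rejects look-alikes such as "evilexample.com".
    host = hostname.rstrip(".").lower()
    suffixes = [s.strip(".").lower() for s in allowed_suffixes]
    if not any(host == s or host.endswith("." + s) for s in suffixes):
        return False

    # Step 3: the forward (A) lookup must return the original IP, because
    # whoever controls an IP block can publish an arbitrary PTR record.
    try:
        _name, _aliases, addrs = forward_lookup(host)
    except OSError:
        return False
    return ip in addrs
```

The resolvers are parameters so that the check can be exercised with fake lookup tables. In production the defaults perform live DNS queries, and results are usually cached, because each check costs two round trips.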