Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2024-11-05 00:06:41 UTC
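The "systematic browsing" in the definition above is commonly implemented as a breadth-first traversal over a frontier of URLs, with a set of already-seen URLs so that no page is visited twice. The sketch below is a minimal illustration, not the method of any project listed here. It assumes a hypothetical `get_links` callback in place of real HTTP fetching and HTML parsing, and it ignores robots.txt, politeness delays, and URL normalization.

```python
from collections import deque
from typing import Callable, Iterable


def crawl(seed: str, get_links: Callable[[str], Iterable[str]], max_pages: int = 100) -> list[str]:
    """Breadth-first crawl starting from `seed`.

    `get_links` is a hypothetical, caller-supplied function that returns the
    URLs a page links to (a stand-in for fetching and parsing the page).
    Returns the URLs in the order they were visited.
    """
    frontier = deque([seed])  # URLs waiting to be visited (FIFO gives breadth-first order)
    seen = {seed}             # every URL ever enqueued, so each page is visited at most once
    visited: list[str] = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


# Toy in-memory "website": each path maps to the paths it links to.
site = {
    "/": ["/a", "/b"],
    "/a": ["/", "/c"],
    "/b": ["/c"],
    "/c": [],
}
print(crawl("/", lambda u: site.get(u, [])))  # ['/', '/a', '/b', '/c']
```

Real crawlers usually replace the in-memory `deque` and `set` with a persistent or distributed frontier, for example one backed by Redis, so that crawls can survive restarts and scale across workers.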
https://github.com/Knovour/json-web-crawler
Use JSON to list all the elements (with CSS3 and jQuery selectors) that you want to crawl.
crawler javascript jquery json web-crawler
Last synced: 03 Aug 2024
https://github.com/pourmand1376/persiancrawler
Open source crawler for Persian websites.
crawler machine-learning news python scrapy tasnim text-classification
Last synced: 11 Oct 2024
https://github.com/racinmat/premium-downloader
crawler pornhub pornhub-downloader python
Last synced: 06 Nov 2024
https://github.com/vignif/crawler-google-scholar
This bot crawls and downloads statistics and pictures of researchers from Google Scholar.
crawler downloading-statistics google-scholar indexes statistics
Last synced: 06 Nov 2024
https://github.com/neuralegion/bright-cli
Command Line Interface (CLI) tool for NeuraLegion's solutions.
api cli crawler cyber-security devops har nexploit oas secops security typescript
Last synced: 14 Oct 2024
https://github.com/kasthack-labs/kasthack.osp
Generator of raw dumps of VK users.
crawler crawling data-mining kasthack programmable-web vk vk-api vkapi vkontakte
Last synced: 26 Sep 2024
https://github.com/selmi-karim/img-cli
An interactive command-line interface built in Node.js for downloading one or more images from URLs to disk.
buffer crawler crawling downloader image-downloader image-downloading nodejs phantomjs webpage
Last synced: 15 Oct 2024
https://github.com/ruichongliu/Crawler_pubg.op.gg
This is a web crawler for pubg.op.gg, written by Ruichong Liu. PUBG game data scraping.
beautifulsoup4 crawler pubg python3 scrape selenium
Last synced: 29 Oct 2024
https://github.com/fanyong920/crawlitem
A Google Chrome extension for crawling Taobao and Tmall web pages.
crawler javascript taobao tmall
Last synced: 27 Oct 2024
https://github.com/omarhashem123/venom
A tool designed to quickly crawl and extract endpoints.
Last synced: 04 Aug 2024
https://github.com/sigoden/rag-crawler
Crawl a website to generate a knowledge file for RAG.
Last synced: 27 Oct 2024
https://github.com/MontFerret/worker
Containerized Ferret worker
chrome crawler docker dsl ferret go hacktoberfest hacktoberfest2020 scraping scraping-websites service worker
Last synced: 04 Nov 2024
https://github.com/fooock/robots.txt
:robot: robots.txt as a service. Crawls robots.txt files, downloads and parses them to check rules through an API
antlr4 api crawler crawler-engine docker docker-compose gradle java kotlin makefile postgresql redis redis-stream redis-streams robots-parser robots-txt spiders spring-boot
Last synced: 27 Oct 2024
https://github.com/Selbi182/SpotifyDiscoveryBot
A Java-based bot that automatically crawls for new releases by your followed artists on Spotify. Never miss a release again!
bot crawler java music spotify spring-boot springboot sqlite
Last synced: 27 Oct 2024
https://github.com/ikergarcia1996/questionclustering
Question classifier written in Python 3, implemented in the following video: https://youtu.be/qnlW1m6lPoY
clustering crawler deep-learning inteligencia-artificial machine-learning natural-language-processing nlp pln sentiment-analysis techonology unsupervised-machine-learning word-embeddings
Last synced: 27 Oct 2024
https://github.com/kirillplatonov/proxy_manager
Ruby proxy manager: a gem for easily using proxies in parsers and web bots.
Last synced: 21 Oct 2024
https://github.com/danhje/dead-link-crawler
An efficient, asynchronous crawler that identifies broken links on a given domain.
async broken-links crawler dead-links python python3
Last synced: 04 Nov 2024
https://github.com/saltyshiomix/web-master
Web mastering tools for my personal services
crawler javascript nodejs scraper typescript web
Last synced: 27 Oct 2024
https://github.com/valmisson/ytubes
Search YouTube for videos, playlists, channels, movies, live streams, and music without an API key.
channel crawler live movie nodejs playlist scraper search typescript videos youtube youtube-api youtube-music youtube-search ytube
Last synced: 11 Oct 2024
https://github.com/maxgio92/krawler
A crawler for kernel releases distributed by the major Linux distributions.
Last synced: 28 Oct 2024
https://github.com/chinmayrane16/scraping-amazon-for-mobile-details-with-scrapy
Scraping the Amazon website through proxies to extract mobile phone details.
amazon-scraper crawler googlebot json proxy pycharm pypiwin32 scrapy user-agents
Last synced: 27 Oct 2024
https://github.com/floschnell/flatcrawl-processors
A set of processors that instantly inform users via a set of channels (e.g. Telegram) about new flats found on different rental websites.
bot crawler flatcrawl flats real-estate rentals-search telegram
Last synced: 12 Aug 2024
https://github.com/refraction-ray/wos-statistics
A crawler for Web of Science data, with a particular focus on analyzing citation data.
aiohttp citation crawler webofscience
Last synced: 15 Oct 2024
https://github.com/kodjunkie/node-raspar
🕷️ Easily scrape the web for torrent and media files.
api api-rest api-wrapper cli crawler crawling crawling-tool docker expressjs javascript movies mp3 music node-js nodejs scraper series torrent torrent-downloader video
Last synced: 15 Oct 2024
https://github.com/gridaco/figma-archives
Figma Files Scraper for Research & Studies
crawler dataset design-database figma machine-learning scrapy selenium
Last synced: 27 Oct 2024
https://github.com/xiaoluoboding/metafy-svg
Easily crawl a website's metadata and generate SVG as a service.
crawler metadata saas serverless-functions svg vercel-serverless
Last synced: 28 Oct 2024
https://github.com/a3r0id/httpscan
Scan a host for open HTTP ports and gain information about the services present.
crawler hacking hacking-tool http low-level penetration-testing pentest pentesting portscan portscanner scan scanner scanner-web scraper security service-discovery
Last synced: 06 Nov 2024
https://github.com/ototot/judgegirl-scoreboard
A Fancy Scoreboard for JudgeGirl
crawler judgegirl judgegirl-scoreboard php scoreboard tocas-ui tocasui vuejs vuejs2
Last synced: 17 Oct 2024
https://github.com/cybercongress/crawler
A toolchain for bringing web2 to web3
cosmos-sdk crawler cyber cyberd ipfs web3 wiki
Last synced: 03 Aug 2024
https://github.com/begrossi/anp-price-collector
ANP Price Collector
crawler experiment not-maintained scrapy-crawler
Last synced: 23 Oct 2024
https://github.com/redco/goose-starter-kit
This is a starter kit for redco/goose-parser
crawler docker goose goose-parser parser starter-kit
Last synced: 05 Nov 2024
https://github.com/stefanocudini/node-fetch-dom
Magic utility that extracts JavaScript global variables from a remote HTML page.
crawler dom nodejs scraping webscraping
Last synced: 19 Oct 2024
https://github.com/matheuscas/pynfce
Fetches and extracts data from an NFC-e given its access URL.
Last synced: 02 Oct 2024
https://github.com/thesoenke/news-crawler
Crawler that collects and extracts content of daily published news articles
Last synced: 23 Oct 2024
https://github.com/petrpatek/airbnb-scraper
Apify public actor for scraping Airbnb homes.
airbnb airbnb-api apify crawler data-extraction scrape
Last synced: 27 Oct 2024
https://github.com/johansatge/psi-report
Crawls a website, gets PageSpeed Insights data for each page, and exports an HTML report.
cli crawler html-report pagespeed-insights
Last synced: 30 Oct 2024
https://github.com/BroNils/GoogleSearch-CLI
Search anything on Google without captcha
captcha crawler google googlesearch googlesearch-cli recaptcha search-engine
Last synced: 30 Oct 2024
https://github.com/doreanbyte/katswiri
A crawler to find job listings and aggregate them from multiple sources
assistant crawler employment-opportunities job-aggreg job-finder time-management
Last synced: 07 Sep 2024
https://github.com/byt3n33dl3/crawler_v2
Remote access trojan (RAT) tools for penetration testing on devices, giving real-time access to client devices after the malware reaches the kernel. Trust attack.
Last synced: 31 Oct 2024
https://github.com/yggverse/yggo
YGGo! Distributed Web Search Engine
alt-web crawler curl distributed federative fts5 js-less mysql open-source parser pdo php privacy-oriented search-engine sphinx sphinxsearch spider web web-archive yggdrasil
Last synced: 06 Nov 2024
https://github.com/louis70109/pleaguebot
P+ League Chatbot(unofficial)(deprecated)
basketball chatbot crawler line
Last synced: 15 Oct 2024
https://github.com/sobak/scrawler
Declarative, scriptable web robot (crawler) and scraper
crawler crawler-engine robots-txt scraper scraping-websites
Last synced: 29 Oct 2024
https://github.com/davideviolante/socialblade-com-api
Unofficial APIs for socialblade.com website.
crawler scraper scraping social social-media socialblade
Last synced: 02 Nov 2024
https://github.com/whitejoce/Get_Weather
Determines your location from your IP address and scrapes the local weather (no API required).
crawler python3 spider weather-forecast
Last synced: 01 Aug 2024
https://github.com/cristipufu/scrapy-net
Scrapy, the web scraping tool: a naive implementation in C#
Last synced: 11 Oct 2024
https://github.com/nadar/crawler
A website crawler implementation written in PHP. Highly extensible, indexes PDFs, and is very memory efficient.
crawler hacktoberfest html pdf php
Last synced: 15 Oct 2024
https://github.com/lysandrejik/omegle-crawler-node
Node library to connect to and interact with the Omegle website.
Last synced: 23 Oct 2024
https://github.com/hoc081098/comic_app_server_nodejs
Node.js server for an Android comic app | https://comic-app-081098.herokuapp.com/
comic-app crawler nodejs nodejs-crawler nodejs-typescript typescript
Last synced: 31 Oct 2024
https://github.com/rodyherrera/codexdrake
An open source, privacy-first, self-hosting capable and blazing fast search engine written in JavaScript. Browse anonymously and safely without the need to pay third-party APIs. 👀
adblock books crawler google images javascript metasearch metasearch-engine news nodejs privacy-first search search-engine searchengine searx self-hosted videos webscraping websearch wikipedia
Last synced: 06 Nov 2024
https://github.com/ne-lexa/roach-php-bundle
Symfony bundle for roach-php/core
crawler php roach-php scrapy spider symfony symfony-bundle
Last synced: 12 Oct 2024
https://github.com/jtiala/wpdl
⬇️ Scrape pages, posts, images and other data from a WordPress instance.
crawler downloader scraper scraping wordpress
Last synced: 23 Oct 2024
https://github.com/theritikchoure/crawlyx
Crawlyx is an open-source command-line interface (CLI) based web crawler built using Node.js. It is designed to crawl websites and extract useful information like links, images, and text. It is lightweight, fast, and easy to use.
cli command-line-tool crawler crawlyx hacktoberfest hacktoberfest-2023 hacktoberfest-accepted nodejs npmjs open-source scraper web-scraping
Last synced: 12 Oct 2024
https://github.com/ivan-sincek/scrapy-scraper
Web crawler and scraper based on Scrapy and Playwright's headless browser.
bug-bounty crawler crawling downloader downloading ethical-hacking headless-browser javascript offensive-security penetration-testing python red-team-engagement scraper scraping scrapy security spider spidering web web-penetration-testing
Last synced: 16 Oct 2024
https://github.com/leonzucchini/Recipes
Project to get and analyse data on recipes from chefkoch.de
Last synced: 04 Nov 2024
https://github.com/matheuscas/pycnpj-crawler
Yet another module for extracting company data from a CNPJ.
Last synced: 02 Oct 2024
https://github.com/Ivan-Alone/InstaStories-Saver
A program for saving Instagram Stories
api backup crawler grambler insta instagram instagram-stories instastories-saver instastory stories
Last synced: 05 Aug 2024
https://github.com/hironsan/japanese-news-crawler
A complete, automated Japanese news crawler built on top of the Scrapy framework
Last synced: 27 Oct 2024
https://github.com/xunzhuo/airspider
A Fast and Light Python Spider Framework 🕷️
asynchronous crawler crawler-python distributed python3 redis spider spider-framework web
Last synced: 28 Oct 2024
https://github.com/gbolmier/newspaper-crawler
:spider: An autonomous French newspaper crawler based on Scrapy framework
Last synced: 13 Oct 2024
https://github.com/sanix-darker/ziim
Lets your CLI search online for solutions to errors and exceptions from the commands you run, so you don't need to open a browser and look things up yourself.
cli crawler error-correcting-codes error-handling exception-handler exception-handling exceptions javascript python scraper stackoverflow stackoverflow-api stackoverflow-questions
Last synced: 14 Oct 2024
https://github.com/mithro/fastsvncrawler
fast-svn-crawler / fastsvncrawler - A tool for listing SVN repository content
crawler export import subversion svn vcs
Last synced: 14 Oct 2024
https://github.com/piotrpdev/WeBuy-Cex-Price-Tracker
A Python script that gets the prices of certain CeX products and uploads them to Google Sheets
cex cex-api cex-products crawler google-sheets gspread prices python-script webuy webuy-api webuy-cex webuycex
Last synced: 23 Oct 2024
https://github.com/pawod/gis-berlin-rents
A web crawler for ImmobilienScout24.de, implemented for a small project at the Institute of Geographic Sciences of the Free University of Berlin.
apartment-rents berlin crawler gis immobilienscout24
Last synced: 04 Nov 2024
https://github.com/eight04/ptt-mail-backup
A BBS bot for fetching PTT in-site mail
bbs cli crawler ptt ptt-crawler python python3
Last synced: 28 Oct 2024
https://github.com/sebobo/shel.crawler
Neos based crawler for nodes and sites
Last synced: 14 Oct 2024
https://github.com/luyadev/luya-module-crawler
Crawl a website and provide intelligent search results
crawler hacktoberfest intelligent-search luya search yii2
Last synced: 10 Oct 2024
https://github.com/tosone/githubtraveler
Traverse all GitHub users, orgs, and repos.
Last synced: 06 Nov 2024
https://github.com/trungdq88/movie-showtimes
Web Service & Android Application to look up Vietnam movie showtimes
crawler java movie-showtimes theater
Last synced: 31 Oct 2024
https://github.com/petersonjr/MetadataCrawler
A simple tool to extract metadata from relational databases
avro crawler database-schemas java jdbc metadata rdms relational-databases
Last synced: 02 Aug 2024
https://github.com/igeligel/BackpackLogin
:arrow_forward: A .NET core library to handle the login to Backpack.tf. Backpack.tf is a trading site for Team Fortress 2, Counter-Strike: Global Offensive, and Dota 2. Community item pricing, item trading and stats, and much more.
bot bot-framework crawler csgo csgo-bot steam steam-api steambot steamweb teamfortress2
Last synced: 02 Aug 2024
https://github.com/fanzeyi/torchic
A generic search engine built using Go & Spring & Redis. Project for Google's CodeU event.
Last synced: 21 Oct 2024
https://github.com/adileo/MicroFrontier
A lightweight crawler frontier implementation in TypeScript using Redis.
crawler frontier microservice redis robots-txt spider
Last synced: 03 Aug 2024
https://github.com/activatedgeek/winemag-dataset
Dataset of Wine Reviews from Wine Enthusiast Magazine :grapes: :wine_glass: :earth_asia:
crawler dataset python3 scrapy scrapy-spider vega-lite visualization wine wine-tasting
Last synced: 14 Oct 2024
https://github.com/bbc2/discolinks
Command-line tool which checks a website for broken links.
broken-links crawler html http link-checker link-checkers link-checking validator web
Last synced: 28 Oct 2024
https://github.com/pps-22-scooby/pps-22-scooby
A Scala application for crawling and scraping the web pages given as input, using special rules passed to it through a DSL.
crawler crawlers internal-dsl scala scraper scrapers web web-crawler web-crawling web-scraper web-scrapers
Last synced: 14 Oct 2024
https://github.com/bfwg/node-tinycrawler
Tiny web crawler in a nutshell for Node.js
Last synced: 11 Oct 2024
https://github.com/thesp0nge/nightcrawler
A python program that crawls a website and tries to stress it, polluting forms with bogus data
crawler offensive-scripts offensive-security stress-test web-crawler web-crawling
Last synced: 12 Oct 2024
https://github.com/softmarshmallow/inked-news-crawler
🕷 Korean news source crawler (realtime & bulk)
crawler naver-news python3 scrapy
Last synced: 11 Oct 2024