Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2024-12-25 00:05:56 UTC
- JSON Representation
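The definition above says a crawler "systematically browses" the Web. One common way to do that is a breadth-first traversal with a visited set. The sketch below assumes that strategy and restricts the crawl to the seed's host. It uses only the Python standard library. `crawl`, `LinkExtractor`, and the pluggable `fetch` callable are hypothetical names chosen for illustration and are not taken from any project listed here. Fetching is injected so the example runs offline. robots.txt handling and rate limiting are deliberately omitted.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed: str, fetch: Callable[[str], Optional[str]], max_pages: int = 100) -> list[str]:
    """Breadth-first crawl from `seed`, staying on the seed's host.

    `fetch` (hypothetical) returns a page's HTML, or None if it cannot be retrieved.
    Returns the URLs in the order they were visited.
    """
    host = urlparse(seed).netloc
    queue = deque([seed])
    seen = {seed}  # every URL ever queued, so no page is queued twice
    visited: list[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # unreachable page: skip it
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments.
            absolute, _ = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited
```

A usage example with an in-memory "site", where `dict.get` stands in for a real HTTP fetcher:

```python
site = {
    "https://example.com/": '<a href="/about">About</a> <a href="/blog#top">Blog</a>',
    "https://example.com/about": '<a href="/">Home</a> <a href="https://other.org/">Elsewhere</a>',
    "https://example.com/blog": '<a href="post-1">Post</a>',
    "https://example.com/post-1": "",
}
print(crawl("https://example.com/", site.get))
# ['https://example.com/', 'https://example.com/about', 'https://example.com/blog', 'https://example.com/post-1']
```

In a real crawler, `fetch` would wrap an HTTP client such as `urllib.request.urlopen`. It should also respect robots.txt, for example through `urllib.robotparser`.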
https://github.com/bgadrian/warmcache
A simple tool to scan your website to keep your cache hot & ready. Helper tool for Prerender, Squid, CDNs, etc.
cache cdn crawler go golang prerender prerenderio squid
Last synced: 15 Nov 2024
https://github.com/charles-hsiao/python-flightradar
Python airline/flights data crawler
airlines crawler flightradar flightradar24 flights python python-crawler python3
Last synced: 11 Nov 2024
https://github.com/minicli/curly
Simple Curl Client
crawler curl hacktoberfest php
Last synced: 19 Dec 2024
https://github.com/chinmayrane16/scraping-amazon-for-mobile-details-with-scrapy
Scraping the Amazon website through proxies to extract mobile phone details
amazon-scraper crawler googlebot json proxy pycharm pypiwin32 scrapy user-agents
Last synced: 27 Oct 2024
https://github.com/gridaco/figma-archives
Figma Files Scraper for Research & Studies
crawler dataset design-database figma machine-learning scrapy selenium
Last synced: 27 Oct 2024
https://github.com/floschnell/flatcrawl-processors
A set of processors that instantly inform users through a set of channels (e.g., Telegram) about new flats found on different rental websites.
bot crawler flatcrawl flats real-estate rentals-search telegram
Last synced: 02 Dec 2024
https://github.com/saltyshiomix/web-master
Web mastering tools for my personal services
crawler javascript nodejs scraper typescript web
Last synced: 27 Oct 2024
https://github.com/wearetyomsmnv/gptbuster
Generative web directory fuzzer, crawler, and subdomain checker based on ChatGPT
crawler gpt hacking pentesting python3 reconnaissance web
Last synced: 07 Nov 2024
https://github.com/postman-open-technologies/openapi-web-search
OpenAPI Web Search: Revolutionizing the Way Developers find API Definitions 🚀
crawler dataset gsoc gsoc-2023 openapi search-engine swagger
Last synced: 07 Nov 2024
https://github.com/fanhuaandluomu/qqzoneparse
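Simulates logging in to QQ Zone, retrieves friends' information, and analyzes it (age distribution, gender distribution, location distribution, etc.). For details, see the documentation and the analysis results in the 1049755192 folder.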
crawler python27 qqzone spider
Last synced: 12 Nov 2024
https://github.com/ezzcodeezzlife/scraper-instagram
Scrape data from Instagram without applying for the authenticated API 🎯
auth authentication crawler ig instagram instagram-api instagram-client instagram-scraper javascript js nodejs npm scraper scraper-instagram scraping wrapper
Last synced: 17 Dec 2024
https://github.com/ototot/judgegirl-scoreboard
A Fancy Scoreboard for JudgeGirl
crawler judgegirl judgegirl-scoreboard php scoreboard tocas-ui tocasui vuejs vuejs2
Last synced: 08 Nov 2024
https://github.com/stefanocudini/node-fetch-dom
Magic utility that extracts JavaScript global variables from a remote HTML page.
crawler dom nodejs scraping webscraping
Last synced: 08 Nov 2024
https://github.com/amirzenoozi/insta-downloader
You can download Instagram posts with this script.
crawler crawling downloader instagram
Last synced: 20 Nov 2024
https://github.com/dev-chenxing/jjwxc-crawler
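A simple tool to scrape and download the non-VIP chapters of any novel from jjwxc.net in .docx format, built with Python and Scrapy | A Jinjiang (jjwxc) crawler built on Scrapy that downloads a novel's non-VIP chapters by book ID and generates an editable Word document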
chinese cli crawler docx download jjwxc open-source python scraping scrapy terminal word
Last synced: 13 Nov 2024
https://github.com/krolow/marsvin
Structural Crawler framework written in PHP
Last synced: 28 Nov 2024
https://github.com/a3r0id/httpscan
Scan a host for open HTTP ports and gain information about the services present.
crawler hacking hacking-tool http low-level penetration-testing pentest pentesting portscan portscanner scan scanner scanner-web scraper security service-discovery
Last synced: 06 Nov 2024
https://github.com/binaryify/express-middleware-seo
Webpage pre-rendering middleware, based on headless Chrome ⚡️
chrome crawler express express-middleware nodejs seo
Last synced: 08 Nov 2024
https://github.com/redco/goose-starter-kit
This is a starter kit for redco/goose-parser
crawler docker goose goose-parser parser starter-kit
Last synced: 05 Nov 2024
https://github.com/rsoury/serverless-web-crawler
Serverless Web Crawler that executes for an indefinite amount of time. Perfect for Crawling Jobs that last longer than a minute and only need to be executed once or twice a month.
boilerplate crawler fargate serverless serverless-framework template
Last synced: 10 Nov 2024
https://github.com/begrossi/anp-price-collector
ANP Price Collector
crawler experiment not-maintained scrapy-crawler
Last synced: 23 Oct 2024
https://github.com/scrapingant/scrapingant-client-js
ScrapingAnt API client for JavaScript / Node.js.
crawler scraper scraping scrapingant webscraping
Last synced: 16 Dec 2024
https://github.com/cybercongress/crawler
A toolchain for bringing web2 to web3
cosmos-sdk crawler cyber cyberd ipfs web3 wiki
Last synced: 15 Nov 2024
https://github.com/jsrei/javascript-window-listener-library
A basic component for JavaScript reverse engineering that monitors changes to the window object
crawler js-library js-reverse reverse-engineering web-security-research
Last synced: 16 Nov 2024
https://github.com/spider-rs/web-crawling-guides
How-to guides on web crawling and scraping
agents ai-agents ai-scraping clean-markdown crawler fast-webcrawler html-to-markdown llm-webcrawler scraper web-scraping
Last synced: 23 Dec 2024
https://github.com/elektrostudios/google-search-url-crawler
Desktop app that crawls urls from Google's search engine results
crawl crawler crawlers crawling dotnet google google-crawler google-search googlesearch hacking search search-engine searcher tool tools url url-crawler vbnet windows winforms
Last synced: 01 Dec 2024
https://github.com/niloysikdar/go-imdb-crawler
Want to know which celebrities share your birthday? 👀 Get the full data about them. Made using Go + Colly
Last synced: 07 Nov 2024
https://github.com/betta-cyber/netease_music_api
NetEase Cloud Music API for Python
crawler data-analysis netease-cloud-music
Last synced: 04 Dec 2024
https://github.com/geminidsystems/googlenewsscraper
A Python package that scrapes Google News article data while remaining undetected by Google. Our scraper can scrape page data up until the last page and never trigger a CAPTCHA (download stats: https://pepy.tech/project/GoogleNewsScraper)
crawler googleautomator googlenews googlenewsscraper googlescraper python scraper scraping selenium web-scraping webcrawler webdriver webscraper
Last synced: 19 Nov 2024
https://github.com/thesoenke/news-crawler
Crawler that collects and extracts content of daily published news articles
Last synced: 09 Nov 2024
https://github.com/embeddinglayer/awesome-fingerprinting
A collection of browser fingerprinting projects, research, and resources. Intended as a way to aggregate research surrounding the subject.
automation browser-fingerprinting crawler device-fingerprint fingerprinting scraper security
Last synced: 17 Dec 2024
https://github.com/wux1an/fake-useragent
Provide random user agent
crawler random spider ua user-agent useragent
Last synced: 20 Nov 2024
https://github.com/johansatge/psi-report
Crawls a website, gets PageSpeed Insights data for each page, and exports an HTML report.
cli crawler html-report pagespeed-insights
Last synced: 30 Oct 2024
https://github.com/BroNils/GoogleSearch-CLI
Search anything on Google without captcha
captcha crawler google googlesearch googlesearch-cli recaptcha search-engine
Last synced: 30 Oct 2024
https://github.com/viclafouch/fetch-crawler
📌 A Node.js web crawler using the Fetch API to scrape static websites
cheerio crawler crawling-sites fetch-api nodejs promises scrapping
Last synced: 02 Dec 2024
https://github.com/petrpatek/airbnb-scraper
Apify public actor for scraping Airbnb homes.
airbnb airbnb-api apify crawler data-extraction scrape
Last synced: 27 Oct 2024
https://github.com/freekatz/jd_sentiment_analysis
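A simple hands-on project covering crawling, processing, visualization, sentiment analysis, and model evaluation of JD.com product reviews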
Last synced: 07 Dec 2024
https://github.com/dxsooo/shortvideocrawl
Short video crawler based on scrapy
crawler kuaishou scrapy spider video-crawler
Last synced: 15 Nov 2024
https://github.com/gimnathperera/abans-lk-webscraping
Web scraping script written in Python using the Scrapy library to scrape product data from popular Sri Lankan websites
Last synced: 12 Nov 2024
https://github.com/jacraig/spidey
A multithreaded web crawler library that is generic enough to allow different engines to be swapped in.
Last synced: 14 Dec 2024
https://github.com/louis70109/pleaguebot
P+ League Chatbot(unofficial)(deprecated)
basketball chatbot crawler line
Last synced: 15 Oct 2024
https://github.com/hfrost0/simple-baidu-image-download
A Baidu Images crawler in only 30 lines, using only the simplest statements
Last synced: 14 Nov 2024
https://github.com/nadar/crawler
A website crawler implementation written in PHP. Highly extensible, indexes PDFs, and is very memory efficient.
crawler hacktoberfest html pdf php
Last synced: 15 Oct 2024
https://github.com/cristipufu/scrapy-net
Scrapy the web scraping tool - a naive implementation in C#
Last synced: 11 Oct 2024
https://github.com/discovai/discovai-crawl
🕷️ DiscovAI Crawl API(🚧 Work in Progress 🚧): A powerful web scraping solution for AI tools and vector databases. Extract clean HTML, generate LLM-friendly content, and create embeddings from any URL.
ai api crawler embedding vector-database web-scraping
Last synced: 12 Nov 2024
https://github.com/yggverse/yggo
YGGo! Distributed Web Search Engine
alt-web crawler curl distributed federative fts5 js-less mysql open-source parser pdo php privacy-oriented search-engine sphinx sphinxsearch spider web web-archive yggdrasil
Last synced: 06 Nov 2024
https://github.com/doreanbyte/katswiri
A crawler to find job listings and aggregate them from multiple sources
assistant crawler employment-opportunities job-aggreg job-finder time-management
Last synced: 07 Sep 2024
https://github.com/twtrubiks/google-play-store-spider-bs4-excel
Google-Play-Store-spider: uses Beautiful Soup in Python to export data to Excel
beautifulsoup crawler google-play-store pyexcel python sql-database xlsx
Last synced: 16 Nov 2024
https://github.com/sobak/scrawler
Declarative, scriptable web robot (crawler) and scraper
crawler crawler-engine robots-txt scraper scraping-websites
Last synced: 29 Oct 2024
https://github.com/byt3n33dl3/crawler_v2
Remote access trojan (RAT) tools for penetration testing on devices, providing real-time access to client devices after the malware reaches the kernel. Trust attack
Last synced: 31 Oct 2024
https://github.com/whitejoce/Get_Weather
Scrapes the local weather based on IP geolocation (no API required)
crawler python3 spider weather-forecast
Last synced: 08 Nov 2024
https://github.com/wangy8961/python3-concurrency-pics-01
A crawler that uses multithreading or asynchronous I/O to download the pictures of beautiful women shared at http://gank.io/api/data/%E7%A6%8F%E5%88%A9/1000/1
aiohhtp asyncio coroutine crawler progressbar python3 requests threadpool
Last synced: 11 Nov 2024
https://github.com/catalyst/moodle-tool_crawler
A Moodle link-crawling robot that finds broken, slow, and oversized links
Last synced: 11 Nov 2024
https://github.com/davideviolante/socialblade-com-api
Unofficial APIs for socialblade.com website.
crawler scraper scraping social social-media socialblade
Last synced: 02 Nov 2024
https://github.com/tca166/ck3-history-extractor
A program designed for creating an encyclopedia of sorts containing your ck3 history
ck3 crawler python3 rust save-file save-files
Last synced: 14 Dec 2024
https://github.com/theritikchoure/crawlyx
Crawlyx is an open-source command-line interface (CLI) based web crawler built using Node.js. It is designed to crawl websites and extract useful information like links, images, and text. It is lightweight, fast, and easy to use.
cli command-line-tool crawler crawlyx hacktoberfest hacktoberfest-2023 hacktoberfest-accepted nodejs npmjs open-source scraper web-scraping
Last synced: 12 Oct 2024
https://github.com/ne-lexa/roach-php-bundle
Symfony bundle for roach-php/core
crawler php roach-php scrapy spider symfony symfony-bundle
Last synced: 12 Oct 2024
https://github.com/bjoern-hempel/php-web-crawler
A PHP class that crawls a given URL and recursively collects data from it. The final representation is a JSON object.
crawler mit-license php recursive webcrawler webscraper xpath
Last synced: 07 Nov 2024
https://github.com/rodyherrera/codexdrake
An open source, privacy-first, self-hosting capable and blazing fast search engine written in JavaScript. Browse anonymously and safely without the need to pay third-party APIs. 👀
adblock books crawler google images javascript metasearch metasearch-engine news nodejs privacy-first search search-engine searchengine searx self-hosted videos webscraping websearch wikipedia
Last synced: 06 Nov 2024
https://github.com/lablnet/web-spider
Multithreaded web crawler
crawl crawler mit open-source package project python spider
Last synced: 20 Nov 2024
https://github.com/qzcool/cpef
Data query APIs for Chinese private equity fund managers (Chinese Private Equity Funds APIs).
china crawler data finance fund funds hedge-funds private-equity python python3 scraper scraping-websites spider
Last synced: 21 Nov 2024
https://github.com/exp-codes/bilibili-plugin
Bilibili plugin helper
bilibili crawler live programming
Last synced: 16 Dec 2024
https://github.com/hoc081098/comic_app_server_nodejs
Node.js server for an Android comic app | https://comic-app-081098.herokuapp.com/
comic-app crawler nodejs nodejs-crawler nodejs-typescript typescript
Last synced: 31 Oct 2024
https://github.com/confact/spider.cr
Spider.cr is a spider crawler in Crystal. It handles collecting, scraping, and parsing. So you can spend your time collecting the data you want on a big scale.
Last synced: 08 Nov 2024
https://github.com/ivan-sincek/scrapy-scraper
Web crawler and scraper based on Scrapy and Playwright's headless browser.
bug-bounty crawler crawling downloader downloading ethical-hacking headless-browser javascript offensive-security penetration-testing python red-team-engagement scraper scraping scrapy security spider spidering web web-penetration-testing
Last synced: 08 Nov 2024
https://github.com/leonzucchini/Recipes
Project to get and analyse data on recipes from chefkoch.de
Last synced: 04 Nov 2024
https://github.com/lysandrejik/omegle-crawler-node
Node library to connect to and interact with the Omegle website.
Last synced: 23 Oct 2024
https://github.com/jtiala/wpdl
⬇️ Scrape pages, posts, images and other data from a WordPress instance.
crawler downloader scraper scraping wordpress
Last synced: 23 Oct 2024
https://github.com/lucasayres/linkedin-crawler-connections
LinkedIn crawler to search and collect my connections (profile picture, name, occupation, location, email, and phone).
chromedriver connections crawler linkedin profile python scraper selenium
Last synced: 19 Nov 2024
https://github.com/bringyourownideas/laravel-sitemap
Simple crawler and sitemap generator for Laravel. No headless browser - just a crawler.
crawler laravel laravel-sitemap sitemap-generator sitemap-xml
Last synced: 12 Nov 2024
https://github.com/helingfeng/stay-reader
📚Miniprogram Book Reader
crawler laravel-application miniprogram php
Last synced: 04 Dec 2024
https://github.com/gimnathperera/web-scraping-riyasewana.lk
Web scraping script written in Python using the Scrapy library to scrape product data from popular Sri Lankan vehicle-selling websites.
crawler python scrapy spider webscraping
Last synced: 12 Nov 2024
https://github.com/gbolmier/newspaper-crawler
🕷️ An autonomous French newspaper crawler based on the Scrapy framework
Last synced: 13 Oct 2024
https://github.com/SupervisedCo/HyperCrawlTurbo
HypercrawlTurbo is a turbocharged web scraper for extracting URLs from a webpage.
ai crawler ml nlp retrieval retrieval-augmented-generation
Last synced: 04 Dec 2024
https://github.com/dori-dev/flask-corona-info
Live coronavirus statistics and information site built with Flask.
coronavirus-real-time coronavirus-tracking crawler flask python python3 scrapy spider
Last synced: 09 Nov 2024
https://github.com/sanix-darker/ziim
Let your CLI search online for solutions to the errors and exceptions raised by the commands you run, so you don't need to open a browser and look for them yourself.
cli crawler error-correcting-codes error-handling exception-handler exception-handling exceptions javascript python scraper stackoverflow stackoverflow-api stackoverflow-questions
Last synced: 14 Oct 2024
https://github.com/xunzhuo/airspider
A Fast and Light Python Spider Framework 🕷️
asynchronous crawler crawler-python distributed python3 redis spider spider-framework web
Last synced: 28 Oct 2024
https://github.com/agenty/scrapingai
Build AI-powered web scraping agents with Agenty to auto-extract data from websites, capture screenshots, generate PDFs from URLs, and crawl the web
crawler crawling datascraping extract-data scraping webscraper webscraping
Last synced: 25 Nov 2024
https://github.com/hironsan/japanese-news-crawler
A fully automated Japanese news crawler built on top of the Scrapy framework
Last synced: 13 Dec 2024
https://github.com/twtrubiks/google-play-store-spider-selenium
Google-Play-Store-spider: uses Selenium + Beautiful Soup in Python
beautifulsoup chrome crawler firefox python selenium spider sqlite
Last synced: 16 Nov 2024