Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2024-11-15 00:06:24 UTC
https://github.com/migalabs/armiarma
Armiarma is a Libp2p open-network crawler with a current focus on Ethereum's CL network
crawler ethereum libp2p monitoring
Last synced: 15 Nov 2024
https://github.com/veliovgroup/spiderable-middleware
🤖 Prerendering for JavaScript-powered websites. Great solution for PWAs (Progressive Web Apps), SPAs (Single Page Applications), and other websites built on top of front-end JavaScript frameworks
crawler meteor meteor-package middleware nodejs npm npm-package seo seo-optimization spiderable
Last synced: 14 Oct 2024
https://github.com/ph-7/crawling-emails
Very simple bash script to crawl email addresses from a specific website.
bash crawler email email-scraper scrape scrape-email scraper scraping shell wget
Last synced: 28 Oct 2024
https://github.com/code4everything/visual-spider
Welcome to try our brand-new desktop productivity tool RunFlow, https://myrest.top/myflow
crawler crawler4j-java java-8 java8 javafx javafx-application spider visualization
Last synced: 29 Sep 2024
https://github.com/debugtalk/webcrawler
A web crawler based on requests-html, mainly intended for URL validation testing.
crawler requests-html web-crawler weblink
Last synced: 08 Nov 2024
https://github.com/minhhungit/github-action-rss-crawler
Automatically crawl RSS feeds using GitHub Actions
crawler csharp github-actions litedb netcore rss rss-crawler rss-items
Last synced: 09 Nov 2024
https://github.com/gomjellie/pysaint
[deprecated] Python client for u-SAINT
crawler sap soongsil unofficial
Last synced: 28 Oct 2024
https://github.com/mamal72/iranian-calendar-events
Fetch Iranian calendar events (Jalali, Hijri and Gregorian) from time.ir website
crawler events iranian jalali jalali-calendar persian
Last synced: 02 Nov 2024
https://github.com/kshru9/web-crawler
A multithreaded web crawler using two mechanisms: a single lock and thread-safe data structures
concurrency concurrent-data-structure cpp crawler data-structures html-parser lock multithreading openssl pagerank pthread reader-writer-lock search-engine socket threading threadsafe webcrawler website-downloader
Last synced: 28 Oct 2024
https://github.com/k1low/utsusemi
A tool to generate a static website by crawling the original site.
api aws aws-lambda crawler s3-website serverless serverless-framework
Last synced: 17 Oct 2024
https://github.com/pykong/pypergrabber
Fetches PubMed article IDs (PMIDs) from email inbox, then crawls PubMed, Google Scholar and Sci-Hub for respective PDF files.
crawler email-inbox google-scholar pdf pmid pubmed python sci-hub scraper
Last synced: 08 Nov 2024
https://github.com/fedebotu/iclr2023-openreviewdata
Crawl & Visualize ICLR 2023 Data from OpenReview
crawler dataset iclr iclr2023 openreview peer-review review scraper
Last synced: 06 Nov 2024
https://github.com/italia/publiccode-crawler
publiccode.yml crawler for the Open Source software catalog of Developers Italia
crawler developers-italia hacktoberfest publiccode publiccodeyml
Last synced: 10 Nov 2024
https://github.com/riptl/ytpriv
YT metadata exporter
big-data crawler csv datascience json video youtube
Last synced: 03 Aug 2024
https://github.com/ERap320/CrowLeer
Powerful C++ web crawler based on libcurl
Last synced: 03 Aug 2024
https://github.com/alex-page/get-site-urls
🔗 Get all of the URLs from a website.
crawler sitemap-generator urls
Last synced: 27 Oct 2024
https://github.com/spider-rs/spider-nodejs
Spider ported to Node.js
crawler distributed-systems headless-chrome indexer nodejs scraper spider typescript
Last synced: 05 Nov 2024
https://github.com/marcel0024/cococrawler
A declarative and easy-to-use web crawler and scraper in C#
cococrawler crawler crawling-tool csharp dotnet dotnetcore scraper scraping-tool webcrawler webcrawler-csharp webcrawling webscraper
Last synced: 12 Oct 2024
https://github.com/novemberde/serverless-crawler-demo
Serverless Architecture Crawler demo
aws crawler demo handson serverless
Last synced: 10 Nov 2024
https://github.com/dachcom-digital/pimcore-lucene-search
Pimcore Website Indexer (powered by Zend Search Lucene)
crawler lucene lucenesearch pimcore
Last synced: 14 Nov 2024
https://github.com/bartozzz/crawlerr
A simple and fully customizable web crawler/spider for Node.js with server-side DOM. Comes with elegant and hell-simple APIs.
crawler jsdom nodejs scraper spider web-crawler
Last synced: 08 Nov 2024
https://github.com/matheusfelipeog/froxy
Hide your IP with free proxies using Froxy 🔄
crawler free-proxy froxy hide-ip proxies proxies-scraper proxy python requests requests-module scraping
Last synced: 26 Oct 2024
https://github.com/Smartproxy/Python-scraper-tutorial
A short introduction to scraping with Python with given steps and an example scraper script.
beautifulsoup crawler data-mining data-science github-python json-database-python learning python python-projects python-web-crawler python-web-scraper scraper-python scraping web-crawler-python web-scraping web-scraping-api web-scraping-python webscraping
Last synced: 04 Aug 2024
https://github.com/aliosm/kontests
Competitive programming contests schedule
a2oj atcoder codeforces codeforces-gym codeshef competitive-programming crawler csacademy hackerearth hackerrank kickstart leetcode topcoder
Last synced: 09 Oct 2024
https://github.com/mattwang44/uspto-patft-web-crawler
Crawler for fetching information of US Patents and PDF bulk download
crawler patent patent-crawler pyqt5 python3 uspto
Last synced: 02 Oct 2024
https://github.com/jurooravec/crawlee-one
Professional scrapers that provide full control to the users. Crawlee One builds on top of Crawlee and Apify and extends them with features for robust and highly configurable web scrapers.
actor apify crawlee crawler framework scraper scraping web
Last synced: 13 Nov 2024
https://github.com/o8e/soccer-scrape
📃 Scrape football data from Bet365
bet365 betting crawler es6 football javascript puppeteer scraper soccer
Last synced: 13 Nov 2024
https://github.com/alessandrodd/googleplay_api
Google Play Unofficial Python 3 API Library
android crawler googleplay googleplay-api playstore
Last synced: 27 Oct 2024
https://github.com/ivan-sincek/chad
Search Google Dorks like Chad. / Broken link hijacking tool.
broken-link-hijacking bug-bounty crawler ethical-hacking google-dorking google-dorks offensive-security penetration-testing playwright python red-team-engagement scraper search-engine security social-media social-media-takeover threat-hunting threat-intelligence web web-penetration-testing
Last synced: 15 Nov 2024
https://github.com/harismuneer/android-apps-downloader
📱 A tool to download android apps from Google Play Store and Xiaomi App Store (the famous Chinese Store).
android-application-downloader android-apps-crawler android-market-scraper android-research android-scraper android-tool app-downloader crawler crawling-tool google-play-application-downloader google-play-store-scraper gplaycli open-source-project python-scraper research-tool scraper scraping-tool wget-utility xiaomi-apps xiaomi-store-scraper
Last synced: 12 Nov 2024
https://github.com/kagami/tistore
📷 Tistory photo grabber
crawler cross-platform electron tistory
Last synced: 22 Oct 2024
https://github.com/feng19/spider_man
SpiderMan, a fast, high-level web crawling & scraping framework for Elixir, based on Broadway.
crawler data-mining elixir erlang framework spider
Last synced: 29 Oct 2024
https://github.com/wwwwwydev/crawlist
A universal solution for web crawling lists
crawl crawler crawler-python python reptile
Last synced: 12 Nov 2024
https://github.com/tokahuke/lopez
Crawling and scraping the Web for fun and profit
crawler rust scraper seo web-scraping
Last synced: 14 Nov 2024
https://github.com/mechazawa/redbetter-wm2
Better.php crawler for Redacted that uses WhatManager
crawler flac redacted seedbox transcoding whatcd whatmanager
Last synced: 06 Nov 2024
https://github.com/rzo1/crawler4j
Open Source Web Crawler for Java - A maintained fork of yasserg/crawler4j
crawler crawler4j java spider web-crawler web-spider
Last synced: 29 Sep 2024
https://github.com/tokenmill/crawling-framework
Easily crawl news portals or blog sites using Storm Crawler.
crawler crawling crawling-framework elasticsearch java scraping storm storm-crawler vaadin
Last synced: 10 Nov 2024
https://github.com/fanhuaandluomu/qqspider
Crawls QQ user information (basic details such as QQ number, nickname, birthday, and address) and performs a brief analysis.
Last synced: 12 Nov 2024
https://github.com/capjamesg/indieweb-search
Source code for the IndieWeb search engine.
crawler indieweb search search-engine
Last synced: 03 Aug 2024
https://github.com/ruedigervoigt/exoskeleton
A Python framework to build polite, but tenacious crawlers / scrapers with a MariaDB backend
crawler crawling-framework database machine-learning mariadb network python python-3 scraping
Last synced: 08 Nov 2024
https://github.com/norconex/collector-filesystem
Norconex Filesystem Collector is a flexible crawler for collecting, parsing, and manipulating data ranging from local hard drives to network locations into various data repositories such as search engines.
crawler filesystem-crawler java norconex-filesystem-collector search-engine
Last synced: 11 Nov 2024
https://github.com/yokawasa/scrapy-azuresearch-crawler-samples
Scrapy as a Web Crawler for Azure Search Samples
azure azure-search crawler python python3 scrapy search
Last synced: 30 Oct 2024
https://github.com/Actomaton/ActoCrawler
🕸️ Swift Concurrency-powered crawler engine on top of Actomaton.
Last synced: 09 Aug 2024
https://github.com/asing1001/movierater
A useful website for finding movie ratings in Chinese and English by crawling Yahoo, Ptt, and IMDb.
apollo-client chai crawler graphql material-ui mocha mongodb movies nodejs reactjs redis server-side-rendering service-worker sinon typescript
Last synced: 07 Nov 2024
https://github.com/thaoshibe/crawl-original-google-images
Python scripts for crawling original images from Google Images
chrome-extension crawler crawling crawling-python google google-images pafy scraper youtube youtube-dl youtube-search
Last synced: 11 Oct 2024
https://github.com/nvk681/gumo
A crawler that extracts data from a dynamic webpage. Written in Node.js.
crawler elasticsearch neo4j nodejs
Last synced: 11 Oct 2024
https://github.com/s045pd/sharingan
We will try to find as much of your visible basic footprint on social media as possible - 😤 more sites are coming soon
asyncio crawler httpx python38 social-network
Last synced: 07 Nov 2024
https://github.com/petehouston/udemy-crawler
Crawl Udemy course info and save it in JSON format.
crawler crawling node node-cli udemy udemy-api udemy-crawl
Last synced: 23 Oct 2024
https://github.com/waynechang65/ptt-crawler
ptt-crawler is a web crawler module designed to scrape data from Ptt.
crawler javascript nodejs ptt scraper scraping spider web-crawler webcrawler
Last synced: 19 Oct 2024
https://github.com/p0dalirius/crawlersuseragents
Python script to check whether an application's responses differ when the request comes from a search engine's crawler.
bugbounty crawler crawlers pentest request tool user-agent web
Last synced: 29 Oct 2024
https://github.com/fernandod1/producthunt-scraper
Scraper script for the well-known website Producthunt.com. Scrapes all offers and saves them to an Excel spreadsheet file.
crawler crawling crawling-sites data-mining datamining producthunt producthunt-api producthunt-users python python-script python3 scrape scraped-data scraper scraper-engine scraping scraping-bot scraping-python scraping-tool scraping-websites
Last synced: 12 Nov 2024
https://github.com/ArchiveTeam/WebArchiver
Decentralized web archiving
archiver archiving crawler decentralized python warc web webarchiving
Last synced: 06 Nov 2024
https://github.com/smolijar/offensive-fortune
A script for generating fortune cookies from the funniest and most offensive stuff collected off the Internet.
crawler fortune fortune-cookie vilejoke
Last synced: 07 Nov 2024
https://github.com/enijkamp/supermonkey
A crawler for automated Android UI testing.
Last synced: 09 Nov 2024
https://github.com/mauriceconrad/xml-parser
A Node.js XML DOM, Parser & Stringifier.
crawler crawling dom html html-parser html-parsing xml xml-parser xml-parsing xml-schema
Last synced: 28 Oct 2024
https://github.com/paambaati/websight
🕷A simple but *really* fast crawler built with Node.js & TypeScript
coding-challenge crawler interview-questions javascript monzo nodejs typescript
Last synced: 08 Nov 2024
https://github.com/bkeepers/spiderman
your friendly neighborhood web crawler
crawler crawler-engine http httprb nokogiri ruby spider spider-framework web-crawler web-scraping webcrawler webscraping
Last synced: 09 Nov 2024
https://github.com/spekulatius/spatie-crawler-toolkit-for-laravel
A toolkit for Spatie's Crawler and Laravel.
crawler laravel laravel-crawler php-crawler php-scraper spatie-crawler
Last synced: 12 Nov 2024
https://github.com/PadishahIII/SecretScraper
SecretScraper is a web scraper that crawls through target websites, scrapes HTTP responses, and extracts secret information via regular expressions.
crawler cyper hyperscan pentest-tool pentesting python sensitivity-analysis webscraper
Last synced: 13 Aug 2024
https://github.com/alinebastos/crawler
Web Crawler created with Node.js and Puppeteer
crawler fs javascript nodejs puppeteer scraping
Last synced: 05 Nov 2024
https://github.com/racinmat/premium-downloader
crawler pornhub pornhub-downloader python
Last synced: 06 Nov 2024
https://github.com/Knovour/json-web-crawler
Use JSON to list all elements (with CSS 3 and jQuery selectors) that you want to crawl.
crawler javascript jquery json web-crawler
Last synced: 03 Aug 2024
https://github.com/vignif/crawler-google-scholar
This bot crawls and downloads statistics and pictures of researchers from Google Scholar.
crawler downloading-statistics google-scholar indexes statistics
Last synced: 06 Nov 2024
https://github.com/omkarcloud/botasaurus-starter
🚀 OFFICIAL STARTER TEMPLATE FOR BOTASAURUS SCRAPING FRAMEWORK 🤖
beautifulsoup crawler crawling crawling-framework crawling-python crawling-tool headless node-crawler python-crawler scraper scraping scraping-framework scraping-python scraping-tool selenium web-crawler web-crawling web-scraper web-scraping webscraping
Last synced: 08 Nov 2024
https://github.com/pourmand1376/persiancrawler
Open source crawler for Persian websites.
crawler machine-learning news python scrapy tasnim text-classification
Last synced: 11 Oct 2024
https://github.com/inspirehep/hepcrawl
Scrapy project for feeds into INSPIRE-HEP
crawler harvest-data publishing python
Last synced: 15 Nov 2024
https://github.com/neuralegion/bright-cli
Command Line Interface (CLI) tool for NeuraLegion's solutions.
api cli crawler cyber-security devops har nexploit oas secops security typescript
Last synced: 14 Nov 2024
https://github.com/chainski/chino-proxy-scraper
A Python script that scrapes proxies from frequently updated proxy sources.
crawler http https proxies proxy proxy-api proxygrabber proxyscrape-api proxyscraper proxytool python python3 scraper socks4 socks5
Last synced: 10 Nov 2024
https://github.com/shadawck/recon-archy
LinkedIn tools (and maybe other sources later) to reconstruct a company hierarchy by scraping relations and job titles
automation company-data crawler cybersecurity geckodriver golang linkedin organisational-analysis osint osinttool reconnaissance scraper selenium
Last synced: 15 Nov 2024
https://github.com/selmi-karim/img-cli
An interactive command-line interface built in Node.js for downloading one or more images to disk from a URL
buffer crawler crawling downloader image-downloader image-downloading nodejs phantomjs webpage
Last synced: 08 Nov 2024
https://github.com/fooock/robots.txt
🤖 robots.txt as a service. Crawls robots.txt files, downloads and parses them to check rules through an API
antlr4 api crawler crawler-engine docker docker-compose gradle java kotlin makefile postgresql redis redis-stream redis-streams robots-parser robots-txt spiders spring-boot
Last synced: 27 Oct 2024
https://github.com/fanyong920/crawlitem
A Chrome extension for crawling Taobao and Tmall web pages
crawler javascript taobao tmall
Last synced: 27 Oct 2024
https://github.com/Selbi182/SpotifyDiscoveryBot
A Java-based bot that automatically crawls for new releases by your followed artists on Spotify. Never miss a release again!
bot crawler java music spotify spring-boot springboot sqlite
Last synced: 27 Oct 2024
https://github.com/sigoden/rag-crawler
Crawl a website to generate a knowledge file for RAG
Last synced: 27 Oct 2024
https://github.com/ruichongliu/Crawler_pubg.op.gg
This is a web crawler for pubg.op.gg, written by Ruichong Liu. Scrapes PUBG (PlayerUnknown's Battlegrounds) game data.
beautifulsoup4 crawler pubg python3 scrape selenium
Last synced: 29 Oct 2024
https://github.com/omarhashem123/venom
Tool designed to quickly crawl sites and extract endpoints
Last synced: 04 Aug 2024
https://github.com/kasthack-labs/kasthack.osp
Generator of raw dumps of VK users.
crawler crawling data-mining kasthack programmable-web vk vk-api vkapi vkontakte
Last synced: 26 Sep 2024
https://github.com/danhje/dead-link-crawler
An efficient, asynchronous crawler that identifies broken links on a given domain.
async broken-links crawler dead-links python python3
Last synced: 04 Nov 2024
https://github.com/montferret/worker
Containerized Ferret worker
chrome crawler docker dsl ferret go hacktoberfest hacktoberfest2020 scraping scraping-websites service worker
Last synced: 14 Nov 2024
https://github.com/kirillplatonov/proxy_manager
Ruby proxy manager. Gem for easy proxy usage in parsers and web bots.
Last synced: 21 Oct 2024