Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Crawler
A web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web. It is typically operated by search engines for the purpose of web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2024-12-25 00:05:56 UTC
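The "systematic browsing" in the definition above can be sketched as a breadth-first traversal of links. This is a minimal illustration, not the method of any project listed here: the `crawl` function, the `get_links` callback, and the in-memory `TOY_WEB` graph are all hypothetical, and the sketch stands in for real HTTP fetching and HTML parsing so that it runs without network access.

```python
from collections import deque
from typing import Callable, Iterable


def crawl(
    start: str,
    get_links: Callable[[str], Iterable[str]],
    max_pages: int = 100,
) -> list[str]:
    """Visit pages breadth-first from `start` and return them in visit order."""
    seen = {start}          # URLs already queued, so each page is visited once
    queue = deque([start])  # frontier of URLs waiting to be visited
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Hypothetical in-memory "web": each URL maps to the links found on that page.
TOY_WEB = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/b", "https://example.com/c"],
    "https://example.com/b": ["https://example.com/"],
    "https://example.com/c": [],
}

order = crawl("https://example.com/", lambda url: TOY_WEB.get(url, []))
print(order)
```

A real crawler would replace `get_links` with a function that downloads the page and extracts its links, and would usually also respect robots.txt and rate limits.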
https://github.com/agenty/scrapingai
Build AI-powered web scraping agents with Agenty to auto-extract data from websites, capture screenshots, generate PDFs from URLs, and crawl the web
crawler crawling datascraping extract-data scraping webscraper webscraping
Last synced: 25 Nov 2024
https://github.com/dori-dev/flask-corona-info
Live coronavirus statistics and information site built with Flask.
coronavirus-real-time coronavirus-tracking crawler flask python python3 scrapy spider
Last synced: 09 Nov 2024
https://github.com/ivan-alone/instastories-saver
Program for saving Instagram Stories
api backup crawler grambler insta instagram instagram-stories instastories-saver instastory stories
Last synced: 27 Oct 2024
https://github.com/vmarcosp/supervise-crawler
:male_detective: Supervise crawler
crawler esy ocaml reasonml webcrawler
Last synced: 18 Nov 2024
https://github.com/matheuscas/pycnpj-crawler
Yet another module for extracting company data from the CNPJ
Last synced: 19 Dec 2024
https://github.com/SupervisedCo/HyperCrawlTurbo
HypercrawlTurbo is a turbocharged web scraper for extracting URLs from a webpage.
ai crawler ml nlp retrieval retrieval-augmented-generation
Last synced: 04 Dec 2024
https://github.com/szczyglis-dev/php-ultra-small-proxy
[PHP] Lightweight proxy with full support for sessions, cookies, POST/FORM submissions, and URL rewriting. The proxy offers two methods of URL rewriting: XML and Regex. It also includes features such as HTTP Auth, caching, and more.
cookies crawler crawler-php css http-client http-proxy networking proxy proxy-server webbrowser website www
Last synced: 14 Nov 2024
https://github.com/Ivan-Alone/InstaStories-Saver
Program for saving Instagram Stories
api backup crawler grambler insta instagram instagram-stories instastories-saver instastory stories
Last synced: 22 Nov 2024
https://github.com/mithro/fastsvncrawler
fast-svn-crawler / fastsvncrawler - A tool for listing SVN repository content
crawler export import subversion svn vcs
Last synced: 14 Oct 2024
https://github.com/vndee/visee
Just a typical search engine in this universe :fire::fire::fire:
crawler django docker e-commerce elasticsearch flask kafka python visual-search
Last synced: 18 Nov 2024
https://github.com/gbolmier/newspaper-crawler
:spider: An autonomous French newspaper crawler based on Scrapy framework
Last synced: 13 Oct 2024
https://github.com/michaelradu/web-crawler
A Web Crawler developed in Python.
crawler crawler-python crawlers python python-3 python-script python3 script scripting scripting-language scripts web web-crawler web-crawler-python web-crawlers web-crawling webcrawl webcrawler webcrawling
Last synced: 01 Dec 2024
https://github.com/blesstosam/registerappleid
A Node.js program for registering an Apple ID automatically
Last synced: 18 Nov 2024
https://github.com/lablnet/pakweather_scraper
A multi-threaded Pakistan Weather crawler written in JavaScript
crawler data mit-license open-source pakistan scraping weather weather-channel
Last synced: 20 Nov 2024
https://github.com/trungdq88/movie-showtimes
Web Service & Android Application to look up Vietnam movie showtimes
crawler java movie-showtimes theater
Last synced: 31 Oct 2024
https://github.com/luyadev/luya-module-crawler
Crawl a website and provide intelligent search results
crawler hacktoberfest intelligent-search luya search yii2
Last synced: 10 Oct 2024
https://github.com/petersonjr/MetadataCrawler
A simple tool to extract metadata from relational databases
avro crawler database-schemas java jdbc metadata rdms relational-databases
Last synced: 13 Nov 2024
https://github.com/logocomune/botdetector
BotDetector is a Go library that detects bots, spiders, and crawlers from the user agent
botdetector bots crawler go golang golang-library spider user-agent
Last synced: 11 Nov 2024
https://github.com/webcoast-dk/versatile-crawler
Extendable and easy to use crawler extension for TYPO3 CMS
crawler extendable indexing search typo3
Last synced: 12 Dec 2024
https://github.com/twtrubiks/pttstatistics
Statistics on trending keywords in PTT board comments or article titles, in Python
crawler ptt ptt-hot-key python statistics
Last synced: 16 Nov 2024
https://github.com/shawon922/jobs-crawler
Crawl IT/Telecommunication jobs from bdjobs.com
beautifulsoup4 crawler python3
Last synced: 09 Nov 2024
https://github.com/keul/allanon
A web crawler that visits a predictable set of URLs and automatically downloads the resources you want from them
Last synced: 11 Nov 2024
https://github.com/piotrpdev/WeBuy-Cex-Price-Tracker
A Python script that gets the prices of certain Cex products and uploads them to Google Sheets
cex cex-api cex-products crawler google-sheets gspread prices python-script webuy webuy-api webuy-cex webuycex
Last synced: 23 Oct 2024
https://github.com/samiahmedsiddiqui/http-auth
Helps you secure your whole site during development and protect admin pages from brute-force attacks.
admin auth authentication brute-force brute-force-attacks crawl crawler http-auth http-authentication locked login restrict-pages restrict-site wordpress wordpress-plugin
Last synced: 25 Nov 2024
https://github.com/mediamonks/symfony-crawler-bundle
Implements the crawler package into Symfony
crawler php symfony symfony-bundle
Last synced: 03 Dec 2024
https://github.com/tsoliangwu0130/spotify-news
A Flask application that retrieves the latest news about the singers of the song currently playing on your Spotify.
bootstrap crawler flask oauth2 python3 restful-api spotify-api
Last synced: 11 Nov 2024
https://github.com/eight04/ptt-mail-backup
A BBS bot for fetching PTT in-site mail
bbs cli crawler ptt ptt-crawler python python3
Last synced: 28 Oct 2024
https://github.com/igeligel/backpacklogin
:arrow_forward: A .NET core library to handle the login to Backpack.tf. Backpack.tf is a trading site for Team Fortress 2, Counter-Strike: Global Offensive, and Dota 2. Community item pricing, item trading and stats, and much more.
bot bot-framework crawler csgo csgo-bot steam steam-api steambot steamweb teamfortress2
Last synced: 19 Nov 2024
https://github.com/sabinbajracharya/Insta-crawler
Pulls data from instagram and saves it to Firebase for storage and Algolia for search
accounts algolia algolia-search crawler firebase firebase-database instagram instagram-feed instagram-post javascript nodejs public scraper
Last synced: 07 Nov 2024
https://github.com/umihico/minigun-requests
Web scraping API to outsource tons of GET & xpath to cloud computing
crawler crawling scraping scraping-api scraping-framework scraping-python web-scraping
Last synced: 15 Nov 2024
https://github.com/tosone/githubtraveler
Traverse all GitHub users, orgs, and repos.
Last synced: 06 Nov 2024
https://github.com/spekulatius/spatie-crawler-cached-queue-example
Example to demonstrate the usage of cached queues across multiple requests.
crawler crawler-engine laravel php-crawler php-scraper queues spatie-crawler
Last synced: 12 Nov 2024
https://github.com/visuellverstehen/t3fetch
Fetches a website (including all subpages), so the TYPO3 cache gets filled.
cache crawler fetch typo3 typo3-extension
Last synced: 24 Nov 2024
https://github.com/pawod/gis-berlin-rents
A web crawler for ImmobilienScout24.de, implemented for a small project at the institute of geographic sciences of the Free University of Berlin.
apartment-rents berlin crawler gis immobilienscout24
Last synced: 04 Nov 2024
https://github.com/adileo/MicroFrontier
A lightweight crawler frontier implementation in TypeScript using Redis.
crawler frontier microservice redis robots-txt spider
Last synced: 14 Nov 2024
https://github.com/fanzeyi/torchic
A generic search engine built using Go & Spring & Redis. Project for Google's CodeU event.
Last synced: 21 Oct 2024
https://github.com/a252937166/quick-selenium
Mainly uses the quick-spring and selenium frameworks to crawl information from various dynamic web pages
Last synced: 21 Nov 2024
https://github.com/igeligel/BackpackLogin
:arrow_forward: A .NET core library to handle the login to Backpack.tf. Backpack.tf is a trading site for Team Fortress 2, Counter-Strike: Global Offensive, and Dota 2. Community item pricing, item trading and stats, and much more.
bot bot-framework crawler csgo csgo-bot steam steam-api steambot steamweb teamfortress2
Last synced: 13 Nov 2024
https://github.com/bbc2/discolinks
Command-line tool which checks a website for broken links.
broken-links crawler html http link-checker link-checkers link-checking validator web
Last synced: 28 Oct 2024
https://github.com/pps-22-scooby/pps-22-scooby
Scala application that crawls and scrapes the web pages given as input, following special rules passed to it through a DSL.
crawler crawlers internal-dsl scala scraper scrapers web web-crawler web-crawling web-scraper web-scrapers
Last synced: 14 Oct 2024
https://github.com/piotrpdev/webuy-cex-price-tracker
A Python script that gets the prices of certain Cex products and uploads them to Google Sheets
cex cex-api cex-products crawler google-sheets gspread prices python-script webuy webuy-api webuy-cex webuycex
Last synced: 13 Nov 2024
https://github.com/pceuropa/youtube-crawler
YouTube crawler & scraper based on Scrapy. Written in Python 3.
crawler csv mariadb python3 scraper scrapy sqlalchemy youtube
Last synced: 13 Nov 2024
https://github.com/sebobo/shel.crawler
Neos based crawler for nodes and sites
Last synced: 14 Oct 2024
https://github.com/anikhasibul/stackoverflow-scraper-messenger-bot
A messenger bot that answers messages by scraping stackoverflow questions and answers
chatbot crawler messenger-bot scrapper stackoverflow
Last synced: 24 Nov 2024
https://github.com/activatedgeek/winemag-dataset
Dataset of Wine Reviews from Wine Enthusiast Magazine :grapes: :wine_glass: :earth_asia:
crawler dataset python3 scrapy scrapy-spider vega-lite visualization wine wine-tasting
Last synced: 14 Oct 2024
https://github.com/insign/spatie-crawler-queue-with-laravel-model
Spatie's Crawler with Laravel Model as Queue
cache crawler eloquent laravel queues spatie spatie-crawler
Last synced: 16 Nov 2024
https://github.com/bfwg/node-tinycrawler
Tiny web crawler in a nutshell for Node.js
Last synced: 11 Oct 2024
https://github.com/drogbadvc/crawlit
This project is a web crawler based on Scrapy, with 2D visualization and PageRank
Last synced: 08 Nov 2024
https://github.com/appliedsoul/promise-crawler
Promise support for node-crawler (Web Crawler/Spider for NodeJS + server-side jQuery)
crawler node-crawler nodejs promise-node-crawler spider
Last synced: 08 Nov 2024
https://github.com/codenashwan/telegrambot_instadp
A simple Telegram bot for downloading Instagram profile photos
api crawler crawling instagram instagram-api instagram-bot instagramscraper laravel php scraper telegram telegram-api telegram-bot webhook
Last synced: 08 Nov 2024
https://github.com/amirhoseinsb/Cloud_Player_V2
Use the cloudplayer tool to listen to music by the singer you want, at very high speed and without visiting a specific website.
cloud-player crawler crawling music music-player programming python url-player
Last synced: 20 Nov 2024
https://github.com/the1812/bingwallpapers
A tool for downloading wallpapers from Bing.
Last synced: 04 Nov 2024
https://github.com/yaroslaff/bulk-http-check
Very fast and simple concurrent HTTP client (3500 HTTP req/s)
bulk check concurrent connections crawler header http https multiple parallel spider status
Last synced: 07 Nov 2024
https://github.com/fedebotu/neurips2022-openreviewdata
Crawl & Visualize NeurIPS 2022 Data from OpenReview
crawler dataset neurips neurips-2022 openreview peer-review review scraper
Last synced: 06 Nov 2024
https://github.com/thesp0nge/nightcrawler
A python program that crawls a website and tries to stress it, polluting forms with bogus data
crawler offensive-scripts offensive-security stress-test web-crawler web-crawling
Last synced: 12 Oct 2024
https://github.com/bajins/scripts_python
Python scripts
crawler faker faker-generator python-3 python3 rclone rclone-client rclone-config rclone-configuration reptile reptile-image reptiles scraper spider
Last synced: 12 Nov 2024
https://github.com/torhamdev/death-engine
A powerful recon tool
crawler death-engine directory-search google-dorks hacking-tool information-gathering pentesting pentesting-tools port-scanning python3 recon recon-tools scanner web-hacking web-penetration-testing webhacking webpentest whois
Last synced: 15 Nov 2024
https://github.com/duongdev/facebook-group-crawler
Facebook Groups Discussions Crawler
crawler facebook groups puppeteer
Last synced: 12 Nov 2024
https://github.com/tghoul/spider914j
91 web spider for java.
91porn crawler spring-boot webmagic
Last synced: 21 Nov 2024
https://github.com/omilab/internet-archive-link-extractor
Tool for extracting external links of a URL from Internet Archive snapshots
Last synced: 25 Nov 2024
https://github.com/dori-dev/quotes-crawler
Quotes crawler using Scrapy and Python.
crawler crawling python scraping-python scraping-websites scrapy scrapy-crawler scrapy-spider web-scraper
Last synced: 09 Nov 2024
https://github.com/nakabonne/webcrawlerforserps
Web crawler that scrapes Google search results
Last synced: 24 Oct 2024
https://github.com/yerkopalma/bash-crawler
:computer: Get a site's links with Bash
Last synced: 13 Oct 2024
https://github.com/windfarer/biu
biubiubiu~~ I'm a tiny web crawler framework
crawler python spider spider-framework web-crawler
Last synced: 28 Oct 2024
https://github.com/softmarshmallow/inked-news-crawler
🕷 Korean news source crawler (realtime & bulk)
crawler naver-news python3 scrapy
Last synced: 06 Dec 2024
https://github.com/brucewind/fear-and-greed-index-alarm
A notification reminder for indicating when the CNN Fear and Greed Index is out of range.
crawler fear-and-greed fear-greed-index investment sctock stock-market us-stock-market
Last synced: 28 Nov 2024
https://github.com/gabfl/sitecrawl
Simple Python module to crawl a website and extract URLs
crawl crawler crawler-python crawling-sites
Last synced: 13 Oct 2024
https://github.com/jacobsteves/crawlperl
A web crawler made with Perl. Great for grabbing or searching for data off the web, or ensuring that your own site files are secure and hidden.
crawler perl scripting web-crawler
Last synced: 27 Nov 2024
https://github.com/integralist/go-web-crawler
A web crawler built in the Go programming language
concurrency crawler go golang web-crawler
Last synced: 11 Oct 2024
https://github.com/twtrubiks/pttcrawlercontent
PTT Crawler Content in Python (PTT article crawler)
Last synced: 16 Nov 2024
https://github.com/baraja-core/webcrawler
Simple website crawling by following links.
bot crawler crawling-websites fast php robot speed
Last synced: 06 Nov 2024
https://github.com/synacktraa/crawl
Web crawler designed to efficiently retrieve unique href, script and form links from a web application.
bash crawler regex shell web-spidering
Last synced: 26 Nov 2024
https://github.com/sweeticelolly/sao_title_bot
A bot that generates flashy paper titles
chrome-dr chromedriver crawler generator language-learning language-model numpy python robot scholar scholarly-articles selenium selenium-webdriver
Last synced: 24 Nov 2024
https://github.com/nobodxbodon/chromecrawlerwildspider
Chrome extension to crawl web pages by loading them into browser tabs in parallel.
chrome-extension crawler localstorage spider
Last synced: 30 Nov 2024
https://github.com/yggverse/yggstate
Yggdrasil Network Explorer
analytics crawler explorer geo-ip geo-location geolite2 mysql php search-engine sphinx spider yggdrasil yggdrasil-api yggdrasil-network yggdrasil-php-api yggdrasilctl yggstate
Last synced: 06 Nov 2024
https://github.com/ajcerejeira/base.gov.pt
A crawler that fetches data from base.gov.pt
Last synced: 06 Nov 2024
https://github.com/hybridx/webscraper
Web crawler made with Beautiful Soup
crawler flask google-dorks javascript python3 search-engine
Last synced: 13 Dec 2024