Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2025-01-28 00:06:05 UTC
https://github.com/joelkoen/wls
Easily crawl multiple sitemaps and list URLs
Last synced: 07 Nov 2024
https://github.com/tikazyq/colly-crawlers
Crawlers using Golang-based web crawling framework Colly
Last synced: 02 Jan 2025
https://github.com/ging-dev/sitemap-crawler
Collect links through the sitemap.xml or robots.txt
crawler php php8 sitemap sitemap-crawler
Last synced: 18 Nov 2024
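The approach sitemap-crawler describes, collecting links from robots.txt and sitemap.xml, can be sketched roughly as below. This is a minimal Python illustration, not that project's PHP code. It assumes the robots.txt and sitemap bodies have already been downloaded as strings, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol for <urlset> and <sitemapindex>.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Return the URLs listed on 'Sitemap:' lines of a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        # Drop trailing comments, then match the directive case-insensitively.
        line = line.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep and key.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


def locs_from_sitemap(xml_text: str) -> list[str]:
    """Return every <loc> value in a sitemap or sitemap index document."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so this covers <url><loc> and <sitemap><loc>.
    return [el.text.strip() for el in root.iter(SITEMAP_NS + "loc") if el.text]


robots = "User-agent: *\nDisallow: /private\nSitemap: https://example.com/sitemap.xml\n"
print(sitemaps_from_robots(robots))  # ['https://example.com/sitemap.xml']
```

A real crawler would then fetch each sitemap URL and pass the body to `locs_from_sitemap`, recursing when the document is a sitemap index.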
https://github.com/lockblock-dev/crawlarr
Crawlarr is a fast web crawler built in Go. It searches for anchor tags in the HTML pages and follows links. It leverages concurrency to improve speed.
Last synced: 24 Jan 2025
https://github.com/jofaval/webscraping
A web scraper providing tools to scrape many websites that share the same base
crawler e-commerce python scraper webscraper webscraping
Last synced: 09 Dec 2024
https://github.com/manojahi/is-there-any-song-reference-in-article
Tells whether an article on a website contains any song references.
crawler lyrics-search python webscraping
Last synced: 01 Jan 2025
https://github.com/zekrotja/r34-crawler
A simple CLI tool to fetch and download images from rule34.xxx
crawler go rest-api rule34 worker-pool xml
Last synced: 17 Dec 2024
https://github.com/highbreed/web-crawler
A web crawler script that crawls the target website and lists its links
Last synced: 13 Jan 2025
https://github.com/antoinegagne/treewalker
A web crawler in Erlang that respects `robots.txt`.
Last synced: 20 Dec 2024
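Respecting robots.txt, as treewalker does in Erlang, comes down to checking each URL against the site's rules before fetching it. A rough Python sketch using the standard library's `urllib.robotparser`; the rules and the `example-bot` user agent are made-up examples, not anything from treewalker.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules; a real crawler would download https://<host>/robots.txt.
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /private/",
    "Crawl-delay: 2",
]

parser = RobotFileParser()
parser.parse(ROBOTS_LINES)  # parse() accepts an iterable of lines


def allowed(url: str, agent: str = "example-bot") -> bool:
    """Return True if the parsed robots.txt permits `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


print(allowed("https://example.com/index.html"))  # True
print(allowed("https://example.com/private/x"))   # False
print(parser.crawl_delay("example-bot"))          # 2
```

Rules are cached per host in practice, so the check costs one robots.txt download per site rather than one per page.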
https://github.com/darealfreak/figure-tracker
An application that keeps watch over wished-for figures on multiple sites and notifies you about auctions, sales, or sudden price drops
crawler figure-tracker monitoring
Last synced: 11 Dec 2024
https://github.com/rudrakshi99/web_crawler
A Spider🕷 or search engine bot that downloads and indexes content from all over the Internet.
Last synced: 22 Nov 2024
https://github.com/e73b025/simple-python-url-crawler
Super simple Python3 website URL scraper/crawler. Multi-threaded.
crawler googlebot lightweight link-collection multi-threaded python python3 scraper simple
Last synced: 11 Nov 2024
https://github.com/camara94/crawlers
Web scraping is the process of using bots to extract content and data from a website. Unlike screen scraping, which only copies pixels displayed onscreen, web scraping extracts underlying HTML code and, with it, data stored in a database. The scraper can then replicate entire website content elsewhere.
crawler python scraping scrapy spider
Last synced: 23 Dec 2024
https://github.com/travorlzh/temperature-analyzer
Python crawler that helps fetch the temperature of Beijing, China
crawler homework python variance
Last synced: 17 Jan 2025
https://github.com/airtoxin/stackable-crawler
Middleware-based lightweight crawler framework
crawler javascript lightweight
Last synced: 24 Dec 2024
https://github.com/imthaghost/gocloneold
Website Cloner - Utilizes powerful goroutines to clone websites to your computer within seconds.
Last synced: 19 Dec 2024
https://github.com/norconex/committer-neo4j
Implementation of Norconex Committer for Neo4j.
crawler neo4j neo4j-committer norconex-committer
Last synced: 17 Dec 2024
https://github.com/nakabonne/netsurfer
netsurfer is a very lightweight scraping framework
Last synced: 14 Dec 2024
https://github.com/epigos/newsbot
A news bot written in Go for Dialogflow and Facebook messenger
autocert chatbot crawler datastore dialogflow facebook-messenger-bot golang letsencrypt newsfeed
Last synced: 27 Jan 2025
https://github.com/ericz99/go-crawler
A simple, lightweight crawler that finds all endpoints on any website.
Last synced: 30 Nov 2024
https://github.com/Anakeyn/website-contextual-links
Retrieval of a website's contextual links with R.
Last synced: 24 Nov 2024
https://github.com/raspi/scrapy-kuntavaalit2021-yle
Fetch YLE kuntavaalit 2021 (Finnish municipal elections 2021) data
crawler mirror python scrapy spider webcrawler
Last synced: 10 Nov 2024
https://github.com/polakosz/smf-scraper
You know, just for backup :smile: The only, and therefore the best, Simple Machines Forum C# scraper on GitHub :cat:
crawler csharp forum machines php scraper simple simplemachines smf
Last synced: 18 Dec 2024
https://github.com/panyanyany/vps_spider
VPS Spider powering https://findallvps.com
Last synced: 11 Jan 2025
https://github.com/sean2077/leetcode_anki
Leetcode Anki card factory.
anki crawler leetcode leetcode-anki scrapy
Last synced: 11 Jan 2025
https://github.com/maraf/staticsitecrawler
A simple util for crawling links from root URL and saving HTML documents.
Last synced: 17 Jan 2025
https://github.com/first-coding/django-and-web
A Django web project with a separated front end and back end.
Last synced: 28 Dec 2024
https://github.com/denrydu/baiduimagecrawler
Two self-written scripts for downloading images from Baidu, a useful tool for computer-vision researchers building datasets!
Last synced: 27 Dec 2024
https://github.com/lykmapipo/producthunt-python-scrapy-scraper
Python Scrapy spiders that scrape data from producthunt.com
crawler featured launch lykmapipo product producthunt python scraper scrapy spider webscraper
Last synced: 21 Dec 2024
https://github.com/gnaneshkunal/book-miner
Web crawler for Book reviews (Goodreads)
Last synced: 16 Dec 2024
https://github.com/leomaurodesenv/smm-maker-profile
A package for fetching a maker profile from Super Mario Maker
crawler javascript json mario-maker nodejs
Last synced: 02 Nov 2024
https://github.com/im-perativa/public_crawler
A collection of crawler projects for Indonesian datasets
crawler indonesia indonesia-api scrapy
Last synced: 25 Jan 2025
https://github.com/gnuns/raspa
data mining stuff
crawler robot scraper web-scraper web-scraping web-spider
Last synced: 17 Dec 2024
https://github.com/uzsoftic/ecommerce-web-crawler
Web crawler for e-commerce sites
bot crawler crawler-php ecommerce laravel parser php php8
Last synced: 24 Dec 2024
https://github.com/trixsec/zeuscrawler
The ultimate web crawling powerhouse, striking the web like lightning to harvest data with divine precision.
crawler cybersecurity information-gathering information-retrieval osint python scraper spider web-crawler web-scraper
Last synced: 21 Dec 2024
https://github.com/stevieflyer/quokka
An easy-to-use web crawler framework that supports parallel crawling without writing a line of code, as well as headless running.
crawler parallel web-automation
Last synced: 14 Dec 2024
https://github.com/beanwei/zmt-post-crawler
Crawls the ZMT platform site: given an author ID, it retrieves the post list. This project was written for a friend.
Last synced: 28 Dec 2024
https://github.com/estroz/seekret
Seekret is a sensitive data crawler for GitHub repositories
Last synced: 25 Dec 2024
https://github.com/bujosa/aldebaran
Example of using App Engine with Python 3, ThreadPool, and web scraping
appengine crawler flask gcp python3 thread-pool
Last synced: 21 Jan 2025
https://github.com/hantang/list-movies-top
Scheduled scraping of the top-movie lists from Douban (douban.com), IMDb (imdb.com), Mtime (mtime.com), and Maoyan (maoyan.com)
Last synced: 07 Jan 2025
https://github.com/spraakbanken/svt-crawler
Programme for crawling SVT's API for news articles and converting the data to XML.
Last synced: 29 Nov 2024
https://github.com/marzzzello/gplaycrawler
(mirror) Discover apps by different methods. Mass download app packages and metadata.
crawler google-play google-play-store googleplay googleplaystore playstore playstoreapi scraper
Last synced: 23 Dec 2024
https://github.com/uranusx86/dcard-crawler-analyzer
Get Dcard and Meteor forum content and analyze it!
crawl crawler dcard nlp python word-cloud word-count word-frequency
Last synced: 21 Jan 2025
https://github.com/somehowchris/swisslos-cralwer
(WIP) Crawler to access the current and historical numbers of Swisslos
crawler euromillions lotto rust swisslos
Last synced: 27 Jan 2025
https://github.com/h4r5h1t/crawlytics
A Python-based web crawling tool for data extraction and security analysis that supports various arguments for efficient crawling and outputs results in JSON format.
appsec crawler crawler-python mechanicalsoup security security-tools webcrawler
Last synced: 28 Dec 2024
https://github.com/amirzenoozi/aparat-videos-dataset
Simple information about Aparat videos for data scientists
aparat cli crawler data-science data-science-projects pandas python python3 sdk-python sqlite3 video
Last synced: 21 Jan 2025
https://github.com/purrproof/smartcrawl
An adaptable framework for gathering, aggregating and analyzing data, focusing on blockchain and smart contracts.
blockchain cli crawler explorer framework go golang hacktoberfest
Last synced: 27 Jan 2025
https://github.com/zhanymkanov/marketplace_parser
Products and Reviews Crawler
Last synced: 14 Jan 2025
https://github.com/donuts-are-good/araknnid
GO GO TINY SPIDER!
crawler hacktoberfest search-engine spider
Last synced: 28 Dec 2024
https://github.com/thomashirtz/douban-crawler
A simple crawler for retrieving information about movies or TV shows from the famous www.douban.com website.
Last synced: 25 Dec 2024
https://github.com/vitaee/laravelandcrawlers
PHP web crawler examples using OOP concepts, plus a Laravel project
Last synced: 26 Dec 2024
https://github.com/moontai0724/auto-notify-pu-courses-quota
A small crawler that fetches the remaining quota of a list of courses at Providence University every 2 to 10 minutes, then sends a webhook when it changes.
Last synced: 06 Dec 2024
https://github.com/sirius-mhlee/naver-cafe-crawler
NAVER Cafe Crawler using pandas, tqdm, Selenium, BeautifulSoup4
beautifulsoup4 crawler pandas selenium tqdm
Last synced: 14 Jan 2025
https://github.com/birdroad1/server-pinger
Server pinger for Minecraft written in C++
cpp crawler make minecraft minecraft-scanner postgres scanner server
Last synced: 21 Jan 2025
https://github.com/baerwang/sec_craw
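A crawler that helps security researchers get a daily security digest. It currently covers lab blogs and forums such as 90sec, Kanxue Forum (看雪论坛), v2ex, Jingyi Forum (精易论坛), and 52pojie Forum (52破解论坛), and is continuously updated.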
crawler security security-tools threat threat-intelligence
Last synced: 21 Jan 2025
https://github.com/saketh7382/smartcrawler
Package for crawling items from webpages and storing them as JSON files
crawler crawler-python open-source pip python3 scraper selenium selenium-webdriver webdriver-manager
Last synced: 08 Dec 2024
https://github.com/appliedsoul/crawlmatic
Static and dynamic website crawling library: a common promise-based wrapper around the node-crawler and hccrawler libraries.
Last synced: 30 Dec 2024
https://github.com/jimmy-ly00/dhe-prime-grabber
Grabs Diffie-Hellman primes from certificates using OpenSSL. Uses multiprocessing to collect over 50 million Diffie-Hellman primes.
certificate certificates crawler dhe-prime-grabber diffie-hellman ipv4 multiprocessing openssl prime prime-numbers python python-3
Last synced: 29 Dec 2024
https://github.com/efishery/wpi-kkp-crawler
A crawler for fisheries prices on wpi.kkp.go.id
Last synced: 02 Jan 2025
https://github.com/rogerluo410/gcrawler
A Google search crawler written in Ruby. It crawls the text and URL of each link returned for given keywords on Google.com.
Last synced: 02 Jan 2025
https://github.com/phanikmr/linkcrawler
LinkCrawler is a Python module that takes a URL on the web (e.g. http://python.org), fetches the corresponding web page, and parses all the links on that page into a repository of links. It then fetches the contents of each URL in that repository, parses the links from the new content into the repository, and continues this process until stopped or until a given number of links has been fetched.
async crawler linkcrawler parse python scrapy spider
Last synced: 27 Jan 2025
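The loop LinkCrawler describes (fetch a page, add its links to a repository, then keep fetching from that repository until stopped or a link limit is reached) is a breadth-first crawl. A minimal Python sketch under stated assumptions, not LinkCrawler's own code: the page fetcher is passed in as a function, a hypothetical in-memory `fake_web` dict stands in for real HTTP, and only links on the start URL's host are followed.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect the href values of <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl(start_url: str, fetch: Callable[[str], Optional[str]],
          max_links: int = 100) -> list[str]:
    """Breadth-first crawl from start_url; returns URLs in discovery order."""
    host = urlparse(start_url).netloc
    repository = [start_url]      # discovered links, in order
    seen = {start_url}            # fast membership test for the repository
    frontier = deque([start_url])
    while frontier and len(repository) < max_links:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:          # fetch failed or page unavailable
            continue
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop #fragments so duplicates collapse.
            link, _ = urldefrag(urljoin(url, href))
            if urlparse(link).netloc != host or link in seen:
                continue
            repository.append(link)
            seen.add(link)
            frontier.append(link)
            if len(repository) >= max_links:
                break
    return repository


fake_web = {
    "https://example.com/": '<a href="/a">A</a> <a href="https://other.org/">x</a>',
    "https://example.com/a": '<a href="/b#top">B</a> <a href="/">home</a>',
    "https://example.com/b": "no links here",
}
print(crawl("https://example.com/", fake_web.get))
# ['https://example.com/', 'https://example.com/a', 'https://example.com/b']
```

Swapping `fake_web.get` for a real HTTP fetch function (ideally one that checks robots.txt and rate-limits) turns the sketch into a working crawler; the deque is what makes the traversal breadth-first.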
https://github.com/1970mr/link-crawler
Web Link Crawler: A Python script to crawl websites and collect links based on a regex pattern. Efficient and customizable.
clawler crawler crawler-python link-crawler link-crawler-python link-scraper link-scraper-python links python scraper scraper-python website-crawler website-scraper
Last synced: 11 Nov 2024
https://github.com/supadata-ai/py
Official Python SDK for the Supadata API.
ai api crawler llm markdown scraping sdk transcript web-scraper youtube
Last synced: 27 Jan 2025
https://github.com/supadata-ai/js
Official TypeScript/JavaScript SDK for the Supadata API.
ai crawler llm markdown scraper transcript web-crawler youtube
Last synced: 27 Jan 2025
https://github.com/m-osource/cassiopeiabot
C++ multithread Linux Web Crawler
algorithm berkeleydb bot cassiopeia cplusplus crawler download engine hashing html-parser information-retrieval link-analysis multithread open-source regex search web web-crawler webcrawler www
Last synced: 08 Jan 2025
https://github.com/dingpingzhang/papermedia
A scrapy-based crawler for crawling paper media.
Last synced: 22 Dec 2024
https://github.com/maxiroellplenty/gs-robot
Node.js tool to scrape Gelbe Seiten (German Yellow Pages)
axios cheerio crawler gelbe-seiten nodejs scraper yargs
Last synced: 23 Jan 2025
https://github.com/hamidrabedi/digikala-crawler
A crawler for Digikala built with the Django framework, Selenium, and a REST API. It also scrapes data from the gathered URLs.
crawler digikala digikala-crawler django python scraper
Last synced: 14 Dec 2024
https://github.com/greatdrake/contributecounter
Crawl Wikipedia for contributors
Last synced: 14 Dec 2024
https://github.com/cryptoc1/earl
Earl is looking for URLs in your area.
crawler middleware nuget webscraping
Last synced: 27 Jan 2025
https://github.com/tsoliangwu0130/ptt-search
A simple Python script to fetch PTT posts from the command line.
Last synced: 08 Jan 2025
https://github.com/tsoliangwu0130/ex-dividend-date-notification
crawler email-notification python3 stock-market vanguard
Last synced: 08 Jan 2025
https://github.com/arghyadipchak/craww
Gemini (protocol) crawler written in Rust
crawler gemini gemini-protocol rust
Last synced: 04 Jan 2025
https://github.com/krishealty/whoknows
All in One Advanced and Detailed Web Scanner with over 1000 plug-ins.
bug-bounty bypass crawler enumeration ethical-hacking footprinting hacking hacking-tool intelligence-gathering javascript offensive-security osint pentesting pentesting-tools security-tools subdomain-enumeration vulnerability-analysis vulnerability-detection web-application-security web-reconnaissance
Last synced: 07 Jan 2025
https://github.com/hanifdwyputras/se-scraper
Search Engine scraper with PHP
crawler scraper seo seo-crawler
Last synced: 06 Dec 2024
https://github.com/exasol/error-code-crawler-maven-plugin
Validator and crawler for exasol-error-codes in Java code
catalog crawler error-handling error-report error-reporting exasol exasol-integration java unification
Last synced: 13 Jan 2025
https://github.com/ccrashzer0/web_crawler
A python based web crawler
crawler internet python python3 webcrawler
Last synced: 27 Jan 2025
https://github.com/khoinguyen2k/web-crawler
Data crawling
crawler jsoup-library scraper selenium-java
Last synced: 17 Jan 2025
https://github.com/mohabmes/matool
A collection of various custom tools. { Antesh, CITerm, INetSC, KADManga, Tomado }
cli codeigniter-terminal crawler mangareader markd markdown markdown-to-html parser readme scan-tool scanner-web
Last synced: 08 Jan 2025
https://github.com/henkman/crawlers
:squirrel: some crawlers and downloaders
Last synced: 16 Jan 2025
https://github.com/idlesign/gallerycrawler
Generic crawling for galleries
crawler gallery images python3
Last synced: 17 Dec 2024
https://github.com/jonasrenault/cprex
Chemical Properties Relation Extraction
chemistry crawler deep-learning information-extraction machine-learning named-entity-recognition nlp pubchem relation-extraction scientific-articles spacy transformers
Last synced: 14 Oct 2024
https://github.com/mkfsn/chronos
A light cron-like container service - create cron job easily.
Last synced: 22 Jan 2025