Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2024-11-18 00:06:04 UTC
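To make the definition above concrete, here is a minimal sketch of the "systematically browses" part: a breadth-first crawl that fetches a page, extracts its links, and queues any URL it has not seen yet. It is an illustration only, not the method of any project listed here. The names `LinkParser`, `crawl`, `FAKE_WEB`, and the `example.test` URLs are hypothetical, and the "web" is an in-memory dictionary so that the sketch runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkParser(HTMLParser):
    """Collects the href values of <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl. `fetch(url)` returns HTML text, or None on failure."""
    seen = {start_url}          # URLs already queued, to avoid revisiting
    queue = deque([start_url])  # frontier of URLs still to fetch
    visited = []                # pages fetched successfully, in crawl order
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment part.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Hypothetical in-memory "web"; a real crawler would fetch over HTTP instead.
FAKE_WEB = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.test/b": '<a href="/missing">gone</a>',
}

if __name__ == "__main__":
    print(crawl("http://example.test/", FAKE_WEB.get))
```

A real crawler would replace `FAKE_WEB.get` with an HTTP fetch and would normally also honor robots.txt and rate limits, which this sketch leaves out.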
https://github.com/mikirasora/osuplayedbeatmapscrawler
A crawler that fetches and downloads the osu! beatmaps you have played
Last synced: 08 Nov 2024
https://github.com/efishery/wpi-kkp-crawler
A crawler for fisheries prices on wpi.kkp.go.id
Last synced: 08 Nov 2024
https://github.com/jovijovi/ether-crawler
A transaction crawler for the Ethereum ecosystem.
blockchain crawler ether ethereum transaction
Last synced: 15 Nov 2024
https://github.com/arshadkazmi42/gh-crawl
Crawler for GitHub repositories. Finds all the broken links in the repositories.
bug-bounty-recon crawl crawler gh-crawler github github-crawler githubcrawler python
Last synced: 28 Oct 2024
https://github.com/naveenaidu/google-crawler
Google Crawler - Curates the search results
Last synced: 17 Nov 2024
https://github.com/skylightqp/namu2csv
A Namuwiki crawler that converts headers to a CSV file for the KartRider wiki
Last synced: 19 Oct 2024
https://github.com/henkman/crawlers
:squirrel: some crawlers and downloaders
Last synced: 15 Nov 2024
https://github.com/kahsolt/allchan
An image crawler for xChan (4chan/8ch/...) image boards.
4chan 4chan-downloader 8chan crawler image-crawler
Last synced: 09 Nov 2024
https://github.com/maxgio92/package-crawler
A package crawler for most known Linux distros
Last synced: 13 Oct 2024
https://github.com/pierlauro/mdbubing
From WARC records to MongoDB documents
bubing crawler crawling warc warc-files warc-format warc-record webarchive webarchiving
Last synced: 20 Oct 2024
https://github.com/jonasrenault/cprex
Chemical Properties Relation Extraction
chemistry crawler deep-learning information-extraction machine-learning named-entity-recognition nlp pubchem relation-extraction scientific-articles spacy transformers
Last synced: 14 Oct 2024
https://github.com/princed/specht
Check links found in html or js files by pattern
cli crawler html javascript streams
Last synced: 12 Oct 2024
https://github.com/myconsciousness/metis
Metis main repository.
application client crawler crawling crawlwebpage educatable gui lerning logging programming-language python scrape scraping scraping-websites tkinter tkinter-gui tkinter-python
Last synced: 19 Oct 2024
https://github.com/enansari/guess-price-car
Car price estimation based on the information of a car sales site
crawler jadi machine-learning maktabkhoone maktabkhooneh python
Last synced: 11 Nov 2024
https://github.com/rogerluo410/gcrawler
Google search crawler, Ruby version. Crawls each link's text and URL by keyword on Google.com.
Last synced: 08 Nov 2024
https://github.com/zhs007/lottery-crawler
A crawler based on jarvis-task, mainly used to crawl lottery data.
Last synced: 09 Nov 2024
https://github.com/sinkaroid/webnovelcrawler
Simple PHP cURL and getRequest code to grab Light Novel and WebNovel content, then create a parser with Dompdf.
Last synced: 05 Nov 2024
https://github.com/dylanhogg/cloud-products
A package for getting cloud products and product descriptions from a cloud provider website.
aws cloud-products crawler data text-processing
Last synced: 27 Oct 2024
https://github.com/mmqnym/pyppeteer-use-case
Shows how to crawl the web with pyppeteer
crawl crawler pyppeteer python
Last synced: 17 Nov 2024
https://github.com/zephyrpersonal/github-trending-crawler
Transforms GitHub trending repos into JSON data
cheerio crawler fetch github node repository spider trending
Last synced: 14 Oct 2024
https://github.com/konradlinkowski/mailcrawler
Crawler to find email addresses on websites
Last synced: 14 Oct 2024
https://github.com/ging-dev/sitemap-crawler
Collect links through the sitemap.xml or robots.txt
crawler php php8 sitemap sitemap-crawler
Last synced: 12 Oct 2024
https://github.com/mustafadalga/website-crawler
A web crawler script that lists links by scanning the target website.
crawl crawler crawling-sites hacking hacking-tool web-crawler web-crawler-python web-crawling
Last synced: 17 Nov 2024
https://github.com/hoanle396/py-iconnect
crawler flask flask-application image-processing python
Last synced: 27 Oct 2024
https://github.com/camilamaia/crawl4us
[WIP] A Python web crawler looking wildly for tables 🕵️‍♀️
beautifulsoup4 crawler crawling pypi python-3 python-module scraper scraping tables web-scraping
Last synced: 18 Oct 2024
https://github.com/imkrunalkanojiya/seo-checker
Resolve your SEO-related issues using the SEO Checker REST API
crawler nodejs rest-api seo seo-crawler seo-free seo-optimization seo-tools
Last synced: 09 Nov 2024
https://github.com/tcc0lin/magiccrawler
A collection of all kinds of interesting crawler scripts that tackle anti-crawling measures :bowtie::heavy_check_mark::heavy_check_mark::heavy_check_mark:
Last synced: 17 Nov 2024
https://github.com/davideferre/covid19-data-crawler-ita
COVID-19 Italian data crawler
coronavirus covid19 crawler hacktoberfest hacktoberfest2021 python
Last synced: 12 Nov 2024
https://github.com/raphaelalmeidamartins/python-tech-news
Python data science project developed at the end of Unit 35 (Computer Science Module) of Trybe's Web Development course
crawler crawler-python data-science pytest python
Last synced: 17 Nov 2024
https://github.com/hoishing/selenium-crawler
A web crawler written in Python, powered by Selenium and Tesseract OCR
Last synced: 17 Nov 2024
https://github.com/jackfsuia/chats-crawler
Crawls Discourse chat data and parses it on the fly for direct use in (multimodal) LLM instruction fine-tuning. Data include text, images, and links.
crawler fine-tuning finetune-llm gpt html-css-javascript instruction-tuning llm llm-training llms nlp nlp-parsing parser
Last synced: 14 Nov 2024
https://github.com/maxmindlin/swarm
Go crawler that searches and aggregates information relevant to your interests. WIP for learning Go crawling.
Last synced: 15 Oct 2024
https://github.com/xcrypt0r/xcrawler
✂️ A crawling example for MapleStory in various languages using multithreading
crawler crawling multithreading parsing regexp
Last synced: 11 Nov 2024
https://github.com/snuzi/devblogs-aggregator
The backend aggregator project of DevBlogs.net
aggregator blog crawler engineering engineering-blogs tech tech-blogs tech-companies tech-news
Last synced: 09 Nov 2024
https://github.com/hantang/list-movies-top
Scheduled crawling of top movies from Douban (douban.com), IMDb (imdb.com), Mtime (mtime.com), and Maoyan (maoyan.com)
Last synced: 10 Nov 2024
https://github.com/juangesino/gazette
A personal news aggregator application using Meteor.
crawler meteor meteorjs news news-aggregator news-feed scraper
Last synced: 13 Oct 2024
https://github.com/estroz/seekret
Seekret is a sensitive data crawler for GitHub repositories
Last synced: 06 Nov 2024
https://github.com/buren/stupid_crawler
Stupid crawler that looks for URLs on a given site
Last synced: 12 Oct 2024
https://github.com/loggerhead/dianping_crawler
A Dianping crawler based on Scrapy (Python 3.5)
Last synced: 18 Nov 2024
https://github.com/amirsorouri00/dsl-se
This is an MVP based on the "Search Engine And Data Mining" course. The idea behind this project is the forked project which its link provided is
container crawler distributed-systems docker docker-compose elasticsearch pagerank search-engine
Last synced: 18 Nov 2024
https://github.com/jfcherng/wiki-cgroup-crawler
This script fetches Wikipedia's public conversion group (CGroup) dictionaries and saves the results to external files.
crawler php-71 wiki-cgroup-crawler wikipedia
Last synced: 28 Sep 2024
https://github.com/camara94/crawlers
Web scraping is the process of using bots to extract content and data from a website. Unlike screen scraping, which only copies pixels displayed onscreen, web scraping extracts underlying HTML code and, with it, data stored in a database. The scraper can then replicate entire website content elsewhere.
crawler python scraping scrapy spider
Last synced: 05 Nov 2024
https://github.com/pjullrich/link-crawler
Python crawler that reports broken links on a given website and its subpages
asyncio breadth-first-search broken-links crawler python
Last synced: 13 Oct 2024
https://github.com/scrwdrv/siege-crawler
This CLI tool finds same-domain URLs in a web page and requests them to find even more URLs until the server crashes (or the benchmark ends). It is used to test the maximum capacity of a server or to find glitches that users might encounter.
benchmark cli crawler ddos debug siege tool
Last synced: 12 Oct 2024
https://github.com/yjg30737/pyqt-wikipedia-crawler
Crawls Wikipedia with Python, powered by BeautifulSoup4; supports GUI/CUI
beautifulsoup4 crawler pyqt pyqt5 wikipedia
Last synced: 07 Nov 2024
https://github.com/carloocchiena/python_url_crawler
A script that, starting from a webpage, iterates through all its links, appending them to a list. A sort of proxy for getting all pages in a website.
beautifulsoup crawler python python3
Last synced: 14 Oct 2024
https://github.com/mahmoudgalalz/pupt
A starter for web crawling using Puppeteer
Last synced: 09 Nov 2024
https://github.com/thiiagoms/car-stealth
REST API for all cars that were stolen
Last synced: 15 Nov 2024
https://github.com/1970mr/link-crawler
Web Link Crawler: A Python script to crawl websites and collect links based on a regex pattern. Efficient and customizable.
clawler crawler crawler-python link-crawler link-crawler-python link-scraper link-scraper-python links python scraper scraper-python website-crawler website-scraper
Last synced: 11 Nov 2024
https://github.com/dimo414/pycrawl
Simple Python web crawler, primarily designed for inspecting and diagnosing your own website
Last synced: 12 Oct 2024
https://github.com/geoffreybauduin/website-checker
Performs useful checks against a website, such as 404 errors reporting, structured data validation...
crawler seo structured-data web-spider website
Last synced: 06 Nov 2024
https://github.com/ghost---shadow/feature-extractor-from-codebase
Copies the target java file and all its dependencies recursively to another directory
Last synced: 16 Nov 2024
https://github.com/marzzzello/gplaycrawler
(mirror) Discover apps by different methods. Mass download app packages and metadata.
crawler google-play google-play-store googleplay googleplaystore playstore playstoreapi scraper
Last synced: 05 Nov 2024
https://github.com/akashrajpurohit/node-crawler
Node.js crawler that scrapes a website on a live domain and crawls to find all URLs of the domain
crawler node-crawler nodejs url
Last synced: 06 Nov 2024
https://github.com/beanwei/zmt-post-crawler
Crawls the ZMT platform site: provide an author ID, get the post list. This project was coded for my friend.
Last synced: 07 Nov 2024
https://github.com/coverified/spider
A microservice with web-crawler/spider capabilities which only follows and indexes urls of the provided host domain(s)
akka crawler graphql hacktoberfest microservice spider
Last synced: 06 Nov 2024
https://github.com/h4r5h1t/crawlytics
A Python-based web crawling tool for data extraction and security analysis that supports various arguments for efficient crawling and outputs results in JSON format.
appsec crawler crawler-python mechanicalsoup security security-tools webcrawler
Last synced: 07 Nov 2024
https://github.com/ilsonlasmar/inovamind
Inovamind Challenge - Crawler in Ruby on Rails with Sidekiq + Redis
Last synced: 11 Nov 2024
https://github.com/spider-rs/web-crawling-guides
How-to guides on web crawling or scraping
agents ai-agents ai-scraping clean-markdown crawler fast-webcrawler html-to-markdown llm-webcrawler scraper web-scraping
Last synced: 05 Nov 2024
https://github.com/programming-with-love/skyeyesystem
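SkyEye system: crawls trending-search data from various platforms every ten minutes and stores it. Raw trending-search data is saved to MySQL; word-frequency statistics are saved to Redis.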
crawler mysql redis skyeye skyeyewall springboot
Last synced: 16 Nov 2024
https://github.com/duaraghav8/larry-crawler
Kayako Twitter challenge
crawler fetch-tweets hashtag nodejs pagination tweets twitter-api
Last synced: 13 Oct 2024
https://github.com/mdazlaanzubair/amazon-scraper-api
A web scraper that crawls Amazon to extract product information and return it in JSON format.
amazon crawler expressjs json-api nodejs webscraping
Last synced: 11 Nov 2024
https://github.com/toannd96/chromedp-example-login
chromedp crawler golang goquery
Last synced: 18 Nov 2024
https://github.com/droiddevgeeks/nodelearning
A Node learning demo. It covers all the basics of Node.
crawler database ejs ejs-express mcv middleware-nodes mongodb node node-module nodejs nodemailer npm-package router sign
Last synced: 13 Nov 2024
https://github.com/arghyadipchak/craww
Gemini (protocol) crawler written in Rust
crawler gemini gemini-protocol rust
Last synced: 09 Nov 2024
https://github.com/lykmapipo/producthunt-python-scrapy-scraper
Python Scrapy spiders that scrape data from producthunt.com
crawler featured launch lykmapipo product producthunt python scraper scrapy spider webscraper
Last synced: 04 Nov 2024
https://github.com/thomashirtz/douban-crawler
A simple crawler for retrieving information about movies or TV shows from the famous www.douban.com website.
Last synced: 06 Nov 2024
https://github.com/deployment-helper/api-template-crawler
API interface to crawl the templates
api crawler deployment-helper gcp gcp-cloud-run golang rest
Last synced: 14 Nov 2024
https://github.com/hudson-newey/user-web-crawler
The Archive.org Crawler works through volunteering users who install an extension on their browsers. When the user visits a webpage, the URL is anonymously added to the Archive.org database.
Last synced: 11 Nov 2024
https://github.com/jorgeparavicini/medalytik-python
Python crawlers for a job mediation firm
Last synced: 17 Oct 2024
https://github.com/yjg30737/pyqt-google-image-crawler
Crawls image files from Google search results with Python and icrawler
beautifulsoup4 crawler icrawler image-crawler pyqt pyqt5 pyqt5-desktop-application
Last synced: 07 Nov 2024
https://github.com/zhanymkanov/marketplace_parser
Products and Reviews Crawler
Last synced: 14 Nov 2024
https://github.com/somnisomni/trawler-csharp
The successor of https://github.com/somnisomni/twitter-account-data-crawler, written in .NET C#
crawler crawling csharp dotnet follower-tracker selenium selenium-csharp twitter twitter-crawler twitter-crawling
Last synced: 09 Nov 2024
https://github.com/deptno/nsdi
㉿ nsdi downloader built on puppeteer
crawler downloader nsdi openapi puppeteer
Last synced: 08 Nov 2024
https://github.com/microlinkhq/ua
Simple Redis primitives to incr() and top() user agents
crawler redis user-agent user-agent-parser
Last synced: 12 Nov 2024
https://github.com/dangdungcntt/crawl-fb-v2
Simple script to detect emails and phone numbers in Facebook comments.
Last synced: 17 Nov 2024
https://github.com/pnguyen215/instagram-crawler
Instagram Crawler is a Python script to download posts from a specified Instagram account.
crawler crawling-python instagram instagram-crawler scraper scraping-python scraping-websites scrapper scrapy-crawler
Last synced: 12 Nov 2024
https://github.com/madis/flatcrawl
Clojure app for crawling apartment information from http://kv.ee
clojure crawler real-estate webapp
Last synced: 12 Nov 2024
https://github.com/schbenedikt/web-crawler
A simple web crawler using Python that stores the metadata of each web page in a database.
crawler database mariadb mysql python python-crawler web
Last synced: 08 Nov 2024
https://github.com/bkdev98/ebooks-crawler
Ebooks crawler for personal use, built with ReactJS.
crawler material-ui nodejs reactjs
Last synced: 08 Nov 2024
https://github.com/feliz-szk/berserk
Berserk: Crawler to increase web traffic (based on Tor and Privoxy)
anonymizer anonymous-proxy command-line-tool crawler linux privoxy python scraping-websites tor webtraffic-increaser
Last synced: 13 Nov 2024
https://github.com/afuntw/misc-crawler
Some small crawlers for specific websites
Last synced: 13 Nov 2024
https://github.com/dingpingzhang/papermedia
A scrapy-based crawler for crawling paper media.
Last synced: 04 Nov 2024
https://github.com/gnuns/raspa
data mining stuff
crawler robot scraper web-scraper web-scraping web-spider
Last synced: 30 Oct 2024
https://github.com/anjackson/scrapy-url-frontier
A Scrapy module for URL Frontier integration
crawler frontier scrapy spider
Last synced: 09 Nov 2024
https://github.com/uzsoftic/ecommerce-web-crawler
WebCrawler for ecommerce sites
bot crawler crawler-php ecommerce laravel parser php php8
Last synced: 06 Nov 2024
https://github.com/appliedsoul/crawlmatic
Static and dynamic website crawling library - a common promise-based wrapper around the node-crawler & hccrawler libraries.
Last synced: 08 Nov 2024