Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).

GitHub: https://github.com/topics/crawler
Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
Last updated: 2024-11-18 00:06:04 UTC

https://github.com/iarsham/scrapify

Scrapify is a golang library that automates the process of bypassing CAPTCHAs, enabling efficient web scraping and data acquisition.

403-bypass arkose cloudflare crawler golang http-client scraper

Last synced: 24 Oct 2024

https://github.com/stephanebruckert/gocrawl

Crawl every page and asset of a web domain

crawler python

Last synced: 03 Nov 2024

https://github.com/naem1023/comic-crawler

Comic crawler.

beautifulsoup crawler python3

Last synced: 13 Oct 2024

https://github.com/thejoin95/free-proxies.info

API service for getting anonymous and non-anonymous proxies, filterable by latency, country, update time, and more

api crawler http-proxy proxy proxy-list python scraper

Last synced: 06 Nov 2024

https://github.com/shaoxiongdu/skyeye

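A Spring Boot-based crawler for trending topics across the web. Raw trending-search data is stored in a database, and word-segmentation statistics are stored in Redis, which makes later data analysis easier.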

crawler crawlers mysql redis spring spring-boot

Last synced: 16 Nov 2024

https://github.com/jofaval/open-graph-visualizer

Web scraping showcase of how crawlers retrieve a site's details through the Open Graph protocol

crawler javascript opengraph scraping web web-scraping

Last synced: 21 Oct 2024

https://github.com/fmind/fincrawl

Crawl documents, metadata, and files from financial institutions

crawler documents finance python scrapy

Last synced: 06 Nov 2024

https://github.com/machinecyc/lotteryinsight

Uses a crawler to collect Taiwan Lotto data and save it to a local MySQL server.

crawler data docker lottery mysql-database python3 taiwan

Last synced: 15 Oct 2024

https://github.com/mnemocron/VPNNetworkShareCrawler

Ugly scripts to connect a Raspberry Pi to a VPN and attach a network share to periodically crawl the documents on it

crawler samba vpn

Last synced: 23 Oct 2024

https://github.com/eghuro/crawlcheck

Extensible web crawler

configuration crawler http plugin python robots-txt sitemap

Last synced: 13 Nov 2024

https://github.com/timzatko/fiit-vinf-1

School project: data crawling, storage using Elasticsearch, and visualisation.

angular crawler elasticsearch

Last synced: 28 Oct 2024

https://github.com/mg98/ipfs-replicate

Replicate IPFS' distributed data structure locally, based on network traces.

crawler dag ipfs redisgraph scraper

Last synced: 14 Oct 2024

https://github.com/not-raspberry/aio_crawler

AIO single website crawler

asyncio crawler python3

Last synced: 14 Oct 2024

https://github.com/ndoolan360/go-crawler

A simple web crawling program written in Go in an afternoon. 🕷️🕸️

afternoon-project crawler scraper

Last synced: 17 Nov 2024

https://github.com/johanbook/node-web-crawler

Node.js CLI for web crawling

cli crawler nodejs typescript

Last synced: 16 Nov 2024

https://github.com/jenting/compare-drugstore-price

Compare prices between cosmeceutical shops

cosmed crawler golang poya side-project watsons

Last synced: 15 Oct 2024

https://github.com/tungct/tngtcrawler

Crawler using Scrapy

crawler python scrapy

Last synced: 14 Nov 2024

https://github.com/ecklf/reddit-clawler

A command-line tool written in Rust that crawls Reddit posts from a user or subreddit

cli crawler downloader downloader-for-reddit reddit

Last synced: 25 Oct 2024

https://github.com/qqxs/usda_pomological_watercolors

Crawls data from the USDA pomological watercolors

crawler koa2 nodejs watercolors

Last synced: 17 Nov 2024

https://github.com/tigercosmos/web-crawler

Web Crawler in Java Maven Project

crawler

Last synced: 15 Oct 2024

https://github.com/curegit/nominium

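A crawler system that sends notifications, by email and other means, about newly listed items on consumer-to-consumer marketplace sites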

c2c chromium crawler ecommerce firefox selenium shopping webdriver

Last synced: 17 Nov 2024

https://github.com/tungct/facebookcrawler

crawler graphapi python2

Last synced: 14 Nov 2024

https://github.com/zenoyang/webcrawler

Some crawler code

crawler scrapy spider web-crawler

Last synced: 16 Nov 2024

https://github.com/vishaalpkumar/skysift

A distributed search engine from scratch

aws crawler css distributed-systems html java search-engine

Last synced: 05 Nov 2024

https://github.com/kehiy/prawler

Pactus P2P Network Crawler

crawler crawling metrics networking p2p pactus

Last synced: 07 Nov 2024

https://github.com/brnrajoriya/india-s-states-and-cities-crawler

Crawler that collects all of India's states and cities

cities crawler india php script states

Last synced: 15 Nov 2024

https://github.com/luickk/vulnerability-crawler

Small Python program meant to analyze random sites found on Google for vulnerabilities.

crawler xss

Last synced: 07 Nov 2024

https://github.com/sajjadanwar0/booking.com-scraping

Scraping booking.com using Selenium and Beautiful Soup

crawler data python scraping selenium

Last synced: 14 Nov 2024

https://github.com/fritz-c/itunes-stats

Fetch info on podcasts, etc. from iTunes RSS data

crawler itunes

Last synced: 08 Nov 2024

https://github.com/jamesjarvis/web-graph

Experiment with web scraping

colly crawler database golang web-graph

Last synced: 15 Oct 2024

https://github.com/engageintellect/scrapers

A repository of web scrapers using Python & Scrapy

crawler python scrapy spider

Last synced: 25 Oct 2024

https://github.com/rayspock/go-web-crawler

A web crawler that fetches all the links from a given website using goroutines.

concurrency crawler golang goroutine

Last synced: 14 Nov 2024

https://github.com/hsiehbocheng/usa-tourist-recommend

crawler mongodb python tableau

Last synced: 14 Nov 2024

https://github.com/pourmand1376/crawler

Simple Crawler, Indexer and Search Engine Web Application

crawler csharp csharp-code dotnet mvc

Last synced: 14 Nov 2024

https://github.com/matheusfelipeog/google-doodles

Map and download Google Doodles.

crawler google google-doodle python web-scraping

Last synced: 13 Oct 2024

https://github.com/lopins/article-crawler

A simple tool for crawling web articles. You can customize which fields to extract, and it is easy to get started with.

article crawler ftp mysql python sqlite3

Last synced: 04 Nov 2024

https://github.com/mohammadrezaamani/squirrel

Squirrel is a web crawler designed to collect all pages from Iranian websites, enabling you to download and store web page content in a structured format.

crawler iran python

Last synced: 04 Nov 2024

https://github.com/bockstaller/europarl-crawler

Crawler for the documents published by the European Parliament

crawler datamining elasticsearch europarl-crawler european european-parliament opendata parliament union

Last synced: 10 Nov 2024

https://github.com/genfuture/cryptocurrency-scraper

Cryptocurrency Data Crawler 🚀 High-performance Node.js crawler that fetches comprehensive data for 1500+ cryptocurrencies from CoinGecko API. Collects market data, social metrics, and blockchain details with built-in rate limiting and resume capability. Perfect for crypto analysis, research, and building market intelligence tools.

binance bitcoin bsc coingecko coingecko-api crawler crypto-bot cryptocurrencies cryptocurrency ethereum scraper

Last synced: 16 Nov 2024

https://github.com/ri0n/unboxer

MP4 crawler and extractor

crawler extractor mp4 object-oriented-design qt

Last synced: 13 Nov 2024

https://github.com/ceylonai/apps-article-reader

📚 A powerful desktop app that extracts and analyzes web content using LLaMA AI. Features real-time processing, keyword extraction, and smart summarization. Built with Python + Tkinter.

ai crawler gpt ollama openai

Last synced: 15 Nov 2024

https://github.com/joeri-abbo/python-credly-scraper

This project is a set of Python scripts designed to crawl and extract data from the Credly platform, focusing on skills, organizations, and badges. The scripts allow users to perform searches using command-line arguments, predefined search terms, or skills listed in a JSON file. The collected data is then saved to JSON files for further analysis an

badges crawler credly data-extraction json organizations python python3 requests-library skills web-crawling

Last synced: 15 Nov 2024

https://github.com/snwfdhmp/3gm-bot

Bot for the online French indie game 3gm.fr, implemented in Ruby. Mostly website crawling and task automation.

3gm-bot crawler game-bot task-automation web-crawling

Last synced: 15 Nov 2024

https://github.com/igor-karpukhin/web-crawler

Web site crawler

crawler go website

Last synced: 20 Oct 2024

https://github.com/yukihirai0505/streamcrawler

akka stream × crawler

akka-streams crawler elasticsearch instagram sbt scala

Last synced: 14 Nov 2024

https://github.com/lin-jun-xiang/python-crawler

Using CloudScraper, Requests, API, Thread, Async... to scrape data

async cloudscraper crawler multithreading python requests scraper selenium

Last synced: 03 Nov 2024

https://github.com/brianbruggeman/vax

A vaccination signup tool

covid-19 crawler signup vaccination

Last synced: 15 Nov 2024

https://github.com/lysagxra/eromedownloader

Erome albums and profile downloader

bulk bulk-downloader concurrent-processing crawler downloader erome erome-downloader parallel-processing profile-downloader python python3

Last synced: 16 Nov 2024

https://github.com/ariefrahmansyah/crawler

Simple website crawler using Go programming language.

crawler go

Last synced: 15 Oct 2024

https://github.com/viktorholk/ranged

A Rust-based web crawler and pattern matcher

crawler regex rust scraper web

Last synced: 24 Oct 2024

https://github.com/vaibhavyadav-dev/codeforces-problemset-scrapper

Web scraper that scrapes the entire Codeforces problemset into a CSV or JSON file.

codeforces competative competative-programming crawler problemset programming python scrapy-crawler scrapy-spider

Last synced: 15 Nov 2024

https://github.com/m1/smap

smap is a site-mapping engine written in Go.

crawler go go-library go-package golang golang-library golang-package golang-tools sitemap sitemap-generator web-crawler web-crawling

Last synced: 22 Oct 2024

https://github.com/clumsyme/ziroom_watcher

crawler email python ziroom

Last synced: 06 Nov 2024

https://github.com/mohitk05/drstrange

A simple breadth-first search web crawler

bfs crawler

Last synced: 15 Oct 2024

https://github.com/khanof89/twitter_scraper

Scrape tweet details from a user profile using Selenium

crawler scraper selenium twitter twitter-bot

Last synced: 11 Nov 2024

https://github.com/vhdm/twitter-hashtag-crawler

Twitter hashtag crawler built with Selenium, without using the Twitter API ;)

crawler python tor twitter

Last synced: 09 Nov 2024

https://github.com/apexcaptain/allergy-alert

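Crawls the cafeteria information published on a certain university's website for today's date and reports the menus, grouped by hall and menu category, along with the allergenic ingredients for each menu item.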

crawler docker expressjs puppeteer reactjs sqlite typescript

Last synced: 14 Oct 2024

https://github.com/miiraak/scrapc

C# WinForms crawler and scraper for web content

crawler csharp html scraper url web windows-forms

Last synced: 13 Oct 2024

https://github.com/briangershon/crawlee-playwright

Browser-based automations with Crawlee and Playwright using Vite tooling and TypeScript

crawlee crawler playwright starter-template typescript vite

Last synced: 02 Nov 2024

https://github.com/eivindarvesen/naive-spider

A minimal web crawler

crawler python spider

Last synced: 16 Nov 2024

https://github.com/cls1991/gank.io-go

A simple crawler for fetching pictures from http://gank.io, implemented in golang.

crawler gankio goquery pictures

Last synced: 11 Nov 2024

https://github.com/zhaotianff/crawler-line

C# command-line crawler

command-line command-line-tool crawler csharp dotnet-core

Last synced: 15 Nov 2024

https://github.com/zenixls2/2chpreprocess

Dump messages from 2ch with some preprocessing for ML analysis

2ch crawler python

Last synced: 15 Oct 2024

https://github.com/danielvigaru/easyreach

Crawler for faster Amazon reach

amazon crawler python

Last synced: 08 Nov 2024

https://github.com/wingkwong/daily_weather_temperature_in_hong_kong

Crawling daily weather temperature in Hong Kong

crawler hongkong python temperature

Last synced: 06 Nov 2024

https://github.com/luciopaiva/dicio-crawler

Node.js crawler for dicio.com.br.

crawler nodejs scraper

Last synced: 14 Oct 2024

https://github.com/mahdijamebozorg/cryptonewscrawler

A crawler to receive crypto news from websites

crawler crypto cryptocurrency data-mining datamining information-retrieval llm python

Last synced: 16 Nov 2024

https://github.com/g-ongenae/morphalou-crawler

A Crawler for CNRTL's Morphologie words

crawler french lexical-databases list-of-words words

Last synced: 15 Oct 2024

https://github.com/copha-project/copha

Open-Source Software For Managing Tasks

crawler framework nodejs puppeteer selenium

Last synced: 15 Nov 2024

https://github.com/bradsec/gofindfiles

Crawl websites attempting to find and download files with matching file types. For use as OSINT or RECON intelligence collection tool.

crawler osint osint-tool recon scraper web-scraper

Last synced: 10 Nov 2024

https://github.com/kestarumper/imagecrawler

Downloads images from given URL

crawler image-downloader

Last synced: 10 Nov 2024

https://github.com/madret/selenium_crawler

Selenium Webcrawler based on the chromedriver.

chromedriver crawler human-like selenium selenium-webdriver webcrawler

Last synced: 15 Nov 2024

https://github.com/sbstjn/tatort

Query information for upcoming Tatort shows

crawler node nodejs tatort

Last synced: 09 Nov 2024

https://github.com/lencx/hero-crawler

⚔️ Hero info (King of Glory)

crawler hero

Last synced: 10 Nov 2024

https://github.com/tech-espm/misc-webbot

This project aims to create personal assistants that reply to messages about specific issues.

classification-model crawler nlp

Last synced: 12 Nov 2024

https://github.com/tisfeng/bing-dict

A Bing command-line dictionary that obtains query results from Bing Dictionary using a crawler.

bing-dictionary command-line crawler nodejs

Last synced: 09 Nov 2024

https://github.com/ayoubzulfiqar/spidy

The Dart library for data crawling and scraping

crawler dart flutter scraper scraping spider

Last synced: 09 Nov 2024

https://github.com/capturr/json-deep-equal

Check whether JSON objects contain the same values (ignoring array order).

array compare comparison crawler crawling deep equal equality equality-check equals javascript json object recursive scraper scraping spider test tree typescript

Last synced: 10 Nov 2024

https://github.com/hanifdwyputras/se-scraper

Search Engine scraper with PHP

crawler scraper seo seo-crawler

Last synced: 15 Oct 2024

https://github.com/datvodinh/laptop-price-prediction

An End to End Data Science Project about Laptop Price Prediction

crawler ensemble-learning scrapy selenium xgboost

Last synced: 17 Nov 2024

https://github.com/rcmilan/ex-web-scraping

Web scraping with F#

crawler f-sharp fsharp fsharp-data scraper web-scraping xplot

Last synced: 17 Nov 2024

https://github.com/gesiscss/github_traffic_crawler

Retrieve data from repositories (insights, usage, commits)

crawler github traffic

Last synced: 09 Nov 2024

https://github.com/martinius96/web-scraper

Web scraper on ESP8266 board in client mode. Postprocessing in PHP with regular expressions.

arduino bot code crawler esp32 esp8266 html mysql php php7 robot scraper source web

Last synced: 09 Nov 2024

https://github.com/tetreum/puppeteer-for-crawling

Daily use crawling methods for puppeteer

crawler crawling puppeteer

Last synced: 21 Oct 2024

https://github.com/kahsolt/qzone_mood_dumper

Dump your Qzone mood (说说) history to a local SQL database

crawler dumper qzone-mood

Last synced: 09 Nov 2024

https://github.com/jyasskin/pbot-crawler

Crawler for PBOT's website to show what has changed.

crawler

Last synced: 14 Oct 2024

https://github.com/tri613/nespresso

A mobile version of the Nespresso coffee website ☕

crawler nespresso node-js

Last synced: 09 Nov 2024

https://github.com/kahsolt/tieba-dl

A simple image crawler/downloader for Baidu tieba.

baidu-tieba crawler image-crawler tieba

Last synced: 09 Nov 2024

https://github.com/krishealty/whoknows

All in One Advanced and Detailed Web Scanner with over 1000 plug-ins.

bug-bounty bypass crawler enumeration ethical-hacking footprinting hacking hacking-tool intelligence-gathering javascript offensive-security osint pentesting pentesting-tools security-tools subdomain-enumeration vulnerability-analysis vulnerability-detection web-application-security web-reconnaissance

Last synced: 10 Nov 2024

https://github.com/guilhem/cachanais

Populate cache by crawling pages

cache crawler hacktoberfest

Last synced: 04 Nov 2024

https://github.com/der3318/daily-pixiv

Integrated Flow - Line Notification of Top Ranked Pixiv Illustrations

crawler line-notify pixiv workflow

Last synced: 14 Nov 2024

https://github.com/earelin/jwraith

A Java clone of the Wraith website comparison tool.

crawler screenshots screenshots-comparison selenium webtest

Last synced: 31 Oct 2024

https://github.com/georgynet/crawler

Web Crawler

crawler go golang web-crawler

Last synced: 09 Nov 2024

https://github.com/anthonysigogne/scrapy

A list of simple scrapers made with Scrapy

crawler elasticsearch python scrapy spider

Last synced: 12 Nov 2024

https://github.com/iamtonmoy0/sitemap-crawler

Sitemap crawler with Go and goquery

crawler

Last synced: 09 Nov 2024

https://github.com/lesterrry/mutt

More Usable Time Tracker

crawler ios-calendar parser

Last synced: 10 Nov 2024

https://github.com/lesterrry/campfire

Shock-drop watching utility

crawler parser web-crawler web-parser

Last synced: 10 Nov 2024

https://github.com/fulcrum6378/twitter_profile_exporter

A web-based application that crawls Twitter profiles for all of their tweets and all tweets related to them, including attachments, statistics, and data about their authors. The main data is stored in an SQLite database and all media are downloaded. It can then reconstruct a Twitter profile in the front end.

crawler exporter profile social-media sqlite twitter twitter-api