Crawler | Ecosyste.ms: Awesome

https://github.com/danhje/dead-link-crawler

An efficient, asynchronous crawler that identifies broken links on a given domain.

async broken-links crawler dead-links python python3

Last synced: 04 Nov 2024

https://github.com/ptsochantaris/bloo

Your search engine on your device

crawler ios ios-app macos macos-app productivity search-engine spotlight spotlight-search swift testflight

Last synced: 07 Nov 2024

https://github.com/gabrielguarisa/brdata

Brazilian financial market data sources

brasil crawler data finance

Last synced: 25 Nov 2024

https://github.com/somnisomni/twitter-account-data-crawler

Crawl and track the follower count of a Twitter account

crawler crawling follower-count follower-tracker selenium selenium-python twitter twitter-api twitter-crawler twitter-crawling

Last synced: 21 Nov 2024

https://github.com/maxgio92/krawler

A crawler for kernel releases distributed by the major Linux distributions.

crawler kernel linux scraping

Last synced: 28 Oct 2024

https://github.com/floschnell/flatcrawl-processors

A set of processors that instantly notify users through a set of channels (e.g. Telegram) about new flats found on different rental websites.

bot crawler flatcrawl flats real-estate rentals-search telegram

Last synced: 02 Dec 2024

https://github.com/codingcrush/aiocrawler

Fast async crawler framework based on aiohttp and asyncio.

aiofiles aiohttp asyncio crawler uvloop

Last synced: 17 Nov 2024

https://github.com/valmisson/ytubes

Search for videos, playlists, channels, movies, live streams, and music on YouTube without an API key.

channel crawler live movie nodejs playlist scraper search typescript videos youtube youtube-api youtube-music youtube-search ytube

Last synced: 11 Oct 2024

https://github.com/kodjunkie/node-raspar

🕷️ Easily scrape the web for torrent and media files.

api api-rest api-wrapper cli crawler crawling crawling-tool docker expressjs javascript movies mp3 music node-js nodejs scraper series torrent torrent-downloader video

Last synced: 15 Oct 2024

https://github.com/minicli/curly

Simple Curl Client

crawler curl hacktoberfest php

Last synced: 19 Dec 2024

https://github.com/refraction-ray/wos-statistics

A crawler for Web of Science data, focused especially on the analysis of citation data

aiohttp citation crawler webofscience

Last synced: 06 Jan 2025

https://github.com/bgadrian/warmcache

A simple tool to scan your website to keep your cache hot & ready. Helper tool for Prerender, Squid, CDN, etc.

cache cdn crawler go golang prerender prerenderio squid

Last synced: 15 Nov 2024

https://github.com/xiaoluoboding/metafy-svg

Easily crawl a website's metadata and generate SVG as a service.

crawler metadata saas serverless-functions svg vercel-serverless

Last synced: 28 Oct 2024

https://github.com/ezzcodeezzlife/scraper-instagram

Scrape data from Instagram without applying for the authenticated API 🎯

auth authentication crawler ig instagram instagram-api instagram-client instagram-scraper javascript js nodejs npm scraper scraper-instagram scraping wrapper

Last synced: 17 Dec 2024

https://github.com/gridaco/figma-archives

Figma Files Scraper for Research & Studies

crawler dataset design-database figma machine-learning scrapy selenium

Last synced: 24 Jan 2025

https://github.com/charles-hsiao/python-flightradar

Python airline/flights data crawler

airlines crawler flightradar flightradar24 flights python python-crawler python3

Last synced: 11 Nov 2024

https://github.com/saltyshiomix/web-master

Web mastering tools for my personal services

crawler javascript nodejs scraper typescript web

Last synced: 27 Oct 2024

https://github.com/chinmayrane16/scraping-amazon-for-mobile-details-with-scrapy

Scraping Amazon website using Proxies for extracting Mobile details

amazon-scraper crawler googlebot json proxy pycharm pypiwin32 scrapy user-agents

Last synced: 27 Oct 2024

https://github.com/96bearli/biliup_record

Archives the feed posts of bilibili uploaders (UPs)

bili crawler python

Last synced: 27 Oct 2024

https://github.com/wearetyomsmnv/gptbuster

Generative web directory fuzzer, crawler, and subdomain checker based on ChatGPT

crawler gpt hacking pentesting python3 reconnaissance web

Last synced: 07 Nov 2024

https://github.com/postman-open-technologies/openapi-web-search

OpenAPI Web Search: Revolutionizing the Way Developers find API Definitions 🚀

crawler dataset gsoc gsoc-2023 openapi search-engine swagger

Last synced: 07 Nov 2024

https://github.com/shaoxiongdu/skyeye

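A Spring Boot based crawler project for trending topics across the web. Raw hot-search data is stored in the database, and word-segmentation statistics are stored in Redis, which makes later data analysis easier.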

crawler crawlers mysql redis spring spring-boot

Last synced: 16 Jan 2025

https://github.com/fanhuaandluomu/qqzoneparse

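Simulates logging in to QQ Zone, fetches friends' information, and analyzes it (age distribution, gender distribution, location distribution, etc.). For details, see the documentation and the analysis results shown in the 1049755192 folder.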

crawler python27 qqzone spider

Last synced: 12 Nov 2024

https://github.com/scrapingant/scrapingant-client-js

ScrapingAnt API client for JavaScript / Node.js.

crawler scraper scraping scrapingant webscraping

Last synced: 16 Dec 2024

https://github.com/ototot/judgegirl-scoreboard

A Fancy Scoreboard for JudgeGirl

crawler judgegirl judgegirl-scoreboard php scoreboard tocas-ui tocasui vuejs vuejs2

Last synced: 08 Nov 2024

https://github.com/yifan123/arxiv_spider

An arxiv spider

arxiv crawler spider

Last synced: 09 Nov 2024

https://github.com/dev-chenxing/jjwxc-crawler

A simple tool to scrape and download non-V chapters of any novel from jjwxc.net by book ID, producing editable Word (.docx) documents, built with Python and Scrapy

chinese cli crawler docx download jjwxc open-source python scraping scrapy terminal word

Last synced: 13 Nov 2024

https://github.com/stefanocudini/node-fetch-dom

Magic utility that extracts JavaScript global variables from a remote HTML page.

crawler dom nodejs scraping webscraping

Last synced: 08 Nov 2024

https://github.com/amirzenoozi/insta-downloader

Download Instagram posts with this script

crawler crawling downloader instagram

Last synced: 20 Nov 2024

https://github.com/frostming/renren-dumps

Renren data backup tool

crawler renren spider

Last synced: 13 Oct 2024

https://github.com/krolow/marsvin

Structural Crawler framework written in PHP

crawler framework parser php

Last synced: 28 Nov 2024

https://github.com/binaryify/express-middleware-seo

Webpage pre-rendering middleware, based on headless Chrome ⚡️

chrome crawler express express-middleware nodejs seo

Last synced: 08 Nov 2024

https://github.com/a3r0id/httpscan

Scan a host for open HTTP ports and gain information about the services present.

crawler hacking hacking-tool http low-level penetration-testing pentest pentesting portscan portscanner scan scanner scanner-web scraper security service-discovery

Last synced: 06 Nov 2024

https://github.com/redco/goose-starter-kit

This is a starter kit for redco/goose-parser

crawler docker goose goose-parser parser starter-kit

Last synced: 05 Nov 2024

https://github.com/burnzz/scrapy-twitter

Web scraper based on Scrapy to fetch tweets from a list of user accounts

bot crawler scraping scrapy twitter

Last synced: 04 Nov 2024

https://github.com/jsrei/javascript-window-listener-library

A basic component for JavaScript reverse engineering that listens for changes to window

crawler js-library js-reverse reverse-engineering web-security-research

Last synced: 16 Nov 2024

https://github.com/rsoury/serverless-web-crawler

Serverless Web Crawler that executes for an indefinite amount of time. Perfect for Crawling Jobs that last longer than a minute and only need to be executed once or twice a month.

boilerplate crawler fargate serverless serverless-framework template

Last synced: 10 Nov 2024

https://github.com/zamhown/limit-up-stock-crawler

📈 Crawler for limit-up data from the Shanghai and Shenzhen stock markets

crawler python python3 stock

Last synced: 15 Oct 2024

https://github.com/begrossi/anp-price-collector

ANP Price Collector

crawler experiment not-maintained scrapy-crawler

Last synced: 23 Oct 2024

https://github.com/cybercongress/crawler

A toolchain for bringing web2 to web3

cosmos-sdk crawler cyber cyberd ipfs web3 wiki

Last synced: 15 Nov 2024

https://github.com/wuxudong/rxcrawler

A Java crawler based on RxJava

crawler nio rxjava

Last synced: 14 Oct 2024

https://github.com/jacraig/spidey

A multi-threaded web crawler library that is generic enough to allow different engines to be swapped in.

crawler webcrawler

Last synced: 14 Dec 2024

https://github.com/betta-cyber/netease_music_api

NetEase Cloud Music API for Python

crawler data-analysis netease-cloud-music

Last synced: 04 Dec 2024

https://github.com/elektrostudios/google-search-url-crawler

Desktop app that crawls URLs from Google's search engine results

crawl crawler crawlers crawling dotnet google google-crawler google-search googlesearch hacking search search-engine searcher tool tools url url-crawler vbnet windows winforms

Last synced: 01 Dec 2024

https://github.com/lightzhu/node_crawler

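Node.js project: a small crawler built with koa and cheerio that crawls movies and free proxy nodes for bypassing internet restrictions, and sends scheduled DingTalk messages.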

crawler freevpn mongoose node ss ssr v2ray vmess vpn

Last synced: 09 Oct 2024

https://github.com/spider-rs/web-crawling-guides

How-to guides on web crawling or scraping

agents ai-agents ai-scraping clean-markdown crawler fast-webcrawler html-to-markdown llm-webcrawler scraper web-scraping

Last synced: 23 Dec 2024

https://github.com/guillim/arachnida

App to scrape the web, for people without coding skills. Fully integrates web crawlers (headless Chrome) and an interface to manage them.

crawler crawling framework headless-chrome javascipt meteor scraper scrapping

Last synced: 26 Jan 2025

https://github.com/viclafouch/fetch-crawler

📌 A Node.js web crawler using the Fetch API to scrape static websites

cheerio crawler crawling-sites fetch-api nodejs promises scrapping

Last synced: 02 Dec 2024

https://github.com/niloysikdar/go-imdb-crawler

Want to know which celebrities have a common birthday with yours? 👀 Get the full data about them. Made using Go + Colly

colly crawler golang imdb

Last synced: 07 Nov 2024

https://github.com/geminidsystems/googlenewsscraper

A Python package that scrapes Google News article data while remaining undetected by Google. Our scraper can scrape page data up until the last page and never trigger a CAPTCHA (download stats: https://pepy.tech/project/GoogleNewsScraper)

crawler googleautomator googlenews googlenewsscraper googlescraper python scraper scraping selenium web-scraping webcrawler webdriver webscraper

Last synced: 19 Nov 2024

https://github.com/thesoenke/news-crawler

Crawler that collects and extracts content of daily published news articles

crawler news

Last synced: 09 Nov 2024

https://github.com/wux1an/fake-useragent

Provide random user agent

crawler random spider ua user-agent useragent

Last synced: 20 Nov 2024

https://github.com/johansatge/psi-report

Crawls a website, gets PageSpeed Insights data for each page, and exports an HTML report.

cli crawler html-report pagespeed-insights

Last synced: 30 Oct 2024

https://github.com/BroNils/GoogleSearch-CLI

Search anything on Google without captcha

captcha crawler google googlesearch googlesearch-cli recaptcha search-engine

Last synced: 30 Oct 2024

https://github.com/freekatz/jd_sentiment_analysis

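A simple hands-on project covering crawling, processing, visualization, sentiment analysis, and model evaluation of JD.com product reviews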

crawler jd spider

Last synced: 07 Dec 2024

https://github.com/embeddinglayer/awesome-fingerprinting

A collection of browser fingerprinting projects, research, and resources. Intended as a way to aggregate research surrounding the subject.

automation browser-fingerprinting crawler device-fingerprint fingerprinting scraper security

Last synced: 17 Dec 2024

https://github.com/petrpatek/airbnb-scraper

Apify public actor for scraping Airbnb homes.

airbnb airbnb-api apify crawler data-extraction scrape

Last synced: 27 Oct 2024

https://github.com/willin/beian-domain

Crawler that fetches the latest list of domain names eligible for ICP filing (beian)

beian crawler domain node

Last synced: 19 Oct 2024

https://github.com/dxsooo/shortvideocrawl

Short video crawler based on scrapy

crawler kuaishou scrapy spider video-crawler

Last synced: 15 Nov 2024

https://github.com/gimnathperera/abans-lk-webscraping

Web scraping script written in Python using the Scrapy library to scrape product data from popular Sri Lankan websites

crawler python scrapy spider

Last synced: 12 Nov 2024

https://github.com/louis70109/pleaguebot

P+ League Chatbot (unofficial) (deprecated)

basketball chatbot crawler line

Last synced: 15 Oct 2024

https://github.com/hfrost0/simple-baidu-image-download

A Baidu image crawler in only 30 lines, using only the simplest statements

crawler image

Last synced: 14 Nov 2024

https://github.com/nadar/crawler

A website crawler implementation written in PHP. Highly extensible, indexes PDFs, and is very memory efficient.

crawler hacktoberfest html pdf php

Last synced: 15 Oct 2024

https://github.com/cristipufu/scrapy-net

Scrapy the web scraping tool - a naive implementation in C#

crawler scraper scrapy

Last synced: 11 Oct 2024

https://github.com/discovai/discovai-crawl

🕷️ DiscovAI Crawl API (🚧 Work in Progress 🚧): A powerful web scraping solution for AI tools and vector databases. Extract clean HTML, generate LLM-friendly content, and create embeddings from any URL.

ai api crawler embedding vector-database web-scraping

Last synced: 12 Nov 2024

https://github.com/yggverse/yggo

YGGo! Distributed Web Search Engine

alt-web crawler curl distributed federative fts5 js-less mysql open-source parser pdo php privacy-oriented search-engine sphinx sphinxsearch spider web web-archive yggdrasil

Last synced: 06 Nov 2024

https://github.com/mythkiven/python

Python scripts, Python crawlers, Python tools

crawler python script spider

Last synced: 21 Nov 2024

https://github.com/doreanbyte/katswiri

A crawler to find job listings and aggregate them from multiple sources

assistant crawler employment-opportunities job-aggreg job-finder time-management

Last synced: 31 Dec 2024

https://github.com/twtrubiks/google-play-store-spider-bs4-excel

Google Play Store spider that uses Beautiful Soup in Python and exports to Excel

beautifulsoup crawler google-play-store pyexcel python sql-database xlsx

Last synced: 16 Nov 2024

https://github.com/sobak/scrawler

Declarative, scriptable web robot (crawler) and scraper

crawler crawler-engine robots-txt scraper scraping-websites

Last synced: 29 Oct 2024

https://github.com/beomi/data_camp_wcr_3

Source code for the 3rd cohort of the Practical Web Crawling with Python CAMP

crawler python

Last synced: 10 Jan 2025

https://github.com/byt3n33dl3/crawler_v2

Remote access trojan (RAT) tools for penetration testing on devices; real-time access to client devices after the malware hits the kernel. Trust attack

crawler rat

Last synced: 31 Oct 2024

https://github.com/catalyst/moodle-tool_crawler

A Moodle link-crawling robot that finds broken, slow, and oversized links

crawler moodle plugin-moodle

Last synced: 11 Nov 2024

https://github.com/whitejoce/Get_Weather

Crawls the local weather based on IP geolocation (no API required)

crawler python3 spider weather-forecast

Last synced: 08 Nov 2024

https://github.com/davideviolante/socialblade-com-api

Unofficial APIs for socialblade.com website.

crawler scraper scraping social social-media socialblade

Last synced: 02 Nov 2024

https://github.com/wangy8961/python3-concurrency-pics-01

Crawler that downloads, with multithreading or asynchronously, the pictures of women shared at http://gank.io/api/data/%E7%A6%8F%E5%88%A9/1000/1

aiohhtp asyncio coroutine crawler progressbar python3 requests threadpool

Last synced: 11 Nov 2024

https://github.com/hiyali/node-crawler-on-mongodb

🕷 NodeJS + Puppeteer crawler on MongoDB

crawler example mongob nodejs puppeteer

Last synced: 22 Nov 2024

https://github.com/tca166/ck3-history-extractor

A program designed for creating an encyclopedia of sorts containing your CK3 history

ck3 crawler python3 rust save-file save-files

Last synced: 14 Dec 2024

https://github.com/odanieldcs/bot-webscraper

Source code of the web scraper

cheerio crawler request scraper spider tutorial

Last synced: 06 Dec 2024

https://github.com/myconsciousness/atproto-pds-search

This project automatically crawls and visualizes the atproto PDS endpoints indexed in the PLC directory.

atproto bluesky crawler dart flutter indexer pds search search-engine searching

Last synced: 25 Jan 2025

https://github.com/theritikchoure/crawlyx

Crawlyx is an open-source command-line interface (CLI) based web crawler built using Node.js. It is designed to crawl websites and extract useful information like links, images, and text. It is lightweight, fast, and easy to use.

cli command-line-tool crawler crawlyx hacktoberfest hacktoberfest-2023 hacktoberfest-accepted nodejs npmjs open-source scraper web-scraping

Last synced: 12 Oct 2024

https://github.com/jayin/goods-crawling

Crawls product details from amazon/bestbuy/costco/6pm

amazon crawler node

Last synced: 26 Oct 2024

https://github.com/dvf/bitcoin-node-crawler

A node crawler for discovering nodes on the Bitcoin network

bitcoin btc crawler explorer p2p python

Last synced: 11 Oct 2024

https://github.com/ne-lexa/roach-php-bundle

Symfony bundle for roach-php/core

crawler php roach-php scrapy spider symfony symfony-bundle

Last synced: 12 Oct 2024

https://github.com/bunseokbot/darklight

Engine for collecting onion domains and crawling webpages on the Tor network

celery crawler crawling darkweb engine python redis tor

Last synced: 17 Nov 2024

https://github.com/bjoern-hempel/php-web-crawler

A PHP class that crawls a given URL and recursively collects data from it. The final representation is a JSON object.

crawler mit-license php recursive webcrawler webscraper xpath

Last synced: 07 Nov 2024

https://github.com/lablnet/web-spider

Multi-threaded web crawler

crawl crawler mit open-source package project python spider

Last synced: 20 Nov 2024

https://github.com/bluurr/quora-loader

A realtime read-only locator and extraction library for Quora questions and answers.

answers api bluurr client crawler crawling java questions quora scraper scraping selenium

Last synced: 02 Dec 2024

https://github.com/rodyherrera/codexdrake

An open source, privacy-first, self-hosting capable and blazing fast search engine written in JavaScript. Browse anonymously and safely without the need to pay third-party APIs. 👀

adblock books crawler google images javascript metasearch metasearch-engine news nodejs privacy-first search search-engine searchengine searx self-hosted videos webscraping websearch wikipedia

Last synced: 06 Nov 2024

https://github.com/crispy-computing-machine/phpcrawl

PHPCrawl Web Crawler PHP 8

crawl crawler php php74 sphider

Last synced: 22 Jan 2025

https://github.com/qzcool/cpef

Data query API for private equity fund managers. Chinese Private Equity Funds APIs.

china crawler data finance fund funds hedge-funds private-equity python python3 scraper scraping-websites spider

Last synced: 21 Nov 2024

https://github.com/cutecutecat/knightreport

Guild battle statistics tool for Guardian Tales (坎公骑冠剑)

crawler csv-export game-tool

Last synced: 27 Oct 2024

https://github.com/confact/spider.cr

Spider.cr is a spider crawler in Crystal. It handles collecting, scraping, and parsing. So you can spend your time collecting the data you want on a big scale.

crawler spider

Last synced: 08 Nov 2024