Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).

https://github.com/rzo1/crawler4j

Open Source Web Crawler for Java - A maintained fork of yasserg/crawler4j

crawler crawler4j java spider web-crawler web-spider

Last synced: 29 Sep 2024

https://github.com/mechazawa/redbetter-wm2

Better.php crawler for Redacted that uses WhatManager

crawler flac redacted seedbox transcoding whatcd whatmanager

Last synced: 06 Nov 2024

https://github.com/alanshaw/libp2p-dht-scrape-aas

🧹 A libp2p DHT scraper as a service, allowing anyone to collect and consume DHT data and use it to generate useful reports & visualisations.

crawler dht kademlia libp2p p2p scraper

Last synced: 05 Dec 2024

https://github.com/capjamesg/indieweb-search

Source code for the IndieWeb search engine.

crawler indieweb search search-engine

Last synced: 16 Nov 2024

https://github.com/tokahuke/lopez

Crawling and scraping the Web for fun and profit

crawler rust scraper seo web-scraping

Last synced: 14 Nov 2024

https://github.com/fanhuaandluomu/qqspider

Crawls QQ user information (basic details such as QQ number, nickname, birthday, and address) and performs a brief analysis.

crawler python qq spider

Last synced: 12 Nov 2024

https://github.com/Actomaton/ActoCrawler

🕸️ Swift Concurrency-powered crawler engine on top of Actomaton.

crawler swift

Last synced: 29 Nov 2024

https://github.com/RuedigerVoigt/exoskeleton

A Python framework to build polite, but tenacious crawlers / scrapers with a MariaDB backend

crawler crawling-framework database machine-learning mariadb network python python-3 scraping

Last synced: 08 Nov 2024

https://github.com/nvk681/gumo

A crawler that extracts data from dynamic web pages. Written in Node.js.

crawler elasticsearch neo4j nodejs

Last synced: 11 Oct 2024

https://github.com/tokenmill/crawling-framework

Easily crawl news portals or blog sites using Storm Crawler.

crawler crawling crawling-framework elasticsearch java scraping storm storm-crawler vaadin

Last synced: 10 Nov 2024

https://github.com/yokawasa/scrapy-azuresearch-crawler-samples

Scrapy as a Web Crawler for Azure Search Samples

azure azure-search crawler python python3 scrapy search

Last synced: 30 Oct 2024

https://github.com/capturr/scraper

All-in-one API to easily scrape data from any website, without worrying about captchas and bot detection mechanisms.

captcha cheerio crawler crawling data declarative extract growth-hacking html javascript json jsonld nodejs recaptcha scraper scraping spider typescript web web-scraping

Last synced: 06 Dec 2024

https://github.com/asing1001/movierater

A useful website for finding movie ratings in Chinese and English by crawling Yahoo, Ptt, and IMDb.

apollo-client chai crawler graphql material-ui mocha mongodb movies nodejs reactjs redis server-side-rendering service-worker sinon typescript

Last synced: 07 Nov 2024

https://github.com/norconex/collector-filesystem

Norconex Filesystem Collector is a flexible crawler for collecting, parsing, and manipulating data ranging from local hard drives to network locations into various data repositories such as search engines.

crawler filesystem-crawler java norconex-filesystem-collector search-engine

Last synced: 11 Nov 2024

https://github.com/gruppio/slackwebhooksgithubcrawler

Searches for Slack webhook tokens publicly exposed on GitHub.

crawler crawling hack messages nodejs puppeteer slack slack-bot slack-webhook slackbot webhook

Last synced: 16 Nov 2024

https://github.com/petehouston/udemy-crawler

Crawls Udemy course info and saves it in JSON format.

crawler crawling node node-cli udemy udemy-api udemy-crawl

Last synced: 23 Oct 2024

https://github.com/zyszys/zhengfang_system_spider

:bug: A small crawler that logs in to the Zhengfang academic administration system and scrapes data.

crawler python spider zhengfang

Last synced: 19 Nov 2024

https://github.com/xiyuan-fengyu/ppspider_example

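ppspider crawler examples: crawling Bilibili video information and comments, QQ Music information and comments, and Twitter topic comments and user information.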

bilibili cheerio crawler ppspider puppeteer qq-music spider twitter

Last synced: 07 Nov 2024

https://github.com/casprwang/sse-option-crawler

SSE 50 index options data crawler.

crawler python python3 sina stock stock-market stocks

Last synced: 07 Dec 2024

https://github.com/loomisloud/onion-crawler

Tor website crawler (specific for Alphabay at the time)

crawler onion parser python tor

Last synced: 17 Nov 2024

https://github.com/s045pd/sharingan

We will try to find as much of your visible basic footprint on social media as possible - 😤 more sites are coming soon

asyncio crawler httpx python38 social-network

Last synced: 07 Nov 2024

https://github.com/waynechang65/ptt-crawler

ptt-crawler is a web crawler module designed to scrape data from Ptt.

crawler javascript nodejs ptt scraper scraping spider web-crawler webcrawler

Last synced: 19 Oct 2024

https://github.com/fmw666/python

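🍋 Python basics, Pygame game programming, Python algorithms and interview questions, four commonly used Python web frameworks, crawlers, data visualization, and machine learning. Seven major Python topics in all!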

algorithm basis crawler files gui learning-notes markdown pygame pyqt5 python3 script web

Last synced: 18 Dec 2024

https://github.com/mediamonks/crawler

Crawl your own website with various clients for SEO and indexing purposes.

browserkit crawler crawling php prerender prerenderio seo spider

Last synced: 03 Dec 2024

https://github.com/iflycn/hero

Answer assistant for the 百万英雄 (Million Heroes) quiz game - compatible with all quiz apps

adb android crawler orc python3

Last synced: 20 Nov 2024

https://github.com/sigoden/rag-crawler

Crawl a website to generate knowledge file for RAG

crawler knowledge llm rag

Last synced: 06 Dec 2024

https://github.com/p0dalirius/crawlersuseragents

Python script to check whether an application's responses differ when the request comes from a search engine's crawler.

bugbounty crawler crawlers pentest request tool user-agent web

Last synced: 29 Oct 2024

https://github.com/tower1229/crawler

Nodejs crawler for cnbeta.com

crawler nodejs

Last synced: 14 Oct 2024

https://github.com/paambaati/websight

🕷A simple but *really* fast crawler built with Node.js & TypeScript

coding-challenge crawler interview-questions javascript monzo nodejs typescript

Last synced: 03 Dec 2024

https://github.com/lupino/grapy

Grapy, a fast high-level web crawling framework for Python 3.3 or later, based on asyncio.

crawler python-library python3 spider

Last synced: 21 Nov 2024

https://github.com/PadishahIII/SecretScraper

SecretScraper is a web scraper that crawls through target websites, scrapes HTTP responses, and extracts secret information via regular expressions.

crawler cyper hyperscan pentest-tool pentesting python sensitivity-analysis webscraper

Last synced: 04 Dec 2024

https://github.com/twtrubiks/youtube-trends-spider

A YouTube Trends crawler using Selenium with Python.

crawler python selenium tutorial youtube-trends-spider

Last synced: 16 Nov 2024

https://github.com/alinebastos/crawler

Web Crawler created with Node.js and Puppeteer

crawler fs javascript nodejs puppeteer scraping

Last synced: 05 Nov 2024

https://github.com/inspirehep/hepcrawl

Scrapy project for feeds into INSPIRE-HEP

crawler harvest-data publishing python

Last synced: 22 Dec 2024

https://github.com/enijkamp/supermonkey

A crawler for automated Android UI testing.

ai android crawler

Last synced: 09 Nov 2024

https://github.com/wahengchang/node-dcard-scraper

An example cheerio scraper that extracts images from Dcard.

cheerio crawler dcard example javascript nodejs npm scraper tutorial

Last synced: 08 Dec 2024

https://github.com/nothing12321/proxy-grabber

Python-based Massive Proxy Grabber. This bot grabs proxies from public websites so you can use them.

bot checker crawler grabber javascript parser proxies proxies-scraper proxy proxy-checker proxy-list proxy-parser proxy-scraper proxy-scrapper proxy-tool proxygrabber python socks socks4 socks5

Last synced: 28 Nov 2024

https://github.com/smolijar/offensive-fortune

A script for generating fortune cookies from the funniest and most offensive stuff collected off the Internet.

crawler fortune fortune-cookie vilejoke

Last synced: 07 Nov 2024

https://github.com/neuralegion/bright-cli

Command Line Interface (CLI) tool for NeuraLegion's solutions.

api cli crawler cyber-security devops har nexploit oas secops security typescript

Last synced: 25 Dec 2024

https://github.com/knovour/json-web-crawler

Use JSON to list all elements (with css 3 and jquery selector) that you want to crawl.

crawler javascript jquery json web-crawler

Last synced: 27 Nov 2024

https://github.com/pourmand1376/persiancrawler

Open source crawler for Persian websites.

crawler machine-learning news python scrapy tasnim text-classification

Last synced: 11 Oct 2024

https://github.com/twiny/wbot

A simple & efficient web crawler.

big-data crawler golang scraper seo spider

Last synced: 17 Dec 2024

https://github.com/pourmand1376/PersianCrawler

Open source crawler for Persian websites.

crawler machine-learning news python scrapy tasnim text-classification

Last synced: 20 Nov 2024

https://github.com/tim-saijun/gpt-web-crawler

A web crawler for GPTs to build knowledge bases.

chatgpt crawler gpt-crawler knowledge-base

Last synced: 21 Nov 2024

https://github.com/lixi5338619/lxparse

A library for parsing links on list pages and extracting content from detail pages.

crawler htmlparse python

Last synced: 05 Nov 2024

https://github.com/vignif/crawler-google-scholar

This bot crawls and downloads statistics and pictures of researchers from Google Scholar.

crawler downloading-statistics google-scholar indexes statistics

Last synced: 06 Nov 2024

https://github.com/Knovour/json-web-crawler

Use JSON to list all elements (with css 3 and jquery selector) that you want to crawl.

crawler javascript jquery json web-crawler

Last synced: 16 Nov 2024

https://github.com/cable8mm/water-melon

Water Melon is a simple melon.com API SDK for PHP.

composer crawler kpop laravel melon package php

Last synced: 12 Oct 2024

https://github.com/ariya/penjabarberita

Extract the article list from its raw news HTML

articles cheerio crawler headlines html indexer indonesia news scraper spider

Last synced: 22 Oct 2024

https://github.com/shadawck/recon-archy

LinkedIn tools (and maybe other sources later) to reconstruct a company hierarchy by scraping relations and job titles.

automation company-data crawler cybersecurity geckodriver golang linkedin organisational-analysis osint osinttool reconnaissance scraper selenium

Last synced: 15 Nov 2024

https://github.com/achannarasappa/locust

Distributed web data discovery and collection framework built for serverless

aws-lambda crawler locust scraping serverless

Last synced: 18 Nov 2024

https://github.com/chainski/chino-proxy-scraper

A Python script that scrapes proxies from frequently updated proxy sources.

crawler http https proxies proxy proxy-api proxygrabber proxyscrape-api proxyscraper proxytool python python3 scraper socks4 socks5

Last synced: 10 Nov 2024

https://github.com/kasthack-labs/kasthack.osp

Generator of raw VK user dumps.

crawler crawling data-mining kasthack programmable-web vk vk-api vkapi vkontakte

Last synced: 26 Sep 2024

https://github.com/twtrubiks/eynycrawlermega

A Python crawler for Mega and Google links to movies on eyny.

crawler eyny mega python

Last synced: 16 Nov 2024

https://github.com/fooock/robots.txt

:robot: robots.txt as a service. Crawls robots.txt files, downloads and parses them to check rules through an API

antlr4 api crawler crawler-engine docker docker-compose gradle java kotlin makefile postgresql redis redis-stream redis-streams robots-parser robots-txt spiders spring-boot

Last synced: 27 Oct 2024

https://github.com/Selbi182/SpotifyDiscoveryBot

A Java-based bot that automatically crawls for new releases by your followed artists on Spotify. Never miss a release again!

bot crawler java music spotify spring-boot springboot sqlite

Last synced: 27 Oct 2024

https://github.com/fanyong920/crawlitem

A Google Chrome extension for crawling Taobao and Tmall web pages.

crawler javascript taobao tmall

Last synced: 27 Oct 2024

https://github.com/toannd96/crawler_web_js

Uses scrapy-splash together with a Lua script to crawl websites that rely on JavaScript (websosanh).

crawler javascript lua-script scrapy scrapy-splash splash

Last synced: 18 Nov 2024

https://github.com/kirillplatonov/proxy_manager

Ruby proxy manager. A gem for easy proxy usage in parsers and web bots.

crawler parser proxy ruby

Last synced: 06 Dec 2024

https://github.com/selmi-karim/img-cli

An interactive command-line interface built in Node.js for downloading one or more images to disk from a URL.

buffer crawler crawling downloader image-downloader image-downloading nodejs phantomjs webpage

Last synced: 08 Nov 2024

https://github.com/ze3kr/wheres-my-offer

University Admission Portal Checker

crawler offer university university-admission

Last synced: 18 Dec 2024

https://github.com/omarhashem123/venom

A tool designed to crawl quickly and extract endpoints.

crawler python python3 spider

Last synced: 21 Nov 2024

https://github.com/ruichongliu/Crawler_pubg.op.gg

This is a web crawler for pubg.op.gg, written by Ruichong Liu. Scrapes PlayerUnknown's Battlegrounds (PUBG) game data.

beautifulsoup4 crawler pubg python3 scrape selenium

Last synced: 29 Oct 2024

https://github.com/gajus/headless-crawler

A crawler implemented using a headless browser (Chrome).

chrome crawler headless puppeteer spider

Last synced: 17 Oct 2024

https://github.com/matheuscas/pynfce

Fetches and extracts data from an NFCe given its access URL.

crawler nfce python3

Last synced: 19 Dec 2024

https://github.com/twtrubiks/crawler_click_tutorial

A Click tutorial (crawler) using Python.

click command-line-tool crawler python tutorial

Last synced: 16 Nov 2024

https://github.com/danhje/dead-link-crawler

An efficient, asynchronous crawler that identifies broken links on a given domain.

async broken-links crawler dead-links python python3

Last synced: 04 Nov 2024

https://github.com/aurelg/linkbak

linkbak is a web page archiver : it reads a list of links and dumps the corresponding pages in HTML and PDF.

archive backup crawler html pdf python3

Last synced: 05 Nov 2024

https://github.com/ikergarcia1996/questionclustering

Question classifier written in Python 3, implemented in the following video: https://youtu.be/qnlW1m6lPoY

clustering crawler deep-learning inteligencia-artificial machine-learning natural-language-processing nlp pln sentiment-analysis techonology unsupervised-machine-learning word-embeddings

Last synced: 06 Dec 2024

https://github.com/gabrielguarisa/brdata

Brazilian financial market data sources

brasil crawler data finance

Last synced: 25 Nov 2024

https://github.com/abhineetraj1/phonenumber-scraper

Tells you which carrier your SIM belongs to. Make sure you have an internet connection before running it!

crawler phone-number-information phone-number-validation python3 scraper

Last synced: 28 Nov 2024

https://github.com/shavit/crawlero

Distributed web crawlers. Fault tolerance, user-agent randomizer, RabbitMQ, Tor, PostgreSQL.

crawler marketing-automation marketing-tools pbn proxy rabbitmq tor

Last synced: 23 Nov 2024

https://github.com/sunsetmkt/bilibili-video-reply-crawler

Python crawler that fetches comments on Bilibili videos and columns.

bilibili crawler github-actions python python3 spider

Last synced: 14 Nov 2024

https://github.com/fanhuaandluomu/qqzoneparse

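Simulates logging in to QQ Zone (Qzone), retrieves friends' information, and analyzes it (age, gender, and location distributions, etc.). For details, see the documentation and the analysis results shown in the 1049755192 folder.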

crawler python27 qqzone spider

Last synced: 12 Nov 2024

https://github.com/refraction-ray/wos-statistics

A crawler for data on Web of Science, focusing especially on the analysis of citation data.

aiohttp citation crawler webofscience

Last synced: 15 Oct 2024

https://github.com/maxgio92/krawler

A crawler for kernel releases distributed by the major Linux distributions.

crawler kernel linux scraping

Last synced: 28 Oct 2024

https://github.com/codingcrush/aiocrawler

Async crawler framework based on aiohttp and asyncio for running fast.

aiofiles aiohttp asyncio crawler uvloop

Last synced: 17 Nov 2024

https://github.com/valmisson/ytubes

Search for videos, playlists, channels, movies, live streams, and music on YouTube without an API key.

channel crawler live movie nodejs playlist scraper search typescript videos youtube youtube-api youtube-music youtube-search ytube

Last synced: 11 Oct 2024

https://github.com/xiaoluoboding/metafy-svg

Easily crawl a website's metadata and generate SVG as a service.

crawler metadata saas serverless-functions svg vercel-serverless

Last synced: 28 Oct 2024