Crawler | Ecosyste.ms: Awesome

https://github.com/p0dalirius/crawlersuseragents

Python script to check whether an application's responses differ when the request comes from a search engine's crawler.

bugbounty crawler crawlers pentest request tool user-agent web

Last synced: 29 Oct 2024

https://github.com/tower1229/crawler

Node.js crawler for cnbeta.com

crawler nodejs

Last synced: 14 Oct 2024

https://github.com/smolijar/offensive-fortune

A script for generating fortune cookies from the funniest and most offensive stuff collected off the Internet.

crawler fortune fortune-cookie vilejoke

Last synced: 07 Nov 2024

https://github.com/enijkamp/supermonkey

A crawler for automated Android UI testing.

ai android crawler

Last synced: 22 Oct 2024

https://github.com/PadishahIII/SecretScraper

SecretScraper is a web scraper that crawls target websites, scrapes HTTP responses, and extracts secret information via regular expressions.

crawler cyper hyperscan pentest-tool pentesting python sensitivity-analysis webscraper

Last synced: 13 Aug 2024

https://github.com/paambaati/websight

🕷A simple but *really* fast crawler built with Node.js & TypeScript

coding-challenge crawler interview-questions javascript monzo nodejs typescript

Last synced: 15 Oct 2024

https://github.com/josecelano/my-favourite-appliances

Laravel CRUD sample

crawler crud laravel sample

Last synced: 29 Oct 2024

https://github.com/mauriceconrad/xml-parser

A Node.js XML DOM, Parser & Stringifier.

crawler crawling dom html html-parser html-parsing xml xml-parser xml-parsing xml-schema

Last synced: 28 Oct 2024

https://github.com/bkeepers/spiderman

your friendly neighborhood web crawler

crawler crawler-engine http httprb nokogiri ruby spider spider-framework web-crawler web-scraping webcrawler webscraping

Last synced: 23 Oct 2024

https://github.com/alinebastos/crawler

Web Crawler created with Node.js and Puppeteer

crawler fs javascript nodejs puppeteer scraping

Last synced: 05 Nov 2024

https://github.com/pourmand1376/persiancrawler

Open source crawler for Persian websites.

crawler machine-learning news python scrapy tasnim text-classification

Last synced: 11 Oct 2024

https://github.com/racinmat/premium-downloader

crawler pornhub pornhub-downloader python

Last synced: 06 Nov 2024

https://github.com/vignif/crawler-google-scholar

This bot crawls and downloads statistics and pictures of researchers from Google Scholar.

crawler downloading-statistics google-scholar indexes statistics

Last synced: 06 Nov 2024

https://github.com/lixi5338619/lxparse

A library for parsing list-page links and extracting detail-page content

crawler htmlparse python

Last synced: 05 Nov 2024

https://github.com/Knovour/json-web-crawler

Use JSON to list all elements (with CSS 3 and jQuery selectors) that you want to crawl.

crawler javascript jquery json web-crawler

Last synced: 03 Aug 2024

https://github.com/cable8mm/water-melon

Water Melon is a simple melon.com API SDK for PHP

composer crawler kpop laravel melon package php

Last synced: 12 Oct 2024

https://github.com/ariya/penjabarberita

Extract the article list from its raw news HTML

articles cheerio crawler headlines html indexer indonesia news scraper spider

Last synced: 22 Oct 2024

https://github.com/pourmand1376/PersianCrawler

Open source crawler for Persian websites.

crawler machine-learning news python scrapy tasnim text-classification

Last synced: 04 Aug 2024

https://github.com/neuralegion/bright-cli

Command Line Interface (CLI) tool for NeuraLegion's solutions.

api cli crawler cyber-security devops har nexploit oas secops security typescript

Last synced: 14 Oct 2024

https://github.com/fanyong920/crawlitem

A Chrome extension for crawling Taobao and Tmall web pages

crawler javascript taobao tmall

Last synced: 27 Oct 2024

https://github.com/omarhashem123/venom

Tool designed to crawl quickly and extract endpoints

crawler python python3 spider

Last synced: 04 Aug 2024

https://github.com/MontFerret/worker

Containerized Ferret worker

chrome crawler docker dsl ferret go hacktoberfest hacktoberfest2020 scraping scraping-websites service worker

Last synced: 04 Nov 2024

https://github.com/ruichongliu/Crawler_pubg.op.gg

This is a web crawler for pubg.op.gg, written by Ruichong Liu. It scrapes PUBG (PlayerUnknown's Battlegrounds) game data.

beautifulsoup4 crawler pubg python3 scrape selenium

Last synced: 29 Oct 2024

https://github.com/sigoden/rag-crawler

Crawl a website to generate knowledge file for RAG

crawler knowledge llm rag

Last synced: 27 Oct 2024

https://github.com/selmi-karim/img-cli

An interactive command-line interface built in Node.js for downloading one or more images to disk from a URL

buffer crawler crawling downloader image-downloader image-downloading nodejs phantomjs webpage

Last synced: 15 Oct 2024

https://github.com/kasthack-labs/kasthack.osp

Generator of raw VK user dumps.

crawler crawling data-mining kasthack programmable-web vk vk-api vkapi vkontakte

Last synced: 26 Sep 2024

https://github.com/fooock/robots.txt

:robot: robots.txt as a service. Crawls robots.txt files, downloads and parses them to check rules through an API

antlr4 api crawler crawler-engine docker docker-compose gradle java kotlin makefile postgresql redis redis-stream redis-streams robots-parser robots-txt spiders spring-boot

Last synced: 27 Oct 2024

https://github.com/Selbi182/SpotifyDiscoveryBot

A Java-based bot that automatically crawls for new releases by your followed artists on Spotify. Never miss a release again!

bot crawler java music spotify spring-boot springboot sqlite

Last synced: 27 Oct 2024

https://github.com/aurelg/linkbak

linkbak is a web page archiver: it reads a list of links and dumps the corresponding pages in HTML and PDF.

archive backup crawler html pdf python3

Last synced: 05 Nov 2024

https://github.com/gajus/headless-crawler

A crawler implemented using a headless browser (Chrome).

chrome crawler headless puppeteer spider

Last synced: 17 Oct 2024

https://github.com/kirillplatonov/proxy_manager

Ruby proxy manager. A gem for easy proxy usage in parsers and web bots.

crawler parser proxy ruby

Last synced: 21 Oct 2024

https://github.com/danhje/dead-link-crawler

An efficient, asynchronous crawler that identifies broken links on a given domain.

async broken-links crawler dead-links python python3

Last synced: 04 Nov 2024

https://github.com/ikergarcia1996/questionclustering

Question classifier written in Python 3, implemented in the following video: https://youtu.be/qnlW1m6lPoY

clustering crawler deep-learning inteligencia-artificial machine-learning natural-language-processing nlp pln sentiment-analysis techonology unsupervised-machine-learning word-embeddings

Last synced: 27 Oct 2024

https://github.com/xiaoluoboding/metafy-svg

Easily crawl a website's metadata and generate SVG as a service.

crawler metadata saas serverless-functions svg vercel-serverless

Last synced: 28 Oct 2024

https://github.com/saltyshiomix/web-master

Web mastering tools for my personal services

crawler javascript nodejs scraper typescript web

Last synced: 27 Oct 2024

https://github.com/maxgio92/krawler

A crawler for kernel releases distributed by the major Linux distributions.

crawler kernel linux scraping

Last synced: 28 Oct 2024

https://github.com/floschnell/flatcrawl-processors

A set of processors that instantly inform users, via a set of channels (e.g. Telegram), of new flats found on different rental websites.

bot crawler flatcrawl flats real-estate rentals-search telegram

Last synced: 12 Aug 2024

https://github.com/refraction-ray/wos-statistics

A crawler for Web of Science data, focused especially on the analysis of citation data

aiohttp citation crawler webofscience

Last synced: 15 Oct 2024

https://github.com/kodjunkie/node-raspar

🕷️ Easily scrape the web for torrent and media files.

api api-rest api-wrapper cli crawler crawling crawling-tool docker expressjs javascript movies mp3 music node-js nodejs scraper series torrent torrent-downloader video

Last synced: 15 Oct 2024

https://github.com/postman-open-technologies/openapi-web-search

OpenAPI Web Search: Revolutionizing the Way Developers find API Definitions 🚀

crawler dataset gsoc gsoc-2023 openapi search-engine swagger

Last synced: 07 Nov 2024

https://github.com/valmisson/ytubes

Search for videos, playlists, channels, movies, live streams, and music on YouTube without an API key.

channel crawler live movie nodejs playlist scraper search typescript videos youtube youtube-api youtube-music youtube-search ytube

Last synced: 11 Oct 2024

https://github.com/gridaco/figma-archives

Figma Files Scraper for Research & Studies

crawler dataset design-database figma machine-learning scrapy selenium

Last synced: 27 Oct 2024

https://github.com/96bearli/biliup_record

Archives the activity feed posts of bilibili uploaders (UPs)

bili crawler python

Last synced: 27 Oct 2024

https://github.com/chinmayrane16/scraping-amazon-for-mobile-details-with-scrapy

Scraping the Amazon website through proxies to extract mobile phone details

amazon-scraper crawler googlebot json proxy pycharm pypiwin32 scrapy user-agents

Last synced: 27 Oct 2024

https://github.com/begrossi/anp-price-collector

ANP Price Collector

crawler experiment not-maintained scrapy-crawler

Last synced: 23 Oct 2024

https://github.com/a3r0id/httpscan

Scan a host for open HTTP ports and gain information about the services present.

crawler hacking hacking-tool http low-level penetration-testing pentest pentesting portscan portscanner scan scanner scanner-web scraper security service-discovery

Last synced: 06 Nov 2024

https://github.com/cybercongress/crawler

A toolchain for bringing web2 to web3

cosmos-sdk crawler cyber cyberd ipfs web3 wiki

Last synced: 03 Aug 2024

https://github.com/frostming/renren-dumps

Renren data backup tool

crawler renren spider

Last synced: 13 Oct 2024

https://github.com/redco/goose-starter-kit

This is a starter kit for redco/goose-parser

crawler docker goose goose-parser parser starter-kit

Last synced: 05 Nov 2024

https://github.com/wuxudong/rxcrawler

A Java crawler based on RxJava

crawler nio rxjava

Last synced: 14 Oct 2024

https://github.com/zamhown/limit-up-stock-crawler

📈 Crawler for limit-up stock data from the Shanghai and Shenzhen stock markets

crawler python python3 stock

Last synced: 15 Oct 2024

https://github.com/burnzz/scrapy-twitter

Web scraper based on Scrapy to fetch tweets from a list of user accounts

bot crawler scraping scrapy twitter

Last synced: 04 Nov 2024

https://github.com/matheuscas/pynfce

Fetches and extracts data from an NFCe given its access URL.

crawler nfce python3

Last synced: 02 Oct 2024

https://github.com/stefanocudini/node-fetch-dom

Magic utility that extracts JavaScript global variables from a remote HTML page.

crawler dom nodejs scraping webscraping

Last synced: 19 Oct 2024

https://github.com/ototot/judgegirl-scoreboard

A Fancy Scoreboard for JudgeGirl

crawler judgegirl judgegirl-scoreboard php scoreboard tocas-ui tocasui vuejs vuejs2

Last synced: 17 Oct 2024

https://github.com/thesoenke/news-crawler

Crawler that collects and extracts content of daily published news articles

crawler news

Last synced: 23 Oct 2024

https://github.com/willin/beian-domain

Crawler that fetches the latest list of domains eligible for ICP filing (beian)

beian crawler domain node

Last synced: 19 Oct 2024

https://github.com/niloysikdar/go-imdb-crawler

Want to know which celebrities have a common birthday with yours? 👀 Get the full data about them. Made using Go + Colly

colly crawler golang imdb

Last synced: 07 Nov 2024

https://github.com/BroNils/GoogleSearch-CLI

Search anything on Google without captcha

captcha crawler google googlesearch googlesearch-cli recaptcha search-engine

Last synced: 30 Oct 2024

https://github.com/petrpatek/airbnb-scraper

Apify public actor for scraping Airbnb homes.

airbnb airbnb-api apify crawler data-extraction scrape

Last synced: 27 Oct 2024

https://github.com/lightzhu/node_crawler

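Node.js project: a small Koa and cheerio crawler that scrapes movies and free proxy nodes for bypassing internet censorship, and sends scheduled DingTalk messages.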

crawler freevpn mongoose node ss ssr v2ray vmess vpn

Last synced: 09 Oct 2024

https://github.com/johansatge/psi-report

Crawls a website, gets PageSpeed Insights data for each page, and exports an HTML report.

cli crawler html-report pagespeed-insights

Last synced: 30 Oct 2024

https://github.com/sobak/scrawler

Declarative, scriptable web robot (crawler) and scraper

crawler crawler-engine robots-txt scraper scraping-websites

Last synced: 29 Oct 2024

https://github.com/louis70109/pleaguebot

P+ League Chatbot (unofficial) (deprecated)

basketball chatbot crawler line

Last synced: 15 Oct 2024

https://github.com/davideviolante/socialblade-com-api

Unofficial APIs for socialblade.com website.

crawler scraper scraping social social-media socialblade

Last synced: 02 Nov 2024

https://github.com/byt3n33dl3/crawler_v2

Remote access trojan (RAT) tools for penetration testing on devices, giving real-time access to client devices after the malware hits the kernel. Trust attack

crawler rat

Last synced: 31 Oct 2024

https://github.com/nadar/crawler

A website crawler implementation written in PHP. Highly extendable, indexes PDFs, and is very memory efficient.

crawler hacktoberfest html pdf php

Last synced: 15 Oct 2024

https://github.com/doreanbyte/katswiri

A crawler to find job listings and aggregate them from multiple sources

assistant crawler employment-opportunities job-aggreg job-finder time-management

Last synced: 07 Sep 2024

https://github.com/yggverse/yggo

YGGo! Distributed Web Search Engine

alt-web crawler curl distributed federative fts5 js-less mysql open-source parser pdo php privacy-oriented search-engine sphinx sphinxsearch spider web web-archive yggdrasil

Last synced: 06 Nov 2024

https://github.com/whitejoce/Get_Weather

Determines your location from your IP address and scrapes the local weather (no API required)

crawler python3 spider weather-forecast

Last synced: 01 Aug 2024

https://github.com/cristipufu/scrapy-net

Scrapy the web scraping tool - a naive implementation in C#

crawler scraper scrapy

Last synced: 11 Oct 2024

https://github.com/cyclone-github/spider

URL Spider - web crawler and wordlist / ngram generator

cewl crawler cyclone generator gramify n-gram ngram scaping scraper spider url web wordlist

Last synced: 06 Nov 2024

https://github.com/jtiala/wpdl

⬇️ Scrape pages, posts, images and other data from a WordPress instance.

crawler downloader scraper scraping wordpress

Last synced: 23 Oct 2024

https://github.com/bunseokbot/darklight

Engine for collecting onion domains and crawling web pages on the Tor network

celery crawler crawling darkweb engine python redis tor

Last synced: 03 Aug 2024

https://github.com/ivan-sincek/scrapy-scraper

Web crawler and scraper based on Scrapy and Playwright's headless browser.

bug-bounty crawler crawling downloader downloading ethical-hacking headless-browser javascript offensive-security penetration-testing python red-team-engagement scraper scraping scrapy security spider spidering web web-penetration-testing

Last synced: 16 Oct 2024

https://github.com/hoc081098/comic_app_server_nodejs

Node.js server for an Android comic app | https://comic-app-081098.herokuapp.com/

comic-app crawler nodejs nodejs-crawler nodejs-typescript typescript

Last synced: 31 Oct 2024

https://github.com/lysandrejik/omegle-crawler-node

Node library to connect to and interact with the Omegle website.

crawler omegle puppeteer

Last synced: 23 Oct 2024

https://github.com/cutecutecat/knightreport

Guild battle statistics tool for Guardian Tales (坎公骑冠剑)

crawler csv-export game-tool

Last synced: 27 Oct 2024

https://github.com/bjoern-hempel/php-web-crawler

A PHP class that crawls a given URL and recursively collects data from it. The final representation is a JSON object.

crawler mit-license php recursive webcrawler webscraper xpath

Last synced: 07 Nov 2024

https://github.com/ne-lexa/roach-php-bundle

Symfony bundle for roach-php/core

crawler php roach-php scrapy spider symfony symfony-bundle

Last synced: 12 Oct 2024

https://github.com/leonzucchini/Recipes

Project to get and analyse data on recipes from chefkoch.de

cooking crawler python recipe

Last synced: 04 Nov 2024

https://github.com/rodyherrera/codexdrake

An open source, privacy-first, self-hosting capable and blazing fast search engine written in JavaScript. Browse anonymously and safely without the need to pay third-party APIs. 👀

adblock books crawler google images javascript metasearch metasearch-engine news nodejs privacy-first search search-engine searchengine searx self-hosted videos webscraping websearch wikipedia

Last synced: 06 Nov 2024

https://github.com/theritikchoure/crawlyx

Crawlyx is an open-source command-line interface (CLI) based web crawler built using Node.js. It is designed to crawl websites and extract useful information like links, images, and text. It is lightweight, fast, and easy to use.

cli command-line-tool crawler crawlyx hacktoberfest hacktoberfest-2023 hacktoberfest-accepted nodejs npmjs open-source scraper web-scraping

Last synced: 12 Oct 2024

https://github.com/dvf/bitcoin-node-crawler

A node crawler for discovering nodes on the Bitcoin network

bitcoin btc crawler explorer p2p python

Last synced: 11 Oct 2024

https://github.com/jayin/goods-crawling

Scrapes product details from Amazon, Best Buy, Costco, and 6pm

amazon crawler node

Last synced: 26 Oct 2024

https://github.com/crispy-computing-machine/phpcrawl

PHPCrawl Web Crawler PHP 8

crawl crawler php php74 sphider

Last synced: 28 Sep 2024

https://github.com/gbolmier/newspaper-crawler

:spider: An autonomous French newspaper crawler based on the Scrapy framework

crawler scrapy

Last synced: 13 Oct 2024

https://github.com/matheuscas/pycnpj-crawler

Yet another module for extracting company data from a CNPJ

cnpj crawler python python3

Last synced: 02 Oct 2024

https://github.com/root4loot/recrawl

A Web URL crawler written in Go

bugbounty crawler discovery enumeration go golang recon reconnaissance web

Last synced: 06 Nov 2024

https://github.com/hironsan/japanese-news-crawler

A fully automated Japanese news crawler built on top of the Scrapy framework

crawler

Last synced: 27 Oct 2024

https://github.com/ivan-alone/instastories-saver

Program for saving Instagram Stories

api backup crawler grambler insta instagram instagram-stories instastories-saver instastory stories

Last synced: 27 Oct 2024

https://github.com/599316527/nakeyouku

Scrapes Youku video information

crawler headless-chrome youku

Last synced: 15 Oct 2024

https://github.com/myconsciousness/atproto-pds-search

This project automatically crawls and visualizes the atproto PDS endpoints indexed in the PLC directory.

atproto bluesky crawler dart flutter indexer pds search search-engine searching

Last synced: 19 Oct 2024

https://github.com/sanix-darker/ziim

Let your CLI find solutions online for the errors and exceptions raised by the commands you run, with no need to open a browser and search yourself

cli crawler error-correcting-codes error-handling exception-handler exception-handling exceptions javascript python scraper stackoverflow stackoverflow-api stackoverflow-questions