Crawler | Ecosyste.ms: Awesome

https://github.com/leonzucchini/Recipes

Project to get and analyse data on recipes from chefkoch.de

cooking crawler python recipe

Last synced: 04 Nov 2024

https://github.com/lysandrejik/omegle-crawler-node

Node library to connect to and interact with the Omegle website.

crawler omegle puppeteer

Last synced: 23 Oct 2024

https://github.com/cyclone-github/spider

URL Spider - web crawler and wordlist / ngram generator

cewl crawler cyclone generator gramify n-gram ngram scaping scraper spider url web wordlist

Last synced: 06 Nov 2024

https://github.com/qzcool/cpef

Data query API for private equity fund managers. Chinese Private Equity Funds APIs.

china crawler data finance fund funds hedge-funds private-equity python python3 scraper scraping-websites spider

Last synced: 21 Nov 2024

https://github.com/jtiala/wpdl

⬇️ Scrape pages, posts, images and other data from a WordPress instance.

crawler downloader scraper scraping wordpress

Last synced: 23 Oct 2024

https://github.com/gbolmier/newspaper-crawler

:spider: An autonomous French newspaper crawler based on the Scrapy framework

crawler scrapy

Last synced: 13 Oct 2024

https://github.com/vndee/visee

Just a typical search engine in this universe :fire::fire::fire:

crawler django docker e-commerce elasticsearch flask kafka python visual-search

Last synced: 18 Nov 2024

https://github.com/xunzhuo/airspider

A Fast and Light Python Spider Framework 🕷️

asynchronous crawler crawler-python distributed python3 redis spider spider-framework web

Last synced: 28 Oct 2024

https://github.com/sanix-darker/ziim

Let your CLI find solutions online for the errors and exceptions raised by the commands you run, so you don't need to open a browser and search yourself.

cli crawler error-correcting-codes error-handling exception-handler exception-handling exceptions javascript python scraper stackoverflow stackoverflow-api stackoverflow-questions

Last synced: 14 Oct 2024

https://github.com/twtrubiks/google-play-store-spider-selenium

Google Play Store spider using Selenium and Beautiful Soup in Python

beautifulsoup chrome crawler firefox python selenium spider sqlite

Last synced: 16 Nov 2024

https://github.com/root4loot/recrawl

A Web URL crawler written in Go

bugbounty crawler discovery enumeration go golang recon reconnaissance web

Last synced: 06 Nov 2024

https://github.com/michaelradu/web-crawler

A Web Crawler developed in Python.

crawler crawler-python crawlers python python-3 python-script python3 script scripting scripting-language scripts web web-crawler web-crawler-python web-crawlers web-crawling webcrawl webcrawler webcrawling

Last synced: 01 Dec 2024

https://github.com/rational-kunal/netflix-hotkeys

A Chrome extension to enhance your Netflix binging experience!

chrome-extension crawler netflix

Last synced: 15 Nov 2024

https://github.com/agenty/scrapingai

Build web scraping agents using AI to auto-extract data from websites, capture screenshots, generate PDFs from URLs, and crawl the web with Agenty

crawler crawling datascraping extract-data scraping webscraper webscraping

Last synced: 25 Nov 2024

https://github.com/ronin-rb/ronin-web-spider

A collection of common web spidering routines

crawler infosec recon ruby scraper spider utils web websecurity

Last synced: 28 Dec 2024

https://github.com/n370/springer-scraper

academic book crawler ebook free scraper springer

Last synced: 21 Jan 2025

https://github.com/dachcom-digital/pimcore-dynamic-search-data-provider-crawler

A Spider Crawler Extension for Pimcore Dynamic Search.

crawler index pimcore scraper search

Last synced: 14 Nov 2024

https://github.com/hironsan/japanese-news-crawler

A fully automated Japanese news crawler built on top of the Scrapy framework

crawler

Last synced: 13 Dec 2024

https://github.com/kevincobain2000/go-app-reviews-scraper

Apple app store reviews and ratings scraper.

applestore applestoreconnect crawler ios iosapp ratings ratings-extractor reviews reviewscrapper scraper

Last synced: 30 Nov 2024

https://github.com/nerohin/millions-crawler

Homework III of the NCKU course Web Resource Discovery and Exploitation. I used a distributed crawler to crawl over a million web pages.

crawler distributed scrapy spider web-crawler

Last synced: 20 Jan 2025

https://github.com/SupervisedCo/HyperCrawlTurbo

HypercrawlTurbo is a turbocharged web scraper for extracting URLs from a webpage.

ai crawler ml nlp retrieval retrieval-augmented-generation

Last synced: 04 Dec 2024

https://github.com/aquilax/opendirindexer

Open directory indexer

crawler go indexing

Last synced: 21 Nov 2024

https://github.com/szczyglis-dev/php-ultra-small-proxy

[PHP] Lightweight proxy with full support for sessions, cookies, POST/FORM submissions, and URL rewriting. The proxy offers two methods of URL rewriting: XML and Regex. It also includes features such as HTTP Auth, caching, and more.

cookies crawler crawler-php css http-client http-proxy networking proxy proxy-server webbrowser website www

Last synced: 14 Nov 2024

https://github.com/dori-dev/flask-corona-info

Live Corona statistics and information site built with Flask.

coronavirus-real-time coronavirus-tracking crawler flask python python3 scrapy spider

Last synced: 09 Nov 2024

https://github.com/ivan-alone/instastories-saver

Program for saving Instagram Stories

api backup crawler grambler insta instagram instagram-stories instastories-saver instastory stories

Last synced: 27 Oct 2024

https://github.com/Ivan-Alone/InstaStories-Saver

Program for saving Instagram Stories

api backup crawler grambler insta instagram instagram-stories instastories-saver instastory stories

Last synced: 22 Nov 2024

https://github.com/599316527/nakeyouku

Scrapes Youku video information

crawler headless-chrome youku

Last synced: 15 Oct 2024

https://github.com/matheuscas/pycnpj-crawler

Yet another module for extracting company data from a CNPJ

cnpj crawler python python3

Last synced: 30 Dec 2024

https://github.com/vmarcosp/supervise-crawler

:male_detective: Supervise crawler

crawler esy ocaml reasonml webcrawler

Last synced: 18 Nov 2024

https://github.com/mithro/fastsvncrawler

fast-svn-crawler / fastsvncrawler - A tool for listing SVN repository content

crawler export import subversion svn vcs

Last synced: 14 Oct 2024

https://github.com/sebobo/shel.crawler

Neos-based crawler for nodes and sites

crawler neos-cms

Last synced: 14 Oct 2024

https://github.com/bbc2/discolinks

Command-line tool which checks a website for broken links.

broken-links crawler html http link-checker link-checkers link-checking validator web

Last synced: 28 Oct 2024

https://github.com/pceuropa/youtube-crawler

YouTube crawler & scraper based on Scrapy. Written in Python 3.

crawler csv mariadb python3 scraper scrapy sqlalchemy youtube

Last synced: 13 Nov 2024

https://github.com/toddlerya/learn_scrapy

learn Scrapy 1.4.0

crawler demo python scrapy tutorial

Last synced: 13 Dec 2024

https://github.com/activatedgeek/winemag-dataset

Dataset of Wine Reviews from Wine Enthusiast Magazine :grapes: :wine_glass: :earth_asia:

crawler dataset python3 scrapy scrapy-spider vega-lite visualization wine wine-tasting

Last synced: 14 Oct 2024

https://github.com/anikhasibul/stackoverflow-scraper-messenger-bot

A Messenger bot that answers messages by scraping Stack Overflow questions and answers

chatbot crawler messenger-bot scrapper stackoverflow

Last synced: 24 Nov 2024

https://github.com/piotrpdev/webuy-cex-price-tracker

A Python script that gets the prices of certain Cex products and uploads them to Google Sheets

cex cex-api cex-products crawler google-sheets gspread prices python-script webuy webuy-api webuy-cex webuycex

Last synced: 13 Nov 2024

https://github.com/blesstosam/registerappleid

A Node.js program for registering an Apple ID automatically

crawler nodejs

Last synced: 18 Nov 2024

https://github.com/trungdq88/movie-showtimes

Web Service & Android Application to look up Vietnam movie showtimes

crawler java movie-showtimes theater

Last synced: 31 Oct 2024

https://github.com/petersonjr/MetadataCrawler

A simple tool to extract metadata from relational databases

avro crawler database-schemas java jdbc metadata rdms relational-databases

Last synced: 13 Nov 2024

https://github.com/lablnet/pakweather_scraper

A multi-threaded Pakistan Weather crawler written in JavaScript

crawler data mit-license open-source pakistan scraping weather weather-channel

Last synced: 20 Nov 2024

https://github.com/shawon922/jobs-crawler

Crawl IT/Telecommunication jobs from bdjobs.com

beautifulsoup4 crawler python3

Last synced: 09 Nov 2024

https://github.com/luyadev/luya-module-crawler

Crawl a website and provide intelligent search results

crawler hacktoberfest intelligent-search luya search yii2

Last synced: 10 Oct 2024

https://github.com/samiahmedsiddiqui/http-auth

Helps you secure your whole site during development and protect admin pages from brute-force attacks.

admin auth authentication brute-force brute-force-attacks crawl crawler http-auth http-authentication locked login restrict-pages restrict-site wordpress wordpress-plugin

Last synced: 25 Nov 2024

https://github.com/webcoast-dk/versatile-crawler

Extendable and easy to use crawler extension for TYPO3 CMS

crawler extendable indexing search typo3

Last synced: 12 Dec 2024

https://github.com/twtrubiks/pttstatistics

Statistics on PTT board comments or popular keywords in article titles, in Python

crawler ptt ptt-hot-key python statistics

Last synced: 16 Nov 2024

https://github.com/68publishers/crawler

:spider_web: Awesome scenario-based crawler

crawlee crawler crawling node nodejs scraper scraping

Last synced: 12 Dec 2024

https://github.com/logocomune/botdetector

BotDetector is a golang library that detects Bot/Spider/Crawler from user agent

botdetector bots crawler go golang golang-library spider user-agent

Last synced: 11 Nov 2024

https://github.com/omkarcloud/web-scraping-template

🚀 THIS WEB SCRAPING TEMPLATE PROVIDES YOU WITH A GREAT STARTING POINT WHEN CREATING WEB SCRAPING BOTS. 🤖

beautifulsoup crawler crawling crawling-framework crawling-python crawling-tool headless node-crawler python-crawler scraper scraping scraping-framework scraping-python scraping-tool selenium web-crawler web-crawling web-scraper web-scraping webscraping

Last synced: 02 Jan 2025

https://github.com/mmqnym/nft-market-sniper

This bot helps people get more information (e.g. floor price) automatically from Ebisu's Bay (the NFT market on Cronos).

crawl crawler discord nft pycord python

Last synced: 17 Nov 2024

https://github.com/keul/allanon

A web crawler that visits a predictable set of URLs and automatically downloads the resources you want from them

crawler python

Last synced: 11 Nov 2024

https://github.com/tsoliangwu0130/spotify-news

A Flask application that retrieves the latest news about the singer of the song you are currently playing on Spotify.

bootstrap crawler flask oauth2 python3 restful-api spotify-api

Last synced: 11 Nov 2024

https://github.com/mediamonks/symfony-crawler-bundle

Implements the crawler package into Symfony

crawler php symfony symfony-bundle

Last synced: 03 Dec 2024

https://github.com/umihico/minigun-requests

Web scraping API to outsource tons of GET & xpath to cloud computing

crawler crawling scraping scraping-api scraping-framework scraping-python web-scraping

Last synced: 15 Nov 2024

https://github.com/igeligel/backpacklogin

:arrow_forward: A .NET Core library to handle the login to Backpack.tf. Backpack.tf is a trading site for Team Fortress 2, Counter-Strike: Global Offensive, and Dota 2. Community item pricing, item trading and stats, and much more.

bot bot-framework crawler csgo csgo-bot steam steam-api steambot steamweb teamfortress2

Last synced: 19 Nov 2024

https://github.com/spekulatius/spatie-crawler-cached-queue-example

Example to demonstrate the usage of cached queues across multiple requests.

crawler crawler-engine laravel php-crawler php-scraper queues spatie-crawler

Last synced: 12 Nov 2024

https://github.com/sabinbajracharya/Insta-crawler

Pulls data from instagram and saves it to Firebase for storage and Algolia for search

accounts algolia algolia-search crawler firebase firebase-database instagram instagram-feed instagram-post javascript nodejs public scraper

Last synced: 07 Nov 2024

https://github.com/tosone/githubtraveler

Traverse all GitHub users, orgs, and repos.

crawler github golang

Last synced: 06 Nov 2024

https://github.com/itwars/golang-scraping-colly

Examples of retrieving unstructured data with the Golang framework Colly

bigdata colly crawler crawling data forecast golang scraper scraping sports

Last synced: 20 Nov 2024

https://github.com/piotrpdev/WeBuy-Cex-Price-Tracker

A Python script that gets the prices of certain Cex products and uploads them to Google Sheets

cex cex-api cex-products crawler google-sheets gspread prices python-script webuy webuy-api webuy-cex webuycex

Last synced: 23 Oct 2024

https://github.com/eight04/ptt-mail-backup

A BBS bot for fetching PTT in-site mail

bbs cli crawler ptt ptt-crawler python python3

Last synced: 28 Oct 2024

https://github.com/visuellverstehen/t3fetch

Fetches a website (including all subpages), so the TYPO3 cache gets filled.

cache crawler fetch typo3 typo3-extension

Last synced: 24 Nov 2024

https://github.com/pawod/gis-berlin-rents

A web crawler for ImmobilienScout24.de, implemented for a small project at the Institute of Geographic Sciences of the Free University of Berlin.

apartment-rents berlin crawler gis immobilienscout24

Last synced: 04 Nov 2024

https://github.com/adileo/MicroFrontier

A lightweight crawler frontier implementation in TypeScript using Redis.

crawler frontier microservice redis robots-txt spider

Last synced: 14 Nov 2024

https://github.com/ruanwenjun/crawl-demo

A simple Java crawler project that scrapes trending search terms from Weibo hot search, Baidu, and other sites

crawler java

Last synced: 16 Oct 2024

https://github.com/fanzeyi/torchic

A generic search engine built using Go & Spring & Redis. Project for Google's CodeU event.

bm25 crawler search-engine

Last synced: 21 Oct 2024

https://github.com/a252937166/quick-selenium

Mainly uses the quick-spring and Selenium frameworks to scrape information from various dynamic web pages

crawler quickstart selenium

Last synced: 21 Nov 2024

https://github.com/igeligel/BackpackLogin

:arrow_forward: A .NET Core library to handle the login to Backpack.tf. Backpack.tf is a trading site for Team Fortress 2, Counter-Strike: Global Offensive, and Dota 2. Community item pricing, item trading and stats, and much more.

bot bot-framework crawler csgo csgo-bot steam steam-api steambot steamweb teamfortress2

Last synced: 13 Nov 2024

https://github.com/bioinformatist/py3_scripts

Life is short, *****.

blast crawler gtf pacbio scrapy

Last synced: 10 Nov 2024

https://github.com/pps-22-scooby/pps-22-scooby

Scala application that crawls and scrapes the web pages given as input, using special rules passed to it through a DSL.

crawler crawlers internal-dsl scala scraper scrapers web web-crawler web-crawling web-scraper web-scrapers

Last synced: 14 Oct 2024

https://github.com/yerkopalma/bash-crawler

:computer: Get a site's links with Bash

bash crawler

Last synced: 13 Oct 2024

https://github.com/windfarer/biu

biubiubiu~~ I'm a tiny web crawler framework

crawler python spider spider-framework web-crawler

Last synced: 28 Oct 2024

https://github.com/chrisweb/universal-nodejs-scraper

Universal Node.js scraper is a simple tool to crawl web pages and extract content that can then be stored in CSV files (sheets) or directly in a database

crawler harvester javascript nodejs scraper typescript