Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
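As a rough illustration of what "systematically browses" can mean, here is a minimal breadth-first crawl sketch in Python. It is not taken from any project listed below. The `crawl` function, the `fetch` callable, and the `FAKE_WEB` dictionary (a made-up three-page site standing in for the network) are all assumptions made for this example. A real crawler would also need HTTP fetching, robots.txt handling, and rate limiting.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl from start_url; returns URLs in visit order.

    `fetch` is a hypothetical callable mapping a URL to an HTML string,
    or to None when the page is unavailable.
    """
    seen = {start_url}          # URLs already queued, to avoid revisits
    queue = deque([start_url])  # frontier of URLs still to fetch
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # skip pages that could not be fetched
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before deduplicating.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# A made-up three-page site stands in for the network (assumption).
FAKE_WEB = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.test/b": '<a href="/a#top">A</a> <a href="/missing">gone</a>',
}

order = crawl("http://example.test/", FAKE_WEB.get)
print(order)
```

The queue plus the `seen` set is what makes the traversal systematic: each discovered URL is fetched at most once, in the order it was found.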

https://github.com/agenty/scrapingai

Build AI-powered web scraping agents with Agenty to auto-extract data from websites, capture screenshots, generate PDFs from URLs, and crawl the web

crawler crawling datascraping extract-data scraping webscraper webscraping

Last synced: 25 Nov 2024

https://github.com/aquilax/opendirindexer

Open directory indexer

crawler go indexing

Last synced: 21 Nov 2024

https://github.com/dori-dev/flask-corona-info

Live coronavirus statistics and information site built with Flask.

coronavirus-real-time coronavirus-tracking crawler flask python python3 scrapy spider

Last synced: 09 Nov 2024

https://github.com/vmarcosp/supervise-crawler

:male_detective: Supervise crawler

crawler esy ocaml reasonml webcrawler

Last synced: 18 Nov 2024

https://github.com/matheuscas/pycnpj-crawler

Yet another module for extracting company data from a CNPJ

cnpj crawler python python3

Last synced: 19 Dec 2024

https://github.com/myconsciousness/atproto-pds-search

This project automatically crawls and visualizes the atproto PDS endpoints indexed in the PLC directory.

atproto bluesky crawler dart flutter indexer pds search search-engine searching

Last synced: 19 Oct 2024

https://github.com/599316527/nakeyouku

Scrapes Youku video information

crawler headless-chrome youku

Last synced: 15 Oct 2024

https://github.com/SupervisedCo/HyperCrawlTurbo

HypercrawlTurbo is a turbocharged web scraper for extracting URLs from a webpage.

ai crawler ml nlp retrieval retrieval-augmented-generation

Last synced: 04 Dec 2024

https://github.com/szczyglis-dev/php-ultra-small-proxy

[PHP] Lightweight proxy with full support for sessions, cookies, POST/FORM submissions, and URL rewriting. The proxy offers two methods of URL rewriting: XML and Regex. It also includes features such as HTTP Auth, caching, and more.

cookies crawler crawler-php css http-client http-proxy networking proxy proxy-server webbrowser website www

Last synced: 14 Nov 2024

https://github.com/mithro/fastsvncrawler

fast-svn-crawler / fastsvncrawler - A tool for listing SVN repository content

crawler export import subversion svn vcs

Last synced: 14 Oct 2024

https://github.com/vndee/visee

Just a typical search engine in this universe :fire::fire::fire:

crawler django docker e-commerce elasticsearch flask kafka python visual-search

Last synced: 18 Nov 2024

https://github.com/gbolmier/newspaper-crawler

:spider: An autonomous French newspaper crawler based on Scrapy framework

crawler scrapy

Last synced: 13 Oct 2024

https://github.com/toddlerya/learn_scrapy

learn Scrapy 1.4.0

crawler demo python scrapy tutorial

Last synced: 13 Dec 2024

https://github.com/blesstosam/registerappleid

A Node.js program for registering an Apple ID automatically

crawler nodejs

Last synced: 18 Nov 2024

https://github.com/lablnet/pakweather_scraper

A multi-threaded Pakistan Weather crawler written in JavaScript

crawler data mit-license open-source pakistan scraping weather weather-channel

Last synced: 20 Nov 2024

https://github.com/trungdq88/movie-showtimes

Web Service & Android Application to look up Vietnam movie showtimes

crawler java movie-showtimes theater

Last synced: 31 Oct 2024

https://github.com/luyadev/luya-module-crawler

Crawl a website and provide intelligent search results

crawler hacktoberfest intelligent-search luya search yii2

Last synced: 10 Oct 2024

https://github.com/68publishers/crawler

:spider_web: Awesome scenario based crawler

crawlee crawler crawling node nodejs scraper scraping

Last synced: 12 Dec 2024

https://github.com/petersonjr/MetadataCrawler

A simple tool to extract metadata from relational databases

avro crawler database-schemas java jdbc metadata rdms relational-databases

Last synced: 13 Nov 2024

https://github.com/logocomune/botdetector

BotDetector is a golang library that detects Bot/Spider/Crawler from user agent

botdetector bots crawler go golang golang-library spider user-agent

Last synced: 11 Nov 2024

https://github.com/webcoast-dk/versatile-crawler

Extendable and easy to use crawler extension for TYPO3 CMS

crawler extendable indexing search typo3

Last synced: 12 Dec 2024

https://github.com/mmqnym/nft-market-sniper

This bot helps people get more information (e.g. floor price) automatically from Ebisu's Bay (the NFT market on Cronos).

crawl crawler discord nft pycord python

Last synced: 17 Nov 2024

https://github.com/twtrubiks/pttstatistics

Statistics on popular keywords in PTT board comments or article titles, in Python

crawler ptt ptt-hot-key python statistics

Last synced: 16 Nov 2024

https://github.com/shawon922/jobs-crawler

Crawl IT/Telecommunication jobs from bdjobs.com

beautifulsoup4 crawler python3

Last synced: 09 Nov 2024

https://github.com/keul/allanon

A web crawler that visits a predictable set of URLs and automatically downloads the resources you want from them

crawler python

Last synced: 11 Nov 2024

https://github.com/piotrpdev/WeBuy-Cex-Price-Tracker

A Python script that gets the prices of certain CeX products and uploads them to Google Sheets

cex cex-api cex-products crawler google-sheets gspread prices python-script webuy webuy-api webuy-cex webuycex

Last synced: 23 Oct 2024

https://github.com/samiahmedsiddiqui/http-auth

Helps you secure your whole site during development, and your admin pages against brute-force attacks.

admin auth authentication brute-force brute-force-attacks crawl crawler http-auth http-authentication locked login restrict-pages restrict-site wordpress wordpress-plugin

Last synced: 25 Nov 2024

https://github.com/mediamonks/symfony-crawler-bundle

Implements the crawler package into Symfony

crawler php symfony symfony-bundle

Last synced: 03 Dec 2024

https://github.com/tsoliangwu0130/spotify-news

A Flask application to retrieve the singers' latest news according to your Spotify current playing song.

bootstrap crawler flask oauth2 python3 restful-api spotify-api

Last synced: 11 Nov 2024

https://github.com/itwars/golang-scraping-colly

Examples of retrieving unstructured data with the Golang framework Colly

bigdata colly crawler crawling data forecast golang scraper scraping sports

Last synced: 20 Nov 2024

https://github.com/eight04/ptt-mail-backup

A BBS bot for fetching PTT in-site mail

bbs cli crawler ptt ptt-crawler python python3

Last synced: 28 Oct 2024

https://github.com/igeligel/backpacklogin

:arrow_forward: A .NET core library to handle the login to Backpack.tf. Backpack.tf is a trading site for Team Fortress 2, Counter-Strike: Global Offensive, and Dota 2. Community item pricing, item trading and stats, and much more.

bot bot-framework crawler csgo csgo-bot steam steam-api steambot steamweb teamfortress2

Last synced: 19 Nov 2024

https://github.com/sabinbajracharya/Insta-crawler

Pulls data from instagram and saves it to Firebase for storage and Algolia for search

accounts algolia algolia-search crawler firebase firebase-database instagram instagram-feed instagram-post javascript nodejs public scraper

Last synced: 07 Nov 2024

https://github.com/umihico/minigun-requests

Web scraping API to outsource tons of GET & xpath to cloud computing

crawler crawling scraping scraping-api scraping-framework scraping-python web-scraping

Last synced: 15 Nov 2024

https://github.com/tosone/githubtraveler

Traverse all GitHub users, orgs, and repos.

crawler github golang

Last synced: 06 Nov 2024

https://github.com/spekulatius/spatie-crawler-cached-queue-example

Example to demonstrate the usage of cached queues across multiple requests.

crawler crawler-engine laravel php-crawler php-scraper queues spatie-crawler

Last synced: 12 Nov 2024

https://github.com/visuellverstehen/t3fetch

Fetches a website (including all subpages), so the TYPO3 cache gets filled.

cache crawler fetch typo3 typo3-extension

Last synced: 24 Nov 2024

https://github.com/pawod/gis-berlin-rents

A web crawler for ImmobilienScout24.de, implemented for a small project at the Institute of Geographic Sciences of the Free University of Berlin.

apartment-rents berlin crawler gis immobilienscout24

Last synced: 04 Nov 2024

https://github.com/adileo/MicroFrontier

A lightweight crawler frontier implementation in TypeScript using Redis.

crawler frontier microservice redis robots-txt spider

Last synced: 14 Nov 2024

https://github.com/fanzeyi/torchic

A generic search engine built using Go & Spring & Redis. Project for Google's CodeU event.

bm25 crawler search-engine

Last synced: 21 Oct 2024

https://github.com/ruanwenjun/crawl-demo

A simple Java crawler project that scrapes trending search terms from Weibo hot search, Baidu, and other sites

crawler java

Last synced: 16 Oct 2024

https://github.com/a252937166/quick-selenium

Mainly uses the quick-spring and Selenium frameworks to scrape information from various dynamic web pages

crawler quickstart selenium

Last synced: 21 Nov 2024

https://github.com/igeligel/BackpackLogin

:arrow_forward: A .NET core library to handle the login to Backpack.tf. Backpack.tf is a trading site for Team Fortress 2, Counter-Strike: Global Offensive, and Dota 2. Community item pricing, item trading and stats, and much more.

bot bot-framework crawler csgo csgo-bot steam steam-api steambot steamweb teamfortress2

Last synced: 13 Nov 2024

https://github.com/bioinformatist/py3_scripts

Life is short, *****.

blast crawler gtf pacbio scrapy

Last synced: 10 Nov 2024

https://github.com/bbc2/discolinks

Command-line tool which checks a website for broken links.

broken-links crawler html http link-checker link-checkers link-checking validator web

Last synced: 28 Oct 2024

https://github.com/pps-22-scooby/pps-22-scooby

Scala application that allows web crawling and web scraping of web pages given as input with the use of special rules passed to it through the use of a DSL.

crawler crawlers internal-dsl scala scraper scrapers web web-crawler web-crawling web-scraper web-scrapers

Last synced: 14 Oct 2024

https://github.com/piotrpdev/webuy-cex-price-tracker

A Python script that gets the prices of certain CeX products and uploads them to Google Sheets

cex cex-api cex-products crawler google-sheets gspread prices python-script webuy webuy-api webuy-cex webuycex

Last synced: 13 Nov 2024

https://github.com/pceuropa/youtube-crawler

Youtube crawler & scraper based on scrapy. Written in Python3.

crawler csv mariadb python3 scraper scrapy sqlalchemy youtube

Last synced: 13 Nov 2024

https://github.com/sebobo/shel.crawler

Neos based crawler for nodes and sites

crawler neos-cms

Last synced: 14 Oct 2024

https://github.com/anikhasibul/stackoverflow-scraper-messenger-bot

A messenger bot that answers messages by scraping stackoverflow questions and answers

chatbot crawler messenger-bot scrapper stackoverflow

Last synced: 24 Nov 2024

https://github.com/activatedgeek/winemag-dataset

Dataset of Wine Reviews from Wine Enthusiast Magazine :grapes: :wine_glass: :earth_asia:

crawler dataset python3 scrapy scrapy-spider vega-lite visualization wine wine-tasting

Last synced: 14 Oct 2024

https://github.com/bfwg/node-tinycrawler

Tiny web crawler in a nutshell for Node.js

crawler nodejs redis

Last synced: 11 Oct 2024

https://github.com/drogbadvc/crawlit

A web crawler based on Scrapy, with 2D visualization and PageRank

crawler scrapy seo streamlit

Last synced: 08 Nov 2024

https://github.com/appliedsoul/promise-crawler

Promise support for node-crawler (Web Crawler/Spider for NodeJS + server-side jQuery)

crawler node-crawler nodejs promise-node-crawler spider

Last synced: 08 Nov 2024

https://github.com/amirhoseinsb/Cloud_Player_V2

Use the cloudplayer tool to listen to music by the singer you want, at very high speed and without visiting a specific website.

cloud-player crawler crawling music music-player programming python url-player

Last synced: 20 Nov 2024

https://github.com/twtrubiks/dowload-image-ptt

PTT image downloader (C# WinForm) for Windows

crawler dowload image ptt winforms

Last synced: 16 Nov 2024

https://github.com/the1812/bingwallpapers

A tool for downloading wallpapers from Bing.

crawler csharp wpf

Last synced: 04 Nov 2024

https://github.com/yaroslaff/bulk-http-check

Very fast and simple concurrent HTTP client (3500 HTTP req/s)

bulk check concurrent connections crawler header http https multiple parallel spider status

Last synced: 07 Nov 2024

https://github.com/aurelius84/pycrawler

A flexible spider based on mysql

crawler etl mysql scrapy spider

Last synced: 06 Nov 2024

https://github.com/fedebotu/neurips2022-openreviewdata

Crawl & Visualize NeurIPS 2022 Data from OpenReview

crawler dataset neurips neurips-2022 openreview peer-review review scraper

Last synced: 06 Nov 2024

https://github.com/khaleddallah/LinkedinScraper

Python Scrapy project that parses people profiles from LinkedIn search results and arranges the output in Excel and JSON files

crawler excel json linkedin python scraper scrapy spider

Last synced: 05 Nov 2024

https://github.com/thesp0nge/nightcrawler

A python program that crawls a website and tries to stress it, polluting forms with bogus data

crawler offensive-scripts offensive-security stress-test web-crawler web-crawling

Last synced: 12 Oct 2024

https://github.com/xanke/node-crawler-server

A lightweight Node.js remote scraping server

crawler nodejs server

Last synced: 02 Dec 2024

https://github.com/rggh/scrapy18

Scrapy start_urls from csv demo

crawler linkextractor scrapy

Last synced: 07 Dec 2024

https://github.com/duongdev/facebook-group-crawler

Facebook Groups Discussions Crawler

crawler facebook groups puppeteer

Last synced: 12 Nov 2024

https://github.com/tghoul/spider914j

91 web spider for java.

91porn crawler spring-boot webmagic

Last synced: 21 Nov 2024

https://github.com/omilab/internet-archive-link-extractor

Tool for extracting external links of a URL from Internet Archive snapshots

crawler internetarchive

Last synced: 25 Nov 2024

https://github.com/29dch/word_cloud

A word cloud project built with Python

crawler jieba wordcloud

Last synced: 11 Nov 2024

https://github.com/nakabonne/webcrawlerforserps

Web crawler that scrapes Google search results

cli crawler golang

Last synced: 24 Oct 2024

https://github.com/sdq/kaggle-crawler

simple scrapy project for kaggle.com

crawler kaggle

Last synced: 17 Dec 2024

https://github.com/yerkopalma/bash-crawler

:computer: Get a site's links with Bash

bash crawler

Last synced: 13 Oct 2024

https://github.com/windfarer/biu

biubiubiu~~ I'm a tiny web crawler framework

crawler python spider spider-framework web-crawler

Last synced: 28 Oct 2024

https://github.com/softmarshmallow/inked-news-crawler

🕷 Korean news source crawler (realtime & bulk)

crawler naver-news python3 scrapy

Last synced: 06 Dec 2024

https://github.com/brucewind/fear-and-greed-index-alarm

A notification reminder for indicating when the CNN Fear and Greed Index is out of range.

crawler fear-and-greed fear-greed-index investment sctock stock-market us-stock-market

Last synced: 28 Nov 2024

https://github.com/gabfl/sitecrawl

Simple Python module to crawl a website and extract URLs

crawl crawler crawler-python crawling-sites

Last synced: 13 Oct 2024

https://github.com/busterc/crwlr

🕷 A minimal Puppeteer crawler API

crawl crawler crawling puppeteer spider walker

Last synced: 12 Dec 2024

https://github.com/jacobsteves/crawlperl

A web crawler made with Perl. Great for grabbing or searching for data off the web, or ensuring that your own site files are secure and hidden.

crawler perl scripting web-crawler

Last synced: 27 Nov 2024

https://github.com/integralist/go-web-crawler

A web crawler built in the Go programming language

concurrency crawler go golang web-crawler

Last synced: 11 Oct 2024

https://github.com/twtrubiks/pttcrawlercontent

PTT Crawler Content in Python (PTT article crawler)

crawler gossiping ptt python

Last synced: 16 Nov 2024

https://github.com/oscarnevarezleal/ecommerce-crawler

Parallel ecommerce crawler using Docker and Puppeteer on GCP

crawler gcp nodejs pubnub puppeteer

Last synced: 29 Nov 2024

https://github.com/baraja-core/webcrawler

Simple website crawling by following links.

bot crawler crawling-websites fast php robot speed

Last synced: 06 Nov 2024

https://github.com/synacktraa/crawl

Web crawler designed to efficiently retrieve unique href, script and form links from a web application.

bash crawler regex shell web-spidering

Last synced: 26 Nov 2024

https://github.com/exp-codes/jzone-crawler

Qzone (QQ空间) crawler (Java version)

crawler programming

Last synced: 16 Dec 2024

https://github.com/nobodxbodon/chromecrawlerwildspider

Chrome extension that crawls web pages by loading them into browser tabs in parallel.

chrome-extension crawler localstorage spider

Last synced: 30 Nov 2024

https://github.com/Antosser/web-crawler

Rust Web Crawler that finds every page, image, and script on a website (and downloads it)

crawler html rust seo web

Last synced: 24 Sep 2024

https://github.com/s045pd/magicworld

Crawler for the "Magical World" (神奇世界看看看) column of Huanqiu.com (环球网)

crawler python3 sanic telepot

Last synced: 07 Nov 2024

https://github.com/ajcerejeira/base.gov.pt

A crawler that fetches data from base.gov.pt

crawler csv python scrapy

Last synced: 06 Nov 2024

https://github.com/hybridx/webscraper

Web crawler built with Beautiful Soup

crawler flask google-dorks javascript python3 search-engine

Last synced: 13 Dec 2024