Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).

https://github.com/efishery/wpi-kkp-crawler

This is crawler for fisheries price on wpi.kkp.go.id

crawler kkp wpi

Last synced: 08 Nov 2024

https://github.com/orsinium-labs/gpcc

Python library and CLI tool to fetch information from GCP Browser (https://gpc-browser.gs1.org/)

crawler gpc gs1

Last synced: 16 Nov 2024

https://github.com/rogerluo410/gcrawler

Google search crawler for Ruby version. Crawling each links' text and url by keywords on Google.com.

crawler crawling google ruby

Last synced: 08 Nov 2024

https://github.com/mohabmes/matool

A collection of various custom tools. { Antesh, CITerm, INetSC, KADManga, Tomado }

cli codeigniter-terminal crawler mangareader markd markdown markdown-to-html parser readme scan-tool scanner-web

Last synced: 11 Nov 2024

https://github.com/geoffreybauduin/website-checker

Performs useful checks against a website, such as 404 errors reporting, structured data validation...

crawler seo structured-data web-spider website

Last synced: 06 Nov 2024

https://github.com/programming-with-love/skyeyesystem

天眼系统,每隔十分钟爬取各个平台的热搜数据并入库。包括原始热搜数据存入mysql。词频统计存入Redis。

crawler mysql redis skyeye skyeyewall springboot

Last synced: 16 Nov 2024

https://github.com/yjg30737/pyqt-google-image-crawler

Crawling image files from Google search result with Python and icrawler

beautifulsoup4 crawler icrawler image-crawler pyqt pyqt5 pyqt5-desktop-application

Last synced: 07 Nov 2024

https://github.com/dean9703111/humandesign_nodejs

用nodejs爬蟲工具將人類圖網頁上的資訊爬下來,再存到雲端的google excel

crawler googlesheetapi googlesheets nodejs

Last synced: 13 Nov 2024

https://github.com/sonhm3029/crawl-data-bot

This project making a base crawl data from web bot, include text data and images data

crawler google medical vietnamese

Last synced: 16 Nov 2024

https://github.com/mattmoony/webcrawler.py

A very simple python webcrawler. This is just a fun little side project, which I used to gather some valuable experience with advanced Python- and Web techniques. 🐍

beautifulsoup crawler indexing mongodb multithreading pymongo python spider web webcrawler

Last synced: 12 Oct 2024

https://github.com/zabuzard/wslotter

WSlotter is a Selenium driven tool for assigning to events on 'https://www.gruppe-w.de'.

bot crawler gruppe-w

Last synced: 13 Nov 2024

https://github.com/maxgio92/package-crawler

A package crawler for most known Linux distros

crawler go linux package

Last synced: 13 Oct 2024

https://github.com/mdazlaanzubair/amazon-scraper-api

A web scraper to crawl on amazon to extract products information and return in JSON format.

amazon crawler expressjs json-api nodejs webscraping

Last synced: 11 Nov 2024

https://github.com/enansari/guess-price-car

Car price estimation based on the information of a car sales site

crawler jadi machine-learning maktabkhoone maktabkhooneh python

Last synced: 11 Nov 2024

https://github.com/maxmindlin/swarm

Go crawler that searches and aggregates information relevant to your interests. WIP for learning Go crawling.

crawler golang mongodb

Last synced: 15 Oct 2024

https://github.com/hudson-newey/user-web-crawler

The Archive.org Crawler works through volunteering users who install an extension on their browsers. When the user visits a webpage, the URL is anonymously added to the Archive.org database.

archive crawler open-internet

Last synced: 11 Nov 2024

https://github.com/rbkgh/dailytext-crawler

Crawl jw.org to retrieve daily text

crawler dailytext java jsoup jw

Last synced: 15 Nov 2024

https://github.com/zabuzard/songcrawler

Crawles all song files available on 'http://downloads.khinsider.com/'. Creates a list of direct download links for all such songs, intended for use with JDownloader or similar.

command-line-tool crawler download-musics downloadmanager jdownloader multithreading song-files songs web-crawler

Last synced: 13 Nov 2024

https://github.com/fa7ad/aiub-notes-dl

Download all notes from AIUB's portal

aiub beautifulsoup4 crawler

Last synced: 24 Oct 2024

https://github.com/bitlytwiser/tormonger

Recursive Tor network crawler

crawler go golang tor

Last synced: 09 Nov 2024

https://github.com/skylightqp/namu2csv

A namuwiki crawler that converts header to csv file for kartrider wiki

crawler rust

Last synced: 19 Oct 2024

https://github.com/ilsonlasmar/inovamind

Desafio Inovamind - Crawler em Ruby on Rails com Sidekiq + Redis

crawler rails5 sidekiq

Last synced: 11 Nov 2024

https://github.com/appliedsoul/crawlmatic

Static and Dynamic website crawling library - a common promise based wrapper around node-crawler & hccrawler libraries.

crawler scraper

Last synced: 08 Nov 2024

https://github.com/arshadkazmi42/gh-crawl

Crawler for Github repositories. Finds all the broken links from the repositories

bug-bounty-recon crawl crawler gh-crawler github github-crawler githubcrawler python

Last synced: 28 Oct 2024

https://github.com/jorgeparavicini/medalytik-python

Python crawlers for a job mediation firm

crawler python scrapy

Last synced: 17 Oct 2024

https://github.com/tsoliangwu0130/ptt-search

A simple Python script to fetch PTT post from the command line.

crawler ptt python

Last synced: 11 Nov 2024

https://github.com/piopi/behatcrawler

A Behat extension that crawls links on a website and executes user-defined function on each one of them.

behat behat-extension crawler php selenium-webdriver

Last synced: 01 Nov 2024

https://github.com/buren/site_health

Crawl a site and check various health indicators

crawler rubygem site-health

Last synced: 28 Oct 2024

https://github.com/dimo414/pycrawl

Simple Python web crawler, primarily designed for inspecting and diagnosing your own website

crawler python

Last synced: 12 Oct 2024

https://github.com/loggerhead/dianping_crawler

基于 Scrapy (python 3.5) 的大众点评爬虫

crawler python-3-5

Last synced: 18 Nov 2024

https://github.com/coghost/crawlers

crawlers in one

crawler python3 staticimg weibo

Last synced: 09 Nov 2024

https://github.com/captain-woof/zhi-zhu

Zhi-Zhu is a multithreaded spidering script that recursively searches base webpages and all urls appearing in it, for specific (regex) words.

crawler crawler-python crawling-python python3

Last synced: 08 Nov 2024

https://github.com/zhanymkanov/marketplace_parser

Products and Reviews Crawler

crawler python scrapy

Last synced: 14 Nov 2024

https://github.com/jovijovi/ether-crawler

A transaction crawler for the Ethereum ecosystem.

blockchain crawler ether ethereum transaction

Last synced: 15 Nov 2024

https://github.com/amirsorouri00/dsl-se

This is a MVP provided based on the "Search Engine And Data Mining" Course. The idea behind this project is the forked project which its link provided is

container crawler distributed-systems docker docker-compose elasticsearch pagerank search-engine

Last synced: 18 Nov 2024

https://github.com/marzzzello/gplaycrawler

(mirror) Discover apps by different mehtods. Mass download app packages and metadata.

crawler google-play google-play-store googleplay googleplaystore playstore playstoreapi scraper

Last synced: 05 Nov 2024

https://github.com/yassilah/nuxt-crawler

Automatic crawler & search for Nuxt SSG.

algolia crawler nuxt search ssg

Last synced: 16 Nov 2024

https://github.com/karantyagi/web-crawler

BFS and DFS implementations for a wikipedia crawler

beautifulsoup crawler

Last synced: 13 Nov 2024

https://github.com/kluhan/kraken

Kraken is a generic, mid-scale web crawler specifically built to crawl vertical data-sources, like Youtube or the Google Play Store.

celery crawler google-play-store python web-crawling

Last synced: 28 Oct 2024

https://github.com/developerjosh/gogo-crawler

The tool kit for making an anime website with a database full of anime

crawler crawler-js gogoanime gogoanime-api gogoanime-scraper

Last synced: 16 Nov 2024

https://github.com/linux0hat/cpp-web-crawler

Explore the web.

cpp crawler sqlite3

Last synced: 13 Nov 2024

https://github.com/thomashirtz/douban-crawler

A simple crawler for retrieving information about movies or TV shows from the famous www.douban.com website.

crawler douban

Last synced: 06 Nov 2024

https://github.com/camara94/crawlers

Web scraping is the process of using bots to extract content and data from a website. Unlike screen scraping, which only copies pixels displayed onscreen, web scraping extracts underlying HTML code and, with it, data stored in a database. The scraper can then replicate entire website content elsewhere

crawler python scraping scrapy spider

Last synced: 05 Nov 2024

https://github.com/saketh7382/smartcrawler

Package for crawling items from webpages and store them as json file

crawler crawler-python open-source pip python3 scraper selenium selenium-webdriver webdriver-manager

Last synced: 20 Oct 2024

https://github.com/wondervictor/spiderman

2017 Software Course Project

crawler distribute-crawler zhihu-crawler

Last synced: 16 Nov 2024

https://github.com/scrwdrv/siege-crawler

This CLI tool will find same domain urls in a web page and requesting them to find even more urls until server crash (or at the end of benchmark). It is used to test maximun capacity of server or finding for glitches that users might encounter.

benchmark cli crawler ddos debug siege tool

Last synced: 12 Oct 2024

https://github.com/projectx3193275578/prjctxx8264

A simple, open-source, easy to use, and free download manager for malware samples.

crawler downloader malware manager samples

Last synced: 09 Nov 2024

https://github.com/christopher-besch/therapy_search

Compute call times from arztsuche-bw into a calendar.

appointments calendar crawler gatsby therapy time-management typescript

Last synced: 07 Nov 2024

https://github.com/f-ca7/movie-cat

A website displaying movies

crawler golang website

Last synced: 09 Nov 2024

https://github.com/songjiayang/china_repos

github repo 爬虫

china crawler statistics

Last synced: 15 Oct 2024

https://github.com/deptno/nsdi

㉿ nsdi downloader built on puppeteer

crawler downloader nsdi openapi puppeteer

Last synced: 08 Nov 2024

https://github.com/mikirasora/osuplayedbeatmapscrawler

A crawler that fetch and download osu!beatmaps which you had played

crawler osu

Last synced: 08 Nov 2024

https://github.com/danielemoraschi/go-sitemap-common

Simple GO sitemap generator and crawler.

crawler golang sitemap sitemap-generator

Last synced: 08 Nov 2024

https://github.com/zhaotianff/qzone

想起那天夕阳下的奔跑,那是我逝去的青春

crawler crawling-sites csharp qzone qzone-photos qzone-spider wpf

Last synced: 15 Nov 2024

https://github.com/1970mr/link-crawler

Web Link Crawler: A Python script to crawl websites and collect links based on a regex pattern. Efficient and customizable.

clawler crawler crawler-python link-crawler link-crawler-python link-scraper link-scraper-python links python scraper scraper-python website-crawler website-scraper

Last synced: 11 Nov 2024

https://github.com/idlesign/gallerycrawler

Generic crawling for galleries

crawler gallery images python3

Last synced: 30 Oct 2024

https://github.com/vindecodex/automated-crawler-wget

Using wget to crawl site

crawler shell-script

Last synced: 08 Nov 2024

https://github.com/jfcherng/wiki-cgroup-crawler

此腳本用於抓取維基百科的公共轉換組詞庫,並將結果儲存為外部檔案。

crawler php-71 wiki-cgroup-crawler wikipedia

Last synced: 28 Sep 2024

https://github.com/ozansz/simple-web-downloader

A simple web page downloader program in C

c crawler curl libcurl web

Last synced: 16 Oct 2024

https://github.com/schbenedikt/web-crawler

A simple web crawler using Python that stores the metadata of each web page in a database.

crawler database mariadb mysql python python-crawler web

Last synced: 08 Nov 2024

https://github.com/tungct/golangcrawler

Crawler goroutine Golang

crawler go

Last synced: 14 Nov 2024

https://github.com/simonrichardson/crwlr

Crawl all the things!

crawler meshuggah

Last synced: 14 Oct 2024

https://github.com/victorpre/erlich

Erlich Bachman - Hacker Hostel

chatbot crawler elixir housing umbrella

Last synced: 18 Oct 2024

https://github.com/kahsolt/allchan

An image crawler for xChan(4chan/8ch/...) image board.

4chan 4chan-downloader 8chan crawler image-crawler

Last synced: 09 Nov 2024

https://github.com/buren/stupid_crawler

Stupid crawler that looks for URLs on a given site

cli crawler ruby rubygem

Last synced: 12 Oct 2024

https://github.com/bkdev98/ebooks-crawler

Ebooks crawler for personal purpose using ReactJS.

crawler material-ui nodejs reactjs

Last synced: 08 Nov 2024

https://github.com/zhs007/lottery-crawler

基于jarvis-task的爬虫,主要用来爬取lottery数据。

crawler jarvis-task

Last synced: 09 Nov 2024

https://github.com/dingpingzhang/papermedia

A scrapy-based crawler for crawling paper media.

crawler scrapy spider

Last synced: 04 Nov 2024

https://github.com/imkrunalkanojiya/seo-checker

Resolve your SEO related issue by using SEO Checker Rest API

crawler nodejs rest-api seo seo-crawler seo-free seo-optimization seo-tools

Last synced: 09 Nov 2024

https://github.com/estroz/seekret

Seekret is a sensitive data crawler for GitHub repositories

crawler security

Last synced: 06 Nov 2024

https://github.com/jackfsuia/chats-crawler

Discourse chat data crawling and on-the-way parsing straight for LLM instruction finetuning. Data include texts, images and links ( Discourse论坛对话(图片,文本)数据爬取并解析,以直接用于(多模态)指令微调).

crawler fine-tuning finetune-llm gpt html-css-javascript instruction-tuning llm llm-training llms nlp nlp-parsing parser

Last synced: 14 Nov 2024

https://github.com/hantang/list-movies-top

豆瓣(douban.com)、IMDb(imdb.com)、时光网(mtime.com)、猫眼(maoyan.com)Top电影定时抓取

crawler douban imdb movie

Last synced: 10 Nov 2024

https://github.com/ging-dev/sitemap-crawler

Collect links through the sitemap.xml or robots.txt

crawler php php8 sitemap sitemap-crawler

Last synced: 12 Oct 2024

https://github.com/comigor/balances

Your checking and savings accounts balances on banks and brokers.

balance bank broker crawler node

Last synced: 20 Oct 2024

https://github.com/rdil/crawley

My attempt at a web crawler.

bs4 crawler python python3 web

Last synced: 09 Nov 2024

https://github.com/par7133/splash-bot-crawler

Splash Bot creates splash on the fly of your websites - GPL License 🔥

bot crawler gallery open-source opensource php splash

Last synced: 13 Nov 2024

https://github.com/lykmapipo/producthunt-python-scrapy-scraper

Python Scrapy spiders that scrapes data from producthunt.com

crawler featured launch lykmapipo product producthunt python scraper scrapy spider webscraper

Last synced: 04 Nov 2024

https://github.com/panagiks/asset

ASynchronous Spidering Essential Tool (ASSET).

async asyncio crawler graph reporting spider

Last synced: 16 Oct 2024

https://github.com/zephyrpersonal/github-trending-crawler

transform github-trending repos to json data

cheerio crawler fetch github node repository spider trending

Last synced: 14 Oct 2024

https://github.com/coverified/spider

A microservice with web-crawler/spider capabilities which only follows and indexes urls of the provided host domain(s)

akka crawler graphql hacktoberfest microservice spider

Last synced: 06 Nov 2024

https://github.com/juangesino/gazette

A personal news aggregator application using Meteor.

crawler meteor meteorjs news news-aggregator news-feed scraper

Last synced: 13 Oct 2024

https://github.com/ghost---shadow/feature-extractor-from-codebase

Copies the target java file and all its dependencies recursively to another directory

code-splitting crawler

Last synced: 16 Nov 2024

https://github.com/sebyx07/active_proxy

Ruby proxy fetcher, retries request until completed, provides user agent🚀🚀

crawler http proxy rails ruby

Last synced: 07 Nov 2024

https://github.com/tranbavinhson/crawler

Crawler by Scrapy

crawler python scrapy

Last synced: 06 Nov 2024

https://github.com/sinkaroid/webnovelcrawler

Simple PHPcurl and getRequest to grab Light Novel and WebNovel, then create parser with DOMpdf.

crawler dompdf webnovel

Last synced: 05 Nov 2024

https://github.com/xcrypt0r/xcrawler

✂️ A crawling example for maplestory with various languages using multi-threading

crawler crawling multithreading parsing regexp

Last synced: 11 Nov 2024

https://github.com/vitaee/laravelandcrawlers

php web crawler examples with oop concept and laravel project

crawler laravel php

Last synced: 06 Nov 2024

https://github.com/akashrajpurohit/node-crawler

Nodejs Crawler which scrapes a website on live domain and crawls to find all URL of the domain

crawler node-crawler nodejs url

Last synced: 06 Nov 2024