Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).

GitHub: https://github.com/topics/crawler
Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
Last updated: 2025-01-10 00:06:02 UTC
JSON Representation

https://github.com/danielemoraschi/sitemap-app

Sitemap generator command line application using dmoraschi/sitemap-common library

crawler php php-library sitemap sitemap-generator

Last synced: 31 Dec 2024

https://github.com/hileix/jjxy-lib-search

图书馆书籍查询爬虫工具

crawler expressjs nodejs phantomjs request

Last synced: 28 Nov 2024

https://github.com/timzatko/fiit-vinf-1

School project - data crawling, storing using ElasticSearch and visualisation.

angular crawler elasticsearch

Last synced: 16 Dec 2024

https://github.com/kofj/octopus

Octopus an open source software to collect data from web pages.

crawler

Last synced: 28 Nov 2024

https://github.com/igapyon/selecrawler

Simple selenium based web crawler

chrome crawler java selenium web

Last synced: 06 Jan 2025

https://github.com/sgeisler/fishbones2epub

fetches the fishbones novel and outputs an epub

crawler ebook epub python-3-6

Last synced: 28 Nov 2024

https://github.com/somehowchris/swisslos-cralwer

(WIP) Crawler to access the current and history numbers of swisslos

crawler euromillions lotto rust swisslos

Last synced: 28 Nov 2024

https://github.com/noarche/darknoisy

Same as my Noisy but on TOR network. Logs links. Crawls onion sites.

crawler crawling onion-domains onion-services onion-sites onions-list python python-script python3 tor torsocks

Last synced: 03 Dec 2024

https://github.com/shunk031/amebloscraper

Scraper for Ameblo in Scrapy

ameblo crawler scraper scrapy

Last synced: 10 Jan 2025

https://github.com/ryu1kn/procedural-page-crawler

Page Crawler. Tell it where to go and what to look for.

crawler npm-package scraper

Last synced: 08 Dec 2024

https://github.com/viko16/hatcher

🐣[WIP] Provides APIs by simple configuration.

api api-server cli crawler koa-middleware nodejs spider

Last synced: 01 Oct 2024

https://github.com/d7isme/pixiv-downloader-mod

Modded extension of the pixiv downloader on chrome webstore with premium feature unlocked.

chrome-extension crawler extension-chrome image pem pixiv pixiv-bot pixiv-crawler pixiv-downloader

Last synced: 09 Jan 2025

https://github.com/stephanebruckert/gocrawl

Crawl every pages and assets of a web domain

crawler python

Last synced: 21 Dec 2024

https://github.com/igor-karpukhin/web-crawler

Web site crawler

crawler go website

Last synced: 09 Dec 2024

https://github.com/marceloneppel/crawler

Simple web crawler developed in Go.

crawler go golang web-crawler

Last synced: 03 Dec 2024

https://github.com/palpitate-xus/sge_data_insert

利用Github Actions实现自动获取sge数据并存入数据库

crawler mysql python

Last synced: 16 Dec 2024

https://github.com/ceylonai/apps-article-reader

📚 A powerful desktop app that extracts and analyzes web content using LLaMA AI. Features real-time processing, keyword extraction, and smart summarization. Built with Python + Tkinter.

ai crawler gpt ollama openai

Last synced: 03 Dec 2024

https://github.com/massongit/ibaraki-univ-circle-crawler

Crawls official circles in Ibaraki University from university's website

crawler python

Last synced: 03 Dec 2024

https://github.com/zfael/scrape-it-all

Modular web scraper for Node.JS

crawler scraper scraping scraping-websites web-scraping

Last synced: 23 Dec 2024

https://github.com/filipsedivy/tachometer-check

🚘 MDČR - kontrola tachometru

crawler czech-republic mdcr

Last synced: 23 Dec 2024

https://github.com/ronierisonmaciel/crawler

Um crawler utilizando BeautifulSoup tem como objetivo extrair informações de sites de maneira eficiente e estruturada. BeautifulSoup é uma biblioteca Python que facilita a análise e extração de dados de páginas HTML e XML. O projeto permite coletar e organizar informações relevantes.

beautifulsoup4 crawler crawling python python3

Last synced: 03 Dec 2024

https://github.com/alancesar/crawler

HTML crawler

crawler docker spider

Last synced: 03 Dec 2024

https://github.com/s3rgeym/wscrap

Command line web scraping tool.

crawler scraping

Last synced: 23 Dec 2024

https://github.com/moj124/web_crawler

The web_crawler is a asynchoronous gevent link crawler that maps all the associated local links constrained by the input webpage url.

crawler crawler-python links-spider

Last synced: 19 Nov 2024

https://github.com/aminehsan/datamining-divar.ir

Analyzing and Extracting Insights from Ads on 'divar.ir'

crawler data-mining data-science divar-ir scraping

Last synced: 04 Dec 2024

https://github.com/allotmentandy/socialmedialinkextractor

php laravel package to extract social media links from an array of links for my spider, used as part of a spider for checking londinium.com website links

crawler extractor facebook laravel linked-list php social social-network spider twitter url youtube

Last synced: 23 Dec 2024

https://github.com/tetreum/xupopter_client

Simple interface to manage Xupopter recipes aswell as it's runners.

crawler scrapper scrapping webscraper

Last synced: 17 Dec 2024

https://github.com/tri613/nespresso

A mobile version for nespresso coffee website :coffee:

crawler nespresso node-js

Last synced: 04 Jan 2025

https://github.com/jofaval/open-graph-visualizer

Web Scraping showcase of how crawlers retrieve site's details through the Open Graph Protocol

crawler javascript opengraph scraping web web-scraping

Last synced: 09 Dec 2024

https://github.com/mawkler/go-web-crawler

Toy web server written in Go

crawler go

Last synced: 04 Dec 2024

https://github.com/rutopio/crawler-cpbl-player-data

針對中華職棒官網的球員資料進行爬蟲與整理。

cpbl crawler crawling python

Last synced: 04 Dec 2024

https://github.com/rutopio/crawler-2020-taiwanese-election-results

2020 台灣選舉結果爬蟲：以不分區政黨票為例

crawler python

Last synced: 04 Dec 2024

https://github.com/zenixls2/2chpreprocess

Dump messages from 2ch with some preprocessing for ML analysis

2ch crawler python

Last synced: 04 Dec 2024

https://github.com/vivekg13186/lucas

A web crawler

crawler crawler-engine crawling-framework java

Last synced: 09 Dec 2024

https://github.com/eklem/vinmonopolet-crawler

Crawling Vinmonopolet-data and indexing it to a norch search index

crawler dataset javascript norch search-engine

Last synced: 04 Dec 2024

https://github.com/tetreum/puppeteer-for-crawling

Daily use crawling methods for puppeteer

crawler crawling puppeteer

Last synced: 09 Dec 2024

https://github.com/tetreum/xupopter_runner

Executes crawling recipes coming from Xupopter Chrome Extension.

crawler scrapper scrapping webscraper

Last synced: 17 Dec 2024

https://github.com/mirusu400/berryz-dl

Batch download berryz webshare files recursively!

berryz berryz-webshare crawler downloader scraper

Last synced: 26 Dec 2024

https://github.com/mstephen19/apify-click-events

Like TypeScript, but for clicking ;) Manage automated clicks, and ensure your Apify web-crawler is only clicking exactly what you allow it to

apify apify-sdk crawler scraper web-automation

Last synced: 10 Dec 2024

https://github.com/luanpotter/series-api

A simple IMDB crawler feeding a Series API

api crawler imdb json rest series

Last synced: 16 Dec 2024

https://github.com/gxjansen/website-to-pdf

Creates a PDF based on the content of a website/subomain

claude-3-sonnet crawler python3

Last synced: 10 Dec 2024

https://github.com/athulmurali/flickr-api-docs-crawler

A python based crawler that extracts the documentation of apis and writes it into a file as JSON. A beautiful documentation page can be built from the JSON file using Docusaurus

api beautifulsoup4 crawler documentation python3

Last synced: 09 Jan 2025

https://github.com/ahsouza/iquizz-api

API RESTfull developed in Node.Js with MongoDB

animations cluster crawler docker docker-compose ejs-templates es8 font-awesome grunt-task helmet-detection heroku javascript jquery material-design mongodb nodejs passport-strategy passportjs pusher token-authetication

Last synced: 10 Dec 2024

https://github.com/jannchie/go-probe

HTML and JSON data crawler based on Golang. Simple and fast, very easy to use.

collector crawler fetcher golang spider

Last synced: 23 Dec 2024

https://github.com/jlenon7/sef_automation

📑 Crawler that automatically enrol in open vacancies in SEF website.

athenna crawler esm nodejs playwright portugal residence sef typescript

Last synced: 13 Dec 2024

https://github.com/datamine/twitter-name-and-shame

Crawler to find Twitter accounts following more than a million users

crawler flask python python-2 twitter

Last synced: 31 Dec 2024

https://github.com/kenanbek/tutorial-python-crawler

Crawling website data using Python with requests and Beautiful Soup libraries

beautifulsoup crawler crawling miner parser python python-requests requests

Last synced: 11 Dec 2024

https://github.com/apexcaptain/allergy-alert

오늘 날짜를 기준으로 모 대학의 학교 홈페이지에서 제공하는 식당 정보를 Crawling하여 회관별/메뉴 분류 별로 메뉴들과 메뉴 별 알러지 유발 식품에 대한 정보를 알려줍니다.

crawler docker expressjs puppeteer reactjs sqlite typescript

Last synced: 01 Dec 2024

https://github.com/yaoshanliang/linkedinspider

Crawl job information from LinkedIn for data analysis

big-data crawler python social-network-analysis

Last synced: 11 Dec 2024

https://github.com/claudio-code/nap-web-crawler

Created It crawler to find broken links in docs of framework and languages

crawler

Last synced: 11 Dec 2024

https://github.com/licoy/win4000-images-crawler

基于scrapy爬取&下载win4000.com的图片壁纸

crawler python scraper

Last synced: 08 Dec 2024

https://github.com/chenbingwei1201/threads_scraper

A Python package for scraping Threads posts.

chromedriver crawler csv-format pypi pypi-package python python3 scraper scraping-websites

Last synced: 11 Dec 2024

https://github.com/Kissaki/website-downloader

A website Crawler and downloader. Useful for archiving dynamic websites as static files.

archive crawler csharp download gpl website

Last synced: 23 Oct 2024

https://github.com/sinipelto/repo-license-crawler

Collects and summarizes license information on Python and NPM packages into output files.

crawler crawler-python license license-checker license-checking license-crawler license-management licenses licensing nodejs npm npm-license-crawler npm-license-tracker npm-licenses python python-script python3

Last synced: 11 Dec 2024

https://github.com/kimi0230/pstocks

Python 爬股市

crawler numpy pandas python python3 stocks

Last synced: 15 Nov 2024

https://github.com/kianoushamirpour/crawl_google_scholar_with_selenium_fastapi_mongodb

Crawl google scholar profiles with selenium, store the extracted data in the MongoDB and serve the queries with FastAPI.

crawler fastapi google-scholar mongodb python selenium

Last synced: 25 Dec 2024

https://github.com/sandrewtx08/gearbest_scraper

Seeks catalog ads from Gearbest web page, scraping catalogs information then it's storing by a sequence of SQL commands through a relational database.

crawler gearbest lxml python scraper scraping sqlite3

Last synced: 10 Jan 2025

https://github.com/iamtonmoy0/sitemap-crawler

site map crawler with golang and goquery

crawler

Last synced: 05 Jan 2025

https://github.com/viktorholk/ranged

A Rust-based web crawler and pattern matcher

crawler regex rust scraper web

Last synced: 12 Dec 2024

https://github.com/discountry/crawler-microservice

crawler microservice

crawler

Last synced: 14 Dec 2024

https://github.com/antoniowd/crawly

Un web crawler para explorar la web en busca de determinada informacion (email, telefonos, etc...)

crawler got jsdom nodejs webcrawler webscraping

Last synced: 12 Dec 2024

https://github.com/jenting/compare-drugstore-price

Compare price between cosmeceutical shops

cosmed crawler golang poya side-project watsons

Last synced: 05 Dec 2024

https://github.com/dnlzrgz/excursionist

Scrapy-powered flight price crawler.

crawler crawlers crawling flight flights playwright scraper scraping-websites scrapy travel traveling

Last synced: 23 Dec 2024

https://github.com/engageintellect/scrapers

A repository of web scrapers using Python & Scrapy

crawler python scrapy spider

Last synced: 13 Dec 2024

https://github.com/moe131/webcrawler

Python web crawler designed to scrape websites

crawler crawling-python python python-crawler scraping simhash web-crawler

Last synced: 23 Dec 2024

https://github.com/g-ongenae/morphalou-crawler

A Crawler for CNRTL's Morphologie words

crawler french lexical-databases list-of-words words

Last synced: 15 Oct 2024

https://github.com/tetreum/xupopter_chrome_extension

Extension to easily create crawling recipes

crawler scrapper scrapping webscraper

Last synced: 17 Dec 2024

https://github.com/machinecyc/lotteryinsight

Use crawler to collect Taiwan Lotto data, and save data into local MySQL server.

crawler data docker lottery mysql-database python3 taiwan

Last synced: 05 Dec 2024

https://github.com/abdymm/abtelegrambot-sample

sample using Telegram Bot

crawler football php scheduler telegram-bot webhook

Last synced: 10 Jan 2025

https://github.com/wangluyu/trending-crawler

crawler python3 trending

Last synced: 23 Dec 2024

https://github.com/octcarp/sustech_cs209a-java2_f24_proj

(Spring Boot + Vue3) Stack Overflow data crawling and visualization: Our project of CS209A 2024 Fall: Computer System Design and Applications A (a.k.a. Java 2), SUSTech. Taught by Yida Tao @yidatao .

crawler spring-boot stackexchange sustech visualization

Last synced: 01 Jan 2025

https://github.com/keizerzilla/ssh-hunter

Script que caça por Raspberry Pis vulneráveis na internet (porta SSH aberta e senha padrão não modificada).

crawler raspberry-pi ssh

Last synced: 23 Dec 2024

https://github.com/keizerzilla/search4dwango9

My attempt to help solving the DWANGO9 wad mystery. More info: https://www.youtube.com/watch?v=RXGtCjdwwe8

crawler datamining doom-wad

Last synced: 23 Dec 2024

https://github.com/purrproof/smartcrawl

blockchain cli crawler explorer framework go golang hacktoberfest

Last synced: 29 Nov 2024

https://github.com/cristiangreco/gcrawler

A simple (not concurrent) web crawler written in Java.

crawler java

Last synced: 23 Dec 2024

https://github.com/flaribbit/pixiv-favorites-list

爬取P站收藏夹保存为json格式

crawler pixiv python

Last synced: 29 Nov 2024

https://github.com/sanhphanvan96/php-training-crawler

Simple php crawler for training purpose

crawler docker docker-compose nginx php php-fpm

Last synced: 10 Jan 2025

https://github.com/mnemocron/VPNNetworkShareCrawler

ugly scripts to connect a Raspberry Pi to a VPN and attach network share to periodically crawl the documents on it

crawler samba vpn

Last synced: 23 Oct 2024

https://github.com/tigercosmos/web-crawler

Web Crawler in Java Maven Project

crawler

Last synced: 05 Dec 2024

https://github.com/mohitk05/drstrange

A simple breadth-first search web crawler

bfs crawler

Last synced: 05 Dec 2024

https://github.com/tetreum/price-crawler

Article price crawler

crawler nodejs

Last synced: 17 Dec 2024

https://github.com/matheusfelipeog/google-doodles

Mapeie e faça download dos Doodles do Google.

crawler google google-doodle python web-scraping

Last synced: 12 Dec 2024

https://github.com/marcosvbras/twitton

A simple Python library to make Twitter Search API easily to use

crawler crawling python spider twitter twitter-api

Last synced: 05 Dec 2024

https://github.com/fmind/fincrawl

Crawl documents, metadata, and files from financial institutions

crawler documents finance python scrapy

Last synced: 24 Dec 2024

https://github.com/cls1991/gank.io-go

A simple crawler for fetching pictures from http://gank.io, implemented in golang.

crawler gankio goquery pictures

Last synced: 10 Jan 2025

https://github.com/danielvigaru/easyreach

crawler for faster amazon reach

amazon crawler python

Last synced: 01 Jan 2025

https://github.com/andresayac/cuevana3

Cuevana3 scraper is a content provider of the latest in the world of movies and tv show in Latin Spanish dub or subtitled.

crawler cuevana3 php scraper

Last synced: 18 Dec 2024

https://github.com/kernelerr/pixivurls

An awesome tool to get Pixiv image URLs.

crawler downloader pixiv

Last synced: 19 Nov 2024

https://github.com/luciopaiva/dicio-crawler

Node.js crawler for dicio.com.br.

crawler nodejs scraper

Last synced: 18 Dec 2024

https://github.com/jefftriplett/pholcidae-demo

:spider: A Pholcidae demo for crawling/spidering a website

crawler csv pholcidae python scrapper scrapy-crawler spider toml

Last synced: 10 Jan 2025

https://github.com/wingkwong/daily_weather_temperature_in_hong_kong

Crawling daily weather temperature in Hong Kong

crawler hongkong python temperature

Last synced: 24 Dec 2024

https://github.com/allancapistrano/steam.py

An API wrapper for Steam written in Python.

crawler python steam

Last synced: 22 Nov 2024

https://github.com/allancapistrano/anime-sheets

Crawler que pega as informações dos animes e salva numa planilha.

anime crawler google-sheets google-sheets-api

Last synced: 22 Nov 2024

https://github.com/dnknth/robot.py

Simple web spider

crawler curio python

Last synced: 23 Nov 2024

https://github.com/clumsyme/ziroom_watcher

crawler email python ziroom

Last synced: 24 Dec 2024

https://github.com/brnrajoriya/india-s-states-and-cities-crawler

Crawler to crawl india's all states and cities

cities crawler india php script states

Last synced: 15 Nov 2024

https://github.com/earelin/jwraith

A Java clone of the Wraith website comparison tool.

crawler screenshots screenshots-comparison selenium webtest

Last synced: 19 Dec 2024

https://github.com/jesseokeya/linkedin-scraper

Selenium webDriver used to get information from linkedIn

chromedriver crawler linkedin os python scraper selenium-webdriver

Last synced: 25 Dec 2024

https://github.com/twknab/django_ajax_web_crawler

Web crawler which retrieves all links on any page. Python & Django-powered.

beautifulsoup4 crawler django-application

Last synced: 25 Dec 2024

https://github.com/ri0n/unboxer

MP4 crawler and extractor

crawler extractor mp4 object-oriented-design qt

Last synced: 13 Nov 2024

Crawler Awesome Lists

awesome-crawler 101 awesome-python-primer 68 awesome-digital-preservation 45 awesome-fingerprinting 48

Crawler Categories

2.6 机器学习 50 Research 31 Python 18 Replay tools 18 1.1 语言基础 14 Libraries & Projects 13 Fingerprinting Evasion 13 Sites 12 2.4 Web 前端 10 2.1 爬虫基础 9 3\. 数据库 8 2.5 数据分析 7 Java 7 Web archiving 7 Other digital objects 6 4\. 异步IO 6 Social Networks 4 Standards and specifications 4 2.3 Django 框架 4 2.2 Flask 框架 4