Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).

https://github.com/anshiii/pixder

🤔 A spider for pixiv.net

crawler pixiv spider

Last synced: 23 Jan 2025

https://github.com/intina47/ee_error

implementation of a web crawler using c++

cpp crawler curl gumbo libcurl stanford-nlp web

Last synced: 06 Dec 2024

https://github.com/datamine/twitter-name-and-shame

Crawler to find Twitter accounts following more than a million users

crawler flask python python-2 twitter

Last synced: 19 Jan 2025

https://github.com/joaooliveirapro/trawlergo

Basic HTTP Crawler in Golang

crawler go golang http

Last synced: 13 Jan 2025

https://github.com/brianbruggeman/vax

A vaccination signup tool

covid-19 crawler signup vaccination

Last synced: 16 Jan 2025

https://github.com/not-raspberry/aio_crawler

AIO single website crawler

asyncio crawler python3

Last synced: 01 Dec 2024

https://github.com/leonardopinho/instagramfeed

Image list based on a tag for the Instagram feed.

crawler instagram python

Last synced: 07 Dec 2024

https://github.com/amirsorouri00/crawler

Page-Rank Public python2 projects whice have been turned into python3.

crawler page-rank python

Last synced: 19 Jan 2025

https://github.com/bradsec/gofindfiles

Crawl websites attempting to find and download files with matching file types. For use as OSINT or RECON intelligence collection tool.

crawler osint osint-tool recon scraper web-scraper

Last synced: 07 Jan 2025

https://github.com/notreeceharris/webstalker

🕸 A Powerful Relational Web Crawler

crawler datamining webcrawler

Last synced: 14 Jan 2025

https://github.com/ryoii/hook

A declarative Java crawler framework

crawler declarative java java-crawler-framework jdk11

Last synced: 24 Jan 2025

https://github.com/roc41d/http-web-crawler

Http web crawler with Nodejs + TDD

crawler http javascript jest jest-test nodejs webcrawler

Last synced: 21 Jan 2025

https://github.com/jonasrenault/pubchem-api-crawler

Python client for PubChem's API to crawl compounds and their properties using a molecular formula search query.

chemistry crawler molecular-formula pubchem python

Last synced: 28 Nov 2024

https://github.com/dylancl/sitemap-crawler

Verify the status of each url in a (hosted) sitemap XML file.

crawler parser scraper sitemap xml

Last synced: 27 Dec 2024

https://github.com/qzcool/uscis-case-status-estimation-system-stat-ez

Estimates time of case results arrival, for applicants who are waiting for their USCIS case results with the receipt numbers at hand.

beautifulsoup crawler immigration web

Last synced: 21 Jan 2025

https://github.com/pyohei/rirakkuma-crawller

Crawler for my hobby.🐻

crawler python rirakkuma

Last synced: 29 Dec 2024

https://github.com/bockstaller/europarl-crawler

Crawler for the documents published by the European Parliament

crawler datamining elasticsearch europarl-crawler european european-parliament opendata parliament union

Last synced: 06 Jan 2025

https://github.com/joyceannie/moviespider

This project is used to crawl movie data from IMDb. Scrapy framework is used to extract relevant information like movie title, datePublished, summary, genres, director etc.

crawler datascience python scrapy spider webscraper

Last synced: 01 Dec 2024

https://github.com/frobware/grawler

Web Crawler

crawler go

Last synced: 16 Jan 2025

https://github.com/mustafadalga/website-crawler

Hedef web sitesini tarayarak linklerini listeleyen bir web crawler scripti || A web crawler script that lists links by scanning the target website.

crawl crawler crawling-sites hacking hacking-tool web-crawler web-crawler-python web-crawling

Last synced: 18 Jan 2025

https://github.com/949886/pixiv-crawler

Pixiv illustration info crawler to local MySQL database.

crawler mysql pixiv

Last synced: 28 Dec 2024

https://github.com/jjeffcaii/ok-spider

a simple web crawler like scrapy

crawler nodejs scrapy spider

Last synced: 25 Dec 2024

https://github.com/mattmoony/webcrawler.py

A very simple python webcrawler. This is just a fun little side project, which I used to gather some valuable experience with advanced Python- and Web techniques. 🐍

beautifulsoup crawler indexing mongodb multithreading pymongo python spider web webcrawler

Last synced: 19 Jan 2025

https://github.com/kyagara/lol-match-crawler

Very simple crawler for League of Legends matches.

crawler league-of-legends pgx postgres riot-games sql

Last synced: 01 Dec 2024

https://github.com/vishaalpkumar/skysift

A distributed search engine from scratch

aws crawler css distributed-systems html java search-engine

Last synced: 22 Dec 2024

https://github.com/mindfiredigital/deepscanbot

It allows you to crawl websites with various configurations, including crawl depth, timeout settings, proxy support, and output options.

bot crawl crawler go golang google webcrawler

Last synced: 28 Dec 2024

https://github.com/jyasskin/pbot-crawler

Crawler for PBOT's website to show what has changed.

crawler

Last synced: 30 Nov 2024

https://github.com/zawlinnnaing/my-wiki-crawler

A simple program for crawling Burmese wikipedia using Media wiki API.

crawler myanmar-tools python wikipedia-api

Last synced: 25 Dec 2024

https://github.com/nblthree/python-url-crawler

Simple web crawler

crawler python3

Last synced: 03 Dec 2024

https://github.com/lesterrry/campfire

Shock-drop watching utility

crawler parser web-crawler web-parser

Last synced: 07 Jan 2025

https://github.com/raspi/scrapy-amp

Crawler for Amiga Music Preservation (AMP) site

amiga crawler mod module music python s3m scrapy spider tracker

Last synced: 08 Jan 2025

https://github.com/grayhat12/grawler

A web based Crawler that takes two inputs(search item, number of sites to search)and curently displays Readable Content in Text Format but the Code can be modified to display the HTML code.

crawler scraping scraping-websites scrapper scrapy-crawler

Last synced: 06 Dec 2024

https://github.com/agucova/needs-seeding

🌱 A script that downloads a list of .torrent files from a website, checks their health and lists the ones that need more seeding.

crawler sci-hub torrents

Last synced: 09 Jan 2025

https://github.com/martinius96/web-scraper

Web scraper on ESP8266 board in client mode. Postprocessing in PHP with regular expressions.

arduino bot code crawler esp32 esp8266 html mysql php php7 robot scraper source web

Last synced: 03 Jan 2025

https://github.com/onetail/crawler-with-kafka-docker

homework to crawler and anaylsis

analysis crawler kafka-docker

Last synced: 24 Jan 2025

https://github.com/der3318/daily-pixiv

Integrated Flow - Line Notification of Top Ranked Pixiv Illustrations

crawler line-notify pixiv workflow

Last synced: 13 Jan 2025

https://github.com/seanghay/wpget

⚡️wpget - A tool for downloading all posts from a WordPress website via public JSON API

crawler wordpress wp-json

Last synced: 22 Nov 2024

https://github.com/jackfsuia/chats-crawler

Discourse chat data crawling and on-the-way parsing straight for LLM instruction finetuning. 论坛数据爬取和解析,直接用于对话微调。

crawler fine-tuning finetune-llm gpt html-css-javascript instruction-tuning llm llm-training llms nlp nlp-parsing parser

Last synced: 13 Jan 2025

https://github.com/onetail/applenews

simple crawler

crawler simple

Last synced: 24 Jan 2025

https://github.com/viper373/xovideos

一个为用户打造的个性化视频下载工具

crawler downloader githubactions m3u8 mongodb mp4 pornhub python

Last synced: 23 Jan 2025

https://github.com/matheusfaustino/jazzmaster_crawler

It is a crawling for getting the audio programs from a specific radio program called Jazzmaster

crawler python scrapy

Last synced: 28 Dec 2024

https://github.com/allancapistrano/anime-sheets

Crawler que pega as informações dos animes e salva numa planilha.

anime crawler google-sheets google-sheets-api

Last synced: 23 Jan 2025

https://github.com/allancapistrano/steam.py

An API wrapper for Steam written in Python.

crawler python steam

Last synced: 23 Jan 2025

https://github.com/bradsec/gomine

A Go CLI tool to quickly crawl and mine (download) specific file types from websites.

cli crawler golang terminal-based

Last synced: 22 Dec 2024

https://github.com/matheusfaustino/phrawl

Phrawl: A web crawling framework in PHP (or it seems so)

crawler crawling crawling-framework php scraper wip

Last synced: 28 Dec 2024

https://github.com/humbertodias/go-nie-crawler

Simple crawler that extract some useful informations from sede.administracionespublicas.gob.es.

crawler golang

Last synced: 13 Jan 2025

https://github.com/iamtonmoy0/sitemap-crawler

site map crawler with golang and goquery

crawler

Last synced: 05 Jan 2025

https://github.com/seart-group/github-keyword-crawler

A simple and easy-to-deploy script for mining mentions of keywords across various :octocat: API endpoints

api-mining crawler dockerized github-api miner mongodb-database python-script

Last synced: 07 Dec 2024

https://github.com/matheusfelipeog/google-doodles

Mapeie e faça download dos Doodles do Google.

crawler google google-doodle python web-scraping

Last synced: 25 Jan 2025

https://github.com/miiraak/scrapc

C# WinForms - Crawler & Scraper Web content

crawler csharp html scraper url web windows-forms

Last synced: 13 Oct 2024

https://github.com/thecloer/crawler-himym

How I met your mother script PDF generator for learning English

crawler pdf pdf-generation typescript web-scraping webscraping

Last synced: 10 Dec 2024

https://github.com/tssujt/async-crawler-sample

A simple crawler sample based on asyncio~

aiohttp asyncio crawler

Last synced: 22 Jan 2025

https://github.com/guilhem/cachanais

Populate cache by crawling pages

cache crawler hacktoberfest

Last synced: 22 Dec 2024

https://github.com/gesiscss/github_traffic_crawler

Retrieve the data information from the repositories (insight, usage, commits)

crawler github traffic

Last synced: 03 Jan 2025

https://github.com/ecklf/reddit-clawler

A command-line tool written in Rust that crawls Reddit posts from a user or subreddit

cli crawler downloader downloader-for-reddit reddit

Last synced: 22 Dec 2024

https://github.com/alphabs/navercafeclient

네이버 카페 글 목록 크롤링을 위한 닷넷 라이브러리

crawler crawling dotnet naver naver-api naver-cafe web-scraper web-scraping

Last synced: 29 Nov 2024

https://github.com/bennettdams/vace-it-crawler

Python (Scrapy) crawler to access data of FACEIT.com

crawler python scrapy

Last synced: 13 Jan 2025

https://github.com/ymdarake/otenki-crawler

Yet another weather data scraper.

crawler weather weather-data

Last synced: 16 Jan 2025

https://github.com/thamindur/ir-project

Search Engine for Sri Lankan MPs

crawler elasticsearch python scraping search-engine

Last synced: 17 Dec 2024

https://github.com/shentengtu/cht-yp-crawler

Simple Crawler of www.iyp.com.tw.

crawler node-js nodejs yellow-pages yellowpages

Last synced: 11 Jan 2025

https://github.com/kartikmehta8/pycrawler

PyCrawler is a web scraper that takes a link as input and returns all the links connected to the page(s). Goes beyond recursion. Threaded.

crawler cybersecurity python

Last synced: 16 Jan 2025

https://github.com/zzzzer91/chinaxinge

chinaxinge 爬虫。

crawler python python3

Last synced: 10 Jan 2025

https://github.com/iomarmochtar/imagecrawler

Simple image crawler by follow the links recursively, no dependency needed, for python 2.7+

crawler python-library

Last synced: 25 Dec 2024

https://github.com/jeanluc162/prnt-sc-crawler

Crawler for the Website prnt.sc

crawler net5 net50 prntsc screenshots

Last synced: 16 Jan 2025

https://github.com/isaqueveras/scrape-google-results

Scrape Google Results in Golang

crawler golang google scraper webcrawler

Last synced: 26 Jan 2025

https://github.com/jamesponddotco/wikiextract

[READ-ONLY] A word extractor for Wikipedia articles.

crawler crawling diceware go wikipedia wikipedia-crawler word-extraction

Last synced: 21 Jan 2025

https://github.com/kestarumper/imagecrawler

Downloads images from given URL

crawler image-downloader

Last synced: 06 Jan 2025

https://github.com/rsheremeta/web-crawler

A tiny web-crawler which looks for the links, extract and prints them concurrently to the Terminal output

crawler go golang web-crawler webcrawler

Last synced: 09 Jan 2025

https://github.com/oleksandr-moik/spring-boot-web-crawler

Web Crawler app on Spring Boot. Getting categories and relevant news category.

crawler gradle java spring-boot

Last synced: 08 Dec 2024

https://github.com/murilobsd/icrop-csv

Icrop-csv para automatizar o processo do download dos relatórios.

crawler csv-export python3

Last synced: 28 Dec 2024

https://github.com/artemnikitin/crawler

Example of web crawler implemented in Go

crawler go golang

Last synced: 08 Jan 2025

https://github.com/kodemartin/webcrawler

A simple webcrawler

crawler rust

Last synced: 25 Jan 2025

https://github.com/phatpham9/scraper.fun

Building, using & sharing HTML scraper are way funnier!

crawler html-scraper scraper

Last synced: 02 Dec 2024

https://github.com/nemmusu/free-vpn-downloader

This repository contains three Python scripts designed to simplify the process of downloading and configuring free VPN .ovpn files for use with OpenVPN.

automation crawler download downloader free freevpn openvpn ovpn ovpn-files vpn

Last synced: 02 Dec 2024

https://github.com/cak/foot

Foot is a library that fetches a list of URLs and silly walks through each site to gather information.

bugbounty crawler scraping

Last synced: 14 Jan 2025

https://github.com/surister/scrupy

Python library to create web Crawlers which aims to be powerful yet simple.

crawler crawling-framework crawling-python http library python scraping

Last synced: 18 Jan 2025

https://github.com/namchee/hackerbits

Web Crawler dan Clustering pada website HackerNews.

clustering crawler python3

Last synced: 02 Dec 2024

https://github.com/jamesjarvis/web-graph

Experiment with web scraping

colly crawler database golang web-graph

Last synced: 02 Dec 2024

https://github.com/nextlevelshit/adonis-crawler

A free web crawler on top of the incredibile AdonisJS Framework

adonisjs crawler javascript nodejs regex spider websocket

Last synced: 20 Jan 2025

https://github.com/danielemoraschi/sitemap-common

Simple PHP Sitemap generator and crawler library.

crawler php php-library php-sitemap-generator sitemap

Last synced: 31 Dec 2024

https://github.com/danielemoraschi/sitemap-app

Sitemap generator command line application using dmoraschi/sitemap-common library

crawler php php-library sitemap sitemap-generator

Last synced: 31 Dec 2024

https://github.com/hileix/jjxy-lib-search

图书馆书籍查询爬虫工具

crawler expressjs nodejs phantomjs request

Last synced: 28 Nov 2024

https://github.com/timzatko/fiit-vinf-1

School project - data crawling, storing using ElasticSearch and visualisation.

angular crawler elasticsearch

Last synced: 16 Dec 2024

https://github.com/kofj/octopus

Octopus an open source software to collect data from web pages.

crawler

Last synced: 28 Nov 2024

https://github.com/zahraarshia/cti_crawl

This cyber threat intelligence crawler can be used to gather information from various sources, including open-source and commercial feeds.

crawler cti cyber-news-bot cyber-threat-intelligence mongodb python scrapy sqlite3 web-scraper

Last synced: 09 Jan 2025

https://github.com/sgeisler/fishbones2epub

fetches the fishbones novel and outputs an epub

crawler ebook epub python-3-6

Last synced: 28 Nov 2024