Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).

https://github.com/bradsec/gofindfiles

Crawls websites to find and download files with matching file types. For use as an OSINT or recon intelligence-collection tool.

crawler osint osint-tool recon scraper web-scraper

Last synced: 07 Jan 2025

https://github.com/leegeunhyeok/python-gongucrawler

Python 3 crawler for Gongu Madang (공유마당) images and their details

crawler python

Last synced: 22 Dec 2024

https://github.com/iamtonmoy0/sitemap-crawler

Sitemap crawler written in Go with goquery

crawler

Last synced: 05 Jan 2025

https://github.com/hoan02/novel-crawler

Tool for scraping story data to serve doctruyen.io.vn

crawler python

Last synced: 19 Nov 2024

https://github.com/jul10l1r4/objetive

This software is a mini-crawler that grabs selected pieces of text from a website or IP address that responds over http*

bigdata crawler data-science security-tools web

Last synced: 19 Nov 2024

https://github.com/lin-jun-xiang/python-crawler

Using CloudScraper, Requests, API, Thread, Async... to scrape data

async cloudscraper crawler multithreading python requests scraper selenium

Last synced: 21 Dec 2024

https://github.com/phatpham9/scraper.fun

Building, using & sharing HTML scrapers is way more fun!

crawler html-scraper scraper

Last synced: 02 Dec 2024

https://github.com/dominikrys/web-scraper

🎬 IMDB Web Scraper in Go

crawler go mongodb

Last synced: 10 Jan 2025

https://github.com/flaribbit/pixiv-favorites-list

Crawls Pixiv favorites and saves them in JSON format

crawler pixiv python

Last synced: 29 Nov 2024

https://github.com/mnemocron/VPNNetworkShareCrawler

Ugly scripts to connect a Raspberry Pi to a VPN and attach a network share in order to periodically crawl the documents on it

crawler samba vpn

Last synced: 23 Oct 2024

https://github.com/yosh1/mio-crawler

A crawler that retrieves IIJmio data usage.

crawler iijmio mio ruby

Last synced: 13 Nov 2024

https://github.com/longluo/spider

My Python Spider / Crawler

crawler python spider twitter weibo weibo-crawler weibo-spider

Last synced: 06 Jan 2025

https://github.com/cak/foot

Foot is a library that fetches a list of URLs and silly walks through each site to gather information.

bugbounty crawler scraping

Last synced: 14 Nov 2024

https://github.com/freakwill/mycrawlers

🕷 My Crawlers for Movies, Information, Encyclopedia...

baidu crawler douban movie quotes taobao

Last synced: 28 Nov 2024

https://github.com/nemmusu/free-vpn-downloader

This repository contains three Python scripts designed to simplify the process of downloading and configuring free VPN .ovpn files for use with OpenVPN.

automation crawler download downloader free freevpn openvpn ovpn ovpn-files vpn

Last synced: 02 Dec 2024

https://github.com/tpeterw/summariser

Summarizer for PDF and text-based uploads

crawler hackathon nlp node nodejs python

Last synced: 08 Jan 2025

https://github.com/mattmoony/webcrawler.py

A very simple Python web crawler. This is just a fun little side project, which I used to gather some valuable experience with advanced Python and web techniques. 🐍

beautifulsoup crawler indexing mongodb multithreading pymongo python spider web webcrawler

Last synced: 05 Jan 2025

https://github.com/igorbrizack/web-scraper

Web-Scraper application

crawler pytest python3 scraper

Last synced: 28 Nov 2024

https://github.com/igorbrizack/crawler-web

Web data collection application with ReactJS and Python - REST API

beautifulsoup crawler docker fastapi mongodb nodejs python3 react scraper

Last synced: 28 Nov 2024

https://github.com/berecat/selenium_facebook_scraper

A simple Python 3 script used to download a user's friend list from Facebook.

automation crawler facebook facebook-scraper webscraper

Last synced: 08 Jan 2025

https://github.com/matheusfelipeog/google-doodles

Map and download Google Doodles.

crawler google google-doodle python web-scraping

Last synced: 12 Dec 2024

https://github.com/kevincolemaninc/mm-crawler

Scrapes meetme user profiles

crawler docker fake-data meetme ruby scraper sidekiq

Last synced: 01 Jan 2025

https://github.com/ronniery/crawler.synom

A crawler for the sinonimo.com.br website that saves the words into a MongoDB database.

bot crawler html html5 javascript mongodb nodejs nosql npm scraper thesaurus typescript web website xml

Last synced: 21 Dec 2024

https://github.com/namchee/hackerbits

Web crawler and clustering for the HackerNews website.

clustering crawler python3

Last synced: 02 Dec 2024

https://github.com/bing-su/arcalive-crawler-python

Arca Live (아카라이브) crawler

crawler python

Last synced: 02 Jan 2025

https://github.com/jamesjarvis/web-graph

Experiment with web scraping

colly crawler database golang web-graph

Last synced: 02 Dec 2024

https://github.com/isaqueveras/scrape-google-results

Scrape Google Results in Golang

crawler golang google scraper webcrawler

Last synced: 28 Nov 2024

https://github.com/smikodanic/dex8-sdk

DEX8 SDK is a software development kit for the DEX8.com platform.

crawler crawler-engine data-extraction dex8 scraper scraping-websites spider

Last synced: 26 Dec 2024

https://github.com/marcosvbras/twitton

A simple Python library to make the Twitter Search API easy to use

crawler crawling python spider twitter twitter-api

Last synced: 05 Dec 2024

https://github.com/lesterrry/mutt

More Usable Time Tracker

crawler ios-calendar parser

Last synced: 07 Jan 2025

https://github.com/arman-aminian/divar-text-exploring

The first practice of Dr. Asgari's NLP lesson - Data Exploration

crawler natural-language-processing nlp preprocessing scrapy

Last synced: 08 Jan 2025

https://github.com/terminaldweller/crawley

A creepy crawler that runs as a sleepy daemon.

crawler daemon python3

Last synced: 26 Dec 2024

https://github.com/danielemoraschi/sitemap-common

Simple PHP Sitemap generator and crawler library.

crawler php php-library php-sitemap-generator sitemap

Last synced: 31 Dec 2024

https://github.com/danielemoraschi/sitemap-app

Sitemap generator command line application using dmoraschi/sitemap-common library

crawler php php-library sitemap sitemap-generator

Last synced: 31 Dec 2024

https://github.com/hileix/jjxy-lib-search

Crawler tool for searching library books

crawler expressjs nodejs phantomjs request

Last synced: 28 Nov 2024

https://github.com/pmuens/crawler

Multi-threaded Web crawler with support for custom fetching and persisting logic

crawler crawler-engine rust rust-lang web-crawler web-crawling

Last synced: 26 Dec 2024

https://github.com/dalthviz/csapp

Crawler-scraper for the Play Store

crawler csapp keyword nlp playstore rating review scrapper

Last synced: 13 Nov 2024

https://github.com/kernelerr/pixivurls

An awesome tool to get Pixiv image URLs.

crawler downloader pixiv

Last synced: 19 Nov 2024

https://github.com/gnehs/twse-financial-ratios-crawler

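Automatically fetches financial ratio information from the Market Observation Post System (公開資訊觀測站) for a specified list of stock symbols, and automatically calculates the averages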

crawler nodejs

Last synced: 26 Dec 2024

https://github.com/timzatko/fiit-vinf-1

School project - data crawling, storing using ElasticSearch and visualisation.

angular crawler elasticsearch

Last synced: 16 Dec 2024

https://github.com/kofj/octopus

Octopus is open source software for collecting data from web pages.

crawler

Last synced: 28 Nov 2024

https://github.com/allancapistrano/steam.py

An API wrapper for Steam written in Python.

crawler python steam

Last synced: 22 Nov 2024

https://github.com/shamsher31/crawler

Simple site crawler that extracts all the URL links from the given website

crawler

Last synced: 13 Nov 2024

https://github.com/sgeisler/fishbones2epub

fetches the fishbones novel and outputs an epub

crawler ebook epub python-3-6

Last synced: 28 Nov 2024

https://github.com/somehowchris/swisslos-cralwer

(WIP) Crawler to access the current and historical numbers of Swisslos

crawler euromillions lotto rust swisslos

Last synced: 28 Nov 2024

https://github.com/noarche/darknoisy

Same as my Noisy, but on the Tor network. Logs links. Crawls onion sites.

crawler crawling onion-domains onion-services onion-sites onions-list python python-script python3 tor torsocks

Last synced: 03 Dec 2024

https://github.com/willi-dev/dtcapp

dtcapp : distributed twitter crawler.

crawler distributed-systems hazelcast java twitter twitter-api

Last synced: 14 Nov 2024

https://github.com/landrisek/contentbot

Create simple content (discussion posts and product descriptions) from previously used data, or crawl it from public data.

content crawler golang php php72

Last synced: 13 Nov 2024

https://github.com/ryu1kn/procedural-page-crawler

Page Crawler. Tell it where to go and what to look for.

crawler npm-package scraper

Last synced: 08 Dec 2024

https://github.com/martinius96/web-scraper

Web scraper on ESP8266 board in client mode. Postprocessing in PHP with regular expressions.

arduino bot code crawler esp32 esp8266 html mysql php php7 robot scraper source web

Last synced: 03 Jan 2025

https://github.com/russellsteadman/netscrape

A Node.js framework for creating good bots

bot crawler crawling exclusion rfc9309 scraper scraping web-scraping

Last synced: 03 Jan 2025

https://github.com/allancapistrano/anime-sheets

Crawler that fetches anime information and saves it to a spreadsheet.

anime crawler google-sheets google-sheets-api

Last synced: 22 Nov 2024

https://github.com/brnrajoriya/india-s-states-and-cities-crawler

Crawler for all of India's states and cities

cities crawler india php script states

Last synced: 15 Nov 2024

https://github.com/d7isme/pixiv-downloader-mod

Modded extension of the pixiv downloader on chrome webstore with premium feature unlocked.

chrome-extension crawler extension-chrome image pem pixiv pixiv-bot pixiv-crawler pixiv-downloader

Last synced: 09 Jan 2025

https://github.com/bramtenhove/issue-crawler

Crawls Drupal issues and keeps stats

crawler

Last synced: 29 Dec 2024

https://github.com/pjt3591oo/spider-base_crawler

Building a Scrapy-based crawler

crawler python scrapy spider

Last synced: 26 Dec 2024

https://github.com/igor-karpukhin/web-crawler

Web site crawler

crawler go website

Last synced: 09 Dec 2024

https://github.com/marceloneppel/crawler

Simple web crawler developed in Go.

crawler go golang web-crawler

Last synced: 03 Dec 2024

https://github.com/palpitate-xus/sge_data_insert

Uses GitHub Actions to automatically fetch SGE data and store it in a database

crawler mysql python

Last synced: 16 Dec 2024

https://github.com/ceylonai/apps-article-reader

📚 A powerful desktop app that extracts and analyzes web content using LLaMA AI. Features real-time processing, keyword extraction, and smart summarization. Built with Python + Tkinter.

ai crawler gpt ollama openai

Last synced: 03 Dec 2024

https://github.com/massongit/ibaraki-univ-circle-crawler

Crawls the official circles at Ibaraki University from the university's website

crawler python

Last synced: 03 Dec 2024

https://github.com/devindon/movie-crawler

Movie crawler for douban.com, pianku.tv, etc.

crawler nodejs typescript

Last synced: 06 Dec 2024

https://github.com/xiangronglin/novel2go

Android app to create pdf from website and send to your kindle

android crawler jetpack kotlin pdf-generation readability

Last synced: 21 Dec 2024

https://github.com/kahsolt/qzone_mood_dumper

Dump your Qzone mood (说说) history to local SQL database storage

crawler dumper qzone-mood

Last synced: 03 Jan 2025

https://github.com/sssshefer/web-crawler-http

Basic web crawler which represents the linking structure of the website

crawler jest jest-tests js

Last synced: 13 Nov 2024

https://github.com/zfael/scrape-it-all

Modular web scraper for Node.JS

crawler scraper scraping scraping-websites web-scraping

Last synced: 23 Dec 2024

https://github.com/filipsedivy/tachometer-check

🚘 MDČR - tachometer check

crawler czech-republic mdcr

Last synced: 23 Dec 2024

https://github.com/ronierisonmaciel/crawler

A crawler built with BeautifulSoup that aims to extract information from websites in an efficient, structured way. BeautifulSoup is a Python library that simplifies parsing and extracting data from HTML and XML pages. The project makes it possible to collect and organize relevant information.

beautifulsoup4 crawler crawling python python3

Last synced: 03 Dec 2024

https://github.com/alancesar/crawler

HTML crawler

crawler docker spider

Last synced: 03 Dec 2024

https://github.com/s3rgeym/wscrap

Command line web scraping tool.

crawler scraping

Last synced: 23 Dec 2024

https://github.com/davelongdev/link-report-crawler

A web crawler using Node.js that crawls a site and returns a report showing all internal links.

crawler crawling javascript seo seo-tools

Last synced: 02 Jan 2025

https://github.com/serge45/pytwgasprices

APIs to fetch the latest Taiwan gas prices

crawler gas price python taiwan

Last synced: 14 Nov 2024

https://github.com/sc0vu/gocrawl

Simple crawl for golang

crawler golang

Last synced: 02 Dec 2024

https://github.com/raspi/scrapy-vgmusic

Crawler for vgmusic web site

crawler game midi music python scrapy spider

Last synced: 08 Jan 2025

https://github.com/aminehsan/datamining-divar.ir

Analyzing and Extracting Insights from Ads on 'divar.ir'

crawler data-mining data-science divar-ir scraping

Last synced: 04 Dec 2024

https://github.com/allotmentandy/socialmedialinkextractor

PHP Laravel package that extracts social media links from an array of links; used as part of a spider for checking londinium.com website links

crawler extractor facebook laravel linked-list php social social-network spider twitter url youtube

Last synced: 23 Dec 2024

https://github.com/truongdd03/searchengine

A search engine written in C++.

cpp crawler search search-engine

Last synced: 20 Dec 2024

https://github.com/kehiy/prawler

Pactus P2P Network Crawler

crawler crawling metrics networking p2p pactus

Last synced: 28 Dec 2024

https://github.com/sahaavi/web-scraping

Learn Web-Scraping using BeautifulSoup, Selenium and Scrapy with hands on projects!

beautifulsoup4 crawler headless-mode pagination scrapy selenium spider splash web-scraper web-scraping

Last synced: 26 Dec 2024

https://github.com/amazingcoderpro/pythonup

Master Python! For improving Python skills

crawler python

Last synced: 30 Nov 2024

https://github.com/jofaval/open-graph-visualizer

Web scraping showcase of how crawlers retrieve a site's details through the Open Graph Protocol

crawler javascript opengraph scraping web web-scraping

Last synced: 09 Dec 2024

https://github.com/thejoin95/free-proxies.info

API service for getting anonymous and non-anonymous proxies, filterable by latency, country, update time, and more

api crawler http-proxy proxy proxy-list python scraper

Last synced: 06 Jan 2025

https://github.com/mawkler/go-web-crawler

Toy web server written in Go

crawler go

Last synced: 04 Dec 2024

https://github.com/rutopio/crawler-cpbl-player-data

Crawls and organizes player data from the official website of the CPBL (Chinese Professional Baseball League).

cpbl crawler crawling python

Last synced: 04 Dec 2024

https://github.com/rutopio/crawler-2020-taiwanese-election-results

Crawler for the 2020 Taiwanese election results, using the at-large party-list vote as an example

crawler python

Last synced: 04 Dec 2024

https://github.com/huakunshen/cron-crawler-template

Web crawler cron job template that runs on GitHub Actions. Capable of sending email notifications.

crawler github-actions python

Last synced: 16 Nov 2024