Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).

https://github.com/vaibhavyadav-dev/codeforces-problemset-scrapper

Web Scrapper that scrap the whole problemset of Codeforces into csv or json file.

codeforces competative competative-programming crawler problemset programming python scrapy-crawler scrapy-spider

Last synced: 15 Nov 2024

https://github.com/jnbdz/xtamia-crawler

(!!!Still being built!!!) An open-source web crawler build on Electron for Windows, Mac OS X, and Linux

crawler electron foundation foundation-css javascript scraper vuejs xtamia

Last synced: 12 Nov 2024

https://github.com/pyohei/rirakkuma-crawller

Crawler for my hobby.🐻

crawler python rirakkuma

Last synced: 07 Nov 2024

https://github.com/terminaldweller/crawley

A creepy crawler that runs as a sleepy daemon.

crawler daemon python3

Last synced: 06 Nov 2024

https://github.com/lulurun/kick-off-crawling

make web scraping easy

crawler nodejs scraper

Last synced: 06 Nov 2024

https://github.com/spaceemotion/goodreads-browser

Custom crawler + interface to have better filtering and sorting of the goodreads database 📚🔍

books crawler goodreads

Last synced: 06 Nov 2024

https://github.com/mattmoony/webcrawler.py

A very simple python webcrawler. This is just a fun little side project, which I used to gather some valuable experience with advanced Python- and Web techniques. 🐍

beautifulsoup crawler indexing mongodb multithreading pymongo python spider web webcrawler

Last synced: 18 Nov 2024

https://github.com/sandrewtx08/gearbest_scraper

Seeks catalog ads from Gearbest web page, scraping catalogs information then it's storing by a sequence of SQL commands through a relational database.

crawler gearbest lxml python scraper scraping sqlite3

Last synced: 11 Nov 2024

https://github.com/mindfiredigital/deepscanbot

It allows you to crawl websites with various configurations, including crawl depth, timeout settings, proxy support, and output options.

bot crawl crawler go golang google webcrawler

Last synced: 07 Nov 2024

https://github.com/notreeceharris/webstalker

🕸 A Powerful Relational Web Crawler

crawler datamining webcrawler

Last synced: 14 Nov 2024

https://github.com/matheusfaustino/phrawl

Phrawl: A web crawling framework in PHP (or it seems so)

crawler crawling crawling-framework php scraper wip

Last synced: 07 Nov 2024

https://github.com/matheusfaustino/jazzmaster_crawler

It is a crawling for getting the audio programs from a specific radio program called Jazzmaster

crawler python scrapy

Last synced: 07 Nov 2024

https://github.com/johanbook/node-web-crawler

Nodejs CLI for web crawling

cli crawler nodejs typescript

Last synced: 16 Nov 2024

https://github.com/murilobsd/icrop-csv

Icrop-csv para automatizar o processo do download dos relatórios.

crawler csv-export python3

Last synced: 07 Nov 2024

https://github.com/iomarmochtar/imagecrawler

Simple image crawler by follow the links recursively, no dependency needed, for python 2.7+

crawler python-library

Last synced: 06 Nov 2024

https://github.com/sanhphanvan96/php-training-crawler

Simple php crawler for training purpose

crawler docker docker-compose nginx php php-fpm

Last synced: 11 Nov 2024

https://github.com/zzzzer91/chinaxinge

chinaxinge 爬虫。

crawler python python3

Last synced: 12 Nov 2024

https://github.com/aristotelesbr/api_quotes

Project test for job.

crawler mongodb rails5

Last synced: 16 Nov 2024

https://github.com/coding-dream/aspider

A spider run on Android Platform

crawler jsoup spider

Last synced: 12 Nov 2024

https://github.com/timzatko/fiit-vinf-1

School project - data crawling, storing using ElasticSearch and visualisation.

angular crawler elasticsearch

Last synced: 28 Oct 2024

https://github.com/zzzzer91/crash

通用多线程爬虫框架。

crawler framework python

Last synced: 12 Nov 2024

https://github.com/zzzzer91/match_spider

某菠菜网站爬虫,该网站已倒闭:disappointed_relieved:

crawler python

Last synced: 12 Nov 2024

https://github.com/estavadormir/scrappist

A web scrapper that takes an URL/URLs and converts into a PDF.

bun cli crawler pdf-generation

Last synced: 12 Nov 2024

https://github.com/igapyon/selecrawler

Simple selenium based web crawler

chrome crawler java selenium web

Last synced: 10 Nov 2024

https://github.com/ri0n/unboxer

MP4 crawler and extractor

crawler extractor mp4 object-oriented-design qt

Last synced: 13 Nov 2024

https://github.com/kevincolemaninc/mm-crawler

Scrapes meetme user profiles

crawler docker fake-data meetme ruby scraper sidekiq

Last synced: 08 Nov 2024

https://github.com/twknab/django_ajax_web_crawler

Web crawler which retrieves all links on any page. Python & Django-powered.

beautifulsoup4 crawler django-application

Last synced: 06 Nov 2024

https://github.com/zawlinnnaing/my-wiki-crawler

A simple program for crawling Burmese wikipedia using Media wiki API.

crawler myanmar-tools python wikipedia-api

Last synced: 06 Nov 2024

https://github.com/limdongjin/bill-scraper

Python3 Scraper / Multiprocessing / ElasticSearch / BeautifulSoup :: 20대 국회 법안 크롤러

crawler python scraper

Last synced: 12 Nov 2024

https://github.com/eklem/vinmonopolet-crawler

Crawling Vinmonopolet-data and indexing it to a norch search index

crawler dataset javascript norch search-engine

Last synced: 15 Oct 2024

https://github.com/bruce-lee-ly/crawler

Several fun crawler cases implemented in Python.

crawler python

Last synced: 15 Nov 2024

https://github.com/jesseokeya/linkedin-scraper

Selenium webDriver used to get information from linkedIn

chromedriver crawler linkedin os python scraper selenium-webdriver

Last synced: 06 Nov 2024

https://github.com/cls1991/gank.io-go

A simple crawler for fetching pictures from http://gank.io, implemented in golang.

crawler gankio goquery pictures

Last synced: 11 Nov 2024

https://github.com/fmind/fincrawl

Crawl documents, metadata, and files from financial institutions

crawler documents finance python scrapy

Last synced: 06 Nov 2024

https://github.com/jjeffcaii/ok-spider

a simple web crawler like scrapy

crawler nodejs scrapy spider

Last synced: 06 Nov 2024

https://github.com/ilovebacteria/digikala-api

This python package requests to Digikala API and gets a product detail.

crawler digikala pypi

Last synced: 14 Nov 2024

https://github.com/huakunshen/cron-crawler-template

Web Crawler Cron Job Template running with GitHub Action. Capable of sending email notifications.

crawler github-actions python

Last synced: 16 Nov 2024

https://github.com/949886/pixiv-crawler

Pixiv illustration info crawler to local MySQL database.

crawler mysql pixiv

Last synced: 07 Nov 2024

https://github.com/kianoushamirpour/crawl_google_scholar_with_selenium_fastapi_mongodb

Crawl google scholar profiles with selenium, store the extracted data in the MongoDB and serve the queries with FastAPI.

crawler fastapi google-scholar mongodb python selenium

Last synced: 06 Nov 2024

https://github.com/ndoolan360/go-crawler

A simple web crawling program written in Go in an afternoon. 🕷️🕸️

afternoon-project crawler scraper

Last synced: 17 Nov 2024

https://github.com/cseas/crawler

Recursive web crawler

crawler python seed-webpage

Last synced: 07 Nov 2024

https://github.com/tomfran/crawler

A web crawler written in Rust

bloom-filter crawler rust simhash

Last synced: 10 Nov 2024

https://github.com/spider-rs/spider-clients

Clients to use with the hosted spider service - spider.cloud

ai ai-agents ai-scraping crawler html-to-markdown llm-webcrawler scraper spider web-scraping

Last synced: 05 Nov 2024

https://github.com/serge45/pytwgasprices

APIs to fetch the latest Taiwan gas prices

crawler gas price python taiwan

Last synced: 14 Nov 2024

https://github.com/s3rgeym/wscrap

Command line web scraping tool.

crawler scraping

Last synced: 05 Nov 2024

https://github.com/zfael/scrape-it-all

Modular web scraper for Node.JS

crawler scraper scraping scraping-websites web-scraping

Last synced: 05 Nov 2024

https://github.com/lillyschramm/spiegel.de-miner

A bot that automatically saves any posts created at Spiegel.de

crawler spiegel-online

Last synced: 08 Nov 2024

https://github.com/stephanebruckert/gocrawl

Crawl every pages and assets of a web domain

crawler python

Last synced: 03 Nov 2024

https://github.com/sssshefer/web-crawler-http

Basic web crawler which represents the linking structure of the website

crawler jest jest-tests js

Last synced: 13 Nov 2024

https://github.com/qqxs/usda_pomological_watercolors

爬取美国农业部果树水彩的数据

crawler koa2 nodejs watercolors

Last synced: 17 Nov 2024

https://github.com/jannchie/go-probe

HTML and JSON data crawler based on Golang. Simple and fast, very easy to use.

collector crawler fetcher golang spider

Last synced: 05 Nov 2024

https://github.com/curegit/nominium

個人間取引サイトの新着商品をメールなどで通知するクローラーシステム

c2c chromium crawler ecommerce firefox selenium shopping webdriver

Last synced: 17 Nov 2024

https://github.com/thejoin95/free-proxies.info

API service for get anonymous and non proxy, filter by latency, country, updatetime and more

api crawler http-proxy proxy proxy-list python scraper

Last synced: 06 Nov 2024

https://github.com/jefftriplett/pholcidae-demo

:spider: A Pholcidae demo for crawling/spidering a website

crawler csv pholcidae python scrapper scrapy-crawler spider toml

Last synced: 11 Nov 2024

https://github.com/brnrajoriya/india-s-states-and-cities-crawler

Crawler to crawl india's all states and cities

cities crawler india php script states

Last synced: 15 Nov 2024

https://github.com/vishaalpkumar/skysift

A distributed search engine from scratch

aws crawler css distributed-systems html java search-engine

Last synced: 05 Nov 2024

https://github.com/arman2409/datafalcon

Web crawler

crawler extract-data

Last synced: 27 Oct 2024

https://github.com/kehiy/prawler

Pactus P2P Network Crawler

crawler crawling metrics networking p2p pactus

Last synced: 07 Nov 2024

https://github.com/frostming/daily-wallpaper

A small crawler to get wallpapers from Unsplash

crawler python requests unsplash wallpaper

Last synced: 13 Oct 2024

https://github.com/iarsham/scrapify

Scrapify is a golang library that automates the process of bypassing CAPTCHAs, enabling efficient web scraping and data acquisition.

403-bypass arkose cloudflare crawler golang http-client scraper

Last synced: 24 Oct 2024

https://github.com/luickk/vulnerability-crawler

Small python program meant to analyze random sites found on google for any vulnerabilities!

crawler xss

Last synced: 07 Nov 2024

https://github.com/landrisek/contentbot

Create simple content (discussion posts and products description) from previously used data or crawl them from public data.

content crawler golang php php72

Last synced: 13 Nov 2024

https://github.com/willi-dev/dtcapp

dtcapp : distributed twitter crawler.

crawler distributed-systems hazelcast java twitter twitter-api

Last synced: 14 Nov 2024

https://github.com/mnemocron/VPNNetworkShareCrawler

ugly scripts to connect a Raspberry Pi to a VPN and attach network share to periodically crawl the documents on it

crawler samba vpn

Last synced: 23 Oct 2024

https://github.com/shamsher31/crawler

Simple site crawler that extracts all the URL links from the given website

crawler

Last synced: 13 Nov 2024

https://github.com/lopins/article-crawler

一个简单的网页文章爬取工具,可以自定义抽取自己所需要的字段内容,简单容易上手。

article crawler ftp mysql python sqlite3

Last synced: 04 Nov 2024

https://github.com/pxlrbt/website-diff

Utility tool that bundles a crawler and BackstopJS for visual regression testing.

backstopjs crawler visual-regression-testing

Last synced: 07 Oct 2024

https://github.com/mohammadrezaamani/squirrel

Squirrel is a web crawler designed to collect all pages from Iranian websites, enabling you to download and store web page content in a structured format.

crawler iran python

Last synced: 04 Nov 2024

https://github.com/abhijeetps/noddler

Web Crawler build using NodeJS

cheerio crawler csv nodejs

Last synced: 27 Oct 2024

https://github.com/bockstaller/europarl-crawler

Crawler for the documents published by the European Parliament

crawler datamining elasticsearch europarl-crawler european european-parliament opendata parliament union

Last synced: 10 Nov 2024

https://github.com/becky-dai/flower-knowledge-graph-visualization

A full stack program of knowledge graph visualization 一个关于知识图谱可视化的全栈项目

crawler css django echarts html js knowledge-graph neo4j python

Last synced: 03 Nov 2024

https://github.com/lin-jun-xiang/python-crawler

Using CloudScraper, Requests, API, Thread, Async... for scrape the data

async cloudscraper crawler multithreading python requests scraper selenium

Last synced: 03 Nov 2024

https://github.com/gnehs/twse-financial-ratios-crawler

透過指定的股票代號清單從公開資訊觀測站自動抓取財務比率資訊,並自動計算平均

crawler nodejs

Last synced: 06 Nov 2024

https://github.com/cak/foot

Foot is a library that fetches a list of URLs and silly walks through each site to gather information.

bugbounty crawler scraping

Last synced: 14 Nov 2024

https://github.com/kimi0230/pstocks

Python 爬股市

crawler numpy pandas python python3 stocks

Last synced: 15 Nov 2024

https://github.com/yosh1/mio-crawler

A crawler that acquires data usage of iijmio .

crawler iijmio mio ruby

Last synced: 13 Nov 2024

https://github.com/briangershon/crawlee-playwright

Browser-based automations with Crawlee and Playwright using Vite tooling and TypeScript

crawlee crawler playwright starter-template typescript vite

Last synced: 02 Nov 2024

https://github.com/wingkwong/daily_weather_temperature_in_hong_kong

Crawling daily weather temperature in Hong Kong

crawler hongkong python temperature

Last synced: 06 Nov 2024

https://github.com/Kissaki/website-downloader

A website Crawler and downloader. Useful for archiving dynamic websites as static files.

archive crawler csharp download gpl website

Last synced: 23 Oct 2024

https://github.com/licoy/win4000-images-crawler

基于scrapy爬取&下载win4000.com的图片壁纸

crawler python scraper

Last synced: 19 Oct 2024

https://github.com/intina47/ee_error

implementation of a web crawler using c++

cpp crawler curl gumbo libcurl stanford-nlp web

Last synced: 15 Oct 2024

https://github.com/kissaki/website-downloader

A website Crawler and downloader. Useful for archiving dynamic websites as static files.

archive crawler csharp download gpl website

Last synced: 26 Oct 2024

https://github.com/shunk031/amebloscraper

Scraper for Ameblo in Scrapy

ameblo crawler scraper scrapy

Last synced: 12 Nov 2024

https://github.com/kenanbek/tutorial-python-crawler

Crawling website data using Python with requests and Beautiful Soup libraries

beautifulsoup crawler crawling miner parser python python-requests requests

Last synced: 23 Oct 2024

https://github.com/mikiw/reactweb3

Ethereum transaction crawler in ReactJs.

blockchain crawler ethereum

Last synced: 12 Nov 2024

https://github.com/dalthviz/csapp

Crawler-Scrapper for the playstore

crawler csapp keyword nlp playstore rating review scrapper

Last synced: 13 Nov 2024

https://github.com/sbstjn/tatort

Query information for upcoming Tatort shows

crawler node nodejs tatort

Last synced: 09 Nov 2024