Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).

https://github.com/skylightqp/namu2csv

A namuwiki crawler that converts header to csv file for kartrider wiki

crawler rust

Last synced: 08 Dec 2024

https://github.com/bkdev98/ebooks-crawler

Ebooks crawler for personal purpose using ReactJS.

crawler material-ui nodejs reactjs

Last synced: 01 Jan 2025

https://github.com/dimo414/pycrawl

Simple Python web crawler, primarily designed for inspecting and diagnosing your own website

crawler python

Last synced: 18 Dec 2024

https://github.com/scrwdrv/siege-crawler

This CLI tool will find same domain urls in a web page and requesting them to find even more urls until server crash (or at the end of benchmark). It is used to test maximun capacity of server or finding for glitches that users might encounter.

benchmark cli crawler ddos debug siege tool

Last synced: 18 Dec 2024

https://github.com/vietdoo/sg-property-hub

SG Property Hub is a comprehensive platform for managing and analyzing property data.

airflow celery-redis crawler etl etl-pipeline fastapi minio mongodb nextjs postgresql s3 spark webscraping

Last synced: 13 Dec 2024

https://github.com/codelegant/movie-crawler-api

淘宝,猫眼,格瓦拉影票信息抓取接口

async await crawler mongoose request

Last synced: 18 Dec 2024

https://github.com/yjg30737/pyqt-wikipedia-crawler

Crawling the Wikipedia with Python powered by BeautifulSoup4, Supporting GUI/CUI

beautifulsoup4 crawler pyqt pyqt5 wikipedia

Last synced: 03 Jan 2025

https://github.com/vindecodex/automated-crawler-wget

Using wget to crawl site

crawler shell-script

Last synced: 01 Jan 2025

https://github.com/akashrajpurohit/node-crawler

Nodejs Crawler which scrapes a website on live domain and crawls to find all URL of the domain

crawler node-crawler nodejs url

Last synced: 25 Dec 2024

https://github.com/lukasherz/22fs-sc-twitter-crawler

used for a research project in social computing @ uzh (fs22)

crawler crawling database twitter twitter-api-v2

Last synced: 25 Dec 2024

https://github.com/tubone24/askfm-qa-crawler

Crawl Ask.fm QA lists and create corpus for ML.

askfm chromedriver corpus-builder crawler selenium

Last synced: 25 Dec 2024

https://github.com/tranbavinhson/crawler

Crawler by Scrapy

crawler python scrapy

Last synced: 26 Dec 2024

https://github.com/0xpr03/clantool

CF Management & Data Analysis Tool, crawler backend in rust

backend-server crawler data-analysis rust

Last synced: 02 Jan 2025

https://github.com/40uf411/sillybot

SillyBot is a wrapper for the selenium library

bot crawler python scraper selenium web wrapper

Last synced: 19 Dec 2024

https://github.com/piopi/behatcrawler

A Behat extension that crawls links on a website and executes user-defined function on each one of them.

behat behat-extension crawler php selenium-webdriver

Last synced: 19 Dec 2024

https://github.com/f-ca7/movie-cat

A website displaying movies

crawler golang website

Last synced: 03 Jan 2025

https://github.com/cseas/shares-monitor

Web crawler to fetch and monitor shares details.

crawler python python3 scraper scraping-websites shares

Last synced: 27 Dec 2024

https://github.com/princed/specht

Check links found in html or js files by pattern

cli crawler html javascript streams

Last synced: 19 Nov 2024

https://github.com/kahsolt/allchan

An image crawler for xChan(4chan/8ch/...) image board.

4chan 4chan-downloader 8chan crawler image-crawler

Last synced: 03 Jan 2025

https://github.com/buren/stupid_crawler

Stupid crawler that looks for URLs on a given site

cli crawler ruby rubygem

Last synced: 12 Oct 2024

https://github.com/jfcherng/wiki-cgroup-crawler

此腳本用於抓取維基百科的公共轉換組詞庫,並將結果儲存為外部檔案。

crawler php-71 wiki-cgroup-crawler wikipedia

Last synced: 28 Sep 2024

https://github.com/dylanhogg/cloud-products

A package for getting cloud products and product descriptions from a cloud provider website.

aws cloud-products crawler data text-processing

Last synced: 10 Jan 2025

https://github.com/beomi/pycon2017

2017 파이콘 발표자료: <처음부터 알아보는 웹 크롤러>

crawler pyconkr python

Last synced: 11 Jan 2025

https://github.com/dean9703111/humandesign_nodejs

用nodejs爬蟲工具將人類圖網頁上的資訊爬下來,再存到雲端的google excel

crawler googlesheetapi googlesheets nodejs

Last synced: 12 Jan 2025

https://github.com/dean9703111/shopee_find_mac

用最快的速度找到便宜符合自己要求規格的mac

argparse crawler mac pip python python2 xlsxwriter

Last synced: 12 Jan 2025

https://github.com/songjiayang/china_repos

github repo 爬虫

china crawler statistics

Last synced: 11 Jan 2025

https://github.com/soulyma/web_crawler

A focused web crawler to extract and structure Arabic content from web pages. Designed for researchers, data analysts, and developers working on Arabic language datasets.

beautifulsoup4 crawler csv data json python structured-data

Last synced: 13 Dec 2024

https://github.com/dizys/weibo-crawler

A nodejs weibo crawler

crawler nodejs typescript weibo-spider

Last synced: 27 Dec 2024

https://github.com/bitscoper/bitscoper_crawler

Crawls the titles of webpages in series by number and creates a list of the available links.

crawler lister

Last synced: 05 Dec 2024

https://github.com/anjackson/scrapy-url-frontier

A Scrapy module for URL Frontier integration

crawler frontier scrapy spider

Last synced: 05 Jan 2025

https://github.com/linux0hat/cpp-web-crawler

Explore the web.

cpp crawler sqlite3

Last synced: 12 Jan 2025

https://github.com/zabuzard/wslotter

WSlotter is a Selenium driven tool for assigning to events on 'https://www.gruppe-w.de'.

bot crawler gruppe-w

Last synced: 12 Jan 2025

https://github.com/karantyagi/web-crawler

BFS and DFS implementations for a wikipedia crawler

beautifulsoup crawler

Last synced: 12 Jan 2025

https://github.com/par7133/splash-bot-crawler

Splash Bot creates splash on the fly of your websites - GPL License 🔥

bot crawler gallery open-source opensource php splash

Last synced: 12 Jan 2025

https://github.com/longluo/spider

My Python Spider / Crawler

crawler python spider twitter weibo weibo-crawler weibo-spider

Last synced: 06 Jan 2025

https://github.com/rsheremeta/web-crawler

A tiny web-crawler which looks for the links, extract and prints them concurrently to the Terminal output

crawler go golang web-crawler webcrawler

Last synced: 09 Jan 2025

https://github.com/oleksandr-moik/spring-boot-web-crawler

Web Crawler app on Spring Boot. Getting categories and relevant news category.

crawler gradle java spring-boot

Last synced: 08 Dec 2024

https://github.com/waived/google-drive-crawler

Proxy-based crawler to expose public (shared) Google Drive links

crawler crawler-python file-crawler google-drive-api shared-folders web-spider

Last synced: 05 Dec 2024

https://github.com/tylpk1216/new-taipei-parkinfo

Find the available parking in New Taipei, Taiwan.

crawler golang goverment-data

Last synced: 27 Nov 2024

https://github.com/tylpk1216/favorite-youtube-to-video

Download your favorite youtube video in PHP

crawler php tool youtube

Last synced: 27 Nov 2024

https://github.com/hoan02/novel-crawler

Tool cào dữ liệu truyện để phục vụ cho doctruyen.io.vn

crawler python

Last synced: 19 Nov 2024

https://github.com/genfuture/cryptocurrency-scraper

Cryptocurrency Data Crawler 🚀 High-performance Node.js crawler that fetches comprehensive data for 1500+ cryptocurrencies from CoinGecko API. Collects market data, social metrics, and blockchain details with built-in rate limiting and resume capability. Perfect for crypto analysis, research, and building market intelligence tools.

binance bitcoin bsc coingecko coingecko-api crawler crypto-bot cryptocurrencies cryptocurrency ethereum scraper

Last synced: 16 Nov 2024

https://github.com/khanof89/twitter_scraper

Scrape tweet details from user profile using selenium

crawler scraper selenium twitter twitter-bot

Last synced: 10 Jan 2025

https://github.com/ayoubzulfiqar/spidy

The DART Libraray for Data Crawling & Scrapping

crawler dart flutter scraper scraping spider

Last synced: 03 Jan 2025

https://github.com/johanbook/node-web-crawler

Nodejs CLI for web crawling

cli crawler nodejs typescript

Last synced: 16 Nov 2024

https://github.com/pourmand1376/crawler

Simple Crawler, Indexer and Search Engine Web Application

crawler csharp csharp-code dotnet mvc

Last synced: 14 Jan 2025

https://github.com/rayspock/go-web-crawler

A web crawler to fetch all the links from a given website via go routines.

concurrency crawler golang goroutine

Last synced: 14 Jan 2025

https://github.com/arman2409/datafalcon

Web crawler

crawler extract-data

Last synced: 15 Dec 2024

https://github.com/phatpham9/scraper.fun

Building, using & sharing HTML scraper are way funnier!

crawler html-scraper scraper

Last synced: 02 Dec 2024

https://github.com/iamtonmoy0/sitemap-crawler

site map crawler with golang and goquery

crawler

Last synced: 05 Jan 2025

https://github.com/nextlevelshit/adonis-crawler

A free web crawler on top of the incredibile AdonisJS Framework

adonisjs crawler javascript nodejs regex spider websocket

Last synced: 19 Nov 2024

https://github.com/nextlevelshit/node-crawl

Webcrawler for nodejs

crawl crawler javascript nodejs

Last synced: 19 Nov 2024

https://github.com/zenoyang/webcrawler

一些爬虫代码

crawler scrapy spider web-crawler

Last synced: 16 Nov 2024

https://github.com/freakwill/mycrawlers

🕷 My Crawlers for Movies、Information、Encyclopedia...

baidu crawler douban movie quotes taobao

Last synced: 28 Nov 2024

https://github.com/nemmusu/free-vpn-downloader

This repository contains three Python scripts designed to simplify the process of downloading and configuring free VPN .ovpn files for use with OpenVPN.

automation crawler download downloader free freevpn openvpn ovpn ovpn-files vpn

Last synced: 02 Dec 2024

https://github.com/leegeunhyeok/python-gongucrawler

파이썬3 공유마당 이미지 및 상세정보 크롤러

crawler python

Last synced: 22 Dec 2024

https://github.com/igorbrizack/web-scraper

Web-Scraper aplication

crawler pytest python3 scraper

Last synced: 28 Nov 2024

https://github.com/igorbrizack/crawler-web

Aplicação de coleta de dados Web com ReactJS e Python - API Rest

beautifulsoup crawler docker fastapi mongodb nodejs python3 react scraper

Last synced: 28 Nov 2024

https://github.com/jul10l1r4/objetive

This software is a mini-crawler that aims to grab some text parts from some website or ip that responds http*

bigdata crawler data-science security-tools web

Last synced: 19 Nov 2024

https://github.com/lulurun/kick-off-crawling

make web scraping easy

crawler nodejs scraper

Last synced: 26 Dec 2024

https://github.com/namchee/hackerbits

Web Crawler dan Clustering pada website HackerNews.

clustering crawler python3

Last synced: 02 Dec 2024

https://github.com/georgynet/crawler

Web Crawler

crawler go golang web-crawler

Last synced: 04 Jan 2025

https://github.com/jamesjarvis/web-graph

Experiment with web scraping

colly crawler database golang web-graph

Last synced: 02 Dec 2024

https://github.com/isaqueveras/scrape-google-results

Scrape Google Results in Golang

crawler golang google scraper webcrawler

Last synced: 28 Nov 2024

https://github.com/xoraus/revieworacle

The proposed system assists users in deciding which product to buy. It gathers reviews along with the details from multiple websites, which sell the product. Other than that the system is trained to analyze the polarity of the product.

ai crawler datascience machinelearning scrappy selenium-webdriver

Last synced: 13 Jan 2025

https://github.com/tryagi/firecrawl

Generated C# SDK based on official Firecrawl OpenAPI specification

ai crawler crawling dotnet firecrawl generated generator langchain langchain-dotnet net8 netframework netstandard openapi scrape scraping sdk

Last synced: 14 Oct 2024

https://github.com/kestarumper/imagecrawler

Downloads images from given URL

crawler image-downloader

Last synced: 06 Jan 2025

https://github.com/ronniery/crawler.synom

A crawler for the sinonimo.com.br website that saves the words into mongodb database.

bot crawler html html5 javascript mongodb nodejs nosql npm scraper thesaurus typescript web website xml

Last synced: 21 Dec 2024

https://github.com/cak/foot

Foot is a library that fetches a list of URLs and silly walks through each site to gather information.

bugbounty crawler scraping

Last synced: 14 Jan 2025

https://github.com/shaoxiongdu/skyeye

一个基于SpringBoot的全网热点爬虫项目,原始热搜数据会入库,分词统计会存入Redis。方便之后的数据分析。

crawler crawlers mysql redis spring spring-boot

Last synced: 16 Nov 2024

https://github.com/lin-jun-xiang/python-crawler

Using CloudScraper, Requests, API, Thread, Async... for scrape the data

async cloudscraper crawler multithreading python requests scraper selenium

Last synced: 21 Dec 2024

https://github.com/danielemoraschi/sitemap-common

Simple PHP Sitemap generator and crawler library.

crawler php php-library php-sitemap-generator sitemap

Last synced: 31 Dec 2024

https://github.com/danielemoraschi/sitemap-app

Sitemap generator command line application using dmoraschi/sitemap-common library

crawler php php-library sitemap sitemap-generator

Last synced: 31 Dec 2024

https://github.com/artemnikitin/crawler

Example of web crawler implemented in Go

crawler go golang

Last synced: 08 Jan 2025

https://github.com/hileix/jjxy-lib-search

图书馆书籍查询爬虫工具

crawler expressjs nodejs phantomjs request

Last synced: 28 Nov 2024

https://github.com/cold-bin/jwzx-mail

use golang to construct cqupt-jwzx crawler application

crawler golang

Last synced: 11 Jan 2025

https://github.com/sssshefer/web-crawler-http

Basic web crawler which represents the linking structure of the website

crawler jest jest-tests js

Last synced: 12 Jan 2025

https://github.com/pvital/cra-cra

Another web crawler

crawler python

Last synced: 11 Jan 2025

https://github.com/murilobsd/icrop-csv

Icrop-csv para automatizar o processo do download dos relatórios.

crawler csv-export python3

Last synced: 28 Dec 2024

https://github.com/timzatko/fiit-vinf-1

School project - data crawling, storing using ElasticSearch and visualisation.

angular crawler elasticsearch

Last synced: 16 Dec 2024

https://github.com/kofj/octopus

Octopus an open source software to collect data from web pages.

crawler

Last synced: 28 Nov 2024

https://github.com/iomarmochtar/imagecrawler

Simple image crawler by follow the links recursively, no dependency needed, for python 2.7+

crawler python-library

Last synced: 25 Dec 2024