An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).

https://github.com/pjt3591oo/python-parse

These are modules for URL parsing

crawler

Last synced: 04 Aug 2025

https://github.com/aminehsan/datamining-divar.ir

Analyzing and Extracting Insights from Ads on 'divar.ir'

crawler data-mining data-science divar-ir scraping

Last synced: 25 Jul 2025

https://github.com/basemax/okala-database-crawler

A robust, UTF-8 compliant PHP-based crawler designed to extract structured product data from Okala. This tool efficiently scrapes and saves store information, category slugs, and detailed product listings into organized JSON files. Ideal for data analysis, backup, or integration into other systems.

crawler crawler-php curl data json okala okala-com okalacom php php-crawler scraper

Last synced: 01 May 2026

https://github.com/b3j4y/unidisk

A Crawler to search for keywords and compare the score

comparison crawler nlp solr-client

Last synced: 17 Jan 2026

https://github.com/juangesino/ah-bonus-crawler

React + Express application that crawls Albert Heijn's promotions.

crawler crawling express expressjs headless-chrome nodejs react reactjs

Last synced: 06 May 2026

https://github.com/berecat/selenium_facebook_scraper

A simple Python 3 script used to download a user's friend list from Facebook.

automation crawler facebook facebook-scraper webscraper

Last synced: 24 Jul 2025

https://github.com/marcosvbras/twitton

A simple Python library that makes the Twitter Search API easy to use

crawler crawling python spider twitter twitter-api

Last synced: 27 Mar 2025

https://github.com/claudio-code/nap-web-crawler

A crawler created to find broken links in the docs of frameworks and languages

crawler

Last synced: 07 Jul 2025

https://github.com/semoal/pythoncrawler

Python crawler with XMLRPC & BeautifulSoup

beautifulsoup crawler python wordpress xmlrpc

Last synced: 15 Apr 2026

https://github.com/sc0vu/gocrawl

Simple crawler for Golang

crawler golang

Last synced: 23 Jul 2025

https://github.com/heyihuang826/ncku_course

Efficiently and reliably scrapes course information from National Cheng Kung University on a regular basis (if you choose to store data on OneDrive). The collected data is organized into Excel files and can be automatically uploaded to OneDrive or saved locally (to your personal computer or GitHub repo).

captcha crawler onedrive

Last synced: 01 Mar 2026

https://github.com/jeanluc162/prnt-sc-crawler

Crawler for the Website prnt.sc

crawler net5 net50 prntsc screenshots

Last synced: 07 Jun 2026

https://github.com/yaoshanliang/linkedinspider

Crawl job information from LinkedIn for data analysis

big-data crawler python social-network-analysis

Last synced: 30 Mar 2025

https://github.com/wilmsn/simple_deye_crawler

A simple crawler to get data from the Deye Inverter using the status webpage

crawler deye fhem inverter shell-script

Last synced: 27 May 2026

https://github.com/mccranky83/aistudy-docs-crawler

Crawler for the Shanghai primary and secondary school digital teaching system

crawler hoarding puppeteer

Last synced: 07 Apr 2025

https://github.com/evangelos-karavas/arduino-crawler-line-follower-obstacle-avoidance

Crawler robot that follows a black line while avoiding obstacles in its way. Assignment for Mechatronics.

arduino-uno autonomous-vehicles cpp crawler infrared-sensors mechatronics path-planning robotics

Last synced: 28 Apr 2026

https://github.com/kestarumper/imagecrawler

Downloads images from a given URL

crawler image-downloader

Last synced: 28 Jun 2025

https://github.com/kenanbek/tutorial-python-crawler

Crawling website data using Python with requests and Beautiful Soup libraries

beautifulsoup crawler crawling miner parser python python-requests requests

Last synced: 30 Mar 2025

https://github.com/truongdd03/searchengine

A search engine written in C++.

cpp crawler search search-engine

Last synced: 06 Apr 2025

https://github.com/nyarla/net-paranoid-go

(WIP) Paranoid helpers for crawling untrusted web content

crawler filtering golang helper

Last synced: 14 Jan 2026

https://github.com/btlmd/asahi_nikkei_news_crawler

Crawler for Nikkei (Nihon Keizai Shimbun) and Asahi Shimbun news

crawler

Last synced: 07 Oct 2025

https://github.com/greytabby/grawl

Simple web crawler for learning.

crawler

Last synced: 14 Jan 2026

https://github.com/dnknth/robot.py

Simple web spider

crawler curio python

Last synced: 23 Jul 2025

https://github.com/xoraus/revieworacle

The proposed system helps users decide which product to buy. It gathers reviews and product details from multiple websites that sell the product. In addition, the system is trained to analyze the polarity of the product.

ai crawler datascience machinelearning scrappy selenium-webdriver

Last synced: 07 May 2026

https://github.com/lolyratul025/web-email-bundler

A lightweight Python web crawler that extracts valid email addresses from websites. Features domain-bound crawling, false-positive filtering (@1x.png etc.), proxy support, and polite delays.

crawler cybersecurity-tools email-extractor osint-tool python3 web-scraping

Last synced: 22 May 2026

https://github.com/g-ongenae/morphalou-crawler

A Crawler for CNRTL's Morphologie words

crawler french lexical-databases list-of-words words

Last synced: 25 Feb 2025

https://github.com/liuzhuan/simple-spider

A simple Python web spider.

crawler python python-3

Last synced: 30 Mar 2025

https://github.com/sandrewtx08/gearbest_scraper

Finds catalog ads on the Gearbest web page, scrapes the catalog information, and stores it in a relational database through a sequence of SQL commands.

crawler gearbest lxml python scraper scraping sqlite3

Last synced: 23 Jul 2025

https://github.com/kettou/silentscraper

SilentScraper is a web scraping solution built with advanced stealth protocols. It operates undetectably in the background, bypassing anti-scraping mechanisms to collect structured data at scale. Its lightweight architecture mimics human browsing patterns, rotates IP addresses, spoofs user agents, and more.

beautifulsoup beautifulsoup4 crawler datastructures datastructures-algorithms python webautomation webscraper webscraping

Last synced: 23 Jul 2025

https://github.com/viko16/hatcher

🐣[WIP] Provides APIs by simple configuration.

api api-server cli crawler koa-middleware nodejs spider

Last synced: 08 Oct 2025

https://github.com/fscotto/noahcrawler

A simple web crawler written in Java to support a database of Italian regions.

crawler java jsoup-library

Last synced: 14 Sep 2025

https://github.com/romangw/lukki

Completely free code for a webcrawling bot.

crawler python web-scraping web-scraping-python

Last synced: 08 Oct 2025

https://github.com/killianmeersman/wander

Convenient scraping library for Gophers

crawler data-mining golang scraper spider

Last synced: 14 Jan 2026

https://github.com/artemnikitin/crawler

Example of web crawler implemented in Go

crawler go golang

Last synced: 22 Jun 2025

https://github.com/webdevcave/directory-crawler-php

Directory Crawler PHP is a simple PHP library for recursively crawling through directories and listing files and directories.

crawler crawling directory path php php-library

Last synced: 12 Feb 2026

https://github.com/ryoii/hook

A declarative Java crawler framework

crawler declarative java java-crawler-framework jdk11

Last synced: 18 Mar 2025

https://github.com/patrik-fredon/python_wallpaper_crawler

Wallpaper Crawler is an advanced web scraping tool designed to crawl websites and download high-resolution wallpapers.

crawler crawling-python image image-recognition images python scraping-websites scrapper selenium-python uv

Last synced: 14 Sep 2025

https://github.com/bernieyangmh/check-link

Checks through a whole website, identifying broken links.

checkurl crawler golang

Last synced: 14 Jan 2026

https://github.com/peterbencze/silene

Silene is an open source web crawler framework built upon Pyppeteer.

crawler framework pypp python scraper webcrawler

Last synced: 12 Jan 2026

https://github.com/pyohei/rirakkuma-crawller

Crawler for my hobby.🐻

crawler python rirakkuma

Last synced: 29 Nov 2025

https://github.com/gxjansen/website-to-pdf

Creates a PDF based on the content of a website/subdomain

claude-3-sonnet crawler python3

Last synced: 30 Mar 2025

https://github.com/sgeisler/fishbones2epub

Fetches the Fishbones novel and outputs an EPUB

crawler ebook epub python-3-6

Last synced: 22 Mar 2025

https://github.com/kyungw00k/stealth-wright

Silent browser automation CLI with stealth capabilities

crawler go playwright stealth-automation

Last synced: 31 May 2026

https://github.com/daitangio/find

Python + SQLite search engine

crawler indexer python search-engine

Last synced: 18 Jan 2026

https://github.com/bruce-lee-ly/crawler

Several fun crawler cases implemented in Python.

crawler python

Last synced: 27 Jun 2025

https://github.com/panagiotisptr/codeforces-companion

A codeforces parser, code tester and testcase generator in Go

codeforces-parser competitions crawler go golang parser test-automation testing

Last synced: 14 Jan 2026

https://github.com/namchee/hackerbits

Web crawler and clustering on the HackerNews website.

clustering crawler python3

Last synced: 09 Oct 2025

https://github.com/licoy/win4000-images-crawler

Crawls and downloads image wallpapers from win4000.com, based on Scrapy

crawler python scraper

Last synced: 28 Mar 2025

https://github.com/dappsar/ethglobal-crawler

A web crawler that scrapes and aggregates projects from ETHGlobal hackathons. It collects project details such as title, description, team members, tech stack, and links, providing structured data for analysis, discovery, or integration with other tools.

crawler ethglobal python

Last synced: 09 Oct 2025

https://github.com/mstephen19/apify-click-events

Like TypeScript, but for clicking ;) Manage automated clicks, and ensure your Apify web-crawler is only clicking exactly what you allow it to

apify apify-sdk crawler scraper web-automation

Last synced: 23 Aug 2025

https://github.com/wingkwong/daily_weather_temperature_in_hong_kong

Crawling daily weather temperature in Hong Kong

crawler hongkong python temperature

Last synced: 09 Oct 2025

https://github.com/humbertodias/go-nie-crawler

Simple crawler that extracts some useful information from sede.administracionespublicas.gob.es.

crawler golang

Last synced: 03 Jul 2026

https://github.com/nextlevelshit/adonis-crawler

A free web crawler on top of the incredible AdonisJS Framework

adonisjs crawler javascript nodejs regex spider websocket

Last synced: 22 May 2026

https://github.com/choewy/python-g2b-crawler

Bid information crawler for Nara Jangteo (나라장터), Korea's public procurement portal

crawler pyqt5 python selenium

Last synced: 28 Jun 2026

https://github.com/mawkler/go-web-crawler

Toy web server written in Go

crawler go

Last synced: 15 Aug 2025

https://github.com/kiranjisonawane143/blockchain-data-crawler

🔍 Discover and extract valuable data from blockchain networks efficiently with this easy-to-use data crawler.

binance bitcoin bsc coingecko coingecko-api crawler crypto-bot cryptocurrencies cryptocurrency ethereum scraper

Last synced: 06 May 2026

https://github.com/yosh1/mio-crawler

A crawler that retrieves data usage from IIJmio.

crawler iijmio mio ruby

Last synced: 10 May 2026

https://github.com/beckkramer/puppeteer-traverse

Puppeteer utility to easily run a function you define per route on a set of routes.

crawler crawling nodejs puppeteer

Last synced: 06 May 2026

https://github.com/amazingcoderpro/pythonup

Have fun with Python! For improving Python skills

crawler python

Last synced: 19 May 2026

https://github.com/boatraceventureproject/boatracescraper

The BVP Crawler package for Boatrace.

boatrace crawler php php-library php8

Last synced: 17 Mar 2025

https://github.com/n3d1117/sisop17

Exercise for the Operating Systems exam - 2017

crawler html java parser semaphores synchronization thread-safety threading

Last synced: 06 Apr 2025

https://github.com/slava-vishnyakov/grucrawler

Simple Ruby crawler

crawler ruby

Last synced: 25 Oct 2025

https://github.com/cafitac/ai-crawler

AI-driven network-first crawler compiler for authorized workflows

agents ai crawler http mcp python scraping

Last synced: 31 May 2026

https://github.com/zrquan/gatherer

Gatherer is a simple crawler tool

crawler infosec pentest security

Last synced: 14 Jan 2026

https://github.com/sxoxgxi/webcrawler

A multi-threaded web crawler

crawler python webcrawling

Last synced: 28 Jul 2025

https://github.com/evansuner/smartproxypool

Smart proxy: automatically obtains usable high-anonymity proxies

crawler fastapi proxy python

Last synced: 15 May 2026

https://github.com/jefftriplett/pholcidae-demo

🕷 A Pholcidae demo for crawling/spidering a website

crawler csv pholcidae python scrapper scrapy-crawler spider toml

Last synced: 22 Jul 2025

https://github.com/isaqueveras/scrape-google-results

Scrape Google Results in Golang

crawler golang google scraper webcrawler

Last synced: 21 Mar 2025

https://github.com/rayspock/go-web-crawler

A web crawler that fetches all the links from a given website via goroutines.

concurrency crawler golang goroutine

Last synced: 10 Jun 2026

https://github.com/ninja-yubaraj/lootbin

A tool to hunt, scan, and loot public pastes from Termbin for interesting keywords.

crawler monitoring osint osint-python osint-tool pastebin python python3 scanner scraper termbin

Last synced: 11 Oct 2025

https://github.com/andreposman/magic-number

A CLI tool/API to calculate passive income from FIIs

crawler finance golang

Last synced: 14 Jan 2026

https://github.com/maddevsio/spiderwoman

"Vertical" crawler, which main target is to count links (resolved, e.g. from bit.ly) to external domains from all pages of given resources

big-data count-links crawler golang

Last synced: 19 May 2026

https://github.com/katronquillo/grimm

Simple search engine for the Brothers Grimm Fairy Tales

crawler elasticlunr react

Last synced: 24 Apr 2026

https://github.com/discountry/crawler-microservice

crawler microservice

crawler

Last synced: 16 Jan 2026

https://github.com/weizujie/python3-spider

Some small crawler scripts written in Python

crawler python3

Last synced: 18 May 2026

https://github.com/aweirddev/air-web

A lightweight package for crawling the web with minimal code.

crawl crawler markdown scrape scraper web

Last synced: 25 Jan 2026

https://github.com/timzatko/fiit-vinf-1

School project - data crawling, storing using ElasticSearch and visualisation.

angular crawler elasticsearch

Last synced: 16 Jan 2026

https://github.com/yanglr/csharp_spider

Crawler in C#

crawler csharp spider

Last synced: 12 Oct 2025

https://github.com/freakwill/mycrawlers

🕷 My Crawlers for Movies, Information, Encyclopedia...

baidu crawler douban movie quotes taobao

Last synced: 21 Mar 2025

https://github.com/iomarmochtar/imagecrawler

Simple image crawler that follows links recursively; no dependencies needed, for Python 2.7+

crawler python-library

Last synced: 14 May 2025

https://github.com/jesseokeya/linkedin-scraper

Selenium WebDriver used to get information from LinkedIn

chromedriver crawler linkedin os python scraper selenium-webdriver

Last synced: 29 Apr 2026

https://github.com/ignmaro/new

The "new" project introduces a streamlined approach to task management, focusing on simplicity and efficiency. It allows users to create, organize, and track their tasks with minimal setup and maximum clarity.

bandcamp brook crawler ios jobs newgrad news rss rss-reader soundcloud v2ray video vmess vuejs3

Last synced: 13 Oct 2025

https://github.com/mevljas/gov.si-crawler-playwright

A standalone crawler that crawls only .gov.si web sites using Playwright.

crawler multithreading playwright sqlachemy

Last synced: 19 Jan 2026

https://github.com/adham90/github_user_crawler

GeekHub: github username crawler

crawler github-api

Last synced: 21 Mar 2025

https://github.com/abx123/coronachan

Simple lambda function to crawl MKN twitter account for daily Malaysia COVID-19 updates.

crawler lambda-functions python

Last synced: 28 Mar 2025

https://github.com/m-taghizadeh/persian_question_answering_voice2voice_ai

This repository hosts BonyadAI, a Persian question answering AI Model. We developed an initial web crawler and scraper to gather the dataset. The second phase involved building a machine learning model based on word embeddings and NLP techniques. This AI model operates end-to-end, receiving user voice input and providing responses in Persian voice.

artificial-intelligence corpus-linguistics crawler deep-learning farsi farsi-datasets large-language-models machine-learning natural-language-processing persian python question-answering scraping-python speech-to-text text-to-speech transformer-architecture word2vec

Last synced: 04 May 2026

https://github.com/igor-karpukhin/web-crawler

Web site crawler

crawler go website

Last synced: 29 Mar 2025

https://github.com/zigai/crawlwright

Web crawling framework powered by Playwright

crawler crawling playwright python scraping wrighter

Last synced: 18 May 2026

https://github.com/hiscaler/fetch-one-page

Fetch one page by configs

crawler golang

Last synced: 06 Nov 2025

https://github.com/xyk2002/aqistudy-crawler

An asynchronous protocol-based crawler for the air quality data on https://www.aqistudy.cn/historydata/; it fetches data quickly and saves it to CSV files

aqistudy crawler python-3

Last synced: 22 Aug 2025

https://github.com/mt4110/postal_converter_ja

High-performance Japanese Postal Code Converter & API. Auto-updating, DB-agnostic (MySQL/PostgreSQL), written in Rust & Next.js. An asynchronous crawling system in Rust with automatic updating of Japan Post data. The latest postal code data is updated at top speed.

api crawler docker mysql nextjs nix postgresql react rust

Last synced: 13 Feb 2026

https://github.com/constaf79/pycn

🔗 Simplify your cryptocurrency tasks with pycoin, a Python library providing essential utilities for Bitcoin and alt-coins, ensuring seamless transactions and operations.

cnc-machine cnc-milling-controller cnn cnn-model cnn-processors computer-vision crawler edge-detection fun image-classification image-processing library neural-network pillow pycnc python raspberry-pi web

Last synced: 14 May 2026

https://github.com/datvodinh/laptop-price-prediction

An End to End Data Science Project about Laptop Price Prediction

crawler ensemble-learning scrapy selenium xgboost

Last synced: 11 May 2025