An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).

https://github.com/onetail/applenews

simple crawler

crawler simple

Last synced: 18 Mar 2025

https://github.com/ashwantmanikoth/intellilsearch

This is a AI powered crawler that can search the web for information based on your input.

crawler deepseek groq-api hybrid-search llama llm pydantic python rag reranking retrieval-augmented-generation

Last synced: 15 Apr 2026

https://github.com/tjdsneto/jcnet-crawler

Extract (scrap) movie schedule info from JCNet movies page

crawler scraping

Last synced: 11 Apr 2026

https://github.com/hoan02/novel-crawler

Tool cào dữ liệu truyện để phục vụ cho doctruyen.io.vn

crawler python

Last synced: 13 Mar 2025

https://github.com/coding-dream/aspider

A spider run on Android Platform

crawler jsoup spider

Last synced: 24 Jun 2025

https://github.com/codegram01/go-ai-crawl

Golang Web Crawl with AI

ai chromedp crawler golang ollama

Last synced: 16 Apr 2026

https://github.com/lucasromualdo/glassdoorcrawler

Crawler em Python para coletar vagas do Glassdoor e exportar para Excel

cli crawler glassdoor openpyxl pandas python web-scraping

Last synced: 25 Feb 2026

https://github.com/yokoyang/baidu-crawler

tieba_crawler

crawler

Last synced: 16 Jun 2025

https://github.com/appliedsoul/headless-screenshot

High-level library for taking screenshot of websites based on headless chrome (puppeteer)

crawler headless-chromium javascript nodejs scrapper screenshot testing

Last synced: 21 Apr 2026

https://github.com/ggteixeira/corpus-cleaner

Linguistic tool (made by a linguist, for linguists) that scraps corpora, automatically cleans it up, and generates n-grams.

beautifulsoup4 bs4 corpora corpus corpus-linguistics crawler linguistics nlp python scraper web-scraping

Last synced: 28 Feb 2025

https://github.com/artemnikitin/crawler

Example of web crawler implemented in Go

crawler go golang

Last synced: 22 Jun 2025

https://github.com/sanhphanvan96/php-training-crawler

Simple php crawler for training purpose

crawler docker docker-compose nginx php php-fpm

Last synced: 13 Apr 2026

https://github.com/joyceannie/moviespider

This project is used to crawl movie data from IMDb. Scrapy framework is used to extract relevant information like movie title, datePublished, summary, genres, director etc.

crawler datascience python scrapy spider webscraper

Last synced: 24 Mar 2025

https://github.com/cristiangreco/gcrawler

A simple (not concurrent) web crawler written in Java.

crawler java

Last synced: 30 Jul 2025

https://github.com/shashankgroovy/crawler

Python crawler

crawler python webcrawler

Last synced: 30 Jul 2025

https://github.com/martinius96/web-scraper

Web scraper on ESP8266 board in client mode. Postprocessing in PHP with regular expressions.

arduino bot code crawler esp32 esp8266 html mysql php php7 robot scraper source web

Last synced: 11 Apr 2026

https://github.com/izh318/genie-music-artist-album-crawler

지니뮤직에 등록 되어 있는 특정 아티스트의 앨범 정보를 한 번에 크롤링 하는 Python Script 입니다.

crawler genie genie-music gui

Last synced: 08 Nov 2025

https://github.com/not-raspberry/aio_crawler

AIO single website crawler

asyncio crawler python3

Last synced: 23 Mar 2025

https://github.com/roele/roast

A JVM Data Crawler

cli crawler jvm

Last synced: 16 May 2025

https://github.com/yyj08070631/web-spider

一个网络蜘蛛

crawler spider webspider

Last synced: 11 Sep 2025

https://github.com/Arman2409/data-falcon

Web crawler

crawler extract-data

Last synced: 02 Apr 2025

https://github.com/bramtenhove/issue-crawler

Crawls Drupal issues and keeps stats

crawler

Last synced: 09 Jan 2026

https://github.com/yangxuhui/requests-google

A simple google related Parsing Package

crawler google-api parsing

Last synced: 14 Jan 2026

https://github.com/usethisname1419/connectioncrawler

crawls a website and checks for connections

connection crawler http-headers reporting website-analyzer

Last synced: 06 Jul 2025

https://github.com/tssujt/async-crawler-sample

A simple crawler sample based on asyncio~

aiohttp asyncio crawler

Last synced: 15 Mar 2025

https://github.com/mikiw/reactweb3

Ethereum transaction crawler in ReactJs.

blockchain crawler ethereum

Last synced: 14 May 2026

https://github.com/loko5ja/seed-gen

Seed-gen is an innovative tool designed to generate unique and creative seed phrases for cryptocurrency wallets. With a focus on security and usability, it ensures that users have robust, memorable keys for safeguarding their digital assets efficiently.

crawler crypto crypto-2025 crypto-bot crypto-finder crypto-recovery ethereum-bruteforce laravel lost-btc-wallet-finder mnemonic-generator seed-crypto seed-recovery seed-tool yeoman

Last synced: 03 Apr 2025

https://github.com/azshurith/depth-crawler

A simple yet powerful Python web crawler that explores a given domain up to a specified depth and outputs a JSON sitemap of URLs and page titles.

crawler puppeteer python

Last synced: 20 Apr 2026

https://github.com/nowshad-sust/corona

A simple data endpoint for coronavirus updates

api corona coronavirus-updates crawler dcoker-compose excel nodejs

Last synced: 17 May 2026

https://github.com/sssshefer/web-crawler-http

Basic web crawler which represents the linking structure of the website

crawler jest jest-tests js

Last synced: 01 Mar 2025

https://github.com/vaenow/chromeless-coursera-caption

Chromeless crawler coursera video's caption / subtitle

caption chromeless coursera crawler crx subtitle

Last synced: 31 Mar 2025

https://github.com/sevenecks/web-crawler

crawl a website, find pages, find links, find relationships between them and report on 404 and other errors

404 checker crawler site web

Last synced: 21 Jun 2025

https://github.com/kimseogyu/crawling-music-ranks

음원순위 크롤링 코드

crawler jest typescript

Last synced: 07 Apr 2025

https://github.com/jplitza/urlsearch

Index typical webserver directory listings and then search for arbitrary terms

crawler search

Last synced: 17 Mar 2025

https://github.com/tatamiya/gas-new-books-crawler

Crawling new book information from 版元ドットコム(https://www.hanmoto.com/)

crawler gas

Last synced: 30 Oct 2025

https://github.com/xprnvd/makdi

Website crawler created for pentest exercises like HTB.

crawler htb htb-scripts pentest python

Last synced: 20 Jul 2025

https://github.com/andrepradika/scrape-medrecruit.medworld.com

🛠 A Playwright-based web scraper that extracts job listings from MedRecruit, including job title, department, location, job type, duration, and job URL, saving the data to an Excel file.

crawler scrape

Last synced: 17 Mar 2025

https://github.com/andrepradika/scrape-xpel.com

📌 A Playwright-based web scraper that extracts installer details from XPEL’s Installer Locator and saves them to CSV and Excel files.

crawler scrape

Last synced: 17 Mar 2025

https://github.com/allancapistrano/anime-sheets

Crawler que pega as informações dos animes e salva numa planilha.

anime crawler google-sheets google-sheets-api

Last synced: 16 Mar 2025

https://github.com/eivindarvesen/naive-spider

A minimal web crawler

crawler python spider

Last synced: 31 May 2026

https://github.com/kartikmehta8/pycrawler

PyCrawler is a web scraper that takes a link as input and returns all the links connected to the page(s). Goes beyond recursion. Threaded.

crawler cybersecurity python

Last synced: 13 Sep 2025

https://github.com/pengkobe/my-web-crawler

auto pull blog update from bloggers. dev based on angular2

crawler nodejs

Last synced: 18 May 2026

https://github.com/roc41d/http-web-crawler

Http web crawler with Nodejs + TDD

crawler http javascript jest jest-test nodejs webcrawler

Last synced: 13 Apr 2026

https://github.com/moojing/coinmarketcap-crypto-crawler

A Raycast plugin for getting the latest price of your favorite coins from CoinMarketCap.

crawler cryptocurrency

Last synced: 01 Apr 2025

https://github.com/apurvsikka/mediaverse

MediaVerse is a versatile search engine for various media types such as anime, books and drama

anime anime-api anime-api-free api-rest bun crawler extensions extensions-pack free-manga kdrama lightnovel manga manga-api manga-api-free manga-crawler manga-reader movies netflix ts tv

Last synced: 29 Mar 2025

https://github.com/jauharibill/animeindo-crawler

this crawler is used for research only. the creator doesn't take any responsibility for any harmful usage

crawler python3 scrapy

Last synced: 08 Jul 2025

https://github.com/d7isme/pixiv-downloader-mod

Modded extension of the pixiv downloader on chrome webstore with premium feature unlocked.

chrome-extension crawler extension-chrome image pem pixiv pixiv-bot pixiv-crawler pixiv-downloader

Last synced: 14 May 2026

https://github.com/ri0n/unboxer

MP4 crawler and extractor

crawler extractor mp4 object-oriented-design qt

Last synced: 10 May 2026

https://github.com/snwfdhmp/3gm-bot

Bot for the online french indie game 3gm.fr implemented in Ruby. Mostly website crawling and task automation.

3gm-bot crawler game-bot task-automation web-crawling

Last synced: 30 Oct 2025

https://github.com/shunk031/amebloscraper

Scraper for Ameblo in Scrapy

ameblo crawler scraper scrapy

Last synced: 30 Jul 2025

https://github.com/ronierisonmaciel/crawler

Um crawler utilizando BeautifulSoup tem como objetivo extrair informações de sites de maneira eficiente e estruturada. BeautifulSoup é uma biblioteca Python que facilita a análise e extração de dados de páginas HTML e XML. O projeto permite coletar e organizar informações relevantes.

beautifulsoup4 crawler crawling python python3

Last synced: 26 Mar 2025

https://github.com/kweonminsung/crawl2toast

Real-time toast notification of crawled data with CSS selectors(Windows Only)

beautifulsoup4 crawler selenium tkinter toast-notifications

Last synced: 18 May 2026

https://github.com/d-w-arnold/local-news-data-collection

Web crawler for local news sites - Generates HTML files of each webpage visited and a list of links found on the webpage, as a TXT file 🌎

crawler data-collection python

Last synced: 01 Apr 2025

https://github.com/istador/mediaindexer

Software for a cronjob to crawl the ViMP media center and generate an index for it as a static website.

crawler website

Last synced: 03 Jan 2026

https://github.com/lfsc09/crawl-this-go

Simple CLI tool for crawling pdf documents and html pages

crawler go

Last synced: 18 Jun 2025

https://github.com/keizerzilla/ssh-hunter

Script que caça por Raspberry Pis vulneráveis na internet (porta SSH aberta e senha padrão não modificada).

crawler raspberry-pi ssh

Last synced: 10 Apr 2025

https://github.com/keizerzilla/search4dwango9

My attempt to help solving the DWANGO9 wad mystery. More info: https://www.youtube.com/watch?v=RXGtCjdwwe8

crawler datamining doom-wad

Last synced: 10 Apr 2025

https://github.com/vaenow/crawler-chromeless

A chromeless crawler for coursera

chromeless coursera crawler puppeteer

Last synced: 18 May 2026

https://github.com/thejoin95/free-proxies.info

API service for get anonymous and non proxy, filter by latency, country, updatetime and more

api crawler http-proxy proxy proxy-list python scraper

Last synced: 29 Oct 2025

https://github.com/javapuppteernodejs/bypass-awswaf-crawl4ai

Bypass AWS WAF with Crawl4AI & CapSolver: A personal developer's guide to seamless web scraping on WAF-protected sites, featuring API and browser extension integration examples.

automation aws aws-waf capsolver captcha captcha-solver crawl4ai crawler crawling python web-scraping

Last synced: 28 Dec 2025

https://github.com/moj124/web_crawler

The web_crawler is a asynchoronous gevent link crawler that maps all the associated local links constrained by the input webpage url.

crawler crawler-python links-spider

Last synced: 13 Mar 2025

https://github.com/qzcool/uscis-case-status-estimation-system-stat-ez

Estimates time of case results arrival, for applicants who are waiting for their USCIS case results with the receipt numbers at hand.

beautifulsoup crawler immigration web

Last synced: 16 Jun 2025

https://github.com/datvodinh/laptop-price-prediction

An End to End Data Science Project about Laptop Price Prediction

crawler ensemble-learning scrapy selenium xgboost

Last synced: 11 May 2025

https://github.com/jimut123/leaderbehaviour

Scrapy project to get and extract the names of Leaders, their misdeed by scraping news website!

crawler leaderbehaviour newsscraper scrapy timesofindia

Last synced: 16 Jan 2026

https://github.com/pjt3591oo/spider-base_crawler

scrapy 기반 크롤러 만들기

crawler python scrapy spider

Last synced: 16 May 2025

https://github.com/lilchen96/pokemon-crawler

Crawl JSON-formatted data for Pokémon, based on the PokeAPI.

crawler pokemon

Last synced: 28 Dec 2025

https://github.com/jamesponddotco/wikiextract

[READ-ONLY] A word extractor for Wikipedia articles.

crawler crawling diceware go wikipedia wikipedia-crawler word-extraction

Last synced: 15 Mar 2025

https://github.com/yaoshanliang/linkedinspider

Crawl job information from LinkedIn for data analysis

big-data crawler python social-network-analysis

Last synced: 30 Mar 2025

https://github.com/fscotto/noahcrawler

A simple web crawler written in Java to support a database of Italian regions.

crawler java jsoup-library

Last synced: 14 Sep 2025

https://github.com/anthonysigogne/scrapy

A list of simple scrapers made with Scrapy

crawler elasticsearch python scrapy spider

Last synced: 11 Apr 2026

https://github.com/evangelos-karavas/arduino-crawler-line-follower-obstacle-avoidance

Crawler Robot following black line while avoiding obstacles found in the way. Assignment for Mehcatronics

arduino-uno autonomous-vehicles cpp crawler infrared-sensors mechatronics path-planning robotics

Last synced: 28 Apr 2026

https://github.com/rafaelmoraes003/tech-news

Analysis and manipulation of news data from a technology website obtained through data scraping using Python.

crawler data-scraping https mongodb parsel pymongo python web-scraping

Last synced: 05 May 2026

https://github.com/kahsolt/tieba-dl

A simple image crawler/downloader for Baidu tieba.

baidu-tieba crawler image-crawler tieba

Last synced: 12 Jun 2026

https://github.com/ymdarake/otenki-crawler

Yet another weather data scraper.

crawler weather weather-data

Last synced: 02 Feb 2026

https://github.com/allanbian1017/mbpprice

二手Macbook Pro資訊

crawler python

Last synced: 14 Jan 2026

https://github.com/ariefrahmansyah/crawler

Simple website crawler using Go programming language.

crawler go

Last synced: 27 Mar 2025

https://github.com/laffrex/xiaolanben_crawler

一个高效、稳定的小蓝本网站数据采集工具,可自动提取公司和集团产品、媒体及股东等信息,支持智能处理弹窗和自动化数据分类整理,最终目的是为了方便进行SRC信息收集。

crawler pandas selenium src

Last synced: 23 Mar 2025

https://github.com/murilobsd/icrop-csv

Icrop-csv para automatizar o processo do download dos relatórios.

crawler csv-export python3

Last synced: 17 Nov 2025

https://github.com/mehdieidi/offliner

Offliner is a tool to make a website offline viewable. It's a concurrent web crawler which saves all the pages and static files in a directory.

concurrency concurrent concurrent-programming crawler go golang goroutine multiprocessing multithreading process scraper thread

Last synced: 14 Jan 2026

https://github.com/heitor57/astronomy-news

:telescope::newspaper: Astronomy News

crawler data-science news text-mining

Last synced: 06 Oct 2025

https://github.com/xyk2002/aqistudy-crawler

关于网站:https://www.aqistudy.cn/historydata/ 的空气质量数据的异步协议爬虫,可以快速的获取的数据将会保存至CSV文件

aqistudy crawler python-3

Last synced: 22 Aug 2025

https://github.com/zigai/crawlwright

Web crawling framework powered by Playwright

crawler crawling playwright python scraping wrighter

Last synced: 18 May 2026

https://github.com/truongdd03/searchengine

A search engine written in c++.

cpp crawler search search-engine

Last synced: 06 Apr 2025

https://github.com/b3j4y/unidisk

A Crawler to search for keywords and compare the score

comparison crawler nlp solr-client

Last synced: 17 Jan 2026

https://github.com/igor-karpukhin/web-crawler

Web site crawler

crawler go website

Last synced: 29 Mar 2025

https://github.com/m-taghizadeh/persian_question_answering_voice2voice_ai

This repository hosts BonyadAI, a Persian question answering AI Model. We developed an initial web crawler and scraper to gather the dataset. The second phase involved building a machine learning model based on word embeddings and NLP techniques. This AI model operates end-to-end, receiving user voice input and providing responses in Persian voice.

artificial-intelligence corpus-linguistics crawler deep-learning farsi farsi-datasets large-language-models machine-learning natural-language-processing persian python question-answering scraping-python speech-to-text text-to-speech transformer-architecture word2vec

Last synced: 04 May 2026

https://github.com/taleblou/brokenlinkchecker_python

This Python web crawler traverses a website, verifies resource links (CSS, JS, images, videos, iframes), and identifies broken links with HTTP errors (400-599)

crawler http links python resources website

Last synced: 03 Apr 2025