Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
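
The "systematically browses" part of that definition can be illustrated with a small breadth-first traversal. This is only a minimal sketch, not how any project listed here works: the `crawl` function, the `fetch` callable, the `PAGES` dictionary, and the `example.test` URLs are all hypothetical names introduced for illustration, and the pages live in memory so no network access is needed.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl. `fetch` is a hypothetical callable: url -> HTML or None."""
    seen = {start_url}          # URLs already queued, to avoid revisiting
    queue = deque([start_url])  # frontier of URLs still to visit
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:        # unknown or unreachable page: skip it
            continue
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Toy in-memory "web" standing in for real HTTP requests (assumption).
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a>',
    "http://example.test/b": '<a href="/">home</a>',
}

order = crawl("http://example.test/", PAGES.get)
print(order)
```

A real crawler would replace `PAGES.get` with an HTTP client and would typically also respect robots.txt and rate limits, which this sketch leaves out.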

GitHub: https://github.com/topics/crawler
Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
Last updated: 2025-02-11 00:06:38 UTC

https://github.com/giscafer/airlevel-crawler

A demo crawler for air-level.com

crawler java nodejs

Last synced: 17 Nov 2024

https://github.com/moehmeni/ezweb

Easy-to-use web page analyzer

analyzer crawler scraper text-analysis text-classification text-mining webcrawler webcrawling webpage webscraper webscraping www

Last synced: 05 Nov 2024

https://github.com/leelow/nightmare-screenshot-selector

👻 📷 A Nightmare plugin to easily take screenshots.

crawler headless-browsers javascript js nightmare nightmarejs nodejs plugin webcrawler

Last synced: 15 Nov 2024

https://github.com/vitorebatista/horoscopefree

A REST API for daily astrology horoscopes

crawler horoscope horoscope-crawler horoscopes-api

Last synced: 30 Nov 2024

https://github.com/typingmonk/mnd_adiz_news_crawler

Web crawler that targets mnd.gov.tw posts related to ADIZ (防空識別區, Air Defense Identification Zone) reports.

crawler

Last synced: 25 Jan 2025

https://github.com/vinitkumar/pycrawler

Crawler in Python 3.7, 3.8, 3.9, and PyPy3

crawler python python35 python36 utils

Last synced: 28 Oct 2024

https://github.com/yjyoon-dev/nara-crawler

Crawler for National Archives Catalog

crawler python scrapy

Last synced: 20 Nov 2024

https://github.com/lon9/arxiv

For scraping arxiv.org

arxiv crawler golang

Last synced: 29 Jan 2025

https://github.com/robmch/mindfactory_crawling

A Python 3 Crawler for Mindfactory.de

crawler crawling data webcrawler webcrawling

Last synced: 17 Nov 2024

https://github.com/sayakie/pixiv-crawler

Crawls images from Pixiv 🚀

crawler nodejs pixiv typescript

Last synced: 28 Oct 2024

https://github.com/moqsien/scrapx

A customized and enhanced version of Scrapy for managing hundreds or even thousands of spiders.

crawler framework pymongo scrapy spider

Last synced: 20 Nov 2024

https://github.com/eished/tujigu_crawler

Multithreaded Node.js crawler for tujigu.com (图集谷)

crawler node nodejs

Last synced: 29 Jan 2025

https://github.com/haxzie-xx/crode.js-node-web-crawler

Node.js Crawler built for open FTP sites for movie link collection.

crawler nodejs

Last synced: 19 Dec 2024

https://github.com/trudi-group/mc-crawler

A MobileCoin network crawler. Corresponding preprint available on arXiv (https://arxiv.org/pdf/2111.12364.pdf).

crawler mobilecoin rust

Last synced: 02 Dec 2024

https://github.com/foolin/scrago

A simple, fast, extensible page-crawling framework for Golang

crawler go scrago scrapy

Last synced: 05 Jan 2025

https://github.com/wangshouh/sdufelib_seat_crawler

SDUFE Library Reservation Seat Monitoring Crawler

crawler python

Last synced: 02 Feb 2025

https://github.com/hxr16f/ss-grabber

Automation script for downloading user screenshots.

automation crawler downloader grabber lightshot screenshot script

Last synced: 27 Nov 2024

https://github.com/omerdogan3/kitapp-crawler

Web crawler application for KitApp. Gets data from booksellers and inserts it into a database.

book bookseller crawler mysql nodejs puppeteer scrapper-script web-crawler

Last synced: 06 Feb 2025

https://github.com/iml1111/toonkor_collector

Toonkor (툰코) comic collector

crawler python

Last synced: 09 Dec 2024

https://github.com/xcrypt0r/hyacinth

🌸 Dcinside image crawler with a dead-simple structure

beautifulsoup4 crawler dcinside parsing pyqt5 pyside2

Last synced: 09 Jan 2025

https://github.com/doroudi/imdb-crawler

IMDb.com movie crawler built with Scrapy

crawler data-mining python scrapy

Last synced: 06 Feb 2025

https://github.com/kernelerr/pixivsync

Pixiv image download and sync tool

crawler pixiv pixiv-crawler python

Last synced: 19 Nov 2024

https://github.com/vmdang/historycrawler

An OOP project that collects historical data on Vietnam and displays it

crawler gson java javafx jsoup

Last synced: 11 Oct 2024

https://github.com/zain-ul-din/lgu-crawler

LGU timetable Crawler

contribute crawler lahore-garrison-university lahore-garrison-university-timetable open-source

Last synced: 10 Dec 2024

https://github.com/karambir/ugc-colleges

Python script to extract college names from the UGC India website.

college crawler extract html-parser python python-script ugc

Last synced: 12 Dec 2024

https://github.com/mirocow/yii2-crawler

Concurrent HTTP crawler for Yii2

concurrency crawler guzzle yii2-extension

Last synced: 17 Jan 2025

https://github.com/1uc1f3r616/dark-net-websites-dataset

Dataset of Onion Websites

crawler darknet data-analysis dataset onion search-engine website

Last synced: 10 Jan 2025

https://github.com/ivan-alone/instastories-saver-cpp

Program for saving Instagram Stories, rewritten in C++

api backup crawler grambler gramblr insta instagram instagram-stories instastories-saver instastory stories

Last synced: 19 Dec 2024

https://github.com/sanmak/queue-web-crawler

Crawls a website using a queue that limits the number of concurrent connections, finds all hyperlinks on the site, and saves them to a CSV file.

async chai crawler csv hyperlinks mocha nodejs queue scrapper web

Last synced: 28 Nov 2024

https://github.com/pjt3591oo/rust-exchange-crawler

A crawler built while learning Rust

crawler rust

Last synced: 26 Dec 2024

https://github.com/pjt3591oo/golang-crawler

Building a crawler with Golang

crawler golang

Last synced: 26 Dec 2024

https://github.com/aprilnea/xjtlu

This is how to get all the network resources of XJTLU.

crawler gateway http-auth python spider web-crawler xjtlu

Last synced: 15 Nov 2024

https://github.com/inishchith/python-scripts

Some Scripts & Projects

crawler python-script python3 scripts youtube

Last synced: 19 Dec 2024

https://github.com/birkhofflee/blizzard_forum.js

An unofficial Node.js API for Blizzard Forums. (works in 2019)

api crawler web

Last synced: 19 Jan 2025

https://github.com/aminehsan/crawler-divar.ir

Analyzing and Extracting Insights from Ads on 'divar.ir'

crawler data-mining data-science divar-ir scarping

Last synced: 31 Jan 2025

https://github.com/basemax/instagramseleniumhashtagimagepython

Instagram Selenium Python: A Selenium-based crawler to extract images from specific hashtags on Instagram.

crawler crawler-python crawlers instagram python python-selenium selenium selenium-python

Last synced: 09 Feb 2025

https://github.com/basemax/firstselenium

Some sample code for using Selenium in Python, just for fun.

crawl crawler crawlers crawling python python-selenium python3 selenium selenium-example selenium-py selenium-python selenium-sample selenium-tests selenium-website

Last synced: 09 Feb 2025

https://github.com/mrrfv/webarchive

Crawls websites and saves found URLs to a file.

archive archiveteam archiving crawler crawling ia internet-archive scraper web-archiving web-scraping

Last synced: 27 Oct 2024

https://github.com/fzdwx/go-pachong

Go web crawler that continuously crawls data starting from an entry URL

crawler go golang

Last synced: 08 Feb 2025

https://github.com/pyaesoneaungrgn/2d-crawler

2D crawler for set.or.th

2d 2d-crawler crawler myanmar php

Last synced: 09 Nov 2024

https://github.com/alexmili/reachable

Check if a URL exists and is reachable

crawler health-check monitoring reachability webscraping

Last synced: 10 Dec 2024

https://github.com/igeligel/TeamFortressOutpostApi

🔁 An API wrapper for the TF2 Outpost platform. A platform to find great deals for your Team Fortress 2, Counter-Strike: Global Offensive and Dota 2 items with zero hassle.

bot bot-framework crawler steam steam-api steambot teamfortress2

Last synced: 13 Nov 2024

https://github.com/spa5k/quick-scraper

An easy, lightweight scraper built with TypeScript for a good developer experience.

crawler dx easy-to-use esbuild scraper typescript

Last synced: 13 Nov 2024

https://github.com/arthurc0102/ntub-bot

NTUB (National Taipei University of Business) teaching evaluation bot

bot crawler ntub

Last synced: 27 Jan 2025

https://github.com/obaskly/kikfriender.com-bot

A multifunctional bot that increases your likes and hotness points and adds positive feedback. It can also flag an account of your choice as fake and add negative feedback. Moreover, it can check a given wordlist, print out Kik usernames, and store them in a new text file.

ai artificial-intelligence bot checker chrome crawl crawler crawling kik proxies proxy scraper scraping selenium wordlist

Last synced: 08 Jan 2025

https://github.com/chenyangguang/hundun

crawler go gocolly

Last synced: 14 Jan 2025

https://github.com/glutexo/onigumo

Parallel web scraping framework

crawler

Last synced: 25 Jan 2025

https://github.com/thaddeusjiang/campcat

Campsite reservation monitoring bot

bot crawler telegram

Last synced: 25 Oct 2024

https://github.com/joshuaquek/docusite-to-pdf

Provide a URL and this will generate multiple PDF documents of the whole site within the bounds of the URL path. This code repo is for educational purposes only.

crawler documentation-generator html2pdf pdf pdf-converter pdf-document pdf-generation scraper

Last synced: 12 Jan 2025

https://github.com/mouday/httpserver

A simple server for testing crawler request headers, built with Python + Flask

crawler flask python spider

Last synced: 26 Jan 2025

https://github.com/ceylonai/apps-article-reader

📚 A powerful desktop app that extracts and analyzes web content using LLaMA AI. Features real-time processing, keyword extraction, and smart summarization. Built with Python + Tkinter.

ai crawler gpt ollama openai

Last synced: 15 Jan 2025

https://github.com/zenrows/crawling-from-scratch

Repository for the "Mastering Web Scraping in Python: Crawling from Scratch" blog post, with the final code.

crawler crawling python python3 scraping

Last synced: 16 Jan 2025

https://github.com/ribeirogab/technology-insights

Program that uses data from Stack Overflow Insights 2020 to generate informative graphs.

crawler python scraping typescript

Last synced: 19 Nov 2024

https://github.com/mouday/freeipproxy

Maintains a pool of working proxies by scraping free proxy IPs

crawler proxy python spider

Last synced: 26 Jan 2025

https://github.com/ruedigervoigt/salted

Smart, Asynchronous Link Tester with Database backend: works with HTML, Markdown and TeX files

asyncio crawler html-files hyperlinks latex linkchecker markdown pandoc python

Last synced: 11 Oct 2024

https://github.com/ayusharma/rss-parser

A simple crawler in ReactJS

crawler reactjs rss-parser

Last synced: 10 Feb 2025

https://github.com/ericz99/go-crawler

Simple, lightweight crawler that finds all endpoints on any website.

crawler golang

Last synced: 28 Jan 2025

https://github.com/cls1991/gank.io

Scrapes image resources from 干货集中营 (http://gank.io)

crawler curl gankio picture

Last synced: 11 Nov 2024

https://github.com/gatenlp/wpextract

Create datasets from WordPress sites for research or archiving

corpus crawler nlp text-extraction text-mining web-scraping wordpress

Last synced: 13 Nov 2024

https://github.com/mmqnym/etherscan_tracker

Shows how to track a wallet on etherscan.io

crawler ethereum python

Last synced: 18 Jan 2025

https://github.com/v-braun/hero-scrape

Find the hero (main) image of a URL

crawler fastimage hero hero-image opengraph webscraping

Last synced: 15 Jan 2025

https://github.com/testica/a3hrgo-sdk

a3HRgo SDK to automate your reports

a3hrgo crawler javascript puppeteer

Last synced: 10 Feb 2025

https://github.com/yakuza8/coronavirus-timeseries-predictor

Timeseries analyzer for coronavirus with recurrent neural network

asyncio beautifulsoup4 corona coronavirus coronavirus-analysis coronavirus-crawler coronavirus-dataset covid covid-19 covid19-data crawler python-3-6 python3 python36 rnn web-scrapper

Last synced: 24 Jan 2025

https://github.com/feliz-szk/berserk

Berserk: Crawler to increase web traffic (based on Tor and Privoxy)

anonymizer anonymous-proxy command-line-tool crawler linux privoxy python scraping-websites tor webtraffic-increaser

Last synced: 12 Jan 2025

https://github.com/reycn/china-drug-trials-crawler

A web crawler for Chinadrugtrials.org.cn, written in Python 3.6+.

china crawler drug python scraper

Last synced: 12 Jan 2025

https://github.com/huzecong/film-spider

Spiders crawling for film listing websites.

crawler

Last synced: 11 Jan 2025

https://github.com/a-x-/scian

Simple cian stat

cian crawler static-site

Last synced: 11 Jan 2025

https://github.com/ktont/curlas

A Node.js spider tool

chrome-extension crawler spider

Last synced: 13 Jan 2025

https://github.com/georgea93/crawley

Node.js web crawler

crawler depth es6 javascript node nodejs nodejs-web-crawler npm npm-module npm-package robots-txt sitemap web yarn

Last synced: 21 Jan 2025

https://github.com/zhaotianff/qzone

I remember running under the sunset that day; that was my lost youth

crawler crawling-sites csharp qzone qzone-photos qzone-spider wpf

Last synced: 15 Jan 2025

https://github.com/capturr/price-extract

A performant way to extract a price amount and its metadata (currency, decimal and thousands separators) from any string.

amount crawler crawling currencies currency extract extractor javascript nodejs parser parsing price scraper scraping spider typescript

Last synced: 07 Jan 2025

https://github.com/qin2dim/istockphoto-go

📸 Gracefully download dataset from iStockPhoto.

colly crawler istockphoto

Last synced: 10 Feb 2025

https://github.com/archan937/webhead

An easy-to-use Node web crawler storing cookies, following redirects, traversing pages and submitting forms.

api cookies crawler fetch file-uploads forms headless json node redirects scraper spider traversing

Last synced: 10 Nov 2024

https://github.com/waynechang65/baha-crawler

baha-crawler is a web crawler module designed to scrape data from the Bahamut Forum.

bahamut crawler javascript nodejs scraper spider webcrawler

Last synced: 19 Oct 2024

https://github.com/rodyherrera/cdrake-se

✨ Search the internet for free and without limits, with no APIs involved. Find videos, images, sites, books, and other resources using the different engines provided by the library, such as Bing, Google, Yahoo, Wikipedia, YouTube... Browse safely and privately with the CodexDrake Search Engine =).

bing crawler engine google images javascript metasearch metasearch-engine news nodejs privacy search-engine searx videos webscraping websearch websearchengine whoogle wikipedia youtube

Last synced: 25 Dec 2024

https://github.com/sergioburdisso/solidscraper

Easy-to-use jQuery-like API for web scraping/crawling.

crawler crawling crawling-python jquery python scraper scraping tweets twitter web web-crawler web-scraping webscraping

Last synced: 23 Nov 2024

https://github.com/jmkim/stock-crawler

Universal Stock Crawler

crawler stock stock-market yahoo-finance

Last synced: 26 Jan 2025

https://github.com/indatawetrust/reporter

Crawler queue creation tool for paging

crawler

Last synced: 13 Dec 2024

https://github.com/rimiti/ping-urls

🏓 Ping URLs by batch.

cache crawler ping prerender prerendering seo

Last synced: 28 Dec 2024

https://github.com/exp-codes/python-crawler-template

Python crawler development template

crawler programming template

Last synced: 09 Feb 2025

https://github.com/t-rekttt/tlu-schedule

chatfuel crawler nodejs vuejs

Last synced: 09 Dec 2024

https://github.com/zhaotianff/crawler-line

C# command-line crawler

command-line command-line-tool crawler csharp dotnet-core

Last synced: 15 Jan 2025

https://github.com/fanyong920/crawlitem-puppeteer

Example of scraping product listings with Puppeteer

chromnium crawler javascript nodejs puppeteer scrapy

Last synced: 23 Dec 2024

https://github.com/hrvadl/goweekly

Application for querying top articles from https://golangweekly.com/, translating them to Ukrainian, and sending them to a Telegram channel

article chatgpt crawler go golang openai-api telegram telegram-bot

Last synced: 13 Oct 2024

https://github.com/basemax/film2serial-api-service-crawler

Crawls content and movies from a Persian site using PHP.

crawler crawler-movie crawler-php crawlers movie-crawler movie-database php php-crawler php7 php74

Last synced: 23 Jan 2025

https://github.com/YektaDev/Krawler

A configurable HTML Crawler written in Kotlin (JVM), powered by Coroutines, Kotlin Serialization (JSON), Ktor Client, Exposed, and SQLite.

crawl crawler crawlers crawling

Last synced: 06 Feb 2025

https://github.com/ozansz/github-crawler

A basic utility for crawling users and their e-mail addresses

crawler github python python3

Last synced: 02 Feb 2025

https://github.com/erikjiang/book_crawler

🦎 book_crawler

crawler douban golang

Last synced: 28 Nov 2024

https://github.com/xdk78/grabbi

grabbi: a simple web scraper/crawler

crawler html scraper web-scraper

Last synced: 13 Jan 2025

https://github.com/developerdavi/meli-crawler

Basic web crawler API for getting products from MercadoLibre (BRL | MLB)

api crawler meli-crawler mercadolibre mercadolibre-sdk mercadolivre mercadolivre-sdk nextjs now products react zeit

Last synced: 25 Nov 2024

https://github.com/agmmnn/nis-scraper

Scrapy script to scrape nisanyansozluk.com

cli crawler python scraper

Last synced: 21 Dec 2024

https://github.com/tokenmill/crawling-framework-example

Demonstration of how to use the Crawling Framework to set up a simple science news crawler and store results in Elasticsearch. Use this configuration to set up your own crawler.

crawler crawling-framework elasticsearch storm-crawler

Last synced: 06 Jan 2025