Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).

https://github.com/keosariel/ramby

Ramby is a simple way to setup a webscraper

beautifulsoup crawler python3 webscraping

Last synced: 06 Dec 2024

https://github.com/becky-dai/flower-knowledge-graph-visualization

A full stack program of knowledge graph visualization 一个关于知识图谱可视化的全栈项目

crawler css django echarts html js knowledge-graph neo4j python

Last synced: 21 Dec 2024

https://github.com/nakabonne/staticcollector

Application to analyze static files of competing sites

crawler go golang

Last synced: 14 Dec 2024

https://github.com/restuwahyu13/node-scraper-content

example node scraper all content programming using puppeteer

crawler nodejs puppeter scrapper

Last synced: 03 Jan 2025

https://github.com/viclafouch/pe-crawler

📌 An automated system that serves data extracted from the Google Help Center

crawler javascript nodejs postgresql sequelize

Last synced: 02 Dec 2024

https://github.com/zabuzard/songcrawler

Crawles all song files available on 'http://downloads.khinsider.com/'. Creates a list of direct download links for all such songs, intended for use with JDownloader or similar.

command-line-tool crawler download-musics downloadmanager jdownloader multithreading song-files songs web-crawler

Last synced: 12 Jan 2025

https://github.com/runnin-n-gunnin/geckofxinterceptrequestcaptureresponse

[GeckoFX/Firefox]: Shows how to Intercept request(s), capture response(s), customize GeckoPreferences, handle certificate errors, change useragent++.

browser cefsharp controls crawler crawling firefox gecko geckofx geckofx60 scraping webbrowser windows windowsforms winforms

Last synced: 27 Nov 2024

https://github.com/superreal/octopus

Recursive and multi-threaded broken link checker

broken checker crawler links

Last synced: 07 Jan 2025

https://github.com/camara94/crawlers

Web scraping is the process of using bots to extract content and data from a website. Unlike screen scraping, which only copies pixels displayed onscreen, web scraping extracts underlying HTML code and, with it, data stored in a database. The scraper can then replicate entire website content elsewhere

crawler python scraping scrapy spider

Last synced: 23 Dec 2024

https://github.com/der3318/zijfhchat-crawler

手遊「紫禁繁花」-聊天室爬蟲、即時查詢

crawler dashboard line-notify

Last synced: 13 Jan 2025

https://github.com/marvnc/pixiv-dump

Pixiv Encyclopedia DB Dumps, updated daily

crawler database dump encyclopedia japanese pixiv pixiv-crawler pixiv-database scraping

Last synced: 20 Dec 2024

https://github.com/travorlzh/temperature-analyzer

Python crawler that helps fetch temperature of Beijing, China

crawler homework python variance

Last synced: 17 Jan 2025

https://github.com/marcbperez/python-webcrawler

Crawls HTML pages for prices and other pieces of data.

crawler docker gradle python

Last synced: 20 Jan 2025

https://github.com/nextlevelshit/fick

Fucking Incredible Command line King. Add CLI flavour to any website you like to.

cli crawler

Last synced: 20 Jan 2025

https://github.com/kluhan/kraken

Kraken is a generic, mid-scale web crawler specifically built to crawl vertical data-sources, like Youtube or the Google Play Store.

celery crawler google-play-store python web-crawling

Last synced: 15 Dec 2024

https://github.com/imkrunalkanojiya/seo-checker

Resolve your SEO related issue by using SEO Checker Rest API

crawler nodejs rest-api seo seo-crawler seo-free seo-optimization seo-tools

Last synced: 03 Jan 2025

https://github.com/brunojppb/airport-crawler

Simple and powerful CLI app to get worldwide airport information in JSON format

airport cli crawler ruby

Last synced: 14 Jan 2025

https://github.com/exp-codes/sina-crawler

新浪博客爬虫

crawler programming

Last synced: 16 Dec 2024

https://github.com/rudrakshi99/web_crawler

A Spider🕷 or search engine bot that downloads and indexes content from all over the Internet.

crawler python spider

Last synced: 22 Nov 2024

https://github.com/airtoxin/stackable-crawler

middleware based lightweight crawler framework

crawler javascript lightweight

Last synced: 24 Dec 2024

https://github.com/z3ntl3/redeye

Crawl real and new user agents from the most major 2 databases.

crawler header ua user-agents useragents

Last synced: 16 Dec 2024

https://github.com/linkspreed/twig

Twig🔍 - the fastest and safest search engine📐 for the web🌐, images🤳, news 📰and much more

crawler engine search search-engine web5

Last synced: 03 Jan 2025

https://github.com/dylanhogg/cloud-products

A package for getting cloud products and product descriptions from a cloud provider website.

aws cloud-products crawler data text-processing

Last synced: 23 Jan 2025

https://github.com/okwilkins/web-crawler

This program will crawl through entire domains, exporting every link it can find into a txt file.

crawler crawling files html htmlparser python python3 reader scraper threading threads web writer

Last synced: 21 Jan 2025

https://github.com/beanwei/zmt-post-crawler

Crawler the ZMT platform site ,put the author id, get the post list.This project is coding for my friend

crawler golang golang-ui

Last synced: 28 Dec 2024

https://github.com/estroz/seekret

Seekret is a sensitive data crawler for GitHub repositories

crawler security

Last synced: 25 Dec 2024

https://github.com/lykmapipo/producthunt-python-scrapy-scraper

Python Scrapy spiders that scrapes data from producthunt.com

crawler featured launch lykmapipo product producthunt python scraper scrapy spider webscraper

Last synced: 21 Dec 2024

https://github.com/bujosa/aldebaran

Example use APP ENGINE with Python3, ThreadPool and webScraping

appengine crawler flask gcp python3 thread-pool

Last synced: 21 Jan 2025

https://github.com/hantang/list-movies-top

豆瓣(douban.com)、IMDb(imdb.com)、时光网(mtime.com)、猫眼(maoyan.com)Top电影定时抓取

crawler douban imdb movie

Last synced: 07 Jan 2025

https://github.com/gnaneshkunal/book-miner

Web crawler for Book reviews (Goodreads)

crawler goodreads typescript

Last synced: 16 Dec 2024

https://github.com/leomaurodesenv/smm-maker-profile

A package to fetching the maker profile - Super Mario Maker

crawler javascript json mario-maker nodejs

Last synced: 02 Nov 2024

https://github.com/marzzzello/gplaycrawler

(mirror) Discover apps by different mehtods. Mass download app packages and metadata.

crawler google-play google-play-store googleplay googleplaystore playstore playstoreapi scraper

Last synced: 23 Dec 2024

https://github.com/h4r5h1t/crawlytics

A Python-based web crawling tool for data extraction and security analysis that supports various arguments for efficient crawling and outputs results in JSON format.

appsec crawler crawler-python mechanicalsoup security security-tools webcrawler

Last synced: 28 Dec 2024

https://github.com/zhanymkanov/marketplace_parser

Products and Reviews Crawler

crawler python scrapy

Last synced: 14 Jan 2025

https://github.com/trixsec/zeuscrawler

The ultimate web crawling powerhouse, striking the web like lightning to harvest data with divine precision.

crawler cybersecurity information-gathering information-retrieval osint python scraper spider web-crawler web-scraper

Last synced: 21 Dec 2024

https://github.com/stevieflyer/quokka

An easy-to-use web crawler framework, supporting parallel crawling without a line of code and headless running.

crawler parallel web-automation

Last synced: 14 Dec 2024

https://github.com/thomashirtz/douban-crawler

A simple crawler for retrieving information about movies or TV shows from the famous www.douban.com website.

crawler douban

Last synced: 25 Dec 2024

https://github.com/sebyx07/active_proxy

Ruby proxy fetcher, retries request until completed, provides user agent🚀🚀

crawler http proxy rails ruby

Last synced: 28 Dec 2024

https://github.com/vitaee/laravelandcrawlers

php web crawler examples with oop concept and laravel project

crawler laravel php

Last synced: 26 Dec 2024

https://github.com/birdroad1/server-pinger

Server pinger for Minecraft written in C++

cpp crawler make minecraft minecraft-scanner postgres scanner server

Last synced: 21 Jan 2025

https://github.com/hwywl/mzitu-crawler

爬取mzitu网站的妹子,注意营养

crawler mzitu python

Last synced: 08 Jan 2025

https://github.com/tungct/golangcrawler

Crawler goroutine Golang

crawler go

Last synced: 14 Jan 2025

https://github.com/appliedsoul/crawlmatic

Static and Dynamic website crawling library - a common promise based wrapper around node-crawler & hccrawler libraries.

crawler scraper

Last synced: 30 Dec 2024

https://github.com/jimmy-ly00/dhe-prime-grabber

Grabs Diffie-Hellman primes from certificates using OpenSSL. Uses multiprocessing to collect over 50 million Diffie-Hellman primes.

certificate certificates crawler dhe-prime-grabber diffie-hellman ipv4 multiprocessing openssl prime prime-numbers python python-3

Last synced: 29 Dec 2024

https://github.com/efishery/wpi-kkp-crawler

This is crawler for fisheries price on wpi.kkp.go.id

crawler kkp wpi

Last synced: 02 Jan 2025

https://github.com/rogerluo410/gcrawler

Google search crawler for Ruby version. Crawling each links' text and url by keywords on Google.com.

crawler crawling google ruby

Last synced: 02 Jan 2025

https://github.com/spraakbanken/svt-crawler

Programme for crawling SVT's API for news articles and converting the data to XML.

corpus crawler

Last synced: 29 Nov 2024

https://github.com/uranusx86/dcard-crawler-analyzer

get Dcard & Meteor forum content and analyze !

crawl crawler dcard nlp python word-cloud word-count word-frequency

Last synced: 21 Jan 2025

https://github.com/amirzenoozi/aparat-videos-dataset

Some Simple Information About Aparat Videos for DataScientists

aparat cli crawler data-science data-science-projects pandas python python3 sdk-python sqlite3 video

Last synced: 21 Jan 2025

https://github.com/dingpingzhang/papermedia

A scrapy-based crawler for crawling paper media.

crawler scrapy spider

Last synced: 22 Dec 2024

https://github.com/alexzhangs/stockdb

Stock data collecting and analyzing

crawler django pandas scrapy stock tushare

Last synced: 08 Jan 2025

https://github.com/tsaohucn/crawler_fb_group

This is crawler use selenium for facebook groups

crawler facebook-groups rails ruby

Last synced: 20 Jan 2025

https://github.com/moontai0724/auto-notify-pu-courses-quota

A small crawler to fetch remains quota of a list of courses in Providence University every 2 to 10 minutes, then send webhook when change.

crawler javascript nodejs

Last synced: 06 Dec 2024

https://github.com/panagiks/asset

ASynchronous Spidering Essential Tool (ASSET).

async asyncio crawler graph reporting spider

Last synced: 06 Dec 2024

https://github.com/sirius-mhlee/naver-cafe-crawler

NAVER Cafe Crawler using pandas, tqdm, Selenium, BeautifulSoup4

beautifulsoup4 crawler pandas selenium tqdm

Last synced: 14 Jan 2025

https://github.com/hedon954/go-crawler

A crawler system implemented in Go.

crawler go

Last synced: 21 Jan 2025

https://github.com/baerwang/sec_craw

一个方便安全研究人员获取每日安全日报的爬虫,目前爬取范围包括90sec、看雪论坛、v2ex、精易论坛、52破解论坛等实验室博客,持续更新中。

crawler security security-tools threat threat-intelligence

Last synced: 21 Jan 2025

https://github.com/saketh7382/smartcrawler

Package for crawling items from webpages and store them as json file

crawler crawler-python open-source pip python3 scraper selenium selenium-webdriver webdriver-manager

Last synced: 08 Dec 2024

https://github.com/tsoliangwu0130/ptt-search

A simple Python script to fetch PTT post from the command line.

crawler ptt python

Last synced: 08 Jan 2025

https://github.com/arghyadipchak/craww

Gemini (protocol) crawler written in Rust

crawler gemini gemini-protocol rust

Last synced: 04 Jan 2025

https://github.com/excaliburhan/littlenews

A news app via electron

crawler electron rss-feed

Last synced: 29 Nov 2024

https://github.com/1970mr/link-crawler

Web Link Crawler: A Python script to crawl websites and collect links based on a regex pattern. Efficient and customizable.

clawler crawler crawler-python link-crawler link-crawler-python link-scraper link-scraper-python links python scraper scraper-python website-crawler website-scraper

Last synced: 11 Nov 2024

https://github.com/rbkgh/dailytext-crawler

Crawl jw.org to retrieve daily text

crawler dailytext java jsoup jw

Last synced: 15 Jan 2025

https://github.com/ozansz/simple-web-downloader

A simple web page downloader program in C

c crawler curl libcurl web

Last synced: 06 Dec 2024

https://github.com/mohabmes/matool

A collection of various custom tools. { Antesh, CITerm, INetSC, KADManga, Tomado }

cli codeigniter-terminal crawler mangareader markd markdown markdown-to-html parser readme scan-tool scanner-web

Last synced: 08 Jan 2025

https://github.com/maxiroellplenty/gs-robot

NodeJs tool to scrap gelbe-seiten

axios cheerio crawler gelbe-seiten nodejs scraper yargs

Last synced: 23 Jan 2025

https://github.com/henkman/crawlers

:squirrel: some crawlers and downloaders

crawler

Last synced: 16 Jan 2025

https://github.com/mkfsn/chronos

A light cron-like container service - create cron job easily.

crawler cron cronjob golang

Last synced: 22 Jan 2025

https://github.com/hamidrabedi/digikala-crawler

a crawler for digikala with django framework, selenium and rest api. also scraping data from gathered urls

crawler digikala digikala-crawler django python scraper

Last synced: 14 Dec 2024

https://github.com/greatdrake/contributecounter

crawl Wikipedia for contributers

crawler python scraping

Last synced: 14 Dec 2024

https://github.com/srx-2000/swaiter

a programe to wait until the selenium element has loaded——selenium模拟器元素等待程序

crawler selenium selenium-python

Last synced: 22 Jan 2025

https://github.com/hanifdwyputras/se-scraper

Search Engine scraper with PHP

crawler scraper seo seo-crawler

Last synced: 06 Dec 2024

https://github.com/microlinkhq/ua

A simple redis primitives to incr() and top() user agents

crawler redis user-agent user-agent-parser

Last synced: 12 Jan 2025

https://github.com/bingxyz/blackcat

使用telegram bot查詢黑貓物流

crawler nodejs telegram-bot

Last synced: 22 Jan 2025

https://github.com/jovijovi/ether-crawler

A transaction crawler for the Ethereum ecosystem.

blockchain crawler ether ethereum transaction

Last synced: 16 Jan 2025

https://github.com/captain-woof/zhi-zhu

Zhi-Zhu is a multithreaded spidering script that recursively searches base webpages and all urls appearing in it, for specific (regex) words.

crawler crawler-python crawling-python python3

Last synced: 31 Dec 2024

https://github.com/phanikmr/linkcrawler

A LinkCrawler is a Python module that takes a url on the web (ex: http://python.org), fetches the web-page corresponding to that url, and parses all the links on that page into a repository of links. Next, it fetches the contents of any of the url from the repository just created, parses the links from this new content into the repository and continues this process for all links in the repository until stopped or after a given number of links are fetched.

async crawler linkcrawler parse python scrapy spider

Last synced: 29 Nov 2024

https://github.com/willi-dev/dtcapp

dtcapp : distributed twitter crawler.

crawler distributed-systems hazelcast java twitter twitter-api

Last synced: 14 Jan 2025

https://github.com/tanja-4732/od-get

A Rust tool for recursively crawling & downloading data from open directories

cli crawler open-directory open-directory-downloader rust

Last synced: 14 Jan 2025

https://github.com/programming-with-love/skyeyesystem

天眼系统,每隔十分钟爬取各个平台的热搜数据并入库。包括原始热搜数据存入mysql。词频统计存入Redis。

crawler mysql redis skyeye skyeyewall springboot

Last synced: 16 Jan 2025

https://github.com/mazzasaverio/scrapy-playwright-scrapegraphai

Web crawler using Scrapy + Playwright for dynamic content, featuring YAML-based configuration, PostgreSQL storage via aiosql, structured logging with logfire, and complete Docker/Terraform infrastructure. Built with uv package manager and Python 3.11+.

aiosql crawler docker playwright scrapy scrapy-playwright terraform uv

Last synced: 14 Jan 2025

https://github.com/weaming/simple-crawler

my simple crawler

crawler

Last synced: 12 Jan 2025

https://github.com/ryanchao2012/okbot

A conversation retrieval engine based on PTT corpus

chatbot crawler django ptt

Last synced: 12 Jan 2025

https://github.com/idlesign/gallerycrawler

Generic crawling for galleries

crawler gallery images python3

Last synced: 17 Dec 2024