Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).

https://github.com/ozakboy/taiwan-news-crawlers

.NET-based crawlers for Taiwanese news (results are returned as data objects for ease of use)

crawler data-collection dataset-generation dotnet news taiwan webcrawlers

Last synced: 22 Jan 2025

https://github.com/restuwahyu13/node-scraper-content

Example Node.js scraper that collects all programming content using Puppeteer

crawler nodejs puppeter scrapper

Last synced: 03 Jan 2025

https://github.com/mohammadrezaamani/squirrel

Squirrel is a web crawler designed to collect all pages from Iranian websites, enabling you to download and store web page content in a structured format.

crawler iran python

Last synced: 21 Dec 2024

https://github.com/systemfsoftware/youtube-autocomplete-scraper

YouTube AutoComplete Scraper - An Apify actor that scrapes YouTube's search suggestions with intelligent deduplication using pglite and trigram similarity matching. Perfect for content research, SEO, and trend analysis.

actor apify autocomplete crawler deduplication pglite scraper search similarity suggestions trigram youtube youtube-api

Last synced: 11 Jan 2025

https://github.com/harryandriyan/21scrap

Cinema XXI movie data scraper

crawler python scrapy

Last synced: 21 Jan 2025

https://github.com/marvnc/pixiv-dump

Pixiv Encyclopedia DB Dumps, updated daily

crawler database dump encyclopedia japanese pixiv pixiv-crawler pixiv-database scraping

Last synced: 20 Dec 2024

https://github.com/travorlzh/temperature-analyzer

Python crawler that fetches temperature data for Beijing, China

crawler homework python variance

Last synced: 17 Jan 2025

https://github.com/camara94/crawlers

Web scraping is the process of using bots to extract content and data from a website. Unlike screen scraping, which only copies pixels displayed onscreen, web scraping extracts underlying HTML code and, with it, data stored in a database. The scraper can then replicate entire website content elsewhere

crawler python scraping scrapy spider

Last synced: 23 Dec 2024

https://github.com/r3c0ger/liscaps

An LSTM-based intelligent stock crawling, analysis, and prediction system.

crawler lstm python pytorch stock streamlit

Last synced: 11 Nov 2024

https://github.com/exp-codes/sina-crawler

Sina Blog crawler

crawler programming

Last synced: 16 Dec 2024

https://github.com/airtoxin/stackable-crawler

Middleware-based lightweight crawler framework

crawler javascript lightweight

Last synced: 24 Dec 2024

https://github.com/imthaghost/gocloneold

Website Cloner - Uses goroutines to clone websites to your computer within seconds.

colly crawler go scraper

Last synced: 19 Dec 2024

https://github.com/panyanyany/vps_spider

VPS Spider powering https://findallvps.com

crawler spider vps

Last synced: 11 Jan 2025

https://github.com/natshah/natshah-crawler

Natshah Crawler crawls a selected domain, including all its internal links and internal pages.

crawler database filter natshah-crawler

Last synced: 14 Dec 2024

https://github.com/skulltech/arachnid

Crawling Instagram for reasons.

crawler instagram instagram-scraper python3 scraper scrapy

Last synced: 01 Feb 2025

https://github.com/zekrotja/r34-crawler

A simple CLI tool to fetch and download images from rule34.xxx

crawler go rest-api rule34 worker-pool xml

Last synced: 17 Dec 2024

https://github.com/aicore/app_info_extracter

This application would be used to extract information about apps from the internet

android appreview apps crawler googleplaystore

Last synced: 13 Nov 2024

https://github.com/z3ntl3/redeye

Crawl real, up-to-date user agents from the two major databases.

crawler header ua user-agents useragents

Last synced: 16 Dec 2024

https://github.com/nemmusu/free-vpn-downloader

This repository contains three Python scripts designed to simplify the process of downloading and configuring free VPN .ovpn files for use with OpenVPN.

automation crawler download downloader free freevpn openvpn ovpn ovpn-files vpn

Last synced: 30 Jan 2025

https://github.com/ysh329/stock-newspaper-crawler

[UNMAINTAINED] Crawl four kinds of finance newspaper corpora (from CCSTOCK.CN).

corpus crawled-data crawler database stock-newspaper-crawler

Last synced: 16 Dec 2024

https://github.com/manojahi/is-there-any-song-reference-in-article

Tells you whether an article from a website contains any song references.

crawler lyrics-search python webscraping

Last synced: 01 Jan 2025

https://github.com/tsonglew/spidreat

Article Spider with Python & Node.js :beetle:

crawler

Last synced: 19 Dec 2024

https://github.com/e73b025/simple-python-url-crawler

Super simple Python3 website URL scraper/crawler. Multi-threaded.

crawler googlebot lightweight link-collection multi-threaded python python3 scraper simple

Last synced: 11 Nov 2024

https://github.com/wangyihang/acw-sc-v2-py

Python requests.HTTPAdapter for `acw_sc__v2`

acw-sc-v2 crawler waf

Last synced: 05 Jan 2025

https://github.com/zabuzard/mplogger

Uses a bot to save market prices for items, based on transactions, from the game 'http://www.freewar.de/' in a database, then processes the data and creates the corresponding market price articles in 'http://www.fwwiki.de/'.

bot crawler database game mediawiki-api mmorpg mmorpg-freewar php saves-marketprices web-crawler wikipedia-api

Last synced: 19 Dec 2024

https://github.com/truethari/fcrawler

Python application that copies files of a given file type from a directory.

copy copy-files crawl crawler crawler-python file files

Last synced: 07 Jan 2025

https://github.com/buaadreamer/buaastar

BUAA Star website: final project for the Python English-language course at Beihang University (BUAA), summer term 2021

crawler css flask html javascript python

Last synced: 23 Jan 2025

https://github.com/kluhan/kraken

Kraken is a generic, mid-scale web crawler specifically built to crawl vertical data-sources, like Youtube or the Google Play Store.

celery crawler google-play-store python web-crawling

Last synced: 15 Dec 2024

https://github.com/spraakbanken/svt-crawler

Programme for crawling SVT's API for news articles and converting the data to XML.

corpus crawler

Last synced: 28 Jan 2025

https://github.com/leveled-up/memedl

Memedl is a very simple tool to download the latest images from a specific subreddit.

crawler download extract images javascript meme memes node reddit regex rip

Last synced: 23 Dec 2024

https://github.com/roccomuso/is-twitter

Verify that a request is from Twitter crawlers using DNS verification steps

bot crawler dns ip js nodejs twitter verification

Last synced: 07 Jan 2025

https://github.com/rdil/crawley

My attempt at a web crawler.

bs4 crawler python python3 web

Last synced: 04 Jan 2025

https://github.com/thiiagoms/dict-crawler

Simple crawler on UOL dictionary

beautifulsoup4 crawler dic python pythonic

Last synced: 16 Jan 2025

https://github.com/coghost/izen

encapsulation of some useful features

chaos crawler encrypt izen mqtt profig python3 utils

Last synced: 09 Nov 2024

https://github.com/eklem/browsercrawler

Crawling content from a site within the browser. A basis for, e.g., a search solution for static sites.

crawler search-engine website-generation

Last synced: 19 Dec 2024

https://github.com/rudrakshi99/web_crawler

A Spider🕷 or search engine bot that downloads and indexes content from all over the Internet.

crawler python spider

Last synced: 22 Nov 2024

https://github.com/gnujoow/crawl-repo

Crawls basic information about GitHub repositories

crawler github github-api python3

Last synced: 14 Dec 2024

https://github.com/sangupta/shopify-burst-crawler

Simple crawler to download meta information for all stock pics from Shopify Burst website

burst crawler java shopify stock-photos

Last synced: 08 Nov 2024

https://github.com/viclafouch/pe-crawler

📌 An automated system that serves data extracted from the Google Help Center

crawler javascript nodejs postgresql sequelize

Last synced: 29 Jan 2025

https://github.com/akagi201/spy

A lightweight distributed web crawler

crawler distributed lightweight nsq

Last synced: 08 Jan 2025

https://github.com/santhin/real-estate

Real estate crawler with ML on scraped data

crawler jupyter-notebook ml real-estate scrapy

Last synced: 24 Jan 2025

https://github.com/epigos/newsbot

A news bot written in Go for Dialogflow and Facebook messenger

autocert chatbot crawler datastore dialogflow facebook-messenger-bot golang letsencrypt newsfeed

Last synced: 27 Jan 2025

https://github.com/naveenaidu/google-crawler

Google Crawler - Curates the search results

beautifulsoup crawler scraper

Last synced: 18 Jan 2025

https://github.com/cseas/shares-monitor

Web crawler to fetch and monitor shares details.

crawler python python3 scraper scraping-websites shares

Last synced: 27 Dec 2024

https://github.com/birdroad1/server-pinger

Server pinger for Minecraft written in C++

cpp crawler make minecraft minecraft-scanner postgres scanner server

Last synced: 21 Jan 2025

https://github.com/vitaee/laravelandcrawlers

PHP web crawler examples using OOP concepts, with a Laravel project

crawler laravel php

Last synced: 26 Dec 2024

https://github.com/dingpingzhang/papermedia

A scrapy-based crawler for crawling paper media.

crawler scrapy spider

Last synced: 22 Dec 2024

https://github.com/sebyx07/active_proxy

Ruby proxy fetcher that retries requests until completed and provides a user agent 🚀🚀

crawler http proxy rails ruby

Last synced: 28 Dec 2024

https://github.com/alexzhangs/stockdb

Stock data collecting and analyzing

crawler django pandas scrapy stock tushare

Last synced: 08 Jan 2025

https://github.com/thomashirtz/douban-crawler

A simple crawler for retrieving information about movies or TV shows from the famous www.douban.com website.

crawler douban

Last synced: 25 Dec 2024

https://github.com/princed/specht

Check links found in html or js files by pattern

cli crawler html javascript streams

Last synced: 19 Jan 2025

https://github.com/hedon954/go-crawler

A crawler system implemented in Go.

crawler go

Last synced: 21 Jan 2025

https://github.com/zhanymkanov/marketplace_parser

Products and Reviews Crawler

crawler python scrapy

Last synced: 14 Jan 2025

https://github.com/ozansz/simple-web-downloader

A simple web page downloader program in C

c crawler curl libcurl web

Last synced: 02 Feb 2025

https://github.com/purrproof/smartcrawl

An adaptable framework for gathering, aggregating and analyzing data, focusing on blockchain and smart contracts.

blockchain cli crawler explorer framework go golang hacktoberfest

Last synced: 27 Jan 2025

https://github.com/h4r5h1t/crawlytics

A Python-based web crawling tool for data extraction and security analysis that supports various arguments for efficient crawling and outputs results in JSON format.

appsec crawler crawler-python mechanicalsoup security security-tools webcrawler

Last synced: 28 Dec 2024

https://github.com/marzzzello/gplaycrawler

(mirror) Discover apps by different methods. Mass download app packages and metadata.

crawler google-play google-play-store googleplay googleplaystore playstore playstoreapi scraper

Last synced: 23 Dec 2024

https://github.com/laurybueno/monibus

Bus monitoring API for São Paulo

api crawler django docker mapping sptrans

Last synced: 27 Jan 2025

https://github.com/tsoliangwu0130/ptt-search

A simple Python script to fetch PTT posts from the command line.

crawler ptt python

Last synced: 08 Jan 2025

https://github.com/opda0887/bahamut-crawler-to-gmail

Idea: use a Python crawler to fetch the latest forum posts from the Bahamut boards, then send them to yourself via Gmail.

crawler crawler-python

Last synced: 26 Jan 2025

https://github.com/arghyadipchak/craww

Gemini (protocol) crawler written in Rust

crawler gemini gemini-protocol rust

Last synced: 04 Jan 2025

https://github.com/hantang/list-movies-top

Scheduled scraping of top movie lists from Douban (douban.com), IMDb (imdb.com), Mtime (mtime.com), and Maoyan (maoyan.com)

crawler douban imdb movie

Last synced: 07 Jan 2025

https://github.com/bujosa/aldebaran

Example of using App Engine with Python 3, ThreadPool, and web scraping

appengine crawler flask gcp python3 thread-pool

Last synced: 21 Jan 2025

https://github.com/estroz/seekret

Seekret is a sensitive data crawler for GitHub repositories

crawler security

Last synced: 25 Dec 2024

https://github.com/beanwei/zmt-post-crawler

Crawls the ZMT platform site: enter an author ID to get that author's post list. This project was written for a friend.

crawler golang golang-ui

Last synced: 28 Dec 2024

https://github.com/okwilkins/web-crawler

This program will crawl through entire domains, exporting every link it can find into a txt file.

crawler crawling files html htmlparser python python3 reader scraper threading threads web writer

Last synced: 21 Jan 2025

https://github.com/rbkgh/dailytext-crawler

Crawl jw.org to retrieve daily text

crawler dailytext java jsoup jw

Last synced: 15 Jan 2025

https://github.com/buren/stupid_crawler

Stupid crawler that looks for URLs on a given site

cli crawler ruby rubygem

Last synced: 12 Oct 2024

https://github.com/lykmapipo/producthunt-python-scrapy-scraper

Python Scrapy spiders that scrape data from producthunt.com

crawler featured launch lykmapipo product producthunt python scraper scrapy spider webscraper

Last synced: 21 Dec 2024

https://github.com/gnaneshkunal/book-miner

Web crawler for Book reviews (Goodreads)

crawler goodreads typescript

Last synced: 16 Dec 2024

https://github.com/leomaurodesenv/smm-maker-profile

A package for fetching maker profiles from Super Mario Maker

crawler javascript json mario-maker nodejs

Last synced: 02 Nov 2024

https://github.com/im-perativa/public_crawler

A collection of crawler projects for Indonesian datasets

crawler indonesia indonesia-api scrapy

Last synced: 25 Jan 2025

https://github.com/adamfisher/scrapyrt.client

A C# client to make calls to a scrapyrt (Scrapy real-time) HTTP endpoint.

crawler scraper scrapy scrapy-crawler scrapy-framework scrapy-spider

Last synced: 26 Jan 2025

https://github.com/trixsec/zeuscrawler

The ultimate web crawling powerhouse, striking the web like lightning to harvest data with divine precision.

crawler cybersecurity information-gathering information-retrieval osint python scraper spider web-crawler web-scraper

Last synced: 21 Dec 2024

https://github.com/dean9703111/humandesign_nodejs

Uses a Node.js crawler to scrape information from the Human Design website and save it to a cloud-hosted Google Sheets spreadsheet

crawler googlesheetapi googlesheets nodejs

Last synced: 12 Jan 2025

https://github.com/stevieflyer/quokka

An easy-to-use web crawler framework, supporting parallel crawling without a line of code and headless running.

crawler parallel web-automation

Last synced: 14 Dec 2024

https://github.com/dean9703111/shopee_find_mac

Quickly find an inexpensive Mac that meets your required specifications

argparse crawler mac pip python python2 xlsxwriter

Last synced: 12 Jan 2025

https://github.com/mohabmes/matool

A collection of various custom tools. { Antesh, CITerm, INetSC, KADManga, Tomado }

cli codeigniter-terminal crawler mangareader markd markdown markdown-to-html parser readme scan-tool scanner-web

Last synced: 08 Jan 2025

https://github.com/deptno/nsdi

㉿ nsdi downloader built on puppeteer

crawler downloader nsdi openapi puppeteer

Last synced: 31 Dec 2024

https://github.com/uranusx86/dcard-crawler-analyzer

Get Dcard and Meteor forum content and analyze it!

crawl crawler dcard nlp python word-cloud word-count word-frequency

Last synced: 21 Jan 2025

https://github.com/henkman/crawlers

:squirrel: some crawlers and downloaders

crawler

Last synced: 16 Jan 2025

https://github.com/somehowchris/swisslos-cralwer

(WIP) Crawler to access the current and historical numbers of Swisslos

crawler euromillions lotto rust swisslos

Last synced: 27 Jan 2025

https://github.com/amirzenoozi/aparat-videos-dataset

Some Simple Information About Aparat Videos for Data Scientists

aparat cli crawler data-science data-science-projects pandas python python3 sdk-python sqlite3 video

Last synced: 21 Jan 2025

https://github.com/soulyma/web_crawler

A focused web crawler to extract and structure Arabic content from web pages. Designed for researchers, data analysts, and developers working on Arabic language datasets.

beautifulsoup4 crawler csv data json python structured-data

Last synced: 13 Dec 2024