Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).

https://github.com/hwywl/mzitu-crawler

Crawls pictures of girls from the mzitu website. Mind your nutrition.

crawler mzitu python

Last synced: 08 Jan 2025

https://github.com/birdroad1/server-pinger

Server pinger for Minecraft written in C++

cpp crawler make minecraft minecraft-scanner postgres scanner server

Last synced: 21 Jan 2025

https://github.com/vitaee/laravelandcrawlers

PHP web crawler examples using OOP concepts, plus a Laravel project

crawler laravel php

Last synced: 26 Dec 2024

https://github.com/pnguyen215/instagram-crawler

Instagram Crawler is a Python script to download posts from a specified Instagram account.

crawler crawling-python instagram instagram-crawler scraper scraping-python scraping-websites scrapper scrapy-crawler

Last synced: 12 Jan 2025

https://github.com/sebyx07/active_proxy

Ruby proxy fetcher, retries request until completed, provides user agent🚀🚀

crawler http proxy rails ruby

Last synced: 28 Dec 2024

https://github.com/sinkaroid/webnovelcrawler

Simple PHPcurl and getRequest to grab Light Novel and WebNovel, then create parser with DOMpdf.

crawler dompdf webnovel

Last synced: 23 Dec 2024

https://github.com/thomashirtz/douban-crawler

A simple crawler for retrieving information about movies or TV shows from the famous www.douban.com website.

crawler douban

Last synced: 25 Dec 2024

https://github.com/princed/specht

Check links found in html or js files by pattern

cli crawler html javascript streams

Last synced: 19 Jan 2025

https://github.com/zhanymkanov/marketplace_parser

Products and Reviews Crawler

crawler python scrapy

Last synced: 14 Jan 2025

https://github.com/shgopher/retuo

A distributed crawler

crawler go

Last synced: 31 Dec 2024

https://github.com/purrproof/smartcrawl

An adaptable framework for gathering, aggregating and analyzing data, focusing on blockchain and smart contracts.

blockchain cli crawler explorer framework go golang hacktoberfest

Last synced: 27 Jan 2025

https://github.com/exp-codes/pyzone-crawler

Qzone (QQ空间) crawler (Python version)

crawler programming

Last synced: 16 Dec 2024

https://github.com/0fatal/zjxxc-crawl

Crawler for the 在浙学 platform: assignment status and login

crawler

Last synced: 16 Dec 2024

https://github.com/h4r5h1t/crawlytics

A Python-based web crawling tool for data extraction and security analysis that supports various arguments for efficient crawling and outputs results in JSON format.

appsec crawler crawler-python mechanicalsoup security security-tools webcrawler

Last synced: 28 Dec 2024

https://github.com/marzzzello/gplaycrawler

(mirror) Discover apps by different methods. Mass download app packages and metadata.

crawler google-play google-play-store googleplay googleplaystore playstore playstoreapi scraper

Last synced: 23 Dec 2024

https://github.com/sefinek/niedlascamu.pl-tracker

Tracks changes on the niedlascamu.pl website.

crawl crawler niedlascamu tracker tracking

Last synced: 07 Dec 2024

https://github.com/hantang/list-movies-top

Scheduled scraping of top movie lists from Douban (douban.com), IMDb (imdb.com), Mtime (mtime.com), and Maoyan (maoyan.com)

crawler douban imdb movie

Last synced: 07 Jan 2025

https://github.com/bujosa/aldebaran

Example of using App Engine with Python 3, ThreadPool, and web scraping

appengine crawler flask gcp python3 thread-pool

Last synced: 21 Jan 2025

https://github.com/zhoudaxia233/unilogo

A visually striking assembly of the top 1000 universities' logos from ARWU, sorted by color into a vibrant spectrum.

crawler python visualization

Last synced: 15 Dec 2024

https://github.com/estroz/seekret

Seekret is a sensitive data crawler for GitHub repositories

crawler security

Last synced: 25 Dec 2024

https://github.com/beanwei/zmt-post-crawler

Crawls the ZMT platform site: enter an author ID to get the post list. This project was coded for my friend.

crawler golang golang-ui

Last synced: 28 Dec 2024

https://github.com/okwilkins/web-crawler

This program will crawl through entire domains, exporting every link it can find into a txt file.

crawler crawling files html htmlparser python python3 reader scraper threading threads web writer

Last synced: 21 Jan 2025

https://github.com/buren/stupid_crawler

Stupid crawler that looks for URLs on a given site

cli crawler ruby rubygem

Last synced: 12 Oct 2024

https://github.com/lykmapipo/producthunt-python-scrapy-scraper

Python Scrapy spiders that scrape data from producthunt.com

crawler featured launch lykmapipo product producthunt python scraper scrapy spider webscraper

Last synced: 21 Dec 2024

https://github.com/gnaneshkunal/book-miner

Web crawler for Book reviews (Goodreads)

crawler goodreads typescript

Last synced: 16 Dec 2024

https://github.com/leomaurodesenv/smm-maker-profile

A package for fetching the maker profile - Super Mario Maker

crawler javascript json mario-maker nodejs

Last synced: 02 Nov 2024

https://github.com/im-perativa/public_crawler

A collection of crawler projects for Indonesian datasets

crawler indonesia indonesia-api scrapy

Last synced: 25 Jan 2025

https://github.com/adamfisher/scrapyrt.client

A C# client to make calls to a scrapyrt (Scrapy real-time) HTTP endpoint.

crawler scraper scrapy scrapy-crawler scrapy-framework scrapy-spider

Last synced: 26 Jan 2025

https://github.com/coghost/crawlers

crawlers in one

crawler python3 staticimg weibo

Last synced: 02 Jan 2025

https://github.com/trixsec/zeuscrawler

The ultimate web crawling powerhouse, striking the web like lightning to harvest data with divine precision.

crawler cybersecurity information-gathering information-retrieval osint python scraper spider web-crawler web-scraper

Last synced: 21 Dec 2024

https://github.com/dean9703111/humandesign_nodejs

Uses a Node.js crawler to scrape information from a Human Design web page, then saves it to a Google spreadsheet in the cloud

crawler googlesheetapi googlesheets nodejs

Last synced: 12 Jan 2025

https://github.com/stevieflyer/quokka

An easy-to-use web crawler framework, supporting parallel crawling without a line of code and headless running.

crawler parallel web-automation

Last synced: 14 Dec 2024

https://github.com/simonrichardson/crwlr

Crawl all the things!

crawler meshuggah

Last synced: 29 Jan 2025

https://github.com/dean9703111/shopee_find_mac

Find a cheap Mac that meets your required specs as quickly as possible

argparse crawler mac pip python python2 xlsxwriter

Last synced: 12 Jan 2025

https://github.com/uranusx86/dcard-crawler-analyzer

Get Dcard & Meteor forum content and analyze it!

crawl crawler dcard nlp python word-cloud word-count word-frequency

Last synced: 21 Jan 2025

https://github.com/somehowchris/swisslos-cralwer

(WIP) Crawler to access the current and historical numbers of swisslos

crawler euromillions lotto rust swisslos

Last synced: 27 Jan 2025

https://github.com/amirzenoozi/aparat-videos-dataset

Some Simple Information About Aparat Videos for DataScientists

aparat cli crawler data-science data-science-projects pandas python python3 sdk-python sqlite3 video

Last synced: 21 Jan 2025

https://github.com/moontai0724/auto-notify-pu-courses-quota

A small crawler that fetches the remaining quota for a list of courses at Providence University every 2 to 10 minutes, then sends a webhook when it changes.

crawler javascript nodejs

Last synced: 06 Dec 2024

https://github.com/sirius-mhlee/naver-cafe-crawler

NAVER Cafe Crawler using pandas, tqdm, Selenium, BeautifulSoup4

beautifulsoup4 crawler pandas selenium tqdm

Last synced: 14 Jan 2025

https://github.com/soulyma/web_crawler

A focused web crawler to extract and structure Arabic content from web pages. Designed for researchers, data analysts, and developers working on Arabic language datasets.

beautifulsoup4 crawler csv data json python structured-data

Last synced: 13 Dec 2024

https://github.com/citiususc/polypus

Polypus: a Big Data Self-Deployable Architecture for Microblogging Text Extraction and Real-Time Sentiment Analysis

analytics bigdata crawler scraper sentiment-analysis twitter

Last synced: 29 Jan 2025

https://github.com/baerwang/sec_craw

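A crawler that helps security researchers collect daily security reports. It currently covers 90sec, the Kanxue forum, v2ex, the Jingyi forum, the 52pojie forum, and other lab blogs, and is updated continuously.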

crawler security security-tools threat threat-intelligence

Last synced: 21 Jan 2025

https://github.com/saketh7382/smartcrawler

Package for crawling items from webpages and storing them as a JSON file

crawler crawler-python open-source pip python3 scraper selenium selenium-webdriver webdriver-manager

Last synced: 08 Dec 2024

https://github.com/chen0040/ios-stock-tracker

Stock tracker implemented using Objective-C for iOS

crawler ios-app objective-c stock-prices

Last synced: 16 Dec 2024

https://github.com/1970mr/link-crawler

Web Link Crawler: A Python script to crawl websites and collect links based on a regex pattern. Efficient and customizable.

clawler crawler crawler-python link-crawler link-crawler-python link-scraper link-scraper-python links python scraper scraper-python website-crawler website-scraper

Last synced: 11 Nov 2024

https://github.com/maxiroellplenty/gs-robot

Node.js tool to scrape gelbe-seiten

axios cheerio crawler gelbe-seiten nodejs scraper yargs

Last synced: 23 Jan 2025

https://github.com/machu-gwu/crawlib-project

Tool set for crawler projects.

crawler framework mongodb python scrapy

Last synced: 18 Jan 2025

https://github.com/hamidrabedi/digikala-crawler

A crawler for Digikala built with the Django framework, Selenium, and a REST API. Also scrapes data from the gathered URLs.

crawler digikala digikala-crawler django python scraper

Last synced: 14 Dec 2024

https://github.com/greatdrake/contributecounter

Crawl Wikipedia for contributors

crawler python scraping

Last synced: 14 Dec 2024

https://github.com/cryptoc1/earl

Earl is looking for URLs in your area.

crawler middleware nuget webscraping

Last synced: 27 Jan 2025

https://github.com/hanifdwyputras/se-scraper

Search Engine scraper with PHP

crawler scraper seo seo-crawler

Last synced: 06 Dec 2024

https://github.com/ccrashzer0/web_crawler

A python based web crawler

crawler internet python python3 webcrawler

Last synced: 27 Jan 2025

https://github.com/dizys/weibo-crawler

A nodejs weibo crawler

crawler nodejs typescript weibo-spider

Last synced: 27 Dec 2024

https://github.com/bitscoper/bitscoper_crawler

Crawls the titles of webpages in series by number and creates a list of the available links.

crawler lister

Last synced: 05 Dec 2024

https://github.com/zhs007/lottery-crawler

A crawler based on jarvis-task, mainly used to crawl lottery data.

crawler jarvis-task

Last synced: 03 Jan 2025

https://github.com/anjackson/scrapy-url-frontier

A Scrapy module for URL Frontier integration

crawler frontier scrapy spider

Last synced: 05 Jan 2025

https://github.com/idlesign/gallerycrawler

Generic crawling for galleries

crawler gallery images python3

Last synced: 17 Dec 2024

https://github.com/altescy/mincrawler

A minimal web crawler.

configurable crawler python scraping

Last synced: 26 Jan 2025

https://github.com/marcinrek/sauron

Basic page crawler written in Node.js

crawler json node-js nodejs requests

Last synced: 29 Nov 2024

https://github.com/elektrostudios/gamefaqs-platform-exclusive-games-scraper

Crawls exclusive video games released for the platforms specified on GameFAQs website to generate a table in Markdown format with the crawled titles.

console-app console-application crawler dotnet game gamefaqs games megadrive netframework nintendo ps3 ps4 ps5 scraper snes vbnet videogame videogames windows xbox

Last synced: 29 Jan 2025

https://github.com/konradlinkowski/mailcrawler

Crawler to find emails in websites

crawler scraper

Last synced: 26 Jan 2025

https://github.com/konradlinkowski/wikipediafinder

Find words in wikipage

crawler scraper wikipedia

Last synced: 26 Jan 2025

https://github.com/fi1a/crawler

PHP crawler

crawler php

Last synced: 29 Jan 2025

https://github.com/victorhuu/amazonmovieintegration

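This repository is the first individual assignment for the Data Warehouse course at Tongji University: organizing Amazon movie data using a crawler and ETL tools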

crawler data-warehouse movies pandas scrapy xpath

Last synced: 26 Jan 2025

https://github.com/maxmindlin/swarm

Go crawler that searches and aggregates information relevant to your interests. WIP for learning Go crawling.

crawler golang mongodb

Last synced: 06 Dec 2024

https://github.com/ammirsm/data-grabber-cnn-twitter

Basic setup to get data from twitter and CNN with a keyword.

cnn crawler django scrapyd twitter

Last synced: 09 Dec 2024

https://github.com/openpj/manifoldcf-sdk

Apache ManifoldCF SDK is a Maven project focused on helping developers to extend ManifoldCF with new connectors and extensions

apache crawler docker ecm extensions integrations manifoldcf migration sdk search

Last synced: 25 Jan 2025

https://github.com/eea/eea-crawler

EEA Crawler contains the tasks (DAGs) used by Apache Airflow to index content from various EEA-Eionet websites into a central Elasticsearch (aka content hub).

airflow-dags crawler elasticsearch etl-pipeline indexing

Last synced: 24 Jan 2025

https://github.com/liebki/githubnet

This library allows you to retrieve several things from GitHub, things like trending repositories, profiles of users, the repositories of users and related information.

crawler crawling github github-trending htmlagilitypack microsoft

Last synced: 24 Jan 2025

https://github.com/somnisomni/trawler-csharp

The successor of https://github.com/somnisomni/twitter-account-data-crawler, written in .NET C#

crawler crawling csharp dotnet follower-tracker selenium selenium-csharp twitter twitter-crawler twitter-crawling twitter-scraper

Last synced: 05 Jan 2025

https://github.com/redco/goose-phantom-environment

Environment for Goose parser that allows running it in PhantomJS

crawler environment goose goose-parser nodejs parse parser phantomjs scraper

Last synced: 22 Dec 2024

https://github.com/tsaohucn/crawler_fb_group

A crawler that uses Selenium for Facebook groups

crawler facebook-groups rails ruby

Last synced: 20 Jan 2025

https://github.com/abdus/scrape-web

A simple web scraper for Node.js

crawler web-scraping web-scrapper

Last synced: 30 Jan 2025

https://github.com/pxlrbt/website-diff

Utility tool that bundles a crawler and BackstopJS for visual regression testing.

backstopjs crawler visual-regression-testing

Last synced: 26 Jan 2025

https://github.com/mc256/node-static-webpage-crawler

Download an entire website with its directory structure.

cache-server crawler nodejs static-site

Last synced: 24 Jan 2025

https://github.com/amirsorouri00/dsl-se

This is an MVP based on the "Search Engine And Data Mining" course. The idea behind this project comes from the forked project, whose link is

container crawler distributed-systems docker docker-compose elasticsearch pagerank search-engine

Last synced: 19 Jan 2025

https://github.com/loggerhead/dianping_crawler

A Dianping (大众点评) crawler based on Scrapy (Python 3.5)

crawler python-3-5

Last synced: 24 Jan 2025

https://github.com/comigor/balances

Your checking and savings accounts balances on banks and brokers.

balance bank broker crawler node

Last synced: 09 Dec 2024

https://github.com/maxgio92/package-crawler

A package crawler for most known Linux distros

crawler go linux package

Last synced: 26 Jan 2025

https://github.com/hudson-newey/user-web-crawler

The Archive.org Crawler works through volunteering users who install an extension on their browsers. When the user visits a webpage, the URL is anonymously added to the Archive.org database.

archive crawler open-internet

Last synced: 10 Jan 2025

https://github.com/zephyrpersonal/github-trending-crawler

Transform GitHub trending repos into JSON data

cheerio crawler fetch github node repository spider trending

Last synced: 26 Jan 2025

https://github.com/mmqnym/pyppeteer-use-case

Shows how to crawl the web via pyppeteer

crawl crawler pyppeteer python

Last synced: 18 Jan 2025

https://github.com/hoishing/selenium-crawler

a web crawler written in python, powered by Selenium and Tesseract OCR

crawler python selenium

Last synced: 18 Jan 2025

https://github.com/flavien-hugs/scrapy-test

Working with the Scrapy library. A mini script that extracts all the cartoon characters from Wikipedia.

crawler python scraping scrapy

Last synced: 09 Dec 2024

https://github.com/par7133/splash-bot-crawler

Splash Bot creates splash on the fly of your websites - GPL License 🔥

bot crawler gallery open-source opensource php splash

Last synced: 12 Jan 2025

https://github.com/karantyagi/web-crawler

BFS and DFS implementations for a wikipedia crawler

beautifulsoup crawler

Last synced: 12 Jan 2025

https://github.com/aminehsan/crawler-divar.ir

Analyzing and Extracting Insights from Ads on 'divar.ir'

crawler data-mining data-science divar-ir scarping

Last synced: 04 Dec 2024

https://github.com/xcrypt0r/xcrawler

✂️ A crawling example for maplestory with various languages using multi-threading

crawler crawling multithreading parsing regexp

Last synced: 09 Jan 2025

https://github.com/naveenaidu/google-crawler

Google Crawler - Curates the search results

beautifulsoup crawler scraper

Last synced: 18 Jan 2025