Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).

https://github.com/yjg30737/pyqt-google-image-crawler

Crawling image files from Google search result with Python and icrawler

beautifulsoup4 crawler icrawler image-crawler pyqt pyqt5 pyqt5-desktop-application

Last synced: 22 Oct 2024

https://github.com/yjg30737/pyqt-wikipedia-crawler

Crawling the Wikipedia with Python powered by BeautifulSoup4, Supporting GUI/CUI

beautifulsoup4 crawler pyqt pyqt5 wikipedia

Last synced: 22 Oct 2024

https://github.com/e73b025/simple-python-url-crawler

Super simple Python3 website URL scraper/crawler. Multi-threaded.

crawler googlebot lightweight link-collection multi-threaded python python3 scraper simple

Last synced: 11 Oct 2024

https://github.com/mohabmes/matool

A collection of various custom tools. { Antesh, CITerm, INetSC, KADManga, Tomado }

cli codeigniter-terminal crawler mangareader markd markdown markdown-to-html parser readme scan-tool scanner-web

Last synced: 11 Oct 2024

https://github.com/kluhan/kraken

Kraken is a generic, mid-scale web crawler specifically built to crawl vertical data-sources, like Youtube or the Google Play Store.

celery crawler google-play-store python web-crawling

Last synced: 28 Oct 2024

https://github.com/roccomuso/is-apple

Verify that a request is from Apple crawlers using DNS verification steps

apple bot crawler dns ip js nodejs

Last synced: 17 Oct 2024

https://github.com/dingpingzhang/papermedia

A scrapy-based crawler for crawling paper media.

crawler scrapy spider

Last synced: 04 Nov 2024

https://github.com/leomaurodesenv/smm-maker-profile

A package to fetching the maker profile - Super Mario Maker

crawler javascript json mario-maker nodejs

Last synced: 02 Nov 2024

https://github.com/gnaneshkunal/book-miner

Web crawler for Book reviews (Goodreads)

crawler goodreads typescript

Last synced: 29 Oct 2024

https://github.com/princed/specht

Check links found in html or js files by pattern

cli crawler html javascript streams

Last synced: 12 Oct 2024

https://github.com/schbenedikt/web-crawler

A simple web crawler using Python that stores the metadata of each web page in a database.

crawler database mariadb mysql python python-crawler web

Last synced: 11 Oct 2024

https://github.com/arshadkazmi42/gh-crawl

Crawler for Github repositories. Finds all the broken links from the repositories

bug-bounty-recon crawl crawler gh-crawler github github-crawler githubcrawler python

Last synced: 28 Oct 2024

https://github.com/skylightqp/namu2csv

A namuwiki crawler that converts header to csv file for kartrider wiki

crawler rust

Last synced: 19 Oct 2024

https://github.com/maxgio92/package-crawler

A package crawler for most known Linux distros

crawler go linux package

Last synced: 13 Oct 2024

https://github.com/deptno/nsdi

㉿ nsdi downloader built on puppeteer

crawler downloader nsdi openapi puppeteer

Last synced: 11 Oct 2024

https://github.com/jorgeparavicini/medalytik-python

Python crawlers for a job mediation firm

crawler python scrapy

Last synced: 17 Oct 2024

https://github.com/lykmapipo/producthunt-python-scrapy-scraper

Python Scrapy spiders that scrapes data from producthunt.com

crawler featured launch lykmapipo product producthunt python scraper scrapy spider webscraper

Last synced: 04 Nov 2024

https://github.com/beanwei/zmt-post-crawler

Crawler the ZMT platform site ,put the author id, get the post list.This project is coding for my friend

crawler golang golang-ui

Last synced: 07 Nov 2024

https://github.com/reycn/china-drug-trials-crawler

A web crawler for Chinadrugtrials.org.cn, written in Python 3.6+.

china crawler drug python scraper

Last synced: 11 Oct 2024

https://github.com/jiamingla/mvdis_i18n

機車駕照預約考試多語友善版 Non-official

crawler jquery koa koajs nodejs supertest

Last synced: 14 Oct 2024

https://github.com/redco/goose-phantom-environment

Environment for Goose parser which allows to run it in PhantomJS

crawler environment goose goose-parser nodejs parse parser phantomjs scraper

Last synced: 05 Nov 2024

https://github.com/tanja-4732/od-get

A Rust tool for recursively crawling & downloading data from open directories

cli crawler open-directory open-directory-downloader rust

Last synced: 11 Oct 2024

https://github.com/dylanhogg/cloud-products

A package for getting cloud products and product descriptions from a cloud provider website.

aws cloud-products crawler data text-processing

Last synced: 27 Oct 2024

https://github.com/zephyrpersonal/github-trending-crawler

transform github-trending repos to json data

cheerio crawler fetch github node repository spider trending

Last synced: 14 Oct 2024

https://github.com/ging-dev/sitemap-crawler

Collect links through the sitemap.xml or robots.txt

crawler php php8 sitemap sitemap-crawler

Last synced: 12 Oct 2024

https://github.com/ozansz/simple-web-downloader

A simple web page downloader program in C

c crawler curl libcurl web

Last synced: 16 Oct 2024

https://github.com/marzzzello/gplaycrawler

(mirror) Discover apps by different mehtods. Mass download app packages and metadata.

crawler google-play google-play-store googleplay googleplaystore playstore playstoreapi scraper

Last synced: 05 Nov 2024

https://github.com/1970mr/link-crawler

Web Link Crawler: A Python script to crawl websites and collect links based on a regex pattern. Efficient and customizable.

clawler crawler crawler-python link-crawler link-crawler-python link-scraper link-scraper-python links python scraper scraper-python website-crawler website-scraper

Last synced: 11 Oct 2024

https://github.com/geoffreybauduin/website-checker

Performs useful checks against a website, such as 404 errors reporting, structured data validation...

crawler seo structured-data web-spider website

Last synced: 06 Nov 2024

https://github.com/akashrajpurohit/node-crawler

Nodejs Crawler which scrapes a website on live domain and crawls to find all URL of the domain

crawler node-crawler nodejs url

Last synced: 06 Nov 2024

https://github.com/coverified/spider

A microservice with web-crawler/spider capabilities which only follows and indexes urls of the provided host domain(s)

akka crawler graphql hacktoberfest microservice spider

Last synced: 06 Nov 2024

https://github.com/christopher-besch/therapy_search

Compute call times from arztsuche-bw into a calendar.

appointments calendar crawler gatsby therapy time-management typescript

Last synced: 07 Nov 2024

https://github.com/yordadev/fenrisjs

A NodeJS application that scrapes any links from a given input and outputs the results nicely into one of two files, external or internal file for further analysis.

analysis crawler link-collection link-crawler nodejs nodejs-application

Last synced: 11 Oct 2024

https://github.com/panagiks/asset

ASynchronous Spidering Essential Tool (ASSET).

async asyncio crawler graph reporting spider

Last synced: 16 Oct 2024

https://github.com/estroz/seekret

Seekret is a sensitive data crawler for GitHub repositories

crawler security

Last synced: 06 Nov 2024

https://github.com/camilamaia/crawl4us

[WIP] A Python web crawler looking wildly for tables 🕵️‍♀️

beautifulsoup4 crawler crawling pypi python-3 python-module scraper scraping tables web-scraping

Last synced: 18 Oct 2024

https://github.com/thomashirtz/douban-crawler

A simple crawler for retrieving information about movies or TV shows from the famous www.douban.com website.

crawler douban

Last synced: 06 Nov 2024

https://github.com/tranbavinhson/crawler

Crawler by Scrapy

crawler python scrapy

Last synced: 06 Nov 2024

https://github.com/vitaee/laravelandcrawlers

php web crawler examples with oop concept and laravel project

crawler laravel php

Last synced: 06 Nov 2024

https://github.com/loggerhead/dianping_crawler

基于 Scrapy (python 3.5) 的大众点评爬虫

crawler python-3-5

Last synced: 12 Oct 2024

https://github.com/mattmoony/webcrawler.py

A very simple python webcrawler. This is just a fun little side project, which I used to gather some valuable experience with advanced Python- and Web techniques. 🐍

beautifulsoup crawler indexing mongodb multithreading pymongo python spider web webcrawler

Last synced: 12 Oct 2024

https://github.com/juangesino/gazette

A personal news aggregator application using Meteor.

crawler meteor meteorjs news news-aggregator news-feed scraper

Last synced: 13 Oct 2024

https://github.com/becky-dai/flower-knowledge-graph-visualization

A full stack program of knowledge graph visualization 一个关于知识图谱可视化的全栈项目

crawler css django echarts html js knowledge-graph neo4j python

Last synced: 03 Nov 2024

https://github.com/viko16/hatcher

🐣[WIP] Provides APIs by simple configuration.

api api-server cli crawler koa-middleware nodejs spider

Last synced: 01 Oct 2024

https://github.com/kestarumper/imagecrawler

Downloads images from given URL

crawler image-downloader

Last synced: 19 Oct 2024

https://github.com/datamine/twitter-name-and-shame

Crawler to find Twitter accounts following more than a million users

crawler flask python python-2 twitter

Last synced: 12 Oct 2024

https://github.com/zaneh/ocw-crawler

Crawl MIT OpenCourseWare courses with Kimurai. Not affiliated.

crawler kimurai mit ocw opencourseware spider

Last synced: 19 Oct 2024

https://github.com/devindon/movie-crawler

Movie crawler for douban.com, pianku.tv, etc.

crawler nodejs typescript

Last synced: 16 Oct 2024

https://github.com/lillyschramm/spiegel.de-miner

A bot that automatically saves any posts created at Spiegel.de

crawler spiegel-online

Last synced: 11 Oct 2024

https://github.com/sahaavi/web-scraping

Learn Web-Scraping using BeautifulSoup, Selenium and Scrapy with hands on projects!

beautifulsoup4 crawler headless-mode pagination scrapy selenium spider splash web-scraper web-scraping

Last synced: 07 Nov 2024

https://github.com/lencx/hero-crawler

⚔️ Hero Info(King Of Glory)

crawler hero

Last synced: 19 Oct 2024

https://github.com/eneax/web-crawler

A web crawler built in Node.js

crawler javascript nodejs web-crawler

Last synced: 05 Nov 2024

https://github.com/kenanbek/tutorial-python-crawler

Crawling website data using Python with requests and Beautiful Soup libraries

beautifulsoup crawler crawling miner parser python python-requests requests

Last synced: 23 Oct 2024

https://github.com/jenting/compare-drugstore-price

Compare price between cosmeceutical shops

cosmed crawler golang poya side-project watsons

Last synced: 15 Oct 2024

https://github.com/mohitk05/drstrange

A simple breadth-first search web crawler

bfs crawler

Last synced: 15 Oct 2024

https://github.com/lin-jun-xiang/python-crawler

Using CloudScraper, Requests, API, Thread, Async... for scrape the data

async cloudscraper crawler multithreading python requests scraper selenium

Last synced: 03 Nov 2024

https://github.com/ariefrahmansyah/crawler

Simple website crawler using Go programming language.

crawler go

Last synced: 15 Oct 2024

https://github.com/tigercosmos/web-crawler

Web Crawler in Java Maven Project

crawler

Last synced: 15 Oct 2024

https://github.com/filipsedivy/tachometer-check

🚘 MDČR - kontrola tachometru

crawler czech-republic mdcr

Last synced: 05 Nov 2024

https://github.com/machinecyc/lotteryinsight

Use crawler to collect Taiwan Lotto data, and save data into local MySQL server.

crawler data docker lottery mysql-database python3 taiwan

Last synced: 15 Oct 2024

https://github.com/kevincolemaninc/mm-crawler

Scrapes meetme user profiles

crawler docker fake-data meetme ruby scraper sidekiq

Last synced: 11 Oct 2024

https://github.com/dalthviz/csapp

Crawler-Scrapper for the playstore

crawler csapp keyword nlp playstore rating review scrapper

Last synced: 11 Oct 2024

https://github.com/pourmand1376/crawler

Simple Crawler, Indexer and Search Engine Web Application

crawler csharp csharp-code dotnet mvc

Last synced: 11 Oct 2024

https://github.com/g-ongenae/morphalou-crawler

A Crawler for CNRTL's Morphologie words

crawler french lexical-databases list-of-words words

Last synced: 15 Oct 2024

https://github.com/iomarmochtar/imagecrawler

Simple image crawler by follow the links recursively, no dependency needed, for python 2.7+

crawler python-library

Last synced: 06 Nov 2024

https://github.com/n3d1117/sisop17

Esercizio per esame di Sistemi Operativi - 2017

crawler html java parser semaphores synchronization thread-safety threading

Last synced: 31 Oct 2024

https://github.com/zenixls2/2chpreprocess

Dump messages from 2ch with some preprocessing for ML analysis

2ch crawler python

Last synced: 15 Oct 2024

https://github.com/mohammadrezaamani/squirrel

Squirrel is a web crawler designed to collect all pages from Iranian websites, enabling you to download and store web page content in a structured format.

crawler iran python

Last synced: 04 Nov 2024

https://github.com/murilobsd/icrop-csv

Icrop-csv para automatizar o processo do download dos relatórios.

crawler csv-export python3

Last synced: 07 Nov 2024

https://github.com/lopins/article-crawler

一个简单的网页文章爬取工具,可以自定义抽取自己所需要的字段内容,简单容易上手。

article crawler ftp mysql python sqlite3

Last synced: 04 Nov 2024

https://github.com/apexcaptain/allergy-alert

오늘 날짜를 기준으로 모 대학의 학교 홈페이지에서 제공하는 식당 정보를 Crawling하여 회관별/메뉴 분류 별로 메뉴들과 메뉴 별 알러지 유발 식품에 대한 정보를 알려줍니다.

crawler docker expressjs puppeteer reactjs sqlite typescript

Last synced: 14 Oct 2024

https://github.com/matheusfelipeog/google-doodles

Mapeie e faça download dos Doodles do Google.

crawler google google-doodle python web-scraping

Last synced: 13 Oct 2024

https://github.com/sirius-mhlee/naver-cafe-crawler

NAVER Cafe Crawler using pandas, tqdm, Selenium, BeautifulSoup4

beautifulsoup4 crawler pandas selenium tqdm

Last synced: 11 Oct 2024

https://github.com/keizerzilla/search4dwango9

My attempt to help solving the DWANGO9 wad mystery. More info: https://www.youtube.com/watch?v=RXGtCjdwwe8

crawler datamining doom-wad

Last synced: 05 Nov 2024

https://github.com/keizerzilla/ssh-hunter

Script que caça por Raspberry Pis vulneráveis na internet (porta SSH aberta e senha padrão não modificada).

crawler raspberry-pi ssh

Last synced: 05 Nov 2024

https://github.com/kasperomari/simplecrawlerapi

A simple RESTful API that takes a URL and returns all the links in a specific depth.

crawler flask-api flask-restful

Last synced: 27 Oct 2024

https://github.com/appliedsoul/headless-screenshot

High-level library for taking screenshot of websites based on headless chrome (puppeteer)

crawler headless-chromium javascript nodejs scrapper screenshot testing

Last synced: 12 Oct 2024

https://github.com/pxlrbt/website-diff

Utility tool that bundles a crawler and BackstopJS for visual regression testing.

backstopjs crawler visual-regression-testing

Last synced: 07 Oct 2024

https://github.com/arman2409/datafalcon

Web crawler

crawler extract-data

Last synced: 27 Oct 2024

https://github.com/bramtenhove/issue-crawler

Crawls Drupal issues and keeps stats

crawler

Last synced: 11 Oct 2024

https://github.com/engageintellect/scrapers

A repository of web scrapers using Python & Scrapy

crawler python scrapy spider

Last synced: 25 Oct 2024

https://github.com/intina47/ee_error

implementation of a web crawler using c++

cpp crawler curl gumbo libcurl stanford-nlp web

Last synced: 15 Oct 2024

https://github.com/allancapistrano/steam.py

An API wrapper for Steam written in Python.

crawler python steam

Last synced: 13 Oct 2024

https://github.com/frostming/daily-wallpaper

A small crawler to get wallpapers from Unsplash

crawler python requests unsplash wallpaper

Last synced: 13 Oct 2024

https://github.com/jannchie/go-probe

HTML and JSON data crawler based on Golang. Simple and fast, very easy to use.

collector crawler fetcher golang spider

Last synced: 05 Nov 2024