Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).

https://github.com/khadkarajesh/aptoide

Aptoide app crawler using beautifulsoup

beautifulsoup4 crawler flask python3 web-application

Last synced: 11 Oct 2024

https://github.com/maxmindlin/swarm

Go crawler that searches and aggregates information relevant to your interests. WIP for learning Go crawling.

crawler golang mongodb

Last synced: 15 Oct 2024

https://github.com/saketh7382/smartcrawler

Package for crawling items from web pages and storing them as a JSON file

crawler crawler-python open-source pip python3 scraper selenium selenium-webdriver webdriver-manager

Last synced: 20 Oct 2024

https://github.com/baerwang/sec_craw

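A crawler that makes it easy for security researchers to get daily security digests. Sources currently crawled include 90sec, the Kanxue forum (看雪论坛), v2ex, the Jingyi forum (精易论坛), the 52 Pojie forum (52破解论坛), and various lab blogs; updated continuously.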

crawler security security-tools threat threat-intelligence

Last synced: 15 Oct 2024

https://github.com/moontai0724/auto-notify-pu-courses-quota

A small crawler that fetches the remaining quota for a list of courses at Providence University every 2 to 10 minutes, then sends a webhook when it changes.

crawler javascript nodejs

Last synced: 15 Oct 2024

https://github.com/krishpranav/gocralwer

An awesome crawler made in Go

crawler

Last synced: 15 Oct 2024

https://github.com/comigor/balances

Your checking and savings account balances at banks and brokers.

balance bank broker crawler node

Last synced: 20 Oct 2024

https://github.com/piopi/behatcrawler

A Behat extension that crawls links on a website and executes a user-defined function on each of them.

behat behat-extension crawler php selenium-webdriver

Last synced: 01 Nov 2024

https://github.com/orafaelfragoso/itunes-crawler

Retrieves information about an artist by crawling the iTunes API and iTunes Page

api crawler itunes itunes-api

Last synced: 01 Nov 2024

https://github.com/idlesign/gallerycrawler

Generic crawling for galleries

crawler gallery images python3

Last synced: 30 Oct 2024

https://github.com/40uf411/sillybot

SillyBot is a wrapper for the selenium library

bot crawler python scraper selenium web wrapper

Last synced: 01 Nov 2024

https://github.com/suddi/fundscraper

Collection of web crawlers to scrape fund data using Scrapy

crawler funds scraper scrapy

Last synced: 11 Oct 2024

https://github.com/cseas/shares-monitor

Web crawler to fetch and monitor shares details.

crawler python python3 scraper scraping-websites shares

Last synced: 07 Nov 2024

https://github.com/yjg30737/pyqt-google-image-crawler

Crawling image files from Google search results with Python and icrawler

beautifulsoup4 crawler icrawler image-crawler pyqt pyqt5 pyqt5-desktop-application

Last synced: 07 Nov 2024

https://github.com/yjg30737/pyqt-wikipedia-crawler

Crawling Wikipedia with Python, powered by BeautifulSoup4; supports GUI/CUI

beautifulsoup4 crawler pyqt pyqt5 wikipedia

Last synced: 07 Nov 2024

https://github.com/antoinegagne/treewalker

A web crawler in Erlang that respects `robots.txt`.

crawler erlang webcrawler

Last synced: 24 Oct 2024

https://github.com/fa7ad/aiub-notes-dl

Download all notes from AIUB's portal

aiub beautifulsoup4 crawler

Last synced: 24 Oct 2024

https://github.com/buren/site_health

Crawl a site and check various health indicators

crawler rubygem site-health

Last synced: 28 Oct 2024

https://github.com/leomaurodesenv/smm-maker-profile

A package for fetching maker profiles from Super Mario Maker

crawler javascript json mario-maker nodejs

Last synced: 02 Nov 2024

https://github.com/gnaneshkunal/book-miner

Web crawler for Book reviews (Goodreads)

crawler goodreads typescript

Last synced: 29 Oct 2024

https://github.com/kluhan/kraken

Kraken is a generic, mid-scale web crawler specifically built to crawl vertical data sources, like YouTube or the Google Play Store.

celery crawler google-play-store python web-crawling

Last synced: 28 Oct 2024

https://github.com/roccomuso/is-apple

Verify that a request is from Apple crawlers using DNS verification steps

apple bot crawler dns ip js nodejs

Last synced: 17 Oct 2024

https://github.com/arshadkazmi42/gh-crawl

Crawler for GitHub repositories. Finds all broken links in the repositories.

bug-bounty-recon crawl crawler gh-crawler github github-crawler githubcrawler python

Last synced: 28 Oct 2024

https://github.com/skylightqp/namu2csv

A Namuwiki crawler that converts headers to a CSV file for the KartRider wiki

crawler rust

Last synced: 19 Oct 2024

https://github.com/maxgio92/package-crawler

A package crawler for most known Linux distros

crawler go linux package

Last synced: 13 Oct 2024

https://github.com/jiamingla/mvdis_i18n

Multilingual-friendly version of the motorcycle driver's license exam booking system (unofficial)

crawler jquery koa koajs nodejs supertest

Last synced: 14 Oct 2024

https://github.com/dylanhogg/cloud-products

A package for getting cloud products and product descriptions from a cloud provider website.

aws cloud-products crawler data text-processing

Last synced: 27 Oct 2024

https://github.com/lykmapipo/producthunt-python-scrapy-scraper

Python Scrapy spiders that scrape data from producthunt.com

crawler featured launch lykmapipo product producthunt python scraper scrapy spider webscraper

Last synced: 04 Nov 2024

https://github.com/zephyrpersonal/github-trending-crawler

Transforms GitHub trending repos into JSON data

cheerio crawler fetch github node repository spider trending

Last synced: 14 Oct 2024

https://github.com/ging-dev/sitemap-crawler

Collect links through the sitemap.xml or robots.txt

crawler php php8 sitemap sitemap-crawler

Last synced: 12 Oct 2024

https://github.com/ozansz/simple-web-downloader

A simple web page downloader program in C

c crawler curl libcurl web

Last synced: 16 Oct 2024

https://github.com/beanwei/zmt-post-crawler

Crawls the ZMT platform site: give it an author ID and get the post list. This project was written for my friend.

crawler golang golang-ui

Last synced: 07 Nov 2024

https://github.com/1970mr/link-crawler

Web Link Crawler: A Python script to crawl websites and collect links based on a regex pattern. Efficient and customizable.

clawler crawler crawler-python link-crawler link-crawler-python link-scraper link-scraper-python links python scraper scraper-python website-crawler website-scraper

Last synced: 11 Oct 2024

https://github.com/yordadev/fenrisjs

A NodeJS application that scrapes any links from a given input and writes the results into one of two files, external or internal, for further analysis.

analysis crawler link-collection link-crawler nodejs nodejs-application

Last synced: 11 Oct 2024

https://github.com/camilamaia/crawl4us

[WIP] A Python web crawler looking wildly for tables 🕵️‍♀️

beautifulsoup4 crawler crawling pypi python-3 python-module scraper scraping tables web-scraping

Last synced: 18 Oct 2024

https://github.com/redco/goose-phantom-environment

Environment for the Goose parser that allows running it in PhantomJS

crawler environment goose goose-parser nodejs parse parser phantomjs scraper

Last synced: 05 Nov 2024

https://github.com/panagiks/asset

ASynchronous Spidering Essential Tool (ASSET).

async asyncio crawler graph reporting spider

Last synced: 16 Oct 2024

https://github.com/buren/stupid_crawler

Stupid crawler that looks for URLs on a given site

cli crawler ruby rubygem

Last synced: 12 Oct 2024

https://github.com/marzzzello/gplaycrawler

(mirror) Discover apps by different methods. Mass download app packages and metadata.

crawler google-play google-play-store googleplay googleplaystore playstore playstoreapi scraper

Last synced: 05 Nov 2024

https://github.com/simonrichardson/crwlr

Crawl all the things!

crawler meshuggah

Last synced: 14 Oct 2024

https://github.com/geoffreybauduin/website-checker

Performs useful checks against a website, such as 404 error reporting, structured data validation...

crawler seo structured-data web-spider website

Last synced: 06 Nov 2024

https://github.com/jfcherng/wiki-cgroup-crawler

This script fetches Wikipedia's public conversion group (CGroup) word lists and saves the results to external files.

crawler php-71 wiki-cgroup-crawler wikipedia

Last synced: 28 Sep 2024

https://github.com/akashrajpurohit/node-crawler

Node.js crawler that scrapes a website on a live domain and crawls it to find all URLs of the domain

crawler node-crawler nodejs url

Last synced: 06 Nov 2024

https://github.com/coverified/spider

A microservice with web-crawler/spider capabilities which only follows and indexes urls of the provided host domain(s)

akka crawler graphql hacktoberfest microservice spider

Last synced: 06 Nov 2024

https://github.com/h4r5h1t/crawlytics

A Python-based web crawling tool for data extraction and security analysis that supports various arguments for efficient crawling and outputs results in JSON format.

appsec crawler crawler-python mechanicalsoup security security-tools webcrawler

Last synced: 07 Nov 2024

https://github.com/camara94/crawlers

Web scraping is the process of using bots to extract content and data from a website. Unlike screen scraping, which only copies pixels displayed onscreen, web scraping extracts underlying HTML code and, with it, data stored in a database. The scraper can then replicate entire website content elsewhere

crawler python scraping scrapy spider

Last synced: 05 Nov 2024

https://github.com/christopher-besch/therapy_search

Compute call times from arztsuche-bw into a calendar.

appointments calendar crawler gatsby therapy time-management typescript

Last synced: 07 Nov 2024

https://github.com/estroz/seekret

Seekret is a sensitive data crawler for GitHub repositories

crawler security

Last synced: 06 Nov 2024

https://github.com/thomashirtz/douban-crawler

A simple crawler for retrieving information about movies or TV shows from the famous www.douban.com website.

crawler douban

Last synced: 06 Nov 2024

https://github.com/sebyx07/active_proxy

Ruby proxy fetcher, retries request until completed, provides user agent🚀🚀

crawler http proxy rails ruby

Last synced: 07 Nov 2024

https://github.com/tranbavinhson/crawler

Crawler by Scrapy

crawler python scrapy

Last synced: 06 Nov 2024

https://github.com/vitaee/laravelandcrawlers

PHP web crawler examples with OOP concepts and a Laravel project

crawler laravel php

Last synced: 06 Nov 2024

https://github.com/sinkaroid/webnovelcrawler

Simple PHPcurl and getRequest to grab Light Novel and WebNovel, then create parser with DOMpdf.

crawler dompdf webnovel

Last synced: 05 Nov 2024

https://github.com/captain-woof/zhi-zhu

Zhi-Zhu is a multithreaded spidering script that recursively searches a base webpage and all URLs appearing in it for specific (regex) words.

crawler crawler-python crawling-python python3

Last synced: 08 Nov 2024

https://github.com/princed/specht

Check links found in html or js files by pattern

cli crawler html javascript streams

Last synced: 12 Oct 2024

https://github.com/jimmy-ly00/dhe-prime-grabber

Grabs Diffie-Hellman primes from certificates using OpenSSL. Uses multiprocessing to collect over 50 million Diffie-Hellman primes.

certificate certificates crawler dhe-prime-grabber diffie-hellman ipv4 multiprocessing openssl prime prime-numbers python python-3

Last synced: 07 Nov 2024

https://github.com/loggerhead/dianping_crawler

A Dianping (大众点评) crawler based on Scrapy (Python 3.5)

crawler python-3-5

Last synced: 12 Oct 2024

https://github.com/jofaval/open-graph-visualizer

Web scraping showcase of how crawlers retrieve a site's details through the Open Graph Protocol

crawler javascript opengraph scraping web web-scraping

Last synced: 21 Oct 2024

https://github.com/Kissaki/website-downloader

A website Crawler and downloader. Useful for archiving dynamic websites as static files.

archive crawler csharp download gpl website

Last synced: 23 Oct 2024

https://github.com/sc0vu/gocrawl

Simple crawl for golang

crawler golang

Last synced: 14 Oct 2024

https://github.com/juangesino/ah-bonus-crawler

React + Express application that crawls Albert Heijn's promotions.

crawler crawling express expressjs headless-chrome nodejs react reactjs

Last synced: 13 Oct 2024

https://github.com/ecklf/reddit-clawler

A command-line tool written in Rust that crawls Reddit posts from a user or subreddit

cli crawler downloader downloader-for-reddit reddit

Last synced: 25 Oct 2024

https://github.com/shivamsaraswat/webxcrawler

WebXCrawler is a fast static crawler to crawl a website and get all the links.

crawler crawling python scraping webcrawler webxcrawler

Last synced: 06 Nov 2024

https://github.com/mohammadrezaamani/squirrel

Squirrel is a web crawler designed to collect all pages from Iranian websites, enabling you to download and store web page content in a structured format.

crawler iran python

Last synced: 04 Nov 2024

https://github.com/kernelerr/pixivurls

An awesome tool to get Pixiv image URLs.

crawler downloader pixiv

Last synced: 12 Oct 2024

https://github.com/lopins/article-crawler

A simple web article crawling tool that lets you customize which fields to extract; simple and easy to get started with.

article crawler ftp mysql python sqlite3

Last synced: 04 Nov 2024

https://github.com/viktorholk/ranged

A Rust-based web crawler and pattern matcher

crawler regex rust scraper web

Last synced: 24 Oct 2024

https://github.com/allancapistrano/steam.py

An API wrapper for Steam written in Python.

crawler python steam

Last synced: 13 Oct 2024

https://github.com/andresayac/cuevana3

Cuevana3 scraper is a content provider for the latest movies and TV shows, dubbed in Latin American Spanish or subtitled.

crawler cuevana3 php scraper

Last synced: 31 Oct 2024

https://github.com/hanifdwyputras/se-scraper

Search Engine scraper with PHP

crawler scraper seo seo-crawler

Last synced: 15 Oct 2024

https://github.com/edumucelli/rubybikes

A set of Bike Sharing System parsers in Ruby

bike-sharing crawler ruby

Last synced: 06 Nov 2024

https://github.com/krishpranav/gozap

⚡️ Multiple target ZAP Scanning made in go

cli crawler go go-crawler golang zap

Last synced: 15 Oct 2024

https://github.com/mirusu400/berryz-dl

Batch download berryz webshare files recursively!

berryz berryz-webshare crawler downloader scraper

Last synced: 06 Nov 2024

https://github.com/dalthviz/csapp

Crawler-scraper for the Play Store

crawler csapp keyword nlp playstore rating review scrapper

Last synced: 11 Oct 2024

https://github.com/ryoii/hook

A declarative Java crawler framework

crawler declarative java java-crawler-framework jdk11

Last synced: 13 Oct 2024

https://github.com/pjt3591oo/spider-base_crawler

Building a Scrapy-based crawler

crawler python scrapy spider

Last synced: 06 Nov 2024

https://github.com/gnehs/twse-financial-ratios-crawler

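Automatically fetches financial ratio information from the Market Observation Post System (公開資訊觀測站) for a specified list of stock symbols, and automatically computes the averages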

crawler nodejs

Last synced: 06 Nov 2024

https://github.com/mnemocron/VPNNetworkShareCrawler

Ugly scripts to connect a Raspberry Pi to a VPN and attach a network share to periodically crawl the documents on it

crawler samba vpn

Last synced: 23 Oct 2024

https://github.com/becky-dai/flower-knowledge-graph-visualization

A full-stack knowledge graph visualization project

crawler css django echarts html js knowledge-graph neo4j python

Last synced: 03 Nov 2024

https://github.com/luanpotter/series-api

A simple IMDB crawler feeding a Series API

api crawler imdb json rest series

Last synced: 24 Oct 2024

https://github.com/iarsham/scrapify

Scrapify is a golang library that automates the process of bypassing CAPTCHAs, enabling efficient web scraping and data acquisition.

403-bypass arkose cloudflare crawler golang http-client scraper

Last synced: 24 Oct 2024

https://github.com/pjt3591oo/python-parse

These are modules for URL parsing

crawler

Last synced: 06 Nov 2024

https://github.com/kestarumper/imagecrawler

Downloads images from given URL

crawler image-downloader

Last synced: 19 Oct 2024

https://github.com/luickk/vulnerability-crawler

Small python program meant to analyze random sites found on google for any vulnerabilities!

crawler xss

Last synced: 07 Nov 2024

https://github.com/kehiy/prawler

Pactus P2P Network Crawler

crawler crawling metrics networking p2p pactus

Last synced: 07 Nov 2024

https://github.com/manikantasanjay/stackoverflow_tag_generator_webcrawler

StackOverFlow Tag Generator Using a WebCrawler.

crawler python

Last synced: 05 Nov 2024

https://github.com/stephanebruckert/gocrawl

Crawl every page and asset of a web domain

crawler python

Last synced: 03 Nov 2024