https://github.com/teamnsrg/mida
MIDA: A Tool for Measuring the Internet
- Host: GitHub
- URL: https://github.com/teamnsrg/mida
- Owner: teamnsrg
- License: mit
- Created: 2018-12-18T16:20:04.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2023-03-07T02:37:22.000Z (over 1 year ago)
- Last Synced: 2024-08-02T15:47:59.049Z (4 months ago)
- Topics: chrome, chromedp, crawling, devtools, golang, web
- Language: Go
- Size: 562 KB
- Stars: 18
- Watchers: 9
- Forks: 4
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
# MIDA: A Tool for Measuring the Web
[![Go](https://github.com/teamnsrg/mida/actions/workflows/go.yml/badge.svg)](https://github.com/teamnsrg/mida/actions/workflows/go.yml)
[![Go Report Card](https://goreportcard.com/badge/github.com/teamnsrg/mida)](https://goreportcard.com/report/github.com/teamnsrg/mida)

MIDA is meant to be a general tool for web measurement projects. It is built in Go
on top of Chrome/Chromium and the DevTools protocol, giving it a realistic vantage point
to study the web and fine-grained access to information provided by Chrome Developer Tools.

---
## Getting Started
Getting started with MIDA is easy! First, install:
```bash
$ wget files.mida.sprai.org/setup.py
$ sudo python3 setup.py
```

Now we are ready to visit a site and collect some data:
```bash
$ mida go example.org
```

You can find the results of your crawl in the `results/` directory.
## Easy At-Scale Crawling
One major benefit of MIDA is the ability to run large-scale, highly configurable crawls
without writing your own crawler code. Here's an example of a single MIDA command that
crawls the Alexa Top 100K and gathers a few specific types of data:

```bash
$ mida go -f https://files.mida.sprai.org/toplists/alexa.lst -n100000 -c8 --all-resources --screenshot --dom
```

Breaking this down by argument:

- `-f https://files.mida.sprai.org/toplists/alexa.lst`: A list of the Alexa Top Websites. You can read from a local file or fetch one hosted elsewhere on the web.
- `-n100000`: Read the top 100,000 entries from the list.
- `-c8`: Run 8 crawlers (browser instances) in parallel.
- `--all-resources`: Gather all of the actual files/resources required to render each web page. Beware, this takes a lot of space!
- `--screenshot`: Capture a screenshot for each website once its load event fires (if it does).
- `--dom`: Capture a JSON representation of the DOM for each website visited.