https://github.com/m-lab/stats-pipeline

Contains code that processes M-Lab data and provides it in various formats for other use.

- Host: GitHub
- URL: https://github.com/m-lab/stats-pipeline
- Owner: m-lab
- License: apache-2.0
- Created: 2020-06-23T19:02:39.000Z
- Default Branch: main
- Last Pushed: 2024-06-08T17:02:44.000Z
- Last Synced: 2024-10-29T14:53:30.617Z
- Language: Go
- Size: 479 KB
- Stars: 14
- Watchers: 15
- Forks: 6
- Open Issues: 26
- Metadata Files:
  - Readme: README.md
  - License: LICENSE


[![Version](https://img.shields.io/github/tag/m-lab/stats-pipeline.svg)](https://github.com/m-lab/stats-pipeline/releases) [![Build Status](https://travis-ci.com/m-lab/stats-pipeline.svg?branch=master)](https://travis-ci.com/m-lab/stats-pipeline) [![Coverage Status](https://coveralls.io/repos/github/m-lab/stats-pipeline/badge.svg?branch=master)](https://coveralls.io/github/m-lab/stats-pipeline?branch=master) [![GoDoc](https://godoc.org/github.com/m-lab/stats-pipeline?status.svg)](https://godoc.org/github.com/m-lab/stats-pipeline) [![Go Report Card](https://goreportcard.com/badge/github.com/m-lab/stats-pipeline)](https://goreportcard.com/report/github.com/m-lab/stats-pipeline)

# Statistics Pipeline Service

This repository contains code that processes NDT data and provides aggregate
metrics by day for standard global geographies and some national geographies.
The resulting aggregations are made available in JSON format for use by other
applications.

The `stats-pipeline` service is written in Go, runs on Google Kubernetes Engine
(GKE), and generates and updates daily aggregate statistics. The statistics are
published in public BigQuery tables and as per-year JSON files hosted on Google
Cloud Storage (GCS).
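To make "aggregate metrics by day" concrete, here is a minimal Go sketch that groups individual measurements by date, computes a per-day median download rate, and encodes the result as JSON. It is not the service's code: the `Measurement` type, its fields, and `DailyMedians` are hypothetical, and the median is only an illustrative statistic. The statistics `stats-pipeline` actually computes are described in the documentation listed below.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Measurement is a hypothetical, simplified NDT result: the UTC date the test
// ran and its download throughput in Mbit/s. It is not the real NDT schema.
type Measurement struct {
	Date         string // e.g. "2020-06-01"
	DownloadMbps float64
}

// DailyMedians groups measurements by Date and returns the median download
// throughput for each day.
func DailyMedians(ms []Measurement) map[string]float64 {
	byDay := make(map[string][]float64)
	for _, m := range ms {
		byDay[m.Date] = append(byDay[m.Date], m.DownloadMbps)
	}
	medians := make(map[string]float64, len(byDay))
	for day, vals := range byDay {
		sort.Float64s(vals)
		n := len(vals) // always >= 1, since every day comes from a measurement
		if n%2 == 1 {
			medians[day] = vals[n/2]
		} else {
			medians[day] = (vals[n/2-1] + vals[n/2]) / 2
		}
	}
	return medians
}

func main() {
	ms := []Measurement{
		{"2020-06-01", 10}, {"2020-06-01", 50}, {"2020-06-01", 20},
		{"2020-06-02", 5}, {"2020-06-02", 15},
	}
	// json.Marshal writes map keys in sorted order.
	out, err := json.Marshal(DailyMedians(ms))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // {"2020-06-01":20,"2020-06-02":10}
}
```

A median is used here because it is less sensitive than a mean to the long tail of very fast or very slow tests, but a single number per day still hides the distribution's shape.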

## Documentation Provided for the Statistics Pipeline Service

* (This document) Overview of the `stats-pipeline` service, fields provided
  (schema), output formats, available geographies, and API URL structure.
* [What Statistics are Provided by stats-pipeline, and How are They Calculated?][stats-overview]
* [Geographic Precision in stats-pipeline][geo-precision]
* [Statistics Output Format, Schema, and Field Descriptions][format-schema]
* [Statistics API URL Structure, Available Geographies & Aggregations][api-structure]

[stats-overview]: docs/stats-overview.md

[geo-precision]: docs/geo-precision.md

[format-schema]: docs/format-schema.md

[api-structure]: docs/api-structure.md

## General Recommendations for All Aggregations of NDT Data

In general, [our recommendations][recommendations] for research that aggregates NDT data are:

* Don't oversimplify
* Aggregate by ASN in addition to time/date and location
* Be aware of, and illustrate, multimodal distributions
* Use histograms and logarithmic scales
* Take into account, and compensate for, client bias and population drift

[recommendations]: upcoming-blog-post
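Several of these recommendations can be illustrated together. The following Go sketch, which is not part of `stats-pipeline`, groups hypothetical download samples by date, country, and ASN, then counts them in logarithmically spaced buckets, so that a multimodal distribution within one group shows up as separate peaks instead of being averaged away. The `Key` and `Sample` types, the `bucketsPerDecade` parameter, and the example ASNs (from the 64496 to 64511 range reserved for documentation) are assumptions for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// Key is a hypothetical aggregation key that combines date and location with
// the ASN, as recommended above.
type Key struct {
	Date    string
	Country string
	ASN     uint32
}

// Sample is a hypothetical, simplified NDT download result in Mbit/s.
type Sample struct {
	Key
	Mbps float64
}

// logBucket maps a positive throughput to a log10-spaced bucket index.
// Bucket i covers [10^(i/b), 10^((i+1)/b)) Mbit/s, where b = bucketsPerDecade,
// so bucket 0 starts at 1 Mbit/s and slower results get negative indices.
func logBucket(mbps float64, bucketsPerDecade int) int {
	return int(math.Floor(math.Log10(mbps) * float64(bucketsPerDecade)))
}

// Histograms counts samples per (Key, log bucket). Non-positive values are
// skipped because their logarithm is undefined.
func Histograms(samples []Sample, bucketsPerDecade int) map[Key]map[int]int {
	out := make(map[Key]map[int]int)
	for _, s := range samples {
		if s.Mbps <= 0 {
			continue
		}
		h, ok := out[s.Key]
		if !ok {
			h = make(map[int]int)
			out[s.Key] = h
		}
		h[logBucket(s.Mbps, bucketsPerDecade)]++
	}
	return out
}

func main() {
	a := Key{Date: "2020-06-01", Country: "US", ASN: 64496}
	b := Key{Date: "2020-06-01", Country: "US", ASN: 64497}
	samples := []Sample{{a, 2}, {a, 5}, {a, 300}, {a, 450}, {b, 20}}
	hist := Histograms(samples, 1) // one bucket per factor of 10
	// ASN 64496 shows two separate peaks (buckets 0 and 2); a single mean
	// for that group would fall between them and describe neither.
	for _, k := range []Key{a, b} {
		fmt.Printf("%+v: %v\n", k, hist[k])
	}
}
```

Logarithmic buckets give equal resolution to a 1 to 10 Mbit/s connection and a 100 to 1000 Mbit/s connection, which linear buckets of a fixed width cannot do across that range.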

## Roadmap

Below we list additional features, methods, geographies, etc., that may be
considered for future versioned releases of `stats-pipeline`.

### Geographies

* US Zip Codes, US Congressional Districts, Block Groups, Blocks

### Output Formats

* `histogram_daily_stats.csv` - The same data as the JSON, but in CSV. Useful for importing into a spreadsheet.
* `histogram_daily_stats.sql` - A SQL query that returns the same rows as the corresponding `.json` and `.csv` files. Useful for verifying the exported data against the source and for tweaking the query to suit different use cases.
