https://github.com/gitbookio/micro-analytics
A micro multi-website analytics database service designed to be fast and robust, built with Go and SQLite.
https://github.com/gitbookio/micro-analytics
Last synced: 9 months ago
JSON representation
A micro multi-website analytics database service designed to be fast and robust, built with Go and SQLite.
- Host: GitHub
- URL: https://github.com/gitbookio/micro-analytics
- Owner: GitbookIO
- Created: 2015-11-18T20:04:58.000Z (about 10 years ago)
- Default Branch: master
- Last Pushed: 2016-10-28T12:06:30.000Z (about 9 years ago)
- Last Synced: 2025-04-04T05:04:57.289Z (9 months ago)
- Language: Go
- Homepage:
- Size: 6.98 MB
- Stars: 76
- Watchers: 9
- Forks: 18
- Open Issues: 9
-
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
Awesome Lists containing this project
README
# µAnalytics
A micro multi-website analytics database service designed to be fast and robust, built with Go and SQLite.

## Principle
Analytics databases tend to grow fast and exponentially.
Requesting data for one specific website from a single database thus become very slow over time.
But analytics data are highly decoupled between two websites.
The idea behind **µAnalytics** is to shard your analytics data on a key, which is usually a website name.
Each shard thus only contains a specific website data, allowing faster response times and easy horizontal scaling.
To handle requests even faster, **µAnalytics** automatically manages a pool of connections to multiple shards at a time.
By default, the service keeps 10 connections alive.
But you can easily increase/decrease the max number of alive shards with the `--connections` flag when launching the app.
## Install
```Shell
$ go install github.com/GitbookIO/micro-analytics
```
## Service launching
To launch the application, simply run:
```
$ ./micro-analytics
```
The command takes the following optional parameters:
Parameter | Environment Variable | Usage | Type | Default Value
---- | ---- | ---- | ---- | ----
`--user, -u` | `MA_USER` | Username for basic auth | String | `""`
`--password, -w` | `MA_PASSWORD` | Password for basic auth | String | `""`
`--port, -p` | `MA_PORT` | Port to listen on | String | `"7070"`
`--root, -r` | `MA_ROOT` | Database directory | String | `"./dbs"`
`--connections, -c` | `MA_POOL_SIZE` | Max number of alive shards connections | Number | `1000`
`--idle-timeout, -i` | `MA_POOL_TIMEOUT` | Idle timeout for DB connections in seconds | Number | `60`
`--cache-directory, -d` | `MA_CACHE_DIR` | Cache directory | String | `".diskache"`
If `--user` is provided, the service will automatically use [basic access authentication](https://en.wikipedia.org/wiki/Basic_access_authentication) on all requests.
The actual cache directory will be a subdirectory named after the app major version. The default will then be `./.diskache/0`.
## Analytics schema
All shards of the **µAnalytics** database share the same TABLE schema:
```SQL
CREATE TABLE visits (
time INTEGER,
event TEXT,
path TEXT,
ip TEXT,
platform TEXT,
refererDomain TEXT,
countryCode TEXT
)
```
## Service requests
### GET requests
##### Common Parameters
Every query for a specific website can be executed using a time range.
Every following GET request thus takes the two following optional query string parameters:
Name | Type | Description | Default | Example
---- | ---- | ---- | ---- | ----
`start` | Date | Start date to query a range | none | `"2015-11-20T12:00:00.000Z"`
`end` | Date | End date to query a range | none | `"2015-11-21T12:00:00.000Z"`
The dates can be passed either as:
- ISO (RFC3339) `"2015-11-20T12:00:00.000Z"`
- UTC (RFC1123) `"Fri, 20 Nov 2015 12:00:00 GMT"`
- A Unix timestamp as a String `"1448020800"`
##### Common Aggregation Parameters
Name | Type | Description | Default | Example
---- | ---- | ---- | ---- | ----
`unique` | Boolean | Include the total number of unique visitors in response | none | `true`
##### Common Aggregation Response Values
Except for `GET /:website`, every response to a GET request will contain the two following values:
Name | Type | Description
---- | ---- | ----
`total` | Integer | Total number of visits
`unique` | Integer | Total number of unique visitors based on `ip`, set to `0` unless `unique=true` is passed as a query string parameter
#### GET `/:website`
Returns the full analytics for a website.
##### Response
```JavaScript
{
"list": [
{
"time": "2015-11-25T16:00:00+01:00",
"event": "download",
"path": "/somewhere",
"ip": "127.0.0.1",
"platform": "Windows",
"refererDomain": "gitbook.com",
"countryCode": "fr"
},
...
]
}
```
#### GET `/:website/count`
Returns the count of analytics for a website. The `unique` query string parameter is not necessary for this request.
##### Response
```JavaScript
{
"total": 1000,
"unique": 900
}
```
#### GET `/:website/countries`
Returns the number of visits per `countryCode`.
##### Response
`label` contains the country full name.
```JavaScript
{
"list": [
{
"id": "fr",
"label": "France",
"total": 1000,
"unique": 900
},
...
]
}
```
#### GET `/:website/platforms`
Returns the number of visits per `platform`.
##### Response
```JavaScript
{
"list": [
{
"id": "Linux",
"label": "Linux",
"total": 1000,
"unique": 900
},
...
]
}
```
#### GET `/:website/domains`
Returns the number of visits per `refererDomain`.
##### Response
```JavaScript
{
"list": [
{
"id": "gitbook.com",
"label": "gitbook.com",
"total": 1000,
"unique": 900
},
...
]
}
```
#### GET `/:website/events`
Returns the number of visits per `event`.
##### Response
```JavaScript
{
"list": [
{
"id": "download",
"label": "download",
"total": 1000,
"unique": 900
},
...
]
}
```
#### GET `/:website/time`
Returns the number of visits as a time serie. The interval in seconds can be specified as an optional query string parameter. Its default value is `86400`, equivalent to one day.
##### Parameters
Name | Type | Description | Default | Example
---- | ---- | ---- | ---- | ----
`interval` | Integer | Interval of the time serie | `86400` (1 day) | `3600`
##### Response
Example with interval set to `3600`:
```JavaScript
{
"list": [
{
"start": "2015-11-24T12:00:00.000Z",
"end": "2015-11-24T13:00:00.000Z",
"total": 450,
"unique": 390
},
{
"start": "2015-11-24T13:00:00.000Z",
"end": "2015-11-24T14:00:00.000Z",
"total": 550,
"unique": 510
},
...
]
}
```
### POST requests
#### POST `/:website`
Insert new data for the specified website.
##### POST Body
```JavaScript
{
"time": "2015-11-24T13:00:00.000Z", // optional
"event": "download",
"ip": "127.0.0.1",
"path": "/README.md",
"headers": {
// ...
// HTTP headers received from your visitor
}
}
```
The `time` parameter is optional and is set to the date of your POST request by default.
Passing the HTTP headers in the POST body allows the service to extract the `refererDomain` and `platform` values.
The `countryCode` will be deduced from the passed `ip` parameter using [Maxmind's GeoLite2 database](http://dev.maxmind.com/geoip/geoip2/geolite2/).
#### POST `/:website/bulk`
Insert a list of analytics for a specific website. The analytics can be sent directly in DB format, with `time` being a String value.
`time` can be passed as either:
- ISO (RFC3339) `"2015-11-20T12:00:00.000Z"`
- UTC (RFC1123) `"Fri, 20 Nov 2015 12:00:00 GMT"`
- A Unix timestamp as a String `"1448020800"`
If the `time` parameter is not provided, it will be defaulted to the exact time of the server processing the `POST` request.
As for the `POST /:website` method, the analytics can also have an optional `headers` parameter.
If the `refererDomain` and/or `platform` values are not passed in the JSON body, the `headers` parameter will be used to set these values automatically.
##### POST Body
```JavaScript
{
"list": [
{
"time": "1450098642",
"ip": "127.0.0.1",
"event": "download",
"path": "/somewhere",
"platform": "Apple Mac",
"refererDomain": "www.gitbook.com",
"countryCode": "fr"
},
{
"time": "2015-11-20T12:00:00.000Z",
"ip": "127.0.0.1",
"event": "login",
"path": "/someplace",
"headers": {
// ...
// HTTP headers received from your visitor
}
}
]
}
```
The `countryCode` will be reprocessed by the service using GeoLite2 based on the `ip`.
#### POST `/bulk`
Insert a list of analytics for different websites. The analytics have the same format as `POST /:website/bulk`, with a mandatory `website` parameter.
##### POST Body
```JavaScript
{
"list": [
{
"website": "website-1",
"time": "1450098642",
"ip": "127.0.0.1",
"event": "download",
"path": "/somewhere",
"platform": "Apple Mac",
"refererDomain": "www.gitbook.com",
"countryCode": "fr"
},
{
"website": "website-2",
"time": "2015-11-20T12:00:00.000Z",
"ip": "127.0.0.1",
"event": "login",
"path": "/someplace",
"headers": {
// ...
}
}
]
}
```
### DELETE requests
#### DELETE `/:website`
Fully delete a shard from the file system.