CMS site statistics
https://github.com/vkuznet/sitestat

# sitestat

[![Build Status](https://travis-ci.org/vkuznet/sitestat.svg?branch=master)](https://travis-ci.org/vkuznet/sitestat)
[![GoDoc](https://godoc.org/github.com/vkuznet/sitestat?status.svg)](https://godoc.org/github.com/vkuznet/sitestat)

### sitestat tool
The sitestat tool is designed to collect statistics from various CMS sites.
The underlying process follows these steps:

- Fetch all site names from SiteDB.
- Loop over a specific time range, e.g. the last 3 months (3m).
- Create dates for that range.
- Use the popularity API (DSStatInTimeWindow) to get summary statistics.
The API returns various information about dataset usage on sites.
- Organize the data into bins by number of accesses.
- For every bin, collect the dataset names.
- Call DBS APIs to get dataset statistics via the blocksummaries API.
- Sum up the file_size values, which gives the total size used by a specific site.
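
The binning and summation steps above can be sketched roughly as follows. This is
only an illustration and not the code used by sitestat: the function names
(`binFor`, `sizePerBin`), the dataset names and the sizes are hypothetical, and it
assumes that a dataset belongs to the largest bin value that does not exceed its
number of accesses (so with bins 0,1,2,3,4 the last bin means "4 or more").

```go
package main

import (
	"fmt"
	"sort"
)

// binFor returns the largest bin value that does not exceed n.
// bins must be sorted in ascending order; values below the first bin
// are assigned to the first bin. (Assumed bin semantics, see text above.)
func binFor(bins []int, n int) int {
	// SearchInts gives the first index whose value is > n; step back by one.
	idx := sort.SearchInts(bins, n+1) - 1
	if idx < 0 {
		idx = 0
	}
	return bins[idx]
}

// sizePerBin groups dataset names by access bin and sums their sizes (bytes).
func sizePerBin(bins []int, naccess map[string]int, sizes map[string]int64) (map[int][]string, map[int]int64) {
	names := make(map[int][]string)
	total := make(map[int]int64)
	for dataset, n := range naccess {
		b := binFor(bins, n)
		names[b] = append(names[b], dataset)
		total[b] += sizes[dataset]
	}
	// Sort names so that the result does not depend on map iteration order.
	for _, list := range names {
		sort.Strings(list)
	}
	return names, total
}

func main() {
	bins := []int{0, 1, 2, 3, 4}
	// Hypothetical popularity and size data for a single site.
	naccess := map[string]int{"/a/b/AOD": 0, "/c/d/AOD": 2, "/e/f/RECO": 7, "/g/h/RECO": 2}
	sizes := map[string]int64{"/a/b/AOD": 100, "/c/d/AOD": 200, "/e/f/RECO": 300, "/g/h/RECO": 50}

	names, total := sizePerBin(bins, naccess, sizes)
	for _, b := range bins {
		fmt.Printf("bin %d: %d datasets, %d bytes\n", b, len(names[b]), total[b])
	}
}
```

In the real tool the access counts would come from the popularity API and the
sizes from DBS or PhEDEx; here both are plain maps to keep the sketch self-contained.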

Here is an example of sitestat tool usage:

```
Usage of ./sitestat:
  -bins string
        Comma separated list of bin values, e.g. 0,1,2,3,4 for naccesses or 0,10,100 for tot cpu metrics
  -blkinfo
        Use block information for finding statistics, by default use dataset info
  -breakdown string
        Breakdown report into more details (tier, dataset)
  -chunkSize int
        chunkSize for processing URLs (default 100)
  -dbsinfo
        Use DBS to collect dataset information, default use PhEDEx
  -format string
        Output format type, txt or json (default "txt")
  -metric string
        Popularity DB metric (NACC, TOTCPU, NUSERS) (default "NACC")
  -pbrdb string
        Name of PBR db (see PhedexReplicaMonitoring project)
  -phgroup string
        Phedex group name (default "AnalysisOps")
  -profile
        profile code
  -site string
        CMS site name, use T1, T2, T3 to specify all Tier sites
  -tier string
        Look-up specific data-tier
  -trange string
        Specify time interval in YYYYMMDD format, e.g 20150101-20150201 or use short notations 1d, 1m, 1y for one day, month, year, respectively (default "1d")
  -verbose int
        Verbose level, support 0,1,2
```

### Examples
In all examples below we use T2_XX_Abc as a site name.

```
# list site statistics for last month
sitestat -site T2_XX_Abc -trange 1m

# list site statistics for specific time range
sitestat -site T2_XX_Abc -trange 20150201-20150205

# list site statistics for last 3 months
sitestat -site T2_XX_Abc -trange 3m

# list site statistics for last month and only count AOD data-tier
sitestat -site T2_XX_Abc -trange 1m -tier AOD

# list site statistics for last month with breakdown for all data-tiers
sitestat -site T2_XX_Abc -trange 1m -breakdown tier

# list site statistics for last month with breakdown for all datasets
sitestat -site T2_XX_Abc -trange 1m -breakdown dataset

# list site statistics for last month with breakdown for all data-tiers and look for NUSERS metric
sitestat -site T2_XX_Abc -trange 1m -metric NUSERS -breakdown tier

# by default sitestat relies on PhEDEx data-service to collect
# dataset information on site, but we may use DBS instead
sitestat -site T2_XX_Abc -trange 1m -dbsinfo

# return information in json data format
sitestat -site T2_XX_Abc -trange 1m -format json
```

### Tools
The tools directory contains scripts for working with
[PhedexReplicaMonitoring](https://github.com/vkuznet/PhedexReplicaMonitoring),
which makes it possible to obtain weighted dataset sizes on sites from the PhEDEx DB by
running the pbr script from the PhedexReplicaMonitoring repository.

- pbr_avg.sh can be used to submit a Spark job that calculates the average
size of datasets.
- pbr_db.py can be used to convert the HDFS output of pbr_avg.sh
into an SQLite DB. The latter can be used by the sitestat tool.
- plot.R is an R script that produces a plot of size vs. bins (number of accesses).