https://github.com/vkuznet/sitestat
CMS site statistics
- Host: GitHub
- URL: https://github.com/vkuznet/sitestat
- Owner: vkuznet
- Created: 2016-02-10T21:30:08.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2017-11-29T12:53:30.000Z (about 7 years ago)
- Last Synced: 2024-10-30T06:27:34.670Z (3 months ago)
- Language: Go
- Size: 148 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# sitestat
[![Build Status](https://travis-ci.org/vkuznet/sitestat.svg?branch=master)](https://travis-ci.org/vkuznet/sitestat)
[![GoDoc](https://godoc.org/github.com/vkuznet/sitestat?status.svg)](https://godoc.org/github.com/vkuznet/sitestat)

### sitestat tool
The sitestat tool is designed to collect statistics from various CMS sites.
The underlying process follows these steps:

- Fetch all site names from SiteDB
- Loop over the specified time range, e.g. the last 3 months (3m)
- Create dates for that range
- Use the popularity API (DSStatInTimeWindow)
to get summary statistics. The API returns various information about dataset
usage on sites.
- Organize the data into bins by number of accesses
- For every bin, collect the dataset names
- Call DBS APIs to get dataset statistics via the blocksummaries API
- Sum up the file_size values, which gives the total size used by a specific site

Here is an example of sitestat tool usage:
```
Usage of ./sitestat:
  -bins string
        Comma separated list of bin values, e.g. 0,1,2,3,4 for naccesses or 0,10,100 for tot cpu metrics
  -blkinfo
        Use block information for finding statistics, by default use dataset info
  -breakdown string
        Breakdown report into more details (tier, dataset)
  -chunkSize int
        chunkSize for processing URLs (default 100)
  -dbsinfo
        Use DBS to collect dataset information, default use PhEDEx
  -format string
        Output format type, txt or json (default "txt")
  -metric string
        Popularity DB metric (NACC, TOTCPU, NUSERS) (default "NACC")
  -pbrdb string
        Name of PBR db (see PhedexReplicaMonitoring project)
  -phgroup string
        Phedex group name (default "AnalysisOps")
  -profile
        profile code
  -site string
        CMS site name, use T1, T2, T3 to specify all Tier sites
  -tier string
        Look-up specific data-tier
  -trange string
        Specify time interval in YYYYMMDD format, e.g 20150101-20150201 or use short notations 1d, 1m, 1y for one day, month, year, respectively (default "1d")
  -verbose int
        Verbose level, support 0,1,2
```

### Examples
In all examples below we use T2_XX_Abc as a site name.

```
# list site statistics for last month
sitestat -site T2_XX_Abc -trange 1m

# list site statistics for specific time range
sitestat -site T2_XX_Abc -trange 20150201-20150205

# list site statistics for last 3 months
sitestat -site T2_XX_Abc -trange 3m

# list site statistics for last month and only count AOD data-tier
sitestat -site T2_XX_Abc -trange 1m -tier AOD

# list site statistics for last month with breakdown for all data-tiers
sitestat -site T2_XX_Abc -trange 1m -breakdown tier

# list site statistics for last month with breakdown for all datasets
sitestat -site T2_XX_Abc -trange 1m -breakdown dataset

# list site statistics for last month with breakdown for all data-tiers and look for NUSERS metric
sitestat -site T2_XX_Abc -trange 1m -metric NUSERS -breakdown tier

# by default sitestat relies on PhEDEx data-service to collect
# dataset information on site, but we may use DBS instead
sitestat -site T2_XX_Abc -trange 1m -dbsinfo

# return information in json data format
sitestat -site T2_XX_Abc -trange 1m -format json
```

### Tools
The tools directory contains scripts for working with
[PhedexReplicaMonitoring](https://github.com/vkuznet/PhedexReplicaMonitoring),
which allows one to obtain weighted dataset sizes on sites from the PhEDEx DB by
running the pbr script from the PhedexReplicaMonitoring repository.

- pbr_avg.sh can be used to submit a Spark job that calculates the average
size of datasets
- pbr_db.py can be used to convert the HDFS output of pbr_avg.sh
into an SQLite DB, which can then be used by the sitestat tool
- plot.R is an R script that produces a plot of size vs. bins (number of accesses).
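
The size-versus-bins relation comes from the binning steps of the process described at the top of this README: datasets are grouped by their number of accesses and their sizes are summed per bin. The Go sketch below illustrates that idea only. The `datasetUsage` type, the `binIndex` and `sizePerBin` functions, and the sample dataset names are hypothetical and not taken from the sitestat source. It also assumes that each bin value is a lower edge and that the last bin is open-ended, which the README does not state.

```go
package main

import (
	"fmt"
	"sort"
)

// datasetUsage is a hypothetical record that combines a popularity metric
// with a dataset size; sitestat's real data structures may differ.
type datasetUsage struct {
	Name      string
	NAccesses int   // value of the chosen popularity metric
	Size      int64 // dataset size in bytes
}

// binIndex returns the index of the last bin edge that does not exceed value.
// bins must be non-empty and sorted in ascending order. Values below the
// first edge fall into the first bin, values above the last edge into the last.
func binIndex(bins []int, value int) int {
	// sort.Search finds the first edge strictly greater than value.
	i := sort.Search(len(bins), func(i int) bool { return bins[i] > value })
	if i == 0 {
		return 0
	}
	return i - 1
}

// sizePerBin collects dataset names and sums dataset sizes for every bin.
// Both maps are keyed by the bin's lower edge.
func sizePerBin(bins []int, data []datasetUsage) (map[int][]string, map[int]int64) {
	names := make(map[int][]string)
	sizes := make(map[int]int64)
	if len(bins) == 0 {
		return names, sizes
	}
	for _, d := range data {
		b := bins[binIndex(bins, d.NAccesses)]
		names[b] = append(names[b], d.Name)
		sizes[b] += d.Size
	}
	return names, sizes
}

func main() {
	bins := []int{0, 1, 2, 3, 4} // same shape as the -bins example for naccesses
	data := []datasetUsage{ // made-up sample values
		{Name: "/Prim1/Proc1/AOD", NAccesses: 0, Size: 10},
		{Name: "/Prim2/Proc2/AOD", NAccesses: 1, Size: 20},
		{Name: "/Prim3/Proc3/RAW", NAccesses: 1, Size: 5},
		{Name: "/Prim4/Proc4/RECO", NAccesses: 7, Size: 100},
	}
	names, sizes := sizePerBin(bins, data)
	for _, b := range bins {
		fmt.Printf("bin %d: %d datasets, %d bytes\n", b, len(names[b]), sizes[b])
	}
}
```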