https://github.com/mlrun/metrics-gen

dummy metrics generator
Host: GitHub
URL: https://github.com/mlrun/metrics-gen
Owner: mlrun
License: apache-2.0
Created: 2022-01-06T14:50:14.000Z (over 4 years ago)
Default Branch: development
Last Pushed: 2023-03-20T14:26:52.000Z (over 3 years ago)
Last Synced: 2025-06-07T07:06:06.297Z (about 1 year ago)
Language: Python
Size: 31.3 KB
Stars: 0
Watchers: 1
Forks: 3
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

# metrics-gen

dummy metrics generator

## Getting Started

Metrics generator is built upon three main components:

- **Deployment**: The indexes of the table, for example:

  - symbol in a stock market

  - (data_center, device_id) for devices in data centers

- **Static Data**: Static data regarding the deployment, for example:

  - model_number for a device

  - score for a model

- **Metrics**: Continuous metrics to generate about the deployment, for example:

  - cpu_utilization of a device

  - price of a stock

The first step in setting up the generator is creating a deployment. Then, using the deployment, you can generate static data or a continuous stream of metrics.

### Create a deployment from configuration

To create a deployment from configuration you need to provide a **yaml** file containing the following:

```yaml
deployment:
    <level_name>:
      faker: <faker_type>
      num_items: <num_items>
```

Where `level_name` will be the name of the index, `faker_type` is the name of the [faker generator](https://github.com/joke2k/faker) and `num_items` is how many keys to create for this index.  

Each provided level will create another `num_items` instances for each entry in its previous levels.

**Example**: Given the following configuration yaml file:

```yaml
deployment:
    device:
      faker: msisdn
      num_items: 2
    core:
      faker: msisdn
      num_items: 3
```

and running the following code:

```python
from metrics_gen.deployment_generator import deployment_generator

# `configuration` is assumed to hold the deployment configuration from the yaml above
dep_gen = deployment_generator()
deployment = dep_gen.generate_deployment(configuration=configuration)
```

This generates the following example deployment:

|   | device        | core          |
|---|---------------|---------------|
| 0 | 4120271911677 | 6950611701382 |
| 1 | 4120271911677 | 2255426557707 |
| 2 | 4120271911677 | 7717168891372 |
| 3 | 2260158002886 | 3213635322383 |
| 4 | 2260158002886 | 4007792940086 |
| 5 | 2260158002886 | 3720953132595 |

**Notice** that each extra level multiplies the number of items created by its `num_items`, so here we get 2 * 3 = 6 items.
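The way levels multiply can be sketched in plain Python. This is only an illustration of the idea, not the library's implementation: `make_key` and `expand_levels` are hypothetical helpers, and random 13-digit strings stand in for the faker `msisdn` provider.

```python
import random


def make_key(rng):
    # Hypothetical stand-in for a faker provider: a random 13-digit string.
    return "".join(rng.choice("0123456789") for _ in range(13))


def expand_levels(levels, seed=0):
    """Return one row (dict) per leaf of the level hierarchy.

    `levels` maps level name -> num_items, in hierarchy order.
    """
    rng = random.Random(seed)
    rows = [{}]
    for name, num_items in levels.items():
        # Every existing row is repeated `num_items` times,
        # each copy receiving a fresh key for the new level.
        rows = [
            {**row, name: make_key(rng)}
            for row in rows
            for _ in range(num_items)
        ]
    return rows


rows = expand_levels({"device": 2, "core": 3})
print(len(rows))  # number of rows is the product of all num_items values
```

Adding a third level with `num_items: 4` to this sketch would yield 2 * 3 * 4 = 24 rows.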

### Create Static Data

To create a static data generator you need to supply a deployment dataframe and a configuration yaml.

The static data generator can generate features from two kinds of feature configuration, **range** and **choice**, which should be specified in the yaml.

```yaml
static:
    <feature_name>:
        kind: range
        min_range: <min_value>  # defaults to 0
        max_range: <max_value>
        as_integer: <bool>  # defaults to False
    <feature_name>:
        kind: choice
        choices: <list_of_choices>
```

Each provided feature will generate a new feature column in the generated dataframe.

Example: Given the following yaml:

```yaml
static:
    models:
      kind: range
      min_range: 10
      max_range: 15
      as_integer: True
    country:
      kind: choice
      choices: [A, B, C, D, E, F, G]
```

And the previous deployment:

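```python
from metrics_gen.static_data_generator import Static_data_generator

# `deployment` is the dataframe generated earlier;
# `static_configuration` is assumed to hold the static configuration from the yaml above
static_data_generator = Static_data_generator(deployment, static_configuration)
generated_df = static_data_generator.generate_static_data()
```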

This generates the following dataframe:

|   | device        | core          | models | country |
|---|---------------|---------------|--------|---------|
| 0 | 4120271911677 | 6950611701382 | 13     | A       |
| 1 | 4120271911677 | 2255426557707 | 14     | C       |
| 2 | 4120271911677 | 7717168891372 | 14     | G       |
| 3 | 2260158002886 | 3213635322383 | 14     | G       |
| 4 | 2260158002886 | 4007792940086 | 11     | G       |
| 5 | 2260158002886 | 3720953132595 | 14     | D       |
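To make the two feature kinds concrete, here is a minimal standard-library sketch of how **range** and **choice** columns could be produced for a deployment of six rows. The helper `generate_static_features` is hypothetical, and uniform sampling with an inclusive `max_range` is an assumption rather than documented behavior of the library.

```python
import random


def generate_static_features(num_rows, static_config, seed=0):
    """Build one list of values per configured feature (hypothetical helper)."""
    rng = random.Random(seed)
    columns = {}
    for name, spec in static_config.items():
        if spec["kind"] == "range":
            low = spec.get("min_range", 0)  # the README states a default of 0
            high = spec["max_range"]
            if spec.get("as_integer", False):
                # Assumption: integers drawn uniformly, both bounds inclusive.
                values = [rng.randint(low, high) for _ in range(num_rows)]
            else:
                values = [rng.uniform(low, high) for _ in range(num_rows)]
        elif spec["kind"] == "choice":
            values = [rng.choice(spec["choices"]) for _ in range(num_rows)]
        else:
            raise ValueError(f"unknown feature kind: {spec['kind']!r}")
        columns[name] = values
    return columns


static_config = {
    "models": {"kind": "range", "min_range": 10, "max_range": 15, "as_integer": True},
    "country": {"kind": "choice", "choices": list("ABCDEFG")},
}
columns = generate_static_features(6, static_config)
print(columns)
```

Each key in `columns` corresponds to one new feature column alongside the deployment indexes.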

### Create Continuous Metrics

To create a continuous metrics stream you need to provide a deployment dataframe and a metrics creation configuration yaml.

```yaml
errors:
    rate_in_ticks: < ~ticks between errors>
    length_in_ticks: < ~length of error mode>
timestamps:
    interval: <interval>
    stochastic_interval: <bool>
metrics:
  <metric_name>:
    accuracy: <int>
    distribution: normal
    distribution_params:
        mu: <number>
        noise: <number>
        sigma: <number>
    is_threshold_below: <bool>
    past_based_value: <bool>
    produce_max: <bool>
    produce_min: <bool>
    validation:
        distribution: # per-sample validation
            max: <number>
            min: <number>
            validate: <bool>
        metric: # metric level validations
            max: <number>
            min: <number>
            validate: <bool>
```

Each configured metric will generate an additional metric column for your deployment.

Example: Given the following yaml:

```yaml
errors: {length_in_ticks: 10, rate_in_ticks: 5}
timestamps: {interval: 5s, stochastic_interval: true}
metrics:
  cpu_utilization:
    accuracy: 2
    distribution: normal
    distribution_params: {mu: 70, noise: 0, sigma: 10}
    is_threshold_below: true
    past_based_value: false
    produce_max: false
    produce_min: false
    validation:
      distribution: {max: 1, min: -1, validate: false}
      metric: {max: 100, min: 0, validate: true}
  throughput:
    accuracy: 2
    distribution: normal
    distribution_params: {mu: 250, noise: 0, sigma: 20}
    is_threshold_below: false
    past_based_value: false
    produce_max: false
    produce_min: false
    validation:
      distribution: {max: 1, min: -1, validate: false}
      metric: {max: 300, min: 0, validate: true}
```

And the previous deployment:

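```python
from metrics_gen.metrics_generator import Generator_df

# `metrics_configuration` is assumed to hold the metrics configuration from the yaml above
metrics_generator = Generator_df(metrics_configuration, user_hierarchy=deployment)
generator = metrics_generator.generate(as_df=True)

# Each call to next() yields one sample for the whole deployment
df = next(generator)
```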

This generates the following dataframe:

| timestamp | core | device | cpu_utilization | cpu_utilization_is_error | throughput | throughput_is_error | is_error |
|---|---|---|---|---|---|---|---|
| 2022-01-31 19:20:21.007087 | 2113309831673 | 4469221325973 | 100.0 | True | 0.0 | True | True |
| 2022-01-31 19:20:21.007087 | 2115933686087 | 4469221325973 | 100.0 | True | 235.0679405785135 | False | False |
| 2022-01-31 19:20:21.007087 | 0175482390171 | 4469221325973 | 70.26657388732976 | False | 208.34378630077305 | False | False |
| 2022-01-31 19:20:21.007087 | 1626403145660 | 4038890878426 | 59.932750968399404 | False | 217.4335871243806 | False | False |
| 2022-01-31 19:20:21.007087 | 7247058922310 | 4038890878426 | 83.98361382584898 | False | 265.3476318369042 | False | False |
| 2022-01-31 19:20:21.007087 | 7030239128061 | 4038890878426 | 100.0 | False | 225.16604191632058 | False | False |

To generate a new sample, call `next(generator)` again; each call produces one more sample for the whole deployment.
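The generator pattern can be illustrated with a small standard-library sketch. It is not the library's code: `metric_samples` is a hypothetical function, the flat `mu`/`sigma`/`min`/`max` keys are a simplification of the yaml above, clamping values to the metric-level bounds is an assumption about how validation behaves, and error injection and stochastic intervals are left out.

```python
import random
from datetime import datetime, timedelta


def metric_samples(keys, metrics, interval_seconds=5, seed=0):
    """Yield one batch of rows per tick, one row per deployment key."""
    rng = random.Random(seed)
    timestamp = datetime(2022, 1, 31, 19, 20, 21)
    while True:
        batch = []
        for key in keys:
            row = {"timestamp": timestamp, **key}
            for name, spec in metrics.items():
                value = rng.gauss(spec["mu"], spec["sigma"])
                # Assumption: out-of-range values are clamped to [min, max].
                row[name] = min(max(value, spec["min"]), spec["max"])
            batch.append(row)
        yield batch
        # Fixed interval between ticks in this sketch.
        timestamp += timedelta(seconds=interval_seconds)


keys = [{"device": "d0", "core": "c0"}, {"device": "d0", "core": "c1"}]
metrics = {
    "cpu_utilization": {"mu": 70, "sigma": 10, "min": 0, "max": 100},
    "throughput": {"mu": 250, "sigma": 20, "min": 0, "max": 300},
}
samples = metric_samples(keys, metrics)
first = next(samples)
second = next(samples)
print(first[0]["timestamp"], second[0]["timestamp"])
```

Because the function is a generator, no values are computed until `next` is called, which is what makes an unbounded stream practical.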