{"id":28466829,"url":"https://github.com/mlrun/metrics-gen","last_synced_at":"2026-01-20T16:50:29.160Z","repository":{"id":57441216,"uuid":"445205587","full_name":"mlrun/metrics-gen","owner":"mlrun","description":"dummy metrics generator","archived":false,"fork":false,"pushed_at":"2023-03-20T14:26:52.000Z","size":32,"stargazers_count":0,"open_issues_count":1,"forks_count":3,"subscribers_count":1,"default_branch":"development","last_synced_at":"2025-06-07T07:06:06.297Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mlrun.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-01-06T14:50:14.000Z","updated_at":"2023-03-09T15:56:01.000Z","dependencies_parsed_at":"2022-09-02T08:30:35.420Z","dependency_job_id":null,"html_url":"https://github.com/mlrun/metrics-gen","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/mlrun/metrics-gen","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mlrun%2Fmetrics-gen","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mlrun%2Fmetrics-gen/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mlrun%2Fmetrics-gen/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mlrun%2Fmetrics-gen/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mlrun","download_url":"https://codeload.github.com/mlrun/metrics-gen/tar.gz/refs/heads/development","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mlrun%2Fm
etrics-gen/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261649854,"owners_count":23189751,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-07T07:06:06.587Z","updated_at":"2026-01-20T16:50:29.154Z","avatar_url":"https://github.com/mlrun.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# metrics-gen\n\ndummy metrics generator\n\n## Getting Started\nMetrics generator is built upon three main components:\n\n- **Deployment**: The indexes of the table, for example:\n  - symbol in a stock market\n  - (data_center, device_id) for devices in data centers\n- **Static Data**: Static data regarding the deployment, for example:\n  - model_number for a device\n  - score for a model\n- **Metrics**: Continuous metrics to generate about the deployment, for example:\n  - cpu_utilization of a device\n  - price of a stock\n\nThe first step in setting up the generator is creating a deployment. Then, using the deployment, you can generate static data or a continuous stream of metrics.\n\n### Create a deployment from configuration\nTo create a deployment from configuration you need to provide a **yaml** file containing the following:\n\n```yaml\ndeployment:\n    \u003clevel_name\u003e:\n      faker: \u003cfaker_type\u003e\n      num_items: \u003cnum_items in the level\u003e\n```\n\nWhere `level_name` is the name of the index, `faker_type` is the name of the [faker generator](https://github.com/joke2k/faker) and `num_items` is the number of keys to create for this index.  
\nEach level creates `num_items` instances for every entry in the levels before it.\n\n**Example**: Given the following configuration yaml file:\n\n```yaml\ndeployment:\n    device:\n      faker: msisdn\n      num_items: 2\n    core:\n      faker: msisdn\n      num_items: 3\n```\n\nand running the following code:\n```python\nfrom metrics_gen.deployment_generator import deployment_generator\n\n# `configuration` holds the deployment configuration shown above\ndep_gen = deployment_generator()\ndeployment = dep_gen.generate_deployment(configuration=configuration)\n```\n\nWill generate the following example deployment:\n\n| | device      |     core |\n|- |- |-|\n| 0 | 4120271911677 | 6950611701382 |\n| 1 | 4120271911677 | 2255426557707 |\n| 2 | 4120271911677 | 7717168891372 |\n| 3 | 2260158002886 | 3213635322383 |\n| 4 | 2260158002886 | 4007792940086 |\n| 5 | 2260158002886 | 3720953132595 |\n\n**Notice** that each extra level multiplies the number of items created by its `num_items`, so here we get 2 * 3 = 6 items.\n\n### Create Static Data\nTo create a static data generator you need to supply a deployment dataframe and a configuration yaml.\n\nThe static data generator can generate values from two kinds of feature configurations, **range** and **choice**, which should be specified in the yaml.\n\n```yaml\nstatic:\n    \u003cfeature_name\u003e:\n        kind: range\n        min_range: \u003cmin_feature_range\u003e, defaults to 0\n        max_range: \u003cmax_feature_range\u003e\n        as_integer: \u003cint or float\u003e, defaults to False\n    \u003cfeature_name\u003e:\n        kind: choice\n        choices: \u003clist of possible choices\u003e\n```\n\nEach provided feature will generate a new feature column in the generated dataframe.\n\nExample: Given the following yaml:\n\n```yaml\nstatic:\n    models: \n      kind: range\n      min_range: 10\n      max_range: 15\n      as_integer: True\n    country: \n      kind: choice\n      choices: [A, B, C, D, E, F, G]\n```\n\nAnd the previous 
deployment:\n\n```python\nfrom metrics_gen.static_data_generator import Static_data_generator\n\n\nstatic_data_generator = Static_data_generator(\n    deployment, static_configuration\n)\n\ngenerated_df = static_data_generator.generate_static_data()\n```\n\nWill generate the following dataframe:\n\n\n|  | device | core | models | country |\n|-- |------- |----- |------- |----- |\n| 0 | 4120271911677 | 6950611701382  |    13   |    A |\n| 1 | 4120271911677 | 2255426557707  |    14   |    C |\n| 2 | 4120271911677 | 7717168891372  |    14   |    G |\n| 3 | 2260158002886 | 3213635322383  |    14   |    G |\n| 4 | 2260158002886 | 4007792940086  |    11   |    G |\n| 5 | 2260158002886 | 3720953132595  |    14   |    D |\n\n### Create Continuous Metrics\n\nTo create a continuous metrics stream you need to provide a deployment dataframe and a metrics configuration yaml.\n\n```yaml\nerrors:\n    rate_in_ticks: \u003c ~ticks between errors\u003e\n    length_in_ticks: \u003c ~length of error mode\u003e\ntimestamps:\n    interval: \u003ctime between samples in seconds\u003e\n    stochastic_interval: \u003ccreate random intervals (around interval)\u003e\nmetrics:\n  \u003cmetric name\u003e:\n    accuracy: \u003cdecimals to produce\u003e\n    distribution: normal\n    distribution_params:\n        mu: \u003cmean\u003e\n        noise: \u003cnoise\u003e\n        sigma: \u003cstd\u003e\n    is_threshold_below: \u003cTrue to produce max when in error mode, False for min\u003e\n    past_based_value: \u003cTrue to add the latest metric to the last result (like in daily stock market), False to replace normally\u003e\n    produce_max: \u003cTrue for candles-like presentation\u003e\n    produce_min: \u003cTrue for candles-like presentation\u003e\n    validation:\n        distribution: # per-sample validation\n            max: \u003cmax value for individual sample\u003e\n            min: \u003cmin value for individual sample\u003e\n            validate: \u003cTrue to activate 
validation\u003e\n        metric: # metric level validations\n            max: \u003cmax value for overall-metric sample (only applicable to past-based-values)\u003e\n            min: \u003cmin value for overall-metric sample (only applicable to past-based-values)\u003e\n            validate: \u003cTrue to activate validation\u003e\n```\n\nEach configured metric will generate an additional metric for your deployment.\n\nExample: Given the following yaml:\n\n```yaml\nerrors: {length_in_ticks: 10, rate_in_ticks: 5}\ntimestamps: {interval: 5s, stochastic_interval: true}\nmetrics:\n  cpu_utilization:\n    accuracy: 2\n    distribution: normal\n    distribution_params: {mu: 70, noise: 0, sigma: 10}\n    is_threshold_below: true\n    past_based_value: false\n    produce_max: false\n    produce_min: false\n    validation:\n      distribution: {max: 1, min: -1, validate: false}\n      metric: {max: 100, min: 0, validate: true}\n  throughput:\n    accuracy: 2\n    distribution: normal\n    distribution_params: {mu: 250, noise: 0, sigma: 20}\n    is_threshold_below: false\n    past_based_value: false\n    produce_max: false\n    produce_min: false\n    validation:\n      distribution: {max: 1, min: -1, validate: false}\n      metric: {max: 300, min: 0, validate: true}\n```\n\nAnd the previous deployment:\n\n```python\nfrom metrics_gen.metrics_generator import Generator_df\n\nmetrics_generator = Generator_df(metrics_configuration, user_hierarchy=deployment)\ngenerator = metrics_generator.generate(as_df=True)\n\ndf = next(generator)\n```\n\nWill generate the following dataframe:\n\n| timestamp                  \t| core          \t| device        \t| cpu_utilization    \t| cpu_utilization_is_error \t| throughput         \t| throughput_is_error \t| is_error \t|\n|----------------------------\t|---------------\t|---------------\t|--------------------\t|--------------------------\t|--------------------\t|---------------------\t|----------\t|\n| 2022-01-31 19:20:21.007087 \t| 2113309831673 \t| 
4469221325973 \t| 100.0              \t| True                     \t| 0.0                \t| True                \t| True     \t|\n| 2022-01-31 19:20:21.007087 \t| 2115933686087 \t| 4469221325973 \t| 100.0              \t| True                     \t| 235.0679405785135  \t| False               \t| False    \t|\n| 2022-01-31 19:20:21.007087 \t| 0175482390171 \t| 4469221325973 \t| 70.26657388732976  \t| False                    \t| 208.34378630077305 \t| False               \t| False    \t|\n| 2022-01-31 19:20:21.007087 \t| 1626403145660 \t| 4038890878426 \t| 59.932750968399404 \t| False                    \t| 217.4335871243806  \t| False               \t| False    \t|\n| 2022-01-31 19:20:21.007087 \t| 7247058922310 \t| 4038890878426 \t| 83.98361382584898  \t| False                    \t| 265.3476318369042  \t| False               \t| False    \t|\n| 2022-01-31 19:20:21.007087 \t| 7030239128061 \t| 4038890878426 \t| 100.0              \t| False                    \t| 225.16604191632058 \t| False               \t| False    \t|\n\nTo generate a new sample, call `next(generator)` again.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmlrun%2Fmetrics-gen","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmlrun%2Fmetrics-gen","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmlrun%2Fmetrics-gen/lists"}