{"id":23478431,"url":"https://github.com/opennms/elasticsearch-drift-plugin","last_synced_at":"2025-04-14T21:31:35.710Z","repository":{"id":30268488,"uuid":"116873261","full_name":"OpenNMS/elasticsearch-drift-plugin","owner":"OpenNMS","description":"Elasticearch plugin that helps generate time series data from flow data","archived":false,"fork":false,"pushed_at":"2024-03-27T21:59:40.000Z","size":203,"stargazers_count":3,"open_issues_count":9,"forks_count":2,"subscribers_count":22,"default_branch":"master","last_synced_at":"2024-04-15T15:36:09.283Z","etag":null,"topics":["elasticsearch","flows","hacktoberfest","ipfix","netflow","opennms"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/OpenNMS.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2018-01-09T21:34:43.000Z","updated_at":"2023-10-09T13:36:36.000Z","dependencies_parsed_at":"2023-02-14T09:16:45.590Z","dependency_job_id":null,"html_url":"https://github.com/OpenNMS/elasticsearch-drift-plugin","commit_stats":{"total_commits":99,"total_committers":9,"mean_commits":11.0,"dds":0.5656565656565656,"last_synced_commit":"03cf4745be8defd7f76140a200d8861f3cb7f465"},"previous_names":[],"tags_count":22,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenNMS%2Felasticsearch-drift-plugin","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenNMS%2Felasticsearch-drift-plugin/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenNMS%2Felasticsearch-drift-plugin/releases","manifes
ts_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenNMS%2Felasticsearch-drift-plugin/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/OpenNMS","download_url":"https://codeload.github.com/OpenNMS/elasticsearch-drift-plugin/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":231143394,"owners_count":18334385,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["elasticsearch","flows","hacktoberfest","ipfix","netflow","opennms"],"created_at":"2024-12-24T19:19:37.334Z","updated_at":"2024-12-24T19:19:41.769Z","avatar_url":"https://github.com/OpenNMS.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Elasticsearch Drift Plugin  [![CircleCI](https://circleci.com/gh/OpenNMS/elasticsearch-drift-plugin.svg?style=svg)](https://circleci.com/gh/OpenNMS/elasticsearch-drift-plugin)\n\nTime series aggregation for flow records.\n\n|   Drift Plugin  | Elasticsearch     | Release date   |\n|-----------------|-------------------|:--------------:|\n| 1.0.x           | 6.2.4             |  May 2018      |\n| 1.1.0           | 6.5.4             |  Feb 2019      |\n| x.y.z           | x.y.z             |  June 2019     |\n\n\u003e After 1.1.0, we switched to using the same version number as the Elasticsearch version being targeted.\n\n## Overview\n\nThis plugin provides a new aggregation function `proportional_sum` that can be used to:\n\n1. Group documents that contain a date range into multiple buckets\n1. 
Calculate a sum on a per-bucket basis, using a ratio proportional to the share of the document's time range that falls within that bucket.\n\nThis aggregation function behaves like a hybrid of the `Metrics` and `Bucket` aggregation types, since it both creates buckets and calculates a new metric.\n\n## Installation\n\n### RPM\n\nInstall the package repository:\n```\nsudo yum install https://yum.opennms.org/repofiles/opennms-repo-stable-rhel7.noarch.rpm\nsudo rpm --import https://yum.opennms.org/OPENNMS-GPG-KEY\n```\n\nInstall the package:\n```\nsudo yum install elasticsearch-drift-plugin\n```\n\n### Debian\n\nCreate a new apt source file (e.g. `/etc/apt/sources.list.d/opennms.list`), and add the following two lines:\n```\ndeb https://debian.opennms.org stable main\ndeb-src https://debian.opennms.org stable main\n```\n\nImport the packages' authentication key with the following command:\n```\nwget -O - https://debian.opennms.org/OPENNMS-GPG-KEY | sudo apt-key add -\n```\n\nInstall the package:\n```\nsudo apt-get update\nsudo apt-get install elasticsearch-drift-plugin\n```\n\n## Use Case\n\nWe are interested in generating time series for Netflow records stored in Elasticsearch.\nEach Netflow record is stored as a separate document and contains the following fields of interest:\n\n```json\n{\n  \"timestamp\": 460,\n  \"netflow.first_switched\": 100,\n  \"netflow.last_switched\": 450,\n  \"netflow.bytes\": 350\n}\n```\n\nFor this record, we’d like to be able to generate a time series with start=0, end=500, step=100, and have the following data points:\n\n```\nt=0, bytes=0\nt=100, bytes=100\nt=200, bytes=100\nt=300, bytes=100\nt=400, bytes=50\nt=500, bytes=0\n```\n\nIn this case, each step (or bucket) would contain a fraction of the bytes, relative to how much of the flow falls into that step.\nWe assume that the flow bytes are evenly spread across the range, and if there were multiple flow records in a single step, we would sum the corresponding bytes.\n\nSince the existing 
aggregation facilities in Elasticsearch don't support this behavior, we've gone ahead and developed our own.\n\n## Usage\n\nUsing the record above, the `proportional_sum` aggregation can be used as follows:\n\n### Request\n\n```json\n{\n  \"size\": 0,\n  \"aggs\": {\n    \"bytes_over_time\": {\n      \"proportional_sum\": {\n        \"fields\": [\n          \"netflow.first_switched\",\n          \"netflow.last_switched\",\n          \"netflow.bytes\"\n        ],\n        \"interval\": 100,\n        \"start\": 0,\n        \"end\": 500\n      }\n    },\n    \"bytes_total\": {\n      \"sum\": {\n        \"field\": \"netflow.bytes\"\n      }\n    }\n  }\n}\n```\n\nThe `fields` option must be present, and must reference the following document fields in order:\n\n1. The start of the range\n2. The end of the range\n3. The value\n\nThe `interval` can be set to a string with a date format, or to a numeric value representing the number of milliseconds between steps.\n\nThe `start` and `end` fields are optional and take a Unix timestamp in milliseconds.\nWhen set, the generated buckets will be limited to ones that fall within this range.\nThis allows the documents themselves to contain wider ranges for which we do not want to generate buckets or series.\n\n### Response\n\n```json\n{\n  \"took\" : 2,\n  \"timed_out\" : false,\n  \"_shards\" : {\n    \"total\" : 5,\n    \"successful\" : 5,\n    \"skipped\" : 0,\n    \"failed\" : 0\n  },\n  \"hits\" : {\n    \"total\" : 1,\n    \"max_score\" : 0.0,\n    \"hits\" : [ ]\n  },\n  \"aggregations\" : {\n    \"bytes_total\" : {\n      \"value\" : 350.0\n    },\n    \"bytes_over_time\" : {\n      \"buckets\" : [\n        {\n          \"key\" : 100,\n          \"doc_count\" : 1,\n          \"value\" : 100.0\n        },\n        {\n          \"key\" : 200,\n          \"doc_count\" : 1,\n          \"value\" : 100.0\n        },\n        {\n          \"key\" : 300,\n          \"doc_count\" : 1,\n          \"value\" : 100.0\n        },\n     
   {\n          \"key\" : 400,\n          \"doc_count\" : 1,\n          \"value\" : 50.0\n        }\n      ]\n    }\n  }\n}\n```\n\nHere we can see that multiple buckets were generated for the single document, and that its value was spread across these buckets in proportion to the time spent in each.\n\n## Building and installing from source\n\nTo compile the plugin, run:\n```\nmvn clean package\n```\n\nNext, set up an Elasticsearch instance using the same version that is defined in the `pom.xml`.\nThe version must match exactly, otherwise Elasticsearch will refuse to start.\n\nInstall the plugin using:\n```\n/usr/share/elasticsearch/bin/elasticsearch-plugin install file:///path/to/elasticsearch-drift/plugin/target/releases/elasticsearch-drift-plugin-1.0.0-SNAPSHOT.zip\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopennms%2Felasticsearch-drift-plugin","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopennms%2Felasticsearch-drift-plugin","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopennms%2Felasticsearch-drift-plugin/lists"}