{"id":25824163,"url":"https://github.com/utmhikari/daggre","last_synced_at":"2026-05-11T01:24:01.980Z","repository":{"id":100475752,"uuid":"456441221","full_name":"utmhikari/daggre","owner":"utmhikari","description":"DAta-AGGREgator, a tool to handle data aggregation tasks","archived":false,"fork":false,"pushed_at":"2022-11-22T13:10:36.000Z","size":2536,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-11-14T14:49:24.822Z","etag":null,"topics":["daggre","data-aggregation","data-filtering","data-process","game-configuration","game-testing","gin","golang","join-tables","table-joining-service"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/utmhikari.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-02-07T09:38:57.000Z","updated_at":"2022-12-06T01:40:14.000Z","dependencies_parsed_at":"2023-05-15T02:00:22.342Z","dependency_job_id":null,"html_url":"https://github.com/utmhikari/daggre","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/utmhikari%2Fdaggre","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/utmhikari%2Fdaggre/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/utmhikari%2Fdaggre/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/utmhikari%2Fdaggre/manifests","owner_url":"https://repos.ecosy
ste.ms/api/v1/hosts/GitHub/owners/utmhikari","download_url":"https://codeload.github.com/utmhikari/daggre/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241150665,"owners_count":19918354,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["daggre","data-aggregation","data-filtering","data-process","game-configuration","game-testing","gin","golang","join-tables","table-joining-service"],"created_at":"2025-02-28T12:36:38.866Z","updated_at":"2026-05-11T01:23:56.934Z","avatar_url":"https://github.com/utmhikari.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# daggre\n\n**DAta-AGGREgator**, a tool to handle data aggregation tasks\n\nInspired by [MongoDB aggregation](https://www.mongodb.com/docs/manual/aggregation/), daggre lets you specify data aggregation pipelines via JSON-based configurations, so you no longer need to write code for them. A single aggregation pipeline can be applied to multiple groups of data, no matter where the data comes from, which makes the pipelines easy to manage.\n\n## Scenarios\n\nA typical scenario is config-table checking in game QA work. In game development, config tables may exist not only as DB data but also as code, stored in files of different formats.\n\nWhen testing new features such as malls and markets, QA may need a joint view of the mall, category, product and item tables, linked by different primary-key ids, to check whether the actual config meets the requirements. 
Also, for game systems such as main/side quests, QA may need fixed rules to monitor the config data and check it for potential bugs.\n\nFor joint views of multiple tables, an aggregation pipeline with **lookup** and **unwind** stages can join and list all the data. For example, if you have these tables:\n\n```python\nmalls = [\n    {\"id\": 1, \"name\": \"Weapon\", \"products\": [1, 2, 3]},\n    {\"id\": 2, \"name\": \"Armor\", \"products\": [4]}\n]\n\nproducts = [\n    {\"id\": 1, \"name\": \"AWP\", \"itemID\": 45},\n    {\"id\": 2, \"name\": \"AK47\", \"itemID\": 42},\n    {\"id\": 3, \"name\": \"M4A1\", \"itemID\": 43},\n    {\"id\": 4, \"name\": \"Kevlar\", \"itemID\": 51}\n]\n```\n\nPick **malls** as the main table; after you **unwind** **products** and **lookup** each **product** by **id**, you get the final joint view:\n\n```python\nmalls_products = [\n    {\"id\": 1, \"name\": \"Weapon\", \"product\": {\"id\": 1, \"name\": \"AWP\", \"itemID\": 45}},\n    {\"id\": 1, \"name\": \"Weapon\", \"product\": {\"id\": 2, \"name\": \"AK47\", \"itemID\": 42}},\n    {\"id\": 1, \"name\": \"Weapon\", \"product\": {\"id\": 3, \"name\": \"M4A1\", \"itemID\": 43}},\n    {\"id\": 2, \"name\": \"Armor\", \"product\": {\"id\": 4, \"name\": \"Kevlar\", \"itemID\": 51}}\n]\n```\n\nFor data-check work, pipelines with **filter** stages are needed. 
For example, if we have item data like this:\n\n```python\nitems = [\n    {\"id\": 1, \"name\": \"Molotov\", \"desc\": \"fire in the hole\"},\n    {\"id\": 2, \"name\": \"Flashbang\", \"desc\": \"hooxi sucks\"}\n]\n```\n\nIf we don't want to see the dirty words, just **filter** the rows by the rule: **desc** excludes **suck**.\n\n## Usage\n\n### daggre-cli\n\n**DAGGRE-CLI** is the command-line program for processing data \u0026 aggregation specifications.\n\nCompile `daggre_cli.go` to generate the executable. Its arguments are as follows:\n\n- `-h`: show help message\n- `--workdir`: working directory for daggre-cli\n- `--datapath`: the input json file path of data source, relative to `workdir`\n- `--aggrepath`: the input json file path of aggregation specification, relative to `workdir`\n- `--outputpath`: the output json file path of aggregated data, relative to `workdir`\n- `--statspath`: the output json file path of aggregation statistics, relative to `workdir`\n\nFor the input data of `datapath`, the contents should be a dict mapping table names to row data:\n\n```json\n{\n  \"tableName\": [\n    {\"rowID\": 1, \"name\": \"foo\"},\n    {\"rowID\": 2, \"name\": \"bar\"}\n  ],\n  \"tableName2\": [\n    {\"id\": 1, \"name\": \"hello\"}\n  ]\n}\n```\n\nFor the input aggregation of `aggrepath`, the contents should specify all pipelines and a main pipeline, like this:\n\n```json\n{\n  \"pipelines\": [\n    {\n      \"name\": \"pipeline1\",\n      \"desc\": \"my first pipeline\",\n      \"tables\": [\"tableName\"],\n      \"stages\": [\n        {\n          \"name\": \"filter\",\n          \"params\": {\n            \"locator\": \"rowID\",\n            \"operator\": \"\u003c=\",\n            \"value\": 1\n          }\n        }\n      ]\n    },\n    {\n      \"name\": \"pipeline2\",\n      \"desc\": \"my second pipeline\",\n      \"tables\": [\"tableName2\"],\n      \"stages\": []\n    }\n  ],\n  \"main\": \"pipeline1\"\n}\n```\n\nEach pipeline should have:\n\n- 
a unique **name**\n- an optional **desc**\n- initial **tables**, which are merged when the pipeline starts\n- **stages** to be processed\n\nFor each stage, the stage **name** and **params** should be specified, see **Pipeline Stages** below for details.\n\nFinally, the **main** pipeline must be specified; its data serves as the basis of the final output.\n\nAfter **daggre-cli** has run, the json output files are generated at `outputpath` and `statspath`.\n\nThe file at `outputpath` contains the final aggregated data, while the file at `statspath` contains the stats of all executed pipelines, namely:\n\n- whether the aggregation ran successfully\n- error message\n- input \u0026 output data sizes\n- start \u0026 end unix timestamps\n- status, output size, start \u0026 end unix timestamps of all the pipeline stages\n\nSee `res/testcases/cli` for examples of working directories and results.\n\n### daggre-svr\n\n**DAGGRE-SVR** is an HTTP server offering aggregation services, based on [gin](https://github.com/gin-gonic).\n\nCompile `daggre-svr.go` to generate the executable. Its arguments are as follows:\n\n- `-h`: show help message\n- `-c`: specify yaml config file path\n\nThe yaml config file should specify the listen port:\n\n```yaml\nport: 8954\n```\n\nAfter launching the server, users should **POST** a **json body** to `/api/v1/aggre` to start aggregation tasks.\n\nThe **json body** should look like this:\n\n```json\n{\n  \"data\": \"the dataset, same as the contents of 'datapath' in daggre-cli\",\n  \"aggre\": \"the aggregation, same as the contents of 'aggrepath' in daggre-cli\"\n}\n```\n\nThe server responds with a **json body** like this:\n\n```json\n{\n  \"output\": \"the final output data, same as the contents of 'outputpath' in daggre-cli\",\n  \"stats\": \"the aggregation statistics, same as the contents of 'statspath' in daggre-cli\"\n}\n```\n\nLaunch the server and run the test `res/testcases/svr/aggre_test.go` to give it a try!\n\n## Pipeline Stages\n\n### filter\n\nThe **FILTER** stage filters the rows 
based on the specified rules. Its params are as follows:\n\n- `locator`: a string that locates the value key by key, with a dot `.` as the separator\n- `operator`: a comparison operator, one of `\u003c`, `\u003c=`, `\u003e`, `\u003e=`, `==`, `!=`\n- `value`: the value to be compared\n\n### lookup\n\nThe **LOOKUP** stage joins two tables based on specific columns. Its params are as follows:\n\n- `fromPipeline`: the pipeline whose rows are joined in\n- `localLocator`: the locator that specifies the local join key\n- `foreignLocator`: the locator that specifies the foreign join key\n- `toField`: the field into which the foreign row data is joined\n\n### sort\n\nThe **SORT** stage sorts the current table in place. Its params are as follows:\n\n- `rules`: sort rules in priority order\n  - `locator`: the locator to locate the row value\n  - `order`: the sort order, should be either `1` (ASC) or `-1` (DESC)\n\n### unwind\n\nThe **UNWIND** stage flattens an array into multiple rows, one per array item, each carrying a copy of the rest of the row data.\n\nIts params are as follows:\n\n- `locator`: the locator to locate the array value\n- `includeArrayIndex`: if not empty, specifies the key to hold the array index value\n- `preserveNullAndEmptyArrays`: whether to preserve the row if the array value cannot be located\n\n## Customization\n\nYou can define your own pipeline stages with these steps:\n\n- declare the `struct` of your stage, which should contain `daggre.BasePipelineStage` and your stage params as members\n- implement methods of `daggre.PipelineStageInterface` if necessary\n  - `Check`: check whether the stage params contain errors\n  - `ChildPipelines`: declare all the stage params that refer to other pipelines\n  - `Process`: the processing logic of the stage\n- implement the factory function `NewXXXStage`, then call `daggre.RegisterPipelineStage` to register it\n\nSee `res/testcases/zzz/custom/custom_stage_test.go` for an 
example\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Futmhikari%2Fdaggre","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Futmhikari%2Fdaggre","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Futmhikari%2Fdaggre/lists"}