{"id":13567352,"url":"https://github.com/michelcaradec/Graph-OLAP","last_synced_at":"2025-04-04T01:31:49.412Z","repository":{"id":94611149,"uuid":"125622218","full_name":"michelcaradec/Graph-OLAP","owner":"michelcaradec","description":"An attempt to model an OLAP cube with Neo4j.","archived":false,"fork":false,"pushed_at":"2018-03-17T11:39:01.000Z","size":1514,"stargazers_count":47,"open_issues_count":2,"forks_count":3,"subscribers_count":12,"default_branch":"master","last_synced_at":"2024-11-04T22:36:53.845Z","etag":null,"topics":["cypher","neo4j","olap-cube"],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/michelcaradec.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2018-03-17T11:24:17.000Z","updated_at":"2024-10-20T06:46:13.000Z","dependencies_parsed_at":"2023-04-18T13:49:17.366Z","dependency_job_id":null,"html_url":"https://github.com/michelcaradec/Graph-OLAP","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michelcaradec%2FGraph-OLAP","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michelcaradec%2FGraph-OLAP/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michelcaradec%2FGraph-OLAP/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michelcaradec%2FGraph-OLAP/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/michelcaradec","download_url":"https://codeload.github.com/michelcar
adec/Graph-OLAP/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247107816,"owners_count":20884793,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cypher","neo4j","olap-cube"],"created_at":"2024-08-01T13:02:29.218Z","updated_at":"2025-04-04T01:31:44.401Z","avatar_url":"https://github.com/michelcaradec.png","language":null,"funding_links":[],"categories":["Content","Misc"],"sub_categories":[],"readme":"# Graph OLAP with Neo4j\r\n\r\n*Alternative title: \"The [Graph Kimball](https://en.wikipedia.org/wiki/Ralph_Kimball) Project\" (sorry, I could not resist).*\r\n\r\nAn attempt to model an [OLAP cube](https://en.wikipedia.org/wiki/OLAP_cube) with [Neo4j](https://neo4j.com).\r\n\r\nAs a former [OLAP developer](https://bimatters1403.wordpress.com/) and [recent adopter](https://github.com/michelcaradec/Graph-Theory) of graph databases, I was curious to see how **Neo4j** could handle **Online Analytical Processing** structures, which are dedicated to reporting.\r\n\r\nThese concepts will be explored by creating an OLAP cube using only the [Cypher](https://neo4j.com/developer/cypher-query-language/) language.\r\n\r\nMade with:\r\n\r\n- [Visual Studio Code](https://code.visualstudio.com/).\r\n    - [Markdown All in One](https://marketplace.visualstudio.com/items?itemName=yzhang.markdown-all-in-one).\r\n- [draw.io](https://www.draw.io).\r\n- [Arrow Tool](http://www.apcjones.com/arrows/).\r\n- [Neo4j Desktop](https://neo4j.com/download/) with [Neo4j Engine 
v3.3.3](https://github.com/neo4j/neo4j/releases/tag/3.3.3).\r\n\r\nTable of Contents:\r\n\r\n\u003cdetails\u003e\r\n\r\n- [Neo4j Graph Cube](#neo4j-graph-cube)\r\n    - [Dataset](#dataset)\r\n    - [Cube Structure](#cube-structure)\r\n    - [Storage Implementation](#storage-implementation)\r\n        - [Facts Consolidation](#facts-consolidation)\r\n    - [Data Import](#data-import)\r\n        - [Dimension Import](#dimension-import)\r\n        - [Facts Import](#facts-import)\r\n        - [Implementation Details](#implementation-details)\r\n    - [Querying](#querying)\r\n        - [Query 1 - Sum of Sales for Strings in 2016](#query-1---sum-of-sales-for-strings-in-2016)\r\n        - [Query 2 - Sum of Sales for Strings in January 2016](#query-2---sum-of-sales-for-strings-in-january-2016)\r\n        - [Query 3 - Detail of Sales for Strings in January 2016](#query-3---detail-of-sales-for-strings-in-january-2016)\r\n    - [Aggregates Store](#aggregates-store)\r\n        - [Aggregate Strategy](#aggregate-strategy)\r\n        - [Aggregate Bitmask](#aggregate-bitmask)\r\n        - [Aggregate Creation](#aggregate-creation)\r\n            - [By Place.Region](#by-placeregion)\r\n            - [By Place.Country](#by-placecountry)\r\n            - [By Product.Product x Place.Country](#by-productproduct-x-placecountry)\r\n    - [Querying Aggregates](#querying-aggregates)\r\n        - [Query 1 - Sum of Sales for Violin in France](#query-1---sum-of-sales-for-violin-in-france)\r\n        - [Query 2 - Sum of Sales for Violin in 2016](#query-2---sum-of-sales-for-violin-in-2016)\r\n    - [Conclusion](#conclusion)\r\n\r\n\u003c/details\u003e\r\n\r\n## Dataset\r\n\r\nWe will use a sample dataset stored in tab-delimited files named [sales.2016.tsv](data/sales.2016.tsv) and [sales.2017.tsv](data/sales.2017.tsv).\r\n\r\nTop and bottom lines:\r\n\r\n```raw\r\nRegion         Country    Year  Month  Product   Category  Sales    Units\r\nNorth America  USA        2016  1      Piano     
Strings   4251.00  693\r\nNorth America  USA        2016  1      Violin    Strings   2789.00  571\r\nNorth America  USA        2016  1      Cello     Strings   3908.00  311\r\nNorth America  USA        2016  1      Guitar    Strings   5034.00  284\r\n...\r\nSouth America  Argentina  2017  12     Clarinet  Winds     4576.00  329\r\nSouth America  Argentina  2017  12     Trumpet   Brass     3540.00  327\r\nSouth America  Argentina  2017  12     Trombone  Brass     3646.00  599\r\nSouth America  Argentina  2017  12     Tuba      Brass     6971.00  465\r\nSouth America  Argentina  2017  12     Sax       Winds     5718.00  236\r\n```\r\n\r\nTo keep things easy to follow, some explanations will be based on a tiny dataset ([sales.small.tsv](data/sales.small.tsv)):\r\n\r\n```raw\r\nRegion  Country  Year  Month  Product  Category  Sales  Units\r\nEurope  France   2016  1      Piano    Strings   1000   3\r\nEurope  France   2016  1      Piano    Strings   2000   5\r\nEurope  France   2016  1      Violin   Strings   100    1\r\nEurope  France   2016  1      Violin   Strings   200    2\r\n```\r\n\r\n## Cube Structure\r\n\r\nOur cube will be made of **three dimensions** and **two measures**.\r\n\r\n| Object | Type | Components | Description |\r\n|---|---|---|---|\r\n| Product | Dimension | Levels = Category, Product | Products sold. |\r\n| Place | Dimension | Levels = Region, Country | Locations where products were sold. |\r\n| Time | Dimension | Levels = Year, Month | Dates when events (sales) occurred. |\r\n| Measure | Measure | Measures = Sales, Units | Sales in Euros, units sold. 
|\r\n\r\nThe [Star Schema](https://en.wikipedia.org/wiki/Star_schema) model of our cube:\r\n\r\n![](assets/star.png)\r\n\r\n## Storage Implementation\r\n\r\nEach component (level member or measure) will be stored as a node.\r\n\r\nNodes will be linked by relationships to represent how they are hierarchically connected.\r\n\r\n| Dimension | Level | Label |\r\n|---|---|---|\r\n| Product | Category | `:Category` |\r\n| Product | Product | `:Product` |\r\n| Place | Region | `:Region` |\r\n| Place | Country | `:Country` |\r\n| Time | Year | `:Year` |\r\n| Time | Month | `:Month` |\r\n| Measures | | `:Measure`|\r\n\r\nThis gives us the following meta-model:\r\n\r\n![](assets/model.png)\r\n\r\nExample:\r\n\r\n![](assets/implementation_notes_01.png)\r\n\r\n*Graph generated by importing the first 3 lines (including header) of the tiny dataset.* \r\n\r\n- For dimensions, there will be one node per member of each level.\r\n- For facts, there will be one node per event for a given product, place and time (i.e. [leaf level members](http://olap.com/learn-bi-olap/olap-bi-definitions/leaf-level/) of corresponding dimensions).\r\n\r\nIn the following screenshot, **Product** category **Strings** appears once, and is connected to two products (**Piano** and **Violin**). 
The same principle applies to the **Time** and **Place** dimensions.\r\n\r\n![](assets/implementation_notes_02.png)\r\n\r\n*Graph generated by importing the tiny dataset.* \r\n\r\nThere are two fact nodes, corresponding to two events which occurred in **France** in **January 2016** for **Piano** and **Violin**.\r\n\r\n### Facts Consolidation\r\n\r\nMultiple events can occur at the same time, for the same product and the same country (see lines 2-3 and 4-5 of our tiny dataset, counting the header as line 1).\r\n\r\nAt import time, facts must be aggregated into a single `:Measure` node containing cumulated values (using the Sum aggregation operator).\r\n\r\n## Data Import\r\n\r\nWe will use the [LOAD CSV](http://neo4j.com/docs/developer-manual/current/cypher/clauses/load-csv/) clause to import our data.\r\n\r\nBefore starting, let's initialize our database by creating some indexes:\r\n\r\n```cypher\r\n// Dimensions\r\nCREATE INDEX ON :Country(country);\r\nCREATE INDEX ON :Product(product);\r\nCREATE INDEX ON :Month(month);\r\n// Measures\r\nCREATE INDEX ON :Measure(mid);\r\n```\r\n\r\nOur import script must support:\r\n\r\n- **Incremental** data import, as data can come from multiple data sources, at different times, with overlapping events.\r\n- **[Slowly changing dimension](https://en.wikipedia.org/wiki/Slowly_changing_dimension)**: we will keep it simple by overwriting property values.\r\n\r\nFirst part of the import script:\r\n\r\n```cypher\r\nUNWIND [\"sales.2016.tsv\", \"sales.2017.tsv\"] AS sourceFile\r\n//USING PERIODIC COMMIT 500\r\nLOAD CSV WITH HEADERS\r\n    FROM \"file:///\" + sourceFile\r\n    AS row\r\n    FIELDTERMINATOR '\\t'\r\n```\r\n\r\nThe second line is commented out, as it doesn't seem possible to mix [UNWIND](https://neo4j.com/docs/developer-manual/current/cypher/clauses/unwind/) or [FOREACH](https://neo4j.com/docs/developer-manual/current/cypher/clauses/foreach/) clauses with the [USING PERIODIC 
COMMIT](https://neo4j.com/docs/developer-manual/current/cypher/query-tuning/using/#query-using-periodic-commit-hint) query hint.\r\n\r\n### Dimension Import\r\n\r\nFor each dimension level, we will create one node per member. Hierarchically related nodes will be connected by a relationship.\r\n\r\nExample for **Europe** and **France** (members of the **Region** and **Country** levels of the **Place** dimension, respectively): `(:Country {country: 'France'})-[:IN_REGION]-\u003e(:Region {region: 'Europe'})`.\r\n\r\nPlace dimension:\r\n\r\n```cypher\r\nMERGE (r:Region {region: row.Region})\r\nMERGE (c:Country {country: row.Country})\r\nMERGE (c)-[:IN_REGION]-\u003e(r)\r\n```\r\n\r\nProduct dimension:\r\n\r\n```cypher\r\nMERGE (cat:Category {category: row.Category})\r\nMERGE (prod:Product {product: row.Product})\r\nMERGE (prod)-[:IN_CATEGORY]-\u003e(cat)\r\n```\r\n\r\nTime dimension:\r\n\r\n```cypher\r\nMERGE (y:Year {year: toInteger(row.Year)})\r\nMERGE (m:Month {year: toInteger(row.Year), month: toInteger(row.Month)})\r\nMERGE (m)-[:OF_YEAR]-\u003e(y)\r\n```\r\n\r\nFor the **Month** level (`:Month` node), a `year` property is added in order to disambiguate months from different years (**March 2016** is different from **March 2017**).\r\n\r\n### Facts Import\r\n\r\nThe import script continues with the import of measures:\r\n\r\n```cypher\r\nWITH\r\n    c, m, prod, row,\r\n    c.country + '_' + toString(m.year) + '_' + toString(m.month) + '_' + prod.product AS MeasureID\r\n// Facts\r\nMERGE (meas:Measure {mid: MeasureID})\r\nON CREATE\r\n    // Create new fact node\r\n    SET meas.sales = toFloat(row.Sales),\r\n    meas.units = toInteger(row.Units)\r\nON MATCH\r\n    // Update fact node\r\n    SET meas.sales = meas.sales + toFloat(row.Sales),\r\n    meas.units = meas.units + toInteger(row.Units)\r\n// Foreign keys\r\nMERGE (meas)-[:IN_PLACE]-\u003e(c)\r\nMERGE (meas)-[:AT_TIME]-\u003e(m)\r\nMERGE (meas)-[:FOR_PRODUCT]-\u003e(prod)\r\n```\r\n\r\nThe full import Cypher script can be 
found in [create.v2.cypher](cypher/create.v2.cypher) file.\r\n\r\nOutput after execution:\r\n\r\n```raw\r\nAdded 4285 labels, created 4285 nodes, set 12757 properties, created 12723 relationships, completed after 4277 ms.\r\n```\r\n\r\n### Implementation Details\r\n\r\nSlowly changing dimensions are handled with [MERGE](https://neo4j.com/docs/developer-manual/current/cypher/clauses/merge/) clauses, which either update or insert level members.\r\n\r\nMeasures pose an extra challenge, as properties can't just be overwritten. If a node already exists for a given event, imported values must be added to existing ones.\r\n\r\nA [first version of the script](cypher/create.v1.cypher) relied on the [OPTIONAL MATCH](https://neo4j.com/docs/developer-manual/current/cypher/clauses/optional-match/) clause to distinguish existing from non-existing measure nodes, based on the presence of one relationship per dimension for the event to import:\r\n\r\n```cypher\r\nWITH c, m, prod, row\r\n// Facts\r\nOPTIONAL MATCH\r\n    (meas:Measure)-[:IN_PLACE]-\u003e(c),\r\n    (meas:Measure)-[:AT_TIME]-\u003e(m),\r\n    (meas:Measure)-[:FOR_PRODUCT]-\u003e(prod)\r\n// Create new fact node\r\nFOREACH (x IN CASE WHEN meas IS NULL THEN [1] ELSE [] END |\r\n    CREATE (meas:Measure {sales: toFloat(row.Sales), units: toInteger(row.Units)})\r\n    // Foreign keys\r\n    CREATE (meas)-[:IN_PLACE]-\u003e(c)\r\n    CREATE (meas)-[:AT_TIME]-\u003e(m)\r\n    CREATE (meas)-[:FOR_PRODUCT]-\u003e(prod)\r\n)\r\n// Update fact node\r\nFOREACH (x IN CASE WHEN meas IS NULL THEN [] ELSE [1] END |\r\n    SET meas.sales = meas.sales + toFloat(row.Sales)\r\n    SET meas.units = meas.units + toInteger(row.Units)\r\n)\r\n```\r\n\r\nFor some reason I couldn't explain, existing measures were never detected, leading to systematic measure node creation (i.e. the last part of the script was never executed).\r\n\r\nThis option was initially preferred for its **relationship-oriented nature**. 
As I wasn't successful in making it work, an [alternative solution was found, as previously seen](#facts-import), based on an extra property named `mid` (for **M**easure **ID**entifier) that duplicates information about its related dimension members.\r\n\r\nThe full non-working import Cypher script can be found in [create.v1.cypher](cypher/create.v1.cypher) file.\r\n\r\nThe execution of the data import script gives the following meta-model:\r\n\r\n![](assets/model_neo4j.png)\r\n\r\n## Querying\r\n\r\nAt this stage, we have an OLAP-like graph structure containing consolidated facts.\r\n\r\nWe are now ready to run a few queries.\r\n\r\n### Query 1 - Sum of Sales for Strings in 2016\r\n\r\n```cypher\r\nMATCH (cat:Category {category: 'Strings'})\u003c-[*]-(meas:Measure)\r\nMATCH (year:Year {year: 2016})\u003c-[*]-(meas:Measure)\r\nRETURN\r\n    SUM(meas.sales) AS Sales,\r\n    SUM(meas.units) AS Units;\r\n```\r\n\r\nOutput (from tiny dataset):\r\n\r\n| Sales | Units |\r\n|---|---|\r\n| 3300 | 11 |\r\n\r\n### Query 2 - Sum of Sales for Strings in January 2016\r\n\r\n```cypher\r\nMATCH (cat:Category {category: 'Strings'})\u003c-[*]-(meas:Measure)\r\nMATCH (year:Year {year: 2016})\u003c-[]-(month:Month {month: 1})\u003c-[]-(meas:Measure)\r\nRETURN\r\n    SUM(meas.sales) AS Sales,\r\n    SUM(meas.units) AS Units;\r\n```\r\n\r\nAlternatively:\r\n\r\n```cypher\r\nMATCH (cat:Category {category: 'Strings'})\u003c-[*]-(meas:Measure)\r\nMATCH (month:Month {year: 2016, month: 1})\u003c-[]-(meas:Measure)\r\nRETURN\r\n    SUM(meas.sales) AS Sales,\r\n    SUM(meas.units) AS Units;\r\n```\r\n\r\nOutput (from tiny dataset):\r\n\r\n| Sales | Units |\r\n|---|---|\r\n| 3300 | 11 |\r\n\r\n### Query 3 - Detail of Sales for Strings in January 2016\r\n\r\nReturning the detail of an aggregated measure is called [Drill-through](http://olap.com/learn-bi-olap/olap-bi-definitions/drill-through/).\r\n\r\n```cypher\r\nMATCH (cat:Category {category: 
'Strings'})\u003c-[]-(prod:Product)\u003c-[]-(meas:Measure)\r\nMATCH (year:Year {year: 2016})\u003c-[]-(month:Month {month: 1})\u003c-[]-(meas:Measure)\r\nRETURN\r\n    year.year AS Year,\r\n    month.month AS Month,\r\n    cat.category AS Category,\r\n    prod.product AS Product,\r\n    meas.sales AS Sales,\r\n    meas.units AS Units\r\nLIMIT 10;\r\n```\r\n\r\nOutput (from tiny dataset):\r\n\r\n| Year | Month | Category | Product | Sales | Units |\r\n|---|---|---|---|---|---|\r\n| 2016 | 1 | Strings | Piano | 3000 | 8 |\r\n| 2016 | 1 | Strings | Violin | 300 | 3 |\r\n\r\n## Aggregates Store\r\n\r\nThe [queries](#querying) we previously wrote relied only on facts to return a result. Once retrieved, facts must be aggregated with the [SUM](https://neo4j.com/docs/developer-manual/current/cypher/functions/aggregating/#functions-sum) aggregation function.\r\n\r\nThis does the job, but is far from the OLAP philosophy, where aggregates can be pre-calculated in order to improve query response time.\r\n\r\n### Aggregate Strategy\r\n\r\nAggregates will be stored as nodes, with a label name composed of two parts:\r\n\r\n- A prefix `:Aggregate_`.\r\n- A suffix representing an [aggregate bitmask](#aggregate-bitmask).\r\n\r\nExample: `(:Aggregate_000000)`.\r\n\r\nThis concept is called [MOLAP](https://en.wikipedia.org/wiki/Online_analytical_processing#Multidimensional_OLAP_.28MOLAP.29) when applied to a multi-dimensional database store, [ROLAP](https://en.wikipedia.org/wiki/Online_analytical_processing#Relational_OLAP_(ROLAP)) when applied to a relational one. 
Let's call it **GOLAP** (for Graph OLAP), as we are using a graph database store (we could also call it **NOLAP** for Neo4j OLAP).\r\n\r\n### Aggregate Bitmask\r\n\r\nAn aggregate bitmask is a sequence of characters that **identifies** which part of a [sub-cube](https://en.wikipedia.org/wiki/OLAP_cube#Operations) an aggregate is calculated on.\r\n\r\nThe sequence is composed of N positions (one for each level), each position taking one of the following values:\r\n\r\n| Value | Description |\r\n|---|---|\r\n| x | Aggregate is calculated for all members of the level. |\r\n| 0 | No aggregate is calculated for the level. |\r\n| 1 | Aggregate is calculated for each member of the level. |\r\n\r\nThe order of the positions matters and must remain the same. If the cube structure is modified (by the addition of extra dimensions or levels), the aggregate strategy must be redefined.\r\n\r\nThe aggregate sequence will be composed of **six positions** for our sample cube, as it contains six levels (**Category**, **Product**, **Region**, **Country**, **Year**, **Month**).\r\n\r\nAggregate bitmask construction:\r\n\r\n| Product.Category | Product.Product | Place.Region | Place.Country | Time.Year | Time.Month | Description | Scope |\r\n|---|---|---|---|---|---|---|---|\r\n| 1 | 0 | x | 0 | x | 0 | By Product.Category | One-level |\r\n| 0 | 1 | x | 0 | x | 0 | By Product.Product | One-level |\r\n| 0 | 1 | 0 | 1 | x | 0 | By Product.Product x Place.Country | Two-levels |\r\n\r\n### Aggregate Creation\r\n\r\nThere are six **one-level** aggregates for our sample cube:\r\n\r\n| Aggregate | Description |\r\n|---|---|\r\n| Aggregate_10x0x0 | By Product.Category |\r\n| Aggregate_01x0x0 | By Product.Product |\r\n| Aggregate_x010x0 | By Place.Region |\r\n| Aggregate_x001x0 | By Place.Country |\r\n| Aggregate_x0x010 | By Time.Year |\r\n| Aggregate_x0x001 | By Time.Month |\r\n\r\n#### By Place.Region\r\n\r\n```cypher\r\nMATCH (r:Region)\u003c-[*2]-(meas:Measure)\r\nWITH DISTINCT r, 
SUM(meas.sales) AS SumSales, SUM(meas.units) AS SumUnits\r\nCREATE (a:Aggregate_x010x0 {sales: SumSales, units: SumUnits})-[:AGGREGATE_OF]-\u003e(r);\r\n```\r\n\r\nOutput:\r\n\r\n```\r\nAdded 5 labels, created 5 nodes, set 10 properties, created 5 relationships, completed after 36 ms.\r\n```\r\n\r\nHere is how the **Aggregate_x010x0** aggregate fits in the graph for the **Europe** region:\r\n\r\n![](assets/region_aggregate_x010x0.png)\r\n\r\n*Aggregate node is coloured in grey.*\r\n\r\n#### By Place.Country\r\n\r\n```cypher\r\nMATCH (ct:Country)\u003c-[]-(meas:Measure)\r\nWITH DISTINCT ct, SUM(meas.sales) AS SumSales, SUM(meas.units) AS SumUnits\r\nCREATE (a:Aggregate_x001x0 {sales: SumSales, units: SumUnits})-[:AGGREGATE_OF]-\u003e(ct);\r\n```\r\n\r\nOutput:\r\n\r\n```\r\nAdded 16 labels, created 16 nodes, set 32 properties, created 16 relationships, completed after 28 ms.\r\n```\r\n\r\nHere is how the **Aggregate_x010x0** (by Place.Region) and **Aggregate_x001x0** (by Place.Country) aggregates fit in the graph for **Europe** and **France**:\r\n\r\n![](assets/country_aggregate_x001x0.png)\r\n\r\nThe higher the level, the greater the performance benefit when querying aggregate nodes (coloured in grey), as they avoid querying individual (i.e. 
detail) measure nodes (coloured in red).\r\n\r\nThe creation of the one-level aggregates gives the following meta-model:\r\n\r\n![](assets/model_aggregates_1level_neo4j.png)\r\n\r\n*One-level aggregate nodes are coloured in grey.*\r\n\r\n#### By Product.Product x Place.Country\r\n\r\nThis is a **two-levels** aggregate, as it is calculated by the combination of two levels.\r\n\r\n```cypher\r\nMATCH (prod:Product)\u003c-[]-(meas:Measure)\r\nMATCH (ct:Country)\u003c-[]-(meas:Measure)\r\nWITH DISTINCT prod, ct, SUM(meas.sales) AS SumSales, SUM(meas.units) AS SumUnits\r\nCREATE (prod)\u003c-[:AGGREGATE_OF]-(a:Aggregate_0101x0 {sales: SumSales, units: SumUnits})-[:AGGREGATE_OF]-\u003e(ct);\r\n```\r\n\r\nOther aggregate creation commands can be found in file [aggregates.cypher](cypher/aggregates.cypher).\r\n\r\nThe creation of the two-levels aggregates gives the following meta-model:\r\n\r\n![](assets/model_aggregates_2levels_neo4j.png)\r\n\r\n*One-level aggregate nodes are coloured in grey, two-levels ones in green.*\r\n\r\nAll commands are consolidated in a single file, [full_script.cypher](cypher/full_script.cypher).\r\n\r\n## Querying Aggregates\r\n\r\n[Aggregates](#aggregate-bitmask) can now be used in queries.\r\n\r\n### Query 1 - Sum of Sales for Violin in France\r\n\r\n```cypher\r\nMATCH (prod:Product {product: 'Violin'})\u003c-[]-(a:Aggregate_0101x0)-[]-\u003e(ct:Country {country: 'France'})\r\nRETURN\r\n    a.sales AS Sales,\r\n    a.units AS Units;\r\n```\r\n\r\n### Query 2 - Sum of Sales for Violin in 2016\r\n\r\n```cypher\r\nMATCH (prod:Product {product: 'Violin'})\u003c-[]-(a:Aggregate_01x010)-[]-\u003e(y:Year {year: 2016})\r\nRETURN\r\n    a.sales AS Sales,\r\n    a.units AS Units;\r\n```\r\n\r\nIt doesn't necessarily make sense to create all possible aggregates.\r\n\r\nAggregate creation is based on a good understanding of business needs (i.e. how data is queried). 
Some OLAP engines such as [Microsoft SQL Server Analysis Services](https://www.microsoft.com/en-us/sql-server/business-intelligence) offer tooling to help select good aggregate candidates based on previously executed queries. We could imagine implementing such a feature based on the [Neo4j query log](https://neo4j.com/docs/operations-manual/current/monitoring/logging/query-logging/), but this is out of the scope of this workshop.\r\n\r\n## Conclusion\r\n\r\nThere are many other features that could be explored, such as:\r\n\r\n- [Schema discovery](https://docs.microsoft.com/en-us/sql/analysis-services/schema-rowsets/analysis-services-schema-rowsets).\r\n- [Custom-rollup](https://docs.microsoft.com/en-us/sql/analysis-services/multidimensional-models/parent-child-dimension-attributes-custom-rollup-operators).\r\n- [Writeback](https://docs.microsoft.com/en-us/sql/analysis-services/multidimensional-models/set-partition-writeback).\r\n- Benchmark.\r\n\r\nThis workshop was an opportunity to challenge (and learn) **Neo4j** and **Cypher** in an area they weren't primarily designed for (even if we are dealing with lots of relationships).\r\n\r\nEach data engine is designed to address specific purposes, so we can't expect one to be as feature-complete as another in fields as different as **OLAP** and **Graphs**.\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmichelcaradec%2FGraph-OLAP","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmichelcaradec%2FGraph-OLAP","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmichelcaradec%2FGraph-OLAP/lists"}