{"id":20399307,"url":"https://github.com/sunilsoni/cassandra-data-modeling","last_synced_at":"2025-04-12T13:22:33.617Z","repository":{"id":128930338,"uuid":"121095795","full_name":"sunilsoni/Cassandra-Data-Modeling","owner":"sunilsoni","description":"Basic Rules of Cassandra Data Modeling","archived":false,"fork":false,"pushed_at":"2018-02-15T17:34:09.000Z","size":75456,"stargazers_count":56,"open_issues_count":0,"forks_count":37,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-03-26T08:06:50.770Z","etag":null,"topics":["cassandra","cassandra-cql","data-modeling","time-series-database"],"latest_commit_sha":null,"homepage":"https://www.datastax.com/dev/blog/basic-rules-of-cassandra-data-modeling","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sunilsoni.png","metadata":{"files":{"readme":"README.adoc","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-02-11T07:04:47.000Z","updated_at":"2025-03-03T07:22:00.000Z","dependencies_parsed_at":"2023-03-29T00:34:50.537Z","dependency_job_id":null,"html_url":"https://github.com/sunilsoni/Cassandra-Data-Modeling","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sunilsoni%2FCassandra-Data-Modeling","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sunilsoni%2FCassandra-Data-Modeling/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sunilsoni%2FCassandra-Data-Modeling/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/G
itHub/repositories/sunilsoni%2FCassandra-Data-Modeling/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sunilsoni","download_url":"https://codeload.github.com/sunilsoni/Cassandra-Data-Modeling/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248571991,"owners_count":21126559,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cassandra","cassandra-cql","data-modeling","time-series-database"],"created_at":"2024-11-15T04:28:19.626Z","updated_at":"2025-04-12T13:22:33.608Z","avatar_url":"https://github.com/sunilsoni.png","language":null,"readme":"= Cassandra Data Modeling\n\n== https://wiki.apache.org/cassandra/DataModel[Introduction:]\n\nCassandra is a partitioned row store, where rows are organized into tables with a required primary key.\n\nThe first component of a table's primary key is the partition key; within a partition, rows are clustered by the remaining columns of the PK. 
Other columns may be indexed independent of the PK.\n\nThis allows pervasive denormalization to \"pre-build\" resultsets at update time, rather than doing expensive joins across the cluster.\n \n\n== https://www.datastax.com/dev/blog/basic-rules-of-cassandra-data-modeling[Basic Rules of Cassandra Data Modeling]\n \n\n==  https://youtu.be/B_HTdrTgGNs[Introduction To Apache Cassandra with Patrick McFadin] \n\n\n==  https://www.youtube.com/watch?v=tg6eIht-00M\u0026t=2s[ Tech Talk: Cassandra Data Modeling TimeSeries with Patrick McFadin] \n\n\n==  https://www.youtube.com/watch?v=N2zIlVhKXTc\u0026t=29s[ Introduction to Cassandra Data Model | Edureka]\n\n== https://www.youtube.com/watch?v=Vv3QJxAdjic[Midwest.io 2014 - Time Series with Apache Cassandra - Patrick McFadin]\n\n== https://www.youtube.com/watch?v=-zyZ35YyT_8[cassandra data modeling - Practical considerations @ netflix]\n\nhttps://github.com/jaegertracing/jaeger/blob/master/plugin/storage/cassandra/schema/v001.cql.tmpl\n\n\nStackoverflow::\nhttps://stackoverflow.com/questions/17945677/cassandra-uuid-vs-timeuuid-benefits-and-disadvantages[Cassandra UUID vs TimeUUID benefits and disadvantages?]\n \nSample Tables:\n\nCREATE TABLE sensor_readings (\nsensorID uuid,\ntime_bucket int,\ntimestamp bigint,\nreading decimal,\nPRIMARY KEY ((sensorID, time_bucket), timestamp)\n) WITH CLUSTERING ORDER BY (timestamp DESC);\n\nSELECT * FROM sensor_readings\nWHERE sensorID = 53755080-4676-11e4-916c-0800200c9a66\nAND time_bucket IN (1411840800, 1411844400)\nAND timestamp \u003e= 1411841700\nAND timestamp \u003c= 1411845300;\n\n\nCREATE TABLE IF NOT EXISTS ${keyspace}.traces (\n    trace_id        blob,\n    span_id         bigint,\n    span_hash       bigint,\n    parent_id       bigint,\n    operation_name  text,\n    flags           int,\n    start_time      bigint,\n    duration        bigint,\n    tags            list\u003cfrozen\u003ckeyvalue\u003e\u003e,\n    logs            list\u003cfrozen\u003clog\u003e\u003e,\n    refs 
           list\u003cfrozen\u003cspan_ref\u003e\u003e,\n    process         frozen\u003cprocess\u003e,\n    PRIMARY KEY (trace_id, span_id, span_hash)\n)\n    WITH compaction = {\n        'compaction_window_size': '1', \n        'compaction_window_unit': 'HOURS', \n        'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy'\n    }\n    AND dclocal_read_repair_chance = 0.0\n    AND default_time_to_live = ${trace_ttl}\n    AND speculative_retry = 'NONE'\n    AND gc_grace_seconds = 10800; -- 3 hours of downtime acceptable on nodes\n\t\n\t\nCREATE TABLE IF NOT EXISTS ${keyspace}.duration_index (\n    service_name    text,      // service name\n    operation_name  text,      // operation name, or blank for queries without span name\n    bucket          timestamp, // time bucket, - the start_time of the given span rounded to an hour\n    duration        bigint,    // span duration, in microseconds\n    start_time      bigint,\n    trace_id        blob,\n    PRIMARY KEY ((service_name, operation_name, bucket), duration, start_time, trace_id)\n) WITH CLUSTERING ORDER BY (duration DESC, start_time DESC)\n    AND compaction = {\n        'compaction_window_size': '1', \n        'compaction_window_unit': 'HOURS', \n        'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy'\n    }\n    AND dclocal_read_repair_chance = 0.0\n    AND default_time_to_live = ${trace_ttl}\n    AND speculative_retry = 'NONE'\n    AND gc_grace_seconds = 10800; -- 3 hours of downtime acceptable on nodes\n\t\n\t\n\t\nSequential writes can cause hot spots: If the application tends to write or\nupdate a sequential block of rows at a time, the writes will not be distributed\nacross the cluster. They all go to one node. 
This is frequently a problem for\napplications dealing with timestamped data.","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsunilsoni%2Fcassandra-data-modeling","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsunilsoni%2Fcassandra-data-modeling","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsunilsoni%2Fcassandra-data-modeling/lists"}