https://github.com/sunilsoni/cassandra-data-modeling

Basic Rules of Cassandra Data Modeling
https://github.com/sunilsoni/cassandra-data-modeling

cassandra cassandra-cql data-modeling time-series-database

Last synced: 3 months ago
JSON representation

Basic Rules of Cassandra Data Modeling

Host: GitHub
URL: https://github.com/sunilsoni/cassandra-data-modeling
Owner: sunilsoni
Created: 2018-02-11T07:04:47.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2018-02-15T17:34:09.000Z (over 7 years ago)
Last Synced: 2025-03-26T08:06:50.770Z (4 months ago)
Topics: cassandra, cassandra-cql, data-modeling, time-series-database
Homepage: https://www.datastax.com/dev/blog/basic-rules-of-cassandra-data-modeling
Size: 72 MB
Stars: 56
Watchers: 7
Forks: 37
Open Issues: 0
Metadata Files:
- Readme: README.adoc

Awesome Lists containing this project

README

        = Cassandra Data Modeling

== https://wiki.apache.org/cassandra/DataModel[Introduction:]

Cassandra is a partitioned row store, where rows are organized into tables with a required primary key.

The first component of a table's primary key is the partition key; within a partition, rows are clustered by the remaining columns of the PK. Other columns may be indexed independent of the PK.

This allows pervasive denormalization to "pre-build" resultsets at update time, rather than doing expensive joins across the cluster.

 

== https://www.datastax.com/dev/blog/basic-rules-of-cassandra-data-modeling[Basic Rules of Cassandra Data Modeling]

 

==  https://youtu.be/B_HTdrTgGNs[Introduction To Apache Cassandra with Patrick McFadin] 

==  https://www.youtube.com/watch?v=tg6eIht-00M&t=2s[ Tech Talk: Cassandra Data Modeling TimeSeries with Patrick McFadin] 

==  https://www.youtube.com/watch?v=N2zIlVhKXTc&t=29s[ Introduction to Cassandra Data Model | Edureka]

== https://www.youtube.com/watch?v=Vv3QJxAdjic[Midwest.io 2014 - Time Series with Apache Cassandra - Patrick McFadin]

== https://www.youtube.com/watch?v=-zyZ35YyT_8[cassandra data modeling - Practical considerations @ netflix]

https://github.com/jaegertracing/jaeger/blob/master/plugin/storage/cassandra/schema/v001.cql.tmpl

Stackoverflow::

https://stackoverflow.com/questions/17945677/cassandra-uuid-vs-timeuuid-benefits-and-disadvantages[Cassandra UUID vs TimeUUID benefits and disadvantages?]

 

Sample Tables:

CREATE TABLE sensor_readings (

sensorID uuid,

time_bucket int,

timestamp bigint,

reading decimal,

PRIMARY KEY ((sensorID, time_bucket), timestamp)

) WITH CLUSTERING ORDER BY (timestamp DESC);

SELECT * FROM sensor_readings

WHERE sensorID = 53755080-4676-11e4-916c-0800200c9a66

AND time_bucket IN (1411840800, 1411844400)

AND timestamp >= 1411841700

AND timestamp <= 1411845300;

CREATE TABLE IF NOT EXISTS ${keyspace}.traces (

    trace_id        blob,

    span_id         bigint,

    span_hash       bigint,

    parent_id       bigint,

    operation_name  text,

    flags           int,

    start_time      bigint,

    duration        bigint,

    tags            list>,

    logs            list>,

    refs            list>,

    process         frozen,

    PRIMARY KEY (trace_id, span_id, span_hash)

)

    WITH compaction = {

        'compaction_window_size': '1', 

        'compaction_window_unit': 'HOURS', 

        'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy'

    }

    AND dclocal_read_repair_chance = 0.0

    AND default_time_to_live = ${trace_ttl}

    AND speculative_retry = 'NONE'

    AND gc_grace_seconds = 10800; -- 3 hours of downtime acceptable on nodes

	

	

CREATE TABLE IF NOT EXISTS ${keyspace}.duration_index (

    service_name    text,      // service name

    operation_name  text,      // operation name, or blank for queries without span name

    bucket          timestamp, // time bucket, - the start_time of the given span rounded to an hour

    duration        bigint,    // span duration, in microseconds

    start_time      bigint,

    trace_id        blob,

    PRIMARY KEY ((service_name, operation_name, bucket), duration, start_time, trace_id)

) WITH CLUSTERING ORDER BY (duration DESC, start_time DESC)

    AND compaction = {

        'compaction_window_size': '1', 

        'compaction_window_unit': 'HOURS', 

        'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy'

    }

    AND dclocal_read_repair_chance = 0.0

    AND default_time_to_live = ${trace_ttl}

    AND speculative_retry = 'NONE'

    AND gc_grace_seconds = 10800; -- 3 hours of downtime acceptable on nodes

	

	

	

Sequential writes can cause hot spots: If the application tends to write or

update a sequential block of rows at a time, the writes will not be distributed

across the cluster. They all go to one node. This is frequently a problem for

applications dealing with timestamped data

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/sunilsoni/cassandra-data-modeling

Awesome Lists containing this project

README