{"id":27370154,"url":"https://github.com/datastaxdevs/workshop-cassandra-fundamentals","last_synced_at":"2025-07-25T01:11:35.260Z","repository":{"id":41823529,"uuid":"509521846","full_name":"datastaxdevs/workshop-cassandra-fundamentals","owner":"datastaxdevs","description":"Welcome to the Apache Cassandra™ Fundamentals workshop! In this two-hour workshop, we shows the most important fundamentals and basics of the powerful distributed NoSQL database Apache Cassandra™.","archived":false,"fork":false,"pushed_at":"2023-01-20T10:02:52.000Z","size":17528,"stargazers_count":19,"open_issues_count":0,"forks_count":10,"subscribers_count":9,"default_branch":"main","last_synced_at":"2024-07-30T19:34:04.361Z","etag":null,"topics":["astradb","cassandra","database","nosql","workshop"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/datastaxdevs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-07-01T16:28:46.000Z","updated_at":"2023-07-19T09:01:17.000Z","dependencies_parsed_at":"2023-02-12T01:15:54.463Z","dependency_job_id":null,"html_url":"https://github.com/datastaxdevs/workshop-cassandra-fundamentals","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datastaxdevs%2Fworkshop-cassandra-fundamentals","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datastaxdevs%2Fworkshop-cassandra-fundamentals/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datastaxdevs%2Fworkshop-cassandra-fundamentals/releases"
,"manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datastaxdevs%2Fworkshop-cassandra-fundamentals/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/datastaxdevs","download_url":"https://codeload.github.com/datastaxdevs/workshop-cassandra-fundamentals/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248688190,"owners_count":21145762,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["astradb","cassandra","database","nosql","workshop"],"created_at":"2025-04-13T08:48:08.008Z","updated_at":"2025-04-13T08:48:09.833Z","avatar_url":"https://github.com/datastaxdevs.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🎓 Apache Cassandra™ Fundamentals\n\nWelcome to the **Apache Cassandra™ Fundamentals** workshop! In this two-hour workshop, we show the most important fundamentals and basics of the powerful distributed `NoSQL database Apache Cassandra™`.\n\nUsing **Astra DB**, the cloud-based _Cassandra-as-a-Service_ platform delivered by DataStax, we will cover the very first steps for every developer who wants to try to learn a new database: creating tables and CRUD operations. \n\n![](images/splash.png)\n\nIt doesn't matter if you join our workshop live or you prefer to work at your own pace, we have you covered. 
In this repository, you'll find everything you need for this workshop:\n\n\u003e [🔖 Accessing HANDS-ON](#-start-hands-on)\n\n## 📋 Table of contents\n\n\u003cimg src=\"https://github.com/datastaxdevs/workshop-cassandra-fundamentals/blob/main/images/cassandra_fundamentals.png?raw=true\" align=\"right\" width=\"300px\"/\u003e\n\n1. [Objectives](#1-objectives)\n2. [Frequently asked questions](#2-frequently-asked-questions)\n3. [Materials for the Session](#3-materials-for-the-session)\n4. [Create your Database](#4-create-your-astra-db-instance)\n5. [Create tables](#5-create-tables)\n6. [Execute CRUD operations](#6-execute-crud-operations)\n7. [Homework](#7-homework)\n8. [What's NEXT ](#8-whats-next-)\n\u003cp\u003e\u003cbr/\u003e\n\n## 1. Objectives\n\n1️⃣ **Give you an understanding of how and where to position Apache Cassandra™**\n\n2️⃣ **Give an overview of the NoSQL ecosystem and its rationale**\n\n3️⃣ **Provide an overview of Cassandra Architecture**\n\n4️⃣ **Help you create your first tables and run your first statements**\n\n🚀 **Have fun with an interactive session**\n\n## 2. Frequently asked questions\n\n\u003cp/\u003e\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003e 1️⃣ Can I run this workshop on my computer?\u003c/b\u003e\u003c/summary\u003e\n\u003chr\u003e\n\u003cp\u003eThere is nothing preventing you from running the workshop on your own machine. If you do so, you will need the following:\n\u003col\u003e\n\u003cli\u003e\u003cb\u003egit\u003c/b\u003e installed on your local system\n\u003c/ol\u003e\n\u003c/p\u003e\nIn this readme, we try to provide instructions for local development as well - but keep in mind that the main focus is development on Gitpod, hence \u003cstrong\u003ewe can't guarantee live support\u003c/strong\u003e for local development, in order to keep on track with the schedule. 
However, we will do our best to give you the info you need to succeed.\n\u003c/details\u003e\n\u003cp/\u003e\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003e 2️⃣ What other prerequisites are required?\u003c/b\u003e\u003c/summary\u003e\n\u003chr\u003e\n\u003cul\u003e\n\u003cli\u003eYou will need enough *real estate* on screen: we will ask you to open a few windows, which would not fit on a mobile phone (tablets should be OK)\n\u003cli\u003eYou will need an Astra account: don't worry, we'll walk through that in the steps below\n\u003cli\u003eAs \"Intermediate level\" we expect you to know what Java and Spring are. \n\u003c/ul\u003e\n\u003c/p\u003e\n\u003c/details\u003e\n\u003cp/\u003e\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003e 3️⃣ Do I need to pay for anything for this workshop?\u003c/b\u003e\u003c/summary\u003e\n\u003chr\u003e\n\u003cb\u003eNo.\u003c/b\u003e All tools and services we provide here are FREE. FREE not only during the session but also after.\n\u003c/details\u003e\n\u003cp/\u003e\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003e 4️⃣ Will I get a certificate if I attend this workshop?\u003c/b\u003e\u003c/summary\u003e\n\u003chr\u003e\nAttending the session is not enough. You need to complete the homework detailed below and you will get a nice badge that you can share on LinkedIn or anywhere else *(Open Badge specification)*\n\u003c/details\u003e\n\u003cp/\u003e\n\n## 3. Materials for the Session\n\nIt doesn't matter if you join our workshop live or you prefer to work at your own pace,\nwe have you covered. In this repository, you'll find everything you need for this workshop:\n\n- [Slide deck](/slides/slides.pdf)\n- [Discord chat](https://dtsx.io/discord)\n- [Questions and Answers](https://community.datastax.com/)\n- [Twitch backup](https://www.twitch.tv/datastaxdevs)\n\n----\n\n# 🏁 Start Hands-on\n\n## 4. 
Create your Astra DB instance\n\n_**`ASTRA DB`** is the simplest way to run Cassandra with zero operations at all - just push the button and get your cluster. No credit card required, 40M read/write operations and about 80GB storage monthly for free - sufficient to run small production workloads. If you use up your credits, the databases will pause at no charge._\n\nFollowing the [Database creation guide](https://awesome-astra.github.io/docs/pages/astra/create-instance/#c-procedure), create a database. *Right-click the button* and choose *Open in a new TAB.*\n\n\u003ca href=\"https://astra.dev/yt-01-11-23\"\u003e\u003cimg src=\"images/create_astra_db_button.png?raw=true\" /\u003e\u003c/a\u003e\n\n|Field|Value|\n|---|---|\n|**Database Name**| `workshops`|\n|**Keyspace Name**| `sensor_data`|\n|**Regions**| Select `GOOGLE CLOUD`, then an Area close to you, then a region with no LOCKER 🔒 icon: those are the regions you can use for free.   |\n\n\u003e **ℹ️ Note:** If you already have a database `workshops`, simply add a keyspace `sensor_data` using the `Add Keyspace` button in the bottom right-hand corner of the DB dashboard page.\n\nWhile the database is being created, you will also get a **Security token**:\nsave it somewhere safe, as it will be needed later in other workshops (in particular the string starting with `AstraCS:...`).\n\n\u003e **⚠️ Important**\n\u003e ```\n\u003e The instructor will show the token creation on screen,\n\u003e but will then destroy it immediately for security reasons.\n\u003e ```\n\nThe status will change from `Pending` to `Active` when the database is ready; this will only take 2-3 minutes. You will also receive an email when it is ready.\n\n[🏠 Back to Table of Contents](#-table-of-contents)\n\n## 5. Create tables\n\nOk, now that you have a database created, the next step is to create tables to work with. 
\n\n_General Methodology Notes_: We'll work with a (rather simplified) _Internet of things_ application where we'll be recording temperatures coming from a network of sensors.\n\n- `networks` identified by a unique name represent a region, an area where you find related infrastructure.\n\n#### ✅ Step 5a. Navigate to the CQL Console and login to the database\n\nIn the Summary screen for your database, select **_CQL Console_** from the top menu in the main window. This will take you to the CQL Console and automatically log you in.\n\n\u003cdetails\u003e\n    \u003csummary\u003eShow me! \u003c/summary\u003e\n    \u003cimg src=\"images/astra-cql-console.gif\" /\u003e\n\u003c/details\u003e\n\n\u003e _Note_: if you are working with your own Cassandra cluster (other than Astra DB), you will reach the CQL Console differently.\n\u003e Moreover, in that case you have to manually create the keyspace once in the CQL Console: this is done with a command similar to\n\u003e `CREATE KEYSPACE sensor_data WITH REPLICATION = {'class': 'NetworkTopologyStrategy', 'replication_factor': 3};`.\n\u003e See the Cassandra documentation for more details on this.\n\n#### ✅ Step 5b. Describe keyspaces and USE one of them\n\nOk, now we're ready to rock. Creating tables is quite easy, but before we create one we need to tell the database which keyspace we are working with.\n\nFirst, let's **_DESCRIBE_** all of the keyspaces that are in the database. This will give us a list of the available keyspaces.\n\n📘 **Command to execute**\n```sql\nDESC KEYSPACES;\n```\n_\"desc\" is short for \"describe\", either is valid._\n\n\u003e CQL commands usually end with a semicolon `;`. If you hit Enter and nothing happens -- you don't even get your prompt back -- most likely it's because you have not ended the command with `;`. 
If in trouble, you can always get back to the prompt with `Ctrl-C` and start typing the command anew.\n\n📗 **Expected output**\n\n![Keyspaces in CQL](images/cql/01_desc_keyspaces.png)\n\n\u003e ℹ️ Depending on your setup you might see a different set of keyspaces than in the image. The one we care about for now is **_sensor_data_**. From here, execute the **_USE_** command with the **_sensor_data_** keyspace to tell the database our context is within **_sensor_data_**.\n\n\u003e Take advantage of the TAB-completion in the CQL Console. Try typing `use sens` and then pressing TAB, for example.\n\n📘 **Command to execute**\n```sql\nUSE sensor_data;\n```\n\n📗 **Expected output**\n\n![USE keyspace](images/cql/02_use_sensor_data.png)\n\nNotice how the prompt displays ```\u003cusername\u003e@cqlsh:sensor_data\u003e``` informing us we are **using** the **_sensor_data_** keyspace. Now we are ready to create our tables.\n\n#### ✅ Step 5c. Create the `networks` table\n\nAt this point we can execute a command to create the **networks** table.\nJust copy/paste the following command into your CQL console at the prompt.\nTry to identify the primary key, the partition key and the clustering columns\n(if any) for this table in the command:\n\n📘 **Command to execute**\n\n```sql\nCREATE TABLE IF NOT EXISTS networks (\n  name        TEXT,\n  description TEXT,\n  region      TEXT,\n  PRIMARY KEY ((name))\n);\n```\n\nThen **_DESCRIBE_** your keyspace tables to ensure it is there.\n\n📘 **Command to execute**\n\n```sql\nDESC TABLES;\n```\n📗 **Expected output**\n\n![A table created](images/cql/03_networks_table_created.png)\n\nAaaand **BOOM**, you created a table in your database. That's it.\nNow let's go ahead and create a couple more tables before we do\nsomething interesting with the data.\n\n#### ✅ Step 5d. Create the tables for `sensors` and `temperatures`\n\n- A network will contain several `sensors`. Sensors are uniquely identified by their name, such as `s1001`. 
The design of our application is such that we need to be able to _retrieve all `sensors` for a given `network`, sorted by the sensor name_. \n\n- Next, for each sensor you want to be able to retrieve `temperatures` sorted by descending date.\n\n📘 **Command to execute**\n\n```sql\nCREATE TABLE IF NOT EXISTS sensors_by_network (\n  network         TEXT,\n  sensor          TEXT,\n  latitude        DECIMAL,\n  longitude       DECIMAL,\n  characteristics MAP\u003cTEXT,TEXT\u003e,\n  PRIMARY KEY ((network),sensor)\n);\n\nCREATE TABLE IF NOT EXISTS temperatures_by_sensor_bad (\n  sensor TEXT,\n  timestamp TIMESTAMP,\n  value FLOAT,\n  PRIMARY KEY ((sensor),timestamp)\n) WITH CLUSTERING ORDER BY (timestamp DESC);\n```\n\n- `networks` to `sensors` is a one-to-many relationship, yet there is no integrity constraint. It is up to you, at the application level, to keep the data coherent. \n\n- You should notice that sensors are grouped by network (as the table name states). The partition key `network` places all sensors for a given network in the same partition, meaning a request with `network` in the WHERE clause can be served by a single Cassandra node.\n\n- `sensors` to `temperatures` is also a one-to-many relationship. Every temperature for a sensor will be saved in the same partition.\n\n\u003cp/\u003e\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003eThis table has a major issue... can you guess what it is?\u003c/b\u003e\u003c/summary\u003e\n\u003chr\u003e\nThe SIZE. The more information the sensors capture, the bigger the partitions become. A common rule of thumb puts the upper limit for a partition at 100MB or 100k records. You need to split values across multiple partitions. 
This technique is called \u003ci\u003ebucketing.\u003c/i\u003e\n\u003c/details\u003e\n\u003cp/\u003e\n\n📘 **Command to execute**\n\n```sql\nDROP TABLE  temperatures_by_sensor_bad;\n\nCREATE TABLE temperatures_by_sensor (\n  sensor TEXT,\n  date DATE,\n  timestamp TIMESTAMP,\n  value FLOAT,\n  PRIMARY KEY ((sensor, date),timestamp)\n) WITH CLUSTERING ORDER BY (timestamp DESC);\n```\n\n\u003e ℹ️ _Dropping a table can lead to a timeout in the user interface. Do not worry, it is not harmful: the table is effectively deleted under the hood._\n\n**_DESCRIBE_** your keyspace tables: you should see all three listed.\n\n📘 **Command to execute**\n\n```sql\nDESC TABLES;\n```\n\n📗 **Expected output**\n\n![A table created](images/cql/04_post_tables_created.png)\n\nAnd tables list:\n\n![A table created](images/cql/04_post_tables_created_2.png)\n\nYou may wonder: how did we arrive at this particular structure for the `sensors_by_network` and `temperatures_by_sensor` tables?\n\nThe answer lies in the methodology for data modeling\nwith Cassandra, which, at its very core, states: **first look at the application's needs, determine the required workflows, then map them to a number of queries, finally design a table around each query**.\n\n- We create table `sensors_by_network` to support a query such as _\"get all sensors for a network `X`\"_\n\n- We create table `temperatures_by_sensor` to support a query such as _\"get all temperatures for a sensor `Y` on a given date\"_\n\n[🏠 Back to Table of Contents](#-table-of-contents)\n\n## 6. Execute CRUD operations\n\nCRUD stands for \"**create, read, update, and delete**\". Simply put, they are the basic types of commands you need to work with ANY database in order to maintain data for your applications.\n\n#### ✅ Step 6a. (C)RUD = create = insert data, networks\n\nOur tables are in place so let's put some data in them. This is done with the **INSERT** statement. 
We'll start by inserting 2 rows into the **_networks_** table.\n\nCopy and paste the following in your CQL Console:\n_(Once you have carefully examined the first of the following **INSERT** statements below, you can simply copy/paste the others which are very similar.)_\n\n📘 **Commands to execute**\n\n```sql\nINSERT INTO networks (name,description,region)\nVALUES ('forest-net',\n        'forest fire detection network',\n        'south');\n\nINSERT INTO networks (name,description,region)\nVALUES ('volcano-net',\n        'volcano monitoring network',\n        'north');   \n```\n\n#### ✅ Step 6b. (C)RUD = create = insert data, sensors\n\nLet's run some more **INSERT** statements, this time for **sensors**. We'll insert data into the **_sensors_by_network_** table.\n\n_(Once you have carefully examined the first of the following **INSERT** statements below, you can simply copy/paste the others which are very similar.)_\n\n\u003e _Note_: in the following, we are using `MAP\u003c\u003e`, which lets you define your own key/value mapping, thereby adding a bit of flexibility even though Cassandra data models are strongly typed.\n\n📘 **Commands to execute**\n\n```sql\nINSERT INTO sensors_by_network \n(network,sensor,latitude,longitude,characteristics)\nVALUES ('forest-net','s1001',30.526503,-95.582815,\n       {'accuracy':'medium','sensitivity':'high'});\nINSERT INTO sensors_by_network \n(network,sensor,latitude,longitude,characteristics)\nVALUES ('forest-net','s1002',30.518650,-95.583585,\n       {'accuracy':'medium','sensitivity':'high'});     \nINSERT INTO sensors_by_network \n(network,sensor,latitude,longitude,characteristics)\nVALUES ('forest-net','s1003',30.515056,-95.556225,\n       {'accuracy':'medium','sensitivity':'high'});     \nINSERT INTO sensors_by_network \n(network,sensor,latitude,longitude,characteristics)\nVALUES ('volcano-net','s2001',44.460321,-110.828151,\n       {'accuracy':'high','sensitivity':'medium'});    \nINSERT INTO sensors_by_network 
\n(network,sensor,latitude,longitude,characteristics)\nVALUES ('volcano-net','s2002',44.463195,-110.830124,\n       {'accuracy':'high','sensitivity':'medium'});    \n```\n\nOk, we have a lovely bunch of sensors in our application.\n\nNow let's add temperature measurements in table **_temperatures_by_sensor_** as well! Let's do it with the following command (please note that the `INSERT` statements are similar to the ones seen above, with different columns and table name):\n\n\u003e _Note_: In a relational database you might have used a join on 3 tables `Networks \u003e Sensors \u003e Temperatures`. Cassandra has no joins, so each table stores the columns its query needs in the WHERE clause: here, the sensor name and the date are written with every temperature.\n\n📘 **Commands to execute**\n\n```sql\nINSERT INTO temperatures_by_sensor \n(sensor,date,timestamp,value)\nVALUES ('s1001','2020-07-04','2020-07-04 00:00:01',80);\nINSERT INTO temperatures_by_sensor \n(sensor,date,timestamp,value)\nVALUES ('s1001','2020-07-04','2020-07-04 00:59:59',79);\nINSERT INTO temperatures_by_sensor \n(sensor,date,timestamp,value)\nVALUES ('s1001','2020-07-04','2020-07-04 12:00:01',97);\nINSERT INTO temperatures_by_sensor \n(sensor,date,timestamp,value)\nVALUES ('s1001','2020-07-04','2020-07-04 12:59:59',98);\nINSERT INTO temperatures_by_sensor \n(sensor,date,timestamp,value)\nVALUES ('s1002','2020-07-04','2020-07-04 00:00:01',82);\nINSERT INTO temperatures_by_sensor \n(sensor,date,timestamp,value)\nVALUES ('s1002','2020-07-04','2020-07-04 00:59:59',80);\nINSERT INTO temperatures_by_sensor \n(sensor,date,timestamp,value)\nVALUES ('s1002','2020-07-04','2020-07-04 12:00:01',100);\nINSERT INTO temperatures_by_sensor \n(sensor,date,timestamp,value)\nVALUES ('s1002','2020-07-04','2020-07-04 12:59:59',100);\nINSERT INTO temperatures_by_sensor \n(sensor,date,timestamp,value)\nVALUES ('s1003','2020-07-04','2020-07-04 00:00:01',81);\nINSERT INTO temperatures_by_sensor \n(sensor,date,timestamp,value)\nVALUES 
('s1003','2020-07-04','2020-07-04 00:59:59',80);\nINSERT INTO temperatures_by_sensor \n(sensor,date,timestamp,value)\nVALUES ('s1003','2020-07-04','2020-07-04 12:00:01',99);\nINSERT INTO temperatures_by_sensor \n(sensor,date,timestamp,value)\nVALUES ('s1003','2020-07-04','2020-07-04 12:59:59',98);\nINSERT INTO temperatures_by_sensor \n(sensor,date,timestamp,value)\nVALUES ('s1001','2020-07-05','2020-07-05 00:00:01',81);\nINSERT INTO temperatures_by_sensor \n(sensor,date,timestamp,value)\nVALUES ('s1001','2020-07-05','2020-07-05 00:59:59',80);\nINSERT INTO temperatures_by_sensor \n(sensor,date,timestamp,value)\nVALUES ('s1001','2020-07-05','2020-07-05 12:00:01',98);\nINSERT INTO temperatures_by_sensor \n(sensor,date,timestamp,value)\nVALUES ('s1001','2020-07-05','2020-07-05 12:59:59',99);\nINSERT INTO temperatures_by_sensor \n(sensor,date,timestamp,value)\nVALUES ('s1002','2020-07-05','2020-07-05 00:00:01',82);\nINSERT INTO temperatures_by_sensor \n(sensor,date,timestamp,value)\nVALUES ('s1002','2020-07-05','2020-07-05 00:59:59',82);\nINSERT INTO temperatures_by_sensor \n(sensor,date,timestamp,value)\nVALUES ('s1002','2020-07-05','2020-07-05 12:00:01',100);\nINSERT INTO temperatures_by_sensor \n(sensor,date,timestamp,value)\nVALUES ('s1002','2020-07-05','2020-07-05 12:59:59',99);\nINSERT INTO temperatures_by_sensor \n(sensor,date,timestamp,value)\nVALUES ('s1003','2020-07-05','2020-07-05 00:00:01',83);\nINSERT INTO temperatures_by_sensor \n(sensor,date,timestamp,value)\nVALUES ('s1003','2020-07-05','2020-07-05 00:59:59',82);\nINSERT INTO temperatures_by_sensor \n(sensor,date,timestamp,value)\nVALUES ('s1003','2020-07-05','2020-07-05 12:00:01',101);\nINSERT INTO temperatures_by_sensor \n(sensor,date,timestamp,value)\nVALUES ('s1003','2020-07-05','2020-07-05 12:59:59',102);\nINSERT INTO temperatures_by_sensor \n(sensor,date,timestamp,value)\nVALUES ('s1001','2020-07-06','2020-07-06 00:00:01',90);\nINSERT INTO temperatures_by_sensor 
\n(sensor,date,timestamp,value)\nVALUES ('s1001','2020-07-06','2020-07-06 00:59:59',90);\nINSERT INTO temperatures_by_sensor \n(sensor,date,timestamp,value)\nVALUES ('s1001','2020-07-06','2020-07-06 12:00:01',106);\nINSERT INTO temperatures_by_sensor \n(sensor,date,timestamp,value)\nVALUES ('s1001','2020-07-06','2020-07-06 12:59:59',107);\nINSERT INTO temperatures_by_sensor \n(sensor,date,timestamp,value)\nVALUES ('s1002','2020-07-06','2020-07-06 00:00:01',90);\nINSERT INTO temperatures_by_sensor \n(sensor,date,timestamp,value)\nVALUES ('s1002','2020-07-06','2020-07-06 00:59:59',90);\nINSERT INTO temperatures_by_sensor \n(sensor,date,timestamp,value)\nVALUES ('s1002','2020-07-06','2020-07-06 12:00:01',108);\nINSERT INTO temperatures_by_sensor \n(sensor,date,timestamp,value)\nVALUES ('s1002','2020-07-06','2020-07-06 12:59:59',110);\nINSERT INTO temperatures_by_sensor \n(sensor,date,timestamp,value)\nVALUES ('s1003','2020-07-06','2020-07-06 00:00:01',90);\nINSERT INTO temperatures_by_sensor \n(sensor,date,timestamp,value)\nVALUES ('s1003','2020-07-06','2020-07-06 00:59:59',90);\nINSERT INTO temperatures_by_sensor \n(sensor,date,timestamp,value)\nVALUES ('s1003','2020-07-06','2020-07-06 12:00:01',1315);\nINSERT INTO temperatures_by_sensor \n(sensor,date,timestamp,value)\nVALUES ('s1003','2020-07-06','2020-07-06 12:59:59',1429);\n```\n\n#### ✅ Step 6c. C(R)UD = read = read data\n\nNow that we've inserted a set of rows (two sets, to be precise), let's take a look at how to read the data back out. This is done with a **SELECT** statement. 
In its simplest form we could just execute a statement like the following _**cough**_ _**cough**_:\n\n```sql\nSELECT * FROM networks;\n```\n\n```\n name        | description                   | region\n-------------+-------------------------------+--------\n  forest-net | forest fire detection network |  south\n volcano-net |    volcano monitoring network |  north\n```\n\nor\n\n```sql\nSELECT * FROM sensors_by_network;\n```\n\n📗 **Expected output**\n\n```\ntoken@cqlsh:sensor_data\u003e SELECT * FROM sensors_by_network;\n\n network     | sensor | characteristics                               | latitude  | longitude\n-------------+--------+-----------------------------------------------+-----------+-------------\n  forest-net |  s1001 | {'accuracy': 'medium', 'sensitivity': 'high'} | 30.526503 |  -95.582815\n  forest-net |  s1002 | {'accuracy': 'medium', 'sensitivity': 'high'} | 30.518650 |  -95.583585\n  forest-net |  s1003 | {'accuracy': 'medium', 'sensitivity': 'high'} | 30.515056 |  -95.556225\n volcano-net |  s2001 | {'accuracy': 'high', 'sensitivity': 'medium'} | 44.460321 | -110.828151\n volcano-net |  s2002 | {'accuracy': 'high', 'sensitivity': 'medium'} | 44.463195 | -110.830124\n```\n\nYou may have noticed my coughing fit a moment ago. Even though you can execute a **SELECT** statement with no partition key specified, this is NOT something you should do when using Apache Cassandra. We are doing it here for illustration purposes only and because our whole dataset is just a handful of values.\n\nGiven the data we inserted earlier, a more proper statement would be something like (while we are at it, we also explicitly specify which columns we want back):\n\n```sql\nSELECT sensor, characteristics, latitude, longitude \nFROM sensors_by_network\nWHERE network = 'forest-net';\n```\n\n📗 **Expected output**\n\n```\ntoken@cqlsh:sensor_data\u003e SELECT sensor, characteristics, latitude, longitude\n               ... FROM sensors_by_network\n               ... 
WHERE network = 'forest-net';\n\n sensor | characteristics                               | latitude  | longitude\n--------+-----------------------------------------------+-----------+------------\n  s1001 | {'accuracy': 'medium', 'sensitivity': 'high'} | 30.526503 | -95.582815\n  s1002 | {'accuracy': 'medium', 'sensitivity': 'high'} | 30.518650 | -95.583585\n  s1003 | {'accuracy': 'medium', 'sensitivity': 'high'} | 30.515056 | -95.556225\n ```\n\nThe key is to ensure we are **always selecting by some partition key** at a minimum, so as to avoid the dreaded _full-cluster scans_, whose performance is generally unacceptable in production.\n\nOk, with that out of the way we can **READ** the data from the other table as well - remember we **INSERT**ed on both tables?\n\n📘 **Commands to execute**\n\n```sql\nSELECT * FROM temperatures_by_sensor;\n\nSELECT timestamp, value \nFROM temperatures_by_sensor\nWHERE sensor='s1002' \nAND date='2020-07-05';\n```\n\n(again, in the second **SELECT** we specify some columns - it is something we may want to do in most cases).\n\n\n📗 **Expected output**\n\n```\ntoken@cqlsh:sensor_data\u003e select timestamp, value from temperatures_by_sensor where sensor='s1002' and DATE='2020-07-05';\n\n timestamp                       | value\n---------------------------------+-------\n 2020-07-05 12:59:59.000000+0000 |    99\n 2020-07-05 12:00:01.000000+0000 |   100\n 2020-07-05 00:59:59.000000+0000 |    82\n 2020-07-05 00:00:01.000000+0000 |    82\n```\n\nOnce you execute the above **SELECT** statements you should see something like the expected output above. We have now **READ** the data we **INSERTED** earlier. Awesome job!\n\n📘 **Commands to execute**\n\n```sql\nSELECT * FROM temperatures_by_sensor\nWHERE sensor='s1002';\n```\n\n📗 **Expected output**\n\nThis is a surprise. 
\n\n\u003cp/\u003e\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003e Can you explain the error message?\u003c/b\u003e\u003c/summary\u003e\nAs you did not provide the full partition key (2 columns: `sensor` and `date`), Cassandra would need to perform a full scan of your cluster (a request on every node), so by default it rejects the query with an error.\n\nIt will be bad, it will be ugly, it will be your fault. Always code your applications as if the DBA was a serial killer and he knows your address.\n\u003c/p\u003e\n\u003c/details\u003e\n\u003cp/\u003e\n\n#### ✅ Step 6d. CR(U)D = update = update data\n\nAt this point we've **_CREATED_** and **_READ_** some data, but what happens when you want to change some existing data to some new value? That's where **UPDATE** comes into play.\n\n_The use case is as follows: We notice the sensor was not correctly calibrated and the data needs to be updated._\n\nLet's take one of the records we created earlier and modify it. Recall that we **_INSERTED_** the following record in the **_temperatures_by_sensor_** table.\n\n```sql\n// For reference\nINSERT INTO temperatures_by_sensor \n(sensor,date,timestamp,value)\nVALUES ('s1002','2020-07-05','2020-07-05 00:00:01', 82);\n```\n\n\u003e Let's say that at this particular moment the temperature was not 82 but 92 (Climate change ...).\n\nLooking at ```PRIMARY KEY ((sensor, date), timestamp)```, we know that **sensor**, **date** and **timestamp** together define the uniqueness of the row. 
We'll need all of them to update our record (plus, of course, some of the data columns, otherwise we are not changing anything in that row!).\n\n📘 **Commands to execute**\n\n```sql\nUPDATE temperatures_by_sensor \nSET value = 92\nWHERE sensor = 's1002'\nAND date = '2020-07-05'\nAND timestamp = '2020-07-05 00:00:01';\n\nSELECT *\nFROM temperatures_by_sensor \nWHERE sensor='s1002' \nAND DATE='2020-07-05';\n```\n\n📗 **Expected output**\n\n```\ntoken@cqlsh:sensor_data\u003e select *  from temperatures_by_sensor where sensor='s1002' and DATE='2020-07-05';\n\n sensor | date       | timestamp                       | value\n--------+------------+---------------------------------+-------\n  s1002 | 2020-07-05 | 2020-07-05 12:59:59.000000+0000 |    99\n  s1002 | 2020-07-05 | 2020-07-05 12:00:01.000000+0000 |   100\n  s1002 | 2020-07-05 | 2020-07-05 00:59:59.000000+0000 |    82\n  s1002 | 2020-07-05 | 2020-07-05 00:00:01.000000+0000 |    92\n\n(4 rows)\ntoken@cqlsh:sensor_data\u003e \n```\n\n\u003e *Note*: you could also achieve the same result with another `INSERT` statement,\n\u003e which will simply overwrite the previous values if the full primary key is the same.\n\u003e This is because Cassandra _does not read before writing_, i.e. updates are inserts!\n\n#### ✅ Step 6e. CRU(D) = delete = remove data\n\nThe final operation from our **CRUD** acronym is **DELETE**. This is the operation we use when we want to remove data from the database.\nIn Apache Cassandra you can **DELETE** from the cell level all the way up to the partition\n_(meaning I could remove a single column in a single row or I could remove a whole partition)_ using the same **DELETE** command.\n\n_Generally speaking, it's best to perform as few delete operations as possible on the largest amount of data. Think of it this way: if you want to delete ALL data in a table, don't delete each individual cell, just **TRUNCATE** the table. 
If you need to delete all the rows in a partition, don't delete each row, **DELETE** the partition, and so on._\n\nWhen deleting a row on a given table, we have to specify the values of the primary key for that table. _(And don't forget\nthat, if your data model has the same information stored twice in different tables, it will be up to you to\nissue two different **DELETE** operations!)_\n\n📘 **Commands to execute**\n\n- Partition level delete\n\n```sql\n// Get a partition\nSELECT *  FROM temperatures_by_sensor \nWHERE sensor='s1002' \nAND date='2020-07-05';\n\n// Delete at Partition level\nDELETE FROM temperatures_by_sensor\nWHERE sensor='s1002' \nAND date='2020-07-05';\n\n// Read again\nSELECT *  FROM temperatures_by_sensor \nWHERE sensor='s1002' \nAND date='2020-07-05';\n```\n\n- Row-level delete\n\n```sql\n// Get a partition\nSELECT *  FROM temperatures_by_sensor \nWHERE sensor='s1002' \nAND date='2020-07-04';\n\n// Delete at Row level\nDELETE FROM temperatures_by_sensor\nWHERE sensor='s1002' \nAND date='2020-07-04' \nAND timestamp='2020-07-04 00:00:01.000000+0000';\n\n// Read again\nSELECT *  FROM temperatures_by_sensor \nWHERE sensor='s1002' \nAND date='2020-07-04';\n```\n\n(Notice in the above, for your convenience, we read the table, then delete the rows, then read it again).\n\n📗 **Expected output**\n\n![Deleting in CQL](images/cql/07_deleting.png)\n\nNotice the rows are now removed from the table: it is as simple as that.\n\n#### ✅ Step 6f. Design\n\n```\nWhat is the table we need in order to:\n  - find hourly average temperatures ...\n  - for every sensor ...\n  - in a specified network ...\n  - for a given date range ?\nHow can you do that?\n```\n\nMaybe you select every sensor...\n\n- ...then for every sensor you select the list of temperatures...\n\n- ...but you could run the latter queries in parallel, map-reduce style\n\nMaybe you can query all temperatures and then filter by network...\n\n- ... 
but you will need to add this column network....\n\n- ....\n\n- ....\n\n`STOP IT !!!!`\n\nWith Cassandra, for a new request you create a new table, even if it means duplicating the data. I think you got it `^_^`\n\n\u003cp/\u003e\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003e Can you find what the table looks like?\u003c/b\u003e\u003c/summary\u003e\n\u003chr\u003e\n\u003cp\u003e\n\u003cpre\u003e\nCREATE TABLE temperatures_by_network (\n  network TEXT,\n  week DATE,\n  date_hour TIMESTAMP,\n  sensor TEXT,\n  avg_temperature FLOAT,\n  latitude DECIMAL,\n  longitude DECIMAL,\n  PRIMARY KEY ((network,week),date_hour,sensor)\n) WITH CLUSTERING ORDER BY (date_hour DESC, sensor ASC);\n\u003c/pre\u003e\n\u003c/details\u003e\n\u003cp/\u003e\n\n\n\n## 7. Homework\n\nTo submit the **homework**, please take a screenshot of the CQL Console showing the rows in tables\n`temperatures_by_sensor` and `sensors_by_network` before _and_ after executing the DELETE statements.\n\nYou should also complete two mini-courses (a few minutes each) about using CQL and designing tables:\n- Complete the mini-course [Cassandra Query Language](https://www.datastax.com/learn/cassandra-fundamentals/cql) and take a screenshot of the final screen (\"Congratulations!\" on the left + console output on the right).\n- Complete the mini-course [\"Cassandra Data Modeling / Digital Library\"](https://www.datastax.com/learn/data-modeling-by-example/digital-library-data-model) (link for hands-on at the bottom of the lessons). Take a screenshot of the final screen (\"Congratulations!\" on the left + console output on the right).\n\nDon't forget to [submit your homework](https://dtsx.io/homework-intro-to-cassandra) and be awarded a nice verified badge!\n\n## 8. 
What's NEXT ?\n\nWe've just scratched the surface of what you can do using Astra DB, built on Apache Cassandra.\nGo take a look at [DataStax for Developers](https://www.datastax.com/dev) to see what else is possible.\nThere's plenty to dig into!\n\nCongratulations: you made it to the end of today's workshop.\n\nDon't forget to [submit your homework](https://dtsx.io/homework-intro-to-cassandra) and be awarded a nice verified badge!\n\n![Badge](images/badge/intro-to-cassandra.png)\n\n**... and see you at our next workshop!**\n\n\u003e Sincerely yours, The DataStax Developers\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatastaxdevs%2Fworkshop-cassandra-fundamentals","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatastaxdevs%2Fworkshop-cassandra-fundamentals","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatastaxdevs%2Fworkshop-cassandra-fundamentals/lists"}