{"id":20285447,"url":"https://github.com/mramshaw/python_cassandra","last_synced_at":"2026-04-07T20:31:23.932Z","repository":{"id":92905962,"uuid":"160065509","full_name":"mramshaw/Python_Cassandra","owner":"mramshaw","description":"Getting familiar with accessing Cassandra from Python","archived":false,"fork":false,"pushed_at":"2018-12-17T02:29:36.000Z","size":70,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-04T03:44:36.278Z","etag":null,"topics":["cassandra","cassandra-database","cassandra-driver","cql","cqlsh","database","docker","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mramshaw.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-12-02T16:14:22.000Z","updated_at":"2018-12-17T02:29:38.000Z","dependencies_parsed_at":"2023-05-19T10:15:47.905Z","dependency_job_id":null,"html_url":"https://github.com/mramshaw/Python_Cassandra","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/mramshaw/Python_Cassandra","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mramshaw%2FPython_Cassandra","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mramshaw%2FPython_Cassandra/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mramshaw%2FPython_Cassandra/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mramshaw%2FPython_
Cassandra/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mramshaw","download_url":"https://codeload.github.com/mramshaw/Python_Cassandra/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mramshaw%2FPython_Cassandra/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31528267,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-07T16:28:08.000Z","status":"ssl_error","status_checked_at":"2026-04-07T16:28:06.951Z","response_time":105,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cassandra","cassandra-database","cassandra-driver","cql","cqlsh","database","docker","python"],"created_at":"2024-11-14T14:26:44.799Z","updated_at":"2026-04-07T20:31:23.908Z","avatar_url":"https://github.com/mramshaw.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Cassandra with Python\n\n![Cassandra](images/Cassandra.png)\n\nCassandra is a [NoSQL](http://en.wikipedia.org/wiki/NoSQL) database that originated at Facebook.\n\nCassandra is optimized for fast writes and fast reads over very large volumes of data.\n\nIn contrast with traditional databases that journal database changes and then write them to disk,\nCassandra journals database changes and then writes them to a ___write-back cache___ 
(also known\nas a ___write-behind cache___) - and only writes the cache to disk once the cache fills.\n\n```\n    Journal --\u003e Cache --\u003e Disk \n```\n\nThe Cassandra terms for these are the __commit log__, __Memtables__ and __SS Tables__ [which\nstands for ___Sorted String Tables___; these are sorted in row order and are immutable].\nThe database write is successful and returns once the data is written to the __Memtable__.\nHow this data gets written to disk and propagated then depends on the ___replication policy___\n(we will use simple replication).\n\nAs __SS Tables__ are immutable, deletes are handled via a logical delete indicator, which\nis referred to as a ___Tombstone___ in Cassandra. Compaction is used to remove logically\ndeleted records [the uncompacted original SS Table continues to exist until the JVM runs\n GC (garbage collection)].\n\nBy design, there is no single point of failure.\n\nIn terms of the [CAP or Brewer's theorem](http://en.wikipedia.org/wiki/Cap_theorem), Cassandra is an ___eventually-consistent___\ndatabase. This means that replicas of a row may have different versions of the data - but only for brief periods. 
The replicas\nwill __eventually__ be synchronized and become consistent (hence the term).\n\n![CAP and Cassandra](images/CAP_Cassandra.png)\n\n[This is a slight over-simplification, as Cassandra can be extensively tuned for performance/consistency.]\n\n## Motivation\n\nFamiliarization with `Cassandra` and `cql` with Python, using the [Datastax driver](http://datastax.github.io/python-driver/index.html).\n\nThis exercise follows on from my [Replicated Cassandra Database](http://github.com/mramshaw/Kubernetes/tree/master/Replicated%20Cassandra%20Database) exercise.\n\n## Contents\n\nThe contents are as follows:\n\n* [Prerequisites](#prerequisites)\n* [Cassandra driver](#cassandra-driver)\n    * [Installation](#installation)\n    * [Verification](#verification)\n    * [Compression](#compression)\n    * [Metrics](#metrics)\n    * [Performance](#performance)\n* [Running Cassandra](#running-cassandra)\n    * [Run Cassandra with Docker](#run-cassandra-with-docker)\n    * [Run Cassandra with Python](#run-cassandra-with-python)\n* [Reference](#reference)\n* [Versions](#versions)\n* [To Do](#to-do)\n* [Credits](#credits)\n\n## Prerequisites\n\n* Python installed\n\n* `pip` installed\n\n## Cassandra driver\n\nThe installation of the Cassandra driver (for Python) is slightly involved.\n\nThere are also optional components (including non-Python components).\n\n#### Installation\n\nInstall the Cassandra driver as follows:\n\n    $ pip install --user cassandra-driver\n\nOr else:\n\n    $ pip install --user -r requirements.txt\n\n[This will also install some optional components, as discussed below.]\n\n#### Verification\n\nVerify installation as follows:\n\n```bash\n$ python -c 'import cassandra; print cassandra.__version__'\n3.16.0\n$\n```\n\nOr:\n\n```bash\n$ pip list --format=freeze | grep cassandra-driver\ncassandra-driver==3.16.0\n$\n```\n\n#### Compression\n\nOptionally, install `lz4` (gets installed with `cassandra-driver` if using `requirements.txt`):\n\n    $ pip install 
--user lz4\n\nVerify installation as follows:\n\n```bash\n$ python -c 'import lz4; print lz4.__version__'\n2.1.2\n$\n```\n\nOr:\n\n```bash\n$ pip list --format=freeze | grep lz4\nlz4==2.1.2\n$\n```\n\n#### Metrics\n\nOptionally, install `scales` (gets installed with `cassandra-driver` if using `requirements.txt`):\n\n    $ pip install --user scales\n\nThe driver has built-in support for capturing metrics about the queries run (exposed as `Cluster.metrics`). The `scales` library is required to support this.\n\n#### Performance\n\nOptionally, install `libev` for better performance.\n\nVerify the presence (or - as below - absence) of `libev` as follows:\n\n```bash\n$ python -c 'from cassandra.io.libevreactor import LibevConnection'\nTraceback (most recent call last):\n  File \"\u003cstring\u003e\", line 1, in \u003cmodule\u003e\n  File \"/home/owner/.local/lib/python2.7/site-packages/cassandra/io/libevreactor.py\", line 33, in \u003cmodule\u003e\n    \"The C extension needed to use libev was not found.  This \"\nImportError: The C extension needed to use libev was not found.  This probably means that you didn't have the required build dependencies when installing the driver.  
See http://datastax.github.io/python-driver/installation.html#c-extensions for instructions on installing build dependencies and building the C extension.\n$\n```\n\nInstallation instructions are here:\n\n    http://datastax.github.io/python-driver/installation.html#libev-support\n\n[We will not be installing `libev`.]\n\n## Running Cassandra\n\nWe will test everything first with `Docker` and `cqlsh` and then we will use Python code to access our running Cassandra.\n\nTo make things clearer, pull the latest tagged `Cassandra` image, as follows:\n\n    $ docker pull cassandra:3.11.3\n\n[The current version is `3.11.3` as of this writing, but may change over time.]\n\n#### Run Cassandra with Docker\n\n[We will use Docker linking to expose Cassandra.]\n\nRun Cassandra as follows:\n\n    $ docker run --name python-cassandra cassandra:3.11.3\n\n[We could run this detached with the `-d` option, but then we would have to tail the log with `docker logs python-cassandra`.\n As it is, the log will be produced in this console, allowing us to watch both consoles at the same time.]\n\nIn another console, set up a current directory environment variable as follows:\n\n    $ export PWD=`pwd`\n\nRun `cqlsh` as follows:\n\n    $ docker run -it --link python-cassandra:cassandra --rm -v $PWD/cql:/cql cassandra:3.11.3 cqlsh cassandra -f /cql/users.cql\n\nIt should look more or less as follows:\n\n```bash\n$ docker run -it --link python-cassandra:cassandra --rm -v $PWD/cql:/cql cassandra:3.11.3 cqlsh cassandra -f /cql/users.cql\n\nCREATE TABLE k8s_test.users (\n    username text PRIMARY KEY,\n    password text\n) WITH bloom_filter_fp_chance = 0.01\n    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}\n    AND comment = ''\n    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}\n    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}\n   
 AND crc_check_chance = 1.0\n    AND dclocal_read_repair_chance = 0.1\n    AND default_time_to_live = 0\n    AND gc_grace_seconds = 864000\n    AND max_index_interval = 2048\n    AND memtable_flush_period_in_ms = 0\n    AND min_index_interval = 128\n    AND read_repair_chance = 0.0\n    AND speculative_retry = '99PERCENTILE';\n\n\n username | password\n----------+----------\n    Jesse |   secret\n    Frank | password\n\n(2 rows)\n$\n```\n\n[Note that Cassandra has defaulted a lot of the table values for us. Here the default Compaction Strategy is\n __Size-Tiered__, which seems appropriate for the current use case - where the records will be written once.]\n\nIn the event it looks as follows, Cassandra probably has not fully started (and it may be necessary to retry):\n\n```bash\n$ docker run -it --link python-cassandra:cassandra --rm -v $PWD/cql:/cql cassandra:3.11.3 cqlsh cassandra -f /cql/users.cql\nConnection error: ('Unable to connect to any servers', {'172.17.0.2': error(111, \"Tried connecting to [('172.17.0.2', 9042)]. Last error: Connection refused\")})\n$\n```\n\nNow we can kill Cassandra in the original console with Ctrl-C. Once it has stopped, remove `python-cassandra`:\n\n    $ docker rm python-cassandra\n\nClean up the data volumes as follows:\n\n    $ docker volume prune\n\n#### Run Cassandra with Python\n\n[We will use Docker port-mapping to expose Cassandra; port 9042 must be available on the local machine.]\n\nRun Cassandra as follows:\n\n    $ docker run --name python-cassandra -p 9042:9042 cassandra:3.11.3\n\nIn another console, set up a current directory environment variable as follows:\n\n    $ export PWD=`pwd`\n\nRun `cqlsh` to set up our keyspace and table as follows:\n\n    $ docker run -it --link python-cassandra:cassandra --rm -v $PWD/cql:/cql cassandra:3.11.3 cqlsh cassandra -f /cql/users.cql\n\n[This will leave our table empty.]\n\nRun command \u003ckbd\u003epython add_users.py\u003c/kbd\u003e to add some users. 
This should look like:\n\n```bash\n$ python add_users.py\n2018-12-16 21:19:34,667 [INFO] cassandra.policies: Using datacenter 'datacenter1' for DCAwareRoundRobinPolicy (via host '127.0.0.1'); if incorrect, please specify a local_dc to the constructor, or limit contact points to local cluster nodes\n2018-12-16 21:19:34,721 [INFO] root: Created user: user_0\n2018-12-16 21:19:34,723 [INFO] root: Created user: user_1\n2018-12-16 21:19:34,725 [INFO] root: Created user: user_2\n2018-12-16 21:19:34,727 [INFO] root: Created user: user_3\n2018-12-16 21:19:34,728 [INFO] root: Created user: user_4\n2018-12-16 21:19:34,730 [INFO] root: Created user: user_5\n2018-12-16 21:19:34,731 [INFO] root: Created user: user_6\n2018-12-16 21:19:34,732 [INFO] root: Created user: user_7\n2018-12-16 21:19:34,733 [INFO] root: Created user: user_8\n2018-12-16 21:19:34,734 [INFO] root: Created user: user_9\n2018-12-16 21:19:34,734 [INFO] root: 10 users added\n$\n```\n\nRun command \u003ckbd\u003epython list_users.py\u003c/kbd\u003e to list some users. This should look like:\n\n```bash\n$ python list_users.py\n2018-12-16 21:26:35,618 [INFO] cassandra.policies: Using datacenter 'datacenter1' for DCAwareRoundRobinPolicy (via host '127.0.0.1'); if incorrect, please specify a local_dc to the constructor, or limit contact points to local cluster nodes\nRow(username=u'user_7', password=u'password_7')\nRow(username=u'user_6', password=u'password_6')\nRow(username=u'user_1', password=u'password_1')\nRow(username=u'user_2', password=u'password_2')\nRow(username=u'user_4', password=u'password_4')\nRow(username=u'user_9', password=u'password_9')\nRow(username=u'user_3', password=u'password_3')\nRow(username=u'user_8', password=u'password_8')\nRow(username=u'user_5', password=u'password_5')\nRow(username=u'user_0', password=u'password_0')\n2018-12-16 21:26:35,654 [INFO] root: 10 users listed\n$\n```\n\n[Note that the users are listed in fairly random order. 
While the CQL Select statement does have an\n `Order By` clause, it does not have a run-time component and merely affects how indexes are read.]\n\nNow kill Cassandra in the original console with Ctrl-C. Once it has stopped, remove `python-cassandra`:\n\n    $ docker rm python-cassandra\n\nFinally, clean up the data volumes as follows:\n\n    $ docker volume prune\n\n## Reference\n\nFor the details of using Cassandra with Docker:\n\n    http://hub.docker.com/_/cassandra/\n\nCassandra connection, Session and Cluster parameters (including defaults):\n\n    http://datastax.github.io/python-driver/api/cassandra/cluster.html\n\nMaterialized View Performance Penalty:\n\n    http://www.datastax.com/dev/blog/materialized-view-performance-in-cassandra-3-x\n\n[Materialized views seem to be a way of imposing a finer index on stored data. There is a performance penalty.]\n\n## Versions\n\n* Cassandra __3.11.3__\n* cassandra-driver __3.16.0__\n* lz4 __2.1.2__\n* pip __18.1__\n* python __2.7.12__\n* scales __1.0.9__\n\n## To Do\n\n- [x] Write Python code\n- [x] Replace print statements with logging\n- [ ] Investigate Cassandra Metrics with Python\n- [ ] More testing\n\n## Credits\n\nThere are many fine resources for learning Cassandra. 
The place to start is:\n\n    http://datastax.github.io/python-driver/getting_started.html\n\n[Well worth careful study for the sections on\n [type conversion](http://datastax.github.io/python-driver/getting_started.html#type-conversions),\n [consistency level](http://datastax.github.io/python-driver/getting_started.html#setting-a-consistency-level)\nand\n [prepared statements](http://datastax.github.io/python-driver/getting_started.html#id2).\n]\n\nAlso:\n\n    http://datastax.github.io/python-driver/installation.html\n\n[For the intricacies of installing the Python driver.]\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmramshaw%2Fpython_cassandra","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmramshaw%2Fpython_cassandra","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmramshaw%2Fpython_cassandra/lists"}