{"id":21393140,"url":"https://github.com/silviucpp/erlcass","last_synced_at":"2025-04-06T20:08:09.742Z","repository":{"id":33215180,"uuid":"36858111","full_name":"silviucpp/erlcass","owner":"silviucpp","description":"High-Performance Erlang Cassandra driver based on DataStax cpp-driver","archived":false,"fork":false,"pushed_at":"2025-03-01T21:37:55.000Z","size":606,"stargazers_count":76,"open_issues_count":3,"forks_count":31,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-03-30T19:04:45.172Z","etag":null,"topics":["cassandra","cql","driver","erlang","high-performance"],"latest_commit_sha":null,"homepage":"http://silviucpp.github.io/erlcass","language":"Erlang","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/silviucpp.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2015-06-04T08:30:49.000Z","updated_at":"2025-03-01T21:37:59.000Z","dependencies_parsed_at":"2024-01-12T01:06:20.424Z","dependency_job_id":"2c973281-4d37-42e1-8dcb-e95f2c1fb664","html_url":"https://github.com/silviucpp/erlcass","commit_stats":{"total_commits":354,"total_committers":13,"mean_commits":27.23076923076923,"dds":"0.24293785310734461","last_synced_commit":"d6dbc283ff549bd471767177c34fabf584408269"},"previous_names":[],"tags_count":34,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/silviucpp%2Ferlcass","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/silviucpp%2Ferlcass/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/re
positories/silviucpp%2Ferlcass/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/silviucpp%2Ferlcass/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/silviucpp","download_url":"https://codeload.github.com/silviucpp/erlcass/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247543587,"owners_count":20955865,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cassandra","cql","driver","erlang","high-performance"],"created_at":"2024-11-22T14:09:01.800Z","updated_at":"2025-04-06T20:08:09.722Z","avatar_url":"https://github.com/silviucpp.png","language":"Erlang","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ErlCass\n\n[![Build Status](https://app.travis-ci.com/silviucpp/erlcass.svg?branch=master)](https://travis-ci.com/github/silviucpp/erlcass)\n[![GitHub](https://img.shields.io/github/license/silviucpp/erlcass)](https://github.com/silviucpp/erlcass/blob/master/LICENSE)\n[![Hex.pm](https://img.shields.io/hexpm/v/erlcass)](https://hex.pm/packages/erlcass)\n\n*An Erlang Cassandra driver, based on [DataStax cpp driver][1] focused on performance.*\n\n### Note for v4.0.0\n\n- Starting with `erlcass` version v4.x the native driver is based on Datastax cpp-driver \u003e 2.10.0 which is a massive \nrelease that includes many new features as well as architectural and performance improvements. \n\n- Some cluster configs were removed while other configs were added. 
For more info please see the [Changelog][5].\n- This new version adds support for speculative execution. For certain applications it is of the utmost importance to \nminimize latency. Speculative execution is a way to minimize latency by preemptively executing several instances of \nthe same query against different nodes. The fastest response is then returned to the client application, and the other \nrequests are cancelled. Speculative execution is disabled by default (see `speculative_execution_policy`).\n\n### Update from 2.x to 3.0\n\nThis update breaks compatibility with earlier versions. On success, query results are now returned as:\n- `ok` instead of `{ok, []}` for all DDL and DML queries (because they never return any column or row)\n- `{ok, Columns, Rows}` instead of `{ok, Rows}`, where each row is now returned as a list rather than a tuple.\n\n### Implementation note\n\n#### How ErlCass affects the Erlang schedulers\n\nIt's well known that NIFs can degrade the performance of the Erlang schedulers when a function does not return in less\nthan 1-2 ms and blocks a scheduler thread.\n\nBecause the DataStax cpp driver is async, `ErlCass` won't block the scheduler threads and all calls to the native\nfunctions will return immediately. 
The DataStax driver uses its own thread pool for managing the requests.\nThe responses are also received on these threads and sent back to the calling Erlang processes using `enif_send` in\nan async manner.\n\n#### Features\n\nList of supported features:\n\n- Asynchronous API\n- Synchronous API\n- Simple, Prepared, and Batch statements\n- [Avoid undesired tombstone while null binding][10] (only on protocol 4 or newer).\n- Paged queries\n- Asynchronous I/O, parallel execution, and request pipelining\n- Connection pooling\n- Automatic node discovery\n- Automatic reconnection\n- Configurable load balancing\n- Works with any cluster size\n- Authentication\n- SSL\n- Latency-aware routing\n- Performance metrics\n- Tuples and UDTs\n- Nested collections\n- Retry policies\n- Support for materialized view and secondary index metadata\n- Support for clustering key order, `frozen\u003c\u003e` and Cassandra version metadata\n- Reverse DNS with SSL peer identity verification support\n- Randomized contact points\n- Speculative execution\n\nFeatures of the DataStax driver that are not yet supported are tracked in the [Todo List][9].\n\n#### Benchmark comparing with other drivers\n\nThe benchmark (`benchmarks/benchmark.erl`) spawns N processes that send a total of X requests using the async\nAPIs and then wait to read X responses. In `benchmarks/benchmark.config` you can find the configs for every driver\nused in the tests. If a driver returns an unexpected result during the test, an error is logged to the console.\n\nTo run the benchmark yourself:\n\n- change the cluster IP in `benchmark.config` for all drivers\n- run `make setup_benchmark` (this will compile the app using the bench profile and create the necessary schema)\n- run `make benchmark` as described below\n\nThe following test was run on Ubuntu 16.04 LTS (Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz, 4 cores) and the Cassandra cluster was running on 3 other\nphysical machines in the same LAN. 
The schema is created using `prepare_load_test_table` from `benchmarks/load_test.erl`.\nThe schema contains all possible data types and the query is based on a primary key (it returns the same\nrow every time, which is fine because we are testing the performance of the driver, not of the server).\n\nTo create the schema:\n\n```sh\nmake setup_benchmark\n```\n\nTo run the benchmark:\n\n```sh\nmake benchmark MODULE=erlcass PROCS=100 REQ=100000\n```\n\nWhere:\n\n- `MODULE`: the driver to benchmark. Can be one of: `erlcass` or `marina`\n- `PROCS`: the number of Erlang processes used to send the requests (concurrency level). Default 100.\n- `REQ`: the number of requests to be sent. Default 100000.\n\nResults for 100 concurrent processes sending 100k queries (average time over 3 runs):\n\n| cassandra driver     | Time (ms) | Req/sec  |\n|:--------------------:| ---------:|---------:|\n| [erlcass][8] v4.0.0  | 947       | 105544   |\n| [marina][7] 0.3.5    | 2360      | 42369    |\n\n\n#### Changelog\n\nChangelog is available [here][5].\n\n### Getting started:\n\nThe application is compatible with both `rebar` and `rebar3`.\n\nIf you receive any error related to compiling the DataStax driver, you can try to run `rebar` with `sudo` in\norder to install all dependencies. You can also check the [wiki section][2] for more details.\n\n### Data types\n\nTo see the mapping between Cassandra column types and Erlang types, please check this [wiki section][3].\n\n### Starting the application\n\n```erlang\napplication:start(erlcass).\n```\n\n### Setting the log level\n\n`Erlcass` uses the OTP `logger` for logging errors. Besides setting the desired log level in `logger`,\nyou should also set it in `erlcass` for better performance; otherwise a lot of\nresources are consumed on messages that are going to be dropped anyway. 
Also, the native driver performance can decrease \nbecause of the time spent generating the logs and sending them from C++ to Erlang.\n\nAvailable log levels are:\n\n```erlang\n-define(CASS_LOG_DISABLED, 0).\n-define(CASS_LOG_CRITICAL, 1).\n-define(CASS_LOG_ERROR, 2).\n-define(CASS_LOG_WARN, 3). % default\n-define(CASS_LOG_INFO, 4).\n-define(CASS_LOG_DEBUG, 5).\n-define(CASS_LOG_TRACE, 6).\n```\n\nIn order to change the log level for the native driver you need to set the `log_level` environment variable for\n`erlcass` in your app config file, for example: `{log_level, 3}`.\n\n### Setting the cluster options\n\nThe cluster options can be set inside your `app.config` file under the `cluster_options` key:\n\n```erlang\n{erlcass, [\n    {log_level, 3},\n    {keyspace, \u003c\u003c\"keyspace\"\u003e\u003e},\n    {cluster_options,[\n        {contact_points, \u003c\u003c\"172.17.3.129,172.17.3.130,172.17.3.131\"\u003e\u003e},\n        {latency_aware_routing, true},\n        {token_aware_routing, true},\n        {number_threads_io, 4},\n        {queue_size_io, 128000},\n        {core_connections_host, 1},\n        {tcp_nodelay, true},\n        {tcp_keepalive, {true, 60}},\n        {connect_timeout, 5000},\n        {request_timeout, 5000},\n        {retry_policy, {default, true}},\n        {default_consistency_level, 6}\n    ]}\n]},\n```\n\n### Tips for production environment:\n\n- Use `token_aware_routing` and `latency_aware_routing`\n- Don't set `number_threads_io` higher than the number of your cores.\n- Use `tcp_nodelay` and enable `tcp_keepalive`\n- Don't use large values for `core_connections_host`. 
The driver is system call bound and performs better with fewer I/O threads \nand connections because it can batch a larger number of writes into a single system call (the driver will naturally attempt to coalesce these operations).\nYou may want to reduce the number of I/O threads to 2 or 3 and reduce the core connections to 1 (default).\n\nAll available options are described in the following [wiki section][4].\n\n### Add a prepare statement\n\nExample:\n\n```erlang\nok = erlcass:add_prepare_statement(select_blogpost,\n                                   \u003c\u003c\"select * from blogposts where domain = ? LIMIT 1\"\u003e\u003e).\n```\n\nTo override the default consistency level for that prepared statement, use a tuple for the\nquery argument: `{Query, ConsistencyLevelHere}`\n\nThis is also possible using `{Query, Options}`, where `Options` is a proplist supporting the following options:\n\n- `consistency_level` - If it's missing the statement will be executed using the default consistency level value.\n- `serial_consistency_level` - This consistency can only be either `?CASS_CONSISTENCY_SERIAL` or\n`?CASS_CONSISTENCY_LOCAL_SERIAL` and if not present, it defaults to `?CASS_CONSISTENCY_SERIAL`. This option will be\nignored for anything other than a conditional update/insert.\n- `null_binding` - Boolean (by default `true`). Provides a way to disable the binding of null values. [Binding null values][10] will create undesired tombstones in Cassandra.\n\nExample:\n\n```erlang\nok = erlcass:add_prepare_statement(select_blogpost,\n        {\u003c\u003c\"select * from blogposts where domain = ? LIMIT 1\"\u003e\u003e, ?CASS_CONSISTENCY_LOCAL_QUORUM}).\n```\n\nor\n\n```erlang\nok = erlcass:add_prepare_statement(insert_blogpost, {\n        \u003c\u003c\"UPDATE blogposts SET author = ? WHERE domain = ? 
IF EXISTS\"\u003e\u003e, [\n        {consistency_level, ?CASS_CONSISTENCY_LOCAL_QUORUM},\n        {serial_consistency_level, ?CASS_CONSISTENCY_LOCAL_SERIAL}]\n}).\n```\n\n### Run a prepared statement query\n\nYou can bind the parameters in 2 ways: by name and by index. Pass `?BIND_BY_INDEX` or `?BIND_BY_NAME` to\n`execute/3` in order to specify the desired method. The default is binding by index.\n\nExample:\n\n```erlang\n%bind by name\nerlcass:execute(select_blogpost, ?BIND_BY_NAME, [{\u003c\u003c\"domain\"\u003e\u003e, \u003c\u003c\"Domain_1\"\u003e\u003e}]).\n\n%bind by index (default)\nerlcass:execute(select_blogpost, [\u003c\u003c\"Domain_1\"\u003e\u003e]).\n\n%bind by index (explicit)\nerlcass:execute(select_blogpost, ?BIND_BY_INDEX, [\u003c\u003c\"Domain_1\"\u003e\u003e]).\n```\n\nFor maps you can use `key(field)` and `value(field)` in order to bind by name.\n\n```erlang\n%table: CREATE TABLE test_map(key int PRIMARY KEY, value map\u003ctext,text\u003e)\n%statement: UPDATE examples.test_map SET value[?] = ? 
WHERE key = ?\n\n%bind by index\n\nerlcass:execute(insert_test_bind, [\u003c\u003c\"collection_key_here\"\u003e\u003e, \u003c\u003c\"collection_value_here\"\u003e\u003e, \u003c\u003c\"key_here\"\u003e\u003e]).\n\n%bind by name\n\nerlcass:execute(insert_test_bind, ?BIND_BY_NAME, [\n    {\u003c\u003c\"key(value)\"\u003e\u003e, CollectionIndex1},\n    {\u003c\u003c\"value(value)\"\u003e\u003e, CollectionValue1},\n    {\u003c\u003c\"key\"\u003e\u003e, Key1}\n]).\n```\n\n### Async queries and blocking queries\n\nFor sync operations use `erlcass:execute`; for async execution use `erlcass:async_execute`.\n\nThe sync API blocks the calling process until it gets the result from the cluster (the native code still runs asynchronously in order to avoid freezing the VM threads).\n\nIn case of an async execution the calling process will receive a message of the following format once the data from the server has been retrieved: `{execute_statement_result, Tag, Result}`.\n\nFor example:\n\n```erlang\n{ok, Tag} = erlcass:async_execute(...),\nreceive\n    {execute_statement_result, Tag, Result} -\u003e\n        Result\nend.\n```\n\n### Non prepared statements queries\n\nTo run queries that you don't want to run as prepared statements you can use:\n`query/1`, `query_async/1` or `query_new_statement/1` (the latter creates a query statement that can be executed in a\nbatch query along with other prepared or non-prepared statements).\n\nThe same rules apply for setting the desired consistency level as on prepared statements (see Add prepare statement section).\n\n```erlang\nerlcass:query(\u003c\u003c\"select * from blogposts where domain = 'Domain_1' LIMIT 1\"\u003e\u003e).\n```\n\n### Batched queries\n\nIn order to perform batched statements you can use `erlcass:batch_async_execute/3` or `erlcass:batch_execute/3`.\n\nThe first argument is the batch type and is defined as:\n\n```erlang\n-define(CASS_BATCH_TYPE_LOGGED, 0).\n-define(CASS_BATCH_TYPE_UNLOGGED, 
1).\n-define(CASS_BATCH_TYPE_COUNTER, 2).\n```\n\nThe second argument is a list of statements (prepared or normal statements) that need to be executed in the batch.\n\nThe third argument is a list of options in `{Key, Value}` format (proplist):\n\n- `consistency_level` - If it's missing the batch will be executed using the default consistency level value.\n- `serial_consistency_level` - This consistency can only be either `?CASS_CONSISTENCY_SERIAL` or\n`?CASS_CONSISTENCY_LOCAL_SERIAL` and if not present, it defaults to `?CASS_CONSISTENCY_SERIAL`. This option will be\nignored for anything other than a conditional update/insert.\n\nExample:\n\n```erlang\nok = erlcass:add_prepare_statement(insert_prep, \u003c\u003c\"INSERT INTO table1(id, age, email) VALUES (?, ?, ?)\"\u003e\u003e),\n\n{ok, Stm1} = erlcass:query_new_statement(\u003c\u003c\"UPDATE table2 set foo = 'bar'\"\u003e\u003e),\n\n{ok, Stm2} = erlcass:bind_prepared_statement(insert_prep),\nok = erlcass:bind_prepared_params_by_index(Stm2, [Id2, Age2, Email2]),\n\nok = erlcass:batch_execute(?CASS_BATCH_TYPE_LOGGED, [Stm1, Stm2], [\n    {consistency_level, ?CASS_CONSISTENCY_QUORUM}\n]).\n```\n\n### Paged queries\n\nIn order to perform paged query statements you can use `erlcass:async_execute_paged/2`, `erlcass:async_execute_paged/3` or `erlcass:execute_paged/2`.\n\nThe page size of a statement is set with `erlcass:set_paging_size/2`.\n\nExample:\n\n```erlang\nok = erlcass:add_prepare_statement(paged_query_prep, \u003c\u003c\"SELECT val FROM table1\"\u003e\u003e),\n{ok, Stm} = erlcass:bind_prepared_statement(paged_query_prep),\nPageSize = 3,\nok = erlcass:set_paging_size(Stm, PageSize),\n{ok, Columns, Rows1, HasMore1} = erlcass:execute_paged(Stm, paged_query_prep),\n% Keep fetching rows from the same Stm until HasMore is false\n% In this example, Rows1 contains at most 3 rows [[val1], [val2], [val3]]\n%{ok, Columns, Rows2, HasMore2} = erlcass:execute_paged(Stm, paged_query_prep),\n```\n\n### Working with uuid or timeuuid 
fields:\n\n- `erlcass_uuid:gen_time()`   -\u003e Generates a V1 (time) UUID\n- `erlcass_uuid:gen_random()` -\u003e Generates a new V4 (random) UUID\n- `erlcass_uuid:gen_from_ts(Ts)` -\u003e Generates a V1 (time) UUID for the specified timestamp\n- `erlcass_uuid:min_from_ts(Ts)` -\u003e Sets the UUID to the minimum V1 (time) value for the specified timestamp\n- `erlcass_uuid:max_from_ts(Ts)` -\u003e Sets the UUID to the maximum V1 (time) value for the specified timestamp\n- `erlcass_uuid:get_ts(Uuid)` -\u003e Gets the timestamp for a V1 UUID\n- `erlcass_uuid:get_version(Uuid)` -\u003e Gets the version for a UUID (V1 or V4)\n\n### Working with date, time fields:\n\n- `erlcass_time:date_from_epoch(EpochSecs)` -\u003e Converts a unix timestamp (in seconds) to the Cassandra `date` type.\nThe `date` type represents the number of days since the Epoch (1970-01-01) with the Epoch centered at the value 2^31.\n- `erlcass_time:time_from_epoch(EpochSecs)` -\u003e Converts a unix timestamp (in seconds) to the Cassandra `time` type.\nThe `time` type represents the number of nanoseconds since midnight (range 0 to 86399999999999).\n- `erlcass_time:date_time_to_epoch(Date, Time)` -\u003e Combines the Cassandra `date` and `time` types to Epoch time\nin seconds. 
Negative times are possible if the date occurs before the Epoch (1970-01-01).\n\n### Getting metrics\n\nIn order to get metrics from the native driver you can use `erlcass:get_metrics()`.\n\n##### requests\n\n- `min` - Minimum in microseconds\n- `max` - Maximum in microseconds\n- `mean` - Mean in microseconds\n- `stddev` - Standard deviation in microseconds\n- `median` - Median in microseconds\n- `percentile_75th` - 75th percentile in microseconds\n- `percentile_95th` - 95th percentile in microseconds\n- `percentile_98th` - 98th percentile in microseconds\n- `percentile_99th` - 99th percentile in microseconds\n- `percentile_999th` - 99.9th percentile in microseconds\n- `mean_rate` - Mean rate in requests per second\n- `one_minute_rate` - 1 minute rate in requests per second\n- `five_minute_rate` - 5 minute rate in requests per second\n- `fifteen_minute_rate` - 15 minute rate in requests per second\n\n##### stats\n\n- `total_connections` - The total number of connections\n\n##### errors\n\n- `connection_timeouts` - Occurrences of a connection timeout\n- `pending_request_timeouts` - Occurrences of requests that timed out waiting for a connection\n- `request_timeouts` - Occurrences of requests that timed out waiting for a request to finish\n\n### Low level methods\n\nEach query requires an internal statement (prepared or not). 
You can reuse the same statement object for multiple\nqueries performed in the same process.\n\n##### Getting a statement reference for a prepared statement query\n\n```erlang\n{ok, Statement} = erlcass:bind_prepared_statement(select_blogpost).\n```\n\n##### Getting a statement reference for a non prepared query\n\n```erlang\n{ok, Statement} = erlcass:query_new_statement(\u003c\u003c\"select * from blogposts where domain = 'Domain_1' LIMIT 1\"\u003e\u003e).\n```\n\n##### Bind the values for a prepared statement before executing\n\n```erlang\n%Statement is the reference returned by bind_prepared_statement/1\n\n%bind by name\nok = erlcass:bind_prepared_params_by_name(Statement, [{\u003c\u003c\"domain\"\u003e\u003e, \u003c\u003c\"Domain_1\"\u003e\u003e}]).\n\n%bind by index\nok = erlcass:bind_prepared_params_by_index(Statement, [\u003c\u003c\"Domain_1\"\u003e\u003e]).\n```\n\nFor more details about binding by index and by name please see the 'Run a prepared statement query' section.\n\n[1]:https://github.com/datastax/cpp-driver\n[2]:https://github.com/silviucpp/erlcass/wiki/Getting-started\n[3]:https://github.com/silviucpp/erlcass/wiki/Data-types\n[4]:https://github.com/silviucpp/erlcass/wiki/Available-cluster-options\n[5]:https://github.com/silviucpp/erlcass/blob/master/CHANGELOG.md\n[6]:https://github.com/matehat/cqerl\n[7]:https://github.com/lpgauth/marina\n[8]:https://github.com/silviucpp/erlcass\n[9]:https://github.com/silviucpp/erlcass/wiki/Todo-list\n[10]:https://github.com/silviucpp/erlcass/wiki/Null-bindings-on-prepared-statements-and-undesired-tombstone-creation\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsilviucpp%2Ferlcass","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsilviucpp%2Ferlcass","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsilviucpp%2Ferlcass/lists"}