{"id":16215345,"url":"https://github.com/smiklosovic/cassandra-bulkloader","last_synced_at":"2026-01-21T01:02:45.042Z","repository":{"id":43652706,"uuid":"208629482","full_name":"smiklosovic/cassandra-bulkloader","owner":"smiklosovic","description":"CLI tool generating Cassandra SSTables","archived":false,"fork":false,"pushed_at":"2022-02-26T00:46:28.000Z","size":24,"stargazers_count":1,"open_issues_count":3,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-07T19:47:08.490Z","etag":null,"topics":["bulk","cassandra","cli","loader","sstable","sstables"],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/smiklosovic.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-09-15T17:20:56.000Z","updated_at":"2019-10-28T10:27:47.000Z","dependencies_parsed_at":"2022-07-25T22:32:51.288Z","dependency_job_id":null,"html_url":"https://github.com/smiklosovic/cassandra-bulkloader","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/smiklosovic/cassandra-bulkloader","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/smiklosovic%2Fcassandra-bulkloader","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/smiklosovic%2Fcassandra-bulkloader/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/smiklosovic%2Fcassandra-bulkloader/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/smiklosovic%2Fcassandra-bulkloader/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/owners/smiklosovic","download_url":"https://codeload.github.com/smiklosovic/cassandra-bulkloader/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/smiklosovic%2Fcassandra-bulkloader/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28620572,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-20T23:49:58.628Z","status":"ssl_error","status_checked_at":"2026-01-20T23:47:29.996Z","response_time":117,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bulk","cassandra","cli","loader","sstable","sstables"],"created_at":"2024-10-10T11:14:37.826Z","updated_at":"2026-01-21T01:02:45.026Z","avatar_url":"https://github.com/smiklosovic.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# cassandra-bulkloader\nCLI tool generating Cassandra SSTables\n\nThis tool is simply generating SSTables programmatically. It uses Cassandra's `CQLSSTableWriter`. 
\nOnce the SSTables are generated, you can load them with the `sstableloader` tool as usual.\n\nThe project consists of three modules:\n\n* api - the interface that `impl` is coded against\n* impl - implementation of your population logic, depends on `api`\n* loader - implementation of the whole loader CLI application, depends on `impl` and `api`\n\n## Build\n\n`mvn clean install`\n\n## Run\n\n```\njava \\\n  -cp /path/to/impl-1.0.jar:/path/to/loader-1.0.jar \\\n  com.instaclustr.cassandra.bulkloader.CLIApplication \\\n  _command_ \\\n  _arguments_\n```\n\nIf no command is given, the default command, `help`, is executed:\n\n```\nUsage: \u003cmain class\u003e [-V] COMMAND\n  -V, --version   print version information and exit\nCommands:\n  csv     tool for bulk-loading of data from csv\n  random  tool for bulk-loading of random data\n```\n\n### `random` command\n```\ntool for bulk-loading of random data\n  -d, --output-dir=[DIRECTORY]\n                            Destination where SSTables will be generated.\n  -k, --keyspace=[KEYSPACE] Keyspace for which SSTables will be generated.\n  -t, --table=[TABLE]       Table for which SSTables will be generated.\n  -s, --schema=[PATH]       Path to CQL schema where CREATE TABLE statement is\n                              specified.\n      --sorted              Whether input data are already sorted (in terms of\n                              CQL)\n      --partitioner=\u003cpartitioner\u003e\n                            Paritioner used for SSTable generation, defaults to\n                              'murmur'\n      --bufferSize=\u003cbufferSize\u003e\n                            How much data will be buffered before being written as\n                              a new SSTable, in megabytes. 
Defaults to 128\n      --numberOfRecords=\u003cnumberOfRecords\u003e\n                            Number of records to generate when using random\n                              command\n      --threads=\u003cthreads\u003e   Number of threads to use for generation.\n  -f, --file=\u003cfile\u003e         file to digest, irrelevant for random loader\n  -h, --help                Show this help message and exit.\n  -V, --version             Print version information and exit.\n```\n\n### `csv` command\n\nThe `csv` command takes the same arguments as `random`, but `--file` is mandatory. The file is expected to be a CSV file in which\neach line represents a row. Each row is parsed into a list of strings and passed to your `RowMapper` implementation, where you\nmap them to the list of objects used as values of the Cassandra INSERT statement.\n\n## Row generation\n\nTo generate data, for example with the `random` command, you have to implement the interface\n`com.instaclustr.cassandra.bulkloader.RowMapper` from the `api` module. This implementation should\nbe placed in the `impl` module.\n\n## RowMapper interface\n\n```\npackage com.instaclustr.cassandra.bulkloader;\n\nimport java.util.List;\n\npublic interface RowMapper {\n\n    /**\n     * Maps list of strings from whatever input representing\n     * a row to list of objects to insert into Cassandra.\n     *\n     * @param row row values as a list of strings\n     * @return list of objects to put to insert statement\n     */\n    List\u003cObject\u003e map(final List\u003cString\u003e row);\n\n    /**\n     * Logically same as {@link #map(List)} but all data per row\n     * needs to be generated inside of the method. The number\n     * of items in the returned list has to match the number of columns\n     * in a row. Each such object represents a value which will be\n     * passed to Cassandra INSERT statement.\n     *\n     * This method is called repeatedly. 
Number of calls\n     * is equal to the `--numberOfRecords` parameter.\n     *\n     * @return list of objects to put to insert statement\n     */\n    List\u003cObject\u003e random();\n\n    /**\n     * @return string representation of INSERT INTO statement. Question marks in VALUES are not\n     * meant to be replaced.\n     * \u003cp\u003e\n     * For example: 'INSERT INTO keyspace.table (\"field1\", \"field2\", ...) VALUES (?, ?, ?)'\n     */\n    String insertStatement();\n}\n\n```\n\n## SPI mechanism\n\nImplementations are discovered through the Java SPI mechanism, so besides implementing the API,\nyou have to edit the `impl/src/main/resources/META-INF/services/com.instaclustr.cassandra.bulkloader.RowMapper`\nfile so that it contains the FQCN of your implementation on one line.\n\nOnce the impl JAR is placed on the class path, it is automatically discovered by the `loader` module, so\nyou do not need any extra command-line arguments. Simply putting that JAR on the class path does the job.\n\nIn practice this means that you only need to recompile the `impl` module, which contains one class, so compiling\nand building the JAR takes only a few seconds (less than 1 second here). The command-line arguments stay\nthe same.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsmiklosovic%2Fcassandra-bulkloader","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsmiklosovic%2Fcassandra-bulkloader","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsmiklosovic%2Fcassandra-bulkloader/lists"}