{"id":21268980,"url":"https://github.com/qubole/spark-acid","last_synced_at":"2025-07-11T05:30:43.218Z","repository":{"id":48871420,"uuid":"198199946","full_name":"qubole/spark-acid","owner":"qubole","description":"ACID Data Source for Apache Spark based on Hive ACID ","archived":false,"fork":false,"pushed_at":"2021-07-07T18:10:26.000Z","size":442,"stargazers_count":96,"open_issues_count":21,"forks_count":36,"subscribers_count":18,"default_branch":"master","last_synced_at":"2024-04-17T22:49:45.565Z","etag":null,"topics":["acid","big-data","hive","hive-acid","spark"],"latest_commit_sha":null,"homepage":"","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/qubole.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-07-22T10:23:01.000Z","updated_at":"2023-10-12T07:28:56.000Z","dependencies_parsed_at":"2022-09-06T21:40:42.511Z","dependency_job_id":null,"html_url":"https://github.com/qubole/spark-acid","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qubole%2Fspark-acid","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qubole%2Fspark-acid/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qubole%2Fspark-acid/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qubole%2Fspark-acid/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/qubole","download_url":"https://codeload.github.com/qubole/spark-acid/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github
.com","kind":"github","repositories_count":225693821,"owners_count":17509227,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["acid","big-data","hive","hive-acid","spark"],"created_at":"2024-11-21T08:06:58.592Z","updated_at":"2024-11-21T08:06:59.209Z","avatar_url":"https://github.com/qubole.png","language":"Scala","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Hive ACID Data Source for Apache Spark\n\nA Datasource on top of Spark Datasource V1 APIs, that provides Spark support for [Hive ACID transactions](https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions).\n\nThis datasource provides the capability to work with Hive ACID V2 tables, both Full ACID tables as well as Insert-Only tables.\n\nfunctionality availability matrix\n\nFunctionality | Full ACID table | Insert Only Table |\n------------- | --------------- | ----------------- |\nREAD | ``\u003e= v0.4.0`` | ``\u003e= v0.4.0`` |\nINSERT INTO / OVERWRITE | ``\u003e= v0.4.3`` |  ``\u003e= v0.4.4`` |\nCTAS | ``\u003e= v0.4.3`` |  ``\u003e= v0.4.4`` |\nUPDATE | ``\u003e= v0.5.0`` |  Not Supported |\nDELETE | ``\u003e= v0.5.0`` |  Not Supported |\nMERGE | ``\u003e v0.5.0`` | Not Supported |\nSTREAMING INSERT | ``\u003e= v0.5.0`` | ``\u003e= v0.5.0`` |\n\n*Note: In case of insert only table for support of write operation compatibility check needs to be disabled*\n\n## Quick Start\n\n- [QuickStart](#quickstart)\n    - [Prerequisite](#prerequisite)\n    - [Config](#config)\n    - [Run](#run)\n- [Usage](#usage)\n    - [Read ACID Table](#read-acid-table)\n    - [Batch Write into ACID 
Table](#batch-write-into-acid-table)\n    - [Stream Write into ACID Table](#stream-write-into-acid-table)\n    - [Update](#updates)\n    - [Delete](#deletes)\n    - [Merge](#merge)\n- [Version Compatibility](#version-compatibility)\n- [Developer Resources](#developer-resources)\n- [Design Consideration](#design-constraints)\n- [Contributing](#contributing)\n- [Report Bugs](#reporting-bugs-or-feature-requests)\n- [Known Issues](#known-issues)\n    \n\n## QuickStart\n\n### Prerequisite\nThese are the prerequisites for using this library:\n\n1. You have Hive Metastore DB with version 3.1.2 or higher. Please refer to [Hive Metastore](https://cwiki.apache.org/confluence/display/Hive/Design#Design-MetastoreArchitecture) for details.\n2. You have a Hive Metastore Server running with version 3.1.1 or higher, as Hive ACID needs a standalone Hive Metastore Server to operate. Please refer to [Hive Configuration](https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Configuration) for configuration options.\n\n### Config\n\nChange the configuration in `$SPARK_HOME/conf/hive-site.xml` to point to an already configured HMS server endpoint. If you meet the above prerequisites, this is probably already configured.\n\n```xml\n\u003cconfiguration\u003e\n  \u003cproperty\u003e\n    \u003cname\u003ehive.metastore.uris\u003c/name\u003e\n    \u003c!-- hostname must point to the Hive metastore URI in your cluster --\u003e\n    \u003cvalue\u003ethrift://hostname:10000\u003c/value\u003e\n    \u003cdescription\u003eURI for spark to contact the hive metastore server\u003c/description\u003e\n  \u003c/property\u003e\n\u003c/configuration\u003e\n```\n\n### Run\n\nThere are a few ways to use the library while running spark-shell:\n\n1. Use the published package\n\n       spark-shell --packages qubole:spark-acid:0.6.0-s_2.11\n\n2. 
If you built the jar yourself, copy the `spark-acid-assembly-0.6.0.jar` jar into `$SPARK_HOME/assembly/target/scala-2.11/jars` and run\n\n       spark-shell\n\n#### DataFrame API\n\nTo operate on a Hive ACID table from Scala / pySpark, the table can be accessed directly using this datasource. Note that the short name of this datasource is `HiveAcid`. Hive ACID tables are tables in the Hive Metastore, so any read or write operation needs `format(\"HiveAcid\").option(\"table\", \"\u003ctable name\u003e\")`. _Direct read and write from the file is not supported_\n\n    scala\u003e val df = spark.read.format(\"HiveAcid\").options(Map(\"table\" -\u003e \"default.acidtbl\")).load()\n    scala\u003e df.collect()\n\n#### SQL\n\nTo read an existing Hive ACID table through pure SQL, there are two ways:\n\n1. Use the SparkSession extensions framework to add a new Analyzer rule (HiveAcidAutoConvert) to the Spark Analyzer. This analyzer rule automatically converts a _HiveTableRelation_ representing an ACID table to a _LogicalRelation_ backed by HiveAcidRelation.\n   \n   \tTo use this, initialize SparkSession with the extension builder as shown below:\n   \n            val spark = SparkSession.builder()\n              .appName(\"Hive-acid-test\")\n              .config(\"spark.sql.extensions\", \"com.qubole.spark.hiveacid.HiveAcidAutoConvertExtension\")\n              .enableHiveSupport()\n              .\u003cOTHER OPTIONS\u003e\n              .getOrCreate()\n   \n            spark.sql(\"select * from default.acidtbl\")\n\n2. Create a dummy table that acts as a symlink to the original ACID table. 
This symlink is required to instruct Spark to use this datasource against an existing table.\n\n\tTo create the symlink table:\n\n\t\tspark.sql(\"create table symlinkacidtable using HiveAcid options ('table' 'default.acidtbl')\")\n\n        spark.sql(\"select * from symlinkacidtable\")\n\n\n   _Note: This will produce a warning indicating that Hive does not understand this format_\n\n     WARN hive.HiveExternalCatalog: Couldn’t find corresponding Hive SerDe for data source provider com.qubole.spark.hiveacid.datasource.HiveAcidDataSource. Persisting data source table `default`.`sparkacidtbl` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.\n\n   _Please ignore it, as this is a sym table for Spark to operate with and no underlying storage._\n\n## Usage\n\nThis section talks about major functionality provided by the data source and example code snippets for them.\n\n### Create/Drop ACID Table\n#### SQL Syntax\nSame as CREATE and DROP supported by Spark SQL.\n\n##### Examples\n\nDrop Existing table\n\n\tspark.sql(\"DROP TABLE if exists acid.acidtbl\")\n\nCreate table\n\n\tspark.sql(\"CREATE TABLE acid.acidtbl (status BOOLEAN, tweet ARRAY\u003cSTRING\u003e, rank DOUBLE, username STRING) STORED AS ORC TBLPROPERTIES('TRANSACTIONAL' = 'true')\")\n\n_Note: Table property ``'TRANSACTIONAL' = 'true'`` is required to create ACID table_\n\nCheck if it is transactional\n\n\tspark.sql(\"DESCRIBE extended acid.acidtbl\").show()\n\n### Read ACID Table\n\n#### DataFrame API\nRead acid table\n\n\tval df = spark.read.format(\"HiveAcid\").options(Map(\"table\" -\u003e \"acid.acidtbl\")).load()\n\tdf.select(\"status\", \"rank\").filter($\"rank\" \u003e \"20\").show()\n\nRead acid via implicit API\n    \n    import com.qubole.spark.hiveacid._\n    \n    val df = spark.read.hiveacid(\"acid.acidtbl\")\n    df.select(\"status\", \"rank\").filter($\"rank\" \u003e \"20\").show()\n\n#### SQL SYNTAX\nSame as SELECT supported by Spark SQL.\n\n##### Example\n\n 
   spark.sql(\"SELECT status, rank from acid.acidtbl where rank \u003e 20\")\n\n_Note: ``com.qubole.spark.hiveacid.HiveAcidAutoConvertExtension`` has to be added to ``spark.sql.extensions`` for above._\n\n### Batch Write into ACID Table\n\n#### DataFrame API\n\nInsert into\n\n\tval df = spark.read.parquet(\"tbldata.parquet\")\n\tdf.write.format(\"HiveAcid\").option(\"table\", \"acid.acidtbl\").mode(\"append\").save()\n\nInsert overwrite\n\n\tval df = spark.read.parquet(\"tbldata.parquet\")\n\tdf.write.format(\"HiveAcid\").option(\"table\", \"acid.acidtbl\").mode(\"overwrite\").save()\n\nInsert into using implicit API\n\n    import com.qubole.spark.hiveacid._\n    \n    val df = spark.read.parquet(\"tbldata.parquet\")\n    df.write.hiveacid(\"acid.acidtbl\", \"append\")\n\n#### SQL Syntax\nSame as INSERT supported by Spark SQL\n\n##### Example\nInsert into the table select as\n\n\tspark.sql(\"INSERT INTO acid.acidtbl select * from sample_data\")\n\nInsert overwrite the table select as\n\n\tspark.sql(\"INSERT OVERWRITE TABLE acid.acidtbl select * from sample_data\")\n\nInsert into with values\n\n\tspark.sql(\"INSERT INTO acid.acidtbl VALUES(false, array('test'), 11.2, 'qubole')\")\n\n_Note: ``com.qubole.spark.hiveacid.HiveAcidAutoConvertExtension`` has to be added to ``spark.sql.extensions`` for above SQL statements._\n\n### Stream Write into ACID Table\nACID tables support streaming writes and can also be used as a Streaming Sink. \nStreaming writes happen under transactional guarantees, which allows other\nconcurrent writes to the same table, either streaming writes or batch writes.\nFor exactly-once semantics, ``spark.acid.streaming.log.metadataDir`` is specified to\nstore the latest batchId processed. 
Note that concurrent streaming writes \nto the same table must each specify a different metadataDir.\n\n    val query = newDf\n      .writeStream\n      .format(\"HiveAcid\")\n      .options(Map(\n        \"table\" -\u003e \"acid.acidtbl\",\n        \"spark.acid.streaming.log.metadataDir\" -\u003e \"/tmp/metadataDir\"))\n      .outputMode(OutputMode.Append)\n      .option(\"checkpointLocation\", \"/tmp/checkpointDir\")\n      .start()\n\n### Updates\n\n#### SQL Syntax\n\n    UPDATE tablename SET column = updateExp [, column = updateExp ...] [WHERE expression]\n    \n* ``column`` must be a column of the table being updated.\n* ``updateExp`` is an expression that Spark supports in the SELECT clause. Subqueries are not supported.\n* ``WHERE`` clause specifies the rows to be updated.\n* Partitioning columns cannot be updated.\n* Bucketed tables are not supported currently.\n\n#### Example\n    \n    spark.sql(\"UPDATE acid.acidtbl set rank = rank - 1, status = true where rank \u003e 20 and rank \u003c 25 and status = false\")\n\n_Note: ``com.qubole.spark.hiveacid.HiveAcidAutoConvertExtension`` has to be added to ``spark.sql.extensions`` for above._\n\n### Deletes\n\n#### SQL syntax\n    DELETE FROM tablename [WHERE expression]\n\n* ``WHERE`` clause specifies rows to be deleted from ``tablename``.\n* Bucketed tables are not supported currently.\n\n##### Example\n\n    DELETE from acid.acidtbl where rank = 1000\n\n_Note: ``com.qubole.spark.hiveacid.HiveAcidAutoConvertExtension`` has to be added to ``spark.sql.extensions`` for above._\n\n### Merge\n\n#### SQL syntax\n    MERGE INTO \u003ctarget table\u003e [AS T] USING \u003csource table\u003e [AS S]\n    ON \u003cboolean merge expression\u003e\n    WHEN MATCHED [AND \u003cboolean expression1\u003e] THEN \u003cmatch_clause\u003e\n    WHEN MATCHED [AND \u003cboolean expression2\u003e] THEN \u003cmatch_clause\u003e\n    WHEN NOT MATCHED [AND \u003cboolean expression3\u003e] THEN INSERT VALUES ( \u003cinsert value list\u003e 
)\n\n    \u003cmatch_clause\u003e ::\n                        UPDATE SET \u003cset clause list\u003e\n                        DELETE\n    \n    \u003cinsert value list\u003e :: \n                        value 1 [, value 2, value 3, ...]\n                        [value 1, value 2, ...] * [, value n, value n+1, ...]\n    \n    \u003cset clause list\u003e ::\n                        target_col1 = value 1 [, target_col2 = value 2 ...]\n                    \n\n* ``\u003ctarget table\u003e`` needs to be a Full ACID table. ``T`` is an optional placeholder for the target alias. \n* ``\u003csource table\u003e`` needs to be a defined table. You can use functions like ``createOrReplaceTempView`` to store source DataFrames as tables.\n``S`` is an optional placeholder for the source alias.\n* ``\u003cboolean merge expression\u003e`` is the join expression used as the merge condition.\n* Match clauses (UPDATE and DELETE)\n    * At most 2 match clauses are allowed i.e., minimum 0 and maximum 2.\n    * Only UPDATE and DELETE operations are supported in a match clause.\n    * ``\u003cboolean expression1\u003e`` and ``\u003cboolean expression2\u003e`` are optional match conditions. 
\n    * If 2 match clauses are specified:\n        * Both should be different operations.\n        * First match clause should have a match condition.\n        * If a target row qualifies for both match clauses because their match conditions overlap, \n          then only the first clause will be executed on it.\n* INSERT clause \n    * is only supported for the non-matched clause.\n    * supports ``*`` anywhere in the value list, where it resolves into the source table columns.\n    * values to be inserted should exactly match the number of target columns after ``*`` resolution and \n      also match the corresponding data types.\n* Cardinality Check: SQL standard enforces that one row of target doesn't match multiple rows of source.\n  This check is enforced and a runtime exception is thrown if it is violated.\n\n#### Example\n\n    MERGE INTO target as t USING source as s\n    ON t.id = s.id\n    WHEN MATCHED AND t.city = 'Bangalore' THEN UPDATE SET t.city = s.city\n    WHEN MATCHED AND t.dept = 'closed' THEN DELETE\n    WHEN NOT MATCHED AND s.city IN ('Bangalore', 'San Jose') THEN INSERT VALUES (*, '07', '2020')\n\n#### Performance consideration\n* MERGE is a heavyweight statement and can be expensive to run.\n* MERGE operation will perform ``Right Outer Join`` between target and source. \n  This can lead to a full table scan of the target table. \n  Please consider partitioning the target table and mentioning only the required partitions in the merge condition for MERGE operations.\n* When only the INSERT clause is present, ``Left Anti Join`` between source and target will be performed. It will be \n  cheaper than ``Right Outer Join`` between target and source.\n* Cardinality check (as described above) also requires a Join. We reuse the same ``Right Outer Join`` done for the MERGE operation \n  to avoid an extra Join for this check. 
When only the INSERT clause is present, this check is skipped because it is not required.\n\n## Version Compatibility\n\n### Compatibility with Apache Spark Versions\n\nACID datasource has been tested to work with Apache Spark 2.4.3, but it should work with older versions as well. However, because of a Hive dependency, this datasource needs Hadoop version 2.8.2 or higher due to [HADOOP-14683](https://jira.apache.org/jira/browse/HADOOP-14683)\n\n_NB: Hive ACID V2 is supported in Hive 3.0.0 onwards and for that hive Metastore db needs to be [upgraded](https://cwiki.apache.org/confluence/display/Hive/Hive+Schema+Tool) to 3.0.0 or above._\n\n### Data Storage Compatibility\n\n1. ACID datasource does not control data storage format and layout, which is managed by Hive. It works with data written by Hive version 3.0.0 and above. Please see [Hive ACID storage layout](https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-BasicDesign).\n\n2. ACID datasource works with data stored on local files, HDFS as well as cloud blobstores (AWS S3, Azure Blob Storage etc).\n\n## Developer resources\n### Build\n\n1. First, build the dependencies and publish them locally. The *shaded-dependencies* sub-project is an sbt project that creates the shaded hive metastore and hive exec jars combined into a fat jar `spark-acid-shaded-dependencies`. This is required because Hive ACID depends on Hive 3, while Spark currently only supports Hive 1.2. To compile and publish the shaded dependencies jar:\n\n    cd shaded-dependencies\n    sbt clean publishLocal\n\n2. Next, build the main project:\n\n\tsbt assembly\n\nThis will create the `spark-acid-assembly-0.6.0.jar`, which can now be used in your application.\n\n### Test\nTests are run against a standalone docker setup. Please refer to [Docker setup](docker/README.md) to build and start a container.\n\n_NB: The container runs HMS server, HS2 Server and HDFS, which listen on ports 10000, 10001 and 9000 respectively. 
So stop any HMS or HDFS instance running on the same ports on the host machine._\n\nTo run the full integration test:\n\n    sbt test\n\n\n### Release\n\nTo release a new version use\n\n    sbt release\n\nTo publish a new version use\n\n    sbt spPublish\n\nRead more about [sbt release](https://github.com/sbt/sbt-release)\n\n\n### Design Constraints\n\n1. When this datasource needs to read data, it talks to the Hive Metastore Server to get the list of committed transactions and, from that, the list of files it should read from the filesystem (_uses s3 listing_). Because this snapshot of the file list is built by listing, a consistency layer such as S3Guard should be used on cloud object stores like S3 to avoid reading an inconsistent copy of the data.\n\n2. The snapshot of the file list is created at the RDD level, so even a single SQL statement that uses the same table twice may operate on two different snapshots:\n\n    spark.sql(\"select * from a join a\")\n\n3. The files in the snapshot need to be protected for as long as the RDD is in use. By design, concurrent reads and writes on Hive ACID tables work with the help of locks, where every client (across multiple engines) operating on ACID tables is expected to acquire locks for the duration of its reads and writes. Because the lifetime of an RDD can be very long, this datasource _DOES NOT_ acquire a lock, to avoid blocking other operations like inserts, and instead uses an alternative mechanism to protect reads. The other way to protect the snapshot is to make sure its files are not deleted while in use. For the current datasource, `Automatic Compaction` should be disabled on any table on which Spark is operating. This makes sure that the cleaner does not clean any file. To disable automatic compaction on a table:\n\n         ALTER TABLE \u003c\u003e SET TBLPROPERTIES (\"NO_AUTO_COMPACTION\"=\"true\")\n\n\tWhen the table is not in use, the cleaner can be enabled and all the files that need to be cleaned will get queued up for the cleaner. 
Disabling compaction does have performance implications for reads/writes, as a lot of delta files may need to be merged when performing a read.\n\n4. Note that even though reads are protected, admin operations like `TRUNCATE`, `ALTER TABLE DROP COLUMN` and `DROP` have no protection, as they clean files without intervention from the cleaner. These operations should be performed when Spark is not using the table.\n\n## Contributing\n\n1. You can join the group for queries and discussions by sending email to: spark-acid+subscribe@googlegroups.com\n   On subscribing, you will be sent an email to confirm joining the group.\n\n2. We use [Github Issues](https://github.com/qubole/spark-acid/issues) to track issues.\n   Please feel free to open an issue for any queries, bugs and feature requests.\n\n3. Pull Requests can be raised against any open issue and are most welcome. \n   There is currently no formal process or set of guidelines for them.\n\n## Reporting bugs or feature requests\n\nPlease use the github issues for the acid-ds project to report issues or raise feature requests. \nYou can also join this group to discuss them: spark-acid+subscribe@googlegroups.com\n\n## Known Issues\n\n1. Insert into static partitions is not supported via spark acid. For example, a query like \"insert into tbl partition (p1=1) ....\" will not work. This is because spark currently does not support partitioned datasources. It only supports partitions in a Hive table relation or a file based relation, and the spark acid relation is neither of them.\n2. 
Because of an open source issue [HIVE-21052](https://issues.apache.org/jira/browse/HIVE-21052), users started hitting the issue described by [@amoghmargoor](https://github.com/amoghmargoor) in [this](https://issues.apache.org/jira/browse/HIVE-21052?focusedCommentId=17152785\u0026page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17152785) comment for **partitioned** tables.\nThe workaround of the issue HIVE-21052 is that we don't set dynamic partition flag when making lock request in spark acid. HIVE-21052 will **not** have any impact on the functionality and it is expected to continue to work well. However users will have to do some manual cleanups. They are not serious in nature and can be performed once in a while. These are:\n\n- If transaction is successful\n   - For every transaction on partitioned table, now a null entry gets created in COMPLETED_TXN_COMPONENTS table when a transaction is moved from TXN_COMPONENTS. This entry does not get removed after compaction. Note that there would be only one such null entry in COMPLETED_TXN_COMPONENTS table for a transaction. For example if your transaction has touched 100 partitions only one entry of null partitions would get created. So the null entries should not overwhelm the table.\n   - To delete this null entry from COMPLETED_TXN_COMPONENTS table, manually run this query once in a while on metastore db. Note that this is only applicable on partitioned tables\n   \n       ``DELETE FROM completed_txn_components WHERE ctc_partition IS NULL AND ctc_writeid IS NULL  AND ctc_table = \u003cTABLE_NAME\u003e``\n                                                    \n- If transaction is aborted\n\n   - Transaction now remains in TXN_COMPONENTS and TXNS tables. Any future reads are not impacted by this aborted transaction though.\n   - The cleanup involves 2 simple steps to be followed for partitioned tables in the following order. These are:\n      1. 
Delete files in the object store which were written by the aborted transaction. To find out the write id the simple sql query is: \n           \n            ``SELECT t.TXN_ID,  T2W_WRITEID as WRITE_ID from TXNS as t JOIN TXN_COMPONENTS as tc ON tc.TC_TXNID = t.TXN_ID JOIN TXN_TO_WRITE_ID as tw on t.TXN_ID = tw.T2W_TXNID and t.TXN_STATE = 'a' and tc.TC_PARTITION is NULL``\n            \n            For example if your write ID is 4, then you will need to cleanup  all delta/delta_delete/base directories with name: delta_0000004_0000004/delete_delta_0000004_0000004/base_0000004\n      2. Delete entry from TXN_COMPONENTS table for aborted transaction. Complete sql query looks like\n      \n          ``WITH aborted_transactions AS (\n            SELECT\n                t.txn_id,  \n                tw.t2w_writeid AS write_id\n            FROM\n                txns AS t \n                JOIN txn_components AS tc \n                    ON t.txn_id = tc.tc_txnid\n                JOIN txn_to_write_id AS tw \n                    ON t.txn_id = tw.t2w_txnid\n                    AND t.txn_state = 'a' \n                    AND tc.tc_partition IS NULL\n            )\n            DELETE FROM txn_components WHERE tc_txnid IN (SELECT txn_id FROM aborted_transactions)`` \n         ","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqubole%2Fspark-acid","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fqubole%2Fspark-acid","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqubole%2Fspark-acid/lists"}