{"id":31939701,"url":"https://github.com/alibaba/table-computing","last_synced_at":"2025-10-14T08:45:04.946Z","repository":{"id":53270067,"uuid":"397802682","full_name":"alibaba/table-computing","owner":"alibaba","description":"Table-Computing (Simplified as TC) is a high performance and low latency computing framework, 10x faster than Flink for complicated use cases, distributed and light weighted, relational operation, simple to use, write less and do more.","archived":false,"fork":false,"pushed_at":"2023-03-06T15:30:23.000Z","size":349,"stargazers_count":37,"open_issues_count":0,"forks_count":4,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-08-03T08:20:47.206Z","etag":null,"topics":["big-data","data-analysis","java","stream-processing","table-computing","tc"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/alibaba.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-08-19T03:20:33.000Z","updated_at":"2025-04-03T09:21:47.000Z","dependencies_parsed_at":"2022-08-19T19:20:25.948Z","dependency_job_id":null,"html_url":"https://github.com/alibaba/table-computing","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/alibaba/table-computing","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alibaba%2Ftable-computing","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alibaba%2Ftable-computing/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alibaba%2Ftable-computing/releases","manifests_url":"https://rep
os.ecosyste.ms/api/v1/hosts/GitHub/repositories/alibaba%2Ftable-computing/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/alibaba","download_url":"https://codeload.github.com/alibaba/table-computing/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alibaba%2Ftable-computing/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279018302,"owners_count":26086345,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-14T02:00:06.444Z","response_time":60,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["big-data","data-analysis","java","stream-processing","table-computing","tc"],"created_at":"2025-10-14T08:45:02.805Z","updated_at":"2025-10-14T08:45:04.937Z","avatar_url":"https://github.com/alibaba.png","language":"Java","funding_links":[],"categories":["大数据"],"sub_categories":[],"readme":"# Table-Computing\n\nWelcome to the Table-Computing GitHub repository.\n\nTable-Computing (TC for short) is a distributed, lightweight, high-performance, low-latency stream processing and data analysis framework.\nIt is based on relational operations and is simple to use: write less and do more.\nIn our experience, TC achieves millisecond-level latency and runs more than 10 times faster than Flink for complicated use cases.\nFor the same streaming task, TC saved us more than 10 times the computing resources.\n\n## Why we developed this framework\nRelational operations are an effective tool for processing and analyzing data, and SQL is a widely used implementation of them.\nBut SQL is not Turing-complete, so we need UDFs, stored procedures, UDAFs, UDTFs, etc. to handle complicated business scenarios.\nIf we need complicated WHERE criteria, complicated JOIN criteria, or a new scalar, transform, aggregation or window function, SQL cannot express it easily.\nSQL is also not very efficient for complicated cases. Whether a SQL query executes efficiently depends on whether the query plan optimizer has been optimized for the specific pattern used in the business scenario, and the more complicated the scenario, the harder it is to guarantee that every pattern has been optimized.\nBesides SQL, we can also use Flink DataStream/DataSet, but implementing a complex data processing task takes a lot of code, and we also need to design the execution graph. This is a complex art: we must chain operators together or split them apart, then observe whether the adjusted graph is more efficient and whether the task latency is acceptable, and if not, find the bottleneck in the execution graph and work out how to resolve it. 
A complex task usually includes dozens of operators that can be combined in many ways, and trying out all the potentially efficient combinations is heavy work.\n\n## Example\nCompute the top 100 commodities by sales volume over the last hour, every half hour:\n```xml\n\u003cdependency\u003e\n    \u003cgroupId\u003ecom.alibaba\u003c/groupId\u003e\n    \u003cartifactId\u003etable-computing\u003c/artifactId\u003e\n    \u003cversion\u003e1.0.0\u003c/version\u003e\n\u003c/dependency\u003e\n```\n\n```java\n// Commodity dimension table loaded from MySQL and indexed by \"id\"\nMysqlDimensionTable mysqlDimensionTable = new MysqlDimensionTable(\"jdbc:mysql://localhost:3306/e-commerce\",\n        \"commodity\",\n        \"userName\",\n        \"password\",\n        Duration.ofHours(1),\n        new ColumnTypeBuilder()\n        .column(\"id\", Type.INT)\n        .column(\"name\", Type.VARCHAR)\n        .column(\"price\", Type.INT)\n        .build(),\n        \"id\"\n        );\n\n// Schema of the order events read from Kafka\nMap\u003cString, Type\u003e columnTypeMap = new ColumnTypeBuilder()\n        .column(\"__time__\", Type.BIGINT)\n        .column(\"id\", Type.BIGINT)\n        .column(\"commodity_id\", Type.INT)\n        .column(\"count\", Type.INT)\n        .build();\n\n// Placeholder values: replace with your own Kafka brokers and topic\nString bootstrapServers = \"localhost:9092\";\nString topic = \"orders\";\nKafkaStreamTable kafkaStreamTable = new KafkaStreamTable(bootstrapServers,\n        \"consumerGroupId\",\n        topic,\n        0,\n        columnTypeMap);\nkafkaStreamTable.start();\n\nStreamProcessing sp = new StreamProcessing();\nString[] hashBy = new String[]{\"commodity_id\"};\nRehash rehashForSlideWindow = sp.rehash(\"uniqueNameForSlideWindow\", hashBy);\nString[] returnedColumns = new String[]{\"commodity_id\",\n        \"sales_volume\",\n        \"saleroom\",\n        \"window_start\"};\n// 1-hour window sliding every 30 minutes, partitioned by commodity_id, with __time__ as the event time\nSlideWindow slideWindow = new SlideWindow(Duration.ofHours(1),\n        Duration.ofMinutes(30),\n        hashBy,\n        \"__time__\",\n        new AggTimeWindowFunction() {\n            @Override\n            public Comparable[] agg(List\u003cComparable\u003e partitionByColumns, List\u003cRow\u003e rows, long windowStart, long windowEnd) {\n     
           // saleroom = sum of count * commodity_price (neither input table has a \"total_price\" column)\n                int saleroom = 0;\n                for (Row row : rows) {\n                    // commodity_price may be null when the left join found no matching commodity\n                    Integer price = row.getInteger(\"commodity_price\");\n                    if (price != null) {\n                        saleroom += row.getInteger(\"count\") * price;\n                    }\n                }\n                return new Comparable[]{\n                        partitionByColumns.get(0),\n                        AggregationUtil.sumInt(rows, \"count\"),\n                        saleroom,\n                        windowStart\n                };\n            }\n        }, returnedColumns);\nslideWindow.setWatermark(Duration.ofSeconds(2));\n\n// Gather the slide-window results that share the same window_start and keep the top 100 by sales_volume\nhashBy = new String[]{\"window_start\"};\nRehash rehashForSessionWindow = sp.rehash(\"uniqueNameForSessionWindow\", hashBy);\nSessionWindow sessionWindow = new SessionWindow(Duration.ofSeconds(1),\n        hashBy,\n        \"window_start\",\n        new TimeWindowFunction() {\n            @Override\n            public List\u003cComparable[]\u003e transform(List\u003cComparable\u003e partitionByColumns, List\u003cRow\u003e rows, long windowStart, long windowEnd) {\n                int[] top100 = WindowUtil.topN(rows, \"sales_volume\", 100);\n                List\u003cComparable[]\u003e ret = new ArrayList\u003c\u003e(100);\n                for (int i = 0; i \u003c top100.length; i++) {\n                    ret.add(rows.get(top100[i]).getAll());\n                }\n                return ret;\n            }\n        }, returnedColumns);\nsessionWindow.setWatermark(Duration.ofSeconds(3));\n\nsp.compute(new Compute() {\n    @Override\n    public void compute(int myThreadIndex) throws InterruptedException {\n        Table table = kafkaStreamTable.consume();\n        TableIndex tableIndex = mysqlDimensionTable.curTable();\n        table = table.leftJoin(tableIndex.getTable(), new JoinCriteria() {\n            @Override\n            public List\u003cInteger\u003e theOtherRows(Row thisRow) {\n                // Use tableIndex.getRows rather than mysqlDimensionTable.curTable().getRows. 
A second call to\n                // mysqlDimensionTable.curTable() may return a newly reloaded dimension table that is\n                // inconsistent with the first mysqlDimensionTable.curTable() call and tableIndex.getTable()\n                return tableIndex.getRows(thisRow.getInteger(\"commodity_id\"));\n            }},\n            // Rename columns in the join result: order \"id\" becomes \"order_id\"; commodity \"name\" and \"price\" get a commodity_ prefix\n            new As().\n                as(\"id\", \"order_id\").\n                build(),\n            new As().\n                as(\"name\", \"commodity_name\").\n                as(\"price\", \"commodity_price\").\n                build());\n        // Rehash by commodity_id so rows of the same commodity are aggregated by the same thread\n        List\u003cTable\u003e tables = rehashForSlideWindow.rehash(table, myThreadIndex);\n        table = slideWindow.slide(tables);\n        // Rehash by window_start so all results of one window meet in the same thread for the top-100 ranking\n        tables = rehashForSessionWindow.rehash(table, myThreadIndex);\n        table = sessionWindow.session(tables);\n        if (table.size() \u003e 0) {\n            table.print();\n            // You can gracefully finish the streaming task once a termination condition is met\n            Thread.currentThread().interrupt();\n        }\n    }\n});\n```\nTo deploy a table-computing task in a distributed way, start one JVM per instance. `-Dself` is the address of the current instance, and `-Dall` lists the addresses of all instances:\n\n`java -Xmx100g -XX:MaxDirectMemorySize=500g -Dself=localhost:8888 -Dall=localhost:8888,localhost:9999 -jar my_task.jar`\n\n`java -Xmx100g -XX:MaxDirectMemorySize=500g -Dself=localhost:9999 -Dall=localhost:8888,localhost:9999 -jar my_task.jar`\n\n## Optimization\n1. First run with a single StreamProcessing thread to measure the single-thread throughput. Then divide the upstream data rate by that throughput to get the number of concurrent StreamProcessing threads. For example, an upstream rate of 200,000 rows/s and a single-thread throughput of 50,000 rows/s suggest 4 threads. Do not use too many threads: contention between threads wastes resources and may cause OOM, because not enough CPU time is left to release unused memory.\n2. Set the -Xmx parameter appropriately. 
To improve performance, all table data is stored in off-heap memory, and that memory is released only after the on-heap objects referencing it are garbage-collected. A too-large -Xmx therefore delays the release of off-heap memory and may cause OOM. A too-small -Xmx causes overly frequent GC, which reduces throughput.\n3. Old-generation GC stops the world, and so does young-generation GC, if only briefly. More threads generate more garbage. More garbage means more GC and more frequent stop-the-world pauses, so per-thread CPU usage cannot be raised. If adding StreamProcessing threads no longer increases throughput, start another JVM instead; it can run on the same machine using localhost:anotherPort. In practice, N threads do not deliver N times the throughput, so you may need another JVM. To decide, use `top -H -p pid` to check whether the CPU usage of the threads named compute-X is close to 100%.\n\n## Notice\n1. When data does not arrive continuously, AbstractStreamTable returns an empty table after sleeping 100 ms (the default) to trigger computation. Otherwise, data held back by the watermark, data buffered in windows, and data rehashed or rebalanced to other servers/threads would never be computed.\n2. The thread that reads the dimension table blocks until the dimension table has finished loading.\n\n## Copyright and License\nTable-Computing is provided under the [Apache-2.0 license](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falibaba%2Ftable-computing","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falibaba%2Ftable-computing","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falibaba%2Ftable-computing/lists"}