{"id":21629694,"url":"https://github.com/sfu-dis/tesseract","last_synced_at":"2025-06-15T18:13:07.959Z","repository":{"id":51252491,"uuid":"336977453","full_name":"sfu-dis/tesseract","owner":"sfu-dis","description":"Tesseract: Efficient Online Schema Evolution for Snapshot Databases (VLDB 2023)","archived":false,"fork":false,"pushed_at":"2023-04-01T08:10:19.000Z","size":6693,"stargazers_count":23,"open_issues_count":0,"forks_count":2,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-25T10:04:39.841Z","etag":null,"topics":["ddl","mvcc","online-ddl","schema-evolution","snapshot-isolation","transaction-processing","transactional-ddl"],"latest_commit_sha":null,"homepage":"https://www.vldb.org/pvldb/vol16/p140-hu.pdf","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sfu-dis.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-02-08T06:09:59.000Z","updated_at":"2025-01-24T06:45:28.000Z","dependencies_parsed_at":"2023-01-20T04:06:52.599Z","dependency_job_id":null,"html_url":"https://github.com/sfu-dis/tesseract","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sfu-dis%2Ftesseract","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sfu-dis%2Ftesseract/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sfu-dis%2Ftesseract/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sfu-dis%2Ftesseract/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sfu-dis","download
_url":"https://codeload.github.com/sfu-dis/tesseract/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248411947,"owners_count":21099031,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ddl","mvcc","online-ddl","schema-evolution","snapshot-isolation","transaction-processing","transactional-ddl"],"created_at":"2024-11-25T02:08:29.189Z","updated_at":"2025-04-11T13:51:20.168Z","avatar_url":"https://github.com/sfu-dis.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Tesseract: Online Schema Evolution is (almost) Free for Snapshot Databases\n\nThis repository implements [Tesseract](https://www.vldb.org/pvldb/vol16/p140-hu.pdf). See details in our [VLDB 2023 paper](https://www.vldb.org/pvldb/vol16/p140-hu.pdf) below. If you use our work, please cite:\n\n```\nOnline Schema Evolution is (Almost) Free for Snapshot Databases\nTianxun Hu, Tianzheng Wang and Qingqing Zhou.\nPVLDB 16(2) (VLDB 2023).\n```\n\n#### Software dependencies\n* cmake\n* python2\n* [clang; libcxx; libcxxabi](https://github.com/llvm/llvm-project)\n* libnuma\n* libibverbs\n* libgflags\n* libgoogle-glog\n* liburing\n\nUbuntu\n```\napt-get install -y cmake gcc-11 g++-11 clang-10 libc++-8-dev libc++abi-8-dev\napt-get install -y libnuma-dev libibverbs-dev libgflags-dev libgoogle-glog-dev liburing-dev\n```\n\n#### Environment configurations\nMake sure you have enough huge pages.\n\n* Tesseract uses `mmap` with `MAP_HUGETLB` (available after Linux 2.6.32) to allocate huge pages. 
Almost all memory allocations come from the space carved out here. Assuming the default huge page size is 2 MB, the command below reserves x huge pages, i.e. 2x MB of memory:\n```\nsudo sh -c 'echo [x pages] \u003e /proc/sys/vm/nr_hugepages'\n```\nThe reserved huge pages bound the maximum for `-node_memory_gb`, which is allocated per socket: e.g., with 40 GB of huge pages on a 4-socket machine, `-node_memory_gb` can be at most 10 (see below).\n\n* `mlock` limits. Add the following to `/etc/security/limits.conf` (replace \"[user]\" with your login):\n```\n[user] soft memlock unlimited\n[user] hard memlock unlimited\n```\n*Re-login to apply.*\n\n--------\n#### Build it\nWe do not allow building in the source directory. Suppose we build in a separate directory:\n\n```\n$ mkdir build\n$ cd build\n$ cmake ../ -DCMAKE_BUILD_TYPE=[Debug/Release/RelWithDebInfo]\n$ make -jN\n```\n\nThe code currently compiles with Clang 10.0+. E.g., to use Clang 10.0, issue the following `cmake` command instead:\n```\n$ CC=clang-10.0 CXX=clang++-10.0 cmake ../ -DCMAKE_BUILD_TYPE=[Debug/Release/RelWithDebInfo]\n```\n\nAfter `make` there will be six executables under `build`:\n\n`tesseract_DDL_COPY`: runs DDLs with the Tesseract approach;\n\n`tesseract_DDL_LAZY_COPY`: runs DDLs with the lazy approach;\n\n`tesseract_DDL_OPT_LAZY_COPY`: runs DDLs with the Tesseract-lazy approach;\n\n`tesseract_DDL_BLOCK`: runs DDLs with the blocking approach;\n\n`tesseract_DDL_SI`: runs DDLs with the naive SI approach;\n\n`tesseract_NO_DDL`: runs with no DDLs.\n\n#### Run it\n```\n$ ./run.sh \\\n       [executable] \\\n       [benchmark] \\\n       [scale-factor] \\\n       [num-threads] \\\n       [duration (seconds)] \\\n       \"[other system-wide runtime options]\" \\\n       \"[other benchmark-specific runtime options]\"\n```\n\n#### Run example\n```\n# Add column with the Tesseract approach under TPC-CD:\n./run.sh ./tesseract_DDL_COPY tpcc_org 50 31 10 \"-node_memory_gb=30 -cdc_threads=5 -scan_threads=3 -enable_cdc_schema_lock=0 -enable_ddl_keys=0 -pcommit_queue_length=500000 -ddl_total=1 -enable_parallel_scan_cdc=1 
-print_interval_ms=1000 -cdc_physical_workers_only=1 -scan_physical_workers_only=1 -client_load_per_core=4500 -latency_stat_interval_ms=25 -enable_large_ddl_begin_timestamp=1\" \"-d 2 -e 0 -s 1\"\n\n# Add column with the Tesseract-lazy approach under TPC-CD:\n./run.sh ./tesseract_DDL_OPT_LAZY_COPY tpcc_org 50 31 10 \"-node_memory_gb=30 -cdc_threads=5 -scan_threads=3 -enable_cdc_schema_lock=0 -enable_ddl_keys=1 -pcommit_queue_length=500000 -enable_lazy_background=1 -ddl_total=1 -enable_parallel_scan_cdc=1 -print_interval_ms=1000 -cdc_physical_workers_only=1 -scan_physical_workers_only=1 -client_load_per_core=4500 -latency_stat_interval_ms=25 -enable_lazy_on_conflict_do_nothing=1 -late_background_start_ms=0\" \"-d 2 -e 0 -s 1\"\n\n# Add column with the lazy approach under TPC-CD:\n./run.sh ./tesseract_DDL_LAZY_COPY tpcc_org 50 31 10 \"-node_memory_gb=30 -cdc_threads=5 -scan_threads=3 -enable_cdc_schema_lock=0 -enable_ddl_keys=1 -pcommit_queue_length=500000 -enable_lazy_background=1 -ddl_total=1 -enable_parallel_scan_cdc=1 -print_interval_ms=1000 -cdc_physical_workers_only=1 -scan_physical_workers_only=1 -client_load_per_core=4500 -latency_stat_interval_ms=25 -enable_lazy_on_conflict_do_nothing=1 -late_background_start_ms=0\" \"-d 2 -e 0 -s 1\"\n\n# Add column with the blocking approach under TPC-CD:\n./run.sh ./tesseract_DDL_BLOCK tpcc_org 50 31 10 \"-node_memory_gb=30 -cdc_threads=5 -scan_threads=3 -enable_cdc_schema_lock=0 -enable_ddl_keys=0 -pcommit_queue_length=500000 -ddl_total=1 -enable_parallel_scan_cdc=1 -print_interval_ms=1000 -cdc_physical_workers_only=1 -scan_physical_workers_only=1 -client_load_per_core=4500 -latency_stat_interval_ms=25\" \"-d 2 -e 0\"\n```\n\n#### System-wide runtime options\n\n`-node_memory_gb`: amount of memory, in GB, to allocate per socket.\n\n`-null_log_device`: whether to flush the log buffer.\n\n`-tmpfs_dir`: location of the log buffer's mmap file. 
Default: `/tmpfs/`.\n\n`-cdc_threads`: number of CDC threads.\n\n`-scan_threads`: number of scan threads.\n\n`-enable_parallel_scan_cdc`: whether to enable parallel scan and CDC.\n\n#### Benchmark-specific runtime options\n\n`-d 2`: the DDL transaction starts after 2 seconds.\n\n`-e 0`: DDL workload; see below:\n```\nTPC-CD:\n    0: Add column\n    1: Table split\n    2: Preaggregate\n    3: Create index\n    4: Table join\n    9: Add constraint\n    10: Add column and constraint\n\nODDLB:\n    0: Add column\n    2: Add constraint\n    3: Add column and constraint\n```\n\n`-w D`: ODDLB: write-heavy workload D.\n\n`-s 1`: TPC-CD: pick a random home warehouse (0 otherwise).\n\n`-s 100000000`: ODDLB: number of records in the database table.\n\n`-r 10`: 10 queries per transaction.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsfu-dis%2Ftesseract","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsfu-dis%2Ftesseract","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsfu-dis%2Ftesseract/lists"}