{"id":14972071,"url":"https://github.com/alibaba/innodb-java-reader","last_synced_at":"2025-10-25T16:46:10.460Z","repository":{"id":41010035,"uuid":"232306581","full_name":"alibaba/innodb-java-reader","owner":"alibaba","description":"A library and command-line tool to access MySQL InnoDB data file directly in Java","archived":false,"fork":false,"pushed_at":"2023-03-06T15:36:49.000Z","size":5256,"stargazers_count":476,"open_issues_count":12,"forks_count":118,"subscribers_count":22,"default_branch":"master","last_synced_at":"2025-04-04T04:12:45.561Z","etag":null,"topics":["command-line-tool","heatmap","innodb","java","mysql","mysql-database","mysqldump"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/alibaba.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2020-01-07T11:03:54.000Z","updated_at":"2025-04-02T01:13:58.000Z","dependencies_parsed_at":"2023-10-20T16:37:39.304Z","dependency_job_id":null,"html_url":"https://github.com/alibaba/innodb-java-reader","commit_stats":null,"previous_names":[],"tags_count":10,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alibaba%2Finnodb-java-reader","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alibaba%2Finnodb-java-reader/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alibaba%2Finnodb-java-reader/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alibaba%2Finnodb-java-reader/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/
owners/alibaba","download_url":"https://codeload.github.com/alibaba/innodb-java-reader/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248655160,"owners_count":21140436,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["command-line-tool","heatmap","innodb","java","mysql","mysql-database","mysqldump"],"created_at":"2024-09-24T13:46:20.711Z","updated_at":"2025-10-25T16:46:05.406Z","avatar_url":"https://github.com/alibaba.png","language":"Java","readme":"# MySQL InnoDB Java Reader\n\n[![Build Status](https://travis-ci.org/alibaba/innodb-java-reader.svg?branch=master)](https://travis-ci.org/alibaba/innodb-java-reader)\n[![codecov](https://codecov.io/gh/alibaba/innodb-java-reader/branch/master/graph/badge.svg)](https://codecov.io/gh/alibaba/innodb-java-reader)\n[![Maven Central](https://maven-badges.herokuapp.com/maven-central/com.alibaba.database/innodb-java-reader/badge.svg)](https://maven-badges.herokuapp.com/maven-central/com.alibaba.database/innodb-java-reader)\n[![GitHub release](https://img.shields.io/github/release/alibaba/innodb-java-reader.svg)](https://github.com/alibaba/innodb-java-reader/releases)\n[![javadoc](https://javadoc.io/badge2/com.alibaba.database/innodb-java-reader/javadoc.svg)](https://javadoc.io/doc/com.alibaba.database/innodb-java-reader)\n[![Hex.pm](https://img.shields.io/hexpm/l/plug.svg)](http://www.apache.org/licenses/LICENSE-2.0)\n\ninnodb-java-reader is a java implementation to access MySQL InnoDB storage engine file directly. 
With the library or command-line tool, it provides read-only features such as examining pages, looking up records by primary key or secondary key, and generating page heatmaps by LSN or filling rate. innodb-java-reader can also be used to dump or query a table, offloading that work from the MySQL server. Moreover, this project is useful for prototyping and learning MySQL.\n\n[1. Background](#1-background)\n\n[2. Prerequisites](#2-prerequisites)\n\n[3. Features](#3-features)\n\n[4. Quickstart](#4-quickstart)\n\n[5. API usage](#5-api-usage)\n\n[6. Command-line tool](#6-command-line-tool)\n\n[7. Building](#7-building)\n\n[8. Benchmark](#8-benchmark)\n\n[9. Future works](#9-future-works)\n\n## 1. Background\n\nInnoDB is a general-purpose storage engine that balances high reliability and high performance in MySQL, and it has been the default MySQL storage engine since MySQL 5.5. At Alibaba, I encountered a performance issue related to MySQL, which led me to dive deep into InnoDB's internal mechanisms. To better understand how InnoDB stores data, I started this project, and I chose Java to implement it because the language is widely used and easy to understand. Some of the work is inspired by [Jeremy Cole](https://blog.jcole.us/)'s blog about InnoDB, which helped me a lot. \n\nCurrently this project is production-ready and is able to work in real environments.\n\n## 2. Prerequisites\n\n* Supported MySQL versions: 5.6, 5.7, 8.0.\n* Make sure the [InnoDB row format](https://dev.mysql.com/doc/refman/5.7/en/innodb-row-format.html) is either `COMPACT` or `DYNAMIC`.\n* Enable `innodb_file_per_table`, which creates a standalone `*.ibd` file for each table.\n* The InnoDB page size must be 16K.\n\n## 3. Features\n\nThe row format of a table determines how rows are physically stored, which in turn can affect the performance of queries and DML operations. 
`innodb-java-reader` supports `COMPACT` or `DYNAMIC` page format and can work smartly to choose the right page format decoder to read pages.\n\n`innodb-java-reader` supports operations like examining pages' information, looking up record by primary key and secondary key, range querying by primary key and secondary key, querying records by page number, dumping table and generating page heatmap \u0026 filling rate.\n\nSupported column types are listed below. Java type mapping refer to [docs](docs/mysql_to_java_type.md).\n\n| Type | Support column types |\n| ---- | -------------------- |\n| Numeric | TINYINT, SMALLINT, MEDIUMINT, INT, BIGINT, FLOAT, REAL, DOUBLE, DECIMAL, NUMERIC |\n| String and Binary | CHAR, VARCHAR, BINARY, VARBINARY, TINYBLOB, BLOB, MEDIUMBLOB, LONGBLOB, TINYTEXT, TEXT, MEDIUMTEXT, LONGTEXT |\n| Date and Time | DATETIME, TIMESTAMP, TIME (*support precision*), YEAR, DATE |\n| Other | BOOL, BOOLEAN, ENUM, SET, BIT |\n\n## 4. Quickstart\n\n### Dependency\n\n**Maven**\n\n```\n\u003cdependency\u003e\n  \u003cgroupId\u003ecom.alibaba.database\u003c/groupId\u003e\n  \u003cartifactId\u003einnodb-java-reader\u003c/artifactId\u003e\n  \u003cversion\u003e1.0.10\u003c/version\u003e\n\u003c/dependency\u003e\n```\n\nTo use snapshot version, please refer to this [doc](docs/how_to_use_snapshot_version.md).\n\n#### API examples\n\nHere's an example to look up record in a table by primary key.\n\n```\nString createTableSql = \"CREATE TABLE t (id int(11), a bigint(20)) ENGINE=InnoDB;\";\nString ibdFilePath = \"/usr/local/mysql/data/test/t.ibd\";\ntry (TableReader reader = new TableReaderImpl(ibdFilePath, createTableSql)) {\n  reader.open();\n  GenericRecord record = reader.queryByPrimaryKey(ImmutableList.of(4));\n  Object[] values = record.getValues();\n  System.out.println(Arrays.asList(values));\n}\n```\n\nMore usage you can jump to [API usage](#5-api-usage) section. 
The best way to explore further is to look at the examples for common use cases in [innodb-java-reader-demo](innodb-java-reader-demo/src/main/java/com/alibaba/innodb/java/reader).\n\n#### Command-line examples\n\nHere's an example that dumps all records with the command-line tool.\n\nYou can download the latest version of `innodb-java-reader-cli.jar` from the [release page](https://github.com/alibaba/innodb-java-reader/releases) or [build](#7-building) it from source.\n\n`t.ibd` is the InnoDB ibd file path. `t.sql` is a file containing the output of `SHOW CREATE TABLE \u003ctable_name\u003e`. You can generate table definitions by executing `mysqldump -d -u\u003cusername\u003e -p\u003cpassword\u003e -h \u003chostname\u003e \u003cdbname\u003e` on the command line.\n\n```\njava -jar innodb-java-reader-cli.jar \\\n  -ibd-file-path /usr/local/mysql/data/test/t.ibd \\\n  -create-table-sql-file-path t.sql \\\n  -c query-all -o output.dat\n```\nThe result is the same as running `mysql -N -uroot -e \"select * from test.t\" \u003e output`.\n\nBe aware that if pages have not been flushed from the InnoDB buffer pool to disk, **the result may not be consistent**. How long do dirty pages usually stay dirty in memory? That is a tough question. InnoDB relies on WAL for performance, so there is no command available to flush all dirty pages. Only internal mechanisms, such as the page cleaner thread and adaptive flushing, control when pages are flushed.\n\nHere's another example that generates a heatmap.\n\nAssume we have a table without a secondary index, and the primary key is built by inserting rows in key order. 
Then run the following command.\n\n```\njava -jar innodb-java-reader-cli.jar \\\n  -ibd-file-path /usr/local/mysql/data/test/t.ibd \\\n  -create-table-sql-file-path t.sql \\\n  -c gen-lsn-heatmap -args ./out.html\n```\n\nThe heatmap shows as below.\n\n![](http://img.neoremind.com/wp-content/uploads/2020/05/table-pk.png)\n\nThe pages are allocated and filled perfectly as color changes from blue (LSN is smallest) to red (LSN is biggest), from the beginning of the file towards to the end.\n\nMore usage you can jump to [Command-line tool](#6-command-line-tool) section.\n\n\n## 5. API usage\n\nYou can prepare some data beforehand. All the following examples will be based on a table named `t`.\n\n```\nCREATE TABLE `t`\n(`id` int(11) NOT NULL,\n`a` bigint(20) NOT NULL,\n`b` varchar(64) NOT NULL,\nPRIMARY KEY (`id`)) ENGINE=InnoDB;\n\ndelimiter ;;\ndrop procedure if EXISTS idata;\ncreate procedure idata()\n  begin\n    declare i int;\n    set i=1;\n    while(i\u003c=5)do\n      insert into t values(i, i * 2, REPEAT(char(97+((i - 1) % 26)), 8));\n      set i=i+1;\n    end while;\n  end;;\ndelimiter ;\ncall idata();\n```\n\nAfter creating and populating the very simple table, there should be 5 rows.\n\n```\nmysql\u003e select * from t;\n+----+----+----------+\n| id | a  | b        |\n+----+----+----------+\n|  1 |  2 | aaaaaaaa |\n|  2 |  4 | bbbbbbbb |\n|  3 |  6 | cccccccc |\n|  4 |  8 | dddddddd |\n|  5 | 10 | eeeeeeee |\n+----+----+----------+\n```\n\n### 5.1 Setting table definition\n\nThere are two ways to specify a table definition, or `TableDef` within the library.\n\n#### Using SQL\n\nRun `SHOW CREATE TABLE` statement in MySQL command-line and copy the output as a string. 
Inside `innodb-java-reader`, it leverages [JSqlParser](https://github.com/JSQLParser/JSqlParser) and [antlr4](https://github.com/antlr/antlr4) to parse SQL to AST and get the table definition.\n\nYou can generate all table definitions by executing `mysqldump -d -u\u003cusername\u003e -p\u003cpassword\u003e -h \u003chostname\u003e \u003cdbname\u003e` in command-line.\n\nFor example,\n```\nString createTableSql = \"CREATE TABLE `t`\\n\"\n        + \"(`id` int(11) NOT NULL ,\\n\"\n        + \"`a` bigint(20) NOT NULL,\\n\"\n        + \"`b` varchar(64) NOT NULL,\\n\"\n        + \"PRIMARY KEY (`id`))\\n\"\n        + \"ENGINE=InnoDB;\";\n```\n\n#### Using API\n\nCreate a `TableDef` instance with all `Column`s. `Column` can be created in fluent style by setting the required column `name`, `type`, while there are optional settings to specify nullable, charset or if the column is primary key.\n\nFor variable-length or fixed-length column types like` VARCHAR`, `VARBINARY`, `CHAR`, column type can be declared with a length that indicates the maximum length you want to store, just like what you define a DDL in MySQL. 
For integer types, the display width of the integer column will be ignored.\n\nFor example, to create a table with a single-column primary key:\n```\nTableDef tableDef = new TableDef().setDefaultCharset(\"utf8mb4\")\n    .addColumn(new Column().setName(\"id\").setType(\"int(11)\").setNullable(false).setPrimaryKey(true))\n    .addColumn(new Column().setName(\"a\").setType(\"bigint(20)\").setNullable(false))\n    .addColumn(new Column().setName(\"b\").setType(\"varchar(64)\").setNullable(false))\n    .addColumn(new Column().setName(\"c\").setType(\"varchar(1024)\").setNullable(true));\n```\n\nTo create a table with a multiple-column primary key:\n```\nTableDef tableDef = new TableDef()\n    .setDefaultCharset(\"utf8mb4\")\n    .addColumn(new Column().setName(\"id\").setType(\"int(11)\").setNullable(false))\n    .addColumn(new Column().setName(\"a\").setType(\"bigint(20)\").setNullable(false))\n    .addColumn(new Column().setName(\"b\").setType(\"varchar(64)\").setNullable(false))\n    .addColumn(new Column().setName(\"c\").setType(\"varchar(1024)\").setNullable(true))\n    .setPrimaryKeyColumns(ImmutableList.of(\"a\", \"b\"));\n```\n\nTables without a primary key are also supported. By default, a 6-byte ROW ID is treated as the primary key.\n\n### 5.2 Creating TableReader\n\nThe thread-safe class `TableReader` enables you to call all the useful APIs.\n\nWith a try-with-resources statement, you can ensure that the IO resources used by `TableReader` are closed at the end of all invocations. By default, `TableReader` uses **buffer IO**: pages are read from the page cache into a `DirectByteBuffer` and then copied to the heap to manage their lifecycle. 
This framework is also open for extension to use **mmap** or **direct io**.\n\nThere are two constructors, one needs to provide tablespace file `*.ibd` file path and *create table* sql, while the other needs the `*.ibd` file path and `TableDef`.\n\nFor example,\n\n```\nString createTableSql = \"CREATE TABLE `tb11`\\n\"\n        + \"(`id` int(11) NOT NULL ,\\n\"\n        + \"`a` bigint(20) NOT NULL,\\n\"\n        + \"`b` varchar(64) NOT NULL,\\n\"\n        + \"PRIMARY KEY (`id`))\\n\"\n        + \"ENGINE=InnoDB;\";\nString ibdFilePath = \"/usr/local/mysql/data/test/t.ibd\";\ntry (TableReader reader = new TableReaderImpl(ibdFilePath, createTableSql)) {\n  reader.open();\n  // API invocation goes here...\n}\n```\n\nMoreover, there is a useful factory utility which can facilitate the process of creating `TableReader`. In this case, table definition is no longer needed, so you can skip table definition using SQL or API as section 5.1 describes.\n\nFor example,\n\n```\nString createTableSql = \"CREATE TABLE `tb11`\\n\"\n        + \"(`id` int(11) NOT NULL ,\\n\"\n        + \"`a` bigint(20) NOT NULL,\\n\"\n        + \"`b` varchar(64) NOT NULL,\\n\"\n        + \"PRIMARY KEY (`id`))\\n\"\n        + \"ENGINE=InnoDB;\";\n\nTableDefProvider tableDefProvider = new SqlTableDefProvider(createTableSql);\nTableReaderFactory tableReaderFactory = TableReaderFactory.builder()\n    .withProvider(tableDefProvider)\n    .withDataFileBasePath(\"/usr/local/mysql/data/test/\")\n    .build();\nTableReader reader = tableReaderFactory.createTableReader(\"tb11\");\ntry {\n  reader.open();\n  // API invocation goes here...\n} finally {\n  reader.close();\n}\n```\n\nYou can also provide a sql file path, the file\ncontains multiple SQLs, the table name should match the ibd file name, or else the tool is not able to \nidentify the ibd file to read, you can generate the file by executing `mysqldump -d -u\u003cusername\u003e -p\u003cpassword\u003e -h \u003chostname\u003e \u003cdbname\u003e` in 
command-line.\n\n```\nTableDefProvider tableDefProvider = new SqlFileTableDefProvider(\"/path/mysqldump_ddl.sql\");\nTableReaderFactory tableReaderFactory = TableReaderFactory.builder()\n    .withProvider(tableDefProvider)\n    .withDataFileBasePath(\"/usr/local/mysql/data/test/\")\n    .build();\nTableReader reader = tableReaderFactory.createTableReader(\"t\");\ntry {\n  reader.open();\n  // API invocation goes here...\n} finally {\n  reader.close();\n}\n```\n\nThe provider is extensible, in the future, we plan to support `MysqlFrmTableDefProvider` as well.\n\n### 5.3 Examining a tablespace file\n\n#### Listing all pages\n\nThis will give you a high-level overview about InnoDB file structure, as it results in a list of `AbstractPage`, for example, you can get all contiguous pages of their basic information.\n\n```\ntry (TableReader reader = new TableReaderImpl(ibdFilePath, createTableSql)) {\n  reader.open();\n  long numOfPages = reader.getNumOfPages();\n  List\u003cAbstractPage\u003e pages = reader.readAllPages();\n}\n```\n\n`AbstractPage` is the parent class of all pages. The page definition can be found in `fil0fil.h`. `innodb-java-reader` supports some of the commonly used page types like FspHdr/Xdes page, insert buffer bitmap page, index page, blob page, SDI page (only in MySQL 8.0 or later) and allocated page (unused page).\n\n![](http://img.neoremind.com/wp-content/uploads/2020/05/abstract-page.png)\n\n`AbstractPage` base class includes 38 bytes `FilHeader` and 8 bytes `FilTrailer` for all page type. The raw byte array body will be extracted accordingly for sub-classes. 
You can find the APIs for accessing the detailed structure of the different page types under the [page](innodb-java-reader/src/main/java/com/alibaba/innodb/java/reader/page) package in the Javadoc.\n\nFor example, the demo table `t` produces the result below.\n\n```\n0,FILE_SPACE_HEADER,numPagesUsed=4,size=6,xdes.size=1\n1,IBUF_BITMAP\n2,INODE,inode.size=2\n3,INDEX\n4,ALLOCATED\n5,ALLOCATED\n```\n\nMoreover, `Iterator\u003cAbstractPage\u003e getPageIterator()` is useful for reading pages iteratively.\n\n#### Viewing one page\n\nYou can query pages one by one. For supported page types, you can check the internal information.\n\n```\ntry (TableReader reader = new TableReaderImpl(ibdFilePath, createTableSql)) {\n  reader.open();\n  AbstractPage page = reader.readPage(3);\n}\n```\n\n### 5.4 Querying a tablespace file\n\n#### Query all records\n\nThis walks through the B+ tree index in ascending order, so you can treat it as a full-table scan. It first locates the root page of the primary key and then does a recursive depth-first traversal, collecting all the records along the way.\n\n```\ntry (TableReader reader = new TableReaderImpl(ibdFilePath, createTableSql)) {\n  reader.open();\n  List\u003cGenericRecord\u003e recordList = reader.queryAll();\n  for (GenericRecord record : recordList) {\n    Object[] values = record.getValues();\n    System.out.println(Arrays.asList(values));\n    assert record.getPrimaryKey() == record.get(\"id\");\n  }\n}\n```\n\nThe output is shown below.\n\n```\n[1, 2, aaaaaaaa]\n[2, 4, bbbbbbbb]\n[3, 6, cccccccc]\n[4, 8, dddddddd]\n[5, 10, eeeeeeee]\n```\n\n`GenericRecord` represents one row.\n\n- To retrieve a column value by column name, invoke `Object get(String columnName)`.\n- To retrieve a column value by column index, invoke `Object get(int index)`.\n- To retrieve the primary key, invoke `Object getPrimaryKey()`.\n\n`queryAll` accepts an optional argument `Predicate\u003cRecord\u003e ` to 
filter.\n\nThis feature enables you to dump data if data persists in InnoDB file by offloading MySQL.\n\n#### Query by page number\n\nThis only works for index page type.\n\n```\ntry (TableReader reader = new TableReaderImpl(ibdFilePath, createTableSql)) {\n  reader.open();\n  List\u003cGenericRecord\u003e recordList = reader.queryByPageNumber(3);\n}\n```\n\nFor leaf B+ tree page, the result record will be rows of a table.\n\nFor non-leaf page in multi-level B+ tree index, the result record will be the primary with all the other columns as `NULL`. You can check whether it is leaf or not and get the child page number in clustered index.\n\n```\nif (!record.isLeafRecord()) {\n  System.out.println(record.getChildPageNumber());\n}\n```\n\n#### Query by primary key\n\nB+ tree is an efficient data structure to do point and range query, it requires limited number of disk IO operations even for a very large table since the depth of the tree is usually not very big, that is why B+ tree scales nicely.\n\nTo look up record by primary key, innodb-java-reader will start from the root page in clustered index and do point-query in B+ tree index.\n\nIf the page is leaf, then it will do binary search in page directory slots to locate the nearest record (the highest key that smaller than the target key, `innodb-java-reader` leverages [search-insertion-position for sorted array](https://leetcode.com/problems/search-insert-position/) algorithm to do that), and walk through the records one by one as they are singly linked in one page until the target value found or return null if not present.\n\nFor non-leaf page, the record is simply the child page number, so `innodb-java-reader` will go deeper in the multiple-level B+ tree to the child page and run recursively.\n\nPrimary key parameter is a list of objects, single column primary key (list size will be 1) and multiple column primary key are supported.\n\n```\ntry (TableReader reader = new TableReaderImpl(ibdFilePath, createTableSql)) 
{\n  reader.open();\n  GenericRecord record = reader.queryByPrimaryKey(ImmutableList.of(4));\n  Object[] values = record.getValues();\n  System.out.println(Arrays.asList(values));\n  assert record.getPrimaryKey() == record.get(\"id\");\n  System.out.println(\"id=\" + record.get(\"id\"));\n  System.out.println(\"a=\" + record.get(\"a\"));\n}\n```\n\nNote that in MySQL 5.7 or earlier, page 3 is usually the root page of the clustered index, page 4 is the root of the first secondary key, and so on. In MySQL 8.0 or later, page 3 is usually the SDI page holding the data dictionary, and the root page is usually the next page, page 4. `innodb-java-reader` assumes the root page is either page 3 or 4 and determines which one to start from.\n\n#### Range query by primary key\n\nThe `rangeQueryByPrimaryKey` method requires at least 4 arguments: lower key, lower operator, upper key and upper operator. Operators include `\u003e`, `\u003e=`, `\u003c`, `\u003c=` and `nop` (for an unlimited bound).\n\nThe MySQL InnoDB engine has its own way of executing a range query. innodb-java-reader uses a naive and simple approach: it goes down to the leaf nodes of the B+ tree index and visits them page by page, record by record. The algorithm works as follows:\n\n1. Look up the record greater than or equal to the lower bound target key.\n2. Look up the record smaller than the upper bound target key.\n3. Starting from the record found in step 1, follow the singly linked record pointers to visit each record in turn until the `SUPREMUM` record is found, which means the end of the page has been reached.\n4. Pointers stored in the `FilHeader` point to the logical previous and next pages. Go to the next page and query all records starting from the `INFINIMUM` record. Repeat the process in step 3. 
If the page is where the record smaller than the upper bound target key resides, then it will compare record read with the target end key, so that we can exit nicely.\n\nFor example, the lower and upper bound target key can be empty list, which means no limit is specified.\n\n```\ntry (TableReader reader = new TableReaderImpl(ibdFilePath, createTableSql)) {\n  reader.open();\n  List\u003cGenericRecord\u003e recordList = reader.rangeQueryByPrimaryKey(\n          ImmutableList.of(5), ComparisonOperator.GT,\n          ImmutableList.of(8), ComparisonOperator.LT);\n  recordList = reader.rangeQueryByPrimaryKey(null, null);\n  recordList = reader.rangeQueryByPrimaryKey(5, null);\n}\n```\n\n`rangeQueryByPrimaryKey` accepts an optional argument `Predicate\u003cRecord\u003e ` to filter and `List\u003cString\u003e` to project selected columns.\n\n#### Iterator pattern\n\nFor extremely large tablespace, querying like `queryAll` or `rangeQueryByPrimaryKey` would cause out of memory error since data cannot fit into memory. 
The iterator pattern will help you out, it will load page by page until you really visit these records.\n\nFor example, `getQueryAllIterator` will return an iterator to visit all records.\n\n```\ntry (TableReader reader = new TableReaderImpl(ibdFilePath, createTableSql)) {\n  reader.open();\n  Iterator\u003cGenericRecord\u003e iterator = reader.getQueryAllIterator();\n  int count = 0;\n  while (iterator.hasNext()) {\n    GenericRecord record = iterator.next();\n    Object[] values = record.getValues();\n    System.out.println(Arrays.asList(values));\n    count++;\n  }\n  System.out.println(count);\n}\n```\n\nFor example, `getRangeQueryIterator` will return an iterator to visit targeted records based on the lower and upper bound just like what `rangeQueryByPrimaryKey` does.\n\n```\ntry (TableReader reader = new TableReaderImpl(ibdFilePath, createTableSql)) {\n  reader.open();\n  Iterator\u003cGenericRecord\u003e iterator = reader.getRangeQueryIterator(\n          ImmutableList.of(5), ComparisonOperator.GTE,\n          ImmutableList.of(8), ComparisonOperator.LTE);\n  while (iterator.hasNext()) {\n    GenericRecord record = iterator.next();\n    Object[] values = record.getValues();\n    System.out.println(Arrays.asList(values));\n  }\n}\n```\n\n#### Query by secondary key\n\n`getRecordIteratorBySk` will return an iterator to scan table by secondary key, the order will be the same as in secondary key. \n\n```\ntry (TableReader reader = new TableReaderImpl(ibdFilePath, createTableSql)) {\n  reader.open();\n  Iterator\u003cGenericRecord\u003e iterator = reader.getRecordIteratorBySk(\"key_a\",\n      ImmutableList.of(2L), ComparisonOperator.GTE,\n      ImmutableList.of(9L), ComparisonOperator.LT);\n  while (iterator.hasNext()) {\n    GenericRecord record = iterator.next();\n    Object[] values = record.getValues();\n    System.out.println(Arrays.asList(values));\n  }\n}\n```\n\nProjection and ordering work as below. 
Covering indexes are supported: the reader skips the lookup back to the clustered index (primary key), which is usually more performant.\n\n```\nboolean isAsc = false;\nIterator\u003cGenericRecord\u003e iterator = reader.getRecordIteratorBySk(\"key_a\",\n    ImmutableList.of(6L), ComparisonOperator.GTE,\n    null, ComparisonOperator.NOP,\n    ImmutableList.of(\"id\", \"a\", \"b\"), isAsc);\n```\n\nNote that if the table has ever been altered to add or remove indices, the secondary key root page number may be incorrect and cause an error. In that case, please refer to the [FAQ](docs/FAQ.md).\n\n#### Filtering and projection\n\nFiltering works on `queryAll` and `rangeQueryByPrimaryKey`. It is similar to index condition pushdown.\n\nProjection works for almost all APIs. For example:\n\n```\n// range query with projection\nList\u003cGenericRecord\u003e recordList = reader.rangeQueryByPrimaryKey(\n    ImmutableList.of(5), ComparisonOperator.GT,\n    ImmutableList.of(8), ComparisonOperator.LT,\n    ImmutableList.of(\"a\"));\n\n// range query with no limit, equivalent to query all, with projection\nIterator\u003cGenericRecord\u003e iterator = reader.getRangeQueryIterator(\n          null, ComparisonOperator.NOP,\n          null, ComparisonOperator.NOP,\n          ImmutableList.of(\"a\"));\n```\n\n#### Ordering\n\nOrdering works on `getQueryAllIterator` and `getRangeQueryIterator`. For example:\n\n```\nboolean ascOrder = false;\nreader.getRangeQueryIterator(\n  ImmutableList.of(2), ComparisonOperator.GTE,\n  ImmutableList.of(5), ComparisonOperator.LT,\n  ascOrder);\n```\n\n## 6. Command-line tool\n\n### 6.1 Usage\n\nYou can download the latest version of `innodb-java-reader-cli.jar` from the [release page](https://github.com/alibaba/innodb-java-reader/releases) or [build](#7-building) it from source.\n\nUsage is shown below.\n\n````\nusage: java -jar innodb-java-reader-cli.jar [-args \u003carg\u003e] [-c \u003carg\u003e]\n       [-delimiter \u003carg\u003e] [-desc] [-h] [-i \u003carg\u003e] 
[-iomode \u003carg\u003e] [-json]\n       [-jsonpretty] [-nullstring \u003carg\u003e] [-o \u003carg\u003e] [-projection \u003carg\u003e]\n       [-quotemode \u003carg\u003e] [-s \u003carg\u003e] [-showheader] [-skname \u003carg\u003e]\n       [-skordinal \u003carg\u003e] [-skrootpage \u003carg\u003e]\n -args \u003carg\u003e                             arguments\n -c,--command \u003carg\u003e                      mandatory. command to run, valid\n                                         commands are:\n                                         show-all-pages,show-pages,query-b\n                                         y-page-number,query-by-pk,query-b\n                                         y-sk,query-all,range-query-by-pk,\n                                         gen-lsn-heatmap,gen-filling-rate-\n                                         heatmap,get-all-index-page-fillin\n                                         g-rate\n -delimiter,--delimiter \u003carg\u003e            field delimiter, default is tab\n -desc,--desc                            if records sorted in descending\n                                         order, works for query all and\n                                         range query\n -h,--help                               usage\n -i,--ibd-file-path \u003carg\u003e                mandatory. 
innodb file path with\n                                         suffix of .ibd\n -iomode,--output-io-mode \u003carg\u003e          output io mode, valid modes are:\n                                         buffer,mmap,direct\n -json,--json-style                      set to true if you would like to\n                                         show page info in json format\n                                         style\n -jsonpretty,--json-pretty-style         set to true if you would like to\n                                         show page info in json pretty\n                                         format style\n -nullstring,--null-string \u003carg\u003e         null value string, default is\n                                         \"null\"\n -o,--output \u003carg\u003e                       save result to file instead of\n                                         console, the argument is the file\n                                         path\n -projection,--projection \u003carg\u003e          projection list with column names\n                                         delimited by comma\n -quotemode,--quote-mode \u003carg\u003e           value quote mode, valid modes\n                                         are: all,nonnull,nonnumeric,none,\n                                         default is none\n -s,--create-table-sql-file-path \u003carg\u003e   create table sql file path, the\n                                         sql is DDL as SHOW CREATE TABLE\n                                         \u003ctable_name\u003e, the file can\n                                         contain multiple SQLs, the table\n                                         name should match the ibd file\n                                         name, or else the tool is not\n                                         able to identify the ibd file to\n                                         read, you can generate the file\n                                         by executing mysqldump -d\n 
                                        -u\u003cusername\u003e -p\u003cpassword\u003e -h\n                                         \u003chostname\u003e \u003cdbname\u003e` in\n                                         command-line.\n -showheader,--show-header               set to true if you want to show\n                                         table header when dumping table\n -skname,--skname \u003carg\u003e                  secondary key name\n -skordinal,--skordinal \u003carg\u003e            secondary key ordinal in DDL\n -skrootpage,--skrootpage \u003carg\u003e          secondary key root page number\n````\n\nYou can customize log4j configuration by adding `-Dlog4j.configuration=file:/path/log4j.properties` in command.\n\n### 6.2 Examples\n\n#### Listing all pages\n\n```\njava -jar innodb-java-reader-cli.jar \\\n  -ibd-file-path /usr/local/mysql/data/test/t.ibd \\\n  -create-table-sql-file-path t.sql \\\n  -c show-all-pages\n```\n\nOutput:\n\n```\n=====page number, page type, other info=====\n0,FILE_SPACE_HEADER,space=141,numPagesUsed=4,size=6,xdes.size=1\n1,IBUF_BITMAP\n2,INODE,inode.size=2\n3,INDEX,root.page=true,index.id=176,level=0,numOfRecs=5,num.dir.slot=2,garbage.space=0\n4,ALLOCATED\n5,ALLOCATED\n```\n\n#### Examining some pages\n\nArguments are page numbers, separated by comma.\n\n```\njava -jar innodb-java-reader-cli.jar \\\n  -ibd-file-path /usr/local/mysql/data/test/t.ibd \\\n  -create-table-sql-file-path t.sql \\\n  -c show-pages -args \"3,4,5\"\n```\n\n`ToString` method will be invoked for every page examined and print on console. 
You can add `--json-style` or `--json-pretty-style` to print the information in a more human-readable way.\n\n#### Querying all records\n\n```\njava -jar innodb-java-reader-cli.jar \\\n  -ibd-file-path /usr/local/mysql/data/test/t.ibd \\\n  -create-table-sql-file-path t.sql \\\n  -c query-all\n```\n\nThe result is the same as `mysql -N -uroot -e \"select * from test.t\" \u003e output.dat`.\n\nFields are delimited by tabs; you can specify `-delimiter \",\"` to use a comma as the delimiter.\n\n#### Querying by page number\n\nThe argument is a page number. The result is all the records within that page; only the index page type is supported.\n\nFor a B+ tree non-leaf page, the records are keys only; for a leaf page, the records are full tuples.\n\n```\njava -jar innodb-java-reader-cli.jar \\\n  -ibd-file-path /usr/local/mysql/data/test/t.ibd \\\n  -create-table-sql-file-path t.sql \\\n  -c query-by-page-number -args 3\n```\n\n#### Querying by primary key\n\nThe argument is the target key.\n\n```\njava -jar innodb-java-reader-cli.jar \\\n  -ibd-file-path /usr/local/mysql/data/test/t.ibd \\\n  -create-table-sql-file-path t.sql \\\n  -c query-by-pk -args 5\n```\n\nFor a composite primary key, fields are delimited by `,`. You can change the delimiter with `-Dinnodb.java.reader.composite.key.delimiter` or by setting an environment variable.\n\nFor example:\n\n```\njava -jar innodb-java-reader-cli.jar \\\n  -ibd-file-path /usr/local/mysql/data/test/t.ibd \\\n  -create-table-sql-file-path t.sql \\\n  -c query-by-pk -args abc,123,bcd\n```\n\n#### Range querying by primary key\n\nArguments are `lower operator;lower bound;upper operator;upper bound`, separated by `;`.\n\nOperators include `\u003e`, `\u003e=`, `\u003c`, `\u003c=` and `nop` (for a side with no bound).\n\nYou can change the delimiter with `-Dinnodb.java.reader.range.query.key.delimiter` or by setting an environment variable.\n\n```\njava -jar innodb-java-reader-cli.jar \\\n  -ibd-file-path /usr/local/mysql/data/test/t.ibd \\\n  -create-table-sql-file-path t.sql \\\n  -c 
range-query-by-pk -args \"\u003e=;1;\u003c;3\"\n\njava -jar innodb-java-reader-cli.jar \\\n  -ibd-file-path /usr/local/mysql/data/test/t.ibd \\\n  -create-table-sql-file-path t.sql \\\n  -c range-query-by-pk -args \"\u003e=;1;nop;null\" # no upper limit\n```\n\nFor a composite key, args will look like `\u003e;abc,123,bcd;\u003c;xyz,5,jkl` or `\u003e;abc,123,bcd;\u003c;xyz,5,null`.\n\n#### Querying by secondary key\n\nArguments are the same as in \"Range querying by primary key\", and you must also provide the key name. For example, the result of the following command is the same as \"SELECT * FROM t WHERE a = 1\":\n\n```\njava -jar innodb-java-reader-cli.jar \\\n  -ibd-file-path /usr/local/mysql/data/test/t.ibd \\\n  -create-table-sql-file-path t.sql \\\n  -c query-by-sk -args \"\u003e=;1;\u003c=;1\" -skname \"key_a\"\n```\n\n#### Dump data\n\nYou can use the command-line tool to dump data, but dirty pages might not have been flushed to disk, so you must consider data consistency. You can dump records with `query-all` or `range-query-by-pk` as shown below.\n\n```\njava -jar innodb-java-reader-cli.jar \\\n  -ibd-file-path /usr/local/mysql/data/test/t.ibd \\\n  -create-table-sql-file-path t.sql \\\n  -c query-all -o output.dat\n```\n\nThe `output.dat` file contains one record per line, with fields delimited by tabs.\n\nThe result is the same as `mysql -N -uroot -e \"select * from test.t\" \u003e output.dat`.\n\n#### Specify dump IO mode\n\nBy default, dumping data uses `mmap` to write to the file; you can specify `-iomode buffer` or `-iomode direct` instead. Note that if `-o` is not used, relying on system IO redirection is not efficient.\n\n#### Generating LSN heatmap\n\nArguments are the output HTML file path, the heatmap width and the heatmap height.\n\n```\njava -jar innodb-java-reader-cli.jar \\\n  -ibd-file-path /usr/local/mysql/data/test/t.ibd \\\n  -create-table-sql-file-path t.sql \\\n  -c gen-lsn-heatmap -args \"./out.html 800 1000\"\n```\n\nHere is an example for a table with random primary key insertion order. 
Then many pages will be revisited. As illustrated in the image below, most of the pages are colored \"red\", which means the LSNs of those pages are close to each other.\n\n![](http://img.neoremind.com/wp-content/uploads/2020/05/random-pk.png)\n\nAnother example is a table with two indices: a primary key built by inserting rows in key order, and a secondary key with random insertion order. As you can see, the primary key index pages are written in ascending order, as they are visited from the beginning of the file to the end. Pages of the secondary key are colored \"red\", which means the LSNs of those pages are close to each other.\n\n![](http://img.neoremind.com/wp-content/uploads/2020/05/table-secondary-index2.png)\n\n#### Generating filling rate heatmap\n\nArguments are the output HTML file path, the heatmap width and the heatmap height.\n\n```\njava -jar innodb-java-reader-cli.jar \\\n  -ibd-file-path /usr/local/mysql/data/test/t.ibd \\\n  -create-table-sql-file-path t.sql \\\n  -c gen-filling-rate-heatmap -args \"./out.html 800 1000\"\n```\n\nFilling rate, also known as page filling factor, measures how efficiently InnoDB makes use of storage space. InnoDB stores records in a row-oriented layout, which is usually good for OLTP scenarios. In the big data industry, columnar storage formats are often preferred because they can read only the required data, skip unnecessary deserialization, leverage specific encodings and compress better, which saves a lot of storage space. Although the row-oriented format is not friendly in terms of file size, we still want to know the space occupied by data, because an InnoDB file can become fragmented due to logical deletion or B+ tree splitting. The filling rate of every page is calculated as `used space / page size`, where used space equals `heap_top_position + page_directory_slots_bytes + FilTrailer - garbage_space`. 
This is different from the `data_free` value you see when you examine a table through `information_schema.TABLES`; `data_free` is the space allocated on disk but not used.\n\nAssume we build a table by inserting rows in sequential order. The page filling rate will be more than 90 percent initially.\n\n![](docs/images/filling-rate1.png)\n\nAfter deleting some rows, the filling rate heatmap shows that some pages are fragmented and the filling rate drops dramatically.\n\n![](docs/images/filling-rate2.png)\n\nAfter `OPTIMIZE TABLE \u003cT\u003e`, the table filling rate will go back to more than 90 percent.\n\n![](docs/images/filling-rate3.png)\n\n#### Get all index page filling rate\n\n```\njava -jar innodb-java-reader-cli.jar \\\n  -ibd-file-path /usr/local/mysql/data/test/t.ibd \\\n  -create-table-sql-file-path t.sql \\\n  -c get-all-index-page-filling-rate\n```\n\n## 7 Building\n\n`innodb-java-reader` is a standard Maven project. Run the following command from the project root directory and make sure all unit tests pass.\n\n```\nmvn clean install\n```\n\nUse the executable jar `innodb-java-reader-cli/target/innodb-java-reader-cli.jar` to run commands.\n\n## 8 Benchmark\n\nFor a benchmark of `innodb-java-reader`, `mysql -e \"select..\" \u003e output` and `mysqldump`, please [visit here](docs/benchmark.md).\n\nThe TPC-H `LINEITEM` table scan result is shown below.\n\n![](http://img.neoremind.com/wp-content/uploads/2020/05/tpch_benchmark.png)\n\n## 9 Future work\n\n* Support the LOB page type newly introduced in MySQL 8.0.\n* Load table metadata from the system tablespace.\n* Support compressed tables.\n","funding_links":[],"categories":["数据库中间件","Java"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falibaba%2Finnodb-java-reader","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falibaba%2Finnodb-java-reader","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falibaba%2Finnodb-java-reader/lists"}