{"id":28325663,"url":"https://github.com/grailbio/gql","last_synced_at":"2025-06-23T15:31:41.977Z","repository":{"id":89120784,"uuid":"219051134","full_name":"grailbio/gql","owner":"grailbio","description":"Query language for bioinformatic data","archived":false,"fork":false,"pushed_at":"2019-11-01T20:07:58.000Z","size":305,"stargazers_count":7,"open_issues_count":0,"forks_count":2,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-06-02T06:21:21.119Z","etag":null,"topics":["bam","bioinformatics","golang"],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/grailbio.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-11-01T19:38:08.000Z","updated_at":"2024-01-20T19:30:12.000Z","dependencies_parsed_at":null,"dependency_job_id":"949faf32-3f85-4d6e-8cb7-7a48c9ea64a0","html_url":"https://github.com/grailbio/gql","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/grailbio/gql","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/grailbio%2Fgql","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/grailbio%2Fgql/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/grailbio%2Fgql/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/grailbio%2Fgql/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/grailbio","download_url":"https://codeload.github.com/grailbio/gql/tar.g
z/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/grailbio%2Fgql/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261504196,"owners_count":23168774,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bam","bioinformatics","golang"],"created_at":"2025-05-25T21:14:07.724Z","updated_at":"2025-06-23T15:31:41.964Z","avatar_url":"https://github.com/grailbio.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Grail query language\n\nGQL is a bioinformatics query language. You can think of it as an SQL with a funny\nsyntax.\n\n- It can read and write files on S3 directly.\n\n- It can read common bioinformatics files, such as TSV, BAM, BED as tables.\n\n- It can handle arbitrarily large files regardless of the memory capacity of the\n  machine. All data processing happens in a streaming fashion.\n\n- It supports [distributed execution](#distributed-execution) of key functions\n  using [bigslice](https://bigslice.io).\n\n- GQL language syntax is very different from SQL, but if you squint enough, you\n  can see the correspondence with SQL. GQL syntax differs from SQL for several\n  reasons. First, GQL needs to support hierarchical information. For example,\n  sequencing or quality data in a BAM record is treated as a subtable inside a\n  parent BAM table.  SQL handles such queries poorly - for example, SQL cannot\n  read tables whose names are written in a column in another table. 
Second, some\n  GQL functions, such as `transpose`, have no corresponding SQL counterpart.\n\n## Installing GQL\n\n    go install github.com/grailbio/gql\n\n## Running GQL\n\nIf you invoke `gql` without arguments, it starts an interactive prompt,\n`gql\u003e`.  Below is the list of interactive commands:\n\n- `help` : shows a help message. You can also invoke 'help name', where _name_\n  is a function or variable name, to show the description of that function or\n  variable. For example, \"help map\" will show the help message for the `map` builtin function.\n\n- `logdir [dir]` : sends log messages to files under the given directory.\n  If the directory is omitted, messages will be sent to stderr.\n\n- `quit` : quits gql\n\n- Any other command will be evaluated as a GQL expression.  In interactive\n  mode, a newline will start evaluation, so an expression must fit in one line.\n\nYou can also invoke GQL with a file containing gql statements.  Directory\n\"scripts\" contains simple examples.\n\nWhen evaluating GQL in a script, an expression can span multiple lines. An\nexpression is terminated by ';' or EOF.\n\nYou can pass a sequence of `-flag=value` or just `-flag` after the script name.\nThe `flag` will become a global variable and will be accessible as `flag` in the\nscript.  In the example below, we pass two flags \"in\" and \"out\" to the script\n`testdata/convert.gql`.\n\n    gql testdata/convert.gql -in=/tmp/test.tsv -out=/tmp/test.btsv\n\n### Basic functions\n\n\n#### Reading a table\n\nImagine you have the following TSV file, `gql/testdata/file0.tsv`:\n\n        A\tB\tC\n        10\tab0\tcd0\n        11\tab1\tcd1\n\n\nGQL treats a tsv file as a 2D table.  Function [read](#read) loads a TSV\nfile.\n\n    read(`gql/testdata/file0.tsv`) \n\n\n| #|  A|   B|   C|\n|--|---|----|----|\n| 0| 10| ab0| cd0|\n| 1| 11| ab1| cd1|\n\n\n\nThe first column, '#', is not a real column. 
It just shows the row number.\n\n#### Selection and projection using `filter` and `map`\n\nOperator \"|\" feeds the contents of a table to a function.  [filter](#filter) is a\nselection function. It picks rows that match `expr` and combines them into a\nnew table.  Within `expr`, you can refer to a column by prefixing `\u0026`.  `\u0026A`\nmeans \"value of the column `A` in the current row\".\n\"var := expr\" assigns the value of expr to the variable.\n\n    f0 := read(`gql/testdata/file0.tsv`);\n    f0 | filter(\u0026A==10)\n\n\n| #|  A|   B|   C|\n|--|---|----|----|\n| 0| 10| ab0| cd0|\n\n\n\n\u003e Note: `\u0026A==10` is a shorthand for `|_| _.A==10`, which is a function that takes\n argument `_` and computes `_.A==10`. GQL syntactically translates an expression\n that contains '\u0026' into a function. So you can write the above example as\n `f0 | filter(|_|{_.A==10})`. It produces the same result. We will discuss functions\n and '\u0026'-expansions in more detail [later](#functions).\n\n\u003e Note: Expression `table | filter(...)` can also be written as `filter(table, ...)`.\n These two are exactly the same, but the former is usually easier to\n read.  This syntax applies to any GQL function that takes a table as the first\n argument: `A(x) | B(y)` is the same as `B(A(x), y)`. Thus:\n\n    f0 | filter(\u0026A==11)\n\n\n| #|  A|   B|   C|\n|--|---|----|----|\n| 0| 11| ab1| cd1|\n\n\n\nFunction [map](#map) projects a table.\n\n    f0 | map({\u0026A, \u0026C})\n\n\n| #|  A|   C|\n|--|---|----|\n| 0| 10| cd0|\n| 1| 11| cd1|\n\n\n\nIn fact, `map` and `filter` are internally one function that accepts different\nkinds of arguments. 
You can combine projection and selection in one function,\nlike below:\n\n    f0 | map({\u0026A, \u0026C}, filter:=string_has_suffix(\u0026C, \"0\"))\n\n\n| #|  A|   C|\n|--|---|----|\n| 0| 10| cd0|\n\n\n\n\n    f0 | filter(string_has_suffix(\u0026C, \"0\"), map:={\u0026A, \u0026C})\n\n\n| #|  A|   C|\n|--|---|----|\n| 0| 10| cd0|\n\n\n\n#### Sorting\n\nFunction [sort](#sort) sorts the table in ascending order of the argument. Giving \"-\u0026A\"\nas the sort key will sort the table in descending order of column A:\n\n    f0 | sort(-\u0026A)\n\n\n| #|  A|   B|   C|\n|--|---|----|----|\n| 0| 11| ab1| cd1|\n| 1| 10| ab0| cd0|\n\n\n\nThe sort key can be an arbitrary expression. For example, `f0 | sort(\u0026C+\u0026B)` will\nsort rows in ascending order of the concatenation of columns C and B.\nSince the rows of `f0` are already in that order, `f0 | sort(\u0026C+\u0026B)` will\nproduce the same table as `f0` itself.\n\n#### Joining multiple tables\n\nImagine file `gql/testdata/file1.tsv` with the following contents.\n\n        C\tD\tE\n        10\tef0\tef1\n        12\tgh0\tgh1\n\n\nFunction [join](#join) joins multiple tables into one table.\n\n    f1 := read(`gql/testdata/file1.tsv`);\n    join({f0, f1}, f0.A==f1.C, map:={A:f0.A, B:f0.B, C: f0.C, D:f1.D})\n\n\n| #|  A|   B|   C|   D|\n|--|---|----|----|----|\n| 0| 10| ab0| cd0| ef0|\n\n\n\nThe first argument to join is the list of tables to join. It is of the form\n`{name0: table0, ..., nameN: tableN}`. Tag `nameK` can be used to name row\nvalues for `tableK` in the rest of the join expression. If you omit the `nameK:`\npart of the table list, join tries to guess a reasonable name from the column\nvalue.\n\nNOTE: See [struct comprehension](#struct-comprehension) for more details\nabout how column names are computed when they are omitted.\n\nThe second argument is the join condition.  The optional 'map:=rowspec' argument\nspecifies the list of columns in the final result.  
Join will produce the Cartesian\nproduct of rows in the input tables, then create a new table consisting of the\nresults that satisfy the given join condition. For good performance, the join\ncondition should be a conjunction of column equality constraints (`table0.col0 ==\ntable1.col0 \u0026\u0026 ... \u0026\u0026 table1.col2 == table2.col2`). Join recognizes this form of\ncondition and executes the operation more efficiently.\n\n\n#### Cogroup and reduce\n\nImagine a file `gql/testdata/file2.tsv` like below:\n\n        A\tB\n        cat\t1\n        dog\t2\n        cat\t3\n        bat\t4\n\n\nFunction [cogroup](#cogroup) is a special kind of join, similar to \"select ... group by\"\nin SQL. Cogroup aggregates rows in a table by a column.  The result is a\ntwo-column table where the \"key\" column is the aggregation key, and the \"value\"\ncolumn is a subtable that stores the rows with the given key.\n\n    read(`gql/testdata/file2.tsv`) | cogroup(\u0026A)\n\n\n| #| key|                     value|\n|--|----|--------------------------|\n| 0| bat|             [{A:bat,B:4}]|\n| 1| cat| [{A:cat,B:1},{A:cat,B:3}]|\n| 2| dog|             [{A:dog,B:2}]|\n\n\n\nColumn `[{A:cat,B:1},{A:cat,B:3}]` is a shorthand notation for\nthe following table:\n\n| A   | B |\n|-----|---|\n| cat | 1 |\n| cat | 3 |\n\n\nFunction [reduce](#reduce) is similar to [cogroup](#cogroup), but it applies a user-defined\nfunction over rows with the same key. The function must be commutative. The\nresult is a two-column table with nested columns. Reduce is generally faster\nthan cogroup.\n\n    read(`gql/testdata/file2.tsv`) | reduce(\u0026A, |a,b|(a+b), map:=\u0026B)\n\n\n| #| key| value|\n|--|----|------|\n| 0| cat|     4|\n| 1| dog|     2|\n| 2| bat|     4|\n\n\n\nFunctions such as `cogroup` and `reduce` exist because they are much more amenable to\ndistributed execution than generic joins. 
We explain distributed\nexecution in a [later section](#distributed-execution).\n\n\n### Distributed execution\n\nFunctions `map`, `reduce`, `cogroup`, `sort`, and `minn` accept an argument\n`shards`. When set, it causes the functions to execute in parallel.  By default\nthey will run on the local machine, using multiple CPUs. Invoking GQL with the\nfollowing flags will cause these functions to use AWS EC2 spot instances.\n\n    gql scripts/cpg-frequency.gql\n\nTo set bigslice up, follow the instructions in https://bigslice.io.\n\n## GQL implementation overview\n\nGQL is a dataflow language, much like other SQL variants. Each table, be it a\nleaf table created by `read` and other functions, or an intermediate table\ncreated by functions like `map` and `join`, becomes a dataflow node, and rows\nflow between nodes. Consider a simple example for file foo.tsv:\n\n     read(`foo.tsv`) | filter(\u0026A \u003e 10) | sort(-\u0026B) | write(`blah.tsv`)\n\nwhere foo.tsv is something like below:\n\n         A      B\n         10     ab0\n         11     ab1\n\nGQL creates four nodes connected sequentially.\n\n     read(`foo.tsv`) ---\u003e filter(\u0026A \u003e 10) ----\u003e sort(-\u0026B) ----\u003e write(`blah.tsv`)\n\nA dataflow graph is driven from the tail. In this example, `write` pulls rows\nfrom `sort`, which pulls rows from `filter`, and so on. Tables are materialized\nonly when necessary. 
`read` and `filter` just pass through rows one by one,\nwhereas `sort` needs to read all the rows from the source and materialize them\non disk.\n\n       f0 := read(`f0.tsv`); f1 := read(`f1.tsv`); f2 := read(`f2.tsv`)\n       join({f0, f1, f2}, f0.A==f1.A \u0026\u0026 f1.C==f2.C)\n\n`Join` function merges nodes.\n\n     read(`f0.tsv`) --\u003e sort(f0.A) \\\n                                    merge(f0.A==f1.A) ---\u003e sort(f1.C) \\\n     read(`f1.tsv`) --\u003e sort(f1.A) /                                   merge(f1.C==f2.C)\n                                                                      /\n     read(`f2.tsv`) ----------------------------------\u003e sort(f2.C)   /\n\n\nA function that performs bigslice-based distributed execution ships the code\nto remote machines. For example:\n\n       read(`s3://bucket/f.tsv`) | cogroup({\u0026A}, shards:=1024)\n\nThe expression \"read(`s3://bucket/f.tsv`)\" along with all the variable bindings\nneeded to run the expression is serialized and shipped to every bigslice shard.\n\n## GQL syntax\n\n    toplevelstatements := (toplevelstatement ';')* toplevelstatement ';'?\n\n    toplevelstatement :=\n      'load' path\n      | expr\n      | assignment\n      | 'func' symbol '(' params* ')' expr // shorthand for symbol:=|symbol...| expr\n\n    assignment := variable ':=' expr\n\n    expr :=\n      symbol                  // variable reference\n      | literal               // 10, 5.5, \"str\", 'x', etc\n      | expr '(' params* ')'  // function call\n      | expr '|' expr         // data piping\n      | expr '||' expr        // logical or\n      | expr '\u0026\u0026' expr        // logical and\n      | 'if' expr expr 'else' expr // conditional\n      | 'if' expr expr 'else' 'if' expr 'else' ... // if .. else if .. else if .. else ..\n      | '{' structfield, ..., structfield '}'\n      | expr '.' 
colname\n      | block\n      | funcdef\n\n    structfield := expr | colname ':' expr\n    literal := int | float | string | date | 'NA'\n    string := \"foo\" | `foo`\n    date := iso8601 format literal, either 2018-03-01 or 2018-03-01T15:40:41Z or 2018-03-01T15:40:41-7:00\n\n    funcdef := '|' params* '|' expr\n    block := '{' (assignment ';')* expr '}'\n\n'?' means optional, '*' means zero or more repetitions, and (...) means grouping of parts.\n\n## Data types\n\n### Table\n\nWhen you read a TSV file, GQL internally turns it into a *Table*. Table contains\na sequence of *rows*, or *structs*.  We use terms structs and rows\ninterchangeably.  Tables are created by loading a TSV or other table-like files\n(*.prio, *.bed, etc). It is also created as a result of function invocation\n(`filter`, `join`, etc).  Invoking `read(\"foo.tsv\")` creates a table from file\n\"foo.tsv\". Function `write(table, \"foo.tsv\")` saves the contents of the table in\nthe given file.  The `read` function supports the following file formats:\n\n  - *.tsv : Tab-separated-value file. The list of columns must be listed in the\n  first row of the file. GQL tries to guess the column type from the file\n  contents.\n\n  - .prio : [Fragment file](https://sg.eng.grail.com/grail/grail/-/blob/go/src/grail.com/bio/fragments/f.go). Each fragment is mapped into a row of the following format:\n\n| Column name | type     |\n|-------------|----------|\n| reference   | string   |\n| start       | int   |\n| length      | int   |\n| plusstrandread       | int   |\n| duplicate       | bool   |\n| corgcount       | int   |\n| pvalue       | float   |\n| methylationstates       | array table of integers |\n| methylationcoverage | array table |\n| methylationbasequalitiesread1 | array table of integers |\n| methylationbasequalitiesread2 | array table of integers |\n\nAn array table is a table with two columns, 'position' and 'value'. 
The position\ncolumn stores the position of the given value, relative to the reference.  For\nexample, imagine that the original fragment object has the following fields:\n\n```\n    frag := F{\n      Reference: \"chr11\",\n      FirstCpG: 12345,\n      MethylationStates: []MethylationState{UNMETHYLATED, METHYLATED, VARIANT},\n      ...\n    }\n```\n\nThen, the corresponding GQL representation of methylationstates will be\n\n```\n    table({position: 12345, value: 1},\n          {position: 12346, value: 2},\n          {position: 12347, value: 4})\n```\n\nNote: In `fragment.proto`, UNMETHYLATED=1, METHYLATED=2, VARIANT=4.\n\n  - .bam, .pam : [BAM](https://samtools.github.io/hts-specs/SAMv1.pdf) or [PAM](https://github.com/grailbio/bio/blob/master/encoding/pam/README.md) file\n\n  - .bed: [BED file](https://genome.ucsc.edu/FAQ/FAQformat.html), three to four columns\n\n  - .bincount : bincount file. Each row has the following columns:\n\n| Column name | type     |\n|-------------|----------|\n| chrom   | string   |\n| start       | int   |\n| end      | int   |\n| gc       | int   |\n| count       | int   |\n| length       | int   |\n| density       | float   |\n\n  - .cpg_index : CpG index file\n\n  - .btsv : BTSV is a binary version of TSV. It is a format internally used by GQL to\n    save and restore table contents.\n\nThe `write` function currently supports `*.tsv` and `*.btsv` file types. If\npossible, save the data in *.btsv format. It is faster and more compact.\n\nUnlike SQL, rows in a table need not be homogeneous - they can technically have\ndifferent sets of columns. Such a table can be created, for example, by\nflattening tables with inconsistent schemata. We don't recommend creating such\ntables though.\n\n### Row (struct)\n\nStruct represents a row in a table. We also use the term \"row\" to mean the same\nthing. 
We use the term \"column\" or \"field\" to refer to an individual value in a row.\nA column usually stores a scalar value, such as an integer or a string.  But a\ncolumn may also store another table. This happens when a column contains the pathname of\nanother TSV file (or bincounts, BAM, etc). GQL automatically creates a subtable\nfor such columns.\n\nSeveral builtin functions are provided to manipulate rows. They are described in\nmore detail later in this document.\n\n  - `{col1:expr1, col2:expr2, ..., colN:exprN}` creates a new row with N columns\n    named `col1`, `col2`, ..., `colN`. See\n    [struct comprehension](#struct-comprehension) for more details.\n\n  - `pick(table, expr)` extracts the first row that satisfies expr from a table\n\n\n### Scalar values\n\nGQL supports the following scalar types.\n\n  - Integer (int64)\n\n  - Floating point (float64)\n\n  - String\n\n    Strings are enclosed in either \"doublequotes\" or \\`backquotes\\`, like in\n    Go.  Note that singlequotes are not supported.\n\n  - Filename: a filename is a string that refers to another file.  A filename\n    whose pathname looks like a table (*.tsv, *.prio, *.bed, etc) is\n    automatically opened as a subtable.\n\n  - Char: utf8 character. 'x', 'y'.\n\n  - DateTime (date \u0026 time of day)\n\n  - Date  (date without a time-of-day component)\n\n  - Time  (time of day)\n\n    Date, DateTime, and Time are written in\n    [ISO8601](https://en.wikipedia.org/wiki/ISO_8601) format.\n    Below are samples:\n\n        2018-04-16\n        2018-04-16T15:19:35Z\n        2018-04-16T15:19:35-7:00\n        15:19:35Z\n        15:19:35-8:00\n\n  - Null: Null is a special value indicating \"value isn't there\".  A TSV cell\n    with value \"NA\", \"null\", or \"\" becomes a Null in GQL. Symbol \"NA\" also\n    produces null.\n\n    In GQL, a null value is treated as ∞. 
Thus:\n\n        1 \u003c NA\n        1 \u003e -NA\n        \"foo\" \u003c NA\n        \"foo\" \u003e -NA\n\n## Control-flow expressions\n\n    expr0 || expr1\n    expr0 \u0026\u0026 expr1\n    if expr0 expr1 else expr2\n    if expr0 expr1 else if expr2 expr3 else ...\n\nThese expressions have grammars and meanings similar to Go's.\nThe 'if' construct is slightly different:\n\n- The then and else parts are expressions, and they need not be enclosed in\n  '{...}'.\n\n- The 'else' part is required.\n\n- 'If' is an expression. Its value is that of the 'then' expression if the\n  condition is true, and that of the 'else' expression otherwise. In the example below, the value of y is \"bar\".\n\n      x := 10;\n      y := if x \u003e 10 \"foo\" else \"bar\"\n\n## Code blocks\n\n      { assignments... expr }\n\nAn expression of form `{ assignments... expr }` introduces local variables.  The\nfollowing example computes 110 (= 10 * (10+1)). The variables \"x\" and \"y\" are\nvalid only inside the block.\n\n      { x := 10; y := x+1; x * y }\n\nA block is often used as the body of a function, which we describe next.\n\n## Functions\n\nAn expression of form `|args...| expr` creates a function.  It can be assigned\nto a variable and invoked later. Consider the following example:\n\n      udf := |row| row.date \u003e= 2018-03-05;\n      read(`foo.tsv`) | filter(|row|udf(row))\n\nIt is the same as\n\n      read(`foo.tsv`) | filter(\u0026date \u003e= 2018-03-05)\n\nGQL also provides syntax sugar\n\n      func udf(row) row.date \u003e= 2018-03-05\n\nIt is the same as `udf := |row| row.date \u003e= 2018-03-05`\n\nThe function body is often a [code block](#code-blocks).\n\n      func ff(arg) {\n         x := arg + 1;\n         y := x * x;\n         y * 3\n      };\n      ff(4)\n\nThe above example will print 75. 
The variables x and y defined in the body of\n`ff` are local to the function invocation, just like in regular Go functions.\n\n\u003e Note: function arguments are lexically bound. Functions can be nested, and they\nact as closures.\n\nThe '\u0026'-expressions introduced in [earlier examples](#basic-functions) are\nsyntax sugar for user-defined functions. They are translated into\n[functions](#functions) by the GQL parser. The translation rules are the\nfollowing:\n\n- '\u0026'-translation applies only to a function-call argument. It is an error for\n  '\u0026' to appear anywhere else.\n\n- When a function-call arg contains '\u0026' anywhere, then the entire argument is\n  translated into a function with a formal argument named '_', and every\n  occurrence of form '\u0026col' is rewritten to become '_.col'.\n\n    Original:      table | map({x:\u0026col0, y:\u0026col1})\n    After rewrite: table | map(|_|{x:_.col0, y:_.col1})\n\n    Original:      table | map({x:\u0026col0, y:\u0026col1}) | sort(\u0026x)\n    After rewrite: table | map(|_|{x:_.col0, y:_.col1}) | sort(|_|_.x)\n\n\nThe '\u0026' rule applies recursively to the entire argument, so it may behave\nnonintuitively if '\u0026' appears inside a nested function call. Consider the\nfollowing example, which is a bit confusing.\n\n    Original:      table | map(map(\u0026col))\n    After rewrite: table | map(|_|map(_.col))\n\nSo we recommend using '\u0026' only for simple expressions like the earlier examples.\nFor nested maps and other complex examples, it's cleaner to use an explicit\nfunction syntax of form `|args...|expr`.\n\n## Deprecated GQL constructs\n\nThe following expressions are deprecated and will be removed soon.\n\n- \"$var\"\n\n\"$var\" is very similar to \"\u0026var\". Expression `table | filter($A==10)` will act\nthe same as `table | filter(\u0026A==10)`. 
The difference is that the '$'-rule is\nhard-coded into a few specific builtin functions, such as `map`, `filter`, and\n`sort`, whereas '\u0026' is implemented as a generic language rule.\n\n- func(args...) { statements... }\n\nThis is an old-style function syntax. Use `|args...| expr` instead.  The\ndifference between the two is that the `func` syntax requires '{' and '}' even if\nthe body is a single expression. The '|args...|' form takes '{...}' only when\nthe body is a [code block](#code-blocks).\n\n## Importing a GQL file\n\nThe load statement can be used to load a gql file into another gql file.\n\nAssume file `file1.gql` has the following contents:\n\n       x := 10\n\nAssume file `file2.gql` has the following contents:\n\n       load `file1.gql`\n       x * 2\n\nIf you evaluate file2.gql, it will print \"20\".\n\nA gql file can contain multiple load statements.  The load statements must\nappear before any other statement.\n\n## Builtin functions\n\n### Table manipulation\n\n#### map\n\n\n    tbl | map(expr[, expr, expr, ...] [, filter:=filterexpr] [, shards:=nshards])\n\nArg types:\n\n- _expr_: one-arg function\n- _filterexpr_: one-arg boolean function (default: ::|_|true::)\n- _nshards_: int (default: 0)\n\nMap picks rows that match _filterexpr_ from _tbl_, then applies _expr_ to each\nmatched row.  
If there are multiple _expr_s, map applies each\nof the expressions to every matched row in _tbl_ and combines the results in the output.\n\nIf _filterexpr_ is omitted, it will match any row.\nIf _nshards_ \u003e 0, it enables distributed execution.\nSee the [distributed execution](#distributed-execution) section for more details.\n\nExample: Imagine table ::t0:: with the following contents:\n\n|col0 | col1|\n|-----|-----|\n|Cat  | 3   |\n|Dog  | 8   |\n\n    t0 | map({f0:\u0026col0+\u0026col0, f1:\u0026col1*\u0026col1})\n\nwill produce the following table\n\n|f0      | f1   |\n|--------|------|\n|CatCat  | 9    |\n|DogDog  | 64   |\n\nThe above example is the same as below.\n\n    t0 | map(|r|{f0:r.col0+r.col0, f1:r.col1*r.col1})\n\n\nThe next example\n\n    t0 | map({f0:\u0026col0+\u0026col0}, {f0:\u0026col0}, filter:=\u0026col1\u003e4)\n\nwill produce the following table\n\n|f0      |\n|--------|\n|DogDog  |\n|Dog     |\n\n\n\n#### filter\n\n\n    tbl | filter(expr [,map:=mapexpr] [,shards:=nshards])\n\nArg types:\n\n- _expr_: one-arg boolean function\n- _mapexpr_: one-arg function (default: ::|row|row::)\n- _nshards_: int (default: 0)\n\nFunctions [map](#map) and filter are actually the same function, with slightly\ndifferent syntaxes.  ::tbl|filter(expr, map:=mapexpr):: is the same as\n::tbl|map(mapexpr, filter:=expr)::.\n\n\n#### reduce\n\n\n    tbl | reduce(keyexpr, reduceexpr [,map:=mapexpr] [,shards:=nshards])\n\nArg types:\n\n- _keyexpr_: one-arg function\n- _reduceexpr_: two-arg function\n- _mapexpr_: one-arg function (default: ::|row|row::)\n- _nshards_: int (default: 0)\n\nReduce groups rows by their _keyexpr_ value. 
It then invokes _reduceexpr_ for\nrows with the same key.\n\nArgument _reduceexpr_ is invoked repeatedly to combine rows or values with the same key.\n\n  - The optional 'map' argument specifies the argument passed to _reduceexpr_.\n    The default value (identity function) is virtually never a good function, so\n    you should always specify a _mapexpr_ arg.\n\n  - _reduceexpr_ must produce a value of the same type as the input args.\n\n  - The _reduceexpr_ must be a commutative expression, since the values are\n    passed to _reduceexpr_ in an unspecified order. If you want to preserve the\n    ordering of values in the original table, use the [cogroup](#cogroup)\n    function instead.\n\n  - If the source table contains only one row for a particular key, the\n    _reduceexpr_ is not invoked. The 'value' column of the resulting table will\n    be the row itself, or the value of the _mapexpr_, if the 'map' arg is set.\n\nIf _nshards_ \u003e 0, it enables distributed execution.\nSee the [distributed execution](#distributed-execution) section for more details.\n\nExample: Imagine table ::t0:::\n\n|col0 | col1|\n|-----|-----|\n|Bat  |  3  |\n|Bat  |  4  |\n|Bat  |  1  |\n|Cat  |  4  |\n|Cat  |  8  |\n\n::t0 | reduce(\u0026col0, |a,b|a+b, map:=\u0026col1):: will create the following table:\n\n|key  | value|\n|-----|------|\n|Bat  | 8    |\n|Cat  | 12   |\n\n::t0 | reduce(\u0026col0, |a,b|a+b, map:=1):: will count the occurrences of col0 values:\n\n|key  | value|\n|-----|------|\n|Bat  | 3    |\n|Cat  | 2    |\n\n\nA slightly silly example, ::t0 | reduce(\u0026col0, |a,b|a+b, map:=\u0026col1*2):: will\nproduce the following table.\n\n|key  | value|\n|-----|------|\n|Bat  | 16   |\n|Cat  | 24   |\n\n\u003e Note: ::t0 | reduce(\u0026col0, |a,b|a+b.col1):: looks to be the same as\n::t0 | reduce(\u0026col0, |a,b|a+b, map:=\u0026col1)::, but the former is an error. The result of the\n_reduceexpr_ must be of the same type as the inputs. 
For this reason, you\nshould always specify a _mapexpr_.\n\n\n#### flatten\n\n\n    flatten(tbl0, tbl1, ..., tblN [,subshard:=subshardarg])\n\nArg types:\n\n- _tbl0_, _tbl1_, ... : table\n- _subshardarg_: boolean (default: false)\n\n::tbl | flatten():: (or ::flatten(tbl)::) creates a new table that concatenates the rows of the subtables.\nEach table _tbl0_, _tbl1_, .. must be a single-column table where each row is\nanother table. Imagine two tables ::table0:: and ::table1:::\n\ntable0:\n\n|col0 | col1|\n|-----|-----|\n|Cat  | 10  |\n|Dog  | 20  |\n\ntable1:\n\n|col0 | col1|\n|-----|-----|\n|Bat  | 3   |\n|Pig  | 8   |\n\nThen ::flatten(table(table0, table1)):: produces the following table\n\n|col0 | col1|\n|-----|-----|\n|Cat  | 10  |\n|Dog  | 20  |\n|Bat  | 3   |\n|Pig  | 8   |\n\n::flatten(tbl0, ..., tblN):: is equivalent to\n::flatten(table(flatten(tbl0), ..., flatten(tblN)))::.\nThat is, it flattens each of _tbl0_, ..., _tblN_, then\nconcatenates their rows into one table.\n\nParameter _subshard_ specifies how the flattened table is sharded when it is used\nas an input to distributed ::map:: or ::reduce::. When _subshard_ is false (default),\n::flatten:: simply shards rows in the input tables (_tbl0_, _tbl1_, ..., _tblN_). This works\nfine if the number of rows in the input tables is much larger than the shard count.\n\nWhen ::subshard:=true::, flatten will shard the individual subtables\ncontained in the input tables (_tbl0_, _tbl1_, ..., _tblN_). This mode will work better\nwhen the input tables contain a small number (~1000s) of rows, but each\nsubtable can be very large. The downside of subsharding is that the flatten\nimplementation must read all the rows in the input tables beforehand to figure\nout their size distribution. 
So it can be very expensive when input tables\ncontain many rows.\n\n\n#### concat\n\n\n    concat(tbl...)\n\nArg types:\n\n- _tbl_: table\n\n::concat(tbl1, tbl2, ..., tblN):: concatenates the rows of tables _tbl1_, ..., _tblN_\ninto a new table. Concat differs from flatten in that it attempts to keep\nsimple tables simple: that is, tables that are backed by (in-memory) values\nare retained as in-memory values; thus concat is designed to build up small(er)\ntable values, e.g., in a map or reduce operation.\n\n\n\n#### cogroup\n\n\n    tbl | cogroup(keyexpr [,map:=mapexpr] [,shards:=nshards])\n\nArg types:\n\n- _keyexpr_: one-arg function\n- _mapexpr_: one-arg function (default: ::|row|row::)\n- _nshards_: int (default: 1)\n\nCogroup groups rows by their _keyexpr_ value.  It is the same as Apache Pig's\nreduce function. It achieves an effect similar to SQL's \"GROUP BY\" statement.\n\nArgument _keyexpr_ is any expression that can be computed from row contents. For\neach unique key as computed by _keyexpr_, cogroup emits a two-column row of form\n\n    {key: keyvalue, value: rows}\n\nwhere _keyvalue_ is the value of keyexpr, and _rows_ is a table containing all\nthe rows in tbl with the given key.\n\nIf argument _mapexpr_ is set, the _value_ column of the output will be the\nresult of applying the _mapexpr_.\n\nExample: Imagine table t0:\n\n|col0 | col1|\n|-----|-----|\n|Bat  |  3  |\n|Bat  |  1  |\n|Cat  |  4  |\n|Bat  |  4  |\n|Cat  |  8  |\n\n::t0 | cogroup(\u0026col0):: will create the following table:\n\n|key  | value|\n|-----|------|\n|Bat  | tmp1 |\n|Cat  | tmp2 |\n\nwhere table tmp1 is as below:\n\n|col0 | col1|\n|-----|-----|\n|Bat  |  3  |\n|Bat  |  1  |\n|Bat  |  4  |\n\ntable tmp2 is as below:\n\n|col0 | col1|\n|-----|-----|\n|Cat  |  4  |\n|Cat  |  8  |\n\n::t0 | cogroup(\u0026col0, map:=\u0026col1):: will create the following table:\n\n|key  | value|\n|-----|------|\n|Bat  | tmp3 |\n|Cat  | tmp4 |\n\nEach row in table tmp3 is a scalar, as 
below\n\n|  3  |\n|  1  |\n|  4  |\n\nSimilarly, table tmp4 looks like below.\n\n|  4  |\n|  8  |\n\nThe cogroup function always uses bigslice for execution.  The _shards_ parameter\ndefines parallelism. See the [distributed execution](#distributed-execution) section for more details.\n\n\n#### firstn\n\n\n    tbl | firstn(n)\n\nArg types:\n\n- _n_: int\n\nFirstn produces a table that contains the first _n_ rows of the input table.\n\n\n#### minn\n\n\n    tbl | minn(n, keyexpr [, shards:=nshards])\n\nArg types:\n\n- _n_: int\n- _keyexpr_: one-arg function\n- _nshards_: int (default: 0)\n\nMinn picks the _n_ rows with the smallest _keyexpr_ values. If _n_\u003c0, minn sorts\nthe entire input table.  Keys are compared lexicographically.\nNote that we also have a\n::sort(keyexpr, shards:=nshards):: function that's equivalent to ::minn(-1, keyexpr, shards:=nshards)::.\n\nThe _nshards_ arg enables distributed execution.\nSee the [distributed execution](#distributed-execution) section for more details.\n\nExample: Imagine table t0:\n\n|col0 | col1| col2|\n|-----|-----|-----|\n|Bat  |  3  | abc |\n|Bat  |  4  | cde |\n|Cat  |  4  | efg |\n|Cat  |  8  | ghi |\n\n::minn(t0, 2, -\u0026col1):: will create\n\n|col0 | col1| col2|\n|-----|-----|-----|\n|Cat  |  8  | ghi |\n|Cat  |  4  | efg |\n\n::minn(t0, 2, -\u0026col0):: will create\n\n\n|col0 | col1| col2|\n|-----|-----|-----|\n|Cat  |  4  | efg |\n|Cat  |  8  | ghi |\n\nYou can sort by multiple keys using {}. 
For example,\n::t0 | minn(10000, {\u0026col0,-\u0026col2}):: will sort rows first by col0, then by -col2 in case of a\ntie.\n\n|col0 | col1| col2|\n|-----|-----|-----|\n|Bat  |  4  | cde |\n|Bat  |  3  | abc |\n|Cat  |  8  | ghi |\n|Cat  |  4  | efg |\n\n\n\n#### sort\n\n\n    tbl | sort(sortexpr [, shards:=nshards])\n\n::tbl | sort(expr):: is a shorthand for ::tbl | minn(-1, expr)::.\n\n#### join\n\n\n    join({t0:tbl0,t1:tbl1,t2:tbl2}, t0.colA==t1.colB \u0026\u0026 t1.colB == t2.colC [, map:={colx:t0.colA, coly:t2.colC}])\n\nArg types:\n\n- _tbl0_, _tbl1_, ..: table\n\nThe join function joins multiple tables into one. The first argument lists the\ntables and their mnemonics in struct form. The 2nd arg is the join condition.\nThe ::map:: arg specifies the format of the output rows.\n\nImagine the following tables:\n\ntable0:\n\n|colA | colB|\n|-----|-----|\n|Cat  | 3   |\n|Dog  | 8   |\n\ntable1:\n\n|colA | colC|\n|-----|-----|\n|Cat  | red |\n|Bat  | blue|\n\n\nExamples:\n\n1. ::join({t0:table0, t1:table1}, t0.colA==t1.colA, map:={colA:t0.colA, colB: t0.colB, colC: t1.colC})::\n\nThis expression performs an inner join of t0 and t1.\n\n|colA | colB| colC|\n|-----|-----|-----|\n|Cat  | 3   | red |\n\n\n2. ::join({t0:table0, t1:table1}, t0.A?==?t1.A,map:={A:t0.A, A2:t1.A,B:t0.B, c:t1.C})::\n\nThis expression performs an outer join of t0 and t1.\n\n|   A|  A2|  B|    c|\n|----|----|---|-----|\n|  NA| bat| NA| blue|\n| cat| cat|  3|  red|\n| dog|  NA|  8|   NA|\n\n\nThe join condition doesn't need to be just \"==\"s connected by \"\u0026\u0026\"s. It can be\nany expression, although join provides a special fast-execution path for flat,\nconjunctive \"==\"s, so use them whenever possible.\n\nCaution: join currently is very slow on large tables. 
Talk to ysaito if you see\nany problem.\n\n\nTODO: describe left/right joins (use ==?, ?==)\nTODO: describe cross joins (set non-equality join conditions, such as t0.colA \u003e= t1.colB)\n\n#### transpose\n\n\n    tbl | transpose({keycol: keyexpr}, {col0:expr0, col1:expr1, .., valcol:valexpr})\n\nArg types:\n\n- _keyexpr_: one-arg function\n- _expri_: one-arg function\n- _valexpr_: one-arg function\n\nThe transpose function creates a table that transposes the given table,\nsynthesizing column names from the cell values. _tbl_ must be a two-column table\ncreated by [cogroup](#cogroup). Imagine table t0 created by cogroup:\n\nt0:\n\n|key  |value |\n|-----|------|\n|120  | tmp1 |\n|130  | tmp2 |\n\n\nEach cell in the ::value:: column must be another table, for example:\n\ntmp1:\n\n|chrom|start|  end|count|\n|-----|-----|-----|-----|\n|chr1 |    0|  100|  111|\n|chr1 |  100|  200|  123|\n|chr2 |    0|  100|  234|\n\n\ntmp2:\n\n|chrom|start|  end|count|\n|-----|-----|-----|-----|\n|chr1 |    0|  100|  444|\n|chr1 |  100|  200|  456|\n|chr2 |  100|  200|  478|\n\n\n::t0 | transpose({sample_id:\u0026key}, {\u0026chrom, \u0026start, \u0026end, \u0026count}):: will produce\nthe following table:\n\n\n|sample_id| chr1_0_100| chr1_100_200| chr2_0_100| chr2_100_200|\n|---------|-----------|-------------|-----------|-------------|\n|120      |   111     |   123       |   234     |    NA       |\n|130      |   444     |   456       |   NA      |   478       |\n\nThe _keyexpr_ must produce a struct with \u003e= 1 column(s).\n\nThe 2nd arg to transpose must produce a struct with \u003e= 2 columns. The last column is used\nas the value, and the other columns are used to compute the column name.\n\n\n\n#### gather\n\n\n    tbl | gather(colname..., key:=keycol, value:=valuecol)\n\nArg types:\n\n- _colname_: string\n- _keycol_: string\n- _valuecol_: string\n\nGather collapses multiple columns into key-value pairs, duplicating all other columns as needed. 
Gather is based on the R tidyr::gather() function.\n\nExample: Imagine table t0 with the following contents:\n\n|col0 | col1| col2|\n|-----|-----|-----|\n|Cat  | 30  | 31  |\n|Dog  | 40  | 41  |\n\n::t0 | gather(\"col1\", \"col2\", key:=\"name\", value:=\"value\"):: will produce the following table:\n\n| col0| name| value|\n|-----|-----|------|\n|  Cat| col1|    30|\n|  Cat| col2|    31|\n|  Dog| col1|    40|\n|  Dog| col2|    41|\n\n\n#### spread\n\n\n    tbl | spread(keycol, valuecol)\n\nArg types:\n\n- _keycol_: string\n- _valuecol_: string\n\nSpread expands rows across two columns as key-value pairs, duplicating all other columns as needed. Spread is based on the R tidyr::spread() function.\n\nExample: Imagine table t0 with the following contents:\n\n|col0 | col1| col2|\n|-----|-----|-----|\n|Cat  | 30  | 31  |\n|Dog  | 40  | 41  |\n\n::t0 | spread(\"col1\", \"col2\", key:=\"col1\", value:=\"col2\"):: will produce the following table:\n\n| col0| 30| 40|\n|-----|---|---|\n|  Cat| 31|   |\n|  Dog|   | 41|\n\nNote the blank cell values, which may require the use of the ::contains:: function to\ntest for the existence of a field in a row struct in subsequent manipulations.\n\n\n#### collapse\n\n\n    tbl | collapse(colname...)\n\nArg types:\n\n- _colname_: string\n\nCollapse will collapse multiple rows with non-overlapping values for the specified\ncolumns into a single row when all of the other cell values are identical.\n\nExample: Imagine table t0 with the following contents:\n\n|col0 | col1| col2|\n|-----|-----|-----|\n|Cat  | 30  |     |\n|Cat  |     | 41  |\n\n::t0 | collapse(\"col1\", \"col2\"):: will produce the following table:\n\n|col0 | col1| col2|\n|-----|-----|-----|\n|Cat  | 30  | 41  |\n\nNote that collapse will stop if one of the specified columns has multiple\nvalues, for example for t0 below:\n\n|col0 | col1| col2|\n|-----|-----|-----|\n|Cat  | 30  |     |\n|Cat  | 31  | 41  |\n\n::t0 | collapse(\"col1\", \"col2\"):: will produce the following table:\n\n|col0 
| col1| col2|\n|-----|-----|-----|\n|Cat  | 30  |     |\n|Cat  | 31  | 41  |\n\n\n\n#### joinbed\n\n\n    srctable | joinbed(bedtable [, chrom:=chromexpr]\n                                [, start:=startexpr]\n                                [, end:=endexpr]\n                                [, length:=lengthexpr]\n                                [, map:=mapexpr])\n\nArg types:\n\n- bedtable: table (https://uswest.ensembl.org/info/website/upload/bed.html)\n- chromexpr: one-arg function (default: ::|row|row.chrom::)\n- startexpr: one-arg function (default: ::|row|row.start::)\n- endexpr: one-arg function (default: ::|row|row.end::)\n- lengthexpr: one-arg function (default: NA)\n- mapexpr: two-arg function (srcrow, bedrow) (default: ::|srcrow,bedrow|srcrow::)\n\nJoinbed is a special kind of join operation that's optimized for intersecting\n_srctable_ with genomic intervals listed in _bedtable_.\n\nExample:\n\n     bed := read(\"test.bed\")\n     bc := read(\"test.bincount.tsv\")\n     out := bc | joinbed(bed, chrom:=$chromo)\n\nOptional args _chromexpr_, _startexpr_, _endexpr_, and _lengthexpr_ specify how to extract the\ncoordinate values from a _srctable_ row. For example:\n\n     bc := read(\"test.bincount.tsv\")\n     bc | joinbed(bed, chrom:=\u0026chromo, start:=\u0026S, end:=\u0026E)\n\nwill use columns \"chromo\", \"S\", \"E\" in table \"test.bincount.tsv\" to\nconstruct a genomic coordinate, then check whether the coordinate intersects with a\nrow in the bed table.\n\nAt most one of the _endexpr_ and _lengthexpr_ args can be set. If _endexpr_ is set, [_startexpr_, _endexpr_)\ndefines a zero-based half-open range for the given chromosome. If _lengthexpr_ is\nset, [_startexpr_, _startexpr_+_lengthexpr_) defines a zero-based half-open coordinate range.\n\nThe BED table must contain at least three columns. The chromosome name, start\nand end coordinates are extracted from the 1st, 2nd and 3rd columns,\nrespectively. 
Each coordinate range is zero-based, half-open.\n\nTwo coordinate ranges are considered to intersect if they have nonempty overlap,\nthat is, they overlap by at least one base.\n\n_mapexpr_ describes the format of rows produced by joinbed. If _mapexpr_ is\nomitted, joinbed simply emits the matched rows in _srctable_.\n\nFor example, the example below will produce rows with two columns: name\nand pos.  The \"name\" column is taken from the \"featname\" column of the BED row, and the\n\"pos\" column is taken from the \"start\" column of the \"bc\" table row.\n\n     bc := read(\"test.bincount.tsv\")\n     bc | joinbed(bed, chrom:=\u0026chromo, start:=\u0026S, end:=\u0026E, map:=|bcrow,bedrow|{name:bedrow.featname, pos: bcrow.start})\n\nBelow is an example of using the \"row\" argument. It behaves identically to the above example.\n\n     bc := read(\"test.bincount.tsv\")\n     bc | joinbed(bed, row:=bcrow, chrom:=bcrow.chromo, start:=bcrow.S, end:=bcrow.E, map:=|bcrow,bedrow|{name:bedrow.featname, pos: bcrow.start})\n\n\n\n#### count\n\n\n    tbl | count()\n\nCount counts the number of rows in the table.\n\nExample: imagine table t0:\n\n| col1|\n|-----|\n|  3  |\n|  4  |\n|  8  |\n\n::t0 | count():: will produce 3.\n\n\n#### pick\n\n\n    tbl | pick(expr)\n\nArg types:\n\n- _expr_: one-arg boolean function\n\nPick picks the first row in the table that satisfies _expr_.  If no such row is\nfound, it returns NA.\n\nImagine table t0:\n\n|col0 | col1|\n|-----|-----|\n|Cat  | 10  |\n|Dog  | 20  |\n\n::t0 | pick(\u0026col1\u003e=20):: will return the row {col0:Dog, col1:20}.\n::t0 | pick(|row|row.col1\u003e=20):: is the same thing.\n\n\n#### table\n\n\n    table(expr...)\n\nArg types:\n\n- _expr_: any\n\nTable creates a new table consisting of the given values.\n\n#### readdir\n\nUsage: readdir(path)\n\nReaddir creates a Struct consisting of files in the given directory.  
The field\nname is a sanitized pathname, and the value is either a Table (if the file is a .tsv,\n.btsv, etc), or a string (if the file cannot be parsed as a table).\n\n#### table_attrs\n\n\n    tbl | table_attrs()\n\nExample:\n\n     t := read(\"foo.tsv\")\n     table_attrs(t).path  (==\"foo.tsv\")\n\nTable_attrs returns table attributes as a struct with three fields:\n\n - Field 'type' is the table type, e.g., \"tsv\", \"mapfilter\".\n - Field 'name' is the name of the table. It is some random string.\n - Field 'path' is the name of the file the table is read from.\n   \"path\" is nonempty only for tables created directly by read(), or for\n   tables that are the result of applying map or filter to a table created by read().\n\n#### force\n\n\n    tbl | force()\n\nForce is a performance hint. It is logically a no-op; it just\nproduces the contents of the source table unchanged.  Underneath, this function\nwrites the source-table contents to a file. When this expression is evaluated\nrepeatedly, force will read contents from the file instead of\nrunning _tbl_ repeatedly.\n\n\n### Row manipulation\n\n\n#### Struct comprehension\n\nUsage: {col1:expr1, col2:expr2, ..., colN:exprN}\n\n{...} composes a struct. 
For example,\n\n    x := table({f0:\"foo\", f1:123}, {f0:\"bar\", f1:234})\n\ncreates the following table and assigns it to $x.\n\n| f0  | f1  |\n|-----|-----|\n|foo  | 123 |\n|bar  | 234 |\n\nThe \"colK:\" part of each element can be omitted, in which case the column name\nis auto-computed using the following rules:\n\n- If the expression is of the form \"expr.field\" (struct field reference), then\n  \"field\" is used as the column name.\n\n- Similarly, if the expression is of the form \"$field\" (struct field reference), then\n  \"field\" is used as the column name.\n\n- If the expression is of the form \"var\", then \"var\" is used as the column name.\n\n- Else, the column name will be \"fN\", where \"N\" is 0 for the first column, 1 for the\n  2nd column, and so on.\n\nFor example:\n\n    x := table({col0:\"foo\", f1:123}, {col0:\"bar\", f1:234})\n    z := 10\n    x | map({$col0, z, newcol: $f1})\n\n| col0  | z  | newcol |\n|-------|----|--------|\n|foo    | 10 |123     |\n|bar    | 10 |234     |\n\n\nThe struct literal expression can contain a regex of the form\n\n        expr./regex/\n\n\"expr\" must be another struct. \"expr\" is most often \"_\". \"{var./regex/}\" is\nequivalent to a struct that contains all the fields in \"var\" whose names match\nregex.  Using the above example,\n\n        x | map({_./.*1$/})\n\nwill pick only column f1. The result will be\n\n| f1  |\n|-----|\n| 123 |\n| 234 |\n\nAs syntactic sugar, you can omit the \"var.\" part if var is \"_\". 
So \"map($x,\n{_./.*1$/})\" can also be written as \"map($x, {/.*1$/})\".\n\n#### unionrow\n\n\n    unionrow(x, y)\n\nArg types:\n\n- _x_, _y_: struct\n\nExample:\n\n    unionrow({a:10}, {b:11}) == {a:10,b:11}\n    unionrow({a:10, b:11}, {b:12, c:\"ab\"}) == {a:10, b:12, c:\"ab\"}\n\nUnionrow merges the columns of the two structs.\nIf one column appears in both args, the value from the second arg is taken.\nBoth arguments must be structs.\n\n#### optionalfield\n\nUsage: optionalfield(struct, field [, default:=defaultvalue])\n\nThis function acts like struct.field. However, if the field is missing, it\nreturns the defaultvalue. If defaultvalue is omitted, it returns NA.\n\nExample:\n\n    optionalfield({a:10,b:11}, a) == 10\n    optionalfield({a:10,b:11}, c, default:=12) == 12\n    optionalfield({a:10,b:11}, c) == NA\n\n\n#### contains\n\n\n    contains(struct, field)\n\nArg types:\n\n- _struct_: struct\n- _field_: string\n\nContains returns true if the struct contains the specified field, else it returns false.\n\n### File I/O\n\n#### read\n\nUsage:\n\n    read(path [, type:=filetype])\n\nArg types:\n\n- _path_: string\n- _filetype_: string\n\n\nRead table contents from a file. The optional argument 'type' specifies the file format.\nIf the type is unspecified, the file format is auto-detected from the file extension.\n\n- Extension \".tsv\" or \".bed\" loads a tsv file. If file\n  \"path_data_dictionary.tsv\" exists, it is also read to construct a\n  dictionary. If a dictionary tsv file is missing, the cell data types are\n  guessed.\n\n- Extension \".prio\" loads a fragment file.\n\n- Extension \".btsv\" loads a btsv file.\n\n- Extension \".bam\" loads a BAM file.\n\n- Extension \".pam\" loads a PAM file.\n\n\nIf the type is specified, it must be one of the following strings: \"tsv\", \"bed\",\n\"btsv\", \"fragment\", \"bam\", \"pam\". 
The type arg overrides file-type autodetection\nbased on path extension.\n\nExample:\n\n    read(\"blahblah\", type:=\"tsv\")\n\n#### write\n\nUsage: write(table, \"path\" [,shards:=nnn] [,type:=\"format\"] [,datadictionary:=false])\n\nWrite table contents to a file. The optional argument \"type\" specifies the file\nformat. The value must be one of \"tsv\", \"btsv\", or \"bed\".  If the type argument is\nomitted, the file format is auto-detected from the extension of \"path\":\n\".tsv\" for the TSV format, \".btsv\" for the BTSV format, \".bed\" for the BED format.\n\n- For TSV, a dictionary file \"path_data_dictionary.tsv\" is created by\n  default. Optional argument datadictionary:=false disables generation of the\n  dictionary file. For example:\n\n    read(\"foo.tsv\") | write(\"bar.tsv\", datadictionary:=false)\n\n- When writing a btsv file, the write function accepts the \"shards\"\n  parameter. It sets the number of rangeshards. For example,\n\n    read(\"foo.tsv\") | write(\"bar.btsv\", shards:=64)\n\n  will create a 64-way-sharded bar.btsv that has the same content as\n  foo.tsv. bar.btsv is actually a directory, and shard files are created\n  underneath the directory.\n\n#### writecols\n\nUsage: writecols(table, \"path-template\" [,datadictionary:=false] [,gzip:=false])\n\nWrite table contents to a set of files, each containing a single column of the\ntable. The naming of the files is determined using a template (Go's\ntext/template). For example, a template of the form:\n\n    cols-{{.Name}}-{{.Number}}.ctsv\n\nwill have .Name replaced with the name of the column and .Number with the index\nof the column. So for a table with two columns 'A' and 'B', the files\nwill be cols-A-0.ctsv and cols-B-1.ctsv. 
Note that if a data dictionary is\nto be written, it will have 'data-dictionary' for .Name and 0 for .Number.\n\nFiles are gzip-compressed if the gzip named parameter is set\nto true.\n\n### Predicates\n\n    expr0 == expr1\n    expr0 \u003e expr1\n    expr0 \u003e= expr1\n    expr0 \u003c expr1\n    expr0 \u003c= expr1\n    max(expr1, expr2, ..., exprN)\n    min(expr1, expr2, ..., exprN)\n\n    expr0 ?== expr1\n    expr0 ==? expr1\n    expr0 ?==? expr1\n\nThese predicates can be applied to any scalar values, including\nints, floats, strings, chars, dates, and nulls.\nThe two sides of the operator must be of the same type, or null.\nA null value is treated as ∞. Thus, 1 \u003c NA, but 1 \u003e -NA.\n\nPredicates \"==?\", \"?==\", \"?==?\" are the same as \"==\", as long as both sides are\nnon-null. \"X==?Y\" is true if either X==Y, or Y is null. \"X?==Y\" is true if\neither X==Y, or X is null. \"X?==?Y\" is true if either X==Y, or X is null, or Y\nis null. These predicates can be used to do outer, left, or right joins with the\njoin builtin.\n\n#### isnull\n\n\n    isnull(expr)\n\nIsnull returns true if expr is NA (or -NA). Else it returns false.\n\n#### istable\n\n\n    istable(expr)\n\nIstable returns true if expr is a table.\n\n#### isstruct\n\n\n    isstruct(expr)\n\nIsstruct returns true if expr is a struct.\n\n### Arithmetic and string operators\n\n    expr0 + expr1\n\nThe \"+\" operator can be applied to ints, floats, strings, and structs.  The two\nsides of the operator must be of the same type. Adding two structs will produce\na struct whose fields are the union of the inputs. For example,\n\n    {a:10,b:20} + {c:30} == {a:10,b:20,c:30}\n    {a:10,b:20,c:40} + {c:30} == {a:10,b:20,c:30}\n\nAs seen in the second example, if a column appears in both inputs, the second\nvalue will be taken.\n\n\n    -expr0\n\nThe unary \"-\" can be applied to ints, floats, and strings.  
Expression \"-str\"\nproduces a new string whose sort order is lexicographically reversed. That is,\nfor any two strings A and B, if A \u003c B, then -A \u003e -B. Unary \"-\" can be used to\nsort rows in descending order of string column values.\n\n    expr0 * expr1\n    expr0 % expr1\n    expr0 - expr1\n    expr0 / expr1\n\nThe above operators can be applied to ints and floats.  The two sides of the\noperator must be of the same type.\n\n\n#### string_count\n\n\n    string_count(str, substr)\n\nArg types:\n\n- _str_: string\n- _substr_: string\n\nExample:\n\n    string_count(\"good dog!\", \"g\") == 2\n\nCount the number of non-overlapping occurrences of substr in str.\n\n#### regexp_match\n\nUsage: regexp_match(str, re)\n\nExample:\n\n    regexp_match(\"dog\", \"o+\") == true\n    regexp_match(\"dog\", \"^o\") == false\n\nCheck if str matches re.\nUses Go's regexp.MatchString (https://golang.org/pkg/regexp/#Regexp.MatchString).\n\n#### regexp_replace\n\n\n    regexp_replace(str, re, replacement)\n\nArg types:\n\n- _str_: string\n- _re_: string\n- _replacement_: string\n\nExample:\n\n    regexp_replace(\"dog\", \"(o\\\\S+)\", \"x${1}y\") == \"dxogy\"\n\nReplace occurrences of re in str with replacement. It is implemented using Go's\nregexp.ReplaceAllString (https://golang.org/pkg/regexp/#Regexp.ReplaceAllString).\n\n#### string_len\n\n\n    string_len(str)\n\nArg types:\n\n- _str_: string\n\n\nExample:\n\n    string_len(\"dog\") == 3\n\nCompute the length of the string. 
Returns an integer.\n\n#### string_has_suffix\n\n\n    string_has_suffix(str, suffix)\n\nArg types:\n\n- _str_: string\n- _suffix_: string\n\nExample:\n\n    string_has_suffix(\"dog\", \"g\") == true\n\nChecks if a string ends with the given suffix.\n\n#### string_has_prefix\n\n\n    string_has_prefix(str, prefix)\n\nArg types:\n\n- _str_: string\n- _prefix_: string\n\nExample:\n\n    string_has_prefix(\"dog\", \"d\") == true\n\nChecks if a string starts with the given prefix.\n\n#### string_replace\n\n\n    string_replace(str, old, new)\n\nArg types:\n\n- _str_: string\n- _old_: string\n- _new_: string\n\nExample:\n\n    string_replace(\"dogo\", \"o\", \"a\") == \"daga\"\n\nReplace occurrences of old in str with new.\n\n#### substring\n\n\n    substring(str, from [, to])\n\nSubstring extracts part of a string, [from:to].  Args \"from\" and \"to\" specify\nbyte offsets, not character (rune) counts.  If \"to\" is omitted, it defaults to\n∞.\n\nArg types:\n\n- _str_: string\n- _from_: int\n- _to_: int, defaults to ∞\n\nExample:\n\n    substring(\"hello\", 1, 3) == \"ell\"\n    substring(\"hello\", 2) == \"llo\"\n\n\n#### sprintf\n\n\n    sprintf(fmt, args...)\n\nArg types:\n\n- _fmt_: string\n- _args_: any\n\nExample:\n\n    sprintf(\"hello %s %d\", \"world\", 10) == \"hello world 10\"\n\nBuilds a string from the format string. 
It is implemented using Go's fmt.Sprintf.\nThe args cannot be structs or tables.\n\n#### string\n\n\n    string(expr)\n\nExamples:\n\n    string(123.0) == \"123.0\"\n    string(NA) == \"\"\n    string(1+10) == \"11\"\n\nThe string function converts any scalar expression into a string.\nNA is translated into an empty string.\n\n\n#### int\n\n\n    int(expr)\n\nInt converts any scalar expression into an integer.\n\nExamples:\n\n    int(\"123\") == 123\n    int(1234.0) == 1234\n    int(NA) == 0\n\nNA is translated into 0.\nIf expr is a date, int(expr) computes the number of seconds since the epoch (1970-01-01).\nIf expr is a duration, int(expr) returns the number of nanoseconds.\n\n\n#### float\n\n\n    float(expr)\n\nThe float function converts any scalar expression into a float.\n\nExamples:\n\n    float(\"123\") == 123.0\n    float(1234) == 1234.0\n    float(NA) == 0.0\n\nNA is translated into 0.0.\nIf expr is a date, float(expr) computes the number of seconds since the epoch (1970-01-01).\nIf expr is a duration, float(expr) returns the number of seconds.\n\n\n#### hash64\n\n\n    hash64(arg)\n\nArg types:\n\n- _arg_: any\n\n\nExample:\n\n    hash64(\"foohah\")\n\nCompute the hash of the arg. Arg can be of any type, including a table or a row.\nThe hash is a positive int64 value.\n\n#### land\n\n\n    land(x, y)\n\nArg types:\n\n- _x_, _y_: int\n\nExample:\n\n    land(0xff, 0x3) == 3\n\nCompute the bitwise AND of two integers.\n\n#### lor\n\nUsage: lor(x, y)\n\nExample:\n\n    lor(0xff, 0x3) == 255\n\nCompute the bitwise OR of two integers.\n\n#### isset\n\n\n    isset(x, y)\n\nArg types:\n\n- _x_, _y_: int\n\nExample:\n\n    isset(0x3, 0x1) == true\n\nCompute whether all bits in the second argument are present in the first.\nUseful for checking whether flags are set.\n\n### Logical operators\n\n    !expr\n\n### Miscellaneous functions\n\n#### print\n\n\n    print(expr... [,depth:=N] [,mode:=\"mode\"])\n\nPrint the list of expressions to stdout.  
The depth parameter controls how\nnested tables are printed.  If depth=0, nested tables are printed as\n\"[omitted]\".  If depth \u003e 0, nested tables are expanded up to that level.  If the\ndepth argument is omitted, print fully expands nested tables at every\nlevel.\n\nThe mode argument controls the print format. Valid values are the following:\n\n- \"default\" prints the expressions in a long format.\n- \"compact\" prints them in a short format.\n- \"description\" prints the value description (help message) instead of the values\n  themselves.\n\nThe default value of mode is \"default\".\n\n\n\n\n## Future plans\n\n- Invoking R functions inside GQL.\n\n- Invoking GQL inside R.\n\n- Reading VCF and other non-tsv files.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgrailbio%2Fgql","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgrailbio%2Fgql","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgrailbio%2Fgql/lists"}