Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/quintanar401/DCoQ

Dark Corners of Q
https://github.com/quintanar401/DCoQ

Last synced: 2 months ago
JSON representation

Dark Corners of Q

Awesome Lists containing this project

README

        

# Dark corners of q

* [Intro](#intro)
* [Magic value](#magic-value)
* [`";"` as an identity function](#-as-an-identity-function)
* [`"q"` and `"k"` are functions](#q-and-k-are-functions)
* [Mystery of `:`](#mystery-of-)
* [Arbitrary stuff out of comments](#arbitrary-stuff-out-of-comments)
* [A way to get local variables by name in a function](#a-way-to-get-local-variables-by-name-in-a-function)
* [Subvert timeouts via 0 handle and catch blocks](#subvert-timeouts-via-0-handle-and-catch-blocks)
* [Functions with unlimited number of args](#functions-with-unlimited-number-of-args)
* [DSLs on the fly](#dsls-on-the-fly)
* [Lexer in 5 minutes](#lexer-in-5-minutes)
* [Weak pointers in q](#weak-pointers-in-q)
* [Object-oriented q](#object-oriented-q)
* [Function invocation techniques](#function-invocation-techniques)
* [Simple top-down recursive-descent parser](#simple-top-down-recursive-descent-parser)
* [Multiple DBs in one q process](#multiple-dbs-in-one-q-process)

### Intro

There are many hidden features and obscure practices in q. This lists some of them.

### Magic value

The magic value is something that looks like the identity function `(::)` but creates void everywhere. The magic value can be created from any projected function with a missing argument:

```q
q)mv:(value(1;))2
```

At first sight it looks like [Identity](https://code.kx.com/q/ref/identity).

```q
q)string mv
"::"
q)mv[1]
1
q)1 2[mv]
1 2
```

But only until you try it in more complex expressions; then the weird things start to happen. You may think you can't add `(::)` and a number, but wait:

```q
q)mv + 10
+[;10]
```

And you can add `mv` to `mv` too.

```q
q)mv + mv
+[;]
```

`mv` represents some kind of void – if it appears in the function’s argument list this argument will be omitted and the function will become a projection.

```q
q){z}[mv;"1";"2"]
{z}[;"1";"2"]
```

There are some limitations though – a unary function can’t have a projection, so it gets evaluated as usual.

```q
q)null mv
0b
```

`mv` is difficult to handle because it tends to project all functions around. For example, it is not easy to create a list of `mv`s.

```q
q)10#mv
#[10;]
```

Fortunately we can use unary functions that are immune to projection.

```q
q)l:raze 10#enlist mv
q)l[1]+l[2]
+[;]
```

And now we are ready to do something that is extremely difficult to do in the boring, ordinary q without magic – project functions via apply!

```q
q){z} . (;1;2) / will not work!
q){z} . @[3#l;1 2;:;1 2] / works!
{z}[;1;2]
```

We can even project functions dynamically.

```q
q)args:0 2!(10;20)
q)prj:{x . @[(count value[x]1)#l;key y;:;value y]}
q)prj[{z};args]
{z}[10;;20]
```

An important special case – the `enlist` function. We can’t use Apply, because it checks that the number of arguments is less than 9, but we can use `eval`.

```q
q)l:100#l
q)prj2:{eval(enlist),@[x#l;key y;:;value y]}
q)prj2[50;args]
enlist(10;;20;;;....;;)
```

It’s not easy to create such an `enlist` with arbitrary values at random places! Let’s use it to calculate some aggregation function on a list with a random number of nulls that will be filled later.

```q
q)v:100?til[10],1#l / add some mv nulls
q)f:(')[max;eval(enlist),v] / create func, f now is a function of a random number of args!
q){x first 1?100}/[{100";"+-";",*%&|$?#!~^") / multi symbol groups
sg:sg,"e._:=/\\\"`'" / 1 symbol groups
c2g:@[128#0;"j"$sg;:;1+til count sg] / char to group map
```

Then we define a state-transition map `states->new_state onGroupsOfChars`. The small-letter and special-char states are initial states corresponding to the char groups; the capital letter states are token states.

```q
/ A - id, B - dead end, C - sym + :, EF - numbers,floats,times,dates, KL -strings
mst:" "vs/:("aeA A ae0._"; ". A ae"; / identifiers
". E 0";"0E E 0.:a";"0E F e";"F E +0ae."; / Q lexer doesn't care about the exact form of a number/float/date/stamp, it parses first then checks
"< B ="; "/\\'0:+.=<_,C C :"; / <= and >= and ::::: /: \: ': ....
"` S ae0.";"S S ae0.:_";"` T :";"T T ae0.:_/"; / symbols and paths
"\" K *"; "K K *"; "K L \""; "K M \\"; "\" L \""; "\" M \\"; "M K *"; / strings, K-start/middle, L-end, M-\X
"\tW W \t" / merge ws for convenience
);
```

We create a state matrix from these rules.

```q
st:distinct" ",(first each sg),raze mst[;1]; / all matrix states, before A - initial states
mt:{a:st?y;x[a 0;(a 2;::)"*"in y 2]:first a 1;x}/[count[st]#enlist til count sg;mst]
```

The lex function itself is simple.

```q
q)lx:{(where(mt\[0;c2g x]) {[th;dummy]}
ssr[y;z,"?";{("th[`",("gs" f),";`",(-1_3_x),"]"),(neg 1-f:":"=last x)#x}] } / th.aaa: 10 => th[`s;`aaa]10, th.fn[] => th[`g;`fn][]
```

`.objs.regObj` should be called on every raw prototype to preprocess functions and adjust its parent. Note that the prototype itself is not an object. (For brevity, we could make it one.)
`.objs.mf` preprocesses a function using `.objs.subst` to substitute the required substrings in the function’s body.

The next step is to implement `new`.

```q
.objs.c:0
.objs.new:{[o;a] n set (.objs o),(.objs.pp,`objName)!(enlist n;n:`$".objs.o",string .objs.c+:1) / create a new obj
if[`constr in key n;.objs.mth[n][`g;`constr].(),a];
: .objs.mp n;
}
.objs.mth:{{$[y=`s;{if[100=type y;y:.objs.mf y];x set y;y}` sv(x 0),z
y in `v`g;$[(y=`v)|100<>type f:.objs.res[x 0;z];f;`th=first v:value[f]1;f .objs.mth x 0;f];'string y]}x .objs.pp} / internal smart pointer
.objs.mp:{(')[{f:.objs.mth x;$[`set=z 0;f[`s;z 1]. 2_z;1=count z;f[`v;z 0];f[`g;z 0]. 1_z]}[x;x .objs.pp];enlist]} / external smart pointer
.objs.res:{$[1=count n:` vs y;$[y in key x;x y;.z.s[` sv`.objs,x .objs.p;y]];.z.s[` sv`.objs,p;$[0=count p:x .objs.p;'y;n[0] in p;n 1;y]]]} / name resolver
```

It is pretty straightforward: create an object, assign the unique pointer to it to count references and call `constr` if it is defined. Function `.objs.mth` creates
the internal pointer: the pointer to be used inside the object’s functions. `.objs.mp` creates the external pointer with slightly different semantics. The name-resolver function
walks down the prototype chain searching for the required function. You can pass to it a name like `base.var` to force it to start the search from the specific
`base` object. (Useful if a child redefines a variable from its parent.)

It may not be clear from this code, but we are not restricted to explicit prototypes if we want to create an object. We can start from `obj` and then assign to it
all required functions and values. (As in JS, actually.)

Finally, garbage collection (GC).

```q
.objs.rc:{-2+(-16!)x[`objName].objs.pp}
.objs.gc:{if[count k:k where 3=(-16!)each .objs[k:k where (k:key .objs)like"o[0-9]*";.objs.pp];![`.objs;();0b;k];-1 "Deleted: ",string count k]}
```

GC can be called to delete the unused objects. (It doesn't support loops though.) `rc` can be called to check the reference count of an object.

Now examples. First we define prototypes.

```q
.objs.obj.__proto__:()
.objs.obj.show:{[th] show get th.objName}
.objs.obj.pointer:{[th] .objs.mp th.objName}

.objs.animal.constr:{'`abstract}
.objs.animal.says:{[th] 1 th.getName[]," says "};
.objs.animal.getName:{[th] th.name}

.objs.cat.__proto__:`animal
.objs.cat.constr:{[th;color] th.color: color; th.name:"cat"}
.objs.cat.says:{[th] th.animal.says[]; -1 "myau, I'm ",th.color}

.objs.dog.__proto__:`animal
.objs.dog.constr:{[th;toy] th.toy: toy; th.name:"dog"}
.objs.dog.says:{[th] th.animal.says[]; -1 "bark, I like my ",th.toy}
.objs.dog.getName:{[th] upper th.name}
/ we need to register first
.objs.regObj each `obj`animal`cat`dog
```

Note that we use inheritance, polymorphism, virtual functions, abstract classes, constructors, make assignments to object variables, call the parent function - quite a lot! Let’s create objects

```q
q)cat:.objs.new[`cat;enlist "black"]
q)dog:.objs.new[`dog;enlist "bone"]

q)cat[`says;::]
cat says myau, I'm black
q)dog[`says;::]
DOG says bark, I like my bone

/ when you call an external pointer with a name it will return the obj's value as is, object's real name in this case.
q)cat[`objName]
/ we can use set to reassign or create any variable
q)cat[`set;`color;"white"]
q)cat[`says;::]
cat says myau, I'm white
/ lets assign a function - 100% object oriented
q)cat[`set;`hiss;{[th;name] th.animal.says[]; -1 "hiss, go away ",name}]
q)cat[`hiss;"Bill"]
cat says hiss, go away Bill

/ finally we can delete dog and call gc
q)dog:0
q).objs.gc[] / will delete its object
```

This version doesn’t support other objects as prototypes, but it is easy to implement: extract the object’s name, and save a pointer to it in some variable inside the new
object to ensure that the prototype will not be deleted.

### Function invocation techniques

The Converge iterators accept only a unary argument, but this can be bypassed with Apply.

```q
q)({(x+z;y+z;0|z-1)}.)\[1 2 3]
1 2 3
4 5 2
6 7 1
7 8 0
```
For a named function the syntax is even simpler.

```q
q)f: {(x+z;y+z;0|z-1)}.
q)f/[1 2 3]
7 8 0
```

A derived function can be applied as a binary with infix syntax, or as a unary with prefix syntax.

```q
{x<10}{x+1}/3 / {x+1}/[{x<10};3]
({x+y}/)1 2 3 / {x+1}/[1 2 3]
```

To create projections in `eval` expressions, use the magic value.

```q
eval ({x+y};;1) / wrong
eval ({x+y};enlist mv;1) / ok!
```

### Simple top-down recursive-descent parser

The parser is based on ideas from the `sp.q` file for the ODBC3 driver provided by KX.

The KX parser uses the q `parse` keyword to create the initial tree of a parsed grammar. This adds some inconvenient restrictions on the grammar language, so I create my own parser for
the grammar language to make it more similar to Yacc, ANTLR and etc.

Each grammar consists of rules like:

```q
ruleName: ruleDef
```

The simplest definition blocks are:

```q
"select" - constant
`select - another way to provide a constant
ruleName - reference to some grammar rule
NAME - token name (capital letters).
```

More complex definitions look like:

```q
X | X ... - OR rule. The subrules are executed from left to right until one of them succeeds
X X ... - AND rule. All subrules are executed from left to right and all must succeed
(X ...) - subrule
X? - 0 or 1 rule
X* - 0 or more rule
X+ - 1 or more rule
{fn} - action fn will be inserted into the parse fn
X 135 - shortcut for {:(v1;v3;v5)}, where vN are references to the parsed exprs
```

When there is no action specified for some rule or subrule, the result of the rule will conform to its definition. That is, `ID "=" expr` will produce `(id;"=";expr)`.

First of all, we need to remove q functions from the grammar rules to simplify the parse functions. I cut all functions from the input string and save them in variable `.g.A`.
I replace them with `.g.A` references like `1h`. The `h` type is used to distinguish them from the action shortcuts:

```q
.g.A:()
.g.fext:{if[count w:where differ[0;]1&sums(not{$[x=2;1;x;$["\""=y;0;"\\"=y;2;1];1*"\""=y]}\[0;x])*(neg"}"=x)+"{"=x;ca:count .g.A;.g.A,:a:value each x b:a+til each d:1+w[;1]-a:(w:(0N 2)#w)[;0];x:@[;;:;]/[x;b;d$(string ca+til count a),\:"h"]];x}
```

Next I need to split the rules into tokens. This can be done with the `-4!` internal function that splits a string into q string tokens. Then I remove spaces and apply `value` to each string to get
the tokens: `"xxx"` and `` `xxx`` are mapped to strings, special ops to chars, rule names to symbols, token names to enlisted symbols, numbers to numbers:

```q
.g.val:{{$[(f:first x)in .Q.n;$[-7=type x:value x;[.g.A,:value{"{:(",x,")}"}";"sv"v",/:string x;"h"$-1+count .g.A];x];f="\"";(),value x;f="`";1_x;f in "?!|+*():";f;$[all x in .Q.A;enlist;::]`$x]}each t where not enlist[" "]~/:t:-4!x}
```

Top-level rule-parse functions are very simple: find the rule name, and execute OR rule parser. Its result will be the rule function.

```q
.g.prule:{if[not(-11=type x 0)&":"~x 1;'"rule fmt"]; .g[x 0]:.g.por "|",2_x};
.g.pflt:{not sums(x~\:"(")+-1*prev x~\:")"}; / utility to filter (...) exprs
```

OR parser: try each alternative until one succeeds, otherwise return `.g.err`:

```q
.g.por:{i:where(x~\:"|")&.g.pflt x; value {"{i:.g.i;\n ",(";\n "sv x),";\n `.g.err}"}{"if[not`.g.err~v:",x," x; :v]; .g.i:i"}each string .g.pand each 1_'i cut x};
```

The AND rule must take into account `()`, `*`, `+` and other expressions. `.g.pval` prepares AND atoms and `.g.pexp` takes care of `*`, `+` and `?`.

```q
.g.pand:{v:1_{x,$["("~first y;.g.por "|",1_-1_y;y]}/[();(where differ .g.pflt x)cut x:"|",x]; value {"{",y,";\n (",(";"sv"v",/:string 1+til max x),")}"}[s]";\n "sv .g.pexp'[s:sums(type each first each v)in 10 11 -11 100h;v:(where differ sums not any each v~/:\:"+?*")cut v]}
.g.pval:{if[-5=t:type x;:-1_1_string .g.A x];"if[",$[-11=t;"`.g.err~v:.g.",string[x]," x";11=t;"[v:.g.t[0;.g.i];not .g.t[1;.g.i]~`",string[x 0],"]";10=t;"not(v:.g.t[0;.g.i])~",$[1=count x;{"(),",1_x};::].Q.s1 x;100=t;"`.g.err~v:",string[x]," x";'"unexp"],";:`.g.err]",$[t in -11 100h;"";"; .g.i+:1"]}
.g.pexp:{v:.g.pval y 0; n:"v",string x; if[1=count y;:v,$[-5=type y 0;"";"; ",n,":v"]]; v:"`.g.err~v:{",v,"; v}[x]"; $["?"=y 1;n,":$[",v,";();v]";n,":",$["+"=y 1;"$[",v,";:`.g.err;enlist v]";"()"],"; while[not ",v,";",n,":",n,",enlist v]"]}
```

Finally we need some functions to do the actual parsing:

```q
.g.e:{.g.prule .g.val .g.fext x}
.g.p:{.g.t:y; .g.i:0; if[(.g.i<>count .g.t 0)|`.g.err~v:.g[x][];'"parse error, last token ",.Q.s1 .g.t[0;.g.i]]; v}
.g.l:{(x;{$["\""=first x;`STR;x[0]in .Q.n;`NUM;x[0]in".",.Q.a,.Q.A;`ID;`]}each x:x where not (()," ")~/:x:-4!x)}
```

`.g.e` can be used to define the grammar (see below). `.g.l` is a simple lexer based on the q lexer. `.g.p` is a simple, eager parse function that expects a top rule name and tokens.

Let’s now create a simple parser for a simple functional language:

```q
g)top: {x:()!()} expr
g)expr: "let" ID "=" aexpr "in" {x[`$v2]:v4} expr 6 | aexpr
g)aexpr: mexpr (("+" {:(+)} |"-" {:(-)}) mexpr)* {:{y[0][x;y 1]}/[v1;v2]}
g)mexpr: atom (("*" {:(*)} |"/" {:(%)}) atom)* {:{y[0][x;y 1]}/[v1;v2]}
g)atom: NUM {: value v1} | ID {:$[(vv:`$v1)in key x;x vv;'vv]} | "(" expr ")" 2
.c.e:{.g.p[`top] .g.l x}
```

We allow simple arithmetic expressions with priorities and the variable binding via `let var = .. in ...`. `.c.e` is the parser and evaluator. For example:

```q
c)10+20 / -> 30
c)2*3+5*2 / -> 16
c)let a=10 in let b = 20 in 10*(a+b) / -> 300
c)let a=10 in b*a / exception
c)10+(let b = 12*6+1 in b-10) / ->73
c)10++ / exception, the parser will return 10 and detect that not all input is consumed
\ts:100 .c.e "+"sv 1000#enlist "(12*6+1)" / 5.6sec, (100*9000)%5.6 -> 161.000 bytes per sec. Not bad for such primitive parser
```

Full parser generator code:

```q
.g.A:()
.g.fext:{if[count w:where differ[0;]1&sums(not{$[x=2;1;x;$["\""=y;0;"\\"=y;2;1];1*"\""=y]}\[0;x])*(neg"}"=x)+"{"=x;ca:count .g.A;.g.A,:a:value each x b:a+til each d:1+w[;1]-a:(w:(0N 2)#w)[;0];x:@[;;:;]/[x;b;d$(string ca+til count a),\:"h"]];x}
.g.val:{{$[(f:first x)in .Q.n;$[-7=type x:value x;[.g.A,:value{"{:(",x,")}"}";"sv"v",/:string x;"h"$-1+count .g.A];x];f="\"";(),value x;f="`";1_x;f in "?!|+*():";f;$[all x in .Q.A;enlist;::]`$x]}each t where not enlist[" "]~/:t:-4!x}
.g.prule:{if[not(-11=type x 0)&":"~x 1;'"rule fmt"]; .g[x 0]:.g.por "|",2_x}
.g.pflt:{not sums(x~\:"(")+-1*prev x~\:")"} / utility to filter (...) exprs
.g.por:{i:where(x~\:"|")&.g.pflt x; value {"{i:.g.i;\n ",(";\n "sv x),";\n `.g.err}"}{"if[not`.g.err~v:",x," x; :v]; .g.i:i"}each string .g.pand each 1_'i cut x}
.g.pand:{v:1_{x,$["("~first y;.g.por "|",1_-1_y;y]}/[();(where differ .g.pflt x)cut x:"|",x]; value {"{",y,";\n (",(";"sv"v",/:string 1+til max x),")}"}[s]";\n "sv .g.pexp'[s:sums(type each first each v)in 10 11 -11 100h;v:(where differ sums not any each v~/:\:"+?*")cut v]}
.g.pval:{if[-5=t:type x;:-1_1_string .g.A x];"if[",$[-11=t;"`.g.err~v:.g.",string[x]," x";11=t;"[v:.g.t[0;.g.i];not .g.t[1;.g.i]~`",string[x 0],"]";10=t;"not(v:.g.t[0;.g.i])~",$[1=count x;{"(),",1_x};::].Q.s1 x;100=t;"`.g.err~v:",string[x]," x";'"unexp"],";:`.g.err]",$[t in -11 100h;"";"; .g.i+:1"]}
.g.pexp:{v:.g.pval y 0; n:"v",string x; if[1=count y;:v,$[-5=type y 0;"";"; ",n,":v"]]; v:"`.g.err~v:{",v,"; v}[x]"; $["?"=y 1;n,":$[",v,";();v]";n,":",$["+"=y 1;"$[",v,";:`.g.err;enlist v]";"()"],"; while[not ",v,";",n,":",n,",enlist v]"]}
.g.e:{.g.prule .g.val .g.fext x}
.g.p:{.g.t:y; .g.i:0; if[(.g.i<>count .g.t 0)|`.g.err~v:.g[x][];'"parse error, last token ",.Q.s1 .g.t[0;.g.i]]; v}
.g.l:{(x;{$["\""=first x;`STR;x[0]in .Q.n;`NUM;x[0]in".",.Q.a,.Q.A;`ID;`]}each x:x where not (()," ")~/:x:-4!x)}
```

### Multiple DBs in one q process

By default it is possible to load only one historical DB at a time. If you load another, all data from the previous one will be
lost. Fortunately, all DB state is stored in the `.Q` namespace, and in the HDB tables in the root namespace, and you can save it
in some variable and restore it later. Here are some simple functions that do exactly this:

```q
.db.db:.db.dbm:(0#`)!(); .db.cdb:`
.db.sdb:{if[not null .db.cdb;.Q[`sym`date]:(sym;date);.db.db[.db.cdb]:.Q;![`.;();0b;key .Q.dbt]]}
.db.ldb:{if[x in key .db.db;.db.cdb:x;.Q:.db.db x;`sym`date set' .Q`sym`date;system"cd ",.Q.dbp;set'[key .Q.dbt;value .Q.dbt]]}
.q.dbl:{.db.sdb[];system "l ",x;.db.cdb:last` vs`$":",x;.Q.dbp:x;.db.dbm[(` sv/: .db.cdb,/:t),t where{$[98=type x:get x;-11=type value flip x;0b]}each t:tables[]]:.db.cdb;.Q.dbt:t!get each t;}
.q.db:{if[null d:.db.dbm x;'x];if[not .db.cdb=d;.db.sdb[];.db.ldb d];last ` vs x}
```

All we need to do is to save accurately all DB-related info: the `.Q` namespace itself, the DB path, tables, sym and date vectors. To load a DB, instead
of `system "l path"` we should call

```q
dbl"/kdb/history/fx"
```

`dbl` will save the previous DB’s state (if any), clear the space for the new DB, load it and save its basic parameters. After
all DBs are loaded we can easily switch between them using the `db` function:

```q
db`trade
db`fx.trade
```

`db` requires a table’s name, finds the first DB that has this table and restores its state. It returns the table so it is
possible to use it in `select` as:

```q
select from db[`trade] where date=.z.D-1
```