{"id":13760228,"url":"https://github.com/StanfordHCI/datavore","last_synced_at":"2025-05-10T10:31:57.301Z","repository":{"id":2708788,"uuid":"3702897","full_name":"StanfordHCI/datavore","owner":"StanfordHCI","description":"A small, fast, in-browser database engine written in JavaScript.","archived":false,"fork":false,"pushed_at":"2021-10-30T09:40:16.000Z","size":175,"stargazers_count":247,"open_issues_count":4,"forks_count":44,"subscribers_count":27,"default_branch":"master","last_synced_at":"2024-08-03T13:04:28.919Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"http://vis.stanford.edu/projects/datavore","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/StanfordHCI.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2012-03-13T03:28:58.000Z","updated_at":"2024-07-30T19:41:52.000Z","dependencies_parsed_at":"2022-08-29T10:51:01.710Z","dependency_job_id":null,"html_url":"https://github.com/StanfordHCI/datavore","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StanfordHCI%2Fdatavore","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StanfordHCI%2Fdatavore/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StanfordHCI%2Fdatavore/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StanfordHCI%2Fdatavore/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/StanfordHCI","download_url":"https://codeload.github.com/StanfordHCI/datavore/tar.gz/refs/heads/master","host
":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224949729,"owners_count":17397225,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T13:01:05.698Z","updated_at":"2024-11-16T17:30:49.898Z","avatar_url":"https://github.com/StanfordHCI.png","language":"JavaScript","funding_links":[],"categories":["Storage","JavaScript"],"sub_categories":["Runner"],"readme":"# Datavore\n\n**Datavore** is a small in-browser database engine written in JavaScript.\nDatavore enables you to perform fast aggregation queries within web-based \nanalytics or visualization applications. Datavore consists of an in-memory\ncolumn-oriented database implemented using standard JavaScript arrays. The\nsystem provides support for filtering and group-by aggregation queries. When\nrun within an optimized JavaScript environment, Datavore can complete queries\nover million-element data tables at interactive (sub-100ms) rates.\n\n### Getting Started\n\nSimply reference the script `dv.js` within your web page to import Datavore.\nThe included example files include demonstrations of Datavore's functionality\nalong with performance benchmarks. The `profile` example shows how Datavore\ncan be used to support high-performance brushing and linking among\nvisualizations using the [D3](http://github.com/mbostock/d3) framework.\n\n### Creating A Datavore Table\n\nA Datavore table is simply a collection of data columns, each realized as a\nJavaScript array. 
To create a table instance, you can either initialize the\nfull table through the constructor or add columns one by one. For instance:\n\n    var colA = [\"a\",\"a\",\"b\",\"b\",\"c\"];\n    var colB = [0,1,2,3,4];\n    \n    // create a table in one call by bundling up columns\n    var tab1 = dv.table([\n        {name:\"A\", values:colA, type:dv.type.nominal},\n        {name:\"B\", values:colB, type:dv.type.numeric}\n    ]);\n\n    // create a table adding one column at a time\n    // the resulting 'tab2' should be identical to 'tab1'\n    var tab2 = dv.table();\n    tab2.addColumn(\"A\", colA, dv.type.nominal);\n    tab2.addColumn(\"B\", colB, dv.type.numeric);\n\nIn addition to the column name and array of values, each column must have a\nspecified data type, one of `dv.type.nominal`, `dv.type.ordinal`,\n`dv.type.numeric`, or `dv.type.unknown`. A numeric column contains numbers\nthat can be aggregated (e.g., summed, averaged, etc.). Nominal values are\ncategory labels without a meaningful sort order, while ordinal values can be\nmeaningfully sorted.\n\nDatavore treats nominal and ordinal data in a special way: it recodes the\ninput array values as zero-based integers (much like a\n[star schema](http://en.wikipedia.org/wiki/Star_schema)). The unique values\nin the input array are sorted and placed into a lookup table, and each value\nis then replaced by its index in that table. Mapping strings and other data\ntypes to integer codes enables faster query performance.\n\n### Accessing Table Values\n\nYou can access values within a Datavore table directly via array indices or\nthrough the table `get` method. For nominal or ordinal types, direct access will\nreturn coded integers. 
The `get` method always returns the original value.\n\n    // both array indices and the \"get\" method use (column, row) ordering\n    alert(tab1[0][1]);    // 1st column, 2nd row, coded   --\u003e prints \"0\"\n    alert(tab1.get(0,1)); // 1st column, 2nd row, uncoded --\u003e prints \"a\"\n\n    // directly accessing the lookup table (lut) to decode a value\n    // included for demo purposes only; use the \"get\" method instead!\n    // 1st column, 2nd row, uncoded --\u003e prints \"a\"\n    alert(tab1[0].lut[tab1[0][1]]);\n\nYou can access columns either by their numerical index (as above) or by name:\n\n    // accessing table values by column name\n    alert(tab1[\"A\"][1]);    // 1st column, 2nd row, coded   --\u003e prints \"0\"\n    alert(tab1.get(\"A\",1)); // 1st column, 2nd row, uncoded --\u003e prints \"a\"\n\n**WARNING**: *Datavore column names should NOT be numbers.* If you use column\nnames that JavaScript can interpret as array indices (such as \"0\" or \"12\"),\nyou will likely experience unexpected (and undesirable) behavior.\n\n### Filtering Queries\n\nDatavore tables support two kinds of queries: filtering operations and\ngroup-by aggregation. Filtering queries select table rows according to a\npredicate function; these are similar to simple SQL queries with a WHERE\nclause. The predicate function takes a table instance and a row number as\narguments and returns `true` for each row to keep. The `where` method then\nreturns a new Datavore table instance containing only the matching rows.\n\n    // creates a new table with 3 rows; its columns are [[\"b\",\"b\",\"c\"], [2,3,4]]\n    var filtered_table = tab1.where(function(table, row) {\n        return table.get(\"B\", row) \u003e 1;\n    });\n\n*NOTE*: To ensure that tables created by various filtering queries are\ncompatible with each other, nominal and ordinal columns within the result\ntables will always have the same lookup table as the original table, even if\nsome unique values have been completely filtered out. 
As a result, you may\nsee some unexpected zero values returned when running dense aggregation\nqueries on filtered tables.\n\n### Aggregation Queries\n\nThe primary use case for Datavore is running aggregation queries. These queries\nallow you to calculate counts, sums, averages, variances, standard deviations,\nand minimum or maximum values for a column, optionally grouped according to\nnominal or ordinal dimensions. These queries are similar to SQL queries with\ngroup-by clauses.\n\n    // count all rows in the table -\u003e returns [[5]]\n    var counts = tab1.query({vals:[dv.count()]});\n\n    // count rows and sum values in 2nd column, grouped by 1st column\n    // returns -\u003e [[\"a\",\"b\",\"c\"], [2,2,1], [1,5,4]]\n    var groups = tab1.query({dims:[0], vals:[dv.count(), dv.sum(1)]});\n\n    // same as before, but now with extra parameter \"code:true\"\n    // nominal/ordinal types remain coded integers, NOT original values\n    // returns -\u003e [[0,1,2], [2,2,1], [1,5,4]]\n    var coded = tab1.query({dims:[0], vals:[dv.count(), dv.sum(1)], code:true});\n\n    // count rows where first column != \"a\", grouped by 1st column\n    // returns -\u003e [[\"a\",\"b\",\"c\"], [0,2,1]]\n    var filter = tab1.query({dims:[0], vals:[dv.count()], where:\n        function(table, row) { return table.get(\"A\",row) != \"a\"; }\n    });\n\nThe return value of the `query` method is an array of arrays: one array per\ngroup-by dimension, followed by one array per aggregate operator. Note that the\nreturn value is *not* a Datavore table object. The input to the query method\nshould be a JavaScript object with up to four parameters: `vals` (required),\n`dims`, `where`, and `code`.\n\nThe `vals` parameter indicates the aggregation functions to run. The\navailable operators are `dv.count`, `dv.sum`, `dv.min`, `dv.max`, `dv.avg`,\n`dv.variance`, and `dv.stdev`. All aggregation operators accept a single column\nindex or name as input (except for `dv.count`, which ignores any input).\n\nThe `dims` parameter indicates the dimensions to group by. 
This\nshould be an array containing column indices, column names, or special dimension\nquery operators (`dv.bin` or `dv.quantile`).\n\nThe `where` parameter specifies a predicate function for filtering the\ntable (as in `where` queries). Filtering is performed *prior* to aggregation.\n\nIf true, the `code` parameter indicates that nominal and ordinal values\nshould be left as coded integers. If false (the default), coded integers are\nmapped back to the original values in the query result arrays.\n\n#### Dense Queries vs. Sparse Queries\n\nThe standard aggregate query uses a *dense* representation of the resulting\ndata space. This means that every combination of dimension values is\nmaterialized, even if the corresponding aggregate value is zero. So if you\ngroup by columns A and B, and column A has 3 unique values and column B has 4\nunique values, then the resulting aggregate table will have 3 × 4 = 12 rows,\nincluding zero values.\n\nDatavore also supports a *sparse* representation that does not include rows\nfor zero values. To use a sparse representation, use the `sparse_query`\nfunction, like so:\n\n    // non-zero counts of all table rows where first column != \"a\"\n    // returns -\u003e [[\"b\",\"c\"], [2,1]]\n    var sparse = tab1.sparse_query({dims:[0], vals:[dv.count()], where:\n        function(table, row) { return table.get(\"A\",row) != \"a\"; }\n    });\n\nSo why the different query types? Dense queries can be calculated faster:\nby \"materializing\" the full dimensionality of the aggregated data, one can\nuse an array to store all the intermediate results. The sparse\nrepresentation instead uses an associative array (a JavaScript object\ninstance), which incurs higher overhead for value lookups. On the\nother hand, dense queries over high-dimensional data can produce very large\nresult arrays, sometimes too large to fit in the browser's available\nmemory. 
So, if you are dealing with high-dimensional aggregates (as a rule of thumb,\nif the product of the numbers of unique values in your group-by dimensions\nexceeds 100,000), you should consider using `sparse_query`. However, if the\ntotal number of aggregate rows is reasonable (as is typically the case), or\nyou want to explicitly include zero-valued cells, use the normal `query`\nmethod for faster performance.\n\n*NOTE:* Dense queries are processed by the `dense_query` function. The\n`query` function is simply an alias for `dense_query`.\n\n### Extensibility\n\nDatavore can be extended with new dimensional and (with some effort)\naggregate operators. To create your own dimensional operator, view the source\ncode for `dv.bin` and `dv.quantile`, and follow their example. Adding new\naggregate operators is possible but more complex. You will need to add a new\nmodule (following in the footsteps of `dv.sum`, `dv.avg`, etc.) and add new\nlogic to the inner loop of the query processor (for both dense and sparse\nqueries). *This is not for the faint of heart!* The query processor avoids\nmaking function calls within its inner loop; this makes Datavore much\nfaster, but at some cost to extensibility, since adding a new aggregate\noperator means modifying the guts of the engine.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FStanfordHCI%2Fdatavore","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FStanfordHCI%2Fdatavore","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FStanfordHCI%2Fdatavore/lists"}