Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/uwdata/arquero
Query processing and transformation of array-backed data tables.
https://github.com/uwdata/arquero
arrays data database dataframe query table transform
Last synced: 2 days ago
JSON representation
Query processing and transformation of array-backed data tables.
- Host: GitHub
- URL: https://github.com/uwdata/arquero
- Owner: uwdata
- License: bsd-3-clause
- Created: 2020-09-01T12:48:01.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2024-11-06T17:27:19.000Z (about 1 month ago)
- Last Synced: 2024-12-03T13:02:51.631Z (9 days ago)
- Topics: arrays, data, database, dataframe, query, table, transform
- Language: JavaScript
- Homepage: https://idl.uw.edu/arquero/
- Size: 1.26 MB
- Stars: 1,327
- Watchers: 29
- Forks: 64
- Open Issues: 37
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-dataframes - Arquero - A JavaScript library for query processing and transformation of array-backed data tables. Following the relational algebra and inspired by the design of dplyr, Arquero provides a fluent API for manipulating column-oriented data frames. (Libraries)
- awesome-dataviz-education - Arquero - backed data tables. (JavaScript Libraries)
- awesome-starred - uwdata/arquero - Query processing and transformation of array-backed data tables. (database)
README
**Arquero** is a JavaScript library for query processing and transformation of array-backed data tables. Following the [relational algebra](https://en.wikipedia.org/wiki/Relational_algebra) and inspired by the design of [dplyr](https://dplyr.tidyverse.org/), Arquero provides a fluent API for manipulating column-oriented data frames. Arquero supports a range of data transformation tasks, including filter, sample, aggregation, window, join, and reshaping operations.
* Fast: process data tables with million+ rows.
* Flexible: query over arrays, typed arrays, array-like objects, or [Apache Arrow](https://arrow.apache.org/) columns.
* Full-Featured: perform a variety of wrangling and analysis tasks.
* Extensible: add new column types or functions, including aggregate & window operations.
* Lightweight: small size, minimal dependencies.To get up and running, start with the [Introducing Arquero](https://observablehq.com/@uwdata/introducing-arquero) tutorial, part of the [Arquero notebook collection](https://observablehq.com/collection/@uwdata/arquero).
Have a question or need help? Visit the [Arquero GitHub repo](https://github.com/uwdata/arquero) or post to the [Arquero GitHub Discussions board](https://github.com/uwdata/arquero/discussions).
Arquero is Spanish for "archer": if datasets are [arrows](https://arrow.apache.org/), Arquero helps their aim stay true. 🏹 Arquero also refers to a goalkeeper: safeguard your data from analytic "own goals"! 🥅 ✋ ⚽
## API Documentation
* [Top-Level API](https://idl.uw.edu/arquero/api) - All methods in the top-level Arquero namespace.
* [Table](https://idl.uw.edu/arquero/api/table) - Table access and output methods.
* [Verbs](https://idl.uw.edu/arquero/api/verbs) - Table transformation verbs.
* [Op Functions](https://idl.uw.edu/arquero/api/op) - All functions, including aggregate and window functions.
* [Expressions](https://idl.uw.edu/arquero/api/expressions) - Parsing and generation of table expressions.
* [Extensibility](https://idl.uw.edu/arquero/api/extensibility) - Extend Arquero with new expression functions or table verbs.## Example
The core abstractions in Arquero are *data tables*, which model each column as an array of values, and *verbs* that transform data and return new tables. Verbs are table methods, allowing method chaining for multi-step transformations. Though each table is unique, many verbs reuse the underlying columns to limit duplication.
```js
import { all, desc, op, table } from 'arquero';// Average hours of sunshine per month, from https://usclimatedata.com/.
const dt = table({
'Seattle': [69,108,178,207,253,268,312,281,221,142,72,52],
'Chicago': [135,136,187,215,281,311,318,283,226,193,113,106],
'San Francisco': [165,182,251,281,314,330,300,272,267,243,189,156]
});// Sorted differences between Seattle and Chicago.
// Table expressions use arrow function syntax.
dt.derive({
month: d => op.row_number(),
diff: d => d.Seattle - d.Chicago
})
.select('month', 'diff')
.orderby(desc('diff'))
.print();// Is Seattle more correlated with San Francisco or Chicago?
// Operations accept column name strings outside a function context.
dt.rollup({
corr_sf: op.corr('Seattle', 'San Francisco'),
corr_chi: op.corr('Seattle', 'Chicago')
})
.print();// Aggregate statistics per city, as output objects.
// Reshape (fold) the data to a two column layout: city, sun.
dt.fold(all(), { as: ['city', 'sun'] })
.groupby('city')
.rollup({
min: d => op.min(d.sun), // functional form of op.min('sun')
max: d => op.max(d.sun),
avg: d => op.average(d.sun),
med: d => op.median(d.sun),
// functional forms permit flexible table expressions
skew: ({sun: s}) => (op.mean(s) - op.median(s)) / op.stdev(s) || 0
})
.objects()
```## Usage
### In Browser
To use in the browser, you can load Arquero from a content delivery network:
```html
```
Arquero will be imported into the `aq` global object. The default browser bundle does not include the [Apache Arrow](https://arrow.apache.org/) library. To perform Arrow encoding using [toArrow()](https://idl.uw.edu/arquero/api/#toArrow) or binary file loading using [loadArrow()](https://idl.uw.edu/arquero/api/#loadArrow), import Apache Arrow first:
```html
```
Alternatively, you can build and import `arquero.min.js` from the `dist` directory, or build your own application bundle. When building custom application bundles for the browser, the module bundler should draw from the `browser` property of Arquero's `package.json` file. For example, if using [rollup](https://rollupjs.org/), pass the `browser: true` option to the [node-resolve](https://github.com/rollup/plugins/tree/master/packages/node-resolve) plugin.
Arquero uses modern JavaScript features, and so will not work with some outdated browsers. To use Arquero with older browsers including Internet Explorer, set up your project with a transpiler such as [Babel](https://babeljs.io/).
### In Node.js or Application Bundles
First install `arquero` as a dependency, for example via `npm install arquero --save`.
Arquero assumes Node version 18 or higher.
As of Arquero version 6, the library uses type `module` and should be loaded using ES module syntax.Import using ES module syntax, import all exports into a single object:
```js
import * as aq from 'arquero';
```Import using ES module syntax, with targeted imports:
```js
import { op, table } from 'arquero';
```Dynamic import (e.g., within a Node.js REPL):
```js
aq = await import('arquero');
```## Build Instructions
To build and develop Arquero locally:
- Clone [https://github.com/uwdata/arquero](https://github.com/uwdata/arquero).
- Run `npm i` to install dependencies.
- Run `npm test` to run test cases, `npm run perf` to run performance benchmarks, and `npm run build` to build output files.