Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/singerla/automizer-data
Stores and browses aggregated data using Prisma database tools
https://github.com/singerla/automizer-data
Last synced: 23 days ago
JSON representation
Stores and browses aggregated data using Prisma database tools
- Host: GitHub
- URL: https://github.com/singerla/automizer-data
- Owner: singerla
- Created: 2021-07-05T07:29:49.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2024-09-17T15:02:19.000Z (about 2 months ago)
- Last Synced: 2024-09-17T18:50:09.431Z (about 2 months ago)
- Language: TypeScript
- Size: 531 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Parse, tag and query xlsx-data
This package is used to parse structured tabular data from CSV or XLSX.
Stored data can be browsed and transformed into a desired two-dimensional result table.Its primary purpose is to deliver data for [pptx-automizer](https://github.com/singerla/pptx-automizer).
This project is *Work in progress*.
Nevertheless, you might already use `automizer-data` to handle table files coming from statistical analytics software.
The [example-xlsx](https://github.com/singerla/automizer-data/blob/main/__test__/data/test-data.xlsx) in `__test__/data`-folder is based on [GESStabs](https://gessgroup.de/software/gesstabs/).Storage and querying is done with [Prisma](https://github.com/prisma/prisma) ORM tools.
## Installation
### As a package
If you are working on an existing project, you can add *automizer-data* to it using npm or yarn. Run
```
$ yarn add automizer-data
```
or
```
$ npm install automizer-data
```
in the root folder of your project. This will download and install the most recent version into your existing project.### As a cloned repository
If you want to see how it works and you like to run own tests, you should clone this repository and install the dependencies:
```
$ git clone [email protected]:singerla/automizer-data.git my-project
$ cd my-project
$ yarn install
$ yarn prisma generate
```## Prisma studio
You can open prisma studio and take a look at the data:
```
$ yarn prisma studio
```
A lot of good stuff can be found at [prisma.io](https://www.prisma.io/).# Import Data
According to parser's configuration, parsed data will sliced, tagged and separated into two-dimensional tables.The Database contains:
- __Categories:__ Generic nouns to describe the basic structure of your project
- __Tags:__ Values of a certain category
- __Sheets:__ Two-dimensional tables and their additional infoEach __Sheet__ will contain:
- a collection of rows
- a collection of columns
- the two-dimensional table body
- a collection of tags
- a collection of metadata that came along with the sheet## Example usage
```ts
import { PrismaClient } from '@prisma/client'
import { Parser, Store } from '../src/index';
import { ParserOptions, Tagger, RawResultInfo } from "../src/types";const store = new Store(
new PrismaClient()
)const config = {
// This string separates tables if found in Column A
separator: 'Table Separator',
// A row that fits to any of the strings below will be
// separated into "meta"-field if found in Col A
metaMap: {
base: ['BASE'],
topBox: ['Top-2-Box (1-2)'],
bottomBox: ['Bottom-2-Box (4-5)'],
mean: ['Mean Value']
},
// Rows that equal to one of the labels below will be skipped.
skipRows: [
'company',
'* Annotation',
'(Sum of answers)'
],
// A callback function to be applied to every body row
renderRow: (row: string[]): (number|null)[] => {
return row.map(cell => {
if(cell === ' ') return null
else return Number(cell)
})
},
// The info array of each sub-table will be passed to this
// callback. Tagging can be fine tuned here.
renderTags: (info: RawResultInfo[], tags: Tagger): void => {
info.forEach((info, level) => {
let cat
// info.key contains the info string's original section
// could be 'body' or 'info'
if(info.key === 'body') {
cat = 'vartitle'
} else if(info.value.indexOf('- ') === 0) {
cat = 'measure'
} else if(level === 0) {
// We strip the table separator and pass CountryName
info.value = info.value.replace('Table Separator – ', '')
cat = 'country'
} else if(level === 1) {
cat = 'variable'
} else if(level === 2) {
cat = 'questionText'
}
if(cat) {
tags.push(cat, info.value)
}
})
},
}const parse = new Gesstabs(config)
const file = `${__dirname}/data/test-data.xlsx`
const datasheets = await parse.fromXlsx(filename)
const summary = await store.run(datasheets)
```# Intermediate JSON
Xlsx-Parser will tranform tabular data into an intermediate JSON object. The closer your input data comes to this format, the easier it will be to implement a new parser type.
```json
{
"tags": [
{
"category": "country",
"value": "Norway"
},
{
"category": "variable",
"value": "Q12"
},
{
"category": "category",
"value": "Bar soap"
},
{
"category": "subgroup",
"value": "Age"
},
{
"category": "measure",
"value": "nominal"
}
],
"columns": ["Total", "19-29", "30-39", "40-69"],
"rows": ["answer 1", "answer 2", "answer 3"],
"data": [
[29,18,36,12],
[39,19,24,11],
[19,28,46,10]
],
"meta": {
"significance": [
[null,null,"h",null],
[null,"h","l",null],
[null,null,"h","l"]
]
}
}
```# Query Data
As all the Sheets are tagged, our queries will use tags to find the desired datasets.```ts
import { getData, Store } from '../src';
import { all } from '../src/filter';
import { value } from '../src/cell';// A selector is an array of tags.
const selector = [
{
category: 'country',
value: 'Norway'
},
{
category: "variable",
value: "Q12"
}
]// The grid will define rows, cols and a callback
// to run inside a target cell.
const grid = {
rows: all('row'),
columns: all('column'),
cell: value
}const result = await getData(selector, grid)
// automizer-data will convert the result directly into
// a pptx-automizer-object.
const chartData = result.toSeriesCategories()
```