https://github.com/sambaiz/athena-admin
Migrate the table schema, replace objects so that it has partition key=value prefix and add partitions.
- Host: GitHub
- URL: https://github.com/sambaiz/athena-admin
- Owner: sambaiz
- License: mit
- Created: 2017-12-24T13:49:20.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2018-01-12T00:30:49.000Z (over 8 years ago)
- Last Synced: 2025-03-21T07:11:11.549Z (over 1 year ago)
- Topics: athena, aws
- Language: JavaScript
- Homepage:
- Size: 119 KB
- Stars: 3
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# athena-admin
Migrates Athena table schemas, moves S3 objects so that their keys have a partition key=value prefix, and adds partitions.

```
$ npm install athena-admin
```
```
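const AthenaAdmin = require('athena-admin').AthenaAdmin;
const dbDef = require('./sampledatabase.json');

// await is only valid inside an async function, so wrap the calls.
(async () => {
  const admin = new AthenaAdmin(dbDef);
  await admin.replaceObjects(); // move objects under the key=value prefix
  await admin.migrate();        // create/drop tables or update the schema
  await admin.partition();      // add partitions
})().catch(console.error);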
```
## Database definition
Describe the database definition in the following format.
```
{
  "general": {
    "athenaRegion": "ap-northeast-1",
    "databaseName": "aaaa",
    "saveDefinitionLocation": "s3://saveDefinitionBucket/aaaa.json"
  },
  "tables": {
    "sample_data": {
      "columns": {
        "user_id": "int",
        "some_value": { /* = "struct<score:int,category:string>" */
          "score": "int",
          "category": "string"
        },
        "some_array1": ["string"], /* = array<string> */
        "some_array2": [{ /* = array<struct<aaa:int,bbb:string>> */
          "aaa": "int",
          "bbb": "string"
        }]
      },
      "srcLocation": "s3://src/location/",
      "partition": {
        "prePartitionLocation": "s3://pre/partition/", /* optional */
        "regexp": "(\\d{4})/(\\d{2})/(\\d{2})/", /* optional */
        "keys": [
          {
            "name": "dt",
            "type": "string",
            "format": "{1}-{2}-{3}" /* optional */
          }
        ]
      }
    }
  }
}
```
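The nested JSON shapes under `columns` stand for Athena `struct<>` and `array<>` types. As a rough illustration of that mapping, the sketch below converts a column description into a type string. `toAthenaType` is a hypothetical helper written for this example and is not part of athena-admin's documented API; it assumes an array always contains exactly one element describing the element type.

```javascript
// Hypothetical helper for illustration; not athena-admin's actual implementation.
function toAthenaType(def) {
  if (typeof def === 'string') {
    return def; // primitive type such as "int" or "string"
  }
  if (Array.isArray(def)) {
    // Assumption: a one-element array describes the element type.
    return `array<${toAthenaType(def[0])}>`;
  }
  // A plain object describes a struct as comma-separated name:type pairs.
  const fields = Object.entries(def)
    .map(([name, type]) => `${name}:${toAthenaType(type)}`)
    .join(',');
  return `struct<${fields}>`;
}

const columns = {
  user_id: 'int',
  some_value: { score: 'int', category: 'string' },
  some_array1: ['string'],
  some_array2: [{ aaa: 'int', bbb: 'string' }],
};

for (const [name, def] of Object.entries(columns)) {
  console.log(`${name} ${toAthenaType(def)}`);
}
```

Writing the types as JSON means a sample record can be turned into a definition by replacing each value with its type name.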
### general
| Field | Description |
|:-----------|:------------|
| athenaRegion | Region for Athena |
| databaseName | Athena database name |
| saveDefinitionLocation | Location to save the previous definition |
### tables
- The root field name (`sample_data`) is the table name.

| Field | Description |
|:-----------|:------------|
| columns | Column name and type pairs. `struct<>` and `array<>` can also be written as JSON objects and arrays, so you can describe them by replacing the values of actual data with their types. |
| srcLocation | Location referenced by Athena |
| partition | Partitions, detected from the key=value prefix of the objects. |

If the objects' locations don't have the partition key=value prefix, `replaceObjects()` can move them from prePartitionLocation to srcLocation with the prefix added. This lets `partition()` detect and add partitions automatically, with keys.name as the key, keys.format as the value, and keys.type as the type.

The {n} in keys.format corresponds to the n-th capture group of regexp. (e.g. `s3://pre/partition/2017/12/01/00/aaa.png` => `[2017/12/01, 2017, 12, 01]`)
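To make the relationship between regexp, keys.format, and the resulting key=value prefix concrete, here is a minimal sketch. `formatPartitionValue` and `partitionPrefix` are hypothetical helpers written for this example, not athena-admin's actual implementation, and the sketch assumes every key has a format (the definition marks it as optional).

```javascript
// Hypothetical helpers for illustration; not athena-admin's actual implementation.

// Replace each {n} in the format with the n-th regexp capture group.
function formatPartitionValue(format, matched) {
  return format.replace(/\{(\d+)\}/g, (_, n) => matched[Number(n)]);
}

// Build the key=value prefix for all partition keys.
function partitionPrefix(partition, matched) {
  return partition.keys
    .map((key) => `${key.name}=${formatPartitionValue(key.format, matched)}/`)
    .join('');
}

const partition = {
  regexp: '(\\d{4})/(\\d{2})/(\\d{2})/',
  keys: [{ name: 'dt', type: 'string', format: '{1}-{2}-{3}' }],
};

// Object key relative to prePartitionLocation.
const objKey = '2017/12/01/00/aaa.png';
const matched = objKey.match(new RegExp(partition.regexp));

// Keep whatever follows the matched part of the key.
const rest = objKey.slice(matched.index + matched[0].length);
const newKey = partitionPrefix(partition, matched) + rest;

console.log(newKey); // dt=2017-12-01/00/aaa.png
```

Placed under srcLocation, this key gives the object a location that Athena can recognize as belonging to the partition `dt=2017-12-01`.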
## API
### replaceObjects(deletePreObject=true, matchedHandler=(matched, objKey, table)=>matched)
Moves the objects located in prePartitionLocation to srcLocation, adding the partition key=value prefix.
(e.g. `s3://pre/partition/2017/12/01/00/aaa.png` => `s3://src/location/dt=2017-12-01/00/aaa.png`)
If you need to change the matched values before this operation, use matchedHandler.
The following example converts the UTC date and hour in the key to the local time zone.
(e.g. `2017/12/01/19` => `2017/12/02/04`)
The full code is in /sample.
```
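const moment = require('moment');

// Convert the UTC date and hour captured from the object key to local time.
const utcToTZ = (matched, objKey, table) => {
  // Only rewrite the values for tables that have a dt partition key.
  const existsDt = table.partition.keys.some((key) => key.name === 'dt');
  if (!existsDt) {
    return matched;
  }
  // Parse matched[0] as UTC; moment then formats it in the local time zone.
  const tz = moment(`${matched[0]} +00:00`, 'YYYY/MM/DD/HH ZZ');
  matched[1] = tz.format('YYYY');
  matched[2] = tz.format('MM');
  matched[3] = tz.format('DD');
  matched[4] = tz.format('HH');
  return matched;
};

// Call from inside an async function; false keeps the original objects.
await admin.replaceObjects(false, utcToTZ);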
```
### migrate()
Compares the definition with the previous one saved in S3 and, if they differ, creates or drops tables or updates the schema.
### partition()
Runs `MSCK REPAIR TABLE`. Partitions are detected and added automatically from the objects' key=value prefix.
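For reference, the sketch below shows what a request for this statement could look like when sent through Athena's StartQueryExecution API. `buildRepairRequest` is a hypothetical helper written for this example and does not reflect athena-admin's internals; the query result location `s3://query-results-bucket/` is an assumed placeholder.

```javascript
// Hypothetical helper for illustration; not athena-admin's actual implementation.
// Builds the parameters for Athena's StartQueryExecution API.
function buildRepairRequest(databaseName, tableName, outputLocation) {
  return {
    QueryString: `MSCK REPAIR TABLE ${tableName}`,
    QueryExecutionContext: { Database: databaseName },
    ResultConfiguration: { OutputLocation: outputLocation },
  };
}

// Database and table names are taken from the sample definition;
// the output location is an assumed placeholder.
const params = buildRepairRequest('aaaa', 'sample_data', 's3://query-results-bucket/');
console.log(params.QueryString); // MSCK REPAIR TABLE sample_data
```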
## Article
[I made athena-manager, which handles migration and partitioning for Athena - sambaiz-net](https://www.sambaiz.net/article/145/)