https://github.com/expediadotcom/haystack-tables

This is an EXPERIMENTAL project - not ready for production use.

Host: GitHub
URL: https://github.com/expediadotcom/haystack-tables
Owner: ExpediaDotCom
Created: 2019-02-19T16:34:29.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2019-03-03T10:49:47.000Z (over 7 years ago)
Last Synced: 2025-03-04T13:47:21.268Z (over 1 year ago)
Language: Java
Homepage:
Size: 130 KB
Stars: 0
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

## Architecture

![Architecture](./docs/images/Haystack_Tables.png)

### Getting Started
Launch the table-allocator Dropwizard app, which exposes endpoints for creating, listing, and deleting views.
By default, the allocator uses Kubernetes to run the parquet writers. If you are using minikube, make sure it is running and that the current Kubernetes context points to it.

##### Create a new view:

```
curl -XPOST -H "Content-Type: application/json" -d '
{
  "view": "oms",
  "select": [
    "tags[errorcode]",
    "operationname"
  ],
  "where": {
    "servicename": "oms"
  }
}' "http://localhost:8080/view"
```

##### List all views:

```
curl "http://localhost:8080/views"

Response:

[
  {
    "createTimestamp": "2019-03-03T10:17:50.000Z",
    "lastUpdatedTimestamp": "2019-03-03T10:17:50.866Z",
    "query": {
      "view": "oms-test",
      "select": [
        "tags[errorcode]",
        "operationname"
      ],
      "where": {
        "servicename": "oms"
      }
    },
    "running": true
  }
]
```

##### Delete a view:

```
curl -XDELETE "http://localhost:8080/view/oms"
```
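
For scripted use, the same create and delete calls can be built from Python's standard library. This is a minimal sketch that mirrors the curl commands above; the helper names (`view_query`, `create_view_request`, `delete_view_request`) are hypothetical and not part of the project, and the base URL is assumed to be the local allocator shown in the examples.

```python
import json
import urllib.parse
import urllib.request


def view_query(view, select, where):
    """Build the JSON body used by the create-view curl example."""
    return {"view": view, "select": list(select), "where": dict(where)}


def create_view_request(base_url, query):
    """Prepare (but do not send) the POST /view request."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/view",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def delete_view_request(base_url, view):
    """Prepare (but do not send) the DELETE /view/{view} request."""
    # The view name is percent-encoded so it is safe inside the URL path.
    path = "/view/" + urllib.parse.quote(view, safe="")
    return urllib.request.Request(url=base_url.rstrip("/") + path, method="DELETE")
```

To send a prepared request, pass it to `urllib.request.urlopen(request)` while the allocator is running.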

### S3 Data
A parquet writer runs independently for each requested view. Each writer puts its Parquet data under a configured bucket, using the following partitioning strategy:

`s3://bucket-name/views/{view-name}/year=2019/month=02/day=03/hour=12/..`

Each Parquet file is named after the Kafka offset of the last record it contains.
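
The partition layout above can be sketched as a small helper that maps a view name and a timestamp to its hourly prefix. This is an illustration only: the function name `partition_prefix` is hypothetical, and the use of UTC for the partition values is an assumption, since the README does not state a time zone.

```python
from datetime import datetime, timezone


def partition_prefix(bucket, view, ts):
    """Return the hourly S3 prefix for a view, following the layout above."""
    # Assumption: partition values are derived from the UTC time.
    ts = ts.astimezone(timezone.utc)
    return (
        f"s3://{bucket}/views/{view}/"
        f"year={ts:%Y}/month={ts:%m}/day={ts:%d}/hour={ts:%H}/"
    )


if __name__ == "__main__":
    when = datetime(2019, 2, 3, 12, 30, tzinfo=timezone.utc)
    print(partition_prefix("bucket-name", "oms", when))
```

Zero-padded month, day, and hour values keep the prefixes in the same form as the example path.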

### Athena Tables
The allocator provides an endpoint, `/athena/refresh`, that takes the following actions for all running views:
* Create a partitioned table in Athena under the `haystack_tables` database
* Repair the already existing table to add new S3 partitions
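
The two actions above roughly correspond to an Athena `CREATE EXTERNAL TABLE` statement and an `MSCK REPAIR TABLE` statement. The sketch below only builds the SQL strings and is not the allocator's actual code: the function names, the column list, and the `string` type of the partition columns are assumptions. The statements are assumed to run with `haystack_tables` selected as the database.

```python
def create_table_sql(table, columns, location):
    """Build a CREATE EXTERNAL TABLE statement for a Parquet-backed view.

    `columns` is a list of (name, athena_type) pairs; the real column set
    depends on the view's `select` fields and is an assumption here.
    """
    cols = ",\n  ".join(f"`{name}` {typ}" for name, typ in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{table}` (\n  {cols}\n)\n"
        # Assumption: partition columns match the S3 layout and are strings.
        "PARTITIONED BY (`year` string, `month` string, `day` string, `hour` string)\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{location}'"
    )


def repair_table_sql(table):
    """Build the statement that registers newly written S3 partitions."""
    return f"MSCK REPAIR TABLE `{table}`"


if __name__ == "__main__":
    print(create_table_sql("oms", [("operationname", "string")],
                           "s3://bucket-name/views/oms/"))
    print(repair_table_sql("oms"))
```

`MSCK REPAIR TABLE` discovers Hive-style `key=value` partition directories, which is the layout the parquet writers produce.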

We run a cron job that hits this endpoint every few minutes to make sure the tables are always up to date.
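
Where cron is not available, a small polling loop can play the same role. This is a sketch under assumptions: the function names are hypothetical, the five-minute default interval is arbitrary, and a plain GET is assumed to trigger the refresh because the README does not state the HTTP method.

```python
import time
import urllib.request


def refresh_url(base_url):
    """Return the full URL of the allocator's refresh endpoint."""
    return base_url.rstrip("/") + "/athena/refresh"


def refresh_once(base_url, timeout=30.0):
    """Call the refresh endpoint once and return the HTTP status code."""
    # Assumption: GET is accepted; change the method if the allocator expects POST.
    with urllib.request.urlopen(refresh_url(base_url), timeout=timeout) as resp:
        return resp.status


def run_refresh_loop(base_url, interval_seconds=300.0, iterations=None):
    """Refresh repeatedly; `iterations=None` means run until interrupted."""
    done = 0
    while iterations is None or done < iterations:
        try:
            print("refresh status:", refresh_once(base_url))
        except OSError as exc:  # covers connection errors, HTTP errors, timeouts
            print("refresh failed:", exc)
        done += 1
        if iterations is None or done < iterations:
            time.sleep(interval_seconds)
```

Calling `run_refresh_loop("http://localhost:8080")` keeps the tables refreshed until the process is stopped; a failed call is logged and retried on the next tick.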
