https://github.com/yougov/pinot-client-rust
Rust client library to query Apache Pinot.
- Host: GitHub
- URL: https://github.com/yougov/pinot-client-rust
- Owner: yougov
- License: apache-2.0
- Created: 2022-04-28T17:25:05.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2022-08-18T12:16:14.000Z (over 2 years ago)
- Last Synced: 2024-04-24T00:41:39.353Z (8 months ago)
- Language: Rust
- Size: 279 KB
- Stars: 3
- Watchers: 4
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Pinot Client Rust
=================

A Rust library to query Apache Pinot.
Installing Pinot
================

To install Pinot locally, follow the [Pinot Quickstart](https://docs.pinot.apache.org/basics/getting-started/running-pinot-locally) guide to install Pinot and start the batch quickstart:

```bash
bin/quick-start-batch.sh
```

Alternatively, the Pinot database running in Docker, orchestrated by this repository's `docker-compose.yaml` file, can be used instead:
```bash
make prepare-pinot
```

Examples
========

Check out the client library GitHub repository:

```bash
git clone git@github.com:yougov/pinot-client-rust.git
cd pinot-client-rust
```

Start the Pinot database in Docker:

```bash
make prepare-pinot
```

Build and run an example application that queries Pinot:
```bash
cargo run --example pql-query
cargo run --example sql-query-deserialize-to-data-row
cargo run --example sql-query-deserialize-to-struct
```

Usage
=====

Create a Pinot Connection
-------------------------

A Pinot client can be initialized from either:

1. A Zookeeper path.
```rust
let client = pinot_client_rust::connection::client_from_zookeeper(
    &pinot_client_rust::zookeeper::ZookeeperConfig::new(
        vec!["localhost:2181".to_string()],
        "/PinotCluster".to_string(),
    ),
    None,
);
```

2. A list of broker addresses.
```rust
let client = pinot_client_rust::connection::client_from_broker_list(
    vec!["localhost:8099".to_string()],
    None,
);
```

### Asynchronous Queries

An asynchronous connection can be established with `pinot_client_rust::async_connection::AsyncConnection`,
which provides equivalents of the synchronous instantiation methods described above.

Query Pinot
-----------

See this [example](https://github.com/yougov/pinot-client-rust/blob/master/examples/sql-query-deserialize-to-data-row.rs) for reference.
Code snippet:
```rust
fn main() {
    let client = pinot_client_rust::connection::client_from_broker_list(
        vec!["localhost:8099".to_string()],
        None,
    ).unwrap();
    let broker_response = client.execute_sql::(
        "baseballStats",
        "select count(*) as cnt, sum(homeRuns) as sum_homeRuns from baseballStats group by teamID limit 10",
    ).unwrap();
    // Stats are optional, so only log them when the broker returned them.
    if let Some(stats) = broker_response.stats {
        log::info!(
            "Query Stats: response time - {} ms, scanned docs - {}, total docs - {}",
            stats.time_used_ms,
            stats.num_docs_scanned,
            stats.total_docs,
        );
    }
}
```

Response Format
---------------

Query responses are defined by one of two broker response structures.
SQL queries return `SqlResponse`, whose generic parameter can be any type implementing the
`FromRow` trait, whereas PQL queries return `PqlResponse`.
`SqlResponse` contains a `Table`, the holder for SQL query data, whereas `PqlResponse` contains
`AggregationResults` and `SelectionResults`, the holders for PQL query data.
Exceptions for a given request, for both `SqlResponse` and `PqlResponse`, are stored in the `Exception` array.
Stats for a given request, for both `SqlResponse` and `PqlResponse`, are stored in `ResponseStats`.

### Common
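Both response types report exceptions and stats, so a caller might check both before reading any data. The following is a minimal, std-only sketch under stated assumptions: `PinotException` is redeclared locally with the two fields defined below (without the serde derive), `Stats` is a hypothetical subset of `ResponseStats`, and `check_exceptions` and `scan_ratio` are hypothetical helpers that are not part of `pinot_client_rust`.

```rust
/// Local stand-in for `PinotException`, with the same two fields (no serde derive).
#[derive(Clone, Debug, PartialEq)]
pub struct PinotException {
    pub error_code: i32,
    pub message: String,
}

/// Hypothetical subset of `ResponseStats`, for illustration only.
#[derive(Clone, Debug, PartialEq)]
pub struct Stats {
    pub num_docs_scanned: i64,
    pub total_docs: i64,
}

/// Hypothetical helper: fail if the broker reported any exception.
pub fn check_exceptions(exceptions: &[PinotException]) -> Result<(), String> {
    match exceptions.first() {
        None => Ok(()),
        Some(first) => Err(format!(
            "{} exception(s); first: [{}] {}",
            exceptions.len(),
            first.error_code,
            first.message
        )),
    }
}

/// Hypothetical helper: share of the table's documents that the query scanned.
/// Returns `None` for an empty table to avoid dividing by zero.
pub fn scan_ratio(stats: &Stats) -> Option<f64> {
    if stats.total_docs == 0 {
        None
    } else {
        Some(stats.num_docs_scanned as f64 / stats.total_docs as f64)
    }
}

fn main() {
    // Illustrative values only.
    let stats = Stats { num_docs_scanned: 250, total_docs: 1000 };
    let exceptions = vec![PinotException {
        error_code: 1,
        message: "illustrative error".to_string(),
    }];
    println!("{:?}", check_exceptions(&exceptions));
    println!("{:?}", scan_ratio(&stats));
}
```

Failing fast on the first exception is only one possible policy; logging the exceptions and continuing is another.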
`Exception` is defined as:
```rust
/// Pinot exception.
#[derive(Clone, Debug, Deserialize, Eq, PartialEq)]
pub struct PinotException {
    #[serde(rename(deserialize = "errorCode"))]
    pub error_code: i32,
    pub message: String,
}
```

`ResponseStats` is defined as:
```rust
/// ResponseStats carries all stats returned by a query.
#[derive(Clone, Debug, PartialEq)]
pub struct ResponseStats {
    pub trace_info: HashMap,
    pub num_servers_queried: i32,
    pub num_servers_responded: i32,
    pub num_segments_queried: i32,
    pub num_segments_processed: i32,
    pub num_segments_matched: i32,
    pub num_consuming_segments_queried: i32,
    pub num_docs_scanned: i64,
    pub num_entries_scanned_in_filter: i64,
    pub num_entries_scanned_post_filter: i64,
    pub num_groups_limit_reached: bool,
    pub total_docs: i64,
    pub time_used_ms: i32,
    pub min_consuming_freshness_time_ms: i64,
}
```

### PQL
`PqlResponse` is defined as:
```rust
/// PqlResponse is the data structure for broker response to a PQL query.
#[derive(Clone, Debug, PartialEq)]
pub struct PqlResponse {
    pub aggregation_results: Vec,
    pub selection_results: Option,
    pub stats: Option<ResponseStats>,
}
```

### SQL
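SQL results are described by a `Schema` (defined below) that maps between column names and column indices in both directions. As a rough, std-only sketch of that idea, assuming a `Vec` for index-to-name and a `HashMap` for name-to-index in place of the `bimap` crate, and `String` errors in place of the crate's `Result` type, a hypothetical `SchemaSketch` could look like this:

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `Schema`: `names` maps index -> name and
/// `index_of` maps name -> index (the crate uses a `bimap::BiMap` instead).
struct SchemaSketch {
    names: Vec<String>,
    index_of: HashMap<String, usize>,
}

impl SchemaSketch {
    fn new(names: Vec<String>) -> Self {
        // Build the reverse (name -> index) lookup once, up front.
        let index_of: HashMap<String, usize> = names
            .iter()
            .enumerate()
            .map(|(i, name)| (name.clone(), i))
            .collect();
        SchemaSketch { names, index_of }
    }

    fn get_column_count(&self) -> usize {
        self.names.len()
    }

    fn get_column_name(&self, column_index: usize) -> Result<&str, String> {
        self.names
            .get(column_index)
            .map(String::as_str)
            .ok_or_else(|| format!("column index {column_index} out of range"))
    }

    fn get_column_index(&self, column_name: &str) -> Result<usize, String> {
        self.index_of
            .get(column_name)
            .copied()
            .ok_or_else(|| format!("unknown column {column_name}"))
    }
}

fn main() {
    let schema = SchemaSketch::new(vec!["teamID".to_string(), "cnt".to_string()]);
    println!("{}", schema.get_column_count());
    println!("{:?}", schema.get_column_index("cnt"));
    println!("{:?}", schema.get_column_name(0));
}
```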
`SqlResponse` is defined as:
```rust
/// SqlResponse is the data structure for a broker response to an SQL query.
#[derive(Clone, Debug, PartialEq)]
pub struct SqlResponse {
    pub table: Option>,
    pub stats: Option<ResponseStats>,
}
```

`Table` is defined as:
```rust
/// Table is the holder for SQL queries.
#[derive(Clone, Debug, PartialEq)]
pub struct Table {
    schema: Schema,
    rows: Vec,
}
```

`Schema` is defined as:
```rust
/// Schema is response schema with a bimap to allow easy name <-> index retrieval
#[derive(Clone, Debug, Eq, PartialEq)]
pub struct Schema {
    column_data_types: Vec<DataType>,
    column_name_to_index: bimap::BiMap::,
}
```

There are multiple functions defined for `Schema`, such as:
```
fn get_column_count(&self) -> usize;
fn get_column_name(&self, column_index: usize) -> Result<&str>;
fn get_column_index(&self, column_name: &str) -> Result<usize>;
fn get_column_data_type(&self, column_index: usize) -> Result<DataType>;
fn get_column_data_type_by_name(&self, column_name: &str) -> Result<DataType>;
```

`DataType` is defined as:
```rust
/// Pinot native types
#[derive(Clone, Debug, Eq, PartialEq)]
pub enum DataType {
    Int,
    Long,
    Float,
    Double,
    Boolean,
    Timestamp,
    String,
    Json,
    Bytes,
    IntArray,
    LongArray,
    FloatArray,
    DoubleArray,
    BooleanArray,
    TimestampArray,
    StringArray,
    BytesArray,
}
```

`FromRow` is defined as:
```rust
/// FromRow represents any structure which can deserialize
/// the Table.rows json field provided a `Schema`
pub trait FromRow: Sized {
    fn from_row(
        data_schema: &Schema,
        row: Vec,
    ) -> std::result::Result;
}
```

In addition to being implemented by `DataRow`, `FromRow` is also implemented by all implementors
of `serde::de::Deserialize`. This works by first deserializing the response to JSON; then, before each
row is deserialized into its final form, it is replaced by a JSON map of column name to value.
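The row-to-map substitution described above can be sketched with the standard library alone. This is an illustration under simplifying assumptions, not the crate's implementation: values are plain `String`s rather than JSON values, the hypothetical `FromMap` trait stands in for `serde::de::Deserialize`, and `row_to_map`, `rows_into` and `TeamStats` are hypothetical names (the column names match the query in the Query Pinot section).

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a `Deserialize` target: built from a column-name -> value map.
trait FromMap: Sized {
    fn from_map(map: &HashMap<String, String>) -> Result<Self, String>;
}

/// Pair each positional value in a row with its column name from the schema.
fn row_to_map(column_names: &[String], row: Vec<String>) -> Result<HashMap<String, String>, String> {
    if row.len() != column_names.len() {
        return Err(format!("expected {} values, got {}", column_names.len(), row.len()));
    }
    Ok(column_names.iter().cloned().zip(row).collect())
}

/// Convert every row: substitute a name -> value map, then build the final type.
fn rows_into<T: FromMap>(column_names: &[String], rows: Vec<Vec<String>>) -> Result<Vec<T>, String> {
    rows.into_iter()
        .map(|row| row_to_map(column_names, row).and_then(|m| T::from_map(&m)))
        .collect()
}

/// Example target type for the `cnt` / `sum_homeRuns` query.
#[derive(Debug, PartialEq)]
struct TeamStats {
    cnt: i64,
    sum_home_runs: i64,
}

impl FromMap for TeamStats {
    fn from_map(map: &HashMap<String, String>) -> Result<Self, String> {
        // Look up a column by name and parse it as an integer.
        let field = |name: &str| -> Result<i64, String> {
            map.get(name)
                .ok_or_else(|| format!("missing column {name}"))?
                .parse::<i64>()
                .map_err(|e| e.to_string())
        };
        Ok(TeamStats { cnt: field("cnt")?, sum_home_runs: field("sum_homeRuns")? })
    }
}

fn main() {
    let names = vec!["cnt".to_string(), "sum_homeRuns".to_string()];
    let rows = vec![vec!["3".to_string(), "42".to_string()]];
    let parsed: Result<Vec<TeamStats>, String> = rows_into(&names, rows);
    println!("{:?}", parsed);
}
```

Keying each row by column name means the target type does not depend on the column order chosen by the query.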
Additionally, several serde deserializer functions are provided to deserialize complex Pinot types:

```
/// Converts Pinot timestamps into `Vec>` using `deserialize_timestamps_from_json()`.
fn deserialize_timestamps<'de, D>(deserializer: D) -> std::result::Result>, D::Error>...

/// Converts Pinot timestamps into `DateTime` using `deserialize_timestamp_from_json()`.
pub fn deserialize_timestamp<'de, D>(deserializer: D) -> std::result::Result, D::Error>...

/// Converts Pinot hex strings into `Vec<Vec<u8>>` using `deserialize_bytes_array_from_json()`.
pub fn deserialize_bytes_array<'de, D>(deserializer: D) -> std::result::Result<Vec<Vec<u8>>, D::Error>...

/// Converts a Pinot hex string into `Vec<u8>` using `deserialize_bytes_from_json()`.
pub fn deserialize_bytes<'de, D>(deserializer: D) -> std::result::Result<Vec<u8>, D::Error>...

/// Deserializes JSON, potentially packaged into a string, by calling `deserialize_json_from_json()`.
pub fn deserialize_json<'de, D>(deserializer: D) -> std::result::Result
```

For example usage, please refer to this [example](https://github.com/yougov/pinot-client-rust/blob/master/examples/sql-query-deserialize-to-struct.rs).
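As described above, Pinot bytes values arrive as hex strings. The core conversion behind `deserialize_bytes` and `deserialize_bytes_array` can be sketched without serde; `decode_hex` and `decode_hex_array` are hypothetical helpers, not the crate's `deserialize_bytes_from_json()`, and they assume that odd-length input and non-hex characters should be rejected.

```rust
/// Hypothetical helper: decode one hex string such as "0aff" into bytes.
fn decode_hex(hex: &str) -> Result<Vec<u8>, String> {
    if hex.len() % 2 != 0 {
        return Err(format!("odd-length hex string: {hex:?}"));
    }
    if !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Err(format!("non-hex character in {hex:?}"));
    }
    // Every byte is an ASCII hex digit, so slicing two bytes at a time is safe.
    (0..hex.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&hex[i..i + 2], 16).map_err(|e| e.to_string()))
        .collect()
}

/// Hypothetical helper: decode an array of hex strings, failing on the first bad one.
fn decode_hex_array(items: &[&str]) -> Result<Vec<Vec<u8>>, String> {
    items.iter().map(|s| decode_hex(s)).collect()
}

fn main() {
    println!("{:?}", decode_hex("0aff"));
    println!("{:?}", decode_hex_array(&["01", "ff"]));
}
```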
`DataRow` is defined as:
```rust
/// A row of `Data`
#[derive(Clone, Debug, PartialEq)]
pub struct DataRow {
    row: Vec<Data>,
}
```

`Data` is defined as:
```rust
/// Typed Pinot data
#[derive(Clone, Debug, PartialEq)]
pub enum Data {
    Int(i32),
    Long(i64),
    Float(f32),
    Double(f64),
    Boolean(bool),
    Timestamp(DateTime),
    String(String),
    Json(Value),
    Bytes(Vec<u8>),
    IntArray(Vec<i32>),
    LongArray(Vec<i64>),
    FloatArray(Vec<f32>),
    DoubleArray(Vec<f64>),
    BooleanArray(Vec<bool>),
    TimestampArray(Vec>),
    StringArray(Vec<String>),
    BytesArray(Vec<Vec<u8>>),
    Null(DataType),
}
```

There are multiple functions defined for `Data`, such as:
```
fn data_type(&self) -> DataType;
fn get_int(&self) -> Result<i32>;
fn get_long(&self) -> Result<i64>;
fn get_float(&self) -> Result<f32>;
fn get_double(&self) -> Result<f64>;
fn get_boolean(&self) -> Result<bool>;
fn get_timestamp(&self) -> Result>;
fn get_string(&self) -> Result<&str>;
fn get_json(&self) -> Result<&Value>;
fn get_bytes(&self) -> Result<&Vec<u8>>;
fn is_null(&self) -> bool;
```

In addition to row count, `DataRow` also provides convenience counterparts of the functions above that take a column index.
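The typed getters above suggest a simple pattern: each getter matches one variant and returns an error otherwise, and an index-based counterpart on a row can look up the column first. The following is a reduced sketch of that pattern, assuming only three variants, a payload-free `Null`, `String` errors, and hypothetical `DataSketch`/`DataRowSketch` names; it is not the crate's code.

```rust
/// Reduced stand-in for `Data`, keeping three variants for brevity.
#[derive(Clone, Debug, PartialEq)]
enum DataSketch {
    Int(i32),
    String(String),
    Null,
}

impl DataSketch {
    fn get_int(&self) -> Result<i32, String> {
        match self {
            DataSketch::Int(v) => Ok(*v),
            other => Err(format!("expected Int, found {other:?}")),
        }
    }

    fn get_string(&self) -> Result<&str, String> {
        match self {
            DataSketch::String(s) => Ok(s.as_str()),
            other => Err(format!("expected String, found {other:?}")),
        }
    }

    fn is_null(&self) -> bool {
        matches!(self, DataSketch::Null)
    }
}

/// Reduced stand-in for `DataRow`: index-based versions of the getters.
struct DataRowSketch {
    row: Vec<DataSketch>,
}

impl DataRowSketch {
    fn column_count(&self) -> usize {
        self.row.len()
    }

    /// Look up a column, reporting an out-of-range index as an error.
    fn get(&self, column_index: usize) -> Result<&DataSketch, String> {
        self.row
            .get(column_index)
            .ok_or_else(|| format!("column index {column_index} out of range"))
    }

    fn get_int(&self, column_index: usize) -> Result<i32, String> {
        self.get(column_index)?.get_int()
    }

    fn get_string(&self, column_index: usize) -> Result<&str, String> {
        self.get(column_index)?.get_string()
    }
}

fn main() {
    let row = DataRowSketch {
        row: vec![DataSketch::String("BOS".to_string()), DataSketch::Int(42), DataSketch::Null],
    };
    println!("{} columns", row.column_count());
    println!("{:?} {:?}", row.get_string(0), row.get_int(1));
    println!("{}", row.get(2).map(DataSketch::is_null).unwrap_or(false));
}
```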