https://github.com/hrbrmstr/sergeant-caffeinated

:guardsman: ☕️ Tools to Transform and Query Data with 'Apache' 'Drill'

---
output: rmarkdown::github_document
editor_options:
  chunk_output_type: console
---
```{r pkg-knitr-opts, include=FALSE}
hrbrpkghelpr::global_opts()
```

```{r badges, results='asis', echo=FALSE, cache=FALSE}
hrbrpkghelpr::stinking_badges()
```

```{r description, results='asis', echo=FALSE, cache=FALSE}
hrbrpkghelpr::yank_title_and_description()
```

## What's Inside The Tin

The following functions are implemented:

```{r ingredients, results='asis', echo=FALSE, cache=FALSE}
hrbrpkghelpr::describe_ingredients()
```

## Installation

```{r install-ex, eval=FALSE}
remotes::install_github("hrbrmstr/sergeant-caffeinated")
```

## Usage

```{r lib-ex}
library(sergeant.caffeinated)

# current version
packageVersion("sergeant.caffeinated")

```

```{r dplyr-01, message=FALSE}
library(tidyverse)

# Use localhost when Drill runs standalone on this machine;
# otherwise set DRILL_TEST_HOST to the host name or IP of your Drill server.
test_host <- Sys.getenv("DRILL_TEST_HOST", "localhost")

be_quiet()

con <- dbConnect(drv = DrillJDBC(), sprintf("jdbc:drill:zk=%s", test_host))

db <- tbl(con, "cp.`employee.json`")

# without `collect()`:
db %>%
  count(
    gender,
    marital_status
  )

db %>%
  count(
    gender,
    marital_status
  ) %>%
  collect()

db %>%
  group_by(position_title) %>%
  count(gender) -> tmp2

group_by(db, position_title) %>%
  count(gender) %>%
  ungroup() %>%
  mutate(
    full_desc = ifelse(gender == "F", "Female", "Male")
  ) %>%
  collect() %>%
  select(
    Title = position_title,
    Gender = full_desc,
    Count = n
  )

arrange(db, desc(employee_id)) %>% print(n = 20)

db %>%
  mutate(
    position_title = tolower(position_title),
    salary = as.numeric(salary),
    gender = ifelse(gender == "F", "Female", "Male"),
    marital_status = ifelse(marital_status == "S", "Single", "Married")
  ) %>%
  group_by(supervisor_id) %>%
  summarise(
    underlings_count = n()
  ) %>%
  collect()
```
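
The connection string above is assembled with `sprintf()` from the `DRILL_TEST_HOST` environment variable. The chunk below is a minimal sketch of that step in base R only, so it runs without a Drill server. `drill_jdbc_url()` is a hypothetical helper written for illustration and is not part of this package; the `drillbit=` form and port `31010` are assumptions based on Drill's usual direct-connection defaults, so check them against your own cluster.

```{r jdbc-url-sketch}
# Hypothetical helper (NOT exported by sergeant.caffeinated):
# build a Drill JDBC URL from a host, an optional port, and a connection mode.
drill_jdbc_url <- function(host = Sys.getenv("DRILL_TEST_HOST", "localhost"),
                           port = NULL,
                           via = c("zk", "drillbit")) {
  via <- match.arg(via)
  # Append ":port" only when a port is supplied
  endpoint <- if (is.null(port)) host else sprintf("%s:%d", host, as.integer(port))
  sprintf("jdbc:drill:%s=%s", via, endpoint)
}

# Same form as the connection string used in the example above (via ZooKeeper)
drill_jdbc_url("localhost")

# Assumed alternative: connect directly to a Drillbit on port 31010
drill_jdbc_url("localhost", port = 31010, via = "drillbit")
```

The resulting string can be passed to `dbConnect()` in place of the inline `sprintf()` call; keeping the host in an environment variable avoids hard-coding server names in scripts.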

## sergeant Metrics

```{r echo=FALSE}
cloc::cloc_pkg_md()
```

## Code of Conduct

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.