https://github.com/ijlyttle/azuredatalakestore
REST API Client for Microsoft Azure Data Lake Store
- Host: GitHub
- URL: https://github.com/ijlyttle/azuredatalakestore
- Owner: ijlyttle
- License: other
- Created: 2017-04-29T21:28:36.000Z (about 9 years ago)
- Default Branch: master
- Last Pushed: 2018-03-03T22:19:26.000Z (over 8 years ago)
- Last Synced: 2026-03-31T16:48:10.570Z (3 months ago)
- Language: R
- Size: 63.5 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS.md
- License: LICENSE
README
---
output: github_document
---
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "README-"
)
```
[CRAN status](https://cran.r-project.org/package=AzureDatalakeStore)
[Travis build status](https://travis-ci.org/ijlyttle/AzureDatalakeStore)
[Codecov coverage](https://codecov.io/github/ijlyttle/AzureDatalakeStore?branch=master)
# AzureDatalakeStore
## Overview
This package provides access to the Azure Data Lake Store itself. According to the [reference](https://docs.microsoft.com/en-us/rest/api/datalakestore/webhdfs-filesystem-apis), the store is accessed through WebHDFS.
The goal of this package is to get things working for Azure Data Lake Store. If it ends up supporting WebHDFS more generally, that will be a bonus.
WebHDFS seems to use a lot of redirects. However, this [Microsoft blog](https://blogs.msdn.microsoft.com/microsoftrservertigerteam/2017/03/14/using-r-to-perform-filesystem-operations-on-azure-data-lake-store/) suggests that we may not have to handle them ourselves. If so, I don't know whether this is a "feature" of Azure, of **httr**, or perhaps of **curl**.
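To make the WebHDFS connection concrete, here is a minimal sketch of how a request URL for the store might be assembled. The helper `adls_url()` and its arguments are hypothetical names used for illustration, not functions exported by this package, and the host pattern `<account>.azuredatalakestore.net/webhdfs/v1` is assumed from the Microsoft reference linked above.

```r
# Hypothetical helper (not part of this package): build a WebHDFS request
# URL for an Azure Data Lake Store account.
adls_url <- function(account, path, op, ...) {
  # Assumed host pattern for Azure Data Lake Store WebHDFS endpoints.
  base <- sprintf("https://%s.azuredatalakestore.net/webhdfs/v1", account)

  # Drop leading slashes so the path joins cleanly, then percent-encode it
  # (reserved characters such as "/" are left alone).
  path <- utils::URLencode(sub("^/+", "", path))

  # Collect the operation and any extra query parameters, encoding values.
  query <- c(list(op = op), list(...))
  values <- vapply(
    query,
    function(x) utils::URLencode(as.character(x), reserved = TRUE),
    character(1)
  )
  query_string <- paste0(names(query), "=", values, collapse = "&")

  sprintf("%s/%s?%s", base, path, query_string)
}

adls_url("myaccount", "/data/raw", op = "LISTSTATUS")
#> [1] "https://myaccount.azuredatalakestore.net/webhdfs/v1/data/raw?op=LISTSTATUS"
```

The account name `myaccount` is a placeholder. Extra WebHDFS parameters can be passed as strings through `...`, for example `adls_url("myaccount", "/tmp/old", op = "DELETE", recursive = "true")`.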
## What we will need
* `adls` S3 object to hold the base-url and the token **done**
* Make a directory **done**
* List a directory **done**
* Rename a file/folder in a directory **done**
* Delete a file/directory **done**
* Upload a file to a directory **done**
* Read a file from a directory **done**
* Append to a file in a directory **done**
* Get file status (file/directory) (consider sending back as data-frame and reusing format from list_status)
* Get contents summary (directory)
* Concatenate files
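As an illustration of the first item in the list above, an S3 object holding the base URL and the token could look roughly like this. The constructor name `new_adls()` and the field names are assumptions made for this sketch; the package's actual `adls` object may be built differently.

```r
# Hypothetical constructor: field names are assumptions for illustration.
new_adls <- function(base_url, token) {
  stopifnot(is.character(base_url), length(base_url) == 1L)
  structure(
    list(
      base_url = sub("/+$", "", base_url),  # strip trailing slashes
      token = token
    ),
    class = "adls"
  )
}

# Print method that avoids echoing the token to the console.
print.adls <- function(x, ...) {
  cat("<adls>\n")
  cat("  base_url:", x$base_url, "\n")
  cat("  token:    <hidden>\n")
  invisible(x)
}

# Placeholder values; a real token would come from an OAuth flow.
store <- new_adls(
  "https://myaccount.azuredatalakestore.net/webhdfs/v1/",
  token = "not-a-real-token"
)
```

Keeping both pieces in one object means each filesystem function needs only the object and a path, rather than a URL and a token passed separately.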
For file uploads, it may be easiest to provide a filepath (even a temporary one). This makes us dependent on a filesystem, which I would rather avoid (on shinyapps.io, for example).
Following **readr**, file can be:

- form_file (`upload_file()`)

Near future:

- single string
    - no newlines, interpret as path, guess type
    - one-or-more newlines, interpret as text
- raw vector, literal data

Farther future, handle connections and files that begin with `http://`, etc.
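The "near future" rules above can be sketched as a small dispatch function. This is only an illustration of the readr-style convention described here; `classify_file_input()` is a hypothetical name, not a function in this package.

```r
# Hypothetical helper: decide how to treat the `file` argument of an upload.
classify_file_input <- function(file) {
  if (is.raw(file)) {
    return("raw")   # raw vector: literal data
  }
  if (is.character(file) && length(file) == 1L) {
    if (grepl("\n", file, fixed = TRUE)) {
      return("text")  # one or more newlines: literal text
    }
    return("path")    # no newlines: a path on the local filesystem
  }
  stop("`file` must be a single string or a raw vector", call. = FALSE)
}

classify_file_input("data/cars.csv")
#> [1] "path"
classify_file_input("a,b\n1,2\n")
#> [1] "text"
classify_file_input(charToRaw("a,b\n1,2\n"))
#> [1] "raw"
```

Only the "path" case touches the filesystem; the "text" and "raw" cases could be sent as the request body directly.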
How about `adls_is_empty()` to see if a path returns anything?
Type-stability for functions that return file-status objects, such as `list_status` and `get_file_status`: should they return empty data-frames or `NULL`? Decision: `NULL`.
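A minimal sketch of the `NULL` convention, assuming the file-status records have already been parsed from JSON into a list of lists. The helper names `file_status_df()` and `is_empty_status()` are hypothetical, and only three of the WebHDFS `FileStatus` fields (`pathSuffix`, `type`, `length`) are kept for brevity.

```r
# Hypothetical helper: convert parsed WebHDFS FileStatus records into a
# data frame, returning NULL when there is nothing to report.
file_status_df <- function(statuses) {
  if (length(statuses) == 0L) {
    return(NULL)
  }
  data.frame(
    pathSuffix = vapply(statuses, function(s) s$pathSuffix, character(1)),
    type = vapply(statuses, function(s) s$type, character(1)),
    length = vapply(statuses, function(s) as.numeric(s$length), numeric(1)),
    stringsAsFactors = FALSE
  )
}

# Sketch of the emptiness check, operating on parsed records rather than
# on a live path as `adls_is_empty()` would.
is_empty_status <- function(statuses) {
  is.null(file_status_df(statuses))
}

# Made-up records, shaped like entries of a LISTSTATUS response.
statuses <- list(
  list(pathSuffix = "cars.csv", type = "FILE", length = 1024),
  list(pathSuffix = "archive", type = "DIRECTORY", length = 0)
)

nrow(file_status_df(statuses))
#> [1] 2
is_empty_status(list())
#> [1] TRUE
```

Returning `NULL` means callers test with `is.null()` instead of `nrow(x) == 0`.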
## Installation
__AzureDatalakeStore__ is not yet available on CRAN. You may install from GitHub:
```{r, eval = FALSE}
# install.packages("devtools")
devtools::install_github("ijlyttle/AzureDatalakeStore")
```
If you encounter a clear bug, please file a minimal reproducible example on [GitHub](https://github.com/ijlyttle/AzureDatalakeStore/issues).
## Acknowledgements
This package draws inspiration from [Forest Fang's **rwebhdfs** package](https://github.com/saurfang/rwebhdfs).
## Code of conduct
Please note that this project is released with a [Contributor Code of Conduct](CONDUCT.md). By participating in this project you agree to abide by its terms.