https://github.com/vanatteveldt/datasync
R functions to rsync and access project data directories
- Host: GitHub
- URL: https://github.com/vanatteveldt/datasync
- Owner: vanatteveldt
- License: mit
- Created: 2014-10-23T18:28:39.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2014-10-24T23:19:27.000Z (over 10 years ago)
- Last Synced: 2025-01-04T20:47:00.836Z (3 months ago)
- Language: R
- Size: 172 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.Rmd
- License: LICENSE
README
Loading and rsyncing a Data folder
---

This package assumes that you have a separate folder containing project data.
It provides a function `rsync` to synchronize that folder with a remote folder,
and a function `load.data` to load data from the project folder.

The location of the data folder is read from the environment variable `R_DATA_ROOT`, defaulting to `~/data`.
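
As a rough illustration of the lookup described above, the following sketch resolves the data root with base R only. The helpers `resolve_data_root` and `data_path` are hypothetical names introduced here, not functions exported by `datasync`, and the path `/srv/shared/data` is just an example location; the package's own implementation may differ.

```r
# Hypothetical helper (not part of datasync): read the data root from
# R_DATA_ROOT, falling back to ~/data when the variable is unset.
resolve_data_root <- function() {
  Sys.getenv("R_DATA_ROOT", unset = "~/data")
}

# Hypothetical helper: turn a path relative to the data root into a full path.
data_path <- function(relative_path) {
  file.path(resolve_data_root(), relative_path)
}

Sys.setenv(R_DATA_ROOT = "/srv/shared/data")  # illustrative location
print(data_path("myproject/test.rdata"))      # path under /srv/shared/data

Sys.unsetenv("R_DATA_ROOT")
print(data_path("myproject/test.rdata"))      # path under the ~/data default
```

Because project code only ever passes the relative part (`myproject/test.rdata`), the same script can run unchanged on machines that keep their data in different places.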
Installing
---

The package can be directly installed using `devtools`:
```{r message=F}
if (!require(devtools)) {install.packages("devtools"); library(devtools)}
install_github("vanatteveldt/datasync")
library(datasync)
```

Note: If `devtools` is not available, you can simply source the [data.r](R/data.r) file.
Loading and saving data
----

To demonstrate loading and saving data, let's set up a new folder in the temporary directory as the data folder (normally, you would not call this function, but would either use the default location, `~/data`, or set the `R_DATA_ROOT` variable):
```{r}
set.data.folder(tempfile())
```

Now we can save files to and load files from that folder; any missing directories are created recursively:
```{r}
foo = "bar"
save.data(foo, file = "myproject/test.rdata")
rm("foo")
load.data("myproject/test.rdata")
foo
```

Obviously, there is not a lot of magic going on here.
The point of these functions is that the data folder can be set through an environment variable,
so the project source code does not contain references to the local file system.

One other convenience feature is that you can pass an environment to `load.data`,
and this environment is (invisibly) returned from the call.
This makes it easy to load files into a 'temporary' environment so variables are not overwritten:

```{r}
e = load.data("myproject/test.rdata", envir=new.env())
ls(e)
e$foo
```

Synchronizing data
----

The `rsync` function calls the rsync command as a system call to synchronize the data folder with a remote folder.
To demonstrate this function, I will use `localhost` as the "remote" host and synchronize with a second temporary folder:
```{r}
remote = tempfile()
dir.create(file.path(remote, "myproject"), recursive=T)
save(foo, file=file.path(remote, "myproject/test2.rdata"))
list.files(file.path(remote, "myproject"))
list.files(file.path(get.data.folder(), "myproject"))
```

As you can see, the 'local' folder contains `test.rdata`, while the remote folder contains `test2.rdata`.
To run the actual synchronization, call `rsync` with the remote host and, if it differs from the local folder, the remote folder:

```{r}
rsync(remote_host = "localhost", remote_folder = remote)
```

Let's inspect the folders:
```{r}
list.files(file.path(remote, "myproject"))
list.files(file.path(get.data.folder(), "myproject"))
```

If we call `rsync` again, only new or changed files will be synchronized.
If a file is edited in both locations, the newer file will be kept.
If a file is deleted in either location, it will be 'restored' from the other location.
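
The behavior described above (the newer copy wins, and deletions are undone) matches what you would get from running the rsync command once in each direction with `-u` and without `--delete`. The following is a minimal sketch under that assumption. `build_rsync_args` is a hypothetical helper that is not part of `datasync`, the folder names are illustrative, and the flags the package actually passes may differ. The sketch only builds the argument vectors and does not run anything.

```r
# Hypothetical helper (not part of datasync): build the argument vectors for a
# two-way, non-deleting sync. Nothing is executed here.
build_rsync_args <- function(local_folder, remote_host, remote_folder = local_folder) {
  # A trailing slash tells rsync to copy the folder's contents,
  # not the folder itself.
  local_spec  <- paste0(sub("/+$", "", local_folder), "/")
  remote_spec <- paste0(remote_host, ":", sub("/+$", "", remote_folder), "/")

  # -a: archive mode (recursive, preserves modification times)
  # -u: skip files that are newer on the receiving side
  # --delete is deliberately absent, so a file missing on one side is
  # copied back from the other side instead of being removed.
  flags <- c("-a", "-u")

  list(
    push = c(flags, local_spec, remote_spec),  # local  -> remote
    pull = c(flags, remote_spec, local_spec)   # remote -> local
  )
}

cmds <- build_rsync_args("/home/me/data", "localhost", "/tmp/remote")
print(cmds$push)
print(cmds$pull)
```

Each vector could then be handed to `system2("rsync", args = cmds$push)` and `system2("rsync", args = cmds$pull)`. Running both directions is what makes the sync symmetric: after the push the remote side holds the newest copy of every local file, and the pull brings back anything that is newer or only present remotely.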