https://github.com/enricoschumann/tsdb

A terribly-simple data base for time series
Host: GitHub
URL: https://github.com/enricoschumann/tsdb
Owner: enricoschumann
Created: 2017-07-13T12:13:38.000Z (about 8 years ago)
Default Branch: master
Last Pushed: 2025-03-20T19:57:21.000Z (4 months ago)
Last Synced: 2025-03-29T17:51:09.484Z (4 months ago)
Topics: csv, r, time-series, tsdb
Language: R
Homepage: http://enricoschumann.net/R/packages/tsdb/
Size: 271 KB
Stars: 11
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.org
- Changelog: ChangeLog

README

#+TITLE: tsdb: a terribly-simple database for time series

#+AUTHOR: Enrico Schumann

#+OPTIONS: toc:nil

#+BIND: org-latex-default-packages-alist nil

#+BIND: org-use-sub-superscripts {}

#+PROPERTY: tangle yes

#+PROPERTY: header-args :comments link

#+PROPERTY: header-args:R :session *R*

#+PROPERTY: header-args :eval never-export

# ------------------ LATEX ------------------

#+LATEX_CLASS: scrartcl

#+LATEX_CLASS_OPTIONS: [a4paper,fontsize=11pt]

#+LATEX_HEADER: \addtokomafont{disposition}{\rmfamily}

#+LATEX_HEADER: \addtokomafont{descriptionlabel}{\rmfamily}

#+LATEX_HEADER: \setlength{\parindent}{0em}

#+LATEX_HEADER: \setlength{\parskip}{2ex plus0.5ex minus0.5ex}

#+LATEX_HEADER: \newcommand{\pmwr}{\textsc{pm}w\textsc{r}}

#+LATEX_HEADER: \newcommand{\pl}{\textsc{pl}}

#+LATEX_HEADER: \newcommand{\R}{\textsf{R}}

#+LATEX_HEADER: \usepackage[backend=bibtex,citestyle=authoryear]{biblatex}

#+LATEX_HEADER: \addbibresource{Library.bib}

#+LATEX_HEADER: \usepackage[left=3cm,right=5cm,top=2cm,bottom=4cm,twoside]{geometry}

#+LATEX_HEADER: \usepackage[libertine]{newtxmath}

#+LATEX_HEADER: \usepackage{fontspec}

#+LATEX_HEADER: \setmainfont{Linux Libertine O}

#+LATEX_HEADER: \setmonofont[Scale=0.91]{inconsolata}

#+LATEX_HEADER: \usepackage{graphicx}

#+LATEX_HEADER: \usepackage[dvipsnames]{xcolor}

#+LATEX_HEADER: \definecolor{grey20}{gray}{0.20}

#+LATEX_HEADER: \definecolor{grey30}{gray}{0.30}

#+LATEX_HEADER: \definecolor{grey40}{gray}{0.40}

#+LATEX_HEADER: \definecolor{grey90}{gray}{0.90}

#+LATEX_HEADER: \definecolor{grey96}{gray}{0.96}

#+LATEX_HEADER: \usepackage{listings}

#+LATEX_HEADER: \lstset{language=R,basicstyle=\ttfamily,frame=single,commentstyle=\ttfamily\color{OliveGreen},

#+LATEX_HEADER:         numberstyle=\ttfamily\footnotesize\color{gray},stringstyle=\ttfamily\color{blue},

#+LATEX_HEADER:         backgroundcolor=\color{grey96},rulecolor=\color{grey90},showstringspaces=false,

#+LATEX_HEADER:         }

#+LATEX_HEADER: \lstnewenvironment{results}

#+LATEX_HEADER:   {\lstset{basicstyle=\ttfamily\color{grey30},backgroundcolor={},frame=single,numbers=none,showstringspaces=false,rulecolor=\color{grey96}}}{}

#+LATEX_HEADER: \usepackage{mdframed}

#+LATEX_HEADER: \newenvironment{FAQ}

#+LATEX_HEADER:  {\begin{mdframed}}{\end{mdframed}}

#+LATEX_HEADER: \newenvironment{FAA}

#+LATEX_HEADER:  {\begin{mdframed}}{\end{mdframed}}

#+LATEX_HEADER: \usepackage{makeidx}\makeindex

#+LATEX_HEADER: \usepackage[hidelinks]{hyperref}

# ------------------ HTML ------------------

#+HTML_HEAD: 

#+HTML_HEAD: 

#+HTML_HEAD:  html,body {

#+HTML_HEAD:    font-family: sans-serif;

#+HTML_HEAD:    padding: 0;

#+HTML_HEAD:    margin: 0;

#+HTML_HEAD:  }

#+HTML_HEAD:  body {

#+HTML_HEAD:      line-height: 1.45;

#+HTML_HEAD:  }

#+HTML_HEAD:  #content {

#+HTML_HEAD:    font-family: serif;

#+HTML_HEAD:    border: 1px solid #eeeeee;

#+HTML_HEAD:    border-radius: 3px;

#+HTML_HEAD:    color: #222222; width: 100%;

#+HTML_HEAD:    width: 700px;

#+HTML_HEAD:    padding-top: 2ex;

#+HTML_HEAD:    padding: 1em;

#+HTML_HEAD:    margin: 0.5em;

#+HTML_HEAD:    margin-left: auto;margin-right: auto;

#+HTML_HEAD:  }

#+HTML_HEAD:  @media (max-width: 700px) {

#+HTML_HEAD:    html,body,#content {

#+HTML_HEAD:      width: 95%;

#+HTML_HEAD:    }

#+HTML_HEAD:  }

#+HTML_HEAD:  .example {

#+HTML_HEAD:    border: 1px solid rgb(240,240,240);

#+HTML_HEAD:    padding: 4px;

#+HTML_HEAD:    color: rgb(110,110,110);

#+HTML_HEAD:    overflow: auto;

#+HTML_HEAD:  }

#+HTML_HEAD:  .src {

#+HTML_HEAD:    border: 1px solid rgb(240,240,240);

#+HTML_HEAD:    color: rgb(30,30,30);

#+HTML_HEAD:    background-color: rgb(230,230,230);

#+HTML_HEAD:    padding: 4px;

#+HTML_HEAD:    overflow: auto;

#+HTML_HEAD:  }

#+HTML_HEAD:  .src:hover {

#+HTML_HEAD:    background-color: rgb(240,240,240);

#+HTML_HEAD:    padding: 4px;

#+HTML_HEAD:  }

#+HTML_HEAD:  dt {

#+HTML_HEAD:    font-weight: bold;

#+HTML_HEAD:  }

#+HTML_HEAD:  li {

#+HTML_HEAD:    margin-bottom: 0.5ex;

#+HTML_HEAD:  }

#+HTML_HEAD:  code {

#+HTML_HEAD:    font-size: 115%;

#+HTML_HEAD:    color: rgb(60,60,60);

#+HTML_HEAD:  }

#+HTML_HEAD:  .org-right {

#+HTML_HEAD:    text-align: right;

#+HTML_HEAD:  }

#+HTML_HEAD:  nav ul {

#+HTML_HEAD:    list-style-type: none;

#+HTML_HEAD:  }

#+HTML_HEAD: 

#+BEGIN_SRC R :results none :exports none

  options(useFancyQuotes=FALSE)

#+END_SRC

* About tsdb

A terribly-simple database for numeric time series,

written purely in R, so no external database-software

is needed. Series are stored in plain-text files (the

most-portable and enduring file type) in CSV

format. Timestamps are encoded using R's native numeric

representation for 'Date'/'POSIXct', which makes them

fast to parse, but keeps them accessible with other

software. The package provides tools for saving and

updating series in this standardised format, for

retrieving and joining data, for summarising files and

directories, and for coercing series from and to other

data types (such as 'zoo' series).

** Good things about tsdb

- no setup needed, no system dependencies

  (i.e. external software, such as a database)

- completely portable; moving from one computer to

  another requires no effort other than copying the

  files (the only thing to take care of is file

  encoding if non-ASCII column names are used)

- data usable by other software

** When you need another database

- tsdb is potentially slow

- no multi-user support; no access-rights management

  (other than that provided by the OS)

- no network protocols

* Using tsdb

** Writing data

We first load the package.

#+BEGIN_SRC R :session *R* :results none :exports code

  library("tsdb")

#+END_SRC

Start by creating time-series data.

#+BEGIN_SRC R :session *R* :results output :exports both

  library("zoo")

  z <- zoo(1:5, as.Date("2016-1-1") + 0:4)

  z

#+END_SRC

#+RESULTS:

: 2016-01-01 2016-01-02 2016-01-03 2016-01-04 2016-01-05

:          1          2          3          4          5

To store these data, we need to enforce a consistent

format, which the functions =ts_table= and

=as.ts_table= do.

#+BEGIN_SRC R :session *R* :results output :exports both

ts <- as.ts_table(z, columns = "A")

ts

#+END_SRC

#+RESULTS:

: 5 rows [2016-01-01 -> 2016-01-05]: A

Note that we had to provide a column name (=A=) for the

data. This is not optional. It is one of the things

that =ts_table= enforces. Another is that timestamps

need to be of class =Date= or =POSIXct=.

To store the data to a file, use =write_ts_table=. The

function will take a directory and file name as

arguments, which mimics the hierarchy of databases and

tables in a classical database.

#+BEGIN_SRC R :session *R* :results none :exports code

  write_ts_table(ts, dir = "~/tsdb/daily", file = "example1")

#+END_SRC

The written file will look like this:

# +INCLUDE: ~/tsdb/daily/example1 example

#+BEGIN_EXAMPLE

"timestamp","A"

16801,1

16802,2

16803,3

16804,4

16805,5

#+END_EXAMPLE

You may notice that the dates have been replaced by

numbers. The mapping between these numbers and calendar

times is described later, when we discuss the

representation of timestamps. (But if you can't wait:

it is the number of days since 1 January 1970.)
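
A quick check in base R, independent of =tsdb=, confirms
this mapping for the first row of the file. The variable
names are only for illustration.

#+BEGIN_SRC R :results output :exports both
  ## Date -> number of days since 1 January 1970
  first.day <- as.Date("2016-01-01")
  first.num <- as.numeric(first.day)
  first.num

  ## ... and back again
  as.Date(first.num, origin = "1970-01-01")
#+END_SRC

#+RESULTS:
: [1] 16801
: [1] "2016-01-01"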

Let us write a second file. This time, we use

=ts_table= directly.

#+BEGIN_SRC R :session *R* :results output :exports both

x <- array(1:20, dim = c(10, 2))

colnames(x) <- c("A", "B")

x

#+END_SRC

#+RESULTS:

#+begin_example

       A  B

 [1,]  1 11

 [2,]  2 12

 [3,]  3 13

 [4,]  4 14

 [5,]  5 15

 [6,]  6 16

 [7,]  7 17

 [8,]  8 18

 [9,]  9 19

[10,] 10 20

#+end_example

#+BEGIN_SRC R :session *R* :results output :exports both

  ts_table(x, timestamp = as.Date("2016-1-1") + 0:9)

#+END_SRC

#+RESULTS:

: 10 rows [2016-01-01 -> 2016-01-10]: A, B

We can also explicitly specify the column names, which

will override the column names of the data. In fact,

this is the preferred way, since it makes things more

explicit (which usually means safer).

#+BEGIN_SRC R :session *R* :results output :exports both

  ts <- ts_table(x, timestamp = as.Date("2016-1-1") + 0:9,

		 columns = c("B", "A"))

  ts

#+END_SRC

#+RESULTS:

: 10 rows [2016-01-01 -> 2016-01-10]: B, A

We write the data to a file =example2=.

#+BEGIN_SRC R :session *R* :results none :exports code

  write_ts_table(ts, dir = "~/tsdb/daily", file = "example2")

#+END_SRC

The written file looks like this:

# +INCLUDE: ~/tsdb/daily/example2 example

#+BEGIN_EXAMPLE

"timestamp","B","A"

16801,1,11

16802,2,12

16803,3,13

16804,4,14

16805,5,15

16806,6,16

16807,7,17

16808,8,18

16809,9,19

16810,10,20

#+END_EXAMPLE

** TODO Reading data

Use the function =read_ts_tables=.

#+name: read1

#+BEGIN_SRC R :session *R* :results output :exports both

  read_ts_tables("example1", dir = "~/tsdb/daily", columns = "A")

#+END_SRC

The default return value is a list with components

=data=, =timestamp=, =columns= and =file.path=.

#+RESULTS: read1

#+begin_example

$data

     A

[1,] 1

[2,] 2

[3,] 3

[4,] 4

[5,] 5

$timestamp

[1] "2016-01-01" "2016-01-02" "2016-01-03" "2016-01-04" "2016-01-05"

$columns

[1] "A"

$file.path

[1] "~/tsdb/daily/example1::A"

#+end_example

It may be more convenient to specify a =return.class=.

#+BEGIN_SRC R :session *R* :results output :exports both

  read_ts_tables("example1", dir = "~/tsdb/daily", columns = "A",

		 return.class = "zoo")

#+END_SRC

#+RESULTS:

:            ~/tsdb/daily/example1::A

: 2016-01-01                        1

: 2016-01-02                        2

: 2016-01-03                        3

: 2016-01-04                        4

: 2016-01-05                        5

#+BEGIN_SRC R :session *R* :results output :exports both

  read_ts_tables("example1", dir = "~/tsdb/daily", columns = "A",

		 return.class = "data.frame")

#+END_SRC

#+RESULTS:

:    timestamp ~/tsdb/daily/example1::A

: 1 2016-01-01                        1

: 2 2016-01-02                        2

: 3 2016-01-03                        3

: 4 2016-01-04                        4

: 5 2016-01-05                        5

Before version 0.7, =read_ts_tables= by default returned

values only for non-weekend days.  (=tsdb= was written

with financial data in mind, and on weekends there are

no prices.) This behaviour is controlled by the

argument =drop.weekends=, which now defaults to

=FALSE=.

#+BEGIN_SRC R :session *R* :results output :exports both

weekdays(as.Date("2016-1-1")+0:4)

#+END_SRC

#+RESULTS:

: [1] "Friday"   "Saturday" "Sunday"   "Monday"   "Tuesday"

To drop weekends, as earlier versions did by default,

set the argument =drop.weekends= to =TRUE=.

#+BEGIN_SRC R :session *R* :results output :exports both

  read_ts_tables("example1", dir = "~/tsdb/daily",

		 columns = "A",

		 return.class = "data.frame",

		 drop.weekends = TRUE)

#+END_SRC

#+RESULTS:

:    timestamp ~/tsdb/daily/example1::A

: 1 2016-01-01                        1

: 2 2016-01-04                        4

: 3 2016-01-05                        5
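
Conceptually, dropping weekends is a filter on the day
of the week. The following is only a sketch in base R of
such a filter, not the code that =tsdb= uses; the names
=wk.dates=, =wk.values= and =is.weekend= are made up for
the example.

#+BEGIN_SRC R :results output :exports both
  wk.dates  <- as.Date("2016-01-01") + 0:4
  wk.values <- 1:5

  ## the 'wday' field of a POSIXlt is 0 for Sunday and
  ## 6 for Saturday, whatever the locale
  is.weekend <- as.POSIXlt(wk.dates)$wday %in% c(0, 6)

  data.frame(timestamp = wk.dates[!is.weekend],
             A = wk.values[!is.weekend])
#+END_SRC

#+RESULTS:
:    timestamp A
: 1 2016-01-01 1
: 2 2016-01-04 4
: 3 2016-01-05 5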

You may have noticed a small difference in the names of

the functions for reading and writing. We always write

a single table, but we may read several tables at once.

#+BEGIN_SRC R :session *R* :results output :exports both

  read_ts_tables(c("example1", "example2"),

		 dir = "~/tsdb/daily",

		 columns = "A",

		 return.class = "data.frame")

#+END_SRC

#+RESULTS:

#+begin_example

    timestamp ~/tsdb/daily/example1::A ~/tsdb/daily/example2::A

1  2016-01-01                        1                       11

2  2016-01-02                        2                       12

3  2016-01-03                        3                       13

4  2016-01-04                        4                       14

5  2016-01-05                        5                       15

6  2016-01-06                       NA                       16

7  2016-01-07                       NA                       17

8  2016-01-08                       NA                       18

9  2016-01-09                       NA                       19

10 2016-01-10                       NA                       20

#+end_example
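
The result above looks like a full outer join on the
timestamp: every timestamp that occurs in any file gets
a row, and gaps are filled with =NA=. As a rough
analogy, and without any claim about how =tsdb= does it
internally, base R's =merge= gives the same shape. The
data frames =ex1= and =ex2= are made up to mirror the
two files.

#+BEGIN_SRC R :results output :exports both
  ex1 <- data.frame(timestamp = as.Date("2016-01-01") + 0:4,
                    ex1 = 1:5)
  ex2 <- data.frame(timestamp = as.Date("2016-01-01") + 0:9,
                    ex2 = 11:20)

  ## all = TRUE keeps timestamps that appear in either table
  joined <- merge(ex1, ex2, by = "timestamp", all = TRUE)
  joined[5:7, ]
#+END_SRC

#+RESULTS:
:    timestamp ex1 ex2
: 5 2016-01-05   5  15
: 6 2016-01-06  NA  16
: 7 2016-01-07  NA  17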

The column names of the returned object consist of the

filepaths and the column, which may be more information

than we actually want. The argument =column.name=

specifies the format; its default is

=%dir%/%file%::%column%=.

#+BEGIN_SRC R :session *R* :results output :exports both

  read_ts_tables(c("example1", "example2"),

		 dir = "~/tsdb/daily",

		 columns = "A",

		 return.class = "data.frame",

                 column.name = "%file%/%column%")

#+END_SRC

#+RESULTS:

#+begin_example

    timestamp example1/A example2/A

1  2016-01-01          1         11

2  2016-01-02          2         12

3  2016-01-03          3         13

4  2016-01-04          4         14

5  2016-01-05          5         15

6  2016-01-06         NA         16

7  2016-01-07         NA         17

8  2016-01-08         NA         18

9  2016-01-09         NA         19

10 2016-01-10         NA         20

#+end_example
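
The placeholders in =column.name= behave like a small
template. To make the idea concrete, here is a sketch of
such a substitution in base R. The helper
=expand_column_name= is hypothetical and not part of
=tsdb=; it only mimics the names seen in the output
above.

#+BEGIN_SRC R :results output :exports both
  ## hypothetical helper: replace each placeholder literally
  expand_column_name <- function(template, dir, file, column) {
      out <- gsub("%dir%", dir, template, fixed = TRUE)
      out <- gsub("%file%", file, out, fixed = TRUE)
      gsub("%column%", column, out, fixed = TRUE)
  }

  expand_column_name("%dir%/%file%::%column%",
                     dir = "~/tsdb/daily", file = "example1", column = "A")
  expand_column_name("%file%/%column%",
                     dir = "~/tsdb/daily", file = "example1", column = "A")
#+END_SRC

#+RESULTS:
: [1] "~/tsdb/daily/example1::A"
: [1] "example1/A"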

Missing values are by default set to =NA=. That happens

even for columns that are missing from a file, though with a warning.

#+BEGIN_SRC R :session *R* :results output :exports both

  read_ts_tables(c("example1", "example2"),

		 dir = "~/tsdb/daily",

		 columns = c("A", "B"),

		 return.class = "data.frame",

                 column.name = "%file%/%column%")

#+END_SRC

#+RESULTS:

#+begin_example

    timestamp example1/A example1/B example2/A example2/B

1  2016-01-01          1         NA         11          1

2  2016-01-02          2         NA         12          2

3  2016-01-03          3         NA         13          3

4  2016-01-04          4         NA         14          4

5  2016-01-05          5         NA         15          5

6  2016-01-06         NA         NA         16          6

7  2016-01-07         NA         NA         17          7

8  2016-01-08         NA         NA         18          8

9  2016-01-09         NA         NA         19          9

10 2016-01-10         NA         NA         20         10

Warning message:

In read_ts_tables(c("example1", "example2"), dir = "~/tsdb/daily",  :

  columns missing

#+end_example

* How tsdb works

** ts_tables

tsdb works with /time-series tables/ (objects of

class =ts_table=). A =ts_table= is a numeric matrix,

so there is always a =dim= attribute. For a

time-series table =x=, you get the number of

observations with =dim(x)[1L]=.

Attached to this matrix are several attributes:

- timestamp :: a vector: the numeric representation of

               the timestamp

- t.type :: character: the class of the original

            timestamp, either =Date= or =POSIXct=

- columns :: a character vector that provides the

             column names

There may be other attributes as well, but these three

are always present.

A =ts_table= is not meant as a time-series class. For

most computations (plotting, calculation of statistics,

etc), the =ts_table= must first be coerced to =zoo=,

=xts=, a data-frame or a similar data

structure. Methods that perform such coercions are

responsible for converting the numeric timestamp vector

to an actual timestamp. For this, they may use the

function =ttime=, whose pronunciation may remind you

of a hot beverage, but whose name really stands for

=translate time=.
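
To make the structure tangible, the following sketch
builds a plain matrix by hand and attaches the three
attributes described above. It does not use =tsdb= and
does not set the class, so it is not a real =ts_table=;
in actual code, use =ts_table= or =as.ts_table=. The
coercion at the end is written out manually, and the
time zone =UTC= in the intraday branch is an assumption
made for the example.

#+BEGIN_SRC R :results output :exports both
  ## hand-built stand-in for a time-series table
  tab <- array(1:5, dim = c(5, 1))
  attr(tab, "timestamp") <- 16801:16805  # days since 1970-01-01
  attr(tab, "t.type")    <- "Date"
  attr(tab, "columns")   <- "A"

  ## turn the numbers back into timestamps, depending on t.type
  stamp <- attr(tab, "timestamp")
  stamp <- if (attr(tab, "t.type") == "Date") {
      as.Date(stamp, origin = "1970-01-01")
  } else {
      as.POSIXct(stamp, origin = "1970-01-01", tz = "UTC")
  }

  data.frame(timestamp = stamp, A = tab[, 1])
#+END_SRC

#+RESULTS:
:    timestamp A
: 1 2016-01-01 1
: 2 2016-01-02 2
: 3 2016-01-03 3
: 4 2016-01-04 4
: 5 2016-01-05 5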

** The file format

  :PROPERTIES:

  :CUSTOM_ID: file-format

  :END:

=tsdb= can store and load time-series data. The format

it uses is plain CSV. A sample file may look as

follows:

#+BEGIN_EXAMPLE

  "timestamp","close"

  17131,11

  17132,12

  17133,13

  17134,14

  17135,15

#+END_EXAMPLE

Thus, the file has a header line that gives the

names of the columns, with the first column always

being named =timestamp=.

The advantage of this plain format is that the data

are in no way dependent on =tsdb=. The files can be

used and manipulated by other software as well.
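
As an illustration of that independence, the sample
file can be read with base R alone. The sketch below
first writes the sample to a temporary file (the file
name and the variable names are chosen only for this
example) and then converts the timestamps back to dates.

#+BEGIN_SRC R :results output :exports both
  ## write the sample file to a temporary location ...
  sample.file <- tempfile(fileext = ".csv")
  writeLines(c('"timestamp","close"',
               "17131,11", "17132,12", "17133,13",
               "17134,14", "17135,15"),
             sample.file)

  ## ... and read it without tsdb
  prices <- read.csv(sample.file)
  prices$timestamp <- as.Date(prices$timestamp,
                              origin = "1970-01-01")
  prices
#+END_SRC

#+RESULTS:
:    timestamp close
: 1 2016-11-26    11
: 2 2016-11-27    12
: 3 2016-11-28    13
: 4 2016-11-29    14
: 5 2016-11-30    15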

** Timestamps

  :PROPERTIES:

  :CUSTOM_ID: timestamps

  :END:

  Two types of timestamps are supported: =Date= and

  =POSIXct=. As part of a =ts_table=, timestamps are

  always stored in their numeric representation: daily

  timestamps are represented as the number of days

  since 1 Jan 1970; intraday timestamps are the number

  of seconds since 1 Jan 1970.
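
  For the intraday case, a short base-R sketch shows the
  numeric representation. The time zone =UTC= is chosen
  for the example only; the number itself always counts
  seconds since midnight UTC on 1 Jan 1970, and the time
  zone affects only parsing and display.

#+BEGIN_SRC R :results output :exports both
  ## 09:30 on 1 Jan 2016, as seconds since 1 Jan 1970
  open.time <- as.POSIXct("2016-01-01 09:30:00", tz = "UTC")
  open.secs <- as.numeric(open.time)

  ## seconds elapsed since midnight of day 16801
  open.secs - 16801 * 86400

  ## ... and back to a timestamp
  format(as.POSIXct(open.secs, origin = "1970-01-01", tz = "UTC"),
         "%Y-%m-%d %H:%M:%S")
#+END_SRC

#+RESULTS:
: [1] 34200
: [1] "2016-01-01 09:30:00"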