{"id":18347008,"url":"https://github.com/k-tomaszewski/eternal-db","last_synced_at":"2025-04-13T12:51:42.654Z","repository":{"id":234950424,"uuid":"789791618","full_name":"k-tomaszewski/eternal-db","owner":"k-tomaszewski","description":"Eternal DB is a time series database with a data retention policy based on disk space, thus allowing to collect data eternally.","archived":false,"fork":false,"pushed_at":"2025-02-18T00:04:08.000Z","size":135,"stargazers_count":3,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-13T12:51:38.725Z","etag":null,"topics":["database","document-database","embedded-database","java","java-21","json","linux","nosql","time-series","tsdb"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/k-tomaszewski.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-04-21T15:11:04.000Z","updated_at":"2025-04-01T12:57:13.000Z","dependencies_parsed_at":"2024-05-18T12:23:18.707Z","dependency_job_id":"d22e76f2-9cff-4ddb-97b9-1a0e1386f507","html_url":"https://github.com/k-tomaszewski/eternal-db","commit_stats":null,"previous_names":["k-tomaszewski/eternal-db"],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/k-tomaszewski%2Feternal-db","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/k-tomaszewski%2Feternal-db/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/k-tomaszewski%
2Feternal-db/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/k-tomaszewski%2Feternal-db/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/k-tomaszewski","download_url":"https://codeload.github.com/k-tomaszewski/eternal-db/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248717258,"owners_count":21150388,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["database","document-database","embedded-database","java","java-21","json","linux","nosql","time-series","tsdb"],"created_at":"2024-11-05T21:12:55.827Z","updated_at":"2025-04-13T12:51:42.648Z","avatar_url":"https://github.com/k-tomaszewski.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# eternal-db\n[![Unit Tests](https://github.com/k-tomaszewski/eternal-db/actions/workflows/maven.yml/badge.svg)](https://github.com/k-tomaszewski/eternal-db/actions/workflows/maven.yml)\n\n## Overview\nEternal DB is an embedded Java time series database/data storage engine with a data retention policy based on disk space. \nBecause the oldest records are removed whenever disk space needs to be reclaimed, the database can collect data eternally. It gives direct control\nover disk space usage. 
Moreover, the data are easily accessible to any tool, as they are kept in regular disk files on a given file system.\nIt's a schema-less document database, as records are just JSON documents.\n\nSome possible use cases:\n- collecting measurement data\n- collecting metrics\n- collecting market data\n\nIn the above cases the most important thing for a running system is to keep some time window of the latest data. Eternal DB lets you express this \ntime window in terms of disk space.\n\n**Please note that this is not a classic database system like an RDBMS.**\nIt doesn't support SQL, users, transactions, constraints, relations, triggers, stored procedures, etc.\nMaybe the best term is a \"data storage engine\".\n\nAs this is designed to be used under Linux, it requires Coreutils https://www.gnu.org/software/coreutils/ or a compatible software module\nthat supports running the command `du -sk`. Most likely it can work with BusyBox as well: https://www.busybox.net/downloads/BusyBox.html#du\n(not verified yet).\n\nNOTE: This project is a work-in-progress (WIP).\n\n## Project goals\n1. It is an embedded database (data store) that is easy to add to a program written in Java.\n2. It has a data retention policy based on disk space. This is meant to keep as much data as possible within a defined disk space limit.\n3. When reclaiming disk space, the oldest data records are removed.\n4. Records are kept in a text format in files on a given filesystem and these files are easily readable by other programs.\n5. Records are considered documents, like in document databases. No fixed data structure is enforced.\n6. A record always has a timestamp and time is considered an indexed attribute.\n7. It is meant to run in a Linux environment.\n8. It is thread-safe.\n9. It is fault-tolerant. A single error should not block you from reading all your data. As files are written only by appending, a corrupted\n   file can only be the result of not closing a database object before JVM exit. 
In this case some records may be lost, but a single data file\n   may have only the last record corrupted. A database can continue using this file when started again.\n\n## Non-goals\n1. This is not meant to run as a standalone server that multiple clients connect to and use.\n2. The ability to configure any other data retention policy is not required.\n3. Support for any other operating system is not required if such an operating system cannot provide a Linux-compatible runtime environment.\n4. Strict schema validation for data is not a goal.\n5. Strict transaction support is not a goal.\n6. Data access control is not a goal. The user is responsible for configuring the desired access rights to the directory defined for data files.\n7. Valid use of a single database storage from many database instances is not a goal.\n8. Support for efficient searching of records based on any criteria other than a time period is not a goal.\n\n## Design decisions\n1. The project is developed in Java 21.\n2. Records are kept formatted as single-line JSON, thus a single data file has a format similar to JSON Lines: https://jsonlines.org/\n3. Each line in a data file begins with a timestamp, being the number of milliseconds since the Epoch (as given by `System.currentTimeMillis()`)\n   formatted in radix 32 (to use less disk space), followed by the tab character, followed by JSON.\n4. Unix line endings are used in data files, as a line ending is then only 1 byte, which makes it easier to detect a corrupted file.\n5. Data are persisted as disk files in a directory tree that represents periods of time.\n6. The project avoids unnecessary dependencies. No Vavr, Project Reactor, Guava, etc.\n7. 
The database should be easy to use in a Spring Boot based application.\n\n## Usage\n### Dependency\nAdd a dependency on the eternal-db library (use the latest version):\n```xml\n\u003cdependency\u003e\n  \u003cgroupId\u003eio.github.k_tomaszewski\u003c/groupId\u003e\n  \u003cartifactId\u003eeternal-db\u003c/artifactId\u003e\n  \u003cversion\u003e1.0.0\u003c/version\u003e\n\u003c/dependency\u003e\n```\nAs of now the library is published only in the GitHub Packages repository, so you need to add an additional Maven repository in your `pom.xml` like this:\n```xml\n\u003crepositories\u003e\n    \u003crepository\u003e\n        \u003cid\u003egithub_k-tomaszewski_eternal-db\u003c/id\u003e\n        \u003curl\u003ehttps://maven.pkg.github.com/k-tomaszewski/eternal-db\u003c/url\u003e\n    \u003c/repository\u003e\n\u003c/repositories\u003e\n```\nIt seems that the GitHub Packages repository requires authentication even for public\nartifacts, so you need to have a GitHub account and you need to set up your credentials\n(GitHub username and access token) for Maven in the `settings.xml` file. Ref:\n- https://maven.apache.org/guides/mini/guide-multiple-repositories.html\n- https://maven.apache.org/guides/mini/guide-deployment-security-settings.html\n- https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-apache-maven-registry#authenticating-with-a-personal-access-token\n\n### Opening a database\nTo use it you create a single object of class `io.github.k_tomaszewski.eternaldb.Database`.\nSuch an object gives access to operations on a single data store.\n\nPlease note that `Database` is a generic class and has a type parameter. 
This is used for type\nchecking only, so the compiler can detect when an object of the wrong type is passed to a writing method.\n\nTo create an instance of the `Database` class you need to provide a `DatabaseProperties` object, which contains the following \nconfiguration properties:\n- path to a directory where data files are going to be stored\n- limit of disk space to use, given as an integer number of megabytes (MB)\n- a strategy object for serialization and deserialization; the library provides a `JacksonSerialization` instance as a default\n- a strategy object for naming data files; the library provides a `BasicFileNaming` instance as a default\n- timestamp supplier - an object implementing `ToLongFunction\u003cT\u003e` that returns a timestamp, possibly based on the record being written. This is optional.\n\nExample:\n```java\nDatabase\u003cMyRecord\u003e db = new Database\u003c\u003e(new DatabaseProperties\u003c\u003e(Path.of(\"/home/db\"), 100));\n```\nYou need one such object for a single data storage directory. Instances of the `Database` class are\nthread-safe. You must not use multiple instances of the `Database` class that are configured to use\nthe same directory or subdirectories of each other.\n\n### Writing data\nA basic method for writing data: `void write(T record, long recordMillis)`. To use it you\nneed to give a record timestamp when writing, as this is a time series database. Example:\n```java\ndb.write(myRecord, System.currentTimeMillis());\n```\n\nIf a database is configured with a timestamp supplier (an object implementing the `ToLongFunction\u003cT\u003e` interface),\nthen another data writing method can be used: `void write(T record)`. Calling `write(x)` is equivalent\nto calling `write(x, timestampSupplier.applyAsLong(x))`. 
This can be useful when your domain model\nalready contains a timestamp attribute.\n\n### Reading data\nThere is just one method for reading data: `Stream\u003cTimestamped\u003cU\u003e\u003e read(Class\u003cU\u003e type, Long minMillis, Long maxMillis)`.\nIt is designed to read a set of records with timestamps within a given range [minMillis, maxMillis].\nBoth ends of the time range are optional. Example of reading all records with timestamps starting\nfrom a given time:\n```java\nList\u003cMyRecord\u003e entities = db.read(MyRecord.class, fromMillis, null)\n        .map(Timestamped::record).toList();\n```\nHere the given type (`MyRecord.class` in the example above) is used for deserialization.\n\n### Closing a database\nAfter all interactions with the database are done, for example at the end of your program, \nyou should close the database object:\n```java\ndb.close();\n```\nThere is no need to close a database earlier than at the end of your program. It's a lightweight object\nand has rather small memory usage.\n\n### Customization of JSON serialization/deserialization\nThe library provides a default class for the serialization/deserialization strategy: `io.github.k_tomaszewski.eternaldb.JacksonSerialization`.\nIt uses its own instance of Jackson ObjectMapper (precisely: JsonMapper). One can customize it by\ncreating this object and passing a `Consumer\u003cObjectMapper\u003e` to the constructor. 
Example:\n```java\nConsumer\u003cObjectMapper\u003e customizer = objectMapper -\u003e objectMapper.registerModule(new JavaTimeModule())\n        .disable(SerializationFeature.WRITE_DATES_AS_TIMESTAMPS)\n        .enable(SerializationFeature.WRITE_DATES_WITH_ZONE_ID)\n        .disable(DeserializationFeature.ADJUST_DATES_TO_CONTEXT_TIME_ZONE);\n\nDatabase\u003cMyRecord\u003e db = new Database\u003c\u003e(new DatabaseProperties\u003c\u003e(Path.of(\"/home/db\"), 100)\n        .setSerialization(new JacksonSerialization(customizer)));\n```\n\n## Contributions\nContributions are welcome. If you want to contribute, just make a pull request. Please contact me beforehand to discuss your idea:\nkrzysztof.tomaszewski (at) gmail.com\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fk-tomaszewski%2Feternal-db","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fk-tomaszewski%2Feternal-db","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fk-tomaszewski%2Feternal-db/lists"}