https://github.com/nathanmarz/elephantdb
Distributed database specialized in exporting key/value data from Hadoop
- Host: GitHub
- URL: https://github.com/nathanmarz/elephantdb
- Owner: nathanmarz
- License: bsd-3-clause
- Created: 2011-02-16T01:54:18.000Z (over 14 years ago)
- Default Branch: develop
- Last Pushed: 2014-06-27T19:41:17.000Z (over 11 years ago)
- Last Synced: 2025-04-04T00:07:48.268Z (8 months ago)
- Language: Java
- Homepage:
- Size: 2.9 MB
- Stars: 558
- Watchers: 40
- Forks: 51
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- useful-java-links - Elephantdb - 3-clause BSD license, https://github.com/nathanmarz/elephantdb. (II. Databases, search engines, big data and machine learning / 1. Databases and storages)
- awesome-bigdata - ElephantDB - Distributed database specialized in exporting data from Hadoop. (Key-value Data Model)
- fucking-awesome-bigdata - ElephantDB - Distributed database specialized in exporting data from Hadoop. (Key-value Data Model)
- A-curated-list-of-awesome-big-data-frameworks-ressources-and-other-awesomeness.- - ElephantDB - Distributed database specialized in exporting data from Hadoop. (Key-value Data Model)
- data-engineering-collection - ElephantDB - Distributed database specialized in exporting data from Hadoop. (Key-value Data Model)
- awesome-java - ElephantDB
README
# ElephantDB 0.5.1 (cascalog 2.x)
## ElephantDB 0.4.5 (cascalog 1.x)
# About
ElephantDB is a database that specializes in exporting key/value data
from Hadoop. ElephantDB is composed of two components. The first is a
library that is used in MapReduce jobs for creating an indexed
key/value dataset that is stored on a distributed filesystem. The
second component is a daemon that can download a subset of a dataset
and serve it in a read-only, random-access fashion. A group of
machines working together to serve a full dataset is called a ring.
Since the ElephantDB server doesn't support random writes, it is almost
laughably simple. Once the server loads its subset of the data, it
does very little. This makes ElephantDB rock-solid in production,
since there are almost no moving parts.
The ElephantDB server has a Thrift interface, so clients written in any
language that Thrift supports can read from it. The database itself is
implemented in Clojure.
An ElephantDB datastore contains a fixed number of shards of a "Local
Persistence". ElephantDB's local persistence engine is pluggable, and
ElephantDB comes bundled with local persistence implementations for
Berkeley DB Java Edition and LevelDB. On the MapReduce side, each
reducer creates or updates a single shard in the DFS (distributed
filesystem), and on the server side, each server serves a subset of
the shards.
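
To make the shard model concrete, here is a minimal Java sketch of how a key could map to one of a fixed number of shards, and how shards could be spread across the servers in a ring. The class `ShardingSketch`, the hash function, and the round-robin assignment are all assumptions chosen for illustration; ElephantDB's actual sharding scheme and host assignment may differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: the hash function and round-robin assignment are
// assumptions, not ElephantDB's actual sharding scheme.
final class ShardingSketch {

    // Map a key to one of numShards shards. The shard count is fixed
    // for the lifetime of the datastore.
    static int shardFor(byte[] key, int numShards) {
        return Math.floorMod(Arrays.hashCode(key), numShards);
    }

    // Assign shards to hosts round-robin, so that each host serves
    // only a subset of the shards.
    static List<List<Integer>> assignShards(int numShards, int numHosts) {
        List<List<Integer>> byHost = new ArrayList<>();
        for (int h = 0; h < numHosts; h++) {
            byHost.add(new ArrayList<>());
        }
        for (int s = 0; s < numShards; s++) {
            byHost.get(s % numHosts).add(s);
        }
        return byHost;
    }

    public static void main(String[] args) {
        int numShards = 8;
        byte[] key = "user:42".getBytes(StandardCharsets.UTF_8);
        System.out.println("shard for key: " + shardFor(key, numShards));
        System.out.println("host 0 serves: " + assignShards(numShards, 3).get(0));
    }
}
```

With 8 shards and 3 hosts, host 0 in this sketch serves shards 0, 3 and 6. Because the shard count is fixed, a reducer writing a shard and a server reading it agree on where a key lives as long as both use the same key-to-shard function.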
ElephantDB supports hot-swapping, so a live server can be updated
with a new set of shards without downtime.
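
The general idea behind hot-swapping read-only data can be sketched in Java as follows. This is not ElephantDB's implementation: the class `HotSwapSketch` is hypothetical, and an in-memory map stands in for a shard's local persistence. It only shows the pattern of loading a new version completely and then publishing it with one atomic reference swap, so readers never see a partially loaded dataset.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Illustrative only: an in-memory map stands in for a shard's local
// persistence; ElephantDB's real swap mechanism may work differently.
final class HotSwapSketch {

    // The currently served, immutable snapshot of the data.
    private final AtomicReference<Map<String, String>> current;

    HotSwapSketch(Map<String, String> initial) {
        current = new AtomicReference<>(
                Collections.unmodifiableMap(new HashMap<>(initial)));
    }

    // Readers always see a complete snapshot: the old one or the new one.
    String get(String key) {
        return current.get().get(key);
    }

    // Copy the new version fully, then publish it with one atomic swap.
    void swap(Map<String, String> newVersion) {
        current.set(Collections.unmodifiableMap(new HashMap<>(newVersion)));
    }

    public static void main(String[] args) {
        HotSwapSketch store = new HotSwapSketch(Collections.singletonMap("a", "1"));
        System.out.println(store.get("a"));
        store.swap(Collections.singletonMap("a", "2"));
        System.out.println(store.get("a"));
    }
}
```

Since the served data is never modified in place, reads need no locking; the only shared mutable state is the reference to the current snapshot.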
# Questions
Google group: [elephantdb-user](http://groups.google.com/group/elephantdb-user)
# Introduction
[Introduction to ElephantDB](https://speakerdeck.com/sorenmacbeth/introduction-to-elephantdb)
# Tutorials
TODO: Write an updated tutorial for ElephantDB 0.4.x
# Using ElephantDB in MapReduce Jobs
ElephantDB is hosted at [Clojars](http://clojars.org/elephantdb).
Clojars is a Maven repository that is easy to use with Maven or
Leiningen. Use this dependency in your MapReduce jobs to create
ElephantDB datastores. ElephantDB includes a module,
elephantdb-cascading, that lets you create datastores from your
Cascading workflows. elephantdb-cascalog is available
for use with [Cascalog](http://github.com/nathanmarz/cascalog) >= 1.10.1.
# Deploying ElephantDB server
TODO: Documentation on how to deploy ElephantDB.
# Running the EDB Jar
TODO: Documentation on how to run ElephantDB