https://github.com/apache/accumulo-examples
Apache Accumulo Examples
- Host: GitHub
- URL: https://github.com/apache/accumulo-examples
- Owner: apache
- License: apache-2.0
- Created: 2016-12-06T08:00:06.000Z (about 9 years ago)
- Default Branch: main
- Last Pushed: 2024-11-25T20:29:33.000Z (about 1 year ago)
- Last Synced: 2025-09-04T00:46:58.489Z (4 months ago)
- Topics: accumulo, big-data, hacktoberfest
- Language: Java
- Homepage: https://accumulo.apache.org
- Size: 543 KB
- Stars: 35
- Watchers: 19
- Forks: 37
- Open Issues: 6
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
# Apache Accumulo Examples
[![Build Status][ti]][tl]
## Introduction
The Accumulo-Examples repository contains a collection of examples for Accumulo versions 2.0 and
greater. Examples within the `main` branch are designed to work with the version currently
under development. Additional branches exist for previous releases of the Accumulo 2.x line. For
example, the `2.0` branch contains examples specifically intended to work with that release version.
The [Accumulo Tour] also provides several simple introductory examples that may be of interest.
A collection of examples for Accumulo 1.10 can be found [here].
## Setup instructions
Follow the steps below to run the Accumulo examples:
1. Clone this repository

        git clone https://github.com/apache/accumulo-examples.git

2. Follow [Accumulo's quickstart][quickstart] to install and run an Accumulo instance.
   Accumulo has an [accumulo-client.properties] file in `conf/` that must be configured,
   because the examples use this file to connect to your instance.
3. Review [env.sh.example] and [accumulo-env.sh] (within your Accumulo installation) to see if you
   need to customize them. If `ACCUMULO_HOME` and `HADOOP_HOME` are set in your shell, you may be
   able to skip this step. Make sure `ACCUMULO_CLIENT_PROPS` is set to the location of your
   [accumulo-client.properties].

        cp conf/env.sh.example conf/env.sh
        vim conf/env.sh

4. Build the examples repo and copy the examples jar to Accumulo's `lib/` directory to get on its
   class path:

        ./bin/build
        cp target/accumulo-examples.jar /path/to/accumulo/lib/

5. Each Accumulo example has its own documentation and run instructions, which are linked in the
   table of available examples below.
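
Before running an example, it can help to confirm that the variables named in step 3 are set in your shell. The following is a minimal sketch and is not part of this repository; the helper name `missing_vars` is hypothetical, and it only checks that each variable is non-empty, not that its value points at a valid location.

```shell
# Hypothetical helper (not part of this repo): print the name of every
# variable that is unset or empty, one name per line.
missing_vars() {
  for name in "$@"; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    [ -n "$value" ] || echo "$name"
  done
}

# Check the variables mentioned in step 3 of the setup instructions.
missing=$(missing_vars ACCUMULO_HOME HADOOP_HOME ACCUMULO_CLIENT_PROPS)
if [ -n "$missing" ]; then
  echo "Not set:" $missing >&2
fi
```

If the check reports a variable, set it in `conf/env.sh` or export it in your shell before continuing.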
When running the examples, remember the tips below:
* Examples are run using the `runex` or `runmr` commands, which are located in the `bin/` directory
  of this repo. The `runex` command is a simple script that uses the examples shaded jar to run
  a class. The `runmr` command starts a MapReduce job in YARN.
* Commands intended to be run in bash are prefixed by '$' and should be run from the root of this
repository.
* Several examples use the `accumulo` and `accumulo-util` commands which are expected to be on your
`PATH`. These commands are found in the `bin/` directory of your Accumulo installation.
* Commands intended to be run in the Accumulo shell are prefixed by '>'.
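
Because several examples expect `accumulo` and `accumulo-util` to be on your `PATH`, a quick check up front can save a failed run. This is a small sketch under that assumption; the helper name `missing_cmds` is hypothetical and not provided by this repository or by Accumulo.

```shell
# Hypothetical helper (not part of this repo): print each command name
# that cannot be resolved on the current PATH, one per line.
missing_cmds() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}

# The two commands the tips above expect to be available.
missing_cmds accumulo accumulo-util
```

If either name is printed, add the `bin/` directory of your Accumulo installation to `PATH`.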
## Available Examples
Each example below highlights a feature of Apache Accumulo.
| Example | Description |
|---------|-------------|
| [batch] | Using the batch writer and batch scanner |
| [bloom] | Creating a bloom filter enabled table to increase query performance |
| [bulkIngest] | Ingesting bulk data using map/reduce jobs on Hadoop |
| [classpath] | Using per-table classpaths |
| [client] | Using table operations, reading and writing data in Java. |
| [combiner] | Using example StatsCombiner to find min, max, sum, and count. |
| [compactionStrategy] | Configuring a compaction strategy |
| [constraints] | Using constraints with tables, including limiting the mutation size to avoid running out of memory |
| [deleteKeyValuePair] | Deleting a key/value pair and verifying the deletion in RFile. |
| [dirlist] | Storing filesystem information. |
| [export] | Exporting and importing tables. |
| [filedata] | Storing file data. |
| [filter] | Using the AgeOffFilter to remove records more than 30 seconds old. |
| [helloworld] | Inserting records both inside and outside map/reduce jobs, and reading records between two rows. |
| [isolation] | Using the isolated scanner to ensure partial changes are not seen. |
| [regex] | Using MapReduce and Accumulo to find data using regular expressions. |
| [reservations] | Using conditional mutations to implement a simple reservation system. |
| [rgbalancer] | Using a balancer to spread groups of tablets within a table evenly |
| [rowhash] | Using MapReduce to read a table and write to a new column in the same table. |
| [sample] | Building and using sample data in Accumulo. |
| [shard] | Using the intersecting iterator with a term index partitioned by document. |
| [spark] | Using Accumulo as input and output for Apache Spark jobs |
| [tabletofile] | Using MapReduce to read a table and write one of its columns to a file in HDFS. |
| [terasort] | Generating random data and sorting it using Accumulo. |
| [tracing] | Generating trace data in a client application and Accumulo. |
| [uniquecols] | Using MapReduce to count unique columns in Accumulo |
| [visibility] | Using visibilities (or combinations of authorizations). Also shows user permissions. |
| [wordcount] | Using MapReduce and Accumulo to do a word count on text files |
## Release Testing
This repository can be used to test Accumulo release candidates. See
[docs/release-testing.md](docs/release-testing.md).
[quickstart]: https://accumulo.apache.org/docs/2.x/getting-started/quickstart
[accumulo-client.properties]: https://accumulo.apache.org/docs/2.x/configuration/files#accumulo-clientproperties
[accumulo-env.sh]: https://accumulo.apache.org/docs/2.x/configuration/files#accumulo-envsh
[env.sh.example]: conf/env.sh.example
[manual]: https://accumulo.apache.org/latest/accumulo_user_manual/
[batch]: docs/batch.md
[bloom]: docs/bloom.md
[bulkIngest]: docs/bulkIngest.md
[classpath]: docs/classpath.md
[client]: docs/client.md
[combiner]: docs/combiner.md
[compactionStrategy]: docs/compactionStrategy.md
[constraints]: docs/constraints.md
[deleteKeyValuePair]: docs/deleteKeyValuePair.md
[dirlist]: docs/dirlist.md
[export]: docs/export.md
[filedata]: docs/filedata.md
[filter]: docs/filter.md
[helloworld]: docs/helloworld.md
[isolation]: docs/isolation.md
[maxmutation]: docs/maxmutation.md
[regex]: docs/regex.md
[reservations]: docs/reservations.md
[rgbalancer]: docs/rgbalancer.md
[rowhash]: docs/rowhash.md
[sample]: docs/sample.md
[shard]: docs/shard.md
[spark]: spark/README.md
[tabletofile]: docs/tabletofile.md
[terasort]: docs/terasort.md
[tracing]: docs/tracing.md
[uniquecols]: docs/uniquecols.md
[visibility]: docs/visibility.md
[wordcount]: docs/wordcount.md
[ti]: https://github.com/apache/accumulo-examples/workflows/QA/badge.svg
[tl]: https://github.com/apache/accumulo-examples/actions
[here]: https://accumulo.apache.org/1.10/examples
[Accumulo Tour]: https://accumulo.apache.org/tour/