Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/apache/tez
Apache Tez
apache big-data hadoop java tez
Last synced: about 12 hours ago
Apache Tez
- Host: GitHub
- URL: https://github.com/apache/tez
- Owner: apache
- License: apache-2.0
- Created: 2013-04-08T07:20:23.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2024-11-23T14:24:45.000Z (20 days ago)
- Last Synced: 2024-12-05T21:03:05.925Z (8 days ago)
- Topics: apache, big-data, hadoop, java, tez
- Language: Java
- Homepage: https://tez.apache.org/
- Size: 29 MB
- Stars: 482
- Watchers: 34
- Forks: 424
- Open Issues: 71
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-dataops - Apache Tez - A generic data-processing pipeline engine envisioned as a low-level engine. (Data Processing)
README
Apache Tez
==========

Apache Tez is a generic data-processing pipeline engine envisioned as a low-level engine for higher abstractions
such as Apache Hadoop Map-Reduce, Apache Pig, Apache Hive, etc.

At its heart, Tez is very simple and has just two components:

* The data-processing pipeline engine, wherein one can plug in input, processing and output implementations to
  perform arbitrary data-processing. Every 'task' in Tez has the following:
  - Input to consume key/value pairs from.
  - Processor to process them.
  - Output to collect the processed key/value pairs.

* A master for the data-processing application, whereby one can put together the arbitrary data-processing 'tasks'
  described above into a task-DAG to process data as desired.
  The generic master is implemented as an Apache Hadoop YARN ApplicationMaster.
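
To make the two components concrete, here is a minimal in-memory sketch of the model described above: a task that wires an Input, a Processor and an Output together, and a small task-DAG that runs each task after its upstream tasks. Every name in it (`TaskDagSketch`, `Edge`, `Task`, `Dag`, `wordCount`) is hypothetical and chosen for illustration; this is not the Tez API, and it runs in a single JVM rather than under a YARN ApplicationMaster.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustrative model only: every type below is hypothetical and is NOT the Tez API.
final class TaskDagSketch {
    /** Input: the source a task consumes key/value pairs from. */
    interface Input { List<Map.Entry<String, Integer>> read(); }

    /** Processor: transforms the consumed pairs. */
    interface Processor {
        List<Map.Entry<String, Integer>> process(List<Map.Entry<String, Integer>> pairs);
    }

    /** Output: collects the processed pairs. */
    interface Output { void write(List<Map.Entry<String, Integer>> pairs); }

    /** In-memory edge: the upstream task's Output and the downstream task's Input. */
    static final class Edge implements Input, Output {
        private final List<Map.Entry<String, Integer>> buffer = new ArrayList<>();
        @Override public List<Map.Entry<String, Integer>> read() { return new ArrayList<>(buffer); }
        @Override public void write(List<Map.Entry<String, Integer>> pairs) { buffer.addAll(pairs); }
    }

    /** A task wires one Input, one Processor and one Output together. */
    static final class Task {
        final String name;
        private final Input input;
        private final Processor processor;
        private final Output output;

        Task(String name, Input input, Processor processor, Output output) {
            this.name = name;
            this.input = input;
            this.processor = processor;
            this.output = output;
        }

        void run() { output.write(processor.process(input.read())); }
    }

    /** A task-DAG that runs each task only after all of its upstream tasks. */
    static final class Dag {
        private final Map<String, Task> tasks = new LinkedHashMap<>();
        private final Map<String, List<String>> downstream = new HashMap<>();
        private final Map<String, Integer> upstreamCount = new HashMap<>();

        Dag addTask(Task task) {
            tasks.put(task.name, task);
            downstream.put(task.name, new ArrayList<>());
            upstreamCount.put(task.name, 0);
            return this;
        }

        /** Declares that 'to' depends on 'from'; both tasks must already be added. */
        Dag addEdge(String from, String to) {
            downstream.get(from).add(to);
            upstreamCount.merge(to, 1, Integer::sum);
            return this;
        }

        /** Runs tasks in dependency order (Kahn's algorithm) and returns that order. */
        List<String> run() {
            Map<String, Integer> pending = new HashMap<>(upstreamCount);
            Deque<String> ready = new ArrayDeque<>();
            for (String name : tasks.keySet()) {
                if (pending.get(name) == 0) ready.add(name);
            }
            List<String> order = new ArrayList<>();
            while (!ready.isEmpty()) {
                String name = ready.remove();
                tasks.get(name).run();
                order.add(name);
                for (String next : downstream.get(name)) {
                    if (pending.merge(next, -1, Integer::sum) == 0) ready.add(next);
                }
            }
            // Tasks on a cycle never become ready, so they are left unrun.
            if (order.size() != tasks.size()) throw new IllegalStateException("cycle in task graph");
            return order;
        }
    }

    /** Example pipeline: normalize -> sum, a two-task word count. */
    static Map<String, Integer> wordCount(List<String> words) {
        Edge shuffle = new Edge();
        Edge sink = new Edge();
        Input source = () -> {
            List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
            for (String w : words) pairs.add(Map.entry(w, 1));
            return pairs;
        };
        Processor normalize = pairs -> {
            List<Map.Entry<String, Integer>> out = new ArrayList<>();
            for (Map.Entry<String, Integer> p : pairs) {
                out.add(Map.entry(p.getKey().toLowerCase(Locale.ROOT), p.getValue()));
            }
            return out;
        };
        Processor sum = pairs -> {
            Map<String, Integer> totals = new LinkedHashMap<>();
            for (Map.Entry<String, Integer> p : pairs) totals.merge(p.getKey(), p.getValue(), Integer::sum);
            return new ArrayList<>(totals.entrySet());
        };
        new Dag()
            .addTask(new Task("sum", shuffle, sum, sink)) // registered first; still runs last
            .addTask(new Task("normalize", source, normalize, shuffle))
            .addEdge("normalize", "sum")
            .run();
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> p : sink.read()) result.put(p.getKey(), p.getValue());
        return result;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("Tez", "tez", "DAG"))); // prints {tez=2, dag=1}
    }
}
```

The `Edge` class plays both roles on purpose: what one task collects through its Output is exactly what the next task consumes through its Input, which is what lets independent tasks be composed into a graph. A real engine would also have to handle data movement between machines, partitioning and fault tolerance, none of which this sketch attempts.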