https://github.com/apache/tez
Apache Tez
apache big-data hadoop java tez
Apache Tez
- Host: GitHub
- URL: https://github.com/apache/tez
- Owner: apache
- License: apache-2.0
- Created: 2013-04-08T07:20:23.000Z (about 12 years ago)
- Default Branch: master
- Last Pushed: 2025-04-02T08:13:58.000Z (13 days ago)
- Last Synced: 2025-04-06T14:05:07.308Z (9 days ago)
- Topics: apache, big-data, hadoop, java, tez
- Language: Java
- Homepage: https://tez.apache.org/
- Size: 29.2 MB
- Stars: 491
- Watchers: 33
- Forks: 430
- Open Issues: 65
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-dataops - Apache Tez - A generic data-processing pipeline engine envisioned as a low-level engine. (Data Processing)
README
Apache Tez
==========

Apache Tez is a generic data-processing pipeline engine envisioned as a low-level engine for higher abstractions
such as Apache Hadoop Map-Reduce, Apache Pig, Apache Hive etc.

At its heart, tez is very simple and has just two components:

* The data-processing pipeline engine where-in one can plug-in input, processing and output implementations to
perform arbitrary data-processing. Every 'task' in tez has the following:
- Input to consume key/value pairs from.
- Processor to process them.
- Output to collect the processed key/value pairs.

* A master for the data-processing application, where-by one can put together arbitrary data-processing 'tasks'
described above into a task-DAG to process data as desired.
The generic master is implemented as an Apache Hadoop YARN ApplicationMaster.
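
To make the two components concrete, here is a minimal plain-Java sketch of the idea. It is not the Tez API: `TaskDagSketch`, `KV`, `Input`, `Processor`, `Output`, `Vertex`, and the `tokenizer`/`summation` vertex names are all hypothetical, chosen only for illustration, and everything runs in a single process rather than under YARN. It assumes Java 16 or later (for records).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: these types are hypothetical and are NOT the Tez API. */
final class TaskDagSketch {

    /** A key/value pair flowing between tasks. */
    record KV(String key, Integer value) {}

    /** Input: where a task consumes key/value pairs from. */
    interface Input { List<KV> read(); }

    /** Output: where a task collects the processed key/value pairs. */
    interface Output { void write(KV kv); }

    /** Processor: reads from the input and emits to the output. */
    interface Processor { void run(Input in, Output out); }

    /** A vertex of the task-DAG: a named processor plus its upstream vertices. */
    record Vertex(String name, Processor processor, List<String> upstream) {}

    /** Runs the vertices in the given order, which must already be topological. */
    static Map<String, List<KV>> execute(List<Vertex> dag, List<KV> source) {
        Map<String, List<KV>> results = new LinkedHashMap<>();
        for (Vertex v : dag) {
            // A vertex without upstream vertices reads the source data;
            // otherwise it reads the combined output of its upstream vertices.
            List<KV> inputData = new ArrayList<>();
            if (v.upstream().isEmpty()) {
                inputData.addAll(source);
            } else {
                for (String u : v.upstream()) {
                    if (!results.containsKey(u)) {
                        throw new IllegalStateException("upstream not yet run: " + u);
                    }
                    inputData.addAll(results.get(u));
                }
            }
            List<KV> collected = new ArrayList<>();
            v.processor().run(() -> inputData, collected::add);
            results.put(v.name(), collected);
        }
        return results;
    }

    /** Word count expressed as a two-vertex DAG: tokenizer -> summation. */
    static Map<String, Integer> wordCount(List<String> lines) {
        // Source pairs: key is the line text, value is the line index.
        List<KV> source = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            source.add(new KV(lines.get(i), i));
        }

        // Splits each line on whitespace and emits (word, 1).
        Processor tokenizer = (in, out) -> {
            for (KV kv : in.read()) {
                for (String word : kv.key().split("\\s+")) {
                    if (!word.isEmpty()) out.write(new KV(word, 1));
                }
            }
        };

        // Sums the values per key and emits one (word, total) pair per word.
        Processor summation = (in, out) -> {
            Map<String, Integer> totals = new TreeMap<>();
            for (KV kv : in.read()) totals.merge(kv.key(), kv.value(), Integer::sum);
            totals.forEach((word, total) -> out.write(new KV(word, total)));
        };

        List<Vertex> dag = List.of(
            new Vertex("tokenizer", tokenizer, List.of()),
            new Vertex("summation", summation, List.of("tokenizer")));

        Map<String, Integer> counts = new TreeMap<>();
        for (KV kv : execute(dag, source).get("summation")) {
            counts.put(kv.key(), kv.value());
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("a b a", "b c"))); // prints {a=2, b=2, c=1}
    }
}
```

In this sketch `execute` plays the role of the master in a very reduced form: it only wires each task's input to the output of its upstream tasks and runs them in order. Scheduling, parallelism, and data movement across machines are deliberately left out.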