Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/LinkedInAttic/datafu
Hadoop library for large-scale data processing, now an Apache Incubator project
- Host: GitHub
- URL: https://github.com/LinkedInAttic/datafu
- Owner: LinkedInAttic
- Created: 2011-09-16T22:32:31.000Z (about 13 years ago)
- Default Branch: master
- Last Pushed: 2014-07-08T17:00:26.000Z (over 10 years ago)
- Last Synced: 2024-05-19T19:10:24.505Z (6 months ago)
- Language: Java
- Homepage: http://datafu.incubator.apache.org/
- Size: 31.4 MB
- Stars: 585
- Watchers: 75
- Forks: 137
- Open Issues: 4
Metadata Files:
- Readme: README.md
- Changelog: changes.md
README
# Apache DataFu
[Apache DataFu](http://datafu.incubator.apache.org) is a collection of libraries for working with large-scale data in Hadoop.
The project was inspired by the need for stable, well-tested libraries for data mining and statistics.

It consists of two libraries:
* **Apache DataFu Pig**: a collection of user-defined functions for [Apache Pig](http://pig.apache.org/)
* **Apache DataFu Hourglass**: an incremental processing framework for [Apache Hadoop](http://hadoop.apache.org/) in MapReduce

DataFu is currently undergoing incubation with Apache. A mirror of the official git repository can be found on GitHub at [https://github.com/apache/incubator-datafu](https://github.com/apache/incubator-datafu).
For more information please visit the website:
* [http://datafu.incubator.apache.org/](http://datafu.incubator.apache.org/)
If you'd like to jump in and get started, check out the corresponding guides for each library:
* [Apache DataFu Pig - Getting Started](http://datafu.incubator.apache.org/docs/datafu/getting-started.html)
* [Apache DataFu Hourglass - Getting Started](http://datafu.incubator.apache.org/docs/hourglass/getting-started.html)
## Blog Posts
* [Introducing DataFu](http://datafu.incubator.apache.org/blog/2012/01/10/introducing-datafu.html)
* [DataFu: The WD-40 of Big Data](http://datafu.incubator.apache.org/blog/2013/01/24/datafu-the-wd-40-of-big-data.html)
* [DataFu 1.0](http://datafu.incubator.apache.org/blog/2013/09/04/datafu-1-0.html)
* [DataFu's Hourglass: Incremental Data Processing in Hadoop](http://datafu.incubator.apache.org/blog/2013/10/03/datafus-hourglass-incremental-data-processing-in-hadoop.html)
## Presentations
* [A Brief Tour of DataFu](http://www.slideshare.net/matthewterencehayes/datafu)
* [Building Data Products at LinkedIn with DataFu](http://www.slideshare.net/matthewterencehayes/building-data-products-at-linkedin-with-datafu)
* [Hourglass: a Library for Incremental Processing on Hadoop (IEEE BigData 2013)](http://www.slideshare.net/matthewterencehayes/hourglass-a-library-for-incremental-processing-on-hadoop)
* [DataFu @ ApacheCon 2014](http://www.slideshare.net/williamgvaughan/datafu-apachecon-33420740)
## Videos
* [Introduction to Apache DataFu @ ApacheCon 2014](http://www.youtube.com/watch?v=JWI9tVsQ1cY)
## Other Resources
An interesting example of using Quantile from DataFu can be found in the [Hadoop Real-World Solutions Cookbook](http://packtlib.packtpub.com/library/hadoop-real-world-solutions-cookbook/ch06lvl1sec62).
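
For readers unfamiliar with what a quantile computation produces, here is a minimal sketch in plain Java. It is not DataFu's implementation and does not use the DataFu or Pig APIs. The class name `QuantileSketch`, the method `quantile`, and the choice of the nearest-rank method are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; not part of DataFu.
public class QuantileSketch {

    // Nearest-rank quantile: returns the smallest value such that at least
    // a fraction q of the data is less than or equal to it.
    static double quantile(List<Double> values, double q) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("values must not be empty");
        }
        if (q < 0.0 || q > 1.0) {
            throw new IllegalArgumentException("q must be in [0, 1]");
        }
        // Sort a copy so the caller's list is left untouched.
        List<Double> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        // Rank is 1-based; q = 0 maps to the minimum.
        int rank = (int) Math.ceil(q * sorted.size());
        int index = Math.max(rank - 1, 0);
        return sorted.get(index);
    }

    public static void main(String[] args) {
        List<Double> data = Arrays.asList(7.0, 1.0, 5.0, 3.0, 9.0);
        System.out.println("min    = " + quantile(data, 0.0));
        System.out.println("median = " + quantile(data, 0.5));
        System.out.println("max    = " + quantile(data, 1.0));
    }
}
```

Sorting the whole input in memory is only reasonable for small lists. At Hadoop scale, the same idea is usually expressed over sorted or streamed data.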
## From Around the Web
* [DataFu Enters Incubation Status at Apache](http://www.infoq.com/news/2014/02/datafu-asf)
* [DataFu: Open Source Apache Pig UDFs by LinkedIn](http://nosql.mypopescu.com/post/15734212877/datafu-open-source-apache-pig-udfs-by-linkedin)
* [LinkedIn Opens DataFu: A Library for Working with Hadoop and Pig](http://readwrite.com/2012/01/12/linkedin-opens-datafu-a-librar)
## Papers
* [Hourglass: a Library for Incremental Processing on Hadoop (IEEE BigData 2013)](http://www.slideshare.net/matthewterencehayes/hourglass-27038297)
## Getting Help
Please visit the website:
* [http://datafu.incubator.apache.org/](http://datafu.incubator.apache.org/)