{"id":13439177,"url":"https://github.com/whitfin/efflux","last_synced_at":"2025-04-16T08:43:28.583Z","repository":{"id":57623456,"uuid":"147120860","full_name":"whitfin/efflux","owner":"whitfin","description":"Easy Hadoop Streaming and MapReduce interfaces in Rust","archived":false,"fork":false,"pushed_at":"2024-04-13T01:04:31.000Z","size":53,"stargazers_count":34,"open_issues_count":1,"forks_count":8,"subscribers_count":6,"default_branch":"main","last_synced_at":"2024-04-14T15:23:40.680Z","etag":null,"topics":["hadoop","mapreduce","processing"],"latest_commit_sha":null,"homepage":null,"language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/whitfin.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-09-02T20:51:21.000Z","updated_at":"2024-07-31T04:36:06.382Z","dependencies_parsed_at":"2024-01-07T05:56:07.844Z","dependency_job_id":"6c1b5347-d682-41b4-9f23-2a36b9777a7d","html_url":"https://github.com/whitfin/efflux","commit_stats":{"total_commits":50,"total_committers":1,"mean_commits":50.0,"dds":0.0,"last_synced_commit":"fe756d76cedb962abcd1934285b9d58e6b00d48c"},"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/whitfin%2Fefflux","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/whitfin%2Fefflux/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/whitfin%2Fefflux/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/whitfin%2Fefflux/manif
ests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/whitfin","download_url":"https://codeload.github.com/whitfin/efflux/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249221599,"owners_count":21232432,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["hadoop","mapreduce","processing"],"created_at":"2024-07-31T03:01:11.760Z","updated_at":"2025-04-16T08:43:28.559Z","avatar_url":"https://github.com/whitfin.png","language":"Rust","funding_links":[],"categories":["Libraries","库","库 Libraries","Rust"],"sub_categories":["Distributed systems","分布式系统","分布式系统 Distributed systems"],"readme":"# Efflux\n[![Crates.io](https://img.shields.io/crates/v/efflux.svg)](https://crates.io/crates/efflux) [![Build Status](https://img.shields.io/github/actions/workflow/status/whitfin/efflux/ci.yml)](https://github.com/whitfin/efflux/actions)\n\nEfflux is a set of Rust interfaces for MapReduce and Hadoop Streaming. It enables Rust developers to run batch jobs on Hadoop infrastructure whilst staying with the efficiency and safety they're used to.\n\nInitially written to scratch a personal itch, this crate offers simple traits to mask the internals of working with Hadoop Streaming which lend themselves well to writing jobs quickly. 
Functionality is handed off to macros where possible to provide compile time guarantees, and any other functionality is kept simple to avoid overhead wherever possible.\n\n## Installation\n\nEfflux is available on [crates.io](https://crates.io/crates/efflux) as a library crate, so you only need to add it as a dependency:\n\n```toml\n[dependencies]\nefflux = \"2.0\"\n```\n\nYou can then gain access to everything relevant using the `prelude` module of Efflux:\n\n```rust\nuse efflux::prelude::*;\n```\n\n## Usage\n\nEfflux comes with a handy template to help generate new projects, using the [kickstart](https://github.com/Keats/kickstart) tool. You can simply use the commands below and follow the prompt to generate a new project skeleton:\n\n```shell\n# install kickstart\n$ cargo install kickstart\n\n# create a project from the template\n$ kickstart -s examples/template https://github.com/whitfin/efflux\n```\n\nIf you'd rather not use the templating tool, you can always work from the examples found in this repository. A good place to start is the traditional [wordcount](examples/wordcount) example.\n\n## Testing\n\nTesting your binaries is actually fairly simple, as you can simulate the Hadoop phases using a basic UNIX pipeline. The following example replicates the Hadoop job flow and generates output that matches a job executed with Hadoop itself:\n\n```shell\n# example Hadoop task invocation\n$ hadoop jar hadoop-streaming-2.8.2.jar \\\n    -input \u003cINPUT\u003e \\\n    -output \u003cOUTPUT\u003e \\\n    -mapper \u003cMAPPER\u003e \\\n    -reducer \u003cREDUCER\u003e\n\n# example simulation run via UNIX utilities\n$ cat \u003cINPUT\u003e | \u003cMAPPER\u003e | sort -k1,1 | \u003cREDUCER\u003e \u003e \u003cOUTPUT\u003e\n```\n\nThis can be tested using the [wordcount](examples/wordcount) example to confirm that the outputs are indeed the same. 
There may be some cases where the output differs, but the simulation should be sufficient for many cases.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwhitfin%2Fefflux","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwhitfin%2Fefflux","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwhitfin%2Fefflux/lists"}