{"id":13481996,"url":"https://github.com/twitter/summingbird","last_synced_at":"2025-09-27T07:31:02.018Z","repository":{"id":4804402,"uuid":"5957708","full_name":"twitter/summingbird","owner":"twitter","description":"Streaming MapReduce with Scalding and Storm","archived":true,"fork":false,"pushed_at":"2022-01-19T17:31:02.000Z","size":6615,"stargazers_count":2139,"open_issues_count":163,"forks_count":267,"subscribers_count":291,"default_branch":"develop","last_synced_at":"2024-09-26T22:02:09.491Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://twitter.com/summingbird","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/twitter.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGES.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2012-09-25T22:38:35.000Z","updated_at":"2024-09-11T15:35:35.000Z","dependencies_parsed_at":"2022-09-10T21:51:22.848Z","dependency_job_id":null,"html_url":"https://github.com/twitter/summingbird","commit_stats":null,"previous_names":[],"tags_count":45,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/twitter%2Fsummingbird","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/twitter%2Fsummingbird/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/twitter%2Fsummingbird/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/twitter%2Fsummingbird/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/twitter","download_url":"https://codeload.github.com/twitter/summingbird/tar.gz/refs/heads/develop","host":{"name":"GitHub","ur
l":"https://github.com","kind":"github","repositories_count":234410008,"owners_count":18828118,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T17:00:58.108Z","updated_at":"2025-09-27T07:31:01.598Z","avatar_url":"https://github.com/twitter.png","language":"Scala","readme":"## Summingbird\n\n[![status: retired](https://opensource.twitter.dev/status/retired.svg)](https://opensource.twitter.dev/status/#retired)\n[![Build Status](https://secure.travis-ci.org/twitter/summingbird.png)](http://travis-ci.org/twitter/summingbird)\n[![Codecov branch](https://img.shields.io/codecov/c/github/twitter/summingbird/develop.svg?maxAge=3600)](https://codecov.io/github/twitter/summingbird)\n[![Latest version](https://index.scala-lang.org/twitter/summingbird/summingbird-core/latest.svg?color=orange)](https://index.scala-lang.org/twitter/summingbird/summingbird-core)\n[![Chat](https://badges.gitter.im/twitter/summingbird.svg)](https://gitter.im/twitter/summingbird?utm_source=badge\u0026utm_medium=badge\u0026utm_campaign=pr-badge\u0026utm_content=badge)\n\nSummingbird is a library that lets you write MapReduce programs that look like native Scala or Java collection transformations and execute them on a number of well-known distributed MapReduce platforms, including [Storm](https://github.com/nathanmarz/storm) and [Scalding](https://github.com/twitter/scalding).\n\n![Summingbird Logo](https://raw.github.com/twitter/summingbird/develop/logo/summingbird_logo.png)\n\nWhile a word-counting aggregation in pure Scala might look like this:\n\n```scala\n  def wordCount(source: 
Iterable[String], store: MutableMap[String, Long]) =\n    source.flatMap { sentence =\u003e\n      toWords(sentence).map(_ -\u003e 1L)\n    }.foreach { case (k, v) =\u003e store.update(k, store.getOrElse(k, 0L) + v) }\n```\n\nCounting words in Summingbird looks like this:\n\n```scala\n  def wordCount[P \u003c: Platform[P]]\n    (source: Producer[P, String], store: P#Store[String, Long]) =\n      source.flatMap { sentence =\u003e\n        toWords(sentence).map(_ -\u003e 1L)\n      }.sumByKey(store)\n```\n\nThe logic is exactly the same, and the code is almost the same. The main difference is that you can execute the Summingbird program in \"batch mode\" (using [Scalding](https://github.com/twitter/scalding)), in \"realtime mode\" (using [Storm](https://github.com/nathanmarz/storm)), or on both Scalding and Storm in a hybrid batch/realtime mode that offers your application very attractive fault-tolerance properties.\n\nSummingbird provides you with the primitives you need to build rock-solid production systems.\n\n## Getting Started: Word Count with Twitter\n\nThe `summingbird-example` project allows you to run the word count program above on a sample of Twitter data using a local Storm topology and memcache instance. You can find the actual job definition in [ExampleJob.scala](https://github.com/twitter/summingbird/blob/develop/summingbird-example/src/main/scala/com/twitter/summingbird/example/ExampleJob.scala).\n\nFirst, make sure you have `memcached` installed locally. If you don't and you're on OS X, you can get it by installing [Homebrew](http://brew.sh/) and running this command in a shell:\n\n```bash\nbrew install memcached\n```\n\nWhen this is finished, run the `memcached` command in a separate terminal.\n\nNow you'll need to set up access to the Twitter Streaming API. 
[This blog post](http://tugdualgrall.blogspot.com/2012/11/couchbase-create-large-dataset-using.html) has a great walkthrough, so open that page, head over to https://dev.twitter.com/ and get your various keys and tokens. Once you have these, clone the Summingbird repository:\n\n```bash\ngit clone https://github.com/twitter/summingbird.git\ncd summingbird\n```\n\nThen open [StormRunner.scala](https://github.com/twitter/summingbird/blob/develop/summingbird-example/src/main/scala/com/twitter/summingbird/example/StormRunner.scala) in your editor. Replace the dummy values in the `config` definition with your auth tokens:\n\n```scala\nlazy val config = new ConfigurationBuilder()\n    .setOAuthConsumerKey(\"mykey\")\n    .setOAuthConsumerSecret(\"mysecret\")\n    .setOAuthAccessToken(\"token\")\n    .setOAuthAccessTokenSecret(\"tokensecret\")\n    .setJSONStoreEnabled(true) // required for JSON serialization\n    .build\n```\n\nYou're all ready to go! Now it's time to unleash Storm on your Twitter stream. Make sure the `memcached` terminal is still open, then start Storm from the `summingbird` directory:\n\n```bash\n./sbt \"summingbird-example/run --local\"\n```\n\nStorm should print a burst of output, then stabilize and appear to hang. This means that Storm is updating your local memcache instance with counts of every word that it sees in each tweet.\n\nTo query the aggregate results in Memcached, you'll need to open an SBT REPL in a new terminal:\n\n```bash\n./sbt summingbird-example/console\n```\n\nAt the launched REPL, run the following:\n\n```scala\nscala\u003e import com.twitter.summingbird.example._\nimport com.twitter.summingbird.example._\n\nscala\u003e StormRunner.lookup(\"i\")\n\u003cmemcache store loading elided\u003e\nres0: Option[Long] = Some(5)\n\nscala\u003e StormRunner.lookup(\"i\")\nres1: Option[Long] = Some(52)\n```\n\nBoom. 
Counts for the word `\"i\"` are growing in realtime.\n\nSee the [wiki page](https://github.com/twitter/summingbird/wiki/Getting-started-with-summingbird-example) for a more detailed explanation of the configuration required to get this job up and running and some ideas for where to go next.\n\n## Documentation\n\nTo learn more and find links to tutorials and information around the web, check out the [Summingbird Wiki](https://github.com/twitter/summingbird/wiki).\n\nThe latest ScalaDocs are hosted on Summingbird's [GitHub Project Page](http://twitter.github.io/summingbird).\n\n## Contact\n\nDiscussion occurs primarily on the [Summingbird mailing list](https://groups.google.com/forum/#!forum/summingbird). Issues should be reported on the GitHub issue tracker. Simpler issues appropriate for first-time contributors looking to help out are tagged \"newbie\".\n\nIRC: freenode channel #summingbird\n\nFollow [@summingbird](https://twitter.com/summingbird) on Twitter for updates.\n\nPlease feel free to use the beautiful [Summingbird logo](https://drive.google.com/folderview?id=0B3i3pDi3yVgNMHV0TXVkTGZteWM\u0026usp=sharing) artwork anywhere.\n\n## Get Involved + Code of Conduct\nPull requests and bug reports are always welcome!\n\nWe use a lightweight form of project governance inspired by the one used by Apache projects.\nPlease see [Contributing and Committership](https://github.com/twitter/analytics-infra-governance#contributing-and-committership) for our code of conduct and our pull request review process.\nThe TL;DR: send us a pull request, iterate on the feedback + discussion, and get a +1 from a [Committer](COMMITTERS.md) in order to get your PR accepted.\n\nThe current list of active committers (who can +1 a pull request) can be found here: [Committers](COMMITTERS.md)\n\nA list of contributors to the project can be found here: [Contributors](https://github.com/twitter/summingbird/graphs/contributors)\n\n## Maven\n\nSummingbird modules are published on maven 
central. The current group ID and version for all modules are, respectively, `\"com.twitter\"` and `0.9.1`.\n\nThe currently published artifacts are:\n\n* `summingbird-core_2.11`\n* `summingbird-core_2.10`\n* `summingbird-batch_2.11`\n* `summingbird-batch_2.10`\n* `summingbird-client_2.11`\n* `summingbird-client_2.10`\n* `summingbird-storm_2.11`\n* `summingbird-storm_2.10`\n* `summingbird-scalding_2.11`\n* `summingbird-scalding_2.10`\n* `summingbird-builder_2.11`\n* `summingbird-builder_2.10`\n\nThe suffix denotes the Scala version.\n\n## Authors (alphabetically)\n\n* Oscar Boykin \u003chttps://twitter.com/posco\u003e\n* Ian O'Connell \u003chttps://twitter.com/0x138\u003e\n* Sam Ritchie \u003chttps://twitter.com/sritchie\u003e\n* Ashutosh Singhal \u003chttps://twitter.com/daashu\u003e\n\n## License\n\nCopyright 2013 Twitter, Inc.\n\nLicensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0\n","funding_links":[],"categories":["Table of Contents","Big Data","Scala","Distributed Programming","`Distributed Programming `","大数据","[](https://github.com/josephmisiti/awesome-machine-learning/blob/master/README.md#scala)Scala"],"sub_categories":["DSL","General-Purpose Machine Learning"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftwitter%2Fsummingbird","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftwitter%2Fsummingbird","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftwitter%2Fsummingbird/lists"}