{"id":14982393,"url":"https://github.com/seznam/euphoria","last_synced_at":"2025-08-21T10:31:59.329Z","repository":{"id":41432456,"uuid":"79946335","full_name":"seznam/euphoria","owner":"seznam","description":"Euphoria is an open source Java API for creating unified big-data processing flows. It provides an engine independent programming model which can express both batch and stream transformations.","archived":false,"fork":false,"pushed_at":"2022-11-15T23:47:57.000Z","size":4087,"stargazers_count":82,"open_issues_count":35,"forks_count":11,"subscribers_count":13,"default_branch":"master","last_synced_at":"2024-12-07T00:04:44.264Z","etag":null,"topics":["apache-flink","apache-spark","batch-processing","big-data","hadoop","hdfs","java-api","kafka","streaming-data","unified-bigdata-processing"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/seznam.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-01-24T19:32:26.000Z","updated_at":"2024-03-31T14:18:56.000Z","dependencies_parsed_at":"2022-09-07T20:23:20.519Z","dependency_job_id":null,"html_url":"https://github.com/seznam/euphoria","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/seznam%2Feuphoria","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/seznam%2Feuphoria/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/seznam%2Feuphoria/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/seznam%2Feuphoria/ma
nifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/seznam","download_url":"https://codeload.github.com/seznam/euphoria/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":230507051,"owners_count":18236944,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-flink","apache-spark","batch-processing","big-data","hadoop","hdfs","java-api","kafka","streaming-data","unified-bigdata-processing"],"created_at":"2024-09-24T14:05:20.139Z","updated_at":"2024-12-19T22:08:52.504Z","avatar_url":"https://github.com/seznam.png","language":"Java","readme":"# Euphoria\n\n[![Build Status](https://travis-ci.org/seznam/euphoria.svg?branch=master)](https://travis-ci.org/seznam/euphoria)\n\nEuphoria is an open source Java API for creating unified big-data\nprocessing flows.  It provides an engine independent programming model\nthat can express both batch and stream transformations.\n\nThe main goal of the API is to ease the creation of programs with\nbusiness logic independent of a specific runtime framework/engine and\nindependent of the source or destination of the processed data.  
Such\nprograms are then transferable with little effort to new environments\nand new data sources or destinations - ideally just by configuration.\n\n\n## Key features\n\n * Unified API that supports both batch and stream processing using\n   the same code\n * Avoids vendor lock-in - migrating between different engines is\n   a matter of configuration\n * Declarative Java API using Java 8 Lambda expressions\n * Support for different notions of time (_event time, ingestion\n   time_)\n * Flexible windowing (_Time, TimeSliding, Session, Count_)\n\n## Download\n\nThe best way to use Euphoria is to add the following Maven dependency to your _pom.xml_:\n\n```xml\n\u003cdependency\u003e\n  \u003cgroupId\u003ecz.seznam.euphoria\u003c/groupId\u003e\n  \u003cartifactId\u003eeuphoria-core\u003c/artifactId\u003e\n  \u003cversion\u003e0.7.0\u003c/version\u003e\n\u003c/dependency\u003e\n```\n\nYou may want to add further modules, such as support for various engines or I/O data sources/sinks. For more details, read the [Maven Dependencies](https://github.com/seznam/euphoria/wiki/Maven-dependencies) wiki page.\n\n\n## WordCount example\n\n```java\n// Define a data source and a data sink\nDataSource\u003cString\u003e dataSource = new SimpleHadoopTextFileSource(inputPath);\nDataSink\u003cString\u003e dataSink = new SimpleHadoopTextFileSink\u003c\u003e(outputPath);\n\n// Define a flow, i.e. 
a chain of transformations\nFlow flow = Flow.create(\"WordCount\");\n\nDataset\u003cString\u003e lines = flow.createInput(dataSource);\n\nDataset\u003cString\u003e words = FlatMap.named(\"TOKENIZER\")\n    .of(lines)\n    .using((String line, Collector\u003cString\u003e context) -\u003e {\n      for (String word : line.split(\"\\\\s+\")) {\n        context.collect(word);\n      }\n    })\n    .output();\n\nDataset\u003cPair\u003cString, Long\u003e\u003e counted = ReduceByKey.named(\"COUNT\")\n    .of(words)\n    .keyBy(w -\u003e w)\n    .valueBy(w -\u003e 1L)\n    .combineBy(Sums.ofLongs())\n    .output();\n\nMapElements.named(\"FORMAT\")\n    .of(counted)\n    .using(p -\u003e p.getFirst() + \"\\n\" + p.getSecond())\n    .output()\n    .persist(dataSink);\n\n// Initialize an executor and run the flow (using Apache Flink)\ntry {\n  Executor executor = new FlinkExecutor();\n  executor.submit(flow).get();\n} catch (InterruptedException ex) {\n  LOG.warn(\"Interrupted while waiting for the flow to finish.\", ex);\n} catch (IOException | ExecutionException ex) {\n  throw new RuntimeException(ex);\n}\n```\n\n## Supported Engines\n\nEuphoria translates flows, also known as data transformation\npipelines, into the specific API of a chosen, supported big-data\nprocessing engine.  Currently, the following are supported:\n\n * [Apache Flink](https://flink.apache.org/)\n * [Apache Spark](http://spark.apache.org/)\n * An independent, standalone, in-memory engine that is part of the\n   Euphoria project and is suitable for running flows in unit tests.\n\nIn the WordCount example above, switching the execution engine\nfrom Apache Flink to Apache Spark merely requires replacing\n`FlinkExecutor` with `SparkExecutor`.\n\n## Bugs / Features / Contributing\n\nThere's still a lot of room for improvements and extensions.  
Have a\nlook at the [issue tracker](https://github.com/seznam/euphoria/issues)\nand feel free to contribute by reporting new problems, contributing to\nexisting ones, or even opening issues if you have questions.  Any constructive\nfeedback is warmly welcome!\n\nAs usual with open source, don't hesitate to fork the repo and\nsubmit a pull request if you see something that should be changed.  We'll be\nhappy to see Euphoria improve over time.\n\n## Building\n\nTo build the Euphoria artifacts, the following are required:\n\n* Git\n* Java 8\n\nBuilding the project itself is a matter of:\n\n```sh\ngit clone https://github.com/seznam/euphoria\ncd euphoria\n./gradlew publishToMavenLocal -xtest\n```\n\n## Documentation\n\n* Preliminary documentation is currently maintained in the form of a\n  [Wiki on GitHub](https://github.com/seznam/euphoria/wiki), including a brief [FAQ page](https://github.com/seznam/euphoria/wiki/FAQ).\n\n* Another source of documentation is a set of deliberately simple examples\n  maintained in the [euphoria-examples module](https://github.com/seznam/euphoria/tree/master/euphoria-examples).\n\n## Contact us\n\n* Feel free to open an issue in the [issue tracker](https://github.com/seznam/euphoria/issues)\n\n## License\n\nEuphoria is licensed under the terms of the Apache License 2.0.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fseznam%2Feuphoria","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fseznam%2Feuphoria","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fseznam%2Feuphoria/lists"}