{"id":18687914,"url":"https://github.com/tenmax/poppy","last_synced_at":"2025-08-22T11:39:45.862Z","repository":{"id":134627511,"uuid":"57257816","full_name":"tenmax/poppy","owner":"tenmax","description":"A dataframe library for java","archived":false,"fork":false,"pushed_at":"2017-06-18T05:51:56.000Z","size":1184,"stargazers_count":78,"open_issues_count":1,"forks_count":10,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-04-12T05:35:38.511Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tenmax.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-04-28T00:28:37.000Z","updated_at":"2024-11-11T07:55:45.000Z","dependencies_parsed_at":null,"dependency_job_id":"9bd850f3-ab1f-4acb-a32e-c0ca9bba817e","html_url":"https://github.com/tenmax/poppy","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/tenmax/poppy","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tenmax%2Fpoppy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tenmax%2Fpoppy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tenmax%2Fpoppy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tenmax%2Fpoppy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tenmax","download_url":"https://codeload.github.com/tenmax/poppy/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyst
e.ms/api/v1/hosts/GitHub/repositories/tenmax%2Fpoppy/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271631139,"owners_count":24793433,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-22T02:00:08.480Z","response_time":65,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-07T10:34:54.045Z","updated_at":"2025-08-22T11:39:45.839Z","avatar_url":"https://github.com/tenmax.png","language":"Java","funding_links":[],"categories":["数据科学"],"sub_categories":[],"readme":"# Poppy\n*poppy* is a dataframe library for Java that provides common SQL operations (e.g. select, from, where, group by, order by, distinct) for processing data in Java.\n\nUnlike other dataframe libraries, which keep all the data in memory, *poppy* processes data in a streaming manner. That is, it is closer to the [Java 8 Stream library](https://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html), but in a relational form.\n\nHere is a simple example. 
We have a `Student` class:\n\n```java\npublic class Student {\n    private int studentId;\n    private String name;\n    private int grade;\n    private int room;\n    private int height;\n    private int weight;\n    ...\n}\n```\n\nIn SQL, the query looks like this:\n\n```sql\nselect \n    grade, \n    room, \n    avg(weight) as weight, \n    avg(height) as height\nfrom Student\ngroup by grade, room\norder by grade, room\n```\n\nHere is the *Poppy* version:\n\n```java\nList\u003cStudent\u003e students = ...;\n\nDataFrame\n.from(students, Student.class)\n.groupby(\"grade\", \"room\")\n.aggregate(\n    avgLong(\"weight\").as(\"weight\"),\n    avgLong(\"height\").as(\"height\"))\n.sort(\"grade\", \"room\")\n.print();\n```\n\n\n\n# Getting Started\n\n## Requirement\nJava 8 or higher\n\n## Dependency\n\nPoppy's package is hosted in the [JCenter](https://bintray.com/bintray/jcenter) repository.\n\nMaven\n\n```\n\u003cdependency\u003e\n  \u003cgroupId\u003eio.tenmax\u003c/groupId\u003e\n  \u003cartifactId\u003epoppy\u003c/artifactId\u003e\n  \u003cversion\u003e0.1.8\u003c/version\u003e\n  \u003ctype\u003epom\u003c/type\u003e\n\u003c/dependency\u003e\n```\n\nGradle\n\n```\ncompile 'io.tenmax:poppy:0.1.8'\n```\n## Features\n\n1. Supports the most common SQL operations, e.g. select, from, where, group by, order by, distinct.\n2. Supports the most common SQL aggregation functions, e.g. *avg()*, *sum()*, *count()*, *min()*, *max()*.\n3. **Custom aggregation functions** via [java.util.stream.Collector](https://docs.oracle.com/javase/8/docs/api/java/util/stream/Collector.html).\n4. **Partition support.** A partition is the unit of parallelism. Multiple partitions allow you to process data concurrently.\n5. **Multi-threaded support.** For CPU-bound jobs, it leverages all your CPU resources for better performance; for IO-bound jobs, it reduces waiting time and takes advantage of better concurrency.\n6. Suitable for both **batch** and **streaming** scenarios.\n7. **Lightweight**. 
Compared to the [Spark DataFrame API](https://spark.apache.org/docs/latest/sql-programming-guide.html), it is much more lightweight to embed in your application.\n8. **Stream-based design**. Unlike [joinery](https://github.com/cardillo/joinery), which keeps the whole dataset in memory, *Poppy*'s streaming behaviour lets it process huge volumes of data with limited memory.\n\n## Documentation\n\n- [JavaDoc](http://tenmax.github.io/poppy/docs/javadoc/index.html)\n- [User Manual](http://tenmax.github.io/poppy/)\n\n# Contribution\n\nPlease fork this project and send a pull request. Any comments would be appreciated!\n\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftenmax%2Fpoppy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftenmax%2Fpoppy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftenmax%2Fpoppy/lists"}