{"id":13600775,"url":"https://github.com/holgerbrandl/krangl","last_synced_at":"2025-04-11T01:30:30.528Z","repository":{"id":8586584,"uuid":"58922059","full_name":"holgerbrandl/krangl","owner":"holgerbrandl","description":"krangl is a {K}otlin DSL for data w{rangl}ing","archived":true,"fork":false,"pushed_at":"2023-01-28T22:24:06.000Z","size":22435,"stargazers_count":560,"open_issues_count":29,"forks_count":50,"subscribers_count":11,"default_branch":"master","last_synced_at":"2024-11-07T03:42:17.727Z","etag":null,"topics":["data-mining","datascience","java","kotlin","sql"],"latest_commit_sha":null,"homepage":"","language":"Kotlin","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/holgerbrandl.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGES.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-05-16T09:55:34.000Z","updated_at":"2024-10-23T21:14:22.000Z","dependencies_parsed_at":"2023-02-15T19:16:04.000Z","dependency_job_id":null,"html_url":"https://github.com/holgerbrandl/krangl","commit_stats":null,"previous_names":[],"tags_count":32,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/holgerbrandl%2Fkrangl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/holgerbrandl%2Fkrangl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/holgerbrandl%2Fkrangl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/holgerbrandl%2Fkrangl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/holgerbrandl","download_url":"https://codeload.github.com/holgerbrandl/krangl/tar.gz/refs/heads/master","host":{"na
me":"GitHub","url":"https://github.com","kind":"github","repositories_count":248324948,"owners_count":21084837,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-mining","datascience","java","kotlin","sql"],"created_at":"2024-08-01T18:00:48.391Z","updated_at":"2025-04-11T01:30:25.518Z","avatar_url":"https://github.com/holgerbrandl.png","language":"Kotlin","funding_links":[],"categories":["Libraries","数据科学"],"sub_categories":[],"readme":"# krangl\n\n[ ![Download](https://img.shields.io/badge/Maven%20Central-0.18.4-orange) ](https://mvnrepository.com/artifact/com.github.holgerbrandl/krangl)  [![Build Status](https://github.com/holgerbrandl/krangl/workflows/build/badge.svg)](https://github.com/holgerbrandl/krangl/actions?query=workflow%3Abuild) [![Gitter](https://badges.gitter.im/holgerbrandl/krangl.svg)](https://gitter.im/holgerbrandl/krangl?utm_source=badge\u0026utm_medium=badge\u0026utm_campaign=pr-badge)\n\n\u003e ## `krangl` is no longer developed. It was a wonderful experiment, but has been superseded by the more complete, more usable and more modern https://github.com/Kotlin/dataframe.\n\n`krangl` is a {K}otlin library for data w{rangl}ing. By implementing a grammar of data manipulation using a modern functional-style API, it lets you filter, transform, aggregate and reshape tabular data.\n\n`krangl` is heavily inspired by the amazing [`dplyr`](https://github.com/hadley/dplyr) for [R](https://www.r-project.org/). `krangl` is written in [Kotlin](https://kotlinlang.org/), excels in Kotlin, but also emphasizes good Java interop. 
It mimics the API of `dplyr` while carefully adding more typed constructs where possible.\n\n\n\n[TOC levels=2,2]: # \" \"\n\n- [Installation](#installation)\n- [Features](#features)\n- [Examples](#examples)\n- [Documentation](#documentation)\n- [How to contribute?](#how-to-contribute)\n\n\nIf you're not sure how to proceed, check out the [krangl in 10 minutes](http://holgerbrandl.github.io/krangl/10_minutes/) section in the\n**[krangl user guide](http://holgerbrandl.github.io/krangl/)**.\n\n\nInstallation\n------------\n\nTo get started, simply add it as a dependency to your `build.gradle`:\n\n```groovy\nrepositories {\n    mavenCentral()\n}\n\ndependencies {\n    implementation \"com.github.holgerbrandl:krangl:0.18.4\"\n}\n```\nDeclaring the repository is purely optional as it is the default already.\n\nYou can also use [JitPack with Maven or Gradle](https://jitpack.io/#holgerbrandl/krangl) to build the latest snapshot as a dependency in your project.\n\n```groovy\nrepositories {\n    maven { url 'https://jitpack.io' }\n}\ndependencies {\n    implementation 'com.github.holgerbrandl:krangl:-SNAPSHOT'\n}\n```\n\nTo build and install it into your local Maven cache, simply clone the repo and run\n```bash\n./gradlew install\n```\n\n\nFeatures\n--------\n\n* Filter, transform, aggregate and reshape tabular data\n* Modern, user-friendly and easy-to-learn data-science API\n* Reads from plain and compressed tsv, csv, json, or any delimited format with or without header from local or remote sources\n* Supports grouped operations\n* Ships with JDBC support\n* Tables can contain atomic columns (int, double, boolean) as well as object columns\n* Reshape tables from wide to long and back\n* Table joins (left, right, semi, inner, outer)\n* Cross tabulation\n* Descriptive statistics (mean, min, max, median, ...)\n* Functional API inspired by [dplyr](http://dplyr.tidyverse.org/), [pandas](http://pandas.pydata.org/), and Kotlin 
[stdlib](https://kotlinlang.org/api/latest/jvm/stdlib/index.html)\n\n* Many more...\n\n`krangl` is _just_ about data wrangling. For data visualization we recommend [`kravis`](https://github.com/holgerbrandl/kravis), which seamlessly integrates with krangl and implements a grammar to build a wide variety of plots.\n\n\nExamples\n--------\n\n```kotlin\n// Read data-frame from disk\nval iris = DataFrame.readTSV(\"data/iris.txt\")\n\n\n// Create data-frame in memory\nval df: DataFrame = dataFrameOf(\n    \"first_name\", \"last_name\", \"age\", \"weight\")(\n    \"Max\", \"Doe\", 23, 55,\n    \"Franz\", \"Smith\", 23, 88,\n    \"Horst\", \"Keanes\", 12, 82\n)\n\n// Or from csv\n// val otherDF = DataFrame.readCSV(\"path/to/file\")\n\n// Print rows\ndf                              // with implicit string conversion using default options\ndf.print(colNames = false)      // with custom printing options\n\n// Print structure\ndf.schema()\n\n\n// Add columns with addColumn (dplyr's mutate)\n// by adding constant values as new column\ndf.addColumn(\"salary_category\") { 3 }\n\n// by doing basic column arithmetic\ndf.addColumn(\"age_3y_later\") { it[\"age\"] + 3 }\n\n// Note: krangl dataframes are immutable so we need to (re)assign results to preserve changes.\nval newDF = df.addColumn(\"full_name\") { it[\"first_name\"] + \" \" + it[\"last_name\"] }\n\n// Also feel free to mix types here since krangl overloads arithmetic operators like + for dataframe-columns\ndf.addColumn(\"user_id\") { it[\"last_name\"] + \"_id\" + rowNumber }\n\n// Create new attributes with string operations like matching, splitting or extraction.\ndf.addColumn(\"with_anz\") { it[\"first_name\"].asStrings().map { it!!.contains(\"anz\") } }\n\n// Note: krangl is using 'null' as missing value, and provides convenience methods to process non-NA bits\ndf.addColumn(\"first_name_initial\") { it[\"first_name\"].map\u003cString\u003e{ it.first() } }\n\n// or add multiple columns at once\ndf.addColumns(\n    \"age_plus3\" to { 
it[\"age\"] + 3 },\n    \"initials\" to { it[\"first_name\"].map\u003cString\u003e { it.first() } concat it[\"last_name\"].map\u003cString\u003e { it.first() } }\n)\n\n\n// Sort your data with sortedBy\ndf.sortedBy(\"age\")\n// and add secondary sorting attributes as varargs\ndf.sortedBy(\"age\", \"weight\")\ndf.sortedByDescending(\"age\")\ndf.sortedBy { it[\"weight\"].asInts() }\n\n\n// Subset columns with select\ndf.select2 { it is IntCol } // functional style column selection\ndf.select(\"last_name\", \"weight\")    // positive selection\ndf.remove(\"weight\", \"age\")  // negative selection\ndf.select({ endsWith(\"name\") })    // selector mini-language\n\n\n// Subset rows with vectorized filter\ndf.filter { it[\"age\"] eq 23 }\ndf.filter { it[\"weight\"] gt 50 }\ndf.filter({ it[\"last_name\"].isMatching { startsWith(\"Do\")  }})\n\n// In case vectorized operations are not possible or available we can also filter tables by row\n// which allows for scalar operators\ndf.filterByRow { it[\"age\"] as Int \u003e 5 }\ndf.filterByRow { (it[\"age\"] as Int).rem(10) == 0 } // round birthdays :-)\n\n\n// Summarize\n\n// do simple cross tabulations\ndf.count(\"age\", \"last_name\")\n\n// ... or calculate single summary statistic\ndf.summarize(\"mean_age\" to { it[\"age\"].mean(true) })\n\n// ... 
or multiple summary statistics\ndf.summarize(\n    \"min_age\" to { it[\"age\"].min() },\n    \"max_age\" to { it[\"age\"].max() }\n)\n\n// for the sake of R and Python adoptability you can also use `=` here\ndf.summarize(\n    \"min_age\" `=` { it[\"age\"].min() },\n    \"max_age\" `=` { it[\"age\"].max() }\n)\n\n// Grouped operations\nval groupedDf: DataFrame = df.groupBy(\"age\") // or provide multiple grouping attributes with varargs\nval sumDF = groupedDf.summarize(\n    \"mean_weight\" to { it[\"weight\"].mean(removeNA = true) },\n    \"num_persons\" to { nrow }\n)\n\n// Optionally ungroup the data\nsumDF.ungroup().print()\n\n// Generate object bindings for Kotlin.\n// Unfortunately the syntax is a bit odd since we cannot access the variable name by reflection\nsumDF.printDataClassSchema(\"Person\")\n\n// This will generate and print the following conversion code:\ndata class Person(val age: Int, val mean_weight: Double, val num_persons: Int)\n\nval records = sumDF.rows.map { row -\u003e Person(row[\"age\"] as Int, row[\"mean_weight\"] as Double, row[\"num_persons\"] as Int) }\n\n// Now we can use the krangl result table in a strongly typed way\nrecords.first().mean_weight\n\n// Vice versa, we can also convert an existing set of objects into a data-frame\nval recordsDF = records.asDataFrame()\nrecordsDF.print()\n\n// to populate a data-frame with selected properties only, we can do\nval deparsedDF = records.deparseRecords { mapOf(\"age\" to it.age, \"weight\" to it.mean_weight) }\n\n```\n\nDocumentation\n-------------\n\n`krangl` is not yet mature: it is full of bugs and its API is in constant flux. 
Nevertheless, feel welcome to submit pull-requests or tickets, or simply get in touch via gitter (see button on top).\n\n* [Krangl User Guide](http://holgerbrandl.github.io/krangl) for detailed information about the API and usage examples.\n* [API Docs](http://holgerbrandl.github.io/krangl/javadoc/krangl/) for detailed information about the API including many usage examples\n* TBD `krangl` Cheat Sheet\n\n\nAnother great [introduction to data science with Kotlin](https://blog.jetbrains.com/kotlin/2019/12/making-kotlin-ready-for-data-science/)\n was presented at 2019's [KotlinConf](https://kotlinconf.com/)\n  by Roman Belov from [JetBrains](https://www.jetbrains.com/).\n\n\n\nHow to contribute?\n------------------\n\nFeel welcome to post ideas, suggestions and criticism to our [tracker](https://github.com/holgerbrandl/krangl/issues).\n\nWe always welcome pull requests. :-)\n\nYou could also show your spiritual support by upvoting `krangl` here on GitHub.\n\nAlso see\n\n* [Developer Information](./docs/devel.md) with technical notes \u0026 details about how to build, test, release and improve `krangl`\n* [Roadmap](./docs/roadmap.md) complementing the tracker with a backlog\n\nAlso, there are a few issues in the IDE itself which limit the applicability/usability of `krangl`. So you may want to vote for\n\n* [KT-24789](https://youtrack.jetbrains.net/issue/KT-24789) \"Unresolved reference\" when running a script which is a symlink to a script outside of source roots\n* [KT-12583](https://youtrack.jetbrains.com/issue/KT-12583) IDE REPL should run in project root directory\n* [KT-11409](https://youtrack.jetbrains.com/issue/KT-11409) Allow to \"Send Selection To Kotlin Console\"\n* [KT-13319](https://youtrack.jetbrains.net/issue/KT-13319) Support \":paste\" for pasting multi-line expressions in REPL\n* [KT-21224](https://youtrack.jetbrains.net/issue/KT-21224) REPL output is not aligned with 
input","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fholgerbrandl%2Fkrangl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fholgerbrandl%2Fkrangl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fholgerbrandl%2Fkrangl/lists"}