{"id":13600734,"url":"https://github.com/scicloj/tablecloth","last_synced_at":"2025-04-12T19:49:20.895Z","repository":{"id":37486230,"uuid":"272415161","full_name":"scicloj/tablecloth","owner":"scicloj","description":"Dataset manipulation library built on the top of tech.ml.dataset","archived":false,"fork":false,"pushed_at":"2024-04-19T08:15:33.000Z","size":29951,"stargazers_count":266,"open_issues_count":30,"forks_count":18,"subscribers_count":12,"default_branch":"master","last_synced_at":"2024-05-01T21:35:00.162Z","etag":null,"topics":["clojure","dataframe","dataset","machinelearning"],"latest_commit_sha":null,"homepage":"https://scicloj.github.io/tablecloth","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/scicloj.png","metadata":{"files":{"readme":"README-source.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-06-15T11:00:12.000Z","updated_at":"2024-05-11T14:46:50.555Z","dependencies_parsed_at":"2023-02-18T01:00:37.637Z","dependency_job_id":"da552898-dc3b-4072-8841-a80cfb646719","html_url":"https://github.com/scicloj/tablecloth","commit_stats":{"total_commits":156,"total_committers":8,"mean_commits":19.5,"dds":"0.23076923076923073","last_synced_commit":"1e9c8d88c46cdf4492f8fd0e219b4f6af94886a9"},"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scicloj%2Ftablecloth","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scicloj%2Ftablecloth/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/
scicloj%2Ftablecloth/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scicloj%2Ftablecloth/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/scicloj","download_url":"https://codeload.github.com/scicloj/tablecloth/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248625501,"owners_count":21135513,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clojure","dataframe","dataset","machinelearning"],"created_at":"2024-08-01T18:00:47.450Z","updated_at":"2025-04-12T19:49:20.875Z","avatar_url":"https://github.com/scicloj.png","language":"HTML","funding_links":[],"categories":["Clojure","Libraries"],"sub_categories":["[Tools](#tools-1)"],"readme":"# Tablecloth\n\nDataset (data frame) manipulation API for the tech.ml.dataset library\n\n\n[![](https://img.shields.io/clojars/v/scicloj/tablecloth)](https://clojars.org/scicloj/tablecloth)\n[![](https://api.travis-ci.org/scicloj/tablecloth.svg?branch=master)](https://travis-ci.org/github/scicloj/tablecloth)\n[![](https://img.shields.io/badge/zulip-discussion-yellowgreen)](https://clojurians.zulipchat.com/#narrow/stream/236259-tech.2Eml.2Edataset.2Edev/topic/api)\n\n## Versions\n\n### tech.ml.dataset 7.x (master branch)\n\n[![](https://img.shields.io/clojars/v/scicloj/tablecloth)](https://clojars.org/scicloj/tablecloth)\n\n### tech.ml.dataset 4.x (4.0 branch)\n\n`[scicloj/tablecloth \"4.04\"]`\n\n## Introduction\n\n[tech.ml.dataset](https://github.com/techascent/tech.ml.dataset) is a great and fast library which 
brings columnar datasets to Clojure. Chris Nuernberger has been working on this library for the last year as part of the bigger `tech.ml` stack.\n\nI started testing the library and helping to fix the bugs I uncovered. My main goal was to compare its functionality with established tools on other platforms. I focused on R solutions: [dplyr](https://dplyr.tidyverse.org/), [tidyr](https://tidyr.tidyverse.org/) and [data.table](https://rdatatable.gitlab.io/data.table/).\n\nWhile converting the examples, I came up with a way to reorganize the existing `tech.ml.dataset` functions into a simple-to-use API. The main goals were:\n\n* Focus on dataset manipulation functionality, leaving aside other parts of `tech.ml` like pipelines, datatypes, readers, ML, etc.\n* A single entry point for common operations: one function dispatching on the given arguments.\n* `group-by` results in a special kind of dataset: a dataset containing the subsets created by grouping as a column.\n* Most operations recognize both regular and grouped datasets and process data accordingly.\n* A single function form (dataset as the first argument) to enable thread-first on datasets.\n\nImportant! This library is neither a replacement for `tech.ml.dataset` nor a separate library. It should be considered an addition on top of `tech.ml.dataset`.\n\nIf you want to know more about `tech.ml.dataset` and `dtype-next`, please refer to their documentation:\n\n* [tech.ml.dataset walkthrough](https://techascent.github.io/tech.ml.dataset/walkthrough.html)\n* [dtype-next overview](https://cnuernber.github.io/dtype-next/overview.html)\n* [dtype-next cheatsheet](https://cnuernber.github.io/dtype-next/cheatsheet.html)\n\nJoin the discussion on [Zulip](https://clojurians.zulipchat.com/#narrow/stream/236259-tech.2Eml.2Edataset.2Edev/topic/api)\n\n## Documentation\n\nPlease refer to the [detailed documentation with examples](https://scicloj.github.io/tablecloth).\n\nThe old documentation (up to the end of 2023) is [here](https://scicloj.github.io/tablecloth/old).\n\n## Usage example\n\n```{clojure results=\"hide\"}\n(require '[tablecloth.api :as tc]\n         '[tech.v3.datatype.datetime]\n         '[tech.v3.datatype.functional])\n```\n\n```{clojure results=\"asis\"}\n(-\u003e \"https://raw.githubusercontent.com/techascent/tech.ml.dataset/master/test/data/stocks.csv\"\n    (tc/dataset {:key-fn keyword})\n    (tc/group-by (fn [row]\n                    {:symbol (:symbol row)\n                     :year (tech.v3.datatype.datetime/long-temporal-field :years (:date row))}))\n    (tc/aggregate #(tech.v3.datatype.functional/mean (% :price)))\n    (tc/order-by [:symbol :year])\n    (tc/head 10))\n```\n\n## Contributing\n\n`Tablecloth` is open for contributions. The best way to start is a discussion on [Zulip](https://clojurians.zulipchat.com/#narrow/stream/236259-tech.2Eml.2Edataset.2Edev/topic/api).\n\n### Development tools for documentation\n\nThe documentation is written in the [Kindly](https://scicloj.github.io/kindly/) convention and is rendered using [Clay](https://scicloj.github.io/clay/) composed with [Quarto](https://quarto.org/).\n\nThe old documentation was written in RMarkdown and is kept under [docs/old/](./docs/old/).\n\nThe documentation contains around 600 code snippets, which are run during the build. There are three relevant source files:\n\n* [README-source.md](./README-source.md) for README.md\n* [notebooks/index.clj](./notebooks/index.clj) for the detailed documentation\n* [clay.edn](./clay.edn) for some styling options of the docs\n\n(`notebooks/index.clj` was generated by [dev/conversion.clj](dev/conversion.clj) from the earlier RMarkdown-based `index.Rmd`, with some additional manual editing. Starting in 2024, it will diverge from that source, which will no longer be maintained.)\n\n### README generation\n\nTo generate `README.md`, run the `generate!` function in the [dev/readme_generation.clj](./dev/readme_generation.clj) script.\n\n### Detailed documentation generation\n\nTo generate the detailed documentation, call the following. You will need the Quarto CLI [installed](https://quarto.org/docs/get-started/) on your system.\n\nCurrently (April 2024), we use Quarto's [v1.5.10 pre-release](https://github.com/quarto-dev/quarto-cli/releases/tag/v1.5.10) (specifically this version, not later ones) due to some Quarto bugs.\n\n```{clojure eval=FALSE}\n(require '[scicloj.clay.v2.api :as clay])\n(clay/make! {:format [:quarto :html]\n             :source-path \"notebooks/index.clj\"})\n```\n\n### Code Generation\n\nTo build this project fully, we need to perform some code generation steps, listed below:\n\n1. Build the `tablecloth.api.operators` namespace\n\n    The `tablecloth.api.operators` namespace is generated by\n`tablecloth.api.lift_operators`. To build that namespace, you need to\nload the `tablecloth.api.lift_operators` namespace, and then execute\nthe code inside the `comment` block at the bottom of the file.\n\n2. Build the `tablecloth.api` (aka the Dataset API)\n\n    The `tablecloth.api` namespace is generated out of `api-template`. To\nbuild that namespace, you need to load the\n`tablecloth.api.api-template` namespace, and then evaluate the code\ncontained in the `comment` block at the bottom of the file. This will\nre-generate the `tablecloth.api` namespace.\n\n3. Build the `tablecloth.column.api.operators` namespace\n\n    The `tablecloth.column.api.operators` namespace is generated by\n`tablecloth.column.api.lift_operators`. To build that namespace, you\nneed to load the `tablecloth.column.api.lift_operators` namespace, and then\nexecute the code inside the `comment` block at the bottom of the file.\n\n4. Build the `tablecloth.column.api` (aka the Column API)\n\n    The `tablecloth.column.api` namespace is generated out of\n`api-template`. To build that namespace, you need to load the\n`tablecloth.column.api.api-template` namespace, and then evaluate the\ncode contained in the `comment` block at the bottom of the file. This\nwill re-generate the `tablecloth.column.api` namespace.\n\n\n### Guidelines\n\n1. Before committing changes, please run the tests. I usually do: `lein do clean, check, test` and build the documentation as described above (which also tests the whole library).\n2. Keep the API as simple as possible:\n    - the first argument should be a dataset\n    - if parametrization is complex, the last argument should accept a map of optional function arguments\n    - avoid variadic associative destructuring for function arguments\n    - functions should usually work on grouped datasets as well and, where applicable, accept a `parallel?` argument.\n3. Follow the `potemkin` pattern and import functions into the API namespace using the `tech.v3.datatype.export-symbols/export-symbols` function.\n4. Functions composed of API functions to cover specific case(s) should go to the `tablecloth.utils` namespace.\n5. Always update `README-source.md`, `CHANGELOG.md` and `notebooks/index.clj`; tests and function docs are highly welcome.\n6. Always discuss changes and PRs first.\n\n### Tests\n\nTests are written and run using [midje](https://github.com/marick/Midje/). To run a test, evaluate a midje form. If it passes, it returns `true`; if it fails, the details are printed to the REPL.\n\n## TODO\n\n* elaborate on tests\n* tutorials\n\n## Licence\n\nCopyright (c) 2020 Scicloj\n\nThe MIT Licence\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscicloj%2Ftablecloth","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fscicloj%2Ftablecloth","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscicloj%2Ftablecloth/lists"}