{"id":26288918,"url":"https://github.com/clojure-finance/clojask","last_synced_at":"2025-05-07T20:37:13.552Z","repository":{"id":37004873,"uuid":"357756730","full_name":"clojure-finance/clojask","owner":"clojure-finance","description":"Clojask is a Clojure data processing framework with parallel computing on larger-than-memory datasets","archived":false,"fork":false,"pushed_at":"2023-09-04T13:57:49.000Z","size":10550,"stargazers_count":121,"open_issues_count":0,"forks_count":5,"subscribers_count":2,"default_branch":"2.x.x","last_synced_at":"2025-05-07T20:37:06.604Z","etag":null,"topics":["big-data","clojure","dataframe","parallel-computing"],"latest_commit_sha":null,"homepage":"https://clojure-finance.github.io/clojask-website","language":"Clojure","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/clojure-finance.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2021-04-14T03:06:56.000Z","updated_at":"2025-02-01T20:03:40.000Z","dependencies_parsed_at":"2023-12-19T04:22:37.008Z","dependency_job_id":"6b769a5b-8d9d-4a19-8971-78f075e44d9e","html_url":"https://github.com/clojure-finance/clojask","commit_stats":{"total_commits":375,"total_committers":8,"mean_commits":46.875,"dds":0.5226666666666666,"last_synced_commit":"a3723839663901dd142d7b4519b62decb740b0e7"},"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clojure-finance%2Fclojask","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clojure-finance%2Fclojask/tags","releases_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/repositories/clojure-finance%2Fclojask/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clojure-finance%2Fclojask/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/clojure-finance","download_url":"https://codeload.github.com/clojure-finance/clojask/tar.gz/refs/heads/2.x.x","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252954125,"owners_count":21830892,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["big-data","clojure","dataframe","parallel-computing"],"created_at":"2025-03-14T22:15:33.913Z","updated_at":"2025-05-07T20:37:13.529Z","avatar_url":"https://github.com/clojure-finance.png","language":"Clojure","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Clojask\n\u003e Clojure data processing framework with parallel computing on larger-than-memory datasets\n\n### Features\n\n- **Unlimited Size**\n\n  It supports datasets larger than memory.\n\n- **Various Operations**\n\n  Although Clojask is designed for larger-than-memory datasets, like NoSQLs, it does not sacrifice common operations on relational dataframes, such as [group by](https://clojure-finance.github.io/clojask-website/posts-output/API/#group-by), [aggregate](https://clojure-finance.github.io/clojask-website/posts-output/API/#aggregate), [join](https://clojure-finance.github.io/clojask-website/posts-output/API/#inner-join--left-join--right-join).\n\n- **Fast**\n\n  Faster than Dask in most operations, and the larger the dataframe is, the bigger the advantage. 
Please find the benchmarks [here](https://clojure-finance.github.io/clojask-website/pages-output/about/#benchmarks).\n\n- **All Native Types**\n\n  All the datatypes used to store data are native Clojure (or Java) types.\n\n- **From File to File**\n\n  IO is integrated into the dataframe, so there is no need to write your own input and output functions.\n\n- **Parallel**\n\n  Most operations can be executed across multiple threads or even machines. See [Onyx](http://www.onyxplatform.org/) for the underlying principle.\n\n- **Lazy Operations**\n\n  Most operations are not executed immediately. The dataframe pipelines them together when the computation runs.\n\n- **Few Constraints on Programming**\n\n  Except for some aggregations, where you need to write custom functions that follow simple templates, operations in Clojask accept arbitrary Clojure functions as input.\n\n### Installation\n\nAvailable on [Clojars](https://clojars.org/com.github.clojure-finance/clojask) ![Clojars Project](https://img.shields.io/clojars/v/com.github.clojure-finance/clojask.svg).\n\nAdd this line to your `project.clj` if using Leiningen.\n\n```\n[com.github.clojure-finance/clojask \"2.0.0\"]\n```\n\nAdd this line to your `deps.edn` if using the Clojure CLI.\n\n```clojure\ncom.github.clojure-finance/clojask {:mvn/version \"2.0.0\"}\n```\n\n**Requirements:**\n\n- macOS or Linux\n- Java 8 - 11\n\n### Example Usage\n\n1. Import `Clojask`\n\n   ```clojure\n   (require '[clojask.dataframe :as ck])\n   ```\n\n2. Initialize a dataframe\n\n   ```clojure\n   (def df (ck/dataframe \"Employees-example.csv\"))\n   ```\n\n   The source file can be found [here](https://github.com/clojure-finance/clojask/blob/1.x.x/test/clojask/Employees-example.csv).\n\n   See [`dataframe`](https://clojure-finance.github.io/clojask-website/posts-output/API/#dataframe)\n\n3. 
Preview the first few lines of the dataframe\n\n   ```clojure\n   (ck/print-df df)\n   ```\n\n   ![image-20220405210757274](docs/img/image-20220405210757274.png)\n\n   See [`print-df`](https://clojure-finance.github.io/clojask-website/posts-output/API/#print-df)\n\n4. Change the data type of some columns\n\n   ```clojure\n   (ck/set-type df \"Salary\" \"double\")\n   (ck/set-type df \"UpdateDate\" \"date:yyyy/MM/dd\")\n   (ck/print-df df)\n   ```\n\n   ![image-20220405210826777](docs/img/image-20220405210826777.png)\n\n   See [`set-type`](https://clojure-finance.github.io/clojask-website/posts-output/API/#set-type)\n\n5. Add 100 to Bob's salary and store the result as `NewSalary`\n\n   ```clojure\n   (ck/operate df (fn [EmployeeName Salary] (if (= EmployeeName \"Bob\") (+ Salary 100) Salary)) [\"EmployeeName\" \"Salary\"] \"NewSalary\")\n   (ck/print-df df)\n   ```\n\n   ![image-20220405211348723](docs/img/image-20220405211348723.png)\n\n   See [`operate`](https://clojure-finance.github.io/clojask-website/posts-output/API/#operate-in-place-modification)\n\n6. 
Output the resulting dataset to \"result.csv\" (using 8 threads)\n\n   ```clojure\n   (ck/compute df 8 \"result.csv\" :select [\"Employee\" \"EmployeeName\" \"Department\" \"NewSalary\" \"UpdateDate\"])\n   ```\n\n   See [`compute`](https://clojure-finance.github.io/clojask-website/posts-output/API/#compute)\n\n### Supported Functions and Procedures\n\n![clojask functions](docs/clojask_functions.png)\n\n- *The solid arrows point to the fixed next step; dotted arrows point to all possible next steps.*\n- *Any step except for Initialization is optional.*\n\n### Documentation\n\nThe detailed documentation for every API can be found [here](https://clojure-finance.github.io/clojask-website/posts-output/API/).\n\n### Examples\n\nA separate repository with typical usage examples of Clojask can be found [here](https://github.com/clojure-finance/clojask-examples).\n\n### Problem Feedback\n\nIf your question is not answered in existing [issues](https://github.com/clojure-finance/clojask/issues), feel free to create a new one.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fclojure-finance%2Fclojask","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fclojure-finance%2Fclojask","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fclojure-finance%2Fclojask/lists"}