{"id":15020445,"url":"https://github.com/milesgranger/lumber-jack","last_synced_at":"2025-10-26T00:31:09.612Z","repository":{"id":62577128,"uuid":"107002769","full_name":"milesgranger/lumber-jack","owner":"milesgranger","description":"High performance \u0026 light weight alternative to Pandas with ML focused tooling. (Work in progress x100!)","archived":false,"fork":false,"pushed_at":"2018-08-18T14:34:12.000Z","size":189,"stargazers_count":5,"open_issues_count":6,"forks_count":1,"subscribers_count":5,"default_branch":"master","last_synced_at":"2024-10-30T02:58:32.534Z","etag":null,"topics":["cython","data-mining","dataframe","datamanipulation","datascience","numpy","pandas","python","python3","rust","series"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/milesgranger.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-10-15T10:32:30.000Z","updated_at":"2024-06-04T13:45:15.000Z","dependencies_parsed_at":"2022-11-03T19:05:36.014Z","dependency_job_id":null,"html_url":"https://github.com/milesgranger/lumber-jack","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/milesgranger%2Flumber-jack","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/milesgranger%2Flumber-jack/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/milesgranger%2Flumber-jack/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/milesgranger%2Flumber-jack/manifests","owner_url":"https://
repos.ecosyste.ms/api/v1/hosts/GitHub/owners/milesgranger","download_url":"https://codeload.github.com/milesgranger/lumber-jack/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":238235622,"owners_count":19438725,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cython","data-mining","dataframe","datamanipulation","datascience","numpy","pandas","python","python3","rust","series"],"created_at":"2024-09-24T19:55:05.946Z","updated_at":"2025-10-26T00:31:09.278Z","avatar_url":"https://github.com/milesgranger.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"![](logo.png)\n\n---\n\n[![Build Status](https://travis-ci.org/milesgranger/lumber-jack.svg?branch=master)](https://travis-ci.org/milesgranger/lumber-jack)\n\n---\n\nFirst and foremost: This is a project I'm using to refine my craft of using Python, Cython \nand Rust together in a high-performance and efficient fashion. It is aimed at replacing the core analysis \nfunctionality found in Pandas, only _really_ fast and memory efficient. \n\n\n*This package is in Alpha; in no way should you expect it to be functional or reliable.*\n\n### Project outlook *(...the long story)*:\n\nThe _(long term)_ goal for this project is to provide a light-weight alternative to\nthe fantastic `pandas`. I love and use pandas all the time, so this is what has \ninspired me to make something similar, but excelling in a few areas such as minimizing\nmemory footprint and speeding up certain computations via Rust's speed and safety \nguarantees.  
\n\nThis project shall have the same concept of `dataframe` \u0026 `series`; these objects \nwill be stored as Rust structures. When \"displayed\" in Python\nit will merely be some meta-data description of the vector. Most computations will\ntake place in Rust; those which can't, or which are perhaps better implemented in numpy, will continue \nto be done in numpy (via pointer transfers, thus free of copies).\n\nThe point is, I'm striving to practice efficiency, parallelism, safety \u0026 speed with this \nproject while maintaining some of the most valuable functionality of pandas. \n\n---\n\n### Install checklist\n\n- [Rustup](https://rustup.rs/)\n    - `rustup install nightly`\n    - `rustup default nightly`\n- gcc \u003e= 7.x.x \n- g++ \u003e= 7.x.x\n\n\n**NOTE** Only Python 3.5 on Unix platforms is currently tested\n\n\n##### Installing from command line:\n```commandline\n# Clone repo:\ngit clone https://github.com/milesgranger/lumber-jack.git \u0026\u0026 cd lumber-jack\n\n# Run tests\nLD_LIBRARY_PATH=$(pwd)/lumberjack/rust:$LD_LIBRARY_PATH python setup.py test\n\n# Install\npython setup.py build_ext \u0026\u0026 python setup.py install\n\n# Uninstall\npip uninstall lumber-jack\n```\n---\n\n### Working Pandas \u0026 Numpy like operations:\n\n##### There isn't much, but check back soon! 
;)\n```python\nimport lumberjack as lj\n\n# lj.Series is a drop-in replacement for pandas.Series, overriding these methods:\nseries = lj.Series.arange(0, 10000)  # ~8x  faster than numpy\nseries.sum()                         # ~40x faster than pandas \u0026 ~3x  faster than numpy\nseries.cumsum()                      # ~4x  faster than pandas \u0026 ~0x  faster than numpy\nseries.mean()                        # ~98x faster than pandas \u0026 ~20x faster than numpy\nseries * 2                           # ~8x  faster than pandas \u0026 ~0x  faster than numpy\nseries *= 2                          # ~10x faster than pandas \u0026 ~0x  faster than numpy\nseries + 2                           # ....\nseries += 2                          # ....\n\n```\n\n### Working Alteration Tools:\n\nSplit an array of text strings, each separated by \u003c something \u003e, into a one-hot encoded ndarray\n\n```python\nfrom lumberjack import alterations\nraw_texts = ['hello, there', 'hi, there']\nalterations.split_n_one_hot_encode(raw_texts, sep=',', cutoff=0)\n# Returns:\n# (['hello', 'there', 'hi'], [[1, 1, 0], [0, 1, 1]])\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmilesgranger%2Flumber-jack","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmilesgranger%2Flumber-jack","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmilesgranger%2Flumber-jack/lists"}