{"id":47829318,"url":"https://github.com/basf/rormula","last_synced_at":"2026-04-03T20:07:16.206Z","repository":{"id":156537030,"uuid":"630948399","full_name":"basf/rormula","owner":"basf","description":"Formula parser and evaluator for Wilkinson Notation and dataframes arithmetics","archived":false,"fork":false,"pushed_at":"2025-12-24T12:34:20.000Z","size":140,"stargazers_count":5,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-12-26T02:02:37.191Z","etag":null,"topics":["doe","experimental-design","parser","wilkinson"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/basf.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2023-04-21T14:31:58.000Z","updated_at":"2025-12-24T12:33:55.000Z","dependencies_parsed_at":null,"dependency_job_id":"aa78a4ef-eaf7-4149-8e9c-5b1816a39dcb","html_url":"https://github.com/basf/rormula","commit_stats":null,"previous_names":[],"tags_count":12,"template":false,"template_full_name":null,"purl":"pkg:github/basf/rormula","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/basf%2Frormula","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/basf%2Frormula/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/basf%2Frormula/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/basf%2Frormu
la/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/basf","download_url":"https://codeload.github.com/basf/rormula/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/basf%2Frormula/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31374101,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-03T17:53:18.093Z","status":"ssl_error","status_checked_at":"2026-04-03T17:53:17.617Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["doe","experimental-design","parser","wilkinson"],"created_at":"2026-04-03T20:07:15.787Z","updated_at":"2026-04-03T20:07:16.197Z","avatar_url":"https://github.com/basf.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Rormula\n\n[![Test](https://github.com/basf/rormula/actions/workflows/test.yml/badge.svg)](https://github.com/basf/rormula/actions)\n[![PyPI](https://img.shields.io/pypi/v/rormula.svg?color=%2334D058)](https://pypi.org/project/rormula)\n\nRormula is a Python package that parses the Wilkinson notation to create model matrices useful in design of experiments. \nAdditionally, it can be used for column arithmetics similar to\n`df.eval` where `df` is a Pandas dataframe. 
Rormula is significantly faster for small matrices than `df.eval` or [Formulaic](https://github.com/matthewwardrop/formulaic),\nbut it is still a prototype that has not been well tested.\n\n\n\n## Getting Started with Wilkinson Notation\n\n```\npip install rormula\n```\nCurrently, the supported operations are `+`, `:`, and `^`. Adding new operators is easy, but each one has to be\nadded explicitly. There\nare several ways to provide inputs and receive results.\nThe result can be either a Pandas dataframe or a list of names together with a Numpy array.\n\n```python\nimport numpy as np\nimport pandas as pd\nfrom rormula import Wilkinson, SeparatedData\ndata_np = np.random.random((10, 2))\ndata = pd.DataFrame(data=data_np, columns=[\"a\", \"b\"])\nror = Wilkinson(\"a+b+a:b\")\n\n# option 1 returns the model matrix as a pandas dataframe\nmm_df = ror.eval_asdf(data)\nassert isinstance(mm_df, pd.DataFrame)\nprint(mm_df)\n\n# option 2 is faster and returns the column names and a Numpy array\nmm_names, mm = ror.eval(data)\nassert isinstance(mm, np.ndarray)\nassert isinstance(mm_names, list)\n```\n\nRegarding inputs, the fastest option is to use the interface with separated categorical and numerical data, even if there is no categorical data. \nThe categorical data is expected to have the object-`dtype` `O`. 
\nAdmittedly, the current interface is rather tedious.\n\n```python\ndata = pd.DataFrame(\n   data=np.random.random((100, 3)),\n   columns=[\"alpha\", \"beta\", \"gamma\"],\n)\nseparated_data = SeparatedData(\n   numerical_cols=data.columns.to_list(),\n   numerical_data=data.to_numpy(),\n   categorical_cols=[],\n   categorical_data=np.zeros((100, 0), dtype=\"O\"),\n)\nror = Wilkinson(\"alpha + beta + alpha:gamma\")\nnames, mm = ror.eval(separated_data)\nassert names == [\"Intercept\", \"alpha\", \"beta\", \"alpha:gamma\"]\nassert mm.shape == (100, 4)\n```\n\n## Getting Started with Column Arithmetic\n\nYou can do arithmetic on the columns of a Pandas dataframe.\n```python\nimport numpy as np\nimport pandas as pd\nfrom rormula import Arithmetic\n\ndf = pd.DataFrame(\n   data=np.random.random((100, 3)), columns=[\"alpha\", \"beta\", \"gamma\"]\n)\ns = \"beta*alpha - 1 + 2^beta + alpha / gamma\"\nrormula = Arithmetic(s, \"s\")\ndf_ror = rormula.eval_asdf(df.copy())\npd_s = f's={s.replace(\"^\", \"**\")}'\nassert df_ror.shape == (100, 4)\nassert np.allclose(df_ror, df.eval(pd_s))\n```\nTo evaluate a string and get a dataframe back, use\n`Arithmetic.eval_asdf`, which puts the result into your input dataframe.\n`Arithmetic.eval` returns the result as a 2d Numpy array with one column. In contrast to\n`pd.DataFrame.eval`, the method `Arithmetic.eval` does not execute any Python code but only understands\na list of predefined operators. Besides the usual suspects such as `+`, `-`, and `^`, the operators include\na conditional restriction. You can use a comparison operator like `==`, which compares float values with\na tolerance. Internally, the result of `==` is a list of indices that can be used to reduce the columns with `|`, as in\nthe following example. 
\n```python\ndata = np.ones((100, 3))\ndata[5, :] = 2.5\ndata[7, :] = 2.5\ndf = pd.DataFrame(data=data, columns=[\"alpha\", \"beta\", \"gamma\"])\ns = \"beta|alpha==2.5\"\nrormula = Arithmetic(s, \"reduced\")\nres = rormula.eval_asdf(df)\nassert res.shape == (2, 1)\nassert np.allclose(res, 2.5)\nprint(res)\n```\nThe output is\n```\n   reduced\n0      2.5\n1      2.5\n```\nSince the resulting dataframe has fewer rows than the input dataframe, the result is a new dataframe with a single column.\n\n## Contribute\n\nTo run the tests, you need to have [Rust](https://www.rust-lang.org/tools/install) installed. \n\n### Python Tests\n\n1. Go to the directory of the Python package\n   ```\n   cd rormula\n   ```\n2. Install dev dependencies via\n   ```\n   pip install -r requirements-dev.txt\n   ```\n3. Create a development build of Rormula\n   ```\n   maturin develop --release\n   ```\n4. Run \n   ```\n   python test/test.py\n   ```\n\n### Rust Tests\nRun\n```\ncargo test\n```\nfrom the project's root.\n\n## Rough Time Measurements\nWe compare Rormula to the well-established and far more mature package [Formulaic](https://github.com/matthewwardrop/formulaic).\nThe [tests](rormula/test/test_wilkinson.py) create a formula in Wilkinson notation and sample 100 random data points. The output on my machine is \n```\n- test just numerical 100 rows\nRormula took 0.0009s\nRormula asdf took 0.0213s\nFormulaic took 0.1193s\n- test numerical and categorical 100 rows\nRormula took 0.0032s\nRormula asdf took 0.0149s\nFormulaic took 0.1705s\n- test just numerical 100000 rows\nRormula took 0.2240s\nRormula asdf took 0.2895s\nFormulaic took 0.2300s\n```\nFor the lines that start with `Rormula took`, we separated categorical and numerical data beforehand. \nFor the lines that start with `Rormula asdf took`, we pass and receive pandas dataframes.\nThe time is measured for 100 applications of the formula. We used a small data set with 100 rows. 
For more rows, e.g., 10k+, Formulaic becomes competitive or better.\n\n## Profiling\nWe use [Counts](https://github.com/nnethercote/counts/) for profiling Rust code.\n\nTo run the profiler, use\n```\nmaturin develop --release --features print_timings\npython test/test_wilkinson.py 2\u003e counts.txt\ncounts -i -e counts.txt\n```\nSee also [`rormula/profile.sh`](rormula/profile.sh).\nTo profile other parts of the Rust code, use the `timing!` macro.\n```rust\nlet res = timing!(some_calculation(), \"name of some calculation\");\n```\nNote that running in profiling mode makes the whole program slower, so the time measurements in the section above no longer hold.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbasf%2Frormula","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbasf%2Frormula","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbasf%2Frormula/lists"}