{"id":13600698,"url":"https://github.com/maxhumber/redframes","last_synced_at":"2025-04-05T16:03:44.534Z","repository":{"id":59430090,"uuid":"527230323","full_name":"maxhumber/redframes","owner":"maxhumber","description":"General Purpose Data Manipulation Library","archived":false,"fork":false,"pushed_at":"2023-03-17T22:29:02.000Z","size":1932,"stargazers_count":321,"open_issues_count":3,"forks_count":5,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-03-29T15:09:51.838Z","etag":null,"topics":["data-science","pandas","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/maxhumber.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG","contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null},"funding":{"github":["maxhumber"],"patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"lfx_crowdfunding":null,"custom":null}},"created_at":"2022-08-21T14:28:24.000Z","updated_at":"2025-03-24T17:45:03.000Z","dependencies_parsed_at":"2024-01-16T23:26:43.390Z","dependency_job_id":"fa52cbd1-93c0-45d3-aa15-efc353a3a579","html_url":"https://github.com/maxhumber/redframes","commit_stats":{"total_commits":143,"total_committers":2,"mean_commits":71.5,"dds":0.034965034965035,"last_synced_commit":"6e3f1226358ad4e67f4343cbc4b1ee4b63475034"},"previous_names":[],"tags_count":12,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxhumber%2Fredframes","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxhumber%2Fredframes/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxhumber%2Fredframes/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxhumber%2Fredframes/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/maxhumber","download_url":"https://codeload.github.com/maxhumber/redframes/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247361614,"owners_count":20926642,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","pandas","python"],"created_at":"2024-08-01T18:00:46.848Z","updated_at":"2025-04-05T16:03:44.491Z","avatar_url":"https://github.com/maxhumber.png","language":"Python","funding_links":["https://github.com/sponsors/maxhumber"],"categories":["Libraries"],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n  \u003cimg alt=\"redframes\" src=\"images/redframes.png\" height=\"200px\"\u003e\n  \u003cbr/\u003e\n  \u003cdiv align=\"center\"\u003e\n     \u003ca href=\"https://pandas.pydata.org/\"\u003e\u003cimg alt=\"Pandas Version\" src=\"https://img.shields.io/badge/pandas-≥1.5,\u003c3.0-blue\"\u003e\u003c/a\u003e  \n    \u003ca href=\"https://pypi.python.org/pypi/redframes\"\u003e\u003cimg alt=\"PyPI\" src=\"https://img.shields.io/pypi/v/redframes.svg\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://pepy.tech/project/redframes\"\u003e\u003cimg alt=\"Downloads\" src=\"https://pepy.tech/badge/redframes\"\u003e\u003c/a\u003e\n  \u003c/div\u003e\n  \u003cbr/\u003e\n\u003c/div\u003e\n\n\n\n### About\n\n**redframes** (**re**ctangular **d**ata **frames**) is a general purpose data manipulation library that prioritizes syntax,  simplicity, and speed (to a solution). Importantly, the library is fully interoperable with [pandas](https://github.com/pandas-dev/pandas), compatible with [scikit-learn](https://github.com/scikit-learn/scikit-learn), and works great with [matplotlib](https://github.com/matplotlib/matplotlib). \n\n\n\n### Install \u0026 Import\n\n```sh\npip install redframes\n```\n\n```python\nimport redframes as rf\n```\n\n\n\n### Quickstart\n\nCopy-and-paste this to get started:\n\n```python\nimport redframes as rf\n\ndf = rf.DataFrame({\n    'bear': ['Brown bear', 'Polar bear', 'Asian black bear', 'American black bear', 'Sun bear', 'Sloth bear', 'Spectacled bear', 'Giant panda'],\n    'genus': ['Ursus', 'Ursus', 'Ursus', 'Ursus', 'Helarctos', 'Melursus', 'Tremarctos', 'Ailuropoda'],\n    'weight (male, lbs)': ['300-860', '880-1320', '220-440', '125-500', '60-150', '175-310', '220-340', '190-275'],\n    'weight (female, lbs)': ['205-455', '330-550', '110-275', '90-300', '45-90', '120-210', '140-180', '155-220']\n})\n\n# | bear                | genus      | weight (male, lbs)   | weight (female, lbs)   |\n# |:--------------------|:-----------|:---------------------|:-----------------------|\n# | Brown bear          | Ursus      | 300-860              | 205-455                |\n# | Polar bear          | Ursus      | 880-1320             | 330-550                |\n# | Asian black bear    | Ursus      | 220-440              | 110-275                |\n# | American black bear | Ursus      | 125-500              | 90-300                 |\n# | Sun bear            | Helarctos  | 60-150               | 45-90                  |\n# | Sloth bear          | Melursus   | 175-310              | 120-210                |\n# | Spectacled bear     | Tremarctos | 220-340              | 140-180                |\n# | Giant panda         | Ailuropoda | 190-275              | 155-220                |\n\n(\n    df\n        .rename({\"weight (male, lbs)\": \"male\", \"weight (female, lbs)\": \"female\"})\n        .gather([\"male\", \"female\"], into=(\"sex\", \"weight\"))\n        .split(\"weight\", into=[\"min\", \"max\"], sep=\"-\")\n        .gather([\"min\", \"max\"], into=(\"stat\", \"weight\"))\n        .mutate({\"weight\": lambda row: float(row[\"weight\"])})\n        .group([\"genus\", \"sex\"])\n        .rollup({\"weight\": (\"weight\", rf.stat.mean)})\n        .spread(\"sex\", using=\"weight\")\n        .mutate({\"dimorphism\": lambda row: round(row[\"male\"] / row[\"female\"], 2)})\n        .drop([\"male\", \"female\"])\n        .sort(\"dimorphism\", descending=True)\n)\n\n# | genus      |   dimorphism |\n# |:-----------|-------------:|\n# | Ursus      |         2.01 |\n# | Tremarctos |         1.75 |\n# | Helarctos  |         1.56 |\n# | Melursus   |         1.47 |\n# | Ailuropoda |         1.24 |\n```\n\n\n\nFor comparison, here's the equivalent pandas:\n\n```python\nimport pandas as pd\n\n# df = pd.DataFrame({...})\n\ndf = df.rename(columns={\"weight (male, lbs)\": \"male\", \"weight (female, lbs)\": \"female\"})\ndf = pd.melt(df, id_vars=['bear', 'genus'], value_vars=['male', 'female'], var_name='sex', value_name='weight')\ndf[[\"min\", \"max\"]] = df[\"weight\"].str.split(\"-\", expand=True)\ndf = df.drop(\"weight\", axis=1)\ndf = pd.melt(df, id_vars=['bear', 'genus', 'sex'], value_vars=['min', 'max'], var_name='stat', value_name='weight')\ndf['weight'] = df[\"weight\"].astype('float')\ndf = df.groupby([\"genus\", \"sex\"])[\"weight\"].mean()\ndf = df.reset_index()\ndf = pd.pivot_table(df, index=['genus'], columns=['sex'], values='weight')\ndf = df.reset_index()\ndf = df.rename_axis(None, axis=1)\ndf[\"dimorphism\"] = round(df[\"male\"] / df[\"female\"], 2)\ndf = df.drop([\"female\", \"male\"], axis=1)\ndf = df.sort_values(\"dimorphism\", ascending=False)\ndf = df.reset_index(drop=True)\n\n# 🤮\n```\n\n\n\n### IO\n\nSave, load, and convert `rf.DataFrame` objects:\n\n```python\n# save .csv\nrf.save(df, \"bears.csv\")\n\n# load .csv\ndf = rf.load(\"bears.csv\")\n\n# convert redframes → pandas\npandas_df = rf.unwrap(df)\n\n# convert pandas → redframes\ndf = rf.wrap(pandas_df)\n```\n\n\n\n### Verbs\n\nVerbs are [pure](https://en.wikipedia.org/wiki/Pure_function) and \"chain-able\" methods that manipulate `rf.DataFrame` objects. Here is the complete list (see *docstrings* for examples and more details):\n\n| Verb                                             | Description                                                  |\n| ------------------------------------------------ | ------------------------------------------------------------ |\n| `accumulate`\u003csup\u003e‡\u003c/sup\u003e                         | Run a cumulative sum over a column                           |\n| `append`                                         | Append rows from another DataFrame                           |\n| `combine`                                        | Combine multiple columns into a single column (opposite of `split`) |\n| `cross`                                          | Cross join columns from another DataFrame                    |\n| `dedupe`                                         | Remove duplicate rows                                        |\n| [`denix`](https://www.dictionary.com/browse/nix) | Remove rows with missing values                              |\n| `drop`                                           | Drop entire columns (opposite of `select`)                   |\n| `fill`                                           | Fill missing values \"down\", \"up\", or with a constant         |\n| `filter`                                         | Keep rows matching specific conditions                       |\n| `gather`\u003csup\u003e‡\u003c/sup\u003e                             | Gather columns into rows (opposite of `spread`)              |\n| `group`                                          | Prepare groups for compatible verbs\u003csup\u003e‡\u003c/sup\u003e              |\n| `join`                                           | Join columns from another DataFrame                          |\n| `mutate`                                         | Create a new, or overwrite an existing column                |\n| `pack`\u003csup\u003e‡\u003c/sup\u003e                               | Collate and concatenate row values for a target column (opposite of `unpack`) |\n| `rank`\u003csup\u003e‡\u003c/sup\u003e                               | Rank order values in a column                                |\n| `rename`                                         | Rename column keys                                           |\n| `replace`                                        | Replace matching values within columns                       |\n| `rollup`\u003csup\u003e‡\u003c/sup\u003e                             | Apply summary functions and/or statistics to target columns  |\n| `sample`                                         | Randomly sample any number of rows                           |\n| `select`                                         | Select specific columns (opposite of `drop`)                 |\n| `shuffle`                                        | Shuffle the order of all rows                                |\n| `sort`                                           | Sort rows by specific columns                                |\n| `split`                                          | Split a single column into multiple columns (opposite of `combine`) |\n| `spread`                                         | Spread rows into columns (opposite of `gather`)              |\n| `take`\u003csup\u003e‡\u003c/sup\u003e                               | Take any number of rows (from the top/bottom)                |\n| `unpack`                                         | \"Explode\" concatenated row values into multiple rows (opposite of `pack`) |\n\n\n\n### Properties\n\nIn addition to all of the verbs there are several properties attached to each `DataFrame` object:\n\n```python\ndf[\"genus\"] \n# ['Ursus', 'Ursus', 'Ursus', 'Ursus', 'Helarctos', 'Melursus', 'Tremarctos', 'Ailuropoda']\n\ndf.columns \n# ['bear', 'genus', 'weight (male, lbs)', 'weight (female, lbs)']\n\ndf.dimensions\n# {'rows': 8, 'columns': 4}\n\ndf.empty\n# False\n\ndf.memory\n# '2 KB'\n\ndf.types\n# {'bear': object, 'genus': object, 'weight (male, lbs)': object, 'weight (female, lbs)': object}\n```\n\n\n\n### matplotlib\n\n`rf.DataFrame` objects integrate seamlessly with `matplotlib`:\n\n```python\nimport redframes as rf\nimport matplotlib.pyplot as plt\n\nfootball = rf.DataFrame({\n    'position': ['TE', 'K', 'RB', 'WR', 'QB'],\n    'avp': [116.98, 131.15, 180, 222.22, 272.91]\n})\n\ndf = (\n    football\n        .mutate({\"color\": lambda row: row[\"position\"] in [\"WR\", \"RB\"]})\n        .replace({\"color\": {False: \"orange\", True: \"red\"}})\n)\n\nplt.barh(df[\"position\"], df[\"avp\"], color=df[\"color\"]);\n```\n\n\u003cimg alt=\"redframes\" src=\"images/bars.png\" height=\"200px\"\u003e\n\n\n\n### scikit-learn\n\n`rf.DataFrame` objects are fully compatible with `sklearn` functions, estimators, and transformers:\n\n```python\nimport redframes as rf\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.linear_model import LinearRegression\n\ndf = rf.DataFrame({\n    \"touchdowns\": [15, 19, 5, 7, 9, 10, 12, 22, 16, 10],\n    \"age\": [21, 22, 21, 24, 26, 28, 30, 35, 28, 21],\n    \"mvp\": [1, 1, 0, 0, 0, 0, 0, 1, 0, 0]\n})\n\ntarget = \"touchdowns\"\ny = df[target]\nX = df.drop(target)\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)\n\nmodel = LinearRegression()\nmodel.fit(X_train, y_train)\nmodel.score(X_test, y_test)\n# 0.5083194901655527\n\nprint(X_train.take(1))\n# rf.DataFrame({'age': [21], 'mvp': [0]})\n\nX_new = rf.DataFrame({'age': [22], 'mvp': [1]})\nmodel.predict(X_new)\n# array([19.])\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaxhumber%2Fredframes","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmaxhumber%2Fredframes","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaxhumber%2Fredframes/lists"}