{"id":13474453,"url":"https://github.com/hi-primus/optimus","last_synced_at":"2025-05-14T00:08:38.700Z","repository":{"id":37557941,"uuid":"97071697","full_name":"hi-primus/optimus","owner":"hi-primus","description":":truck: Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark","archived":false,"fork":false,"pushed_at":"2024-12-02T14:09:25.000Z","size":115279,"stargazers_count":1508,"open_issues_count":29,"forks_count":232,"subscribers_count":36,"default_branch":"develop","last_synced_at":"2025-05-10T15:17:36.575Z","etag":null,"topics":["big-data-cleaning","bigdata","cudf","dask","dask-cudf","data-analysis","data-cleaner","data-cleaning","data-cleansing","data-exploration","data-extraction","data-preparation","data-profiling","data-science","data-transformation","data-wrangling","machine-learning","pyspark","spark"],"latest_commit_sha":null,"homepage":"https://hi-optimus.com","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hi-primus.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-07-13T02:31:18.000Z","updated_at":"2025-05-04T13:53:19.000Z","dependencies_parsed_at":"2023-10-02T17:45:20.838Z","dependency_job_id":"a594265e-a472-4371-8e5e-256fce9b2e88","html_url":"https://github.com/hi-primus/optimus","commit_stats":{"total_commits":5689,"total_committers":25,"mean_commits":227.56,"dds":"0.42467920548426785","last_synced_commit":"cb73842d5662f9781bacb50afd24b94cfb586b95"},"previous_names":["ironmussa/optimus"],"tags_count":144,"templ
ate":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hi-primus%2Foptimus","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hi-primus%2Foptimus/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hi-primus%2Foptimus/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hi-primus%2Foptimus/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hi-primus","download_url":"https://codeload.github.com/hi-primus/optimus/tar.gz/refs/heads/develop","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253592907,"owners_count":21932902,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["big-data-cleaning","bigdata","cudf","dask","dask-cudf","data-analysis","data-cleaner","data-cleaning","data-cleansing","data-exploration","data-extraction","data-preparation","data-profiling","data-science","data-transformation","data-wrangling","machine-learning","pyspark","spark"],"created_at":"2024-07-31T16:01:12.455Z","updated_at":"2025-05-14T00:08:38.638Z","avatar_url":"https://github.com/hi-primus.png","language":"Python","funding_links":["https://opencollective.com/optimus"],"categories":["The Data Science Toolbox","Data Analysis","Python","Deep Learning Framework","数据分析","数据管道和流处理","科学计算和数据分析"],"sub_categories":["Miscellaneous Tools","Productivity","Deployment \u0026 Distribution","Drone Frames"],"readme":"# Optimus\n\n[![Logo 
Optimus](https://raw.githubusercontent.com/hi-primus/optimus/develop-23.5/images/optimus-logo.png)](https://hi-optimus.com)\n\n[![Tests](https://github.com/hi-primus/optimus/actions/workflows/main.yml/badge.svg)](https://github.com/hi-primus/optimus/actions/workflows/main.yml)\n[![Docker image updated](https://github.com/hi-primus/optimus/actions/workflows/docker.yml/badge.svg)](https://hub.docker.com/r/hiprimus/optimus)\n[![PyPI Latest Release](https://img.shields.io/pypi/v/pyoptimus.svg)](https://pypi.org/project/pyoptimus/) \n[![GitHub release](https://img.shields.io/github/release/hi-primus/optimus.svg?include_prereleases)](https://github.com/hi-primus/optimus/releases)\n[![CalVer](https://img.shields.io/badge/calver-YY.MM.MICRO-22bfda.svg)](http://calver.org)\n\n[![Downloads](https://pepy.tech/badge/pyoptimus)](https://pepy.tech/project/pyoptimus)\n[![Downloads](https://pepy.tech/badge/pyoptimus/month)](https://pepy.tech/project/pyoptimus/month)\n[![Downloads](https://pepy.tech/badge/pyoptimus/week)](https://pepy.tech/project/pyoptimus/week)\n[![Mentioned in Awesome Data Science](https://awesome.re/mentioned-badge.svg)](https://github.com/bulutyazilim/awesome-datascience) \n[![Slack](https://img.shields.io/badge/chat-slack-red.svg?logo=slack\u0026color=36c5f0)](https://communityinviter.com/apps/hi-bumblebee/welcome)\n\n# Overview\n\nOptimus is an opinionated Python library for easily loading, processing and plotting data and creating ML models on top of pandas, Dask, cuDF, Dask-cuDF, Vaex or Spark. \n\nSome amazing things Optimus can do for you:\n* Process data with a simple API that is easy for newcomers to use.\n* Handle strings, dates, URLs and emails with more than 100 functions.\n* Easily plot data of any size.\n* Explore and fix data quality with out-of-the-box functions. 
\n* Use the same code to process your data on your laptop or on a remote cluster of GPUs.\n\n[See Documentation](https://docs.hi-optimus.com/en/latest/)\n\n## Try Optimus\nTo launch a live notebook server and try Optimus on Binder or Colab, click one of the following badges:\n\n[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/hi-primus/optimus/develop-23.5?filepath=https%3A%2F%2Fraw.githubusercontent.com%2Fhi-primus%2Foptimus%2Fdevelop-23.5%2Fexamples%2F10_min_to_optimus.ipynb)\n[![Colab](https://img.shields.io/badge/launch-colab-yellow.svg?logo=googlecolab\u0026color=e6a210)](https://colab.research.google.com/github/hi-primus/optimus/blob/master/examples/10_min_to_optimus_colab.ipynb)\n\n## Installation (pip): \nIn your terminal, type:\n```\npip install pyoptimus\n```\n\nBy default, Optimus installs pandas as its engine. To install other engines, use the following commands:\n\n| Engine    | Command                                |\n|-----------|----------------------------------------|\n| Dask      | ```pip install pyoptimus[dask]```      |\n| cuDF      | ```pip install pyoptimus[cudf]```      |\n| Dask-cuDF | ```pip install pyoptimus[dask-cudf]``` |\n| Vaex      | ```pip install pyoptimus[vaex]```      |\n| Spark     | ```pip install pyoptimus[spark]```     |\n\nTo install from the repo: \n```\npip install git+https://github.com/hi-primus/optimus.git@develop-23.5\n```\n\nTo install another engine from the repo: \n```\npip install git+https://github.com/hi-primus/optimus.git@develop-23.5#egg=pyoptimus[dask]\n```\n\n\n\n### Requirements\n* Python 3.7 or 3.8\n\n## Examples\n\nSee [10 minutes to Optimus](https://github.com/hi-primus/optimus/blob/develop-23.5/examples/10_min_to_optimus.ipynb) for the basics of working in a notebook.\n\nThe [Examples](https://github.com/hi-primus/optimus/tree/develop-23.5/examples/examples.md) section has notebooks on data cleaning, 
data munging, profiling, data enrichment and how to create ML and DL models.\n\nHere's a handy [Cheat Sheet](https://htmlpreview.github.io/?https://github.com/hi-primus/optimus/blob/develop-23.5/docs/cheatsheet/optimus_cheat_sheet.html) with the most common Optimus operations.\n\n## Start Optimus\n\nStart Optimus using ```\"pandas\"```, ```\"dask\"```, ```\"cudf\"```, ```\"dask_cudf\"```, ```\"vaex\"``` or ```\"spark\"```.\n\n```python\nfrom optimus import Optimus\nop = Optimus(\"pandas\")\n```\n\n## Loading data\n\nOptimus can load data in CSV, JSON, Parquet, Avro and Excel formats from a local file or a URL.\n\n```python\n# csv\ndf = op.load.csv(\"../examples/data/foo.csv\")\n\n# json\ndf = op.load.json(\"../examples/data/foo.json\")\n\n# using a url\ndf = op.load.json(\"https://raw.githubusercontent.com/hi-primus/optimus/develop-23.5/examples/data/foo.json\")\n\n# parquet\ndf = op.load.parquet(\"../examples/data/foo.parquet\")\n\n# ...or anything else\ndf = op.load.file(\"../examples/data/titanic3.xls\")\n```\n\nYou can also load data from Oracle, Redshift, MySQL and Postgres databases.\n\n## Saving Data\n\n```python\n# csv\ndf.save.csv(\"data/foo.csv\")\n\n# json\ndf.save.json(\"data/foo.json\")\n\n# parquet\ndf.save.parquet(\"data/foo.parquet\")\n```\n\nYou can also save data to Oracle, Redshift, MySQL and Postgres.\n\n## Create dataframes\n\nYou can also create a dataframe from scratch:\n\n```python\ndf = op.create.dataframe({\n    'A': ['a', 'b', 'c', 'd'],\n    'B': [1, 3, 5, 7],\n    'C': [2, 4, 6, None],\n    'D': ['1980/04/10', '1980/04/10', '1980/04/10', '1980/04/10']\n})\n```\n\n`display` shows your data along with extra information such as column number, column data type and marked white spaces.\n\n```python\ndisplay(df)\n```\n![](https://github.com/hi-primus/optimus/tree/develop-23.5/readme/images/table.png)\n\n## Cleaning and Processing\n\nOptimus was created to make data cleaning a breeze. 
The API was designed to be easy for newcomers and familiar to people coming from pandas.\nOptimus expands the standard DataFrame functionality by adding `.rows` and `.cols` accessors.\n\nFor example, you can load data from a URL, transform it and apply some predefined cleaning functions:\n\n```python\nnew_df = df\\\n    .rows.sort(\"rank\", \"desc\")\\\n    .cols.lower([\"names\", \"function\"])\\\n    .cols.date_format(\"date arrival\", \"yyyy/MM/dd\", \"dd-MM-YYYY\")\\\n    .cols.years_between(\"date arrival\", \"dd-MM-YYYY\", output_cols=\"from arrival\")\\\n    .cols.normalize_chars(\"names\")\\\n    .cols.remove_special_chars(\"names\")\\\n    .rows.drop(df[\"rank\"]\u003e8)\\\n    .cols.rename(\"*\", str.lower)\\\n    .cols.trim(\"*\")\\\n    .cols.unnest(\"japanese name\", output_cols=\"other names\")\\\n    .cols.unnest(\"last position seen\", separator=\",\", output_cols=\"pos\")\\\n    .cols.drop([\"last position seen\", \"japanese name\", \"date arrival\", \"cybertronian\", \"nulltype\"])\n```\n\n# Need help? 🛠️\n\n## Feedback\n\nFeedback is what drives Optimus' future, so please take a couple of minutes to help shape the Optimus roadmap:  http://bit.ly/optimus_survey \n\nTo make a suggestion or feature request, use https://github.com/hi-primus/optimus/issues\n\n## Troubleshooting\n\nIf you have issues, see our [Troubleshooting Guide](https://github.com/hi-primus/optimus/tree/develop-23.5/troubleshooting.md)\n\n# Contributing to Optimus 💡\n\nContributions go far beyond pull requests and commits. We are very happy to receive any kind of contribution,  \nincluding: \n \n* [Documentation](https://docs.hi-optimus.com/en/latest/) updates, enhancements, designs, or bugfixes. \n* Spelling or grammar fixes. \n* README.md corrections or redesigns. 
\n* Adding unit or functional [tests](https://github.com/hi-primus/optimus/tree/develop-23.5/tests)  \n* Triaging GitHub issues -- especially determining whether an issue still persists or is reproducible.\n* [Blogging, speaking about, or creating tutorials](https://hioptimus.com/category/blog/) about Optimus and its many features. \n* Helping others on our official chats\n \n# Backers and Sponsors\n\nBecome a [backer](https://opencollective.com/optimus#backer) or a [sponsor](https://opencollective.com/optimus#sponsor) and get your image on our README on GitHub with a link to your site. \n\n[![OpenCollective](https://opencollective.com/optimus/backers/badge.svg)](#backers) [![OpenCollective](https://opencollective.com/optimus/sponsors/badge.svg)](#sponsors)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhi-primus%2Foptimus","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhi-primus%2Foptimus","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhi-primus%2Foptimus/lists"}