{"id":15287734,"url":"https://github.com/oneoffcoder/pyspark-formula","last_synced_at":"2025-04-13T06:04:44.850Z","repository":{"id":55919410,"uuid":"319143478","full_name":"oneoffcoder/pyspark-formula","owner":"oneoffcoder","description":"R-like formula approach to Spark Dataframes","archived":false,"fork":false,"pushed_at":"2020-12-10T17:12:40.000Z","size":171,"stargazers_count":10,"open_issues_count":1,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-13T06:03:57.380Z","etag":null,"topics":["classification","clustering","dataframes","interaction-design","patsy","pyspark","regression","rlike-formulas","spark"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/oneoffcoder.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null},"funding":{"github":"vangj","patreon":"vangj","open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"custom":"https://oneoffcoder.com/"}},"created_at":"2020-12-06T22:19:36.000Z","updated_at":"2022-02-12T03:27:41.000Z","dependencies_parsed_at":"2022-08-15T09:30:38.597Z","dependency_job_id":null,"html_url":"https://github.com/oneoffcoder/pyspark-formula","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oneoffcoder%2Fpyspark-formula","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oneoffcoder%2Fpyspark-formula/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositorie
s/oneoffcoder%2Fpyspark-formula/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oneoffcoder%2Fpyspark-formula/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/oneoffcoder","download_url":"https://codeload.github.com/oneoffcoder/pyspark-formula/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248670437,"owners_count":21142904,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["classification","clustering","dataframes","interaction-design","patsy","pyspark","regression","rlike-formulas","spark"],"created_at":"2024-09-30T15:36:10.541Z","updated_at":"2025-04-13T06:04:44.811Z","avatar_url":"https://github.com/oneoffcoder.png","language":"Python","funding_links":["https://github.com/sponsors/vangj","https://patreon.com/vangj","https://oneoffcoder.com/","https://www.patreon.com/vangj"],"categories":[],"sub_categories":[],"readme":"![ydot logo](https://ydot.readthedocs.io/en/latest/_images/logo.png)\n\n# ydot\n\nR-like formulas for Spark Dataframes.\n\n- [Documentation](https://ydot.readthedocs.io/)\n- [PyPI](https://pypi.org/project/ydot/) \n- [Gitter](https://gitter.im/dataflava/ydot)\n\nNow you have the expressive power of R-like formulas to produce design matrices for your experimental needs. This API is based on [patsy](https://patsy.readthedocs.io/en/latest/), but for use with Apache Spark dataframes. 
Given a Spark dataframe, you can express your design matrices with something that resembles the following.\n\n`y ~ x1 + x2 + (x3 + a + b)**2`\n\nHere's a short and sweet example.\n\n```python\nfrom ydot.spark import smatrices\n\nspark_df = get_a_spark_dataframe()\nformula = 'y ~ x1 + x2 + (x3 + a + b)**2'\ny, X = smatrices(formula, spark_df)\n```\n\n# Software Copyright\n\n```\nCopyright 2020 One-Off Coder\n\nLicensed under the Apache License, Version 2.0 (the \"License\");\nyou may not use this file except in compliance with the License.\nYou may obtain a copy of the License at\n\n    http://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing, software\ndistributed under the License is distributed on an \"AS IS\" BASIS,\nWITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\nSee the License for the specific language governing permissions and\nlimitations under the License.\n```\n\n# Book Copyright\n\nCopyright 2020 One-Off Coder\n\nThis work is licensed under a [Creative Commons Attribution 4.0 International License](https://creativecommons.org/licenses/by/4.0/) by [One-Off Coder](https://www.oneoffcoder.com).\n\n![Creative Commons Attribution 4.0 International License](https://i.creativecommons.org/l/by/4.0/88x31.png \"Creative Commons Attribution 4.0 International License\")\n\n# Art Copyright\n\nCopyright 2020 Daytchia Vang\n\n# Citation\n\n```\n@misc{oneoffcoder_ydot_2020,\ntitle={ydot, R-like formulas for Spark Dataframes},\nurl={https://github.com/oneoffcoder/pyspark-formula},\nauthor={Jee Vang},\nyear={2020},\nmonth={Dec}}\n```\n\n# Sponsor, Love\n\n- [Patreon](https://www.patreon.com/vangj)\n- 
[GitHub](https://github.com/sponsors/vangj)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foneoffcoder%2Fpyspark-formula","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Foneoffcoder%2Fpyspark-formula","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foneoffcoder%2Fpyspark-formula/lists"}