{"id":13857140,"url":"https://github.com/dodger487/dplython","last_synced_at":"2025-07-13T20:31:02.534Z","repository":{"id":57423866,"uuid":"52316552","full_name":"dodger487/dplython","owner":"dodger487","description":"dplyr for python","archived":false,"fork":false,"pushed_at":"2016-12-30T19:23:20.000Z","size":1104,"stargazers_count":762,"open_issues_count":26,"forks_count":58,"subscribers_count":35,"default_branch":"master","last_synced_at":"2024-07-19T18:43:44.972Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dodger487.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"license.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-02-23T00:21:58.000Z","updated_at":"2024-06-24T03:27:46.000Z","dependencies_parsed_at":"2022-08-30T01:20:09.835Z","dependency_job_id":null,"html_url":"https://github.com/dodger487/dplython","commit_stats":null,"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dodger487%2Fdplython","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dodger487%2Fdplython/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dodger487%2Fdplython/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dodger487%2Fdplython/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dodger487","download_url":"https://codeload.github.com/dodger487/dplython/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":213988325,"owners_count"
:15666964,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-05T03:01:27.487Z","updated_at":"2024-08-05T03:02:51.020Z","avatar_url":"https://github.com/dodger487.png","language":"Python","funding_links":[],"categories":["Data Manipulation","Python"],"sub_categories":["Pipelines"],"readme":"# Dplython: Dplyr for Python\n\n[![Build Status](https://travis-ci.org/dodger487/dplython.svg?branch=master)](https://travis-ci.org/dodger487/dplython)\n\nWelcome to Dplython: Dplyr for Python.\n\nDplyr is a library for the language R designed to make data analysis fast and easy.\nThe philosophy of Dplyr is to constrain data manipulation to a few simple functions that correspond to the most common tasks.\nThis maps thinking closer to the process of writing code, helping you analyze data closer to the \"speed of thought\".\n\nThe goal of this project is to implement the functionality of the R package Dplyr on top of Python's pandas.\n\n* Dplyr: [Click here](https://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html)\n* Pandas: [Click here](http://pandas.pydata.org/pandas-docs/stable/10min.html)\n\nThis is version 0.0.7.\nIt's experimental and subject to change.\n\n## Introductory Video\nHere is a 20-minute video explaining dplython, given at PyGotham 2016.\n\n[![PyGotham Dplython video](http://img.youtube.com/vi/4YAcwCe1mAE/0.jpg)](https://www.youtube.com/watch?v=4YAcwCe1mAE \"PyGotham Dplython Video\")\n\nClick the awkward picture above to see the talk!\nNote that sound doesn't start until about 1 minute in due to microphone issues.\n\n## 
Installation\nTo install, use pip:\n```\npip install dplython\n```\n\nTo get the latest development version, you can clone this repo or use the command:\n```\npip install git+https://github.com/dodger487/dplython.git\n```\n\n## Contributing\nWe welcome your feature requests, open issues, bug reports, and pull requests!\nPlease use GitHub's interface.\nAlso consider joining the [dplython mailing list](https://groups.google.com/forum/#!forum/dplython).\n\n## Example usage\n```python\nimport pandas\nfrom dplython import (DplyFrame, X, diamonds, select, sift, sample_n,\n    sample_frac, head, arrange, mutate, group_by, summarize, DelayFunction) \n\n# The example `diamonds` DataFrame is included in this package, but you can \n# cast a DataFrame to a DplyFrame in this simple way:\n# diamonds = DplyFrame(pandas.read_csv('./diamonds.csv'))\n\n# Select specific columns of the DataFrame using select, and \n#   get the first few using head\ndiamonds \u003e\u003e select(X.carat, X.cut, X.price) \u003e\u003e head(5)\n\"\"\"\nOut:\n   carat        cut  price\n0   0.23      Ideal    326\n1   0.21    Premium    326\n2   0.23       Good    327\n3   0.29    Premium    334\n4   0.31       Good    335\n\"\"\"\n\n# Filter out rows using sift\ndiamonds \u003e\u003e sift(X.carat \u003e 4) \u003e\u003e select(X.carat, X.cut, X.depth, X.price)\n\"\"\"\nOut:\n       carat      cut  depth  price\n25998   4.01  Premium   61.0  15223\n25999   4.01  Premium   62.5  15223\n27130   4.13     Fair   64.8  17329\n27415   5.01     Fair   65.5  18018\n27630   4.50     Fair   65.8  18531\n\"\"\"\n\n# Sample with sample_n or sample_frac, sort with arrange\n(diamonds \u003e\u003e \n  sample_n(10) \u003e\u003e \n  arrange(X.carat) \u003e\u003e \n  select(X.carat, X.cut, X.depth, X.price))\n\"\"\"\nOut:\n       carat        cut  depth  price\n37277   0.23  Very Good   61.5    484\n17728   0.30  Very Good   58.8    614\n33255   0.32      Ideal   61.1    825\n38911   0.33      Ideal   61.6   1052\n31491   
0.34    Premium   60.3    765\n37227   0.40    Premium   61.9    975\n2578    0.81    Premium   60.8   3213\n15888   1.01       Fair   64.6   6353\n26594   1.74      Ideal   62.9  16316\n25727   2.38    Premium   62.4  14648\n\"\"\"\n\n# You can: \n#   add columns with mutate (referencing other columns!)\n#   group rows into dplyr-style groups with group_by\n#   collapse rows into single rows using summarize\n(diamonds \u003e\u003e \n  mutate(carat_bin=X.carat.round()) \u003e\u003e \n  group_by(X.cut, X.carat_bin) \u003e\u003e \n  summarize(avg_price=X.price.mean()))\n\"\"\"\nOut:\n       avg_price  carat_bin        cut\n0     863.908535          0      Ideal\n1    4213.864948          1      Ideal\n2   12838.984078          2      Ideal\n...\n27  13466.823529          3       Fair\n28  15842.666667          4       Fair\n29  18018.000000          5       Fair\n\"\"\"\n\n# If you have column names that don't work as attributes, you can use an \n# alternate \"get item\" notation with X.\ndiamonds[\"column w/ spaces\"] = range(len(diamonds))\ndiamonds \u003e\u003e select(X[\"column w/ spaces\"]) \u003e\u003e head()\n\"\"\"\nOut:\n   column w/ spaces\n0                 0\n1                 1\n2                 2\n3                 3\n4                 4\n5                 5\n6                 6\n7                 7\n8                 8\n9                 9\n\"\"\"\n\n# It's possible to pass the entire dataframe using X._ \ndiamonds \u003e\u003e sample_n(6) \u003e\u003e select(X.carat, X.price) \u003e\u003e X._.T\n\"\"\"\nOut:\n         18966    19729   9445   49951    3087    33128\ncarat     1.16     1.52     0.9    0.3     0.74    0.31\nprice  7803.00  8299.00  4593.0  540.0  3315.00  816.00\n\"\"\"\n\n# To pass the DataFrame or columns into functions, apply @DelayFunction\n@DelayFunction\ndef PairwiseGreater(series1, series2):\n  index = series1.index\n  newSeries = pandas.Series([max(s1, s2) for s1, s2 in zip(series1, series2)])\n  newSeries.index = index\n  return 
newSeries\n\ndiamonds \u003e\u003e PairwiseGreater(X.x, X.y)\n\n\n# Passing entire dataframe and plotting with ggplot\nfrom ggplot import ggplot, aes, geom_point, facet_wrap\nggplot = DelayFunction(ggplot)  # Simple installation\n(diamonds \u003e\u003e ggplot(aes(x=\"carat\", y=\"price\", color=\"cut\"), data=X._) + \n  geom_point() + facet_wrap(\"color\"))\n```\n![Ggplot example 1](http://dodger487.github.com/figs/dplython/ggplot_img1.png)\n\n```python\n(diamonds \u003e\u003e\n  sift((X.clarity == \"I1\") | (X.clarity == \"IF\")) \u003e\u003e \n  ggplot(aes(x=\"carat\", y=\"price\", color=\"color\"), X._) + \n    geom_point() + \n    facet_wrap(\"clarity\"))\n```\n![Ggplot example 2](http://dodger487.github.com/figs/dplython/ggplot_img2.png)\n\n```python\n# Matplotlib works as well!\nimport pylab as pl\npl.scatter = DelayFunction(pl.scatter)\ndiamonds \u003e\u003e sample_frac(0.1) \u003e\u003e pl.scatter(X.carat, X.price)\n```\n![MPL example 2](http://dodger487.github.com/figs/dplython/plt_img1.png)\n\n\nThis is very new and I'm making changes.\nLet me know if you'd like to see a feature or think there's a better way I can do something.\n\n## Other approaches\n* [pandas-ply](http://pythonhosted.org/pandas-ply/)\n\nDevelopment of dplython began before I knew pandas-ply existed.\nAfter I found it, I chose \"X\" as the manager to be consistent.\nPandas-ply is a great approach and worth a look.\nThe main contrasts between the two are that:\n* dplython uses dplyr-style groups, as opposed to the SQL-style groups of pandas and pandas-ply\n* dplython maps a little more directly onto dplyr, for example having mutate instead of an expanded select.\n* dplython uses operators to connect operations instead of 
method-chaining\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdodger487%2Fdplython","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdodger487%2Fdplython","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdodger487%2Fdplython/lists"}