{"id":30855916,"url":"https://github.com/sermetpekin/perse","last_synced_at":"2025-09-07T11:11:46.419Z","repository":{"id":261666506,"uuid":"884801160","full_name":"SermetPekin/perse","owner":"SermetPekin","description":"Perse is an experimental Python package that combines some of the most widely-used functionalities from the powerhouse libraries Pandas, Polars, and DuckDB into a single, unified DataFrame object. The goal of Perse is to provide a streamlined and efficient interface, leveraging the strengths of these libraries to create a versatile data handling.","archived":false,"fork":false,"pushed_at":"2024-12-12T13:38:40.000Z","size":135,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-09-01T08:31:23.853Z","etag":null,"topics":["data","data-science","data-structures","duckdb","pandas","polars"],"latest_commit_sha":null,"homepage":"https://perse.readthedocs.io/en/latest/home.html","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SermetPekin.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-11-07T12:16:35.000Z","updated_at":"2025-02-01T19:03:05.000Z","dependencies_parsed_at":"2024-11-07T19:42:52.997Z","dependency_job_id":"545ad08a-7dec-4260-a798-258cff4eaef4","html_url":"https://github.com/SermetPekin/perse","commit_stats":null,"previous_names":["sermetpekin/perse"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/SermetPekin/perse","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposi
tories/SermetPekin%2Fperse","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SermetPekin%2Fperse/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SermetPekin%2Fperse/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SermetPekin%2Fperse/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SermetPekin","download_url":"https://codeload.github.com/SermetPekin/perse/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SermetPekin%2Fperse/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":274026718,"owners_count":25209740,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-07T02:00:09.463Z","response_time":67,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data","data-science","data-structures","duckdb","pandas","polars"],"created_at":"2025-09-07T11:11:11.031Z","updated_at":"2025-09-07T11:11:46.411Z","avatar_url":"https://github.com/SermetPekin.png","language":"Python","readme":"\n[![Python Package](https://github.com/SermetPekin/perse/actions/workflows/python-package.yml/badge.svg?2)](https://github.com/SermetPekin/perse/actions/workflows/python-package.yml)\n[![PyPI](https://img.shields.io/pypi/v/perse)](https://img.shields.io/pypi/v/perse) ![PyPI 
Downloads](https://static.pepy.tech/badge/perse?2)![t](https://img.shields.io/badge/status-maintained-yellow.svg) [![](https://img.shields.io/github/license/SermetPekin/perse.svg)](https://github.com/SermetPekin/perse/blob/master/LICENSE.md) [![](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/) \n\n\n\n# Perse\n\n**Perse** is an experimental Python package that combines some of the most widely-used functionalities from the powerhouse libraries **Pandas**, **Polars**, and **DuckDB** into a single, unified `DataFrame` object. The goal of Perse is to provide a streamlined and efficient interface, leveraging the strengths of these libraries to create a versatile data handling experience.\n\nThis package is currently experimental, with a focus on essential functions. We plan to expand its capabilities by integrating more features from Pandas, Polars, and DuckDB in future versions.\n\n## Key Features\n\nThe `Perse` DataFrame currently supports the following functionalities:\n\n### 1. Data Manipulation\nCore data-handling tools inspired by Pandas and Polars.\n\n- **Indexing and Selection**: Access specific rows or columns with `.loc` and `.iloc` properties.\n- **Column Operations**: Add, modify, or delete columns efficiently.\n- **Row Filtering**: Filter rows based on specific conditions.\n- **Aggregation**: Summarize data with aggregations like `sum`, `mean`, `count`.\n- **Sorting**: Sort data based on column values.\n- **Custom Function Application**: Apply custom functions to columns, supporting both element-wise operations and complex transformations.\n\n### 2. 
SQL Querying\nUse DuckDB's SQL engine to run SQL queries directly on the DataFrame, ideal for complex filtering and data manipulation.\n\n- **Direct SQL Queries**: Run SQL queries directly on data using DuckDB’s powerful engine.\n- **Seamless Integration**: Convert between Polars and DuckDB seamlessly for efficient querying on large datasets.\n- **Advanced Filtering**: Filter, join, and group data using SQL syntax.\n\n### 3. Data Transformation\nA collection of versatile data transformation functions.\n\n- **Pivot and Unpivot**: Reshape data for summary reports and visualizations.\n- **Melt/Stack**: Transform data between wide and long formats.\n- **Mapping and Replacing**: Map values based on conditions or replace them in columns.\n- **Grouping and Window Functions**: Group by specific columns and apply aggregations or window functions for advanced data summarization.\n\n### 4. Compatibility and Conversion\nInteroperability between Pandas, Polars, and DuckDB formats, offering flexibility in data manipulation.\n\n- **Pandas Compatibility**: Conversion utilities to easily move data between Pandas and Polars.\n- **Automatic Data Handling**: Automatically convert and handle data depending on the operation, allowing users to work flexibly with either Pandas or Polars.\n- **File I/O Support**: Read and write from common file formats (e.g., CSV, Parquet, JSON).\n\n### 5. Visualization\nBasic plotting capabilities that make it easy to visualize data directly from the Perse DataFrame.\n\n- **Line, Bar, and Scatter Plots**: Quick visualizations with common plot types.\n- **Customization**: Customize plot titles, labels, and legends with Matplotlib.\n- **Direct Plotting**: Plot directly from the Perse DataFrame, which internally uses Pandas’ Matplotlib integration.\n\n### 6. 
Data Integrity and Locking\nFeatures designed to prevent accidental modifications and ensure data integrity.\n\n- **Locking Mechanism**: Lock the DataFrame to prevent accidental edits.\n- **Unlocking**: Explicitly unlock to allow modifications.\n- **Validation**: Ensure data type consistency across columns for critical operations.\n\n## Installation\n\nTo install Perse, run:\n\n```bash\npip install perse\n```\n\n### Usage \n\n```python \n\nfrom perse import DataFrame\nimport numpy as np\n\n# Sample data\ndata = {\"A\": np.random.randint(0, 100, 10), \"B\": np.random.random(10), \"C\": np.random.choice([\"X\", \"Y\", \"Z\"], 10)}\ndf = DataFrame(data)\n\n# 1. Add a New Column \ndf.add_column(\"D\", np.random.random(10), inplace=True)\nprint(\"DataFrame with new column D:\\n\", df)\n\n# 2. Filter Rows\ndf2 = df.filter_rows(df.dl[\"A\"] \u003e 50, inplace=False) # default inplace = False \nprint(\"Filtered DataFrame (A \u003e 50):\\n\", df2)\n\n# 3. SQL Querying with DuckDB\ndf2 = df.query(\"SELECT A, AVG(B) AS avg_B FROM this GROUP BY A\")\nprint(\"SQL Query Result:\\n\", df2)\n\n# 4. Visualization\ndf.plot(kind=\"scatter\", x=\"A\", y=\"B\", title=\"Scatter Plot of A vs B\", xlabel=\"A values\", ylabel=\"B values\")\n\n# 5. 
Convert to Pandas\ndf2 = df.to_pandas()\nprint(\"Converted to Pandas DataFrame:\\n\", df2)\n\n\n```\n\n### Exporting data\n\n```python\n\nfrom perse import DataFrame\nimport numpy as np\n\n# Generate sample data\nnp.random.seed(42)\ndata = {\n    \"A\": np.random.randint(0, 100, 10),\n    \"B\": np.random.random(10),\n    \"C\": np.random.choice([\"X\", \"Y\", \"Z\"], 10),\n}\n\ndf = DataFrame(data)\n\n# Export as CSV file\ndf.to_csv('example.csv')\n\n# Export as Excel file\ndf.to_excel('example.xlsx')\n\n# Export as JSON file\ndf.to_json('example.json')\n\n\n# Alternatively, the concise `\u003e` form can be used\ndf \u003e 'example.csv'\ndf \u003e 'example.xlsx'\ndf \u003e 'example.json'\n```\n\n## Pipe Operator\n\nIn Python, the `|` operator normally means bitwise OR. In the `DataFrame` class it is overloaded to pipe the DataFrame into a function, so operations can be chained from left to right, similar to the pipe syntax found in other modern data processing libraries. This makes chained expressions more readable and flexible.\n\n```python\n\nfrom perse import DataFrame\nimport numpy as np\n\n# Sample data\ndata = {\"A\": np.random.randint(0, 100, 10), \"B\": np.random.random(10), \"C\": np.random.choice([\"X\", \"Y\", \"Z\"], 10)}\ndf = DataFrame(data)\n# Applying the print function to the DataFrame instance\ndf | print\n\n# Chaining functions: the instance is returned if no modification is made\ndf2 = df | print | print\n\n# Using a lambda function to call `to_csv` with arguments, demonstrating flexibility in piping\n_ = df | (lambda x: x.to_csv('example.csv'))\n\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsermetpekin%2Fperse","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsermetpekin%2Fperse","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsermetpekin%2Fperse/lists"}