{"id":15654067,"url":"https://github.com/sdpython/pandas-streaming","last_synced_at":"2025-06-30T19:37:20.455Z","repository":{"id":30076622,"uuid":"104373976","full_name":"sdpython/pandas-streaming","owner":"sdpython","description":"Streaming API for pandas applied to big datasets ","archived":false,"fork":false,"pushed_at":"2024-09-15T10:04:25.000Z","size":904,"stargazers_count":31,"open_issues_count":3,"forks_count":9,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-05-11T12:41:25.276Z","etag":null,"topics":["numpy","pandas","python3","streaming-data","streaming-data-processing"],"latest_commit_sha":null,"homepage":"https://sdpython.github.io/doc/pandas-streaming/dev/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sdpython.png","metadata":{"files":{"readme":"README.rst","changelog":"CHANGELOGS.rst","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2017-09-21T16:45:17.000Z","updated_at":"2025-01-06T07:17:31.000Z","dependencies_parsed_at":"2023-07-22T19:33:12.381Z","dependency_job_id":"1f1f81cb-be08-4624-b50a-a68aa0a55e33","html_url":"https://github.com/sdpython/pandas-streaming","commit_stats":null,"previous_names":["sdpython/pandas-streaming","sdpython/pandas_streaming"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/sdpython/pandas-streaming","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sdpython%2Fpandas-streaming","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sdpython%2Fpandas-streaming/tags","releases_url":"https://repos.ecosyste.m
s/api/v1/hosts/GitHub/repositories/sdpython%2Fpandas-streaming/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sdpython%2Fpandas-streaming/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sdpython","download_url":"https://codeload.github.com/sdpython/pandas-streaming/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sdpython%2Fpandas-streaming/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262839720,"owners_count":23372780,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["numpy","pandas","python3","streaming-data","streaming-data-processing"],"created_at":"2024-10-03T12:49:23.567Z","updated_at":"2025-06-30T19:37:20.416Z","avatar_url":"https://github.com/sdpython.png","language":"Python","readme":"pandas-streaming: streaming API over pandas\n===========================================\n\n.. image:: https://ci.appveyor.com/api/projects/status/4te066r8ne1ymmhy?svg=true\n    :target: https://ci.appveyor.com/project/sdpython/pandas-streaming\n    :alt: Build Status Windows\n\n.. image:: https://dev.azure.com/xavierdupre3/pandas_streaming/_apis/build/status/sdpython.pandas_streaming\n    :target: https://dev.azure.com/xavierdupre3/pandas_streaming/\n\n.. image:: https://badge.fury.io/py/pandas_streaming.svg\n    :target: http://badge.fury.io/py/pandas_streaming\n\n.. image:: https://img.shields.io/badge/license-MIT-blue.svg\n    :alt: MIT License\n    :target: https://opensource.org/license/MIT/\n\n.. 
image:: https://codecov.io/gh/sdpython/pandas-streaming/branch/main/graph/badge.svg?token=0caHX1rhr8 \n    :target: https://codecov.io/gh/sdpython/pandas-streaming\n\n.. image:: http://img.shields.io/github/issues/sdpython/pandas_streaming.png\n    :alt: GitHub Issues\n    :target: https://github.com/sdpython/pandas_streaming/issues\n\n.. image:: https://pepy.tech/badge/pandas_streaming/month\n    :target: https://pepy.tech/project/pandas_streaming/month\n    :alt: Downloads\n\n.. image:: https://img.shields.io/github/forks/sdpython/pandas_streaming.svg\n    :target: https://github.com/sdpython/pandas_streaming/\n    :alt: Forks\n\n.. image:: https://img.shields.io/github/stars/sdpython/pandas_streaming.svg\n    :target: https://github.com/sdpython/pandas_streaming/\n    :alt: Stars\n\n.. image:: https://img.shields.io/github/repo-size/sdpython/pandas_streaming\n    :target: https://github.com/sdpython/pandas_streaming/\n    :alt: size\n\n`pandas-streaming \u003chttps://sdpython.github.io/doc/pandas-streaming/dev/\u003e`_\naims at processing big files with `pandas \u003chttps://pandas.pydata.org/\u003e`_,\ntoo big to hold in memory, too small to be parallelized with a significant gain.\nThe module replicates a subset of *pandas* API\nand implements other functionalities for machine learning.\n\n.. code-block:: python\n\n    from pandas_streaming.df import StreamingDataFrame\n    sdf = StreamingDataFrame.read_csv(\"filename\", sep=\"\\t\", encoding=\"utf-8\")\n\n    for df in sdf:\n        # process this chunk of data\n        # df is a dataframe\n        print(df)\n\nThe module can also stream an existing dataframe.\n\n.. 
code-block:: python\n\n    import pandas\n    df = pandas.DataFrame([dict(cf=0, cint=0, cstr=\"0\"),\n                           dict(cf=1, cint=1, cstr=\"1\"),\n                           dict(cf=3, cint=3, cstr=\"3\")])\n\n    from pandas_streaming.df import StreamingDataFrame\n    sdf = StreamingDataFrame.read_df(df)\n\n    for df in sdf:\n        # process this chunk of data\n        # df is a dataframe\n        print(df)\n\nIt also contains helpers to split datasets into\ntrain and test sets under unusual constraints.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsdpython%2Fpandas-streaming","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsdpython%2Fpandas-streaming","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsdpython%2Fpandas-streaming/lists"}