{"id":20749164,"url":"https://github.com/g-research/parquetsharp.dataframe","last_synced_at":"2025-04-28T12:22:47.367Z","repository":{"id":44604669,"uuid":"438904607","full_name":"G-Research/ParquetSharp.DataFrame","owner":"G-Research","description":"ParquetSharp.DataFrame is a .NET library for reading and writing Apache Parquet files into/from .NET DataFrames, using ParquetSharp","archived":false,"fork":false,"pushed_at":"2025-03-12T01:01:59.000Z","size":51,"stargazers_count":23,"open_issues_count":2,"forks_count":12,"subscribers_count":17,"default_branch":"main","last_synced_at":"2025-04-23T00:23:30.089Z","etag":null,"topics":["big-data","csharp","dataframe","dotnet","parquet"],"latest_commit_sha":null,"homepage":"","language":"C#","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/G-Research.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-12-16T07:51:11.000Z","updated_at":"2025-03-24T19:47:03.000Z","dependencies_parsed_at":"2024-03-27T04:23:05.364Z","dependency_job_id":"bc093267-1aff-4c1f-8790-1db188015b95","html_url":"https://github.com/G-Research/ParquetSharp.DataFrame","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/G-Research%2FParquetSharp.DataFrame","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/G-Research%2FParquetSharp.DataFrame/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/G-Research%2FParquetSharp.DataF
rame/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/G-Research%2FParquetSharp.DataFrame/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/G-Research","download_url":"https://codeload.github.com/G-Research/ParquetSharp.DataFrame/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251311689,"owners_count":21569079,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["big-data","csharp","dataframe","dotnet","parquet"],"created_at":"2024-11-17T08:21:21.100Z","updated_at":"2025-04-28T12:22:47.349Z","avatar_url":"https://github.com/G-Research.png","language":"C#","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ParquetSharp.DataFrame\n\n[![CI Status](https://github.com/G-Research/ParquetSharp.DataFrame/actions/workflows/ci.yml/badge.svg?branch=main\u0026event=push)](https://github.com/G-Research/ParquetSharp.DataFrame/actions/workflows/ci.yml?query=branch%3Amain+event%3Apush)\n[![NuGet latest release](https://img.shields.io/nuget/v/ParquetSharp.DataFrame.svg)](https://www.nuget.org/packages/ParquetSharp.DataFrame)\n\nParquetSharp.DataFrame is a .NET library for reading and writing Apache Parquet files into/from .NET [DataFrames][1], using [ParquetSharp][2].\n\n[1]: https://docs.microsoft.com/en-us/dotnet/api/microsoft.data.analysis.dataframe\n[2]: https://github.com/G-Research/ParquetSharp\n\n## Reading Parquet files\n\nParquet data is read into a `DataFrame` using `ToDataFrame` extension methods on `ParquetFileReader`,\nfor 
example:\n\n```C#\nusing ParquetSharp;\n\nusing (var parquetReader = new ParquetFileReader(parquet_file_path))\n{\n    var dataFrame = parquetReader.ToDataFrame();\n    parquetReader.Close();\n}\n```\n\nOverloads are provided that allow you to read specific columns from the Parquet file,\nand/or a subset of row groups:\n\n```C#\nvar dataFrame = parquetReader.ToDataFrame(columns: new [] {\"col_1\", \"col_2\"});\n```\n\n```C#\nvar dataFrame = parquetReader.ToDataFrame(rowGroupIndices: new [] {0, 1});\n```\n\n## Writing Parquet files\n\nParquet files are written using the `ToParquet` extension method on `DataFrame`:\n\n```C#\nusing ParquetSharp;\nusing Microsoft.Data.Analysis;\n\nvar dataFrame = new DataFrame(columns);\ndataFrame.ToParquet(parquet_file_path);\n```\n\nParquet writing options can be overridden by providing an instance of `WriterProperties`:\n\n```C#\nusing (var propertiesBuilder = new WriterPropertiesBuilder())\n{\n    propertiesBuilder.Compression(Compression.Snappy);\n    using (var properties = propertiesBuilder.Build())\n    {\n        dataFrame.ToParquet(parquet_file_path, properties);\n    }\n}\n```\n\nThe logical type to use when writing a column can optionally be overridden.\nThis is required when writing decimal columns, as you must specify the precision and scale to be used\n(see the [Parquet documentation](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal) for more details).\nThis also allows writing an integer column as a Parquet date or time.\n\n```C#\ndataFrame.ToParquet(parquet_file_path, logicalTypeOverrides: new Dictionary\u003cstring, LogicalType\u003e\n{\n    {\"decimal_column\", LogicalType.Decimal(precision: 29, scale: 3)},\n    {\"date_column\", LogicalType.Date()},\n    {\"time_column\", LogicalType.Time(isAdjustedToUtc: true, TimeUnit.Millis)},\n});\n```\n\n## Contributing\n\nWe welcome new contributors! 
We will happily receive PRs for bug fixes or small changes.\nIf you're contemplating something larger, please get in touch first by opening a GitHub Issue describing the problem and how you propose to solve it.\n\n## Security\n\nPlease see our [security policy](https://github.com/G-Research/ParquetSharp.DataFrame/blob/main/SECURITY.md) for details on reporting security vulnerabilities.\n\n## License\n\nCopyright 2021 G-Research\n\nLicensed under the Apache License, Version 2.0 (the \"License\"); you may not use these files except in compliance with the License.\nYou may obtain a copy of the License at\n\nhttp://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing, software\ndistributed under the License is distributed on an \"AS IS\" BASIS,\nWITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\nSee the License for the specific language governing permissions and\nlimitations under the License.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fg-research%2Fparquetsharp.dataframe","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fg-research%2Fparquetsharp.dataframe","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fg-research%2Fparquetsharp.dataframe/lists"}