{"id":26871425,"url":"https://github.com/popbones/bundlr-go","last_synced_at":"2025-03-31T07:19:29.859Z","repository":{"id":54538273,"uuid":"300123373","full_name":"popbones/bundlr-go","owner":"popbones","description":"A go package for handing parted file sets","archived":false,"fork":false,"pushed_at":"2021-02-19T12:48:48.000Z","size":79,"stargazers_count":2,"open_issues_count":0,"forks_count":2,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-06-19T15:16:33.601Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/popbones.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-10-01T03:02:18.000Z","updated_at":"2021-02-19T12:48:50.000Z","dependencies_parsed_at":"2022-08-13T19:01:08.278Z","dependency_job_id":null,"html_url":"https://github.com/popbones/bundlr-go","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/popbones%2Fbundlr-go","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/popbones%2Fbundlr-go/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/popbones%2Fbundlr-go/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/popbones%2Fbundlr-go/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/popbones","download_url":"https://codeload.github.com/popbones/bundlr-go/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246429472
,"owners_count":20775809,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-03-31T07:19:29.165Z","updated_at":"2025-03-31T07:19:29.853Z","avatar_url":"https://github.com/popbones.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Bundlr\n\n[![Build status](https://travis-ci.com/popbones/bundlr-go.svg)](https://travis-ci.com/popbones/bundlr-go.svg)\n[![Code Coverage](https://codecov.io/gh/popbones/bundlr-go/graph/badge.svg)](https://codecov.io/gh/popbones/bundlr-go)\n[![Report Card](https://goreportcard.com/badge/github.com/popbones/bundlr-go)](https://goreportcard.com/report/github.com/popbones/bundlr-go)\n[![GoDoc](https://godoc.org/github.com/nathany/looper?status.svg)](https://pkg.go.dev/github.com/popbones/bundlr-go/bundlr)\n\nBundlr is a Go package that helps to handle parted file sets.\nA parted file here is a series of files that store a homogeneous set of records. The primary motivation is to avoid one large monolithic file for large datasets.\n\nI developed this package because we needed to exchange Parquet files with Apache Spark via S3. Being able to split the dataset into multiple files reduces the memory footprint of the Parquet Go package we are using and makes parallelised processing easier.\n\nThis package only handles the split reading and writing of data. It does not dictate the actual storage backend or the file format.\n\n## Bundle\n\nWe call the data Bundlr handles a \"bundle\". 
It is essentially a directory with a sub-directory called `data`.\n\n```\nfoo.bundle/\n    |- data/\n        |- data_000000.dat\n        |- data_000001.dat\n        |- ...\n```  \n\nCurrently, a new file is created once a specified number of records has been written to the current file.\n\nThe actual format of each file depends on the configured decoder/encoder.\n\nIn the future we may add more information to the structure to support more functionality, for example a manifest file or additional resource files.\n\n## How to use\n\nCheck `examples/parquet-on-s3`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpopbones%2Fbundlr-go","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpopbones%2Fbundlr-go","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpopbones%2Fbundlr-go/lists"}