https://github.com/nftstorage/carbites
***Notice: This repository is no longer maintained.*** 🚗 🚙 🚕 Chunking for CAR files. Split a single CAR into multiple CARs.
- Host: GitHub
- URL: https://github.com/nftstorage/carbites
- Owner: nftstorage
- License: other
- Created: 2021-06-02T00:07:45.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2024-04-01T11:00:40.000Z (over 1 year ago)
- Last Synced: 2025-06-19T13:05:54.717Z (4 months ago)
- Topics: car, chunking, cid, ipld, multiformats, splitting
- Language: JavaScript
- Homepage:
- Size: 322 KB
- Stars: 24
- Watchers: 1
- Forks: 7
- Open Issues: 3
# carbites
Chunking for [CAR files](https://ipld.io/specs/transport/car/). Split a single CAR into multiple CARs.
## Install
```sh
npm install carbites
```

## Usage
Carbites supports 3 different strategies:
1. [**Simple**](#simple) (default) - fast but naive, only the first CAR output has a root CID, subsequent CARs have a placeholder "empty" CID.
2. [**Rooted**](#rooted) - like simple, but creates a custom root node to ensure all blocks in a CAR are referenced.
3. [**Treewalk**](#treewalk) - walks the DAG to pack sub-graphs into each CAR file that is output. Every CAR has the same root CID, but contains a different portion of the DAG.

### Simple
```js
import { CarSplitter } from 'carbites'
import { CarReader } from '@ipld/car'
import fs from 'fs'

const bigCar = await CarReader.fromIterable(fs.createReadStream('/path/to/big.car'))
const targetSize = 1024 * 1024 * 100 // chunk to ~100MB CARs
const splitter = new CarSplitter(bigCar, targetSize) // (simple strategy)

for await (const car of splitter.cars()) {
  // Each `car` is an AsyncIterable<Uint8Array>
}
```

⚠️ Note: The first CAR output has roots in the header, subsequent CARs have an empty root CID [`bafkqaaa`](https://cid.ipfs.io/#bafkqaaa) as [recommended](https://ipld.io/specs/transport/car/carv1/#number-of-roots).
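The simple strategy can be pictured as greedy packing by size. The sketch below is an illustration only, not carbites' implementation: `chunkBlocks` and the `{ cid, bytes }` block shape with string CIDs are hypothetical, and the exact point at which carbites closes a chunk is an assumption. CAR header and framing bytes are ignored.

```js
// Placeholder "empty" CID used as the root of every CAR after the first.
const EMPTY_CID = 'bafkqaaa'

/**
 * Greedily pack blocks into chunks of roughly `targetSize` bytes.
 * Hypothetical helper: blocks are `{ cid: string, bytes: Uint8Array }`.
 */
function chunkBlocks (roots, blocks, targetSize) {
  const chunks = []
  let current = null
  for (const block of blocks) {
    // Open a new chunk when there is none yet, or when adding this block
    // would push a non-empty chunk past the target size.
    const overflows = current !== null &&
      current.blocks.length > 0 &&
      current.size + block.bytes.length > targetSize
    if (current === null || overflows) {
      // Only the first chunk carries the original roots.
      current = { roots: chunks.length === 0 ? roots : [EMPTY_CID], blocks: [], size: 0 }
      chunks.push(current)
    }
    current.blocks.push(block)
    current.size += block.bytes.length
  }
  return chunks
}

// Five 40-byte blocks with a 100-byte target give chunks of 2, 2 and 1 blocks.
const blocks = ['b1', 'b2', 'b3', 'b4', 'b5'].map(cid => ({ cid, bytes: new Uint8Array(40) }))
const chunks = chunkBlocks(['b1'], blocks, 100)
console.log(chunks.map(c => [c.roots, c.blocks.length]))
```

Only the first chunk keeps the real root; every later chunk gets the placeholder, which is why the order of the split CARs matters when joining them again.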
### Rooted
Instead of an empty CID, carbites can generate a special root node for each split CAR that references all the blocks _and_ the original roots (only in the first CAR). To do this, use the `RootedCarSplitter` constructor. When reading/extracting data from the CARs, the root node should be discarded.
Example
```js
import { RootedCarSplitter } from 'carbites/rooted'
import { CarReader } from '@ipld/car/reader'
import * as dagCbor from '@ipld/dag-cbor'
import fs from 'fs'

const bigCar = await CarReader.fromIterable(fs.createReadStream('/path/to/big.car'))
const targetSize = 1024 * 1024 * 100 // chunk to ~100MB CARs
const splitter = new RootedCarSplitter(bigCar, targetSize)

const cars = splitter.cars()
// Every CAR has a single root - a CBOR node that is a tuple of `/carbites/1`,
// an array of root CIDs and an array of block CIDs.
// e.g. ['/carbites/1', ['bafkroot'], ['bafy1', 'bafy2']]

const { done, value: car } = await cars.next()
const reader = await CarReader.fromIterable(car)
const rootCids = await reader.getRoots()
// `reader.get()` resolves to a block ({ cid, bytes }), so decode its bytes.
const rootBlock = await reader.get(rootCids[0])
const rootNode = dagCbor.decode(rootBlock.bytes)

console.log(rootNode[0]) // /carbites/1
console.log(rootNode[1]) // Root CIDs (only in first CAR)
/*
[
CID(bafybeictvyf6polqzgop3jt32owubfmsg3kl226omqrfte4eyidubc4rpq)
]
*/
console.log(rootNode[2]) // Block CIDs (all blocks in this CAR)
/*
[
CID(bafybeictvyf6polqzgop3jt32owubfmsg3kl226omqrfte4eyidubc4rpq),
CID(bafyreihcsxqhd6agqpboc3wrlvpy5bwuxctv5upicdnt3u2wojv4exxl24),
CID(bafyreiasq7d2ihbqm5xvhjjzlmzsensuadrpmpt2tkjsuwq42xpa34qevu)
]
*/
```

The root node is limited to 4MB in size (the largest message IPFS will bitswap). Depending on the settings used to construct the DAG in the CAR, this may mean a split CAR size limit of around 30GiB.
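When reading the split CARs back, the generated root node has to be told apart from the data blocks so it can be discarded. A minimal sketch of that check on an already decoded node, based on the tuple layout described above; `isCarbitesRootNode` is a hypothetical helper, not part of the carbites API, and CIDs are shown as plain strings for brevity.

```js
/**
 * Return true if `node` looks like a carbites root node:
 * ['/carbites/1', [...root CIDs], [...block CIDs]].
 */
function isCarbitesRootNode (node) {
  return Array.isArray(node) &&
    node.length === 3 &&
    node[0] === '/carbites/1' &&
    Array.isArray(node[1]) &&
    Array.isArray(node[2])
}

const decodedNodes = [
  ['/carbites/1', ['bafkroot'], ['bafy1', 'bafy2']], // generated root node
  { name: 'hello.txt', size: 5 } // ordinary data node
]

// Keep only the nodes that belong to the original DAG.
const dataNodes = decodedNodes.filter(node => !isCarbitesRootNode(node))
console.log(dataNodes.length)
```

A shape check like this could in principle match user data, so comparing a block's CID against the CAR's root CID is likely the more reliable way to find the node to drop.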
### Treewalk
Every CAR file has the _same_ root CID but a different portion of the DAG. The DAG is traversed from the root node and each block is decoded and links extracted in order to determine which sub-graph to include in each CAR.
Example
```js
import { TreewalkCarSplitter } from 'carbites/treewalk'
import { CarReader } from '@ipld/car/reader'
import * as dagCbor from '@ipld/dag-cbor'
import fs from 'fs'

const bigCar = await CarReader.fromIterable(fs.createReadStream('/path/to/big.car'))
const [rootCid] = await bigCar.getRoots()
const targetSize = 1024 * 1024 * 100 // chunk to ~100MB CARs
const splitter = new TreewalkCarSplitter(bigCar, targetSize)

for await (const car of splitter.cars()) {
  // Each `car` is an AsyncIterable<Uint8Array>
  const reader = await CarReader.fromIterable(car)
  const [splitCarRootCid] = await reader.getRoots()
  console.assert(rootCid.equals(splitCarRootCid)) // all cars will have the same root
}
```

### CLI
Install the CLI tool to use Carbites from the comfort of your terminal:
```sh
npm i -g carbites-cli

# Split a big CAR into many smaller CARs
carbites split big.car --size 100MB --strategy simple # (default size & strategy)

# Join many split CARs back into a single CAR.
carbites join big-0.car big-1.car ...
# Note: not a tool for joining arbitrary CARs together! The split CARs MUST
# belong to the same CAR and big-0.car should be the first argument.
```

## API
* [`class CarSplitter`](#class-carsplitter)
    * [`constructor(car: CarReader, targetSize: number)`](#constructorcar-carreader-targetsize-number)
    * [`cars(): AsyncGenerator<AsyncIterable<Uint8Array> & RootsReader>`](#cars-asyncgeneratorasynciterableuint8array--rootsreader)
    * [`static async fromBlob(blob: Blob, targetSize: number): CarSplitter`](#static-async-fromblobblob-blob-targetsize-number-carsplitter)
    * [`static async fromIterable(iterable: AsyncIterable<Uint8Array>, targetSize: number): CarSplitter`](#static-async-fromiterableiterable-asynciterableuint8array-targetsize-number-carsplitter)
* [`class CarJoiner`](#class-carjoiner)
    * [`constructor(cars: Iterable<AsyncIterable<Uint8Array>>)`](#constructorcars-iterableasynciterableuint8array)
    * [`car(): AsyncGenerator<Uint8Array>`](#car-asyncgeneratoruint8array)
* [`class RootedCarSplitter`](#class-rootedcarsplitter)
* [`class RootedCarJoiner`](#class-rootedcarjoiner)
* [`class TreewalkCarSplitter`](#class-treewalkcarsplitter)
* [`class TreewalkCarJoiner`](#class-treewalkcarjoiner)

### `class CarSplitter`
Split a CAR file into several smaller CAR files.
Import in the browser:
```js
import { CarSplitter } from 'https://cdn.skypack.dev/carbites'
```

Import in Node.js:
```js
import { CarSplitter } from 'carbites'
```

Note: This is an alias of `SimpleCarSplitter` - the default strategy for splitting CARs.
#### `constructor(car: CarReader, targetSize: number)`
Create a new `CarSplitter` for the passed CAR file, aiming to generate CARs of around `targetSize` bytes in size.
#### `cars(): AsyncGenerator<AsyncIterable<Uint8Array> & RootsReader>`
Split the CAR file and create multiple smaller CAR files. Returns an `AsyncGenerator` that yields the split CAR files (of type `AsyncIterable<Uint8Array>`).
The CAR files output also implement the [`RootsReader`](https://github.com/ipld/js-car/blob/8c74dc3c7273213b83f4610e4f88cf1ad2830fa6/api.ts#L18-L21) interface from `@ipld/car`, which means you can call `getRoots(): Promise<CID[]>` to obtain the root CIDs.
#### `static async fromBlob(blob: Blob, targetSize: number): CarSplitter`
Convenience function to create a new `CarSplitter` from a [blob](https://developer.mozilla.org/en-US/docs/Web/API/Blob) of CAR file data.
#### `static async fromIterable(iterable: AsyncIterable<Uint8Array>, targetSize: number): CarSplitter`
Convenience function to create a new `CarSplitter` from an `AsyncIterable<Uint8Array>` of CAR file data.
### `class CarJoiner`
Join together split CAR files into a single big CAR.
Import in the browser:
```js
import { CarJoiner } from 'https://cdn.skypack.dev/carbites'
```

Import in Node.js:
```js
import { CarJoiner } from 'carbites'
```

Note: This is an alias of `SimpleCarJoiner` - a joiner for the default CAR splitting strategy.
#### `constructor(cars: Iterable<AsyncIterable<Uint8Array>>)`
Create a new `CarJoiner` for joining the passed CAR files together.
#### `car(): AsyncGenerator<Uint8Array>`
Join the CAR files together and return the joined CAR.
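For the simple strategy, joining can be pictured as follows. This is a sketch under assumptions, not the library's code: `joinChunks` and the `{ roots, blocks }` chunk shape are hypothetical, and it assumes the chunks are passed in their original order so that the first one carries the real roots.

```js
/**
 * Rebuild one logical CAR from ordered chunks.
 * Hypothetical shape: each chunk is `{ roots: string[], blocks: object[] }`.
 */
function joinChunks (chunks) {
  if (chunks.length === 0) throw new Error('missing CARs')
  return {
    // The roots of the joined CAR come from the first chunk only.
    roots: chunks[0].roots,
    // Blocks are concatenated in chunk order.
    blocks: chunks.flatMap(chunk => chunk.blocks)
  }
}

const joined = joinChunks([
  { roots: ['bafyroot'], blocks: [{ cid: 'bafyroot' }, { cid: 'bafy1' }] },
  { roots: ['bafkqaaa'], blocks: [{ cid: 'bafy2' }] }
])
console.log(joined.roots, joined.blocks.map(b => b.cid))
```

Because the roots are taken from the first chunk, passing the split CARs in a different order would produce a CAR with the placeholder root.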
### `class RootedCarSplitter`
Split a CAR file into several smaller CAR files ensuring every CAR file contains a single root node that references all the blocks _and_ the original roots (only in the first CAR). When reading/extracting data from the CARs, the root node should be discarded.
Import in the browser:
```js
import { RootedCarSplitter } from 'https://cdn.skypack.dev/carbites/rooted'
```

Import in Node.js:
```js
import { RootedCarSplitter } from 'carbites/rooted'
```

The API is the same as for [`CarSplitter`](#class-carsplitter).
#### Root Node Format
The root node is a `dag-cbor` node that is a tuple of the string `/carbites/1`, an array of root CIDs (only seen in first CAR) and an array of block CIDs (all the blocks in the CAR). e.g. `['/carbites/1', ['bafkroot'], ['bafy1', 'bafy2']]`.
Note: The root node is limited to 4MB in size (the largest message IPFS will bitswap). Depending on the settings used to construct the DAG in the CAR, this may mean a split CAR size limit of around 30GiB.
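A back-of-the-envelope check of that figure. The numbers are assumptions, not taken from carbites: 4 MiB for the root node, 41 bytes per link (a 36-byte CIDv1 with a sha2-256 hash, plus 5 bytes of dag-cbor tag and byte-string framing), and 256 KiB data blocks. The small fixed overhead of the tuple itself is ignored.

```js
const MAX_ROOT_NODE_BYTES = 4 * 1024 * 1024 // assumed 4 MiB limit
const BYTES_PER_LINK = 41 // assumed: 36-byte CID + 5 bytes of dag-cbor framing
const BLOCK_SIZE = 256 * 1024 // assumed 256 KiB blocks

// How many block CIDs fit in one root node.
const maxLinks = Math.floor(MAX_ROOT_NODE_BYTES / BYTES_PER_LINK)
// Total block data those links can reference.
const maxCarBytes = maxLinks * BLOCK_SIZE

console.log(maxLinks) // 102300
console.log((maxCarBytes / 2 ** 30).toFixed(1) + ' GiB') // 25.0 GiB
```

Under these assumptions the limit comes out at roughly 25 GiB, the same order of magnitude as the figure quoted above; larger blocks raise it and smaller blocks lower it.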
### `class RootedCarJoiner`
Join together CAR files that were split using [`RootedCarSplitter`](#class-rootedcarsplitter).
The API is the same as for [`CarJoiner`](#class-carjoiner).
### `class TreewalkCarSplitter`
Split a CAR file into several smaller CAR files. Every CAR file has the _same_ root CID but a different portion of the DAG. The DAG is traversed from the root node and each block is decoded and links extracted in order to determine which sub-graph to include in each CAR.
Import in the browser:
```js
import { TreewalkCarSplitter } from 'https://cdn.skypack.dev/carbites/treewalk'
```

Import in Node.js:
```js
import { TreewalkCarSplitter } from 'carbites/treewalk'
```

The API is the same as for [`CarSplitter`](#class-carsplitter).
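The traversal described above can be pictured with a toy in-memory DAG. This is a sketch, not carbites' implementation: `treewalkChunks` and the `Map` of CID string to `{ size, links }` are hypothetical, the input is assumed to be a tree (no shared blocks), and repeating ancestor blocks in each chunk is an assumption made here so that every sub-graph stays reachable from the shared root.

```js
/**
 * Pack a tree of blocks into chunks of roughly `targetSize` bytes.
 * Hypothetical model: `dag` maps a CID string to `{ size, links }`.
 */
function treewalkChunks (dag, rootCid, targetSize) {
  const chunks = []
  let current = null

  const add = (cid) => {
    current.cids.push(cid)
    current.size += dag.get(cid).size
  }

  // Open a new chunk seeded with `ancestors` (the path from the root), so the
  // blocks that follow stay reachable from the shared root.
  const startChunk = (ancestors) => {
    current = { root: rootCid, cids: [], size: 0 }
    chunks.push(current)
    ancestors.forEach(cid => add(cid))
  }

  const visit = (cid, ancestors) => {
    const { size, links } = dag.get(cid)
    // Split when this block would overflow the current chunk, unless the chunk
    // holds no more blocks than the path itself (avoids splitting forever).
    const overflows = current !== null &&
      current.size + size > targetSize &&
      current.cids.length > ancestors.length
    if (current === null || overflows) startChunk(ancestors)
    add(cid)
    // Depth-first: follow each link with this block appended to the path.
    for (const link of links) visit(link, [...ancestors, cid])
  }

  visit(rootCid, [])
  return chunks
}

const dag = new Map([
  ['root', { size: 10, links: ['a', 'b'] }],
  ['a', { size: 40, links: ['a1', 'a2'] }],
  ['a1', { size: 40, links: [] }],
  ['a2', { size: 40, links: [] }],
  ['b', { size: 40, links: [] }]
])
const chunks = treewalkChunks(dag, 'root', 100)
console.log(chunks.map(c => c.cids))
// [ [ 'root', 'a', 'a1' ], [ 'root', 'a', 'a2' ], [ 'root', 'b' ] ]
```

Every chunk starts at `root`, which mirrors the property that all split CARs share the same root CID while holding different portions of the DAG.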
### `class TreewalkCarJoiner`
Join together CAR files that were split using [`TreewalkCarSplitter`](#class-treewalkcarsplitter).
The API is the same as for [`CarJoiner`](#class-carjoiner).
## Releasing
You can publish by either running `npm publish` in the `dist` directory or using `npx ipjs publish`.
## Related
* [Carbites in Golang](https://github.com/alanshaw/go-carbites)

## Contribute
Feel free to dive in! [Open an issue](https://github.com/nftstorage/carbites/issues/new) or submit PRs.
## License
Dual-licensed under [MIT](https://github.com/nftstorage/carbites/blob/main/LICENSE-MIT) + [Apache 2.0](https://github.com/nftstorage/carbites/blob/main/LICENSE-APACHE)