https://github.com/davidbuchanan314/pack-analysis

Reverse engineering the https://pack.ac/ file format

Host: GitHub
URL: https://github.com/davidbuchanan314/pack-analysis
Owner: DavidBuchanan314
License: mit
Created: 2024-03-23T03:47:35.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-03-26T07:38:45.000Z (over 1 year ago)
Last Synced: 2025-02-12T20:48:29.779Z (5 months ago)
Language: Python
Size: 32.2 KB
Stars: 1
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# pack-analysis

The [`pack`](https://github.com/PackOrganization/Pack) file format is an interesting new file archive format. However, they forgot to document the format, so I'll do my best to fill in the gaps.

This document is a work-in-progress based on reverse-engineering (I don't know Pascal), and may contain incorrect information.

A `.pack` file is an sqlite3 database, but it starts with the magic bytes `Pack\x00\x20` rather than the usual `SQLite format 3`. Standard sqlite3 bindings therefore refuse to open pack files as-is. For seamless support you'll need to build your own sqlite with a custom [`SQLITE_FILE_HEADER`](https://github.com/sqlite/sqlite/blob/378bf82e2bc09734b8c5869f9b148efe37d29527/src/btreeInt.h#L236-L250), or use a custom VFS.
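
A third, cruder workaround is to copy the file and restore the standard header on the copy. The sketch below is not what `unpack.py` or the reference implementation does; it assumes that only the 16-byte header string at the start of the file differs from a normal sqlite3 database, and `open_pack_copy` is a hypothetical helper name.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path

# Standard 16-byte SQLite header string, including the trailing NUL.
SQLITE_MAGIC = b"SQLite format 3\x00"
PACK_MAGIC = b"Pack"


def open_pack_copy(pack_path):
    """Copy a .pack file, restore the SQLite magic on the copy, and open it.

    Assumption: apart from the first 16 bytes, the file is an ordinary
    sqlite3 database. The original file is never modified.
    Returns (connection, path_of_temporary_copy).
    """
    src = Path(pack_path)
    with src.open("rb") as f:
        if f.read(4) != PACK_MAGIC:
            raise ValueError(f"{src} does not start with the Pack magic")

    tmp = tempfile.NamedTemporaryFile(suffix=".sqlite3", delete=False)
    tmp.close()
    shutil.copyfile(src, tmp.name)

    # Overwrite the first 16 bytes of the copy with the standard header.
    with open(tmp.name, "r+b") as f:
        f.write(SQLITE_MAGIC)

    return sqlite3.connect(tmp.name), Path(tmp.name)
```

The caller is responsible for closing the connection and deleting the temporary copy. This doubles disk usage for large archives, which is why a custom VFS or a patched sqlite build is the better long-term option.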

The two bytes after `Pack` are actually version information, per this [comment](https://news.ycombinator.com/item?id=39801059):

> Two byte after 'Pack' header in little endian as (1 (Draft) shl 13 + 0 (version 0) = 8192). Final would be 0, so the first Final version will be 0 shl 13 + 1 = 1. and the second will be 2. It is by design, so any Draft version gets a higher number, preventing future mixups.
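
Read that way, the observed bytes `\x00\x20` decode as the little-endian value 8192, i.e. draft stage 1, version 0. A minimal sketch of the decoding follows; `parse_pack_version` is a hypothetical name, and treating the low 13 bits as the version number (with everything above as the stage) is my reading of the comment, not a documented layout.

```python
import struct


def parse_pack_version(header: bytes):
    """Return (stage, version) from the first six bytes of a .pack file.

    Assumption: the 16-bit little-endian value after 'Pack' holds the
    stage in the bits above bit 12 (1 = Draft, 0 = Final) and the
    version number in the low 13 bits.
    """
    if len(header) < 6 or header[:4] != b"Pack":
        raise ValueError("not a Pack header")
    (value,) = struct.unpack_from("<H", header, 4)
    stage = value >> 13
    version = value & 0x1FFF
    return stage, version
```

For example, `parse_pack_version(b"Pack\x00\x20")` gives `(1, 0)`, and a header of `b"Pack\x01\x00"` (the first Final version described in the comment) gives `(0, 1)`.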

`unpack.py` implements a crude extraction utility that dumps all extracted files into the current working directory.

Beware, my implementation is very vulnerable to unbounded recursion and/or decompression bombs. I haven't checked whether the reference implementation is, too.

### Schema

There are 3 tables, `Content`, `Item`, and `ItemContent`. Their schemas are as follows:

```sql
CREATE TABLE Content(
	ID INTEGER PRIMARY KEY,
	Value BLOB
);
```

```sql
CREATE TABLE Item(
	ID INTEGER PRIMARY KEY,
	Parent INTEGER,
	Kind INTEGER,
	Name TEXT
);
```

```sql
CREATE TABLE ItemContent(
	ID INTEGER PRIMARY KEY,
	Item INTEGER,
	ItemPosition INTEGER,
	Content INTEGER,
	ContentPosition INTEGER,
	Size INTEGER
);
```

The data is stored, zstd-compressed, in the `Value` column of the `Content` table. Many small files may share a single blob; conversely, a single large file may be split across many blobs.

Files and directories are listed in the `Item` table. The root directory is implicitly ID 0 (Items with Parent=0 are therefore in the root directory).
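
To illustrate how the `Parent` links form a tree, here is a rough sketch that rebuilds an item's path by walking up to the implicit root. `item_path` and `max_depth` are hypothetical names, joining with `/` is my choice, and the cycle and depth checks are there because a hostile archive could otherwise make the walk loop forever. The meaning of `Kind` is not covered here.

```python
import sqlite3

ROOT_ID = 0  # per the text above: the root directory is implicitly ID 0


def item_path(conn, item_id, max_depth=4096):
    """Rebuild the path of an Item by following Parent links to the root.

    Raises ValueError on a Parent cycle or an overly deep chain, and
    KeyError if a referenced Item row does not exist.
    """
    parts = []
    seen = set()
    current = item_id
    while current != ROOT_ID:
        if current in seen or len(parts) >= max_depth:
            raise ValueError("Parent chain is cyclic or too deep")
        seen.add(current)
        row = conn.execute(
            "SELECT Parent, Name FROM Item WHERE ID = ?", (current,)
        ).fetchone()
        if row is None:
            raise KeyError(current)
        parent, name = row
        parts.append(name)
        current = parent
    # Names were collected leaf-first, so reverse them.
    return "/".join(reversed(parts))
```

A real extractor would also need to sanitise each `Name` (for instance `..` or absolute paths) before writing anything to disk; this sketch does not do that.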

The `ItemContent` table describes where to find the actual file contents, within the `Value` blob(s).

The offsets given in `ItemContent` refer to positions in the *decompressed* data. This means that if the file you're looking for starts midway through a blob, you need to decompress that blob from the start. This isn't quite as bad as it sounds, because `pack` seems to limit blobs to 8 MB (uncompressed size) each, by default.
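
Putting the three tables together, reassembling one file could look roughly like the sketch below. It is not the logic of `unpack.py`. It assumes, from the column names, that `ItemPosition` is the offset within the output file, `ContentPosition` is the offset within the decompressed blob, and `Size` is the chunk length. `read_item` is a hypothetical name, and the decompressor is passed in as a callable so that the example does not depend on any particular zstd binding.

```python
import sqlite3
from typing import Callable


def read_item(conn, item_id: int, decompress: Callable[[bytes], bytes]) -> bytes:
    """Reassemble the bytes of one Item from its ItemContent rows.

    `decompress` maps a Content.Value blob to its decompressed bytes.
    For real archives that would be a zstd decoder, ideally one that
    enforces an output size limit to guard against decompression bombs.
    """
    rows = conn.execute(
        "SELECT ic.ItemPosition, ic.ContentPosition, ic.Size, c.Value "
        "FROM ItemContent AS ic JOIN Content AS c ON c.ID = ic.Content "
        "WHERE ic.Item = ? ORDER BY ic.ItemPosition",
        (item_id,),
    ).fetchall()

    out = bytearray()
    for item_pos, content_pos, size, blob in rows:
        # Offsets are relative to the decompressed blob, so the whole
        # blob has to be decompressed even for a chunk in the middle.
        plain = decompress(blob)
        chunk = plain[content_pos:content_pos + size]
        if len(chunk) != size:
            raise ValueError("chunk extends past the end of its blob")
        if item_pos > len(out):
            # Zero-fill any gap before this chunk (assumed not to occur).
            out.extend(bytes(item_pos - len(out)))
        out[item_pos:item_pos + size] = chunk
    return bytes(out)
```

Because many small files can share one blob, a real extractor would probably cache the most recently decompressed blob rather than decompressing it again for every file.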
