{"id":51111543,"url":"https://github.com/rupor-github/metabib","last_synced_at":"2026-06-24T18:01:44.728Z","repository":{"id":366362073,"uuid":"1276014692","full_name":"rupor-github/metabib","owner":"rupor-github","description":"(WIP) Metadata extractor from Flibusta-style SQL dumps and FB2 archives. Replacement for InpxCreator/lib2inpx. ","archived":false,"fork":false,"pushed_at":"2026-06-21T12:44:19.000Z","size":114,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-21T14:39:08.361Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rupor-github.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-06-21T12:38:37.000Z","updated_at":"2026-06-21T14:21:54.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/rupor-github/metabib","commit_stats":null,"previous_names":["rupor-github/metabib"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/rupor-github/metabib","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rupor-github%2Fmetabib","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rupor-github%2Fmetabib/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rupor-github%2Fmetabib/releases","manifests_url":"https://repos.ecosyste.ms/api/
v1/hosts/GitHub/repositories/rupor-github%2Fmetabib/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rupor-github","download_url":"https://codeload.github.com/rupor-github/metabib/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rupor-github%2Fmetabib/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34743466,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-24T02:00:07.484Z","response_time":106,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-06-24T18:01:43.884Z","updated_at":"2026-06-24T18:01:44.720Z","avatar_url":"https://github.com/rupor-github.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch1\u003e\n    \u003cimg src=\"docs/library.svg\" style=\"vertical-align:middle; width:14%\" align=\"absmiddle\"/\u003e\n    \u003cspan style=\"vertical-align:middle;\"\u003e\u0026nbsp;\u0026nbsp;Metadata extractor from Flibusta-style SQL dumps and FB2 archives.\u003c/span\u003e\n\u003c/h1\u003e\n\n## metabib\n[![GitHub Release](https://img.shields.io/github/release/rupor-github/metabib.svg)](https://github.com/rupor-github/metabib/releases)\n\n`metabib` extracts metadata from Flibusta-style SQL dumps and FB2 archives into\nJSON Lines. 
It first builds cache manifests for database dumps and/or archives,\nthen merges those cached artifacts into final JSONL. The project is intended as\na modern replacement for the outdated\n[InpxCreator](https://github.com/rupor-github/InpxCreator), which is costly to\nmaintain across platforms because it depends on an embedded MySQL library that\nboth MySQL and MariaDB dropped many versions ago.\n\n`metabib` does not try to reproduce the full `lib2inpx`/InpxCreator feature\nset. The intentionally unported areas include:\n\n- library content formats other than FB2;\n- Librusec schema differences and Flibusta-specific assumptions;\n- historical dump schema changes and migration compatibility;\n- legacy INPX compatibility quirks for specific catalog readers;\n- creation of INPX daily updates, which is not currently supported because the\nlegacy approach has too many limitations.\n\nInstead, `metabib` aims to provide an easily parsable source of truth for the\ngrowing number of catalog programs. INPX is useful as an interchange artifact,\nbut it is far from optimal as a primary metadata source: it carries limitations\nand assumptions from the program it was originally created for, MyHomeLib,\nrather than representing a neutral catalog model.\n\n`metabib` caches information extracted from SQL dumps and book archives into\nmanifest files so expensive extraction work can be reused later. Database dumps\nand archives have separate manifests, which makes it possible to update\ndatabase-derived metadata without re-parsing the whole archive set. 
Cached\nmanifests and combined output records are JSON data with well-defined schemas,\nmaking the resulting dataset easy to validate, transform, and consume from other\ntools.\n\n## What It Does\n\n`metabib` is organized around reusable processing passes:\n\n- `fetch` downloads new daily archive updates and SQL dumps from a configured\n  remote library profile;\n- `rollup` folds daily FB2 update ZIPs into size-bounded local archive ZIPs;\n- `cache` imports SQL dumps, queries database metadata, walks FB2 archive\n  entries, parses FB2 descriptions, and writes manifest files for each selected\n  source;\n- `merge` reads existing manifests and combines database-derived and\n  archive-derived metadata into final `metabib.record/1` JSONL records described\n  by `docs/metabib.schema.json`;\n- `mhl-inpx` consumes the merged JSONL dataset and metadata sidecar to produce a\n  MyHomeLib-compatible FB2 INPX without coupling the main extraction pipeline to\n  legacy output constraints.\n\nThe same transformation approach can support other derived artifacts later,\nincluding update lineages and differential update schemes.\n\n## MariaDB Binaries\n\nWhen the cache pass processes SQL dumps in the default managed mode, `metabib`\ndiscovers MariaDB binaries recursively in `./mariadb` first, then in `PATH`,\nstarts a private local MariaDB server, imports `*.sql` dumps with the discovered\n`mariadb` or `mysql` client, and stops the server when processing is done. 
It\ndoes not require a system database service.\n\nTo use an existing MariaDB service instead of the managed local instance, set\n`database.dsn` or `database.managed: false` in the configuration file.\n\nThe easiest portable setup for managed mode is to keep a local MariaDB unpacked\nnext to the `metabib` executable or project checkout.\n\nThis approach should allow `metabib` to run on any platform supported by Go that\nalso has recent MariaDB binaries available, whether those binaries come from the\nsystem, a system package, or a separately compiled distribution for that\nplatform. Finding suitable MariaDB binaries for a particular platform is the\nuser's responsibility.\n\nOn Windows, download MariaDB from \u003chttps://mariadb.org/download/\u003e, select the\nZIP archive package, and unzip it into a `mariadb` directory inside the `metabib`\ndirectory. `metabib` will discover binaries such as `mariadbd.exe`,\n`mariadb-install-db.exe`, and `mariadb.exe` from that tree automatically.\n\nThe same ZIP/tarball approach also works on Linux. 
On Linux it is often simpler\nto install the distribution package instead, for example:\n\n```sh\nsudo apt install mariadb-server -y\n```\n\nIf you only want the binaries available for `metabib` managed mode and do not\nwant MariaDB running as a system service, stop and disable the service after\ninstalling it, for example:\n\n```sh\nsudo systemctl disable --now mariadb\n```\n\nOn Synology, install the `MariaDB 10` package and point `metabib` at the packaged\nbinaries explicitly, for example:\n\n```yaml\nversion: 1\nprocessing:\n  manifests:\n    archive_dir: \"/volume4/backup/library/manifests\"\ndatabase:\n  server_path: \"/volume4/@appstore/MariaDB10/usr/local/mariadb10.11/bin/mariadbd\"\n  install_db_path: \"/volume4/@appstore/MariaDB10/usr/local/mariadb10.11/bin/mariadb-install-db\"\n  client_path: \"/volume4/@appstore/MariaDB10/usr/local/mariadb10.11/bin/mariadb\"\n```\n\n## Usage\n\n### Flibusta Script\n\n`scripts/fb2_flibusta.sh` automates the common Flibusta FB2 workflow. The\n`metabib` executable is expected to be in the same directory as the script; if\n`metabib.yaml` exists there, it is passed to every `metabib` invocation.\n\nRun the full update workflow:\n\n```sh\nscripts/fb2_flibusta.sh /volume4/backup/library full\n```\n\nRun indexing only, using existing local archives and the latest existing SQL dump\ndirectory matching `\u003clibrary-root\u003e/flibusta_*`:\n\n```sh\nscripts/fb2_flibusta.sh /volume4/backup/library reindex\n```\n\nBoth modes accept an optional third argument naming the user account whose home\ndirectory should be used as the working directory. This is useful for Synology\nTask Scheduler setups:\n\n```sh\nscripts/fb2_flibusta.sh /volume4/backup/library full myuser\n```\n\nThe `full` mode runs `fetch`, `rollup`, `cache`, `merge`, and `mhl-inpx`. 
It exits\nearly when no new daily archives are downloaded or when rollup does not finalize a\nnew archive. 
The `reindex` mode skips download and rollup and reruns `cache`,\n`merge`, and `mhl-inpx` from already available data.\n\nExpected library layout under `\u003clibrary-root\u003e`:\n\n- `flibusta/`: finalized local FB2 archives and the active `.merging` archive.\n- `upd_flibusta/`: downloaded daily update archives.\n- `flibusta_\u003ctimestamp\u003e/`: downloaded SQL dumps.\n- `inpx/`: generated INPX files and merged JSONL sidecars/parts.\n\nThe script writes a console log next to itself named like\n`flibusta_full_20260622_103000.log` or `flibusta_reindex_20260622_103000.log`.\nTo get a single log that combines script output and `metabib` debug messages,\nconfigure logging in the `metabib.yaml` next to the script like this:\n\n```yaml\nlogging:\n  console:\n    level: debug\n  file:\n    level: none\n```\n\nWith that configuration, the script log includes phase separators, `metabib`\ndebug messages, and MariaDB process/client output. The script no longer manages\nor renames `metabib.log`.\n\n### Fetch Remote Updates\n\nDownload new daily archive ZIPs and current SQL dumps using a configured remote\nlibrary profile:\n\n```sh\nmetabib fetch --library flibusta --to upd_flibusta --tosql flibusta_20260622 --continue\n```\n\n`fetch` takes over the role of the old `libget2` tool. It reads profiles from the\n`fetch` section of the YAML configuration, discovers the last local book ID from\nexisting `fb2-*.zip` or `fb2-*.merging` archives in `--to`, downloads only newer\ndaily archive updates, and decompresses downloaded `*.sql.gz` dumps into `--tosql`.\nWhen `--tosql` is omitted, the SQL output directory name is generated from the\nlibrary name and the current UTC timestamp. 
Use `--nosql` to download archive updates only.\n\nThe command preserves the old `libget2` automation exit-code contract: exit code\n`0` means no new archive updates were downloaded, exit code `1` means an error\noccurred, and exit code `2` means one or more new archive updates were downloaded.\nUse exit code `2` to decide whether archive rollup or index/cache rebuild work is\nneeded.\n\nAvailable `fetch` arguments:\n\n- `--library NAME`, `-l NAME`: fetch profile name from configuration. Default is\n  `flibusta`.\n- `--to DIR`, `-o DIR`: required destination directory for daily archive ZIPs.\n- `--tosql DIR`: destination directory for decompressed SQL dump files.\n- `--nosql`: skip SQL dump downloads.\n- `--retry N`: download attempts per index or file. Default is `3`.\n- `--timeout SECONDS`: per-request timeout. Default is `20`.\n- `--chunksize MB`: download chunk size used while streaming files. Default is\n  `10`.\n- `--continue`: resume partial downloads when the server supports HTTP range requests.\n- `--sticky`: ignore HTTP redirects and keep using the original host.\n\n### Roll Up Daily Archives\n\nRoll downloaded daily update ZIPs into local size-bounded FB2 archives:\n\n```sh\nmetabib rollup --archives flibusta --updates upd_flibusta --keep-updates\n```\n\n`rollup` takes over the role of the old `libmerge` tool. It keeps finalized\narchives and the active `.merging` archive in `--archives`, reads daily update ZIPs\nfrom each `--updates` directory, and appends ZIP entries without recompressing\nthem. If no `--updates` directory is provided, `rollup` looks for update ZIPs in\n`--archives` itself. 
Generated archive names use the ID width of the existing `.merging` archive\nor the latest finalized `fb2-*.zip`; new archive directories default to 10-digit IDs.\n\nThe command preserves the old `libmerge` automation exit-code contract: exit code\n`0` means no finalized archive was produced, exit code `1` means an error\noccurred, and exit code `2` means one or more finalized `fb2-*.zip` archives were\ncreated. Use exit code `2` to decide whether cache/index rebuild work is needed.\n\nAvailable `rollup` arguments:\n\n- `--archives DIR`, `-a DIR`: required directory for finalized `fb2-*.zip`\n  archives and the active `fb2-*.merging` archive.\n- `--updates DIR`, `-u DIR`: directory containing daily update ZIPs; can be\n  repeated. Defaults to `--archives` when omitted.\n- `--size MB`: finalized archive target size in decimal megabytes. Default is\n  `2000`.\n- `--keep-updates`: keep consumed daily update ZIPs instead of removing them.\n\n### Build Cache Manifests\n\nBuild manifests for the selected sources, then merge them into JSONL:\n\n```sh\nmetabib cache \\\n  --database-dumps /path/to/sql-dumps \\\n  --archives /path/to/flibusta\n\nmetabib merge \\\n  --database-dumps /path/to/sql-dumps \\\n  --archives /path/to/flibusta \\\n  --output metabib\n```\n\nTo reuse an already imported database:\n\n```sh\nmetabib cache --rebuild --no-import --database-dumps /path/to/sql-dumps\n```\n\nForce a clean managed database rebuild:\n\n```sh\nmetabib cache --rebuild --database-dumps /path/to/sql-dumps --db-overwrite\n```\n\nUse an existing MariaDB service instead of a managed one, with `database.dsn` or\n`database.managed: false` set in `metabib.yaml`:\n\n```sh\nmetabib cache --rebuild --database-dumps /path/to/sql-dumps --config metabib.yaml\n```\n\nBuild only archive manifests without starting MariaDB:\n\n```sh\nmetabib cache --archives /path/to/flibusta\n```\n\n`cache` builds missing selected manifests by default. 
Existing manifests are\nchecked using source modification times; stale or invalid manifests fail unless\n`--rebuild` is used. 
Use `cache --check-md5` to additionally verify MD5 checksums\nrecorded in existing manifests.\n\nManifests are portable across directories and machines. Stored absolute paths are\nkept as provenance, but manifest matching uses archive or dump file names,\nrecorded metadata, processing settings, timestamps for freshness, and optional\nMD5 checksums when `--check-md5` is enabled.\n\nBy default, `cache` requires all SQL dump files to report the same dump date\nbefore import. Use `cache --allow-dump-date-mismatch` to accept mixed dump dates;\nper-file dump dates are still recorded, while the top-level manifest `dump_date`\nis omitted.\n\nMerge from archives only, database only, or both:\n\n```sh\nmetabib merge --archives /path/to/flibusta --output archive-only\nmetabib merge --database-dumps /path/to/sql-dumps --output database-only\nmetabib merge --database-dumps /path/to/sql-dumps --archives /path/to/flibusta --output combined\n```\n\n`merge` never starts MariaDB and never reads archives directly. It fails when a\nselected manifest is missing, invalid, or stale. Use `--check-md5` for full\nsource checksum verification, or `--allow-stale` to warn and continue with stale\nmanifests.\n\nMerged JSONL output is zstd-compressed by default, using the same compression\nlevel as manifest files. Use `--output-compression zstd`, `gz`, `zip`, or `none`\nto select a different output container. The `--output` value is an output prefix,\nnot a final file name: `metabib merge --output all` writes files named like\n`all.\u003cbookid_start\u003e-\u003cbookid_end\u003e.jsonl.zst`. Existing output files are replaced;\nwhen that happens, `metabib` logs an overwrite warning.\n\n`merge` also writes a metadata sidecar using the same compression mode, for\nexample `all.meta.json.zst`. 
It records the database dump date and archive entry\nlayout needed for exact MyHomeLib INPX generation.\n\n### INPX Generation\n\nBuild a MyHomeLib-compatible FB2 INPX from merged JSONL parts:\n\n```sh\nmetabib mhl-inpx --input all --output flibusta\n```\n\n`mhl-inpx` is intentionally FB2-only. It consumes the merged JSONL dataset and the\nmerge metadata sidecar; it does not read SQL dumps, start MariaDB, or parse FB2\narchives directly. FB2 fallback metadata is read from\n`sources.fb2.description.title_info`.\n\nAvailable `mhl-inpx` arguments:\n\n- `--input PREFIX`, `-i PREFIX`: required input prefix. `metabib mhl-inpx --input all`\n  discovers one `all.meta.json*` sidecar and all matching `all.*.jsonl*` parts,\n  including uncompressed, zstd, gzip, and ZIP-compressed merge outputs.\n- `--output PREFIX`, `-o PREFIX`: required output prefix. The dump date from merge\n  metadata is appended automatically, so `--output flibusta` writes a file named\n  like `flibusta_20260603.inpx`.\n- `--format MODE`: INPX record layout. Supported values are `2x` and `ruks`.\n  Default is `2x`, matching the classic MyHomeLib/lib2inpx format. `ruks` appends\n  MD5 and replacement fields when available.\n- `--sequence MODE`: database sequence selection. Supported values are `author`,\n  `publisher`, and `ignore`. Default is `author`, matching lib2inpx FB2 mode.\n- `--prefer-fb2 MODE`: how FB2 metadata is used relative to database metadata for\n  authors and sequences. Supported values are `ignore`, `merge`, `complement`,\n  and `replace`. Default is `complement`, matching the historical Flibusta script:\n  database authors and sequence data are preferred when present, and FB2 metadata\n  fills missing values. 
Use `replace` when FB2 author order should win.\n\nINPX-specific defaults live under the `inpx` section of the YAML configuration.\nThey include MyHomeLib field length limits and the `collection.info` template.\nThe default template is lib2inpx-compatible.\n\n```yaml\ninpx:\n  quick_fix: true\n  comment_template: \"\\ufeff%s FB2 - %s\\r\\n%s\\r\\n65536\\r\\nЛокальные архивы библиотеки %s (FB2) %s\"\n```\n\nTemplate arguments are, in order: library name, display date, generated collection\nname, library name, and display date. If you replace the template and still need\nMyHomeLib/lib2inpx compatibility, keep the leading `\\ufeff` BOM.\n\nExisting INPX output is replaced only after the new archive is fully written. If\nan existing file is overwritten, `metabib` logs a warning. During generation,\n`metabib` logs the selected metadata, input part count, record loading progress,\none live message per created `.inp` member, and final aggregate INPX statistics.\n\nManifest cache files are zstd-compressed JSONL payloads with the `.manifest.zst`\nsuffix, for example `lib.manifest.zst` or `database.manifest.zst`. When archive\nmanifests are stored in a central `processing.manifests.archive_dir`, the first\narchive with a given basename keeps the usual manifest name. 
Later archives with the\nsame basename receive a source-qualified manifest name, and a warning is logged.\n\nUse the global `--verbose` flag to enable detailed progress reporting.\n\nWhen `--output-part-size` is used, output files are named\nwith zero-padded book-id ranges so that they sort naturally, for example:\n\n```text\nmetabib.0000000001-0000120345.jsonl.zst\nmetabib.0000120346-0000240872.jsonl.zst\n```\n\nDump the default configuration:\n\n```sh\nmetabib dumpconfig --default metabib.yaml\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frupor-github%2Fmetabib","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frupor-github%2Fmetabib","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frupor-github%2Fmetabib/lists"}