{"id":43223275,"url":"https://github.com/gnames/gndb","last_synced_at":"2026-03-06T23:05:29.302Z","repository":{"id":317770903,"uuid":"1068026576","full_name":"gnames/gndb","owner":"gnames","description":"GNdb creates and populates database that enables install of GNverifier locally or remotely.","archived":false,"fork":false,"pushed_at":"2025-11-13T10:32:21.000Z","size":101941,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-11-13T12:19:41.771Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gnames.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-10-01T18:36:22.000Z","updated_at":"2025-10-31T20:51:11.000Z","dependencies_parsed_at":null,"dependency_job_id":"095b5b75-c22c-47e9-a0bb-3dd3ceb79737","html_url":"https://github.com/gnames/gndb","commit_stats":null,"previous_names":["gnames/gndb"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/gnames/gndb","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gnames%2Fgndb","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gnames%2Fgndb/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gnames%2Fgndb/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gnames%2Fgndb/manifests","owner_u
rl":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gnames","download_url":"https://codeload.github.com/gnames/gndb/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gnames%2Fgndb/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28974534,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-01T08:16:14.655Z","status":"ssl_error","status_checked_at":"2026-02-01T08:06:51.373Z","response_time":56,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-02-01T09:15:48.423Z","updated_at":"2026-03-06T23:05:29.284Z","avatar_url":"https://github.com/gnames.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# GNdb\n\n[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.18895372.svg)](https://doi.org/10.5281/zenodo.18895372)\n\nGNdb is a command-line tool for creating and managing a PostgreSQL database\nfor a local [GNverifier] instance.\n\n\u003c!-- vim-markdown-toc GFM --\u003e\n\n* [Introduction](#introduction)\n* [Prerequisites](#prerequisites)\n* [Installation](#installation)\n  * [Download binary](#download-binary)\n  * [Install with Go](#install-with-go)\n  * [Build from source](#build-from-source)\n* [Quick Start](#quick-start)\n* [Next Steps: Running 
GNverifier](#next-steps-running-gnverifier)\n* [Commands](#commands)\n  * [create](#create)\n  * [populate](#populate)\n  * [optimize](#optimize)\n  * [migrate](#migrate)\n* [Configuration](#configuration)\n  * [Config file](#config-file)\n  * [Environment variables](#environment-variables)\n  * [CLI flags](#cli-flags)\n* [Data Sources](#data-sources)\n  * [Standard sources](#standard-sources)\n  * [Custom sources](#custom-sources)\n  * [SFGA file formats](#sfga-file-formats)\n  * [File naming convention](#file-naming-convention)\n  * [Remote sources](#remote-sources)\n* [Artificial Intelligence Policy](#artificial-intelligence-policy)\n* [Authors](#authors)\n* [License](#license)\n\n\u003c!-- vim-markdown-toc --\u003e\n\n## Introduction\n\n[GNverifier] is a scientific name verification service that reconciles\nscientific names against multiple biodiversity data sources. It detects\nmisspellings via fuzzy matching, identifies accepted names for taxa, and\nretrieves vernacular/common names. GNdb is the tool that builds and\nmaintains the PostgreSQL database that a GNverifier server runs against.\n\nGNverifier is available as a centralized service, but it does not always\nhave a particular data source or the most recent version of one. A local\ninstance gives you full control over which sources are included and when\nthey are updated.\n\n**When to use a local instance:**\n\n- You need a data source not available in the central service\n- You have private or institutional data not suitable for a public service\n- You need a specific version or snapshot of a data source\n- You are deploying GNverifier in an offline or air-gapped environment\n- You need a dedicated high-throughput server for your organization\n\n## Prerequisites\n\nGNdb stores data in a `gnames` PostgreSQL database. Install PostgreSQL for\nyour operating system, create the database, and make sure your user has\nthe necessary permissions. 
It also helps to tune `postgresql.conf`\nfor the CPU and memory available on the machine.\n\n```bash\n# Example: create the database\ncreatedb gnames\n```\n\nEdit `~/.config/gndb/config.yaml` to tell `gndb` how to connect to the\ndatabase. This file is created the first time `gndb` runs without a\nsubcommand, for example as `gndb` or `gndb -V`.\n\n## Installation\n\n### Download binary\n\nDownload the latest pre-built binary for your platform from the [releases\npage], unpack the archive, and place the `gndb` binary somewhere in your `PATH`\n(e.g. `/usr/local/bin`).\n\n### Install with Go\n\nIf you have Go installed:\n\n```bash\ngo install github.com/gnames/gndb@latest\n```\n\nMake sure `$HOME/go/bin` is in your `PATH`.\n\n### Build from source\n\n```bash\ngit clone https://github.com/gnames/gndb.git\ncd gndb\njust install\n```\n\nThis builds the binary and installs it to `$HOME/go/bin/gndb`.\n\n## Quick Start\n\nThe typical workflow to set up a local GNverifier database:\n\n```bash\n# 1. Create the PostgreSQL database (run once)\ncreatedb gnames\n# or connect to PostgreSQL and run:\n#   CREATE DATABASE gnames;\n\n# 2. Create the GNverifier schema inside the database\ngndb create\n\n# 3. Populate the database with the data sources you need.\n#    Standard sources (IDs \u003c 1000) are pre-configured in sources.yaml\n#    and downloaded automatically from opendata.globalnames.org/sfga.\ngndb populate -s 1,11,4\n\n# 4. Optimize the database for fast name verification\n#    This runs several optimization steps and denormalizes data into\n#    a materialized view to speed up queries.\ngndb optimize\n```\n\nStandard source IDs and their names are listed in\n`~/.config/gndb/sources.yaml`, which is created automatically on the\nfirst run. 
Run `gndb populate` without flags to import all configured\nsources at once (it will take a long time).\n\n`gndb populate` can be run multiple times to add sources incrementally,\nincluding after a previous `gndb optimize`. Always run `gndb optimize`\nonce at the end, after all desired sources have been imported.\n\nAfter step 4 the database is ready for GNverifier.\n\n## Next Steps: Running GNverifier\n\nOnce the database is ready, install and configure the [GNverifier] server\nto connect to your `gnames` database. See the [GNverifier] README for\ninstallation and configuration instructions. GNverifier has a\nserver/client architecture: most users only need to run the server and\naccess verification results through its REST API or command-line client.\n\n## Commands\n\n### create\n\nCreates the GNverifier database schema from scratch.\n\n```bash\n# Create schema (prompts for confirmation if tables already exist)\ngndb create\n\n# Drop existing tables without confirmation\ngndb create --force\ngndb create -f\n```\n\n**What it does:**\n\n1. Connects to PostgreSQL using the configured credentials\n2. Warns and prompts if the database already has tables\n3. Creates all base tables using GORM AutoMigrate\n4. 
Sets collation for correct scientific name sorting\n\n### populate\n\nImports nomenclature data from SFGA sources into the database.\n\n```bash\n# Import all sources listed in sources.yaml\ngndb populate\n\n# Import specific sources by ID\ngndb populate --source-ids 1,11,132\ngndb populate -s 1,11,132\n\n# Override release metadata for a single source\ngndb populate -s 1 --release-version \"2024.01\" --release-date \"2024-01-15\"\n\n# Use flat (non-hierarchical) classification\ngndb populate --flat-classification\n```\n\n| Flag | Short | Description |\n| ---- | ----- | ----------- |\n| `--source-ids` | `-s` | Comma-separated source IDs to import (default: all) |\n| `--release-version` | `-r` | Override version string (single source only) |\n| `--release-date` | `-d` | Override date `YYYY-MM-DD` (single source only) |\n| `--flat-classification` | `-f` | Use flat rather than hierarchical classification |\n\n**What it does:**\n\n1. Connects to PostgreSQL and verifies the schema exists\n2. Reads `~/.config/gndb/sources.yaml` to discover SFGA files\n3. Opens each SFGA SQLite file (local path or remote URL)\n4. Imports data in phases: source metadata, name-strings, vernacular\n   names, classification hierarchy, and name indices\n5. Reports progress and final statistics\n\nYou can run `gndb populate` multiple times to add more sources. Run\n`gndb optimize` after all desired sources are imported.\n\n### optimize\n\nPrepares the database for fast name verification queries.\n\n```bash\ngndb optimize\n```\n\n**What it does:**\n\n1. Reparses all name-strings using the latest [GNparser]\n2. Builds canonical forms (simple, full, stemmed)\n3. Creates word indexes for advanced name search\n4. Builds materialized views and runs VACUUM ANALYZE\n\nOptimization may take 20–90 minutes depending on the dataset size.\nProgress bars show the current status. 
You can re-run this command\nany time to apply improvements from a newer version of GNparser.\n\n### migrate\n\nUpdates the database schema to the latest version after a GNdb upgrade.\nRun this command if you already have an older version of the PostgreSQL\ndatabase and the schema has changed in the most recent version.\n\n```bash\n# Migrate schema only (drops materialized views)\ngndb migrate\n\n# Migrate and recreate materialized views immediately\ngndb migrate --recreate-views\ngndb migrate -v\n```\n\nMost of the time the migration runs before populating new data.\nIn such cases there is no need to recreate materialized views: they\nare restored during the `optimize` step, after all data is imported.\nGORM AutoMigrate adds new tables and columns but never removes existing\nones, making migrations safe. After migrating, run `gndb populate` and\nthen `gndb optimize` to rebuild views with fresh data.\n\n## Configuration\n\nConfiguration is resolved in the following precedence order (highest first):\n\n```\nCLI flags  \u003e  environment variables  \u003e  config file  \u003e  defaults\n```\n\n### Config file\n\nThe config file is created automatically at `~/.config/gndb/gndb.yaml` on\nthe first run. Edit it to set persistent settings:\n\n```yaml\ndatabase:\n  host: localhost\n  port: 5432\n  user: postgres\n  password: \"\"\n  database: gnames\n  ssl_mode: disable\n  batch_size: 50000\n\nlog:\n  level: info        # debug, info, warn, error\n  format: json       # json, text\n  destination: file  # file, stderr\n\njobs_number: 8\n```\n\nGNdb also creates `~/.config/gndb/sources.yaml` on first run. 
Edit it to\nconfigure your SFGA data sources (see [Data Sources](#data-sources)).\nIf you import your own data sources, make sure they have IDs from 1001 and\nhigher.\n\nLog files are written to `~/.local/share/gndb/logs/gndb.log`.\n\n### Environment variables\n\nAll config file fields can be overridden with environment variables using\nthe `GNDB_` prefix:\n\n```bash\nexport GNDB_DATABASE_HOST=localhost\nexport GNDB_DATABASE_PORT=5432\nexport GNDB_DATABASE_USER=postgres\nexport GNDB_DATABASE_PASSWORD=secret\nexport GNDB_DATABASE_DATABASE=gnames\nexport GNDB_DATABASE_SSL_MODE=disable\nexport GNDB_DATABASE_BATCH_SIZE=50000\nexport GNDB_LOG_LEVEL=info\nexport GNDB_LOG_FORMAT=json\nexport GNDB_LOG_DESTINATION=file\nexport GNDB_JOBS_NUMBER=8\n```\n\n### CLI flags\n\nRun `gndb --help` or `gndb \u003ccommand\u003e --help` for the full list of flags.\nThe version flag follows the convention used across Global Names tools:\n\n```bash\ngndb -V   # print version and build timestamp\n```\n\n## Data Sources\n\nGNdb supports two kinds of sources: **standard** sources maintained by the\nGlobal Names project, and **custom** sources you provide yourself.\n\n### Standard sources\n\nStandard sources have IDs below 1000. They are pre-configured in\n`~/.config/gndb/sources.yaml` and their SFGA files are hosted on\n`opendata.globalnames.org/sfga`. Running `gndb populate -s \u003cid\u003e` is all\nthat is needed — GNdb downloads the latest file for that source\nautomatically.\n\nTo see which standard sources are available, open\n`~/.config/gndb/sources.yaml` after the first run of `gndb`.\n\n### Custom sources\n\nCustom sources have IDs of 1000 or higher. Use the [SF] tool to convert\nyour data (Darwin Core Archive, CoLDP, Excel spreadsheet, plain name\nlist, etc.) into an SFGA file. 
Place the resulting file on your local\ncomputer or on a web server, then register it in\n`~/.config/gndb/sources.yaml`:\n\n```yaml\ndata_sources:\n  - id: 1001\n    parent: \"/path/to/sfga/files/\"\n    title_short: \"My Custom Source\"\n    home_url: \"https://example.org/my-source\"\n    is_curated: true\n    has_classification: true\n\n  - id: 1002\n    parent: \"https://releases.example.org/sfga/\"\n    title_short: \"VASCAN\"\n    is_curated: true\n    has_classification: true\n```\n\nThe `parent` field must point to the **immediate parent directory** (or\nURL) of the SFGA file — GNdb searches only that location, not\nsubdirectories. Without the correct parent path the file will not be\nfound.\n\n| Field | Description |\n| ----- | ----------- |\n| `id` | Unique integer ID. Use `\u003c 1000` for standard sources, `\u003e= 1000` for custom |\n| `parent` | Immediate parent directory of the SFGA file (local path or URL) |\n| `title_short` | Short display name for the data source |\n| `home_url` | URL of the source's home page |\n| `is_curated` | Whether the source is expert-curated |\n| `is_auto_curated` | Whether the source is algorithmically curated |\n| `has_classification` | Whether the source includes taxonomic classification |\n\n### SFGA file formats\n\nGNdb recognizes four file formats:\n\n| Format | Extension | Notes |\n| ------ | --------- | ----- |\n| SQLite binary | `.sqlite` | Fastest to process |\n| Zipped SQLite | `.sqlite.zip` | Smallest download size; preferred over `.sql.zip` |\n| SQL dump | `.sql` | Plain-text SQL statements |\n| Zipped SQL dump | `.sql.zip` | Compressed SQL dump |\n\nWhen multiple files match a source ID, GNdb selects the one with the\nlatest date embedded in the filename (`YYYY-MM-DD` format). On equal\ndates the preference order is: `.sqlite.zip` \u003e `.sql.zip` \u003e `.sqlite`\n\u003e `.sql`.\n\n### File naming convention\n\nFiles are matched to a source by their numeric ID prefix. 
GNdb tries\nzero-padded variants in order: `0001`, `001`, `01`, `1`. The prefix\nmust be followed by `-`, `_`, or `.` to prevent false matches.\n\n```\n0147-vascan-2025-08-25.sqlite.zip       ← matched for id: 147\n1000_ruhoff_2023-08-22_v1.0.0.sqlite   ← matched for id: 1000\n0196.sql                                 ← matched for id: 196\n```\n\nVersion (`v1.0.0`) and date (`YYYY-MM-DD`) embedded in the filename are\nextracted automatically and stored as source metadata. They can be\noverridden at import time with the `--release-version` and\n`--release-date` flags.\n\n### Remote sources\n\nThe `parent` field can be an HTTP/HTTPS URL pointing to a web directory\nlisting (Apache or nginx style). GNdb fetches the listing, identifies\nthe matching file by ID, downloads it to `~/.cache/gndb/sfga/`, and\nimports it from there.\n\nUse the [SF] tool to convert Darwin Core Archives, CoLDP packages, and\nother biodiversity formats into SFGA.\n\n## Artificial Intelligence Policy\n\nWe use artificial intelligence to help find algorithms, decide on\nimplementation approaches, and generate code. All automatically generated\ncode is carefully reviewed, with inconsistencies fixed, superfluous\nimplementations removed, and optimizations improved. No code that we do\nnot understand or approve makes it into published versions of GNdb. 
We\nprimarily use Claude Code, with limited use of Gemini CLI.\n\n## Authors\n\n[Dmitry Mozzherin]\n\n## License\n\nReleased under [MIT License]\n\n[Datasette]: https://datasette.io\n[Dmitry Mozzherin]: https://github.com/dimus\n[GNparser]: https://github.com/gnames/gnparser\n[GNverifier]: https://github.com/gnames/gnverifier\n[MIT License]: LICENSE\n[SF]: https://github.com/sfborg/sf\n[SFGA]: https://github.com/sfborg/sfga\n[SQLite DB viewer]: https://sqlitebrowser.org/\n[releases page]: https://github.com/gnames/gndb/releases/latest\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgnames%2Fgndb","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgnames%2Fgndb","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgnames%2Fgndb/lists"}