{"id":50995271,"url":"https://github.com/nao1215/omokage","last_synced_at":"2026-06-20T08:32:01.684Z","repository":{"id":362051843,"uuid":"1256158761","full_name":"nao1215/omokage","owner":"nao1215","description":"Measure how closely writing matches a learned author's style. Japanese \u0026 English, local-first CLI for LLMs and humans.","archived":false,"fork":false,"pushed_at":"2026-06-11T14:53:25.000Z","size":2476,"stargazers_count":10,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-11T16:24:15.874Z","etag":null,"topics":["authorship-attribution","cli","golang","japanese","markdown","stylometry","text-analysis","writing-style"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nao1215.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-06-01T14:13:20.000Z","updated_at":"2026-06-11T14:53:54.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/nao1215/omokage","commit_stats":null,"previous_names":["nao1215/omokage"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/nao1215/omokage","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nao1215%2Fomokage","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nao1215%2Fomokage/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/repositories/nao1215%2Fomokage/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nao1215%2Fomokage/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nao1215","download_url":"https://codeload.github.com/nao1215/omokage/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nao1215%2Fomokage/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34563535,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-20T02:00:06.407Z","response_time":98,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["authorship-attribution","cli","golang","japanese","markdown","stylometry","text-analysis","writing-style"],"created_at":"2026-06-20T08:32:01.606Z","updated_at":"2026-06-20T08:32:01.676Z","avatar_url":"https://github.com/nao1215.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Build](https://github.com/nao1215/omokage/actions/workflows/build.yml/badge.svg)](https://github.com/nao1215/omokage/actions/workflows/build.yml)\n[![MultiPlatformUnitTest](https://github.com/nao1215/omokage/actions/workflows/unit_test.yml/badge.svg)](https://github.com/nao1215/omokage/actions/workflows/unit_test.yml)\n[![reviewdog](https://github.com/nao1215/omokage/actions/workflows/reviewdog.yml/badge.svg)](https://
github.com/nao1215/omokage/actions/workflows/reviewdog.yml)\n[![Coverage](https://github.com/nao1215/omokage/actions/workflows/coverage.yml/badge.svg)](https://github.com/nao1215/omokage/actions/workflows/coverage.yml)\n[![Go Reference](https://pkg.go.dev/badge/github.com/nao1215/omokage.svg)](https://pkg.go.dev/github.com/nao1215/omokage)\n[![Go Report Card](https://goreportcard.com/badge/github.com/nao1215/omokage)](https://goreportcard.com/report/github.com/nao1215/omokage)\n![GitHub](https://img.shields.io/github/license/nao1215/omokage)\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"doc/img/omokage-icon.jpg\" alt=\"omokage\" width=\"320\"\u003e\n\u003c/p\u003e\n\nomokage learns how you write from your past writing, then scores how close a new draft is to that style. It runs locally, works on Japanese and English, and never uses the network.\n\n![demo](./doc/img/demo.gif)\n\n## What it does (and doesn't)\n\n- Compares style — sentence shape, register (敬体 / 常体), kanji/kana balance, word and character patterns — between a draft and a trained author, and points out where they differ.\n- Does not judge meaning, correctness, originality, or quality. It is not an AI-text detector. A high score means only \"this reads like the voice you trained.\"\n\nIt is built for an LLM as much as for a person: an agent can run `check` after each rewrite, read the differences, and revise until the draft sits closer to the voice.\n\n## Install\n\n```shell\ngo install github.com/nao1215/omokage@latest\n```\n\nRuns on Windows, macOS, and Linux. 
Building from source needs Go 1.25 or later.\n\n## Quick start\n\nThe repo ships a small example corpus under [examples/](./examples) to follow along.\n\n```shell\n$ omokage init                                   # writes omokage.toml, profiles/, cache/\n$ omokage train --author me examples/en/posts    # learn a voice from .md/.txt files\nTrained author \"me\" from 8 files.\nProfile: /home/me/blog/profiles/me.db\n\nCorpus reliability: good.\n$ omokage check examples/en/draft-keeps-voice.md  # score a draft (one profile needs no --author)\nAuthor: me\nSimilarity: 92% (this author's self-similarity median 90%, range 75-91%)\n\nDifferences:\n- character n-gram \"gh\" is higher than reference\n- function word \"at\" is higher than reference\n- character n-gram \"ht\" is higher than reference\n```\n\n`train` takes any mix of directories (scanned for `.md`/`.txt`) and individual files; a file reached twice is learned once. It reads local files only — a URL, a missing path, or an unsupported extension stops the run by name and trains nothing.\n\nThe same idea rewritten in a stiff, formal voice scores low:\n\n```shell\n$ omokage check --author me examples/en/draft-lost-voice.md\nAuthor: me\nSimilarity: 0% (this author's self-similarity median 90%, range 75-91%)\n\nDifferences:\n- average sentence length is higher than reference\n- paragraph length variance is higher than reference\n- sentence length variance is higher than reference\n```\n\n`omokage diff A B` compares two files directly, without training a profile.\n\n## Checking a corpus\n\nScores are only as steady as the corpus behind them. A good corpus is several documents (aim for eight or more), each a few paragraphs long, all in one consistent voice. 
`doctor` rates a corpus — training and writing nothing — and names what to fix:\n\n```shell\n$ omokage doctor ~/writing/posts\nCorpus: 8 documents, 142 sentences, 5210 characters (avg 651 per document)\nReliability: good\n\nNo problems found: enough material, a consistent voice, and no obvious outliers.\n\nThese checks look at sample size and consistency, not writing quality.\n```\n\n```shell\n$ omokage doctor ~/drafts\nCorpus: 3 documents, 9 sentences, 140 characters (avg 46 per document)\nReliability: weak\n\nFindings:\n- [warning] Only 3 documents. The measured spread is barely an estimate, so scores will be noisy.\n    → Add more samples of this voice; 8 or more documents give steadier scores.\n- [warning] 3 of 3 documents are short (under 150 characters).\n    a.md, b.md, c.md\n    → Short samples make per-document features jumpy; prefer samples of a few paragraphs.\n\nThese checks look at sample size and consistency, not writing quality.\n```\n\n![doctor demo](./doc/img/doctor.gif)\n\n`doctor --format json` returns the same report as data. `train` prints the reliability too, and `show --format json` reports the stored rating and findings so you can inspect a profile later. A mixed corpus is usually flagged by the feature it disagrees on, often the register or kanji/kana balance; the fix is to split it into one profile per voice.\n\n## Choosing the author\n\n`--author` is just a profile name; it need not be a person. Name a profile for a purpose — `--author blog`, `--author docs` — and train each on the writing that belongs to it. `check` and `show` resolve the author as: `--author` if given, else `default_author`, else the only profile, else an error (they never silently pick one). 
Set a default with `train --author me --default ...`.\n\n## Output modes\n\n`check` reads one file; pick how you want the result:\n\n| Mode | Output | For |\n| --- | --- | --- |\n| (default) | similarity score + top differences | quick, human-facing checks |\n| `--score-only` | the integer 0-100 | shell pipelines, pass/fail gates |\n| `--explain` | per-feature drift (value, mean ± spread, z-score) + the paragraphs that drift most | final by-hand tuning |\n| `--format json` | the `--explain` detail as JSON, plus `term_warnings` | an LLM or tool reading between rewrites |\n\n`--explain` and `--format json` split the draft into paragraphs, so they are opt-in and plain `check` stays fast. `--score-only` can't be combined with them.\n\n```shell\n$ score=$(omokage check --score-only draft.md)\n$ [ \"$score\" -ge 70 ] \u0026\u0026 echo \"close enough\"\n\n$ omokage check --author me --explain examples/en/draft-lost-voice.md\nAuthor: me\nSimilarity: 0% (this author's self-similarity median 90%, range 75-91%)\nScore driver: lexical\nScoring note: This score is computed from the full fingerprint and structure mix; the paragraph-level scalar drift below is supporting detail and usually contributes less than the lexical fingerprint.\n\nHigh-level style differences (fix these first):\n  1. average sentence length is higher than reference [structure]\n       target 292.3  reference 40.4 ± 1.540  (62.3σ)\n  ...\n\nParagraphs that drift most:\n  #2 (74.9σ; average sentence length higher): Subsequently, a notebook is utilized for the purpo…\n```\n\n![explain demo](./doc/img/explain.gif)\n\n## Using omokage with an LLM\n\nTrain once, then have the agent run `check --format json` after each rewrite. The JSON leads with `high_level_drift`, the editable features, each with a `priority` and `actionable` flag; `segments` points at the paragraphs that drift most, and `term_warnings` flags notation that differs from your learned preference. 
For a lighter payload, `show --author me --format json --summary` returns provenance and the quality rating without the often large term list. omokage tells the agent how close the draft sits to your voice and where it strays, not whether it is correct or good, so keep a human in the loop.\n\n## Term preferences\n\n`train` also learns which surface form you use for a recurring term (`DB` vs `データベース`, `HTTP` vs `http`), stored in the same per-author database — no LLM, no network, no dictionary, and only surfaces and counts are kept, never the text. A `normalized_key` folds case and full/half-width ASCII so `DB`, `db`, and `ＤＢ` share a key; a `group_key` merges a Japanese phrase with its acronym only when the corpus declares the bridge (`データベース（DB）`). `show --format json` lists them under `term_preferences`, and `check --format json` adds `term_warnings`; both appear only in JSON, so plain `check` is unchanged.\n\n## Managing profiles and stores\n\n```shell\n$ omokage list [--long]                # names, or trained_at / file count / source(s)\n$ omokage show --author me             # how a profile was trained (--format json for more)\n$ omokage rename --author me --to watashi\n$ omokage remove --author watashi\n```\n\nBy default omokage finds an `omokage.toml` by walking up from the current directory (a project-local store). `omokage init --global` makes a per-user store under `$OMOKAGE_HOME` (or your user config dir) that any directory falls back to; a local project always wins inside its tree. `--config PATH` / `--profile-dir PATH` point at a specific store.\n\n## How it scores\n\nTraining measures a set of stylistic features per document and stores their mean and spread in a SQLite database under `profiles/`, one per author. It stores only the numbers, never the text. The features are register (敬体 / 常体), script balance (kanji/hiragana/katakana), function words, character n-grams, and shape (sentence and paragraph length, punctuation, layout). 
A check measures the same features on the draft and scores each by how far it strays from your usual range, as a z-score in the spirit of Burrows's Delta: the function-word and n-gram fingerprint carries most of the signal, a clear register shift is penalized on top, and shape only nudges. The final 0-100 score is calibrated against the profile's leave-one-out self-similarity baseline, so \"90%\" means \"close to this author's own typical variation\" rather than \"three sigmas from the mean.\" Profiles trained before this baseline was added still work, but retraining is recommended so the calibrated score and self-similarity anchor are available. Code blocks are stripped first, so the score reflects prose.\n\nFor Japanese, which has no spaces between words, these features use morphological analysis (kagome) rather than whitespace tokenization: the function-word fingerprint counts particles and auxiliaries as whole morphemes (not substrings), the register is read from each sentence's closing predicate, and conjunction frequency uses a real morpheme denominator. On a held-out author-attribution test this raised accuracy over the previous heuristics. Two further Japanese signals — a part-of-speech n-gram fingerprint and lemma-based vocabulary richness — are available but off by default (`pos_ngram_frequency`, `type_token_ratio` in the config), since on that test they did not help.\n\n## Limits\n\nomokage looks at style, not meaning: it cannot tell whether a draft is correct, original, or well written, only whether it resembles the voice it was trained on. It needs a reasonable amount of text — with a few short documents the spread is wide and scores are noisy, which `doctor` and the `reliability` rating flag (they measure sample adequacy, not writing quality). It separates Japanese authors more sharply than English ones, and two people who write in the same register look more alike than they are. 
It is not an AI-text detector.\n\n## About the name\n\nomokage (面影) is written with 面 (face) and 影 (shadow, trace): the remembered image of someone, the likeness that comes back to mind. The name is borrowed from [Omokage](https://www.toraya-group.co.jp/products/collections/yokan-omokage), a yokan by Toraya that I like.\n\n## License\n\nMIT. See [LICENSE](./LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnao1215%2Fomokage","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnao1215%2Fomokage","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnao1215%2Fomokage/lists"}