{"id":44515489,"url":"https://github.com/myersm0/montre","last_synced_at":"2026-04-01T17:56:40.468Z","repository":{"id":335011006,"uuid":"1142844108","full_name":"myersm0/montre","owner":"myersm0","description":"A modern, embeddable query engine for corpus linguistics.","archived":false,"fork":false,"pushed_at":"2026-03-25T00:57:19.000Z","size":320,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-03-25T01:39:34.251Z","etag":null,"topics":["concordancer","conllu","corpus-linguistics","cql","digital-humanities","nlp","parallel-corpus","rust","text-mining","translation-studies"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/myersm0.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-01-26T23:15:54.000Z","updated_at":"2026-03-24T22:14:12.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/myersm0/montre","commit_stats":null,"previous_names":["myersm0/montre"],"tags_count":6,"template":false,"template_full_name":null,"purl":"pkg:github/myersm0/montre","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/myersm0%2Fmontre","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/myersm0%2Fmontre/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/myersm0%2Fmontre/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/myersm0%2Fmontre/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/myersm0","download_url":"https://codeload.github.com/myersm0/montre/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/myersm0%2Fmontre/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31290707,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-01T13:12:26.723Z","status":"ssl_error","status_checked_at":"2026-04-01T13:12:25.102Z","response_time":53,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["concordancer","conllu","corpus-linguistics","cql","digital-humanities","nlp","parallel-corpus","rust","text-mining","translation-studies"],"created_at":"2026-02-13T17:06:10.662Z","updated_at":"2026-04-01T17:56:40.460Z","avatar_url":"https://github.com/myersm0.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Montre\n[![CI](https://github.com/myersm0/montre/actions/workflows/ci.yml/badge.svg)](https://github.com/myersm0/montre/actions/workflows/ci.yml)\n[![Release](https://img.shields.io/github/v/release/myersm0/montre)](https://github.com/myersm0/montre/releases/latest)\n\nA modern, embeddable corpus query engine with first-class support for aligned corpora.\n\n\u003e **montre** *(/mɔ̃tʁ/):* “shows,” “reveals,” “makes visible” — from French _montrer_, “to show.” The Latin root is _monstrare_, “to point out, indicate.”\n\nNo server, external services, or prerequisites.\n\nA corpus is a self-contained directory with its own data, indexes, and (optionally) alignments. Build it in one line from your annotation files, or from a TOML manifest describing multiple components.\n\nDesigned to be used from the CLI or embedded directly in Julia or Python.\n\n---\n\n## Install\n\n```bash\ncurl -fsSL https://raw.githubusercontent.com/myersm0/montre/main/install.sh | sh\n```\n\n\n## Quick start\n```bash\n# Build a corpus from a directory of CoNLL-U files:\nmontre build -i data/maupassant/ -o my-corpus/\n\n# Query\nmontre query my-corpus/ '[pos=\"ADJ\"] [pos=\"NOUN\"]'\n\n# Count\nmontre count my-corpus/ '[pos=\"ADJ\"] [pos=\"NOUN\"]'\nmontre count my-corpus/ '[pos=\"NOUN\"]' --by-document\nmontre count my-corpus/ '[pos=\"NOUN\"]' --by-component\n\n# Filter\nmontre query my-corpus/ '[pos=\"ADJ\"] [pos=\"NOUN\"]' --document la-parure\nmontre query my-corpus/ '[pos=\"ADJ\"] [pos=\"NOUN\"]' --component fr\n\n# Inspect\nmontre info my-corpus/\nmontre docs my-corpus/\nmontre layers my-corpus/\nmontre vocab my-corpus/ pos\nmontre vocab my-corpus/ lemma --top 50 --component fr\n```\n\n## Query language\n\nMontre uses a CQL-based language, extended with labels, constraints, and alignment-aware operations.\n\n### Core patterns\n```cql\n# Token queries\n[pos=\"NOUN\"]\n[lemma=\"maison\"]\n[word=\"chat\" \u0026 pos=\"NOUN\"]\n[lemma=/^un.*/]\n[pos!=\"PUNCT\"]\n\n# Sequences\n[pos=\"DET\"] [pos=\"ADJ\"]* [pos=\"NOUN\"]\n\n# Quantifiers\n[pos=\"ADJ\"]+\n[pos=\"ADJ\"]*\n[pos=\"ADJ\"]?\n[pos=\"ADJ\"]{2,4}\n\n# Alternation\n([pos=\"ADJ\"] | [pos=\"ADV\"])+ [pos=\"NOUN\"]\n```\n\n### Structural constraints\n```cql\n[pos=\"DET\"] [pos=\"NOUN\"] within s\n[lemma=\"chat\"] within doc\n```\n\n### Morphological features\n\nRequires using the flag `--decompose-feats` at build time.\n\n```cql\n[pos=\"NOUN\" \u0026 feats.Number=\"Plur\"]\n[feats.Gender=\"Masc\" \u0026 feats.Tense=\"Past\"]\n```\n\n### Component and document filtering\n```cql\n[pos=\"NOUN\"] within component:fr\n[pos=\"ADJ\"] [pos=\"NOUN\"] within doc:\"la-parure\",\"boule-de-suif\"\n```\n\n### Labeled captures and global constraints\n```cql\na:[pos=\"NOUN\"] []* b:[pos=\"NOUN\"] :: a.lemma = b.lemma\na:[pos=\"ADJ\"] b:[pos=\"NOUN\"] :: a.lemma != b.lemma\na:[] []{0,20} b:[] :: distance(a,b) \u003e= 5\n```\n\nConstraints are evaluated over full matches using labeled spans.\n\n## Parallel corpus support\n\nMontre was designed from the ground up specifically for parallel corpora.\n\nMontre treats a parallel corpus as a single object with multiple ***components*** and explicit alignment relations, rather than as separate corpora joined at query time.\n\n### Key features\n- Multiple components (languages, editions, translations)\n- Named alignments at any span level (sentence, paragraph, stanza)\n- Multiple competing alignment sets (LaBSE, vecalign, manual)\n- Alignment projection between components\n\n### Example\n```cql\n# Query French, project to English\n[lemma=\"maison\"] within component:fr =labse=\u003e\n```\n\nThis enables:\n- tracing translations across languages\n- detecting omissions or expansions\n- comparing editions or variants\n\n### Build a multi-component corpus\n```toml\n[corpus]\nname = \"isosceles\"\ndecompose_feats = true\n\n[components.maupassant-fr]\npath = \"data/maupassant/fr/conllu/\"\nlanguage = \"fr\"\n\n[components.maupassant-en]\npath = \"data/maupassant/en/conllu/\"\nlanguage = \"en\"\n\n[alignments.labse]\nsource = \"maupassant-fr\"\ntarget = \"maupassant-en\"\nedges = \"alignments/labse/\"\nsource_layer = \"sentence\"\ntarget_layer = \"sentence\"\n```\n\n```bash\nmontre build -m corpus.toml -o my-corpus/\n```\n\n## Performance\n\nMontre is competitive with established corpus engines while prioritizing structural flexibility and embeddability.\n\nOn a 1.5M token corpus (Maupassant French/English, Apple M4 Max):\n\n| Query | Matches | Time |\n|---|---|---|\n| `[pos=\"NOUN\"]` | 244,184 | 0.6ms |\n| `[pos=\"ADJ\"] [pos=\"NOUN\"]` | 30,672 | 12ms |\n| `[pos=\"ADJ\"]? [pos=\"NOUN\"]` | 272,019 | 71ms |\n| `([pos=\"ADJ\"] \\| [pos=\"ADV\"])+ [pos=\"NOUN\"]` | 33,444 | 27ms |\n| `([pos=\"ADJ\"] \\| [pos=\"DET\"])+ [pos=\"NOUN\"]` | 198,735 | 71ms |\n\n### Key properties:\n- Quantifiers use a run-based execution model (scales with matches, not corpus size)\n- `--count-only` avoids hit allocation entirely (nanosecond-scale for simple queries)\n- Memory-mapped indexes reduce load time and memory footprint by an order of magnitude\n\n## Bindings\nMontre exposes a C FFI for embedding in other languages.\n\n### Julia (almost complete)\n**[Montre.jl](https://github.com/myersm0/Montre.jl)**\n```julia\nusing Montre\n\ncorpus = open_corpus(\"./my-corpus\")\nhits = query(corpus, \"[pos=\\\"ADJ\\\"] [pos=\\\"NOUN\\\"]\")\n\nfor line in concordance(corpus, hits)\n    println(line)\nend\n```\n\n### Python (early)\nBindings via PyO3 are in progress.\n```python\nimport montre\n\ncorpus = montre.open(\"./my-corpus\")\nfor hit in corpus.query('[pos=\"DET\"] [pos=\"NOUN\"]'):\n    print(hit.start, hit.end)\n```\n\n## Roadmap\nComing soon:\n- Statistics: group, collocation\n- Python bindings (feature-complete, pip install)\n- REPL (persistent corpus session)\n- TUI for interactive exploration\n- Support for additional input formats (VRT, Stanza JSON, TEI)\n\n## Citing Montre\n\nA paper describing Montre is in preparation. In the meantime, if you use Montre in published research, please cite:\n```bibtex\n@software{myers-montre,\n  author       = {Myers, Michael J.},\n  title        = {Montre: A Modern Corpus Query Engine for Aligned Corpora},\n  year         = {2026},\n  url          = {https://github.com/myersm0/montre},\n  version      = {0.4.0}\n}\n```\n\n## License\n\nApache-2.0\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmyersm0%2Fmontre","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmyersm0%2Fmontre","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmyersm0%2Fmontre/lists"}