{"id":43954183,"url":"https://github.com/czcorpus/kontext","last_synced_at":"2026-02-07T04:06:27.371Z","repository":{"id":30383557,"uuid":"33936204","full_name":"czcorpus/kontext","owner":"czcorpus","description":"An advanced, extensible web front-end for the Manatee-open corpus search engine","archived":false,"fork":false,"pushed_at":"2026-02-02T14:27:10.000Z","size":41857,"stargazers_count":78,"open_issues_count":68,"forks_count":24,"subscribers_count":7,"default_branch":"master","last_synced_at":"2026-02-03T03:20:53.319Z","etag":null,"topics":["corpora","corpus-linguistics","corpus-tools","user-interface"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/czcorpus.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"COPYING","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2015-04-14T14:19:58.000Z","updated_at":"2026-02-02T14:26:15.000Z","dependencies_parsed_at":"2023-10-02T12:04:30.813Z","dependency_job_id":"c5113843-6629-46b4-b551-2a8c6eaabcce","html_url":"https://github.com/czcorpus/kontext","commit_stats":null,"previous_names":[],"tags_count":26,"template":false,"template_full_name":null,"purl":"pkg:github/czcorpus/kontext","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/czcorpus%2Fkontext","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/czcorpus%2Fkontext/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/czcorpus%2Fkontext/releas
es","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/czcorpus%2Fkontext/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/czcorpus","download_url":"https://codeload.github.com/czcorpus/kontext/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/czcorpus%2Fkontext/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29186094,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-07T03:35:06.566Z","status":"ssl_error","status_checked_at":"2026-02-07T03:34:57.604Z","response_time":63,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["corpora","corpus-linguistics","corpus-tools","user-interface"],"created_at":"2026-02-07T04:06:26.690Z","updated_at":"2026-02-07T04:06:27.365Z","avatar_url":"https://github.com/czcorpus.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"![KonText screenshot](https://github.com/czcorpus/kontext/blob/master/doc/images/kontext-screenshot1.jpg)\n\n\n## Contents\n\n* [Introduction](#introduction)\n* [Features](#features)\n* [Installation](#installation)\n* [Customization and contribution](#customization-and-contribution)\n* [Notable users](#notable-users)\n* [How to cite](#how-to-cite-kontext)\n\n## Introduction\n\nKonText is an **advanced corpus query interface** and corpus data 
**integration platform** built around the corpus search engine [Manatee-open](http://nlp.fi.muni.cz/trac/noske). It is written in Python 3 and TypeScript and it runs on any major Linux distribution. The development is maintained by the [Department of Linguistics, Faculty of Arts, Charles University](http://ucnk.ff.cuni.cz/).\n\n## Features\n\n* fully **editable query chain**\n    * any operation from a user-defined sequence (e.g. query -> filter -> sample -> sorting) can be changed\n    and the whole sequence is then re-executed.\n* multiple search modes:\n    * concordance\n    * paradigmatic query\n    * word list\n    * keyword analysis\n* simple and advanced query types\n    * **advanced CQL editor** with **syntax highlighting** and **attribute recognition**\n    * **interactive PoS tag composing tool** for positional and key-value tagsets\n    * customizable query suggestions and simple type query refinement (e.g. for homonym disambiguation)\n* support for **spoken corpora**\n    * defined text segments can be played back as audio\n    * KWIC detail with **easily distinguishable speeches**\n* rich **concordance view options and tools**\n    * any positional attribute can be set as primary\n    * multiple ways to display other attributes\n    * **user-defined line groups** - filtering, reviewing group ratios\n    * tokens and KWICs can be connected to external data services (e.g. dictionaries, encyclopedias)\n    * individual tokens can be linked to each other using an external service (e.g. 
for word translation equivalents)\n* **rich subcorpus-related functionality**\n    * any subcorpus is accessible to other users who obtain its URL (otherwise the subcorpus is not discoverable by default)\n      * once a public description is set, the subcorpus can be discovered on the \"public subcorpora\" page\n    * text types metadata can be gradually refined to a specific subcorpus (\"which publishers are there if only *fiction* is selected?\")\n    * a **custom text types ratio** can be defined (\"give me 20% fiction and 80% journalism\")\n    * unused subcorpora can be archived (URLs with the subcorpus are still valid) or completely removed (URLs will become invalid)\n    * searching within a subcorpus can be further refined with ad-hoc text type selection\n    * a subcorpus can be created with respect to aligned corpora (\"give me fiction in Czech but only if there is an English translation for it\")\n* **frequency distribution**\n    * univariate\n        * positional attributes (including tuples of multiple attributes per token)\n        * structural attributes\n    * **multivariate distribution** (2 dimensions) for both positional and structural attributes\n* collocation analysis\n* **persistent URLs** - any result page can be easily shared even if the original query is megabytes long\n* access to **previous queries**, named queries\n* convenient corpus access\n    * finding a corpus by keyword (tag), size, or description\n    * adding a corpus to **favorites** (incl. 
subcorpora, aligned corpora)\n* saving results to Excel, CSV, XML, JSONL, TXT\n* [HTTP API](https://github.com/czcorpus/kontext/wiki/HTTP-API) access\n\n\n## Internal features\n\n* modern client-side application (written in TypeScript, event stream architecture, React components, extensible)\n* server side written using the [Sanic](https://sanic.dev/en/) framework with fully **decoupled background concordance/frequency/collocation calculation** (using an integrated [Rq](https://python-rq.org/) worker server)\n* modular code design with dynamically loadable plug-ins providing custom functionality implementation (e.g. custom database\nadapters, authentication methods, corpus listing widgets, HTTP session management)\n   * integrability with existing information systems\n\n\n## Installation\n\n### Docker\n\nRunning KonText as a set of Docker containers is the most convenient and flexible way to run it. Docker Compose v2 is required. To run a basic\nconfiguration instance (i.e. no MySQL/MariaDB server) use:\n\n```shell\ndocker compose up\n```\n\nTo run a production-grade instance:\n\n```shell\ndocker compose -f docker-compose.yml -f docker-compose.mysql.yml --env-file .env.mysql up\n```\n\n(the `.env.mysql` file allows configuring custom MySQL/MariaDB credentials and the KonText configuration file)\n\n\n### Manual installation\n\n#### Key requirements\n\n* Python *3.6* (or newer)\n* [Manatee](http://nlp.fi.muni.cz/trac/noske) corpus search engine - version *2.167.8* and onwards (for KonText *v0.17+*, Manatee *v2.2xx* is recommended)\n* a key-value storage\n    * [Redis](http://redis.io/) (recommended), [SQLite](https://sqlite.org/) (supported), custom implementations possible\n* a task queue - [Rq](https://python-rq.org/)\n* HTTP proxy server\n  + [Nginx](http://nginx.org/) (recommended), [Apache](http://httpd.apache.org/),...\n\n\nFor Ubuntu OS users, it is recommended to use the [install script](scripts/install/install.py) which should\nperform most of the actions necessary to install 
and run KonText. For other Linux distributions we recommend\nrunning KonText within a container or a virtual machine. Please refer to the [doc/INSTALL.md](doc/INSTALL.md)\nfile for details.\n\n\n## Customization and contribution\n\nPlease refer to our [Wiki](https://github.com/czcorpus/kontext/wiki/Development-and-customization).\n\n## Notable users\n\n* [Department of Linguistics, Faculty of Arts, Charles University](https://kontext.korpus.cz/)\n* [LINDAT/CLARIAH-CZ](https://ufal.mff.cuni.cz/lindat-kontext)\n* [CLARIN-PL](https://kontext.clarin-pl.eu/)\n* [CLARIN-SI](https://www.clarin.si/kontext/)\n* [Serbski Institut](https://www.serbski-institut.de) (API version of KonText)\n\n## How to cite KonText\n\nTomáš Machálek (2020) - KonText: Advanced and Flexible Corpus Query Interface\n\n```bibtex\n@inproceedings{machalek-2020-kontext,\n    title = \"{K}on{T}ext: Advanced and Flexible Corpus Query Interface\",\n    author = \"Mach{\\'a}lek, Tom{\\'a}{\\v{s}}\",\n    booktitle = \"Proceedings of the 12th Language Resources and Evaluation Conference\",\n    month = may,\n    year = \"2020\",\n    address = \"Marseille, France\",\n    publisher = \"European Language Resources Association\",\n    url = \"https://www.aclweb.org/anthology/2020.lrec-1.865\",\n    pages = \"7003--7008\",\n    language = \"English\",\n    ISBN = \"979-10-95546-34-4\",\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fczcorpus%2Fkontext","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fczcorpus%2Fkontext","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fczcorpus%2Fkontext/lists"}