{"id":17182427,"url":"https://github.com/bertsky/dta-lexdb-applications","last_synced_at":"2026-02-25T17:08:07.092Z","repository":{"id":219659773,"uuid":"749573117","full_name":"bertsky/dta-lexdb-applications","owner":"bertsky","description":"formatting and integrating the Deutches Textarchiv dictionary into various applications","archived":false,"fork":false,"pushed_at":"2024-03-01T23:27:32.000Z","size":34,"stargazers_count":2,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-10-26T20:43:29.840Z","etag":null,"topics":["dta"],"latest_commit_sha":null,"homepage":"","language":"Makefile","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bertsky.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-01-29T00:08:36.000Z","updated_at":"2024-02-02T09:40:02.000Z","dependencies_parsed_at":"2024-03-02T00:30:13.794Z","dependency_job_id":null,"html_url":"https://github.com/bertsky/dta-lexdb-applications","commit_stats":null,"previous_names":["bertsky/dta-lexdb-applications"],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/bertsky/dta-lexdb-applications","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bertsky%2Fdta-lexdb-applications","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bertsky%2Fdta-lexdb-applications/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bertsky%2Fdta-lexdb-applications/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/berts
ky%2Fdta-lexdb-applications/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bertsky","download_url":"https://codeload.github.com/bertsky/dta-lexdb-applications/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bertsky%2Fdta-lexdb-applications/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29832055,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-25T15:41:19.027Z","status":"ssl_error","status_checked_at":"2026-02-25T15:40:47.150Z","response_time":61,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dta"],"created_at":"2024-10-15T00:37:04.060Z","updated_at":"2026-02-25T17:08:07.050Z","avatar_url":"https://github.com/bertsky.png","language":"Makefile","funding_links":[],"categories":[],"sub_categories":[],"readme":"# dta-lexdb-applications\n\n[![CD](https://github.com/bertsky/dta-lexdb-applications/actions/workflows/makefile.yml/badge.svg)](https://github.com/bertsky/dta-lexdb-applications/actions/workflows/makefile.yml)\n\n\u003e formatting and integrating the Deutsches Textarchiv dictionary into various applications\n\n[Deutsches Textarchiv](https://www.deutsches-textarchiv.de) (DTA) is a large collection of curated and manually corrected\nreference corpora in New High German from the 17th to the 20th century.\n\n[LexDB](https://www.dwds.de/r/lexdb) is a collection of 
lexical databases (i.e. dictionaries) distilled from DTA\nby the BBAW. They include the full form, lemmatization, normalized orthography and part of speech.\n\nThis repository provides scripts to extract and re-format dictionaries for re-use in other applications.\nThe results will be available as GitHub release assets.\n\n## Tesseract OCR models with added language model\n\n[Tesseract](https://tesseract-ocr.github.io/) models (both the originally provided ones, trained on\nsynthetic data, and the community-generated ones, fine-tuned on annotated scan data or trained from scratch)\ncan be amended with a simple language model by providing dictionaries/grammars for punctuation, numbers and words.\n\nWe will pick publicly available models for German Antiqua and Fraktur prints, as well as handwriting,\nand republish them with DTA as the language model.\n\nFor currently selected models, see https://github.com/bertsky/dta-lexdb-applications/blob/83e5d5c3404da3b14886fe5eeed044ee1f630bdd/Makefile#L13-L34\n\n## Hunspell\n\n[Hunspell](http://hunspell.github.io/) is a widely used dictionary-based, morphology-aware spell checker.\n\nWe will produce a DTA dictionary for it.\n\nFor currently selected rules, see https://github.com/bertsky/dta-lexdb-applications/blob/83e5d5c3404da3b14886fe5eeed044ee1f630bdd/Makefile#L60-L63\n\n## ...\n\nOthers to come. Please raise an issue if you have ideas!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbertsky%2Fdta-lexdb-applications","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbertsky%2Fdta-lexdb-applications","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbertsky%2Fdta-lexdb-applications/lists"}