{"id":15519089,"url":"https://github.com/danieldk/dictomaton","last_synced_at":"2025-04-13T18:22:30.927Z","repository":{"id":6989679,"uuid":"8252814","full_name":"danieldk/dictomaton","owner":"danieldk","description":"Finite state dictionaries in Java","archived":false,"fork":false,"pushed_at":"2022-02-01T07:54:59.000Z","size":1841,"stargazers_count":130,"open_issues_count":1,"forks_count":10,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-03-27T09:06:57.374Z","etag":null,"topics":["collections","compact","dictionary","finite","java","levenshtein","state"],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/danieldk.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2013-02-17T16:14:47.000Z","updated_at":"2024-06-22T07:48:50.000Z","dependencies_parsed_at":"2022-09-16T13:02:28.696Z","dependency_job_id":null,"html_url":"https://github.com/danieldk/dictomaton","commit_stats":null,"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danieldk%2Fdictomaton","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danieldk%2Fdictomaton/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danieldk%2Fdictomaton/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danieldk%2Fdictomaton/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/danieldk","download_url":"https://codeload.github.com/danieldk/dictomaton/tar.gz/refs/heads/master","host":{"name":"Git
Hub","url":"https://github.com","kind":"github","repositories_count":248759135,"owners_count":21157096,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["collections","compact","dictionary","finite","java","levenshtein","state"],"created_at":"2024-10-02T10:19:59.827Z","updated_at":"2025-04-13T18:22:30.900Z","avatar_url":"https://github.com/danieldk.png","language":"Java","readme":"# dictomaton\n\n## Introduction\n\nThis Java library implements dictionaries that are stored in finite state\nautomata. *Dictomaton* has the following features:\n\n * Finite state dictionaries that implement the Java Set\u003cString\u003e interface.\n * Perfect hash dictionaries, that provide a unique hash for each character\n   sequence that is in the dictionary. 
Perfect hash dictionaries can be used\n   in two directions: (1) obtaining the hash code for a character sequence\n   and (2) obtaining the character sequence for a hash code.\n * Levenshtein automata, that allow you to efficiently find all the sequences\n   in the dictionary that are within the given edit distance of a sequence.\n * String to primitive type mappings, where the keys are stored in a perfect\n   hashing automaton and the values in an (unboxed) array.\n\n## Using Dictomaton\n\nDictomaton is in the Maven Central Repository:\n\n~~~\n\u003cdependency\u003e\n    \u003cgroupId\u003eeu.danieldk.dictomaton\u003c/groupId\u003e\n    \u003cartifactId\u003edictomaton\u003c/artifactId\u003e\n    \u003cversion\u003e1.1.1\u003c/version\u003e\n\u003c/dependency\u003e\n~~~\n\nSBT:\n\n~~~\nlibraryDependencies += \"eu.danieldk.dictomaton\" % \"dictomaton\" % \"1.1.1\"\n~~~\n\nGrails:\n\n~~~\ncompile 'eu.danieldk.dictomaton:dictomaton:1.1.1'\n~~~\n\n## Comparisons\n\nThe following table compares the sizes of the object graphs of the\n\u003ctt\u003eDictionary\u003c/tt\u003e type of this library to that of \u003ctt\u003eTreeSet\u003c/tt\u003e and\n\u003ctt\u003eHashSet\u003c/tt\u003e. 
The comparisons were obtained by storing all the words\nin the *web2* and *web2a* dictionaries and were measured using\n[memory-measurer](https://code.google.com/p/memory-measurer/).\n\n\u003ctable\u003e\n   \u003ctr\u003e\u003cth\u003eData type\u003c/th\u003e\u003cth\u003eObjects\u003c/th\u003e\u003cth\u003eReferences\u003c/th\u003e\u003cth\u003echar\u003c/th\u003e\u003cth\u003eint\u003c/th\u003e\u003cth\u003eboolean\u003c/th\u003e\u003cth\u003efloat\u003c/th\u003e\u003c/tr\u003e\n   \u003ctr\u003e\u003ctd\u003eTreeSet&lt;String&gt;\u003c/td\u003e\u003ctd align=\"right\"\u003e936277\u003c/td\u003e\u003ctd align=\"right\"\u003e1872555\u003c/td\u003e\u003ctd align=\"right\"\u003e3193749\u003c/td\u003e\u003ctd align=\"right\"\u003e624184\u003c/td\u003e\u003ctd align=\"right\"\u003e312091\u003c/td\u003e\u003ctd\u003e0\u003c/td\u003e\u003c/tr\u003e\n   \u003ctr\u003e\u003ctd\u003eHashSet&lt;String&gt;\u003c/td\u003e\u003ctd align=\"right\"\u003e936277\u003c/td\u003e\u003ctd align=\"right\"\u003e1772657\u003c/td\u003e\u003ctd align=\"right\"\u003e3193749\u003c/td\u003e\u003ctd align=\"right\"\u003e936277\u003c/td\u003e\u003ctd align=\"right\"\u003e1\u003c/td\u003e\u003ctd\u003e1\u003c/td\u003e\u003c/tr\u003e\n   \u003ctr\u003e\u003ctd\u003eDictionary&lt;String&gt;\u003c/td\u003e\u003ctd align=\"right\"\u003e41188\u003c/td\u003e\u003ctd align=\"right\"\u003e94546\u003c/td\u003e\u003ctd align=\"right\"\u003e424169\u003c/td\u003e\u003ctd align=\"right\"\u003e397033\u003c/td\u003e\u003ctd align=\"right\"\u003e1\u003c/td\u003e\u003ctd\u003e1\u003c/td\u003e\u003c/tr\u003e\n\u003c/table\u003e\n\n## Benchmarks\n\nBenchmarks are in a different test group than normal unit tests. 
You can run\nbenchmarks via Maven, adding the Benchmarks group:\n\n    mvn test -Djunit.groups=eu.danieldk.dictomaton.categories.Benchmarks\n\n## Changelog\n\n### 1.2.0\n\n* Expose state through a StateInfo object, which allows users of PerfectHashDictionary to resume transitions. This makes it far more efficient, for example, to look up a string and its prefixes (contributed by René Kriegler).\n* DictionaryBuilder now accepts the more general CharSequence instead of String and uses CharSequence internally (contributed by René Kriegler).\n\n### 1.1.0\n\n* Added immutable mapping from String to a generic type.\n* Added a key-ordered builder for immutable mappings. This builder is more\n  efficient since it constructs the key automaton on the fly.\n\n### 1.0.0\n\n* Added Levenshtein automata for looking up sequences in a \u003ctt\u003eDictionary\u003c/tt\u003e that\n  are within a certain edit distance of a sequence.\n* Provide a variant of perfect hash automata that puts right language\n  cardinalities in transitions rather than states. This provides faster\n  hashing and hashcode lookups at the cost of some memory.\n* Added String to String mapping (\u003ctt\u003eImmutableStringStringMap\u003c/tt\u003e).\n* Generic object values.\n\n### 0.0.3\n\n* Fix an off-by-one error in integer width of the state table.\n\n### 0.0.2\n\n* Rename the project from *fsadict-java* to *dictomaton*.\n* Store the state and transition tables as packed int arrays, resulting in drastically smaller automata.\n\n\n## Release plan\n\nPlans for 1.3.0: Perhaps an explicit, fast, and compact data storage format\nas an alternative to Java serialization. 
A C or C++ version is another possibility.\n\n## Contributors\n\n* Daniël de Kok (maintainer)\n* René Kriegler\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdanieldk%2Fdictomaton","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdanieldk%2Fdictomaton","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdanieldk%2Fdictomaton/lists"}