{"id":25861468,"url":"https://github.com/m-k-l-s/discworld-hex","last_synced_at":"2026-05-11T11:40:08.508Z","repository":{"id":70671880,"uuid":"455935176","full_name":"m-k-l-s/discworld-hex","owner":"m-k-l-s","description":"Hex clusters Discworld's stories.","archived":false,"fork":false,"pushed_at":"2022-04-11T13:51:40.000Z","size":54,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-01T23:36:41.875Z","etag":null,"topics":["discworld","faiss","sentence-transformers","transformers"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/m-k-l-s.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-02-05T17:19:29.000Z","updated_at":"2022-07-12T11:00:10.000Z","dependencies_parsed_at":"2023-03-06T14:00:32.727Z","dependency_job_id":null,"html_url":"https://github.com/m-k-l-s/discworld-hex","commit_stats":null,"previous_names":["m-k-l-s/discworld-hex"],"tags_count":0,"template":false,"template_full_name":"m-k-l-s/python-project-template","purl":"pkg:github/m-k-l-s/discworld-hex","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/m-k-l-s%2Fdiscworld-hex","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/m-k-l-s%2Fdiscworld-hex/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/m-k-l-s%2Fdiscworld-hex/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/m-k-l-s%2Fdiscworld-hex/manifests","owner_url":"https:
//repos.ecosyste.ms/api/v1/hosts/GitHub/owners/m-k-l-s","download_url":"https://codeload.github.com/m-k-l-s/discworld-hex/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/m-k-l-s%2Fdiscworld-hex/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32893999,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-10T13:40:02.631Z","status":"online","status_checked_at":"2026-05-11T02:00:05.975Z","response_time":120,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["discworld","faiss","sentence-transformers","transformers"],"created_at":"2025-03-01T23:36:28.471Z","updated_at":"2026-05-11T11:40:08.479Z","avatar_url":"https://github.com/m-k-l-s.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Discworld Hex\n\nHex clusters Discworld's stories.\n\nClustering and search tool applied to plots of Discworld novels.\nCurrently, given an input sentence, it will find the most similar parts of Discworld books based on their plot summaries from Wikipedia.\n\nThis is just a tiny proof-of-concept of using [FAISS](https://github.com/facebookresearch/faiss) with transformer language models that could be easily extended to cover much larger datasets.\n\n## Setup\n\nShould work out of the box with `bash` and a couple of prerequisites:\n- [conda](https://docs.conda.io/en/latest/miniconda.html)\n- 
[poetry](https://python-poetry.org/docs/#installation)\n\n```bash\n( cd conda \u0026\u0026 source bootstrap.sh )\nconda activate discworld-hex\npoetry install\n```\n\n## Usage\n\nTL;DR (when `poetry` is installed and the `discworld-hex` conda env is activated):\n\n```bash\nbuild\nsearch\n```\n\nTo only fetch data and build and export the index:\n\n```bash\nbuild\n# is just a shortcut for:\npoetry run build\n```\n\nTo use the index to search:\n\n```bash\nsearch\n# is just a shortcut for:\npoetry run search\n```\n\nTo run any Python script in this project:\n\n```bash\npoetry run python src/discworld_hex/any_file.py\n```\n\nTo run all checks:\n\n```bash\npoetry run pre-commit\n```\n\n## TODO\n\n### Functionality\n\n(What the user would notice.)\n\n- [ ] Allow custom `wikipedia` queries on the input (and thus custom libraries)\n- [ ] Fine-tune (e.g., standard (masked) language modelling) on the specific subdomains\n- [ ] Aggregate search results per-book\n- [ ] Allow merging libraries\n- [ ] Better CLI, allow changing `k`, passing in multiple sentences, etc., either:\n    - [ ] [`click`](https://github.com/pallets/click)ify and [`rich`](https://github.com/Textualize/rich)ify the\n      interface\n    - [ ] Alternatively, just make it into an API\n- [ ] Support [other (faster, less accurate) indexes](https://github.com/facebookresearch/faiss/wiki/Faster-search)\n\n### Maintenance\n\n(What the user shouldn't notice.)\n\n- [ ] Less redundant library serialization\n- More tests\n    - [ ] Rebuilding Library and the FAISS index\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fm-k-l-s%2Fdiscworld-hex","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fm-k-l-s%2Fdiscworld-hex","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fm-k-l-s%2Fdiscworld-hex/lists"}