{"id":19441028,"url":"https://github.com/rosedblabs/diskhash","last_synced_at":"2025-04-24T23:32:21.248Z","repository":{"id":188057858,"uuid":"674557012","full_name":"rosedblabs/diskhash","owner":"rosedblabs","description":"on-disk hash table(mainly for WAL).","archived":false,"fork":false,"pushed_at":"2023-09-10T08:40:49.000Z","size":61,"stargazers_count":27,"open_issues_count":0,"forks_count":8,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-03T13:11:25.078Z","etag":null,"topics":["disk","hashtable","storage","wal"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rosedblabs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-08-04T08:25:58.000Z","updated_at":"2025-02-10T20:55:33.000Z","dependencies_parsed_at":"2023-09-06T12:32:39.079Z","dependency_job_id":"e9840a8a-ec93-4c09-92e6-2a3b42b37c36","html_url":"https://github.com/rosedblabs/diskhash","commit_stats":{"total_commits":23,"total_committers":4,"mean_commits":5.75,"dds":"0.17391304347826086","last_synced_commit":"289755737e2a746ab055c927bd619035b5f3ab9c"},"previous_names":["rosedblabs/diskhash"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rosedblabs%2Fdiskhash","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rosedblabs%2Fdiskhash/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rosedblabs%2Fdiskhash/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rosedblabs%2Fdiskhash/manifests","ow
ner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rosedblabs","download_url":"https://codeload.github.com/rosedblabs/diskhash/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250727805,"owners_count":21477373,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["disk","hashtable","storage","wal"],"created_at":"2024-11-10T15:34:06.832Z","updated_at":"2025-04-24T23:32:20.881Z","avatar_url":"https://github.com/rosedblabs.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# diskhash\nAn on-disk hash table index (mainly for WAL).\n\n## When will you need it?\nIf you are using [WAL](https://github.com/rosedblabs/wal) to store your data,\n\n\u003e wal: https://github.com/rosedblabs/wal\n\nyou will get back positions for reading the data from the WAL. The common way to store these positions is an in-memory index (like rosedb).\n\nBut if you have a large amount of data, loading the index into memory takes a long time when you restart the system.\n\nSo, you can use diskhash to store the index on disk.\n\n## Can it be used as a general hash table index (without WAL)?\n\nYes, you can use it as an on-disk hash table index, but the restriction is that the value must be a fixed size.\nYou set the value size when you create the index, and once it is set, you can't change it.\n\nBut don't set the value size too large (1KB); the disk size may increase dramatically because of write amplification.\n**It is suitable for storing some metadata of your system.**\n\n## Design 
Overview\nThe diskhash consists of two disk files: main and overflow.\nThe file format is as follows:\n```\nFile Format:\n+---------------+---------------+---------------+---------------+-----+----------------+\n|    (unused)   |    bucket0    |    bucket1    |    bucket2    | ... |     bucketN    |\n+---------------+---------------+---------------+---------------+-----+----------------+\n```\n\nA file is divided into multiple buckets. When the table reaches the load factor, a new bucket is appended to the end of the file.\nA bucket contains 31 slots and an overflow offset that points to buckets in the overflow file.\n```\nBucket Format:\n+-------------+-------------+-------------+-------------+-----+--------------+-----------------+\n|   slot0     |   slot1     |   slot2     |   slot3     | ... |    slotN     | overflow_offset |\n+-------------+-------------+-------------+-------------+-----+--------------+-----------------+\n```\n\nA slot contains the hash of the key and a user-defined value.\n```\nSlot Format:\n+-----------------------+--------------------------------+\n|      key_hash(4B)     |          value(N Bytes)        |\n+-----------------------+--------------------------------+\n```\n\n## Getting Started\n```go\npackage main\n\nimport (\n\t\"fmt\"\n\t\"github.com/rosedblabs/diskhash\"\n\t\"strings\"\n)\n\nfunc main() {\n\t// Open the table and specify the slot value length.\n\t// Remember that you can't change it once it is set, and all values must be the same length.\n\toptions := diskhash.DefaultOptions\n\toptions.DirPath = \"/tmp/diskhash-test\"\n\toptions.SlotValueLength = 10\n\ttable, err := diskhash.Open(options)\n\tif err != nil {\n\t\tpanic(err)\n\t}\n\n\t// Don't forget to close the table!\n\t// Some meta info is saved when you close the table.\n\tdefer func() {\n\t\t_ = table.Close()\n\t}()\n\n\t// Put a key-value pair into the table.\n\t// The MatchKey function is called when the key hash of a slot matches.\n\t// When we store the data in the 
hash table, we only store the hash value of the key and the raw value.\n\t// So when we get the data from the hash table, a matching key hash does not guarantee\n\t// that the key itself matches, because of hash collisions.\n\t// So we need to provide a function to determine whether the key of the slot matches the stored key.\n\terr = table.Put([]byte(\"key1\"), []byte(strings.Repeat(\"v\", 10)), func(slot diskhash.Slot) (bool, error) {\n\t\treturn true, nil\n\t})\n\tif err != nil {\n\t\tpanic(err)\n\t}\n\n\terr = table.Get([]byte(\"key1\"), func(slot diskhash.Slot) (bool, error) {\n\t\tfmt.Println(\"val =\", string(slot.Value))\n\t\treturn true, nil\n\t})\n\tif err != nil {\n\t\tpanic(err)\n\t}\n\n\terr = table.Delete([]byte(\"key1\"), func(slot diskhash.Slot) (bool, error) {\n\t\treturn true, nil\n\t})\n\tif err != nil {\n\t\tpanic(err)\n\t}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frosedblabs%2Fdiskhash","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frosedblabs%2Fdiskhash","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frosedblabs%2Fdiskhash/lists"}