= structchunk

Structured data store to mmap'ed chunk files.

This is a very simple key/object data store that maps
ctypes.Structure objects onto memory-mapped, sparse chunk files. The
index, which maps each key to its offset in a chunk file, is stored in
a LevelDB database.

Structure objects contain only data accessor methods into a given
memory buffer. No copying from the file into memory is done in Python;
all handling of the memory buffers backing the Structure objects is
done by the kernel's virtual memory manager (VMM).

The primary purpose of structchunk is the fastest possible access to
structure data in a file while doing no file management whatsoever.
It achieves this by using extremely lightweight objects and letting
the OS's VMM do all the work.

== Advantages

  - The LevelDB index is fast and lightweight: only the key and the
    offset into a chunk file are stored.

  - Objects can be accessed from on-disk data without explicitly
    serializing or deserializing any of the data in process memory
    space. The VMM takes care of all disk access.

  - All existing 'ctypes' types can be used as-is; there is no special
    type system. A single Object superclass is provided for defining
    new chunk-stored structures, but otherwise it is exactly like a
    ctypes.Structure.

  - Objects are equivalent to C structs and are unrelated to any
    Python data structures, making the file format portable to any
    language that can read C structs with the same layout.

  - Sparse file support means only initialized structure elements get
    written to disk. Large arrays can be allocated, but only the
    blocks holding initialized elements take up any disk space.

  - Mapped files can reference more data than physical memory can
    hold; the OS automatically pages the memory backing the objects
    in and out to fit into available memory.

  - Objects are simply offset pointers into a chunk file, so their
    memory footprint is small compared to the data they can reference
    on disk.

== Disadvantages

  - If the index is lost, the chunk files are meaningless.

  - If the OS or filesystem doesn't support sparse files (e.g. OS X
    with HFS+), chunk files take up their full size on disk.

  - The per-process virtual address space limits apply to mmap()ed
    files: 32-bit Linux is limited to roughly 2 GB of on-disk data,
    while 64-bit Linux on x86-64 can address up to 128 TB.

  - No "type" information is stored with objects in the chunk file;
    you must know an object's type ahead of time before you load it.
    If you load a buffer into the wrong type, its bytes are silently
    reinterpreted and weird stuff will happen!

== Implementation

ctypes.Structure subclasses have a from_buffer() method that creates a
new instance of that type backed by the memory region in the buffer,
without copying it. By using an mmap.mmap object mapped to a chunk
file as the buffer, the objects become accessors to data in process
virtual memory that is backed directly by the file on disk.

Every chunk file contains a header that specifies its size and the
next available "free" offset in the file that has not yet been
allocated. When a new object is created, it is mapped to the free
location and the head is advanced to the end of the new object. If
there is insufficient space in the chunk to hold the new object, a new
chunk is created.
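The from_buffer() mechanism described above can be tried with the standard library alone. This is a minimal sketch, not structchunk's own code: the `Point` type, the `write_then_read` helper, the 4 KiB file size and the offset of 64 are all hypothetical. It writes fields through one mapping and reads them back through a fresh mapping of the same file, with no explicit serialization step.

```python
import ctypes
import mmap
import os
import tempfile


class Point(ctypes.Structure):
    # Hypothetical record type; any ctypes.Structure subclass works the same way.
    _fields_ = [("x", ctypes.c_double),
                ("y", ctypes.c_double),
                ("id", ctypes.c_uint32)]


def write_then_read(path, offset=64):
    # Create a 4 KiB file; truncate() extends it without writing any data.
    with open(path, "wb") as f:
        f.truncate(4096)

    # First mapping: overlay a Point on the file's bytes and write through it.
    with open(path, "r+b") as f:
        mm = mmap.mmap(f.fileno(), 0)      # length 0 maps the whole file
        p = Point.from_buffer(mm, offset)  # no copy: p is a view into the mapping
        p.x, p.y, p.id = 1.5, -2.0, 7      # stores land directly in the mapped page
        del p                              # drop the exported buffer so close() succeeds
        mm.flush()                         # ask the OS to write dirty pages to the file
        mm.close()

    # Second, independent mapping: the values come back from the file itself.
    with open(path, "r+b") as f:
        mm = mmap.mmap(f.fileno(), 0)
        q = Point.from_buffer(mm, offset)
        values = (q.x, q.y, q.id)
        del q
        mm.close()
    return values


with tempfile.TemporaryDirectory() as d:
    print(write_then_read(os.path.join(d, "demo.chunk")))  # (1.5, -2.0, 7)
```

Note the `del` before each `close()`: an mmap cannot be closed while a ctypes object still exports its buffer, and Python raises `BufferError` if you try.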
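The header-and-free-offset scheme described under Implementation is essentially a bump allocator. The sketch below works under stated assumptions: the header is taken to be two unsigned 64-bit fields (`size`, `head`) at offset 0, each object is placed at its ctypes alignment, and `ChunkHeader`, `Record`, `Chunk`, `ChunkSet` and the `chunk-N` file names are all hypothetical. structchunk's real on-disk layout may differ. Chunks are only 64 bytes so the rollover to a second chunk is visible.

```python
import ctypes
import mmap
import os
import tempfile


class ChunkHeader(ctypes.Structure):
    # Hypothetical header: total chunk size and the next free offset ("head").
    _fields_ = [("size", ctypes.c_uint64), ("head", ctypes.c_uint64)]


class Record(ctypes.Structure):
    # Hypothetical payload type stored in the chunks (16 bytes, no padding).
    _fields_ = [("key", ctypes.c_uint64), ("value", ctypes.c_double)]


class Chunk:
    """One fixed-size chunk file with its header mapped at offset 0."""

    def __init__(self, path, size):
        with open(path, "wb") as f:
            f.truncate(size)  # sparse on filesystems that support it
        self._file = open(path, "r+b")
        self.mm = mmap.mmap(self._file.fileno(), size)
        self.header = ChunkHeader.from_buffer(self.mm, 0)
        self.header.size = size
        self.header.head = ctypes.sizeof(ChunkHeader)  # first free byte

    def allocate(self, ctype):
        """Map a new ctype instance at the free offset; None if it won't fit."""
        align = ctypes.alignment(ctype)
        offset = (self.header.head + align - 1) // align * align  # round up
        end = offset + ctypes.sizeof(ctype)
        if end > self.header.size:
            return None
        self.header.head = end  # advance the head past the new object
        return offset, ctype.from_buffer(self.mm, offset)


class ChunkSet:
    """Allocates from the newest chunk and starts a new chunk when it is full."""

    def __init__(self, directory, chunk_size):
        self.directory = directory
        self.chunk_size = chunk_size
        self.chunks = []
        self._new_chunk()

    def _new_chunk(self):
        path = os.path.join(self.directory, "chunk-%d" % len(self.chunks))
        self.chunks.append(Chunk(path, self.chunk_size))

    def allocate(self, ctype):
        placed = self.chunks[-1].allocate(ctype)
        if placed is None:  # insufficient space: create a new chunk
            self._new_chunk()
            placed = self.chunks[-1].allocate(ctype)
            if placed is None:
                raise ValueError("object does not fit in an empty chunk")
        offset, obj = placed
        return len(self.chunks) - 1, offset, obj


# Tiny 64-byte chunks: a 16-byte header leaves room for three 16-byte records.
directory = tempfile.mkdtemp()
store = ChunkSet(directory, chunk_size=64)
locations = []
for k in range(4):
    chunk_id, offset, rec = store.allocate(Record)
    rec.key, rec.value = k, k * 0.5
    locations.append((chunk_id, offset))
print(locations)  # [(0, 16), (0, 32), (0, 48), (1, 16)]
```

Because the header lives inside the mapping, advancing `head` updates the chunk file itself; no separate metadata write is needed.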
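The index side only has to remember where each object lives. LevelDB needs a third-party binding, so this sketch substitutes the standard library's `dbm.dumb` as a stand-in key/value store; the 12-byte `<IQ` value encoding and the `put`/`get` helpers are hypothetical, not structchunk's actual format. It also illustrates why losing the index makes the chunks meaningless: the chunk files alone don't say which key lives at which offset.

```python
import dbm.dumb
import os
import struct
import tempfile

# Hypothetical index value: chunk number (uint32) + byte offset (uint64),
# little-endian, 12 bytes in total. This is all that is stored per key.
ENTRY = struct.Struct("<IQ")


def put(index, key, chunk_id, offset):
    """Record where the object for `key` lives."""
    index[key.encode("utf-8")] = ENTRY.pack(chunk_id, offset)


def get(index, key):
    """Return (chunk_id, offset) for `key`; raises KeyError if unknown."""
    return ENTRY.unpack(index[key.encode("utf-8")])


directory = tempfile.mkdtemp()
index = dbm.dumb.open(os.path.join(directory, "index"), "c")
try:
    put(index, "alice", 0, 16)
    put(index, "bob", 1, 16)
    print(get(index, "bob"))  # (1, 16)
finally:
    index.close()  # flushes the dbm.dumb index files to disk
```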