{"id":31967997,"url":"https://github.com/sof3/wordvec","last_synced_at":"2025-10-14T18:45:59.524Z","repository":{"id":292828729,"uuid":"982021435","full_name":"SOF3/wordvec","owner":"SOF3","description":"A thin and small vector that can fit data into a single usize.","archived":false,"fork":false,"pushed_at":"2025-10-06T11:50:53.000Z","size":59,"stargazers_count":4,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-10-06T12:26:36.863Z","etag":null,"topics":["data-structures","rust"],"latest_commit_sha":null,"homepage":"https://sof3.github.io/wordvec/report/index.html","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SOF3.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-05-12T08:59:28.000Z","updated_at":"2025-10-06T10:35:59.000Z","dependencies_parsed_at":"2025-05-12T11:40:54.921Z","dependency_job_id":"b0b3232f-9983-4ed0-a00f-ffe84ad2fd8f","html_url":"https://github.com/SOF3/wordvec","commit_stats":null,"previous_names":["sof3/wordvec"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/SOF3/wordvec","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SOF3%2Fwordvec","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SOF3%2Fwordvec/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SOF3%2Fwordvec/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SOF3%2Fwordvec/manifests","owner_url":"https://repos.ecosyste.m
s/api/v1/hosts/GitHub/owners/SOF3","download_url":"https://codeload.github.com/SOF3/wordvec/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SOF3%2Fwordvec/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279020355,"owners_count":26086867,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-14T02:00:06.444Z","response_time":60,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-structures","rust"],"created_at":"2025-10-14T18:45:54.788Z","updated_at":"2025-10-14T18:45:59.519Z","avatar_url":"https://github.com/SOF3.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# wordvec\n\n[![GitHub CI](https://github.com/SOF3/wordvec/workflows/CI/badge.svg)](https://github.com/SOF3/wordvec/actions?query=workflow%3ACI)\n[![crates.io](https://img.shields.io/crates/v/wordvec.svg)](https://crates.io/crates/wordvec)\n[![crates.io](https://img.shields.io/crates/d/wordvec.svg)](https://crates.io/crates/wordvec)\n[![docs.rs](https://docs.rs/wordvec/badge.svg)](https://docs.rs/wordvec)\n[![GitHub](https://img.shields.io/github/last-commit/SOF3/wordvec)](https://github.com/SOF3/wordvec)\n[![GitHub](https://img.shields.io/github/stars/SOF3/wordvec?style=social)](https://github.com/SOF3/wordvec)\n\nA [thin][thinvec] and [small][smallvec] vector\nthat can fit data into a single 
`usize`.\n\n## Memory layout\n\n`WordVec\u003cT, N\u003e` has different layouts for small (inlined) and large (heap) lengths.\nThe inlined layout is used when the length is less than or equal to `N`,\nwhile the heap layout is used when the length exceeds `N`.\nThe type is a union of the inlined layout and the heap layout.\n\n### Inlined layout\n\nWordVec can store up to `N` (a const generic parameter) items inline, without heap allocation, where `N \u003c= 127`:\n\n```text\n┌──────────┬────────────────────────┐\n│ 1|len\u003c\u003c1 │   Elements (T × len)   │\n└──────────┴────────────────────────┘\n```\n\nThe length is stored in the 7 most significant bits of the first byte.\nThe least significant bit is always set.\n\nIf `T` has an alignment greater than 1,\nthere are `align_of::\u003cT\u003e() - 1` padding bytes between the length byte and the first element.\n\n### Heap layout\n\nIn the heap layout, WordVec is effectively a thin **pointer to** the following structure on the heap:\n\n```text\n┌────────┬──────────┬────────────────────────┐\n│ length │ capacity │   Elements (T × cap)   │\n└────────┴──────────┴────────────────────────┘\n```\n\nSince `length` and `capacity` are `usize`s,\nthe address held by the thin pointer is always a multiple of `align_of::\u003cusize\u003e()`.\nThus, the least significant bit of the thin pointer is always 0,\nwhich distinguishes it from the inlined layout.\n\n## When to use\n\nWordVec is a niche data structure that works best when all of the following conditions are met:\n\n### At most 24 bytes\nAlthough the technical limit is `N \u003c= 127`,\nit is not meaningful to set `N` such that `align_of::\u003cT\u003e() + N * size_of::\u003cT\u003e()` exceeds 24;\nWordVec has no advantage over [SmallVec][smallvec] if it cannot pack into a smaller struct.\n\n### Inlined layout is the hot path\nThin vectors are significantly (several times) slower than conventional vectors\nsince reading the length and capacity usually involves accessing memory outside the active cache.\nThus, the heap layout is supposed to be 
the cold path.\nIn other words, WordVec is built on the premise that\n\"the length should never exceed `N`, but behavior is still correct when it does\".\n\n### Many colocated vectors\nSince the length encoding in the inlined layout is indirect (it involves a bitshift),\nraw inlined access also tends to be slower in WordVec than in SmallVec,\nas a tradeoff for the reduced memory footprint of each individual vector.\nHowever, tighter packing of inlined data reduces the number of L1 cache misses,\nwhich leads to better overall performance when many vectors are colocated.\n\nThis may come in handy in scenarios with a large array of small vectors, e.g. [ECS][ecs],\nwhere WordVec components would be packed contiguously in archetype component storage.\n\n## Platform requirements\n\nCompiling for a target that violates the following requirements results in a compile error:\n\n- Little-endian only\n- `align_of::\u003cusize\u003e()` must be at least 2 bytes.\n\n## Benchmarks\n\nThe full Criterion benchmark report generated from GitHub CI\ncan be found on [GitHub Pages][bench-criterion].\nNote that GitHub CI runners are subject to many uncontrolled noise sources,\nso the results may not be very reliable.\nYou may reproduce the benchmarks yourself by running `cargo bench --bench criterion`,\nor check the [valgrind-based analysis][bench-iai] instead.\n\nThe benchmarks compare `std::vec`, [`thinvec`][thinvec], [`smallvec`][smallvec] and `wordvec`.\nIn general, WordVec performs comparably to SmallVec, but it:\n- is consistently slower with operations on a *single* small vector (presumably due to bitshifting the length byte),\n  particularly resizing operations such as `push`.\n- is sometimes slower with operations on large vectors due to thinness (reading/writing the length and capacity from the heap).\n- is consistently faster with operations on *many* small vectors due to more compact memory usage (fewer RAM accesses),\n  particularly with operations updating the inner values of a `Vec\u003cWordVec\u003cT, 
N\u003e\u003e`.\n\n## Vec feature parity\n\nWordVec is a new project for experimenting with new semantics.\nCurrently, only the basic features required to produce meaningful benchmarks are implemented,\nbut **all features from `std::vec` shall be implemented** in this library eventually.\nPull requests are welcome to align WordVec functionality with `std::vec` or smallvec;\nI do not have the bandwidth to implement all of those functions myself, but I am happy to review such contributions.\n\n[smallvec]: https://docs.rs/smallvec\n[thinvec]: https://docs.rs/thin-vec\n[std-vec]: https://doc.rust-lang.org/std/vec/struct.Vec.html\n[bench-criterion]: https://sof3.github.io/wordvec/report/index.html\n[bench-iai]: https://sof3.github.io/wordvec/iai/summary.txt\n[ecs]: https://en.wikipedia.org/wiki/Entity_component_system\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsof3%2Fwordvec","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsof3%2Fwordvec","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsof3%2Fwordvec/lists"}