{"id":31037233,"url":"https://github.com/ashvardanian/stringtape","last_synced_at":"2026-03-09T04:31:52.007Z","repository":{"id":310490750,"uuid":"1039689897","full_name":"ashvardanian/StringTape","owner":"ashvardanian","description":"Apache Arrow-compatible space-efficient \"tape\" class in pure Rust to be used with StringZilla for GPU, NUMA, and disk transfers of variable length strings","archived":false,"fork":false,"pushed_at":"2025-11-21T00:44:00.000Z","size":232,"stargazers_count":29,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-11-21T02:35:06.290Z","etag":null,"topics":["allocator","apache-arrow","arrow","pyarrow","string-manipulation","tape"],"latest_commit_sha":null,"homepage":"https://github.com/ashvardanian/StringZilla","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ashvardanian.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-08-17T19:20:19.000Z","updated_at":"2025-11-21T00:44:03.000Z","dependencies_parsed_at":null,"dependency_job_id":"07fc750f-bb21-4d8a-9650-e0f429ba5d0b","html_url":"https://github.com/ashvardanian/StringTape","commit_stats":null,"previous_names":["ashvardanian/stringtape"],"tags_count":22,"template":false,"template_full_name":null,"purl":"pkg:github/ashvardanian/StringTape","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ashvardanian%2FStringTape","tags_url":"https://repos.ecosyste.ms/
api/v1/hosts/GitHub/repositories/ashvardanian%2FStringTape/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ashvardanian%2FStringTape/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ashvardanian%2FStringTape/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ashvardanian","download_url":"https://codeload.github.com/ashvardanian/StringTape/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ashvardanian%2FStringTape/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30283416,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-09T02:57:19.223Z","status":"ssl_error","status_checked_at":"2026-03-09T02:56:26.373Z","response_time":61,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["allocator","apache-arrow","arrow","pyarrow","string-manipulation","tape"],"created_at":"2025-09-14T04:55:44.732Z","updated_at":"2026-03-09T04:31:51.985Z","avatar_url":"https://github.com/ashvardanian.png","language":"Rust","readme":"# StringTape\n\n![StringTape banner](https://github.com/ashvardanian/ashvardanian/blob/master/repositories/StringTape.png?raw=true)\n\nA memory-efficient collection for variable-length strings, co-located on a contiguous \"tape\".\n\n- Convertible to __[Apache Arrow](https://arrow.apache.org/)__ 
`String`/`LargeString` \u0026 `Binary`/`LargeBinary` arrays\n- Compatible with __UTF-8 \u0026 binary__ strings in Rust via `CharsTape` and `BytesTape`\n- Usable in `no_std` and with custom allocators for GPU \u0026 embedded use cases\n- Sliceable into __zero-copy__ borrow-checked views with `[i..n]` range syntax\n\n## Quick Start\n\n```rust\nuse stringtape::{CharsTapeI32, BytesTapeI32, StringTapeError};\n\n// Create a new CharsTape with 32-bit offsets\nlet mut tape = CharsTapeI32::new();\ntape.push(\"hello\")?;\ntape.push(\"world\")?;\n\nassert_eq!(tape.len(), 2);\nassert_eq!(\u0026tape[0], \"hello\");\nassert_eq!(tape.get(1), Some(\"world\"));\n\n// Iterate over strings\nfor s in \u0026tape {\n    println!(\"{}\", s);\n}\n\n// Build from iterator\nlet tape2: CharsTapeI32 = [\"a\", \"b\", \"c\"].into_iter().collect();\nassert_eq!(tape2.len(), 3);\n\n// Binary data with BytesTape\nlet mut bytes = BytesTapeI32::new();\nbytes.push(b\"hi\")?;\nassert_eq!(\u0026bytes[0], b\"hi\");\n\n# Ok::\u003c(), StringTapeError\u003e(())\n```\n\n## Memory Layout\n\n`CharsTape` and `BytesTape` use the same memory layout as Apache Arrow string and binary arrays:\n\n```text\nData buffer:    [h,e,l,l,o,w,o,r,l,d]\nOffset buffer:  [0, 5, 10]\n```\n\n## API Overview\n\n### Basic Operations\n\n```rust\nuse stringtape::CharsTapeI32;\n\nlet mut tape = CharsTapeI32::new();\ntape.push(\"hello\")?;                    // Append one string\ntape.extend([\"world\", \"foo\"])?;         // Append an array\nassert_eq!(\u0026tape[0], \"hello\");          // Direct indexing\nassert_eq!(tape.get(1), Some(\"world\")); // Safe access\n\nfor s in \u0026tape { // Iterate\n    println!(\"{}\", s);\n}\n\n// Construct from an iterator\nlet tape2: CharsTapeI32 = [\"a\", \"b\", \"c\"].into_iter().collect();\n```\n\n`BytesTape` provides the same interface for arbitrary byte slices.\n\n### Views and Slicing\n\n```rust\nlet view = tape.view();              // View entire tape\nlet subview = tape.subview(1, 3)?;   
// Items [1, 3)\nlet nested = subview.subview(0, 1)?; // Nested subviews\nlet raw_bytes = \u0026tape.view()[1..3];  // Raw byte slice\n\n// Views have the same API as tapes\nassert_eq!(subview.len(), 2);\nassert_eq!(\u0026subview[0], \"world\");\n```\n\n### Memory Management\n\n```rust\n// Pre-allocate capacity\nlet mut tape = CharsTapeI32::with_capacity(1024, 100)?; // 1KB data, 100 strings\n\n// Monitor usage\nprintln!(\"Items: {}, Data: {} bytes\", tape.len(), tape.data_len());\n\n// Modify (requires a mutable binding)\ntape.clear();           // Remove all items\ntape.truncate(5);       // Keep first 5 items\n\n// Custom allocators\nuse allocator_api2::alloc::Global;\nlet tape = CharsTape::new_in(Global);\n```\n\n### Apache Arrow Interop\n\nTrue zero-copy conversion to/from Arrow arrays:\n\n```rust\nuse arrow::array::StringArray;\nuse arrow::buffer::{Buffer, OffsetBuffer, ScalarBuffer};\n\n// CharsTape → Arrow (zero-copy)\nlet (data_slice, offsets_slice) = tape.arrow_slices();\nlet data_buffer = Buffer::from_slice_ref(data_slice);\nlet offsets_buffer = OffsetBuffer::new(ScalarBuffer::new(\n    Buffer::from_slice_ref(offsets_slice), 0, offsets_slice.len()\n));\nlet arrow_array = StringArray::new(offsets_buffer, data_buffer, None);\n\n// Arrow → CharsTapeView (zero-copy)\nlet view = unsafe {\n    CharsTapeViewI32::from_raw_parts(\n        arrow_array.values(),\n        arrow_array.offsets().as_ref(),\n    )\n};\n```\n\n`BytesTape` works the same way with Arrow `BinaryArray`/`LargeBinaryArray` types.\n\n### Unsigned Offsets\n\nIn addition to the signed offsets (`i32`/`i64` via `CharsTapeI32`/`CharsTapeI64`),\nthe library also supports unsigned offsets (`u32`/`u64`) when you prefer non-negative indexing:\n\n- `CharsTapeU32`, `CharsTapeU64`\n- `BytesTapeU32`, `BytesTapeU64`\n- `CharsTapeViewU32\u003c'_\u003e`, `CharsTapeViewU64\u003c'_\u003e`\n- `BytesTapeViewU32\u003c'_\u003e`, `BytesTapeViewU64\u003c'_\u003e`\n\nNote that unsigned offsets cannot be converted to/from Arrow arrays.\n\n## `no_std` Support\n\nStringTape can be used in `no_std` 
environments:\n\n```toml\n[dependencies]\nstringtape = { version = \"2\", default-features = false }\n```\n\nIn `no_std` mode:\n\n- All functionality is preserved\n- Requires `alloc` for dynamic allocation\n- Error types implement `Display` but not `std::error::Error`\n\n## Testing\n\nRun tests for both `std` and `no_std` configurations:\n\n```bash\ncargo test                          # Test with std (default)\ncargo test --doc                    # Test documentation examples\ncargo test --no-default-features    # Test without std\ncargo test --all-features           # Test with all features enabled\n```","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fashvardanian%2Fstringtape","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fashvardanian%2Fstringtape","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fashvardanian%2Fstringtape/lists"}