ECMAScript String API implementation in Zig - 96.4% spec compliance, UTF-16 indexing, runtime-ready
https://github.com/carlos-sweb/z-string


# z-string

**ECMAScript String API implementation in Zig**

[![Zig Version](https://img.shields.io/badge/zig-0.15.2-orange.svg)](https://ziglang.org/)
[![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
[![Status](https://img.shields.io/badge/status-active-green.svg)](#-project-status)

A Zig library that implements the String API defined in ECMA-262 (the ECMAScript specification), aiming for exact spec compliance. It is designed as a foundation for JavaScript/ECMAScript runtime engines written in Zig.

## 🎯 Project Goals

- **Spec Compliance**: Match ECMA-262 String API behavior exactly
- **UTF-16 Indexing**: Use UTF-16 code units for indexing (like JavaScript)
- **Performance**: Efficient implementation leveraging Zig's strengths
- **Runtime Ready**: Built to be integrated into ECMAScript runtime engines

## ✨ Features

### ✅ Implemented (33/33 methods - 100%)

#### Character Access (4 methods)
- `charAt(index)` - Get character at index
- `at(index)` - Get character with negative indexing support
- `charCodeAt(index)` - Get UTF-16 code unit value
- `codePointAt(index)` - Get Unicode code point
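
To illustrate why `charCodeAt` and `codePointAt` can return different values at the same index, here is a minimal standalone C++17 sketch of the ECMAScript rule over a `std::u16string`. The name `code_point_at` is hypothetical and is not part of the z-string API; the sketch only assumes that indices refer to UTF-16 code units, as described above.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical helper (not the z-string API): ECMAScript-style codePointAt
// over UTF-16 code units. Returns std::nullopt where JavaScript would
// return undefined (index out of range).
std::optional<std::uint32_t> code_point_at(const std::u16string& s, std::size_t index) {
    if (index >= s.size()) return std::nullopt;
    const std::uint32_t first = s[index];
    // Not a high (leading) surrogate, or nothing follows it: return the unit itself.
    if (first < 0xD800 || first > 0xDBFF || index + 1 == s.size()) return first;
    const std::uint32_t second = s[index + 1];
    // A high surrogate not followed by a low surrogate is returned unchanged.
    if (second < 0xDC00 || second > 0xDFFF) return first;
    // Combine the surrogate pair into one supplementary code point.
    return 0x10000 + ((first - 0xD800) << 10) + (second - 0xDC00);
}
```

For the string `u"a\U0001F600"` (three code units), index 1 yields `0x1F600`, while index 2 lands on the trailing surrogate and yields `0xDE00`.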

#### Search (5 methods)
- `indexOf(searchString, position?)` - Find first occurrence
- `lastIndexOf(searchString, position?)` - Find last occurrence
- `includes(searchString, position?)` - Check if contains substring
- `startsWith(searchString, position?)` - Check if starts with substring
- `endsWith(searchString, length?)` - Check if ends with substring

#### Transform (4 methods)
- `slice(start, end?)` - Extract substring with negative indices
- `substring(start, end?)` - Extract substring (swaps if start > end)
- `concat(...strings)` - Concatenate strings
- `repeat(count)` - Repeat string N times
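
The difference between `slice` and `substring` is easy to miss, so the following standalone C++17 sketch spells out the two index rules. The free functions `slice` and `substring` here are hypothetical illustrations over `std::u16string`, not the z-string API, and both take an explicit `end` for simplicity.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helpers (not the z-string API). Indices are UTF-16 code unit positions.

// slice: negative indices count back from the end; start >= end yields "".
std::u16string slice(const std::u16string& s, long long start, long long end) {
    const long long len = static_cast<long long>(s.size());
    const long long from = start < 0 ? std::max(len + start, 0LL) : std::min(start, len);
    const long long to = end < 0 ? std::max(len + end, 0LL) : std::min(end, len);
    if (from >= to) return std::u16string();
    return s.substr(static_cast<std::size_t>(from), static_cast<std::size_t>(to - from));
}

// substring: negative indices clamp to 0; the bounds are swapped if start > end.
std::u16string substring(const std::u16string& s, long long start, long long end) {
    const long long len = static_cast<long long>(s.size());
    const long long a = std::clamp(start, 0LL, len);
    const long long b = std::clamp(end, 0LL, len);
    const long long from = std::min(a, b);
    const long long to = std::max(a, b);
    return s.substr(static_cast<std::size_t>(from), static_cast<std::size_t>(to - from));
}
```

With `u"Hello, World!"`, `slice(s, -6, -1)` gives `"World"` and `slice(s, 5, 0)` gives an empty string, whereas `substring(s, 5, 0)` swaps its bounds and gives `"Hello"`.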

#### Padding (2 methods)
- `padStart(targetLength, padString?)` - Pad from start
- `padEnd(targetLength, padString?)` - Pad from end
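
As a rough illustration of the padding rule in ECMA-262 (repeat the pad string and truncate it to fit), here is a standalone C++17 sketch. The function `pad_start` is a hypothetical name and not the z-string API; lengths are counted in UTF-16 code units, and the default pad string is a single space as in JavaScript.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not the z-string API): ECMAScript-style padStart.
// Lengths are measured in UTF-16 code units.
std::u16string pad_start(const std::u16string& s, std::size_t target_length,
                         const std::u16string& pad = u" ") {
    // Nothing to do if the string is already long enough or the filler is empty.
    if (target_length <= s.size() || pad.empty()) return s;
    const std::size_t fill_len = target_length - s.size();
    std::u16string filler;
    filler.reserve(fill_len);
    // Repeat the pad string, truncating the last repetition to fit exactly.
    while (filler.size() < fill_len) {
        filler.append(pad, 0, fill_len - filler.size());
    }
    return filler + s;
}
```

For example, `pad_start(u"abc", 10, u"123")` produces `"1231231abc"`, and a target length shorter than the input returns the input unchanged. `padEnd` follows the same rule with the filler appended instead of prepended.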

#### Trimming (5 methods)
- `trim()` - Remove whitespace from both ends
- `trimStart() / trimLeft()` - Remove whitespace from start
- `trimEnd() / trimRight()` - Remove whitespace from end
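
"Whitespace" in ECMA-262 covers more than ASCII spaces: the trim methods strip every code point in the spec's WhiteSpace and LineTerminator productions. The standalone C++17 sketch below lists that set explicitly. The names `is_ecma_whitespace` and `trim` are hypothetical and not the z-string API.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helpers (not the z-string API).
// True for code units stripped by ECMAScript's trim methods: the WhiteSpace
// and LineTerminator productions of ECMA-262.
bool is_ecma_whitespace(char16_t c) {
    switch (c) {
        case 0x0009: case 0x000A: case 0x000B: case 0x000C: case 0x000D:
        case 0x0020: case 0x00A0: case 0x1680:
        case 0x2028: case 0x2029: case 0x202F: case 0x205F:
        case 0x3000: case 0xFEFF:
            return true;
        default:
            return c >= 0x2000 && c <= 0x200A;  // U+2000..U+200A (various spaces)
    }
}

// Remove ECMAScript whitespace from both ends of the string.
std::u16string trim(const std::u16string& s) {
    std::size_t begin = 0;
    std::size_t end = s.size();
    while (begin < end && is_ecma_whitespace(s[begin])) ++begin;
    while (end > begin && is_ecma_whitespace(s[end - 1])) --end;
    return s.substr(begin, end - begin);
}
```

Note that U+200B (zero-width space) is not in the set, so a string that starts with it keeps that character after trimming.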

#### Split (1 method)
- `split(separator?, limit?)` - Split string into array
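
The optional `separator` and `limit` arguments give `split` several edge cases. The standalone C++17 sketch below follows the ECMA-262 rules for a literal (non-regex) separator. The function `split` shown here is a hypothetical illustration over `std::u16string`, not the z-string API; a missing separator is modeled with `std::nullopt`, and the default limit is 2^32 - 1 as in the spec.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper (not the z-string API): ECMAScript-style split
// with a literal separator.
std::vector<std::u16string> split(const std::u16string& s,
                                  const std::optional<std::u16string>& separator = std::nullopt,
                                  std::size_t limit = 0xFFFFFFFFu) {
    std::vector<std::u16string> parts;
    if (limit == 0) return parts;                       // limit 0: always empty
    if (!separator) { parts.push_back(s); return parts; }  // no separator: whole string
    if (separator->empty()) {
        // Empty separator: one element per UTF-16 code unit, up to the limit.
        for (std::size_t i = 0; i < s.size() && parts.size() < limit; ++i) {
            parts.push_back(std::u16string(std::size_t{1}, s[i]));
        }
        return parts;
    }
    std::size_t start = 0;
    while (true) {
        const std::size_t hit = s.find(*separator, start);
        if (hit == std::u16string::npos) break;
        parts.push_back(s.substr(start, hit - start));
        if (parts.size() == limit) return parts;        // stop once the limit is reached
        start = hit + separator->size();
    }
    parts.push_back(s.substr(start));                   // trailing piece (may be empty)
    return parts;
}
```

Under these rules, splitting `"a,b,c"` on `","` with a limit of 2 gives `["a", "b"]`, splitting an empty string on `","` gives one empty element, and splitting an empty string on an empty separator gives no elements.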

#### Case Conversion (4 methods)
- `toLowerCase()` - Convert to lowercase
- `toUpperCase()` - Convert to uppercase
- `toLocaleLowerCase(locale?)` - Locale-aware lowercase*
- `toLocaleUpperCase(locale?)` - Locale-aware uppercase*

#### Utility (4 methods)
- `toString()` - Get string value
- `valueOf()` - Get primitive value
- `localeCompare(that, locales?, options?)` - Compare strings*
- `normalize(form?)` - Unicode normalization (NFC/NFD/NFKC/NFKD)**

\* Basic implementation without full locale support (ICU integration planned)

\*\* Supports common Latin characters (À-ÿ range) with proper decomposition/composition

#### Regex Methods (5 methods) ✅
- `search(regexp)` - Search with regex
- `match(regexp)` - Match with regex
- `matchAll(regexp)` - Match all with regex
- `replace(searchValue, replaceValue)` - Replace with regex support
- `replaceAll(searchValue, replaceValue)` - Replace all with regex support

## 📦 Installation

### Language Support

z-string can be used from multiple languages:
- **Zig**: Native Zig API (recommended)
- **C**: C-compatible API with manual memory management
- **C++**: Modern C++17 API with RAII and STL integration

📖 **See language-specific guides:**
- **[C.md](C.md)** - Complete guide for C usage
- **[CPP.md](CPP.md)** - Complete guide for C++ usage

### Using Zig Package Manager (0.15.0+)

**Note:** z-string depends on [zregexp](https://github.com/carlos-sweb/zregexp) for regex functionality. You'll need to set it up as a local dependency or wait for published releases.

#### Quick Setup (Local Development)

```bash
# Clone z-string
git clone https://github.com/carlos-sweb/z-string.git
cd z-string

# Clone zregexp dependency
mkdir -p deps
git clone https://github.com/carlos-sweb/zregexp.git deps/zregexp

# Build and test
zig build test
```

#### Future: Package Manager Installation

Once published, you'll be able to add to your `build.zig.zon`:

```zig
.{
    .name = "my-project",
    .version = "0.1.0",
    .dependencies = .{
        .zstring = .{
            .url = "https://github.com/carlos-sweb/z-string/archive/refs/tags/v0.2.0.tar.gz",
            .hash = "1220...", // Run `zig fetch` to obtain the hash
        },
    },
}
```

Add to your `build.zig`:

```zig
const zstring = b.dependency("zstring", .{
    .target = target,
    .optimize = optimize,
});

exe.root_module.addImport("zstring", zstring.module("zstring"));
```

### Manual Installation

```bash
git clone https://github.com/carlos-sweb/z-string.git
cd z-string
zig build test
```

## ⚠️ Error Handling

z-string follows Zig's error handling philosophy. All operations that can fail return error unions:

```zig
// ✅ Proper error handling
const upper = try str.toUpperCase(allocator);
defer allocator.free(upper);

// ✅ Handle specific errors
const result = str.toUpperCase(allocator) catch |err| {
    std.log.err("Failed: {}", .{err});
    return err;
};

// ✅ Check optional returns
const char = try str.at(allocator, 0);
if (char) |c| {
    defer allocator.free(c);
    // Use c...
}
```

**📖 See [ERROR_HANDLING.md](ERROR_HANDLING.md) for a comprehensive error handling guide.**

## 🚀 Quick Start

### Zig API

```zig
const std = @import("std");
const zstring = @import("zstring");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    // Create a ZString
    const str = zstring.ZString.init("Hello, World!");

    // Character access
    const char = try str.charAt(allocator, 0);
    defer allocator.free(char);
    std.debug.print("First char: {s}\n", .{char}); // "H"

    // Search
    const pos = str.indexOf("World", null);
    std.debug.print("Position: {}\n", .{pos}); // 7

    // Transform
    const upper = try str.toUpperCase(allocator);
    defer allocator.free(upper);
    std.debug.print("Upper: {s}\n", .{upper}); // "HELLO, WORLD!"

    // Split
    const parts = try str.split(allocator, ", ", null);
    defer zstring.ZString.freeSplitResult(allocator, parts);
    std.debug.print("Parts: {s}, {s}\n", .{ parts[0], parts[1] }); // "Hello", "World!"
}
```

### C API

```c
#include <stdio.h>
#include "zstring.h"

int main(void) {
    ZString* str = NULL;

    // Create a string
    if (zstring_init("Hello, World!", &str) != ZSTRING_OK) {
        return 1;
    }

    // Convert to uppercase
    char* upper = NULL;
    if (zstring_to_upper_case(str, &upper) == ZSTRING_OK) {
        printf("Upper: %s\n", upper); // "HELLO, WORLD!"
        zstring_str_free(upper);
    }

    // Clean up
    zstring_free(str);
    return 0;
}
```

**Build:** `gcc your_program.c -I./include -L. -lzstring -o your_program`

📖 **See [C.md](C.md) for complete C API documentation.**

### C++ API

```cpp
#include <iostream>
#include "zstring.hpp"

int main() {
    try {
        // Create a string (RAII - automatic cleanup)
        zstring::String str("Hello, World!");

        // Convert to uppercase
        auto upper = str.toUpperCase();
        std::cout << "Upper: " << upper << std::endl; // "HELLO, WORLD!"

        // Split into words
        auto words = str.split(" ");
        for (const auto& word : words) {
            std::cout << "Word: " << word << std::endl;
        }

    } catch (const zstring::Exception& e) {
        std::cerr << "Error: " << e.what() << std::endl;
        return 1;
    }

    return 0;
}
```

**Build:** `g++ -std=c++17 your_program.cpp -I./include -L. -lzstring -o your_program`

📖 **See [CPP.md](CPP.md) for complete C++ API documentation.**

## 📚 Documentation

### Key Concepts

#### UTF-16 Indexing

JavaScript uses UTF-16 code units for string indexing. z-string maintains this behavior for spec compliance:

```zig
const str = zstring.ZString.init("😀"); // Emoji (surrogate pair)
std.debug.print("Length: {}\n", .{str.length()}); // 2 (UTF-16 code units)
```
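
To make the counting rule concrete, here is a standalone C++17 sketch that computes the UTF-16 length of a UTF-8 string without converting it. The function `utf16_length` is a hypothetical name, not the z-string API, and the sketch assumes the input is valid UTF-8.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper (not the z-string API): number of UTF-16 code units
// needed to represent a UTF-8 string. Assumes the input is valid UTF-8.
std::size_t utf16_length(std::string_view utf8) {
    std::size_t units = 0;
    for (const unsigned char byte : utf8) {
        // Continuation bytes (10xxxxxx) belong to a sequence already counted.
        if ((byte & 0xC0) == 0x80) continue;
        // A 4-byte sequence (lead byte 11110xxx) encodes a code point above
        // U+FFFF, which needs a surrogate pair; everything else needs one unit.
        units += (byte >= 0xF0) ? 2 : 1;
    }
    return units;
}
```

The emoji above is the four bytes `F0 9F 98 80` in UTF-8, so `utf16_length("\xF0\x9F\x98\x80")` is 2, matching the `length()` result shown, while a two-byte character such as `é` (`C3 A9`) counts as 1.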

#### Memory Management

Methods that return new strings require explicit memory management:

```zig
const upper = try str.toUpperCase(allocator);
defer allocator.free(upper); // Caller owns the memory
```

#### Borrowed vs Owned Strings

```zig
// Borrowed (no allocation)
const borrowed = zstring.ZString.init("hello");

// Owned (allocated, must call deinit)
var owned = try zstring.ZString.initOwned(allocator, "hello");
defer owned.deinit();
```

### Examples

See the `examples/` directory for complete examples:
- `character_access.zig` - Character access methods
- `search_methods.zig` - Search and indexOf methods
- `transform_methods.zig` - Slice, substring, concat, repeat
- `padding_trimming_methods.zig` - Padding and trimming
- `split_method.zig` - String splitting

Run examples:
```bash
zig build example # Character access
zig build example-search # Search methods
zig build example-transform # Transform methods
zig build example-padding-trimming
zig build example-split
zig build example-errors # Error handling (recommended!)
```

## 🧪 Testing

```bash
# Run all tests
zig build test

# Run benchmarks
zig build bench
```

**Test Coverage:**
- 372+ tests across all implemented methods
- ECMAScript spec compliance tests
- Unicode and emoji handling tests
- Unicode normalization tests (NFC/NFD/NFKC/NFKD)
- Edge case coverage

## 🏗️ Architecture

```
z-string/
├── src/
│   ├── zstring.zig               # Public Zig API entry point
│   ├── c_api.zig                 # C API implementation
│   ├── core/
│   │   ├── utf16.zig             # UTF-8 ↔ UTF-16 conversion
│   │   └── string.zig            # ZString struct
│   └── methods/                  # Method implementations (grouped by category)
│       ├── access.zig            # charAt, at, charCodeAt, codePointAt
│       ├── search.zig            # indexOf, lastIndexOf, includes, etc.
│       ├── transform.zig         # slice, substring, concat, repeat
│       ├── padding.zig           # padStart, padEnd
│       ├── trimming.zig          # trim, trimStart, trimEnd
│       ├── split.zig             # split
│       ├── case.zig              # toLowerCase, toUpperCase
│       ├── regex.zig             # search, match, matchAll, replace, replaceAll
│       ├── unicode_normalize.zig # NFC/NFD/NFKC/NFKD normalization
│       └── utility.zig           # toString, valueOf, localeCompare, normalize
├── include/
│   ├── zstring.h                 # C header file
│   └── zstring.hpp               # C++ header file (RAII wrapper)
├── tests/
│   ├── spec/                     # ECMAScript spec compliance tests
│   └── benchmarks/               # Performance benchmarks
├── examples/                     # Usage examples
├── C.md                          # Complete C API documentation
└── CPP.md                        # Complete C++ API documentation
```

## 🔮 Roadmap

### Phase 1: Core Methods ✅ (Complete - 100%)
- [x] Character access methods
- [x] Search methods (literal)
- [x] Transform methods
- [x] Padding and trimming
- [x] Split (literal)
- [x] Case conversion
- [x] Utility methods
- [x] Unicode normalization (NFC/NFD/NFKC/NFKD)

### Phase 2: Regex Integration ✅ (Complete - 100%)
- [x] Integrate zregexp as dependency
- [x] Implement search() with regex
- [x] Implement match() and matchAll()
- [x] Implement replace() and replaceAll() with regex
- [x] Comprehensive test coverage for regex methods

### Phase 3: Advanced Features 🔮 (Future)
- [ ] Full locale support (ICU integration)
- [ ] Extended Unicode normalization (full UCD coverage beyond Latin-1)
- [ ] Locale-aware case mapping (Turkish İ/i, etc.)

## 🤝 Contributing

Contributions are welcome! This project is actively maintained.

### Development Setup

```bash
git clone https://github.com/carlos-sweb/z-string.git
cd z-string
zig build test
```

### Guidelines

- Follow the ECMA-262 specification exactly
- Maintain UTF-16 indexing compatibility
- Include comprehensive tests for all changes
- Document public APIs with examples

## 📄 License

MIT License - see [LICENSE](LICENSE) file for details.

## 🔗 Related Projects

- **libzregexp** (in development) - Zig regex engine for ECMAScript compatibility
- **Zig Standard Library** - UTF-8/UTF-16 utilities

## 📊 Project Status

**Current Version:** 0.3.0 (Development)

**Compatibility:**
- ✅ 33/33 methods implemented (100%)
- ✅ 28/28 non-regex methods (100%)
- ✅ 5/5 regex methods (100%)
- ✅ Unicode normalization (NFC/NFD/NFKC/NFKD) for common Latin characters

**Production Ready:** Complete - all ECMAScript String API features available

✅ **Project Status: ACTIVE**

All methods have been implemented. The project provides **complete ECMA-262 String API coverage** (33 of 33 methods), including Unicode normalization (NFC/NFD/NFKC/NFKD) for common Latin characters.

**Dependency Architecture:**
- z-string depends on zregexp (one-way dependency)
- No circular dependencies
- Clean separation of concerns

## 🙏 Acknowledgments

- ECMA-262 (ECMAScript) specification
- Zig community
- All contributors

---

**Note:** For questions or discussions about the project architecture, please open an issue.