https://github.com/edjcase/motoko_url_kit

A library with Url helper utilities for parsing and manipulation
https://github.com/edjcase/motoko_url_kit
Last synced: 5 months ago
JSON representation
A library with Url helper utilities for parsing and manipulation
Host: GitHub
URL: https://github.com/edjcase/motoko_url_kit
Owner: edjCase
License: mit
Created: 2025-06-23T16:38:28.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2025-06-23T22:26:39.000Z (about 1 year ago)
Last Synced: 2025-06-23T23:24:01.059Z (about 1 year ago)
Language: Motoko
Size: 21.5 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project

README

          # URL Kit

[![MOPS](https://img.shields.io/badge/MOPS-url--kit-blue)](https://mops.one/url-kit)

[![License](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/edjcase/motoko_url_kit/blob/main/LICENSE)

A comprehensive URL parsing and manipulation library for Motoko on the Internet Computer.

## Overview

URL Kit is a robust library designed to handle all aspects of URL processing in Motoko applications. It provides complete RFC-compliant URL parsing, validation, manipulation, and encoding/decoding capabilities with support for various host types, optional domain parsing, and comprehensive error handling.

Key features:

-   🔍 **Complete URL Parsing**: Parse any URL into structured components (scheme, authority, path, query, fragment)

-   🌐 **Multi-Host Support**: Handle host names, IPv4, and IPv6 addresses with proper validation

-   🏷️ **Flexible Domain Parsing**: Optional domain parsing with comprehensive or custom suffix lists

-   🔗 **URL Manipulation**: Add, remove, and modify query parameters with ease

-   🔄 **Encoding/Decoding**: Proper URL encoding and decoding with UTF-8 support

-   ⚖️ **Normalization**: Normalize URLs for accurate comparison and deduplication

-   🛡️ **Validation**: Comprehensive validation with detailed error messages

-   🎯 **Type Safety**: Strongly typed URL components for compile-time safety

-   📊 **IPv6 Support**: Full IPv6 address parsing with compression and various formats

-   🔧 **Path Handling**: Flexible path parsing with custom separators and normalization

## Package

### MOPS

```bash

mops add url-kit

```

To setup MOPS package manager, follow the instructions from the [MOPS Site](https://mops.one)

## Quick Start

Here's a simple example to get started with URL parsing:

```motoko

import UrlKit "mo:url-kit";

// Parse a URL - no domain parser needed for basic parsing

let urlResult = UrlKit.fromText("https://api.example.com:8080/users?page=1&limit=10#results");

let url = switch (urlResult) {

    case (#ok(url)) url;

    case (#err(errorMsg)) {

        // Handle parsing error

    };

};

// Access URL components

// url = {

     scheme = ?"https";

     authority = ?{

        user = null;

        host = #name("api.example.com");  // Host names are parsed as simple strings

        port = ?8080;

     };

     path = { segments = ["users"]; trailingSlash = false };

     queryParams = [("page", "1"), ("limit", "10")];

     fragment = ?"results"

// }

// Get specific query parameter

let page = UrlKit.getQueryParam(url, "page"); // ?"1"

// Add query parameters

let urlWithAuth = UrlKit.addQueryParam(url, ("token", "abc123"));

// Convert back to text

let urlText = UrlKit.toText(urlWithAuth);

```

## Comprehensive Example

Here's a more detailed example showing various URL manipulation capabilities:

```motoko

import UrlKit "mo:url-kit";

import Host "mo:url-kit/Host";

// Parse different types of URLs

let examples = [

    "https://user:pass@sub.example.com:8080/api/v1/users?page=1&sort=name#section1",

    "http://192.168.1.1:3000/dashboard",

    "https://[2001:db8::1]:8443/secure",

    "file:///path/to/file.txt",

    "//cdn.example.com/assets/style.css"

];

for (urlText in examples.vals()) {

    switch (UrlKit.fromText(urlText)) {

        case (#ok(url)) {

            // Analyze the URL structure

            switch (url.authority) {

                case (?authority) {

                    // Check host type

                    switch (authority.host) {

                        case (#name(hostName)) {

                            // Host name (domain name, hostname, etc.)

                            // For domain parsing, use the separate domain parsers

                        };

                        case (#ipV4(ip)) {

                            // IPv4 address: (192, 168, 1, 1)

                            let hostText = Host.toText(authority.host, authority.port);

                        };

                        case (#ipV6(ip)) {

                            // IPv6 address with proper formatting

                            let hostText = Host.toText(authority.host, authority.port);

                        };

                    };

                    // Check for user authentication

                    switch (authority.user) {

                        case (?userInfo) {

                            let username = userInfo.username;

                            let password = userInfo.password;

                        };

                        case (null) {};

                    };

                };

                case (null) {

                    // No authority (e.g., mailto:, file: schemes)

                };

            };

            // Manipulate query parameters

            let withParams = url

                |> UrlKit.addQueryParam(_, ("timestamp", "123456789"))

                |> UrlKit.addQueryParamMulti(_, [("version", "v2"), ("format", "json")])

                |> UrlKit.removeQueryParam(_, "page");

            // Normalize for comparison

            let normalizeOpts = { usernameIsCaseSensitive = false; pathIsCaseSensitive = false; queryKeysAreCaseSensitive = false; removeEmptyPathSegments = true; resolvePathDotSegments = true; preserveTrailingSlash = false };

            let normalized = UrlKit.normalize(withParams, normalizeOpts);

            // Convert back to string

            let finalUrl = UrlKit.toText(normalized);

        };

        case (#err(error)) {

            // Handle parsing errors with detailed messages

        };

    };

};

```

## Core API

### URL Type

The core `Url` type represents a parsed URL with all its components:

```motoko

public type Url = {

    scheme : ?Text;           // "https", "http", "mailto", etc.

    authority : ?Authority;   // Host, port, and user info

    path : Path.Path;        // Path segments with trailing slash flag

    queryParams : [(Text, Text)]; // Query parameters as key-value pairs

    fragment : ?Text;        // Fragment identifier

};

public type Authority = {

    user : ?UserInfo;        // Username and password

    host : Host.Host;        // Host name or IP address

    port : ?Nat16;          // Port number

};

```

### Parsing and Conversion

```motoko

// Parse URL from text

UrlKit.fromText(url : Text) : Result.Result

// Convert URL back to text

UrlKit.toText(url : Url) : Text

// Normalize URL for comparison

UrlKit.normalize(url : Url, options : NormalizationOptions) : Url

// Compare URLs for equality

UrlKit.equal(url1 : Url, url2 : Url, options : NormalizationOptions) : Bool

```

### Query Parameter Manipulation

```motoko

// Get query parameter value

UrlKit.getQueryParam(url : Url, key : Text) : ?Text

// Add single query parameter

UrlKit.addQueryParam(url : Url, param : (Text, Text)) : Url

// Add multiple query parameters

UrlKit.addQueryParamMulti(url : Url, params : [(Text, Text)]) : Url

// Remove query parameter by key

UrlKit.removeQueryParam(url : Url, key : Text) : Url

// Remove multiple query parameters

UrlKit.removeQueryParamMulti(url : Url, keys : [Text]) : Url

```

### Encoding and Decoding

```motoko

// URL encode text (percent encoding)

UrlKit.encodeText(value : Text, hexIsUpperCase : Bool) : Text

// URL decode text

UrlKit.decodeText(value : Text) : Result.Result

```

## Host Types

URL Kit supports various host types with proper validation:

### Host Names

```motoko

import Host "mo:url-kit/Host";

// Parse host with optional port

let hostResult = Host.fromText("example.com:8080");

// Result: (#name("example.com"), ?8080)

// Convert host to text

let hostText = Host.toText(host);

// Normalize host (lowercase)

let normalized = Host.normalize(host);

```

### IPv4 Addresses

```motoko

import IpV4 "mo:url-kit/IpV4";

// Parse IPv4 address

let ipResult = IpV4.fromText("192.168.1.1");

// Result: (192, 168, 1, 1)

// Convert back to text

let ipText = IpV4.toText(ip); // "192.168.1.1"

```

### IPv6 Addresses

```motoko

import IpV6 "mo:url-kit/IpV6";

// Parse IPv6 address (supports compression and various formats)

let ipResult = IpV6.fromText("2001:db8::1");

// Convert to text with default options (compressed format, lowercase)

let ipText = IpV6.toText(ip); // "2001:db8::1"

// Convert to text with custom formatting options

let full = IpV6.toTextAdvanced(ip, { format = #full; isUpperCase = false });        // "2001:0db8:0000:0000:0000:0000:0000:0001"

let standard = IpV6.toTextAdvanced(ip, { format = #standard; isUpperCase = false }); // "2001:db8:0:0:0:0:0:1"

let compressed = IpV6.toTextAdvanced(ip, { format = #compressed; isUpperCase = true }); // "2001:DB8::1"

```

### Host Parsing and Formatting

```motoko

import Host "mo:url-kit/Host";

// Parse host with port

let hostResult = Host.fromText("example.com:8080");

// Result: (#name("example.com"), ?8080)

// Convert host to text (with optional port)

let hostText = Host.toText(host, port);

// Normalize host (lowercase)

let normalized = Host.normalize(host);

```

## Path Handling

## Path Handling

```motoko

import Path "mo:url-kit/Path";

// Parse path from text

let path = Path.fromText("/api/v1/users");

// Result: { segments = ["api", "v1", "users"]; trailingSlash = false }

// Convert path back to text

let pathText = Path.toText(path); // "/api/v1/users"

// Join path segments

let newPath = Path.join(path, "123");

let newNewPath = Path.joinMulti(newPath, ["456", "profile"]);

// Result: { segments = ["api", "v1", "users", "123", "456", "profile"]; trailingSlash = false }

// Normalize path with options

let options = { isCaseSensitive = false; removeEmptySegments = true; resolveDotSegments = true; preserveTrailingSlash = false };

let normalized = Path.normalize(path, options);

```

## URL Examples

### Basic HTTP/HTTPS URLs

```motoko

// Simple HTTPS URL

"https://example.com"

// URL with port and path

"https://api.example.com:8080/v1/users"

// URL with query parameters

"https://example.com/search?q=motoko&type=repo"

// URL with fragment

"https://docs.example.com/guide#installation"

// Complete URL with all components

"https://user:pass@api.example.com:8080/v1/users?page=1&limit=10#results"

```

### IP Address URLs

```motoko

// IPv4 address

"http://192.168.1.1:3000/dashboard"

// IPv6 address (note the brackets)

"https://[2001:db8::1]:8443/api"

// IPv6 with embedded IPv4

"http://[::ffff:192.168.1.1]/mixed"

```

### Special Schemes

```motoko

// File URLs

"file:///path/to/file.txt"

"file://server/share/document.pdf"

// Custom schemes

"custom-protocol://data.example.com/resource"

```

### Relative URLs

```motoko

// Protocol-relative URL

"//cdn.example.com/assets/style.css"

// Absolute path

"/api/users/123"

// Query only

"?search=term"

// Fragment only

"#section1"

```

## Domain Parsing

As of v3.0, domain parsing has been separated from basic host parsing. The Host type now simply stores names as strings (`#name`), and domain parsing is handled by dedicated domain parsers when needed.

### Domain Parsers

URL Kit provides two types of domain parsers:

#### Comprehensive Domain Parser (Recommended)

Uses the complete Public Suffix List for accurate domain parsing:

```motoko

import ComprehensiveDomainParser "mo:url-kit/ComprehensiveDomainParser";

import Domain "mo:url-kit/Domain";

// Create a comprehensive domain parser

let domainParser = ComprehensiveDomainParser.ComprehensiveDomainParser();

// Parse a domain name

let domainResult = domainParser.parse("blog.github.io");

// Result: #ok({ name = "github"; suffix = "io"; subdomains = ["blog"] })

switch (domainResult) {

    case (#ok(domain)) {

        let name = domain.name; // "github"

        let suffix = domain.suffix; // "io"

        let subdomains = domain.subdomains; // ["blog"]

    };

    case (#err(msg)) {

        // Handle parsing error

    };

};

```

#### Custom Domain Parsing

For custom domain suffix lists, use the Domain module directly:

```motoko

import Domain "mo:url-kit/Domain";

// Parse with custom suffixes

let customSuffixes = ["com", "org", "test", "local"];

let domainResult = Domain.fromText("api.example.com", customSuffixes);

// Result: #ok({ name = "example"; suffix = "com"; subdomains = ["api"] })

```

#### Custom Domain Parsing

For custom domain suffix lists, use the Domain module directly:

```motoko

import Domain "mo:url-kit/Domain";

// Parse with custom suffixes

let customSuffixes = ["com", "org", "test", "local"];

let domainResult = Domain.fromText("api.example.com", customSuffixes);

// Result: #ok({ name = "example"; suffix = "com"; subdomains = ["api"] })

```

### Domain Operations

```motoko

import Domain "mo:url-kit/Domain";

// Parse domain with specified suffixes

let domainResult = Domain.fromText("blog.github.io", ["github.io", "io"]);

// Validate domain structure

let validation = Domain.validate(domain);

// Convert domain to text

let domainText = Domain.toText(domain);

// Normalize domain (lowercase)

let normalized = Domain.normalize(domain);

```

### Integration with URLs

When you need domain parsing for URLs, extract the host name and parse it separately:

```motoko

import UrlKit "mo:url-kit";

import ComprehensiveDomainParser "mo:url-kit/ComprehensiveDomainParser";

let url = switch (UrlKit.fromText("https://blog.example.com/path")) {

    case (#ok(u)) u;

    case (#err(_)) return; // Handle error

};

// Extract host name for domain parsing

switch (url.authority) {

    case (?authority) {

        switch (authority.host) {

            case (#name(hostName)) {

                // Parse the host name as a domain

                let domainParser = ComprehensiveDomainParser.ComprehensiveDomainParser();

                let domainResult = domainParser.parse(hostName);

                switch (domainResult) {

                    case (#ok(domain)) {

                        // Work with parsed domain components

                        let rootDomain = domain.name # "." # domain.suffix; // "example.com"

                    };

                    case (#err(_)) {

                        // Host name is not a valid domain (e.g., IP address, localhost)

                    };

                };

            };

            case (#ipV4(_) or #ipV6(_)) {

                // IP addresses don't need domain parsing

            };

        };

    };

    case (null) {};

};

```

## Domain Suffix List

URL Kit includes an automatically generated domain suffix list based on the [Public Suffix List](https://publicsuffix.org/) for accurate domain parsing. The suffix list helps distinguish between domain names and subdomains.

The [`ComprehensiveDomainParser`](src/ComprehensiveDomainParser.mo) uses this comprehensive list for accurate domain parsing, while you can also create custom parsers with [`Domain.fromText`](src/Domain.mo) for specific use cases.

### Updating the Suffix List

The domain suffix list should be updated periodically to include new top-level domains and suffixes. Run the provided script to regenerate the list:

```bash

# Requires Python 3 and internet connection

./scripts/rebuild_suffix_list.sh

```

This script:

1. Downloads the latest Public Suffix List from https://publicsuffix.org/

2. Processes and filters the data

3. Generates a new `src/data/DomainSuffixData.mo` file with the current suffixes

4. Structures the data as an efficient tree for fast lookups

The generated file contains a compressed tree structure that allows for efficient suffix matching during domain parsing. You should run this script periodically (e.g., monthly) to keep the suffix list current.

## Performance

URL Kit is designed for performance with:

-   **Efficient Domain Matching**: Tree-based suffix lookup for O(log n) domain validation

-   **Minimal Allocations**: Careful memory management during parsing

-   **Lazy Evaluation**: Components are parsed only when needed

-   **Optimized String Operations**: Efficient text processing for encoding/decoding

-   **Compressed Suffix Data**: Compact representation of the public suffix list

## Testing

Run the comprehensive test suite:

```bash

mops test

```

The test suite covers:

-   URL parsing success and failure cases

-   All host type variations (domains, IPv4, IPv6, hostnames)

-   Query parameter manipulation

-   URL encoding/decoding edge cases

-   Normalization and equality comparison

-   IPv6 address formatting variations

-   Domain parser functionality

-   Error handling scenarios

## Migration Guide

### Breaking Changes in v3.0

1. **Host Type Simplified**: The Host type has been simplified by consolidating `#domain` and `#hostname` variants into a single `#name` variant:

    ```motoko

    // Old (v2.x)

    switch (host) {

        case (#domain(domain)) { /* domain components */ };

        case (#hostname(name)) { /* hostname string */ };

        case (#ipV4(ip)) { /* IPv4 address */ };

        case (#ipV6(ip)) { /* IPv6 address */ };

    };

    // New (v3.0)

    switch (host) {

        case (#name(hostName)) { /* any host name string */ };

        case (#ipV4(ip)) { /* IPv4 address */ };

        case (#ipV6(ip)) { /* IPv6 address */ };

    };

    ```

2. **Domain Parser Removed from URL Parsing**: URL parsing no longer requires a domain parser parameter:

    ```motoko

    // Old (v2.x)

    import ComprehensiveDomainParser "mo:url-kit/ComprehensiveDomainParser";

    let domainParser = ComprehensiveDomainParser.ComprehensiveDomainParser();

    UrlKit.fromText("https://example.com", domainParser)

    // New (v3.0)

    UrlKit.fromText("https://example.com")

    ```

3. **Separate Domain Parsing**: Domain parsing is now a separate step when needed:

    ```motoko

    // Old (v2.x) - automatic domain parsing

    let url = UrlKit.fromText("https://blog.example.com", domainParser);

    switch (url.authority.host) {

        case (#domain(domain)) {

            let name = domain.name; // "example"

            let suffix = domain.suffix; // "com"

            let subdomains = domain.subdomains; // ["blog"]

        };

    };

    // New (v3.0) - explicit domain parsing when needed

    let url = UrlKit.fromText("https://blog.example.com");

    switch (url.authority.host) {

        case (#name(hostName)) {

            let domainParser = ComprehensiveDomainParser.ComprehensiveDomainParser();

            switch (domainParser.parse(hostName)) {

                case (#ok(domain)) {

                    let name = domain.name; // "example"

                    let suffix = domain.suffix; // "com"

                    let subdomains = domain.subdomains; // ["blog"]

                };

                case (#err(_)) { /* not a valid domain */ };

            };

        };

    };

    ```

### Breaking Changes in v2.0

1. **Domain Parser Required**: All URL parsing functions now require a `domainParser` parameter:

    ```motoko

    // Old (v1.x)

    UrlKit.fromText("https://example.com")

    // New (v2.0)

    import ComprehensiveDomainParser "mo:url-kit/ComprehensiveDomainParser";

    let domainParser = ComprehensiveDomainParser.ComprehensiveDomainParser();

    UrlKit.fromText("https://example.com", domainParser)

    ```

2. **Domain Parsing**: Domain parsing is now more flexible with custom suffix support:

    ```motoko

    // Using comprehensive parser (recommended)

    let domainParser = ComprehensiveDomainParser.ComprehensiveDomainParser();

    let result = domainParser.parse("example.com");

    // Using custom suffixes

    let result = Domain.fromText("example.test", ["test", "local"]);

    ```

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request. When contributing:

1. Add tests for new functionality

2. Update documentation for API changes

3. Follow existing code style and patterns

4. Ensure all tests pass

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/edjcase/motoko_url_kit

Awesome Lists containing this project

README