https://github.com/hammerlab/sosa
The Sane OCaml String API
https://github.com/hammerlab/sosa
Last synced: about 1 year ago
JSON representation
The Sane OCaml String API
- Host: GitHub
- URL: https://github.com/hammerlab/sosa
- Owner: hammerlab
- License: isc
- Created: 2014-10-21T20:19:55.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2017-08-24T20:07:34.000Z (almost 9 years ago)
- Last Synced: 2025-04-04T09:44:41.940Z (about 1 year ago)
- Language: OCaml
- Homepage: http://www.hammerlab.org/docs/sosa/master/index.html
- Size: 309 KB
- Stars: 27
- Watchers: 7
- Forks: 7
- Open Issues: 13
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
Sane OCaml String API
=====================
This library is a set of APIs defined with module types, and a set of
modules and functors implementing one or more of those interfaces.
The APIs define what a *character* and a *string of characters* should
be.
See the [INSTALL](INSTALL.md) file for build instructions and/or the
[documentation](http://www.hammerlab.org/docs/sosa/master/index.html) website.
The library is “packed” in the `Sosa` toplevel module name.
Module Types (APIs)
-------------------
We have, in the sub-module `Api`:
- `BASIC_CHARACTER`: characters of any length.
- `NATIVE_CONVERSIONS`: functions to transform from/to native OCaml
strings.
- `BASIC_STRING`: immutable strings of (potentially abstract)
characters:
- includes `NATIVE_CONVERSIONS`,
- contains a functor to provide a thread agnostic `output` function:
`Make_output`: `OUTPUT_MODEL` → `sig val output: ... end`.
- `UNSAFELY_MUTABLE`: mutability of some string implementations
(“unsafe” meaning that they break immutability
invariants/assumptions).
- `MINIMALISTIC_MUTABLE_STRING`: abstract mutable string used as
argument of the `Of_mutable` functor.
Implementations
---------------
### Native OCaml Characters
The `Native_character` module implements `BASIC_CHARACTER` with
OCaml's `char` type.
### Native OCaml Strings
The `Native_string` module implements `BASIC_STRING` with OCaml's `string` type
considered immutable (and hence `Native_character`).
### Native Mutable OCaml Strings (Bytes)
The `Native_bytes` module implements `BASIC_STRING`
and `UNSAFELY_MUTABLE` with OCaml's `bytes` type.
### Lists Of Arbitrary Characters
`List_of` is a functor: `BASIC_CHARACTER` → `BASIC_STRING`, i.e., it creates a
string datastructure made of a list of characters.
### Build From Basic Mutable Data-structures
The functor `Of_mutable` uses an implementation of
`MINIMALISTIC_MUTABLE_STRING` to build a `BASIC_STRING`.
### Integer UTF-8 Characters
The `Int_utf8_character` module implements `BASIC_CHARACTER` with
OCaml integers (`int`) representing Utf8 characters (we force the
handling of not more than 31 bits, even if [RFC 3629][RFC3629]
restricts them to end at U+10FFFF, c.f. also
[wikipedia][wikipedia:UTF-8]). Note that the function `is_whitespace` considers
only ASCII whitespace (useful while writing parsers for example).
Examples, Tests, and Benchmarks
-------------------------------
See the file [`test/main.ml`](src/test/main.ml) for usage examples, the
library is tested with:
- native strings and characters,
- lists of native characters (`List_of(Native_character)`),
- lists of integers representing UTF-8 characters (`List_of(utf8-int array)`),
- arrays of integers representing UTF-8 characters (`Of_mutable(utf8-int)`),
- bigarrays of 8-bit integers (`Of_mutable(int8 Bigarray1.t)`).
The tests depend on the [Nonstd](https://bitbucket.org/smondet/nonstd),
`unix`, and `bigarray` libraries:
make test
./sosa_tests
and you may add the basic benchmarks to the process with:
./sosa_tests bench
[wikipedia:UTF-8]: http://en.wikipedia.org/wiki/UTF-8
[RFC3629]: http://tools.ietf.org/html/rfc3629