https://github.com/tanmaykm/mutablestrings.jl

Mutable string types for Julia

# Mutable Strings

Note: This package is now deprecated in favor of (see )

Large scale text processing often requires several changes to be made on large string objects. Using immutable strings can result in significant inefficiencies in such cases. Using byte arrays directly prevents us from using the convenient string methods. This package provides Mutable ASCII and UTF8 string types that allow mutating the string data through the familiar string methods.

## Types
- MutableASCIIString: `immutable MutableASCIIString <: DirectIndexString`
- MutableUTF8String: `immutable MutableUTF8String <: String`
- MutableString: `typealias MutableString Union(MutableASCIIString, MutableUTF8String)`

## Methods
All methods on immutable strings can also be applied to a MutableString. Additionally, the methods below modify MutableString objects in place:

### Case Conversion

- `uppercase!(s::MutableString)` : In-place uppercase conversion
- `lowercase!(s::MutableString)` : In-place lowercase conversion
- `ucfirst!(s::MutableString)` : Convert the first letter to uppercase in-place
- `lcfirst!(s::MutableString)` : Convert the first letter to lowercase in-place
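
To illustrate what in-place case conversion means, here is a minimal sketch in current Julia that works on a plain `Vector{UInt8}` rather than on the package's types. The helper name `ascii_uppercase!` is hypothetical and is not part of MutableStrings; it assumes ASCII-only data.

```julia
# Hypothetical helper (not part of MutableStrings): uppercase ASCII bytes in place.
function ascii_uppercase!(buf::Vector{UInt8})
    for i in eachindex(buf)
        b = buf[i]
        # ASCII 'a'..'z' are 0x61..0x7a; the uppercase letter is 0x20 lower.
        if 0x61 <= b <= 0x7a
            buf[i] = b - 0x20
        end
    end
    return buf
end

buf = Vector{UInt8}("hello, world")   # mutable copy of the string's bytes
ascii_uppercase!(buf)                 # no new buffer is allocated here
result = String(copy(buf))            # copy, because String(::Vector{UInt8}) takes ownership
```

The conversion itself overwrites bytes in the existing buffer; only the final `String(copy(buf))` allocates, and it is there just to view the result as a string.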

### Search/Replace
The usual `search` methods on the String type also apply to MutableStrings.

`replace!(s::MutableString, pattern, repl::Union(ByteString,Char,Function), limit::Integer=0)`

The above method replaces, in place, up to `limit` occurrences of `pattern` with `repl`. If `limit` is zero, all occurrences are replaced.

As with search, the `pattern` argument may be a single character, a vector or a set of characters, a string, or a regular expression.

If `repl` is a ByteString, it replaces the matching region. If it is a Char, it replaces each character of the matching region. If `repl` is a function, it must accept a SubString representing the matching region and return either a Char or a ByteString to be used as the replacement.
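
The Char case described above (each character of the matching region is replaced) keeps the length unchanged, so it can be done entirely in place. The following is a rough sketch of that idea in current Julia on a plain `Vector{UInt8}`; `blank_matches!` and its `filler` parameter are hypothetical names, not the package's API. It searches an immutable snapshot for simplicity, which a real implementation would avoid by searching the mutable data directly.

```julia
# Hypothetical helper (not part of MutableStrings): overwrite every byte of
# each regex match with a single ASCII filler character.
function blank_matches!(buf::Vector{UInt8}, pattern::Regex, filler::Char=' ')
    isascii(filler) || throw(ArgumentError("filler must be an ASCII character"))
    text = String(copy(buf))          # snapshot used only for searching
    for m in eachmatch(pattern, text)
        # m.offset is the byte index where the match starts;
        # ncodeunits gives the match length in bytes.
        for i in m.offset:(m.offset + ncodeunits(m.match) - 1)
            buf[i] = UInt8(filler)
        end
    end
    return buf
end

buf = Vector{UInt8}("id=123 and id=4567")
blank_matches!(buf, r"[0-9]+", '*')
masked = String(copy(buf))
```

Because every match is overwritten byte for byte, the buffer never changes size and nothing after a match has to move.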

### Others

- `setindex!(s::MutableString, x, i0::Real)`
- `setindex!(s::MutableString, r::ByteString, I::Range1{T<:Real})`
- `setindex!(s::MutableString, c::Char, I::Range1{T<:Real})`
- `reverse!(s::MutableString)`
- `map!(f, s::MutableString)`

Parts of a mutable string can be modified as follows:

````
s[10] = 'A'
s[12:14] = "ABC"
````
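
The difference between same-length and different-length range assignment can be sketched on a plain `Vector{UInt8}` in current Julia. This is an analogy under that assumption, not the package's implementation: a same-length assignment overwrites bytes where they are, while a different-length replacement needs something like `splice!`, which shifts the tail of the buffer.

```julia
buf = Vector{UInt8}("hello world")

# Same-length overwrite: only the five target bytes change, nothing moves.
buf[1:5] = codeunits("HELLO")

# Different-length replacement: splice! removes the 5-byte range and inserts
# 8 bytes, so the buffer grows and any bytes after the range would be shifted.
splice!(buf, 7:11, codeunits("everyone"))

result = String(copy(buf))
```

Assigning a replacement of the wrong length with plain range indexing (for example `buf[1:5] = codeunits("HI")`) raises a `DimensionMismatch` on a byte vector, which is why the variable-length case has to go through a resizing operation.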

### Performance

- Most operations on a MutableString are faster than those on an immutable String.
- Replacing segments of a mutable string with replacements of a different length is slower than recreating the entire string.
- MutableStrings are always more memory efficient than immutable Strings.

Columns: function, then time and bytes for ASCIIString, then time and bytes for MutableASCIIString. The numeric columns are run together in the rows below.

- case conversion: 0.004997000800.004760
- reverse: 0.01057113840.00100
- regex search and blank out matches: 0.006799170000.0029564
- regex search and delete matches: 0.0249561440721.01742292768

### Notes
- Significant code has been duplicated from Julia base to specialize the MutableString methods. A proper type-reorganization would eliminate this.
- The hash method on MutableString behaves similarly to that on String. This can result in surprises when a MutableString is used as a key in collections.
- Since UTF8 has variable character byte lengths, MutableUTF8String also allows replacing segments of the string with arbitrary length replacements, e.g. `s[10] = "ABC"`. This is inconsistent with the behavior of MutableASCIIString, and remains to be debated.
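
The hashing note above can be illustrated with a plain `Vector{UInt8}` in current Julia, whose hash is also computed from its contents. This is an analogy only, assuming MutableString hashes by content as the note describes: once a key is mutated after insertion, its hash no longer matches the one it was stored under.

```julia
key = Vector{UInt8}("alpha")
table = Dict(key => 1)        # the entry is stored under the hash of "alpha"

hash_before = hash(key)
key[1] = UInt8('A')           # mutate the key in place; it now reads "Alpha"
hash_after = hash(key)

# The stored entry was placed using hash_before, but lookups with the mutated
# key now use hash_after, so they may fail to find it.
hash_changed = hash_before != hash_after
```

A common way to avoid this is to insert an immutable copy (for example a `String`) as the key, or to avoid mutating an object while it is in use as a key.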