https://github.com/rossmerr/bwt
Burrows-Wheeler Transform
bwt bwt-transform go golang
- Host: GitHub
- URL: https://github.com/rossmerr/bwt
- Owner: rossmerr
- License: mit
- Created: 2022-07-13T18:49:31.000Z
- Default Branch: main
- Last Pushed: 2023-03-05T11:43:11.000Z
- Last Synced: 2023-08-11T23:29:56.288Z
- Topics: bwt, bwt-transform, go, golang
- Language: Go
- Homepage:
- Size: 34.2 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Burrows-Wheeler Transform (BWT)
[Go workflow](https://github.com/rossmerr/bwt/actions/workflows/go.yml)
[Go Report Card](https://goreportcard.com/report/github.com/rossmerr/bwt)
[Go Reference](https://pkg.go.dev/github.com/rossmerr/bwt)
Rearranges a character string into runs of similar characters. This is useful for compression, since a string with runs of repeated characters tends to be easy to compress with techniques such as the move-to-front transform and run-length encoding. Because the transform is reversible, the original string can be recovered after decompression.
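Given the input `abaaba`, the End-of-Text rune (ETX, U+0003) is appended as a sentinel and every rotation of the result is sorted. Because ETX sorts before every letter, the rotation that starts with it comes first. The resulting Burrows-Wheeler matrix looks like this, with the non-printing ETX rune shown as `$`:
```
$abaaba
a$abaab
aaba$ab
aba$aba
abaaba$
ba$abaa
baaba$a
```
The last column, read top to bottom, is `abba$aa`, which displays as `abbaaa` because ETX does not print. The first column is `$aaaabb`.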
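A minimal sketch of the same construction follows, assuming (as the matrix above implies) that the sentinel is ETX (U+0003), appended once and sorting before every other rune. `naiveBWT` is a hypothetical helper, not this package's implementation. It materialises every rotation, so it needs O(n²) memory; practical implementations usually derive the columns from a suffix array instead.

```go
package main

import (
	"fmt"
	"sort"
)

// etx is the End-of-Text rune (U+0003) used as the sentinel. It sorts before
// every printable rune, so the rotation that starts with it becomes row 0.
const etx = '\x03'

// naiveBWT appends the sentinel, builds every rotation of the text, sorts the
// rotations, and returns the first and last columns of the resulting matrix.
func naiveBWT(s string) (first, last []rune) {
	text := append([]rune(s), etx)
	n := len(text)
	rotations := make([]string, n)
	for i := 0; i < n; i++ {
		// Rotation i starts at text[i] and wraps around to text[:i].
		rotations[i] = string(text[i:]) + string(text[:i])
	}
	// Byte-wise string order equals code point order for UTF-8.
	sort.Strings(rotations)
	for _, row := range rotations {
		r := []rune(row)
		first = append(first, r[0])
		last = append(last, r[n-1])
	}
	return first, last
}

func main() {
	first, last := naiveBWT("abaaba")
	// %q makes the non-printing ETX rune visible as \x03.
	fmt.Printf("%q\n", string(first)) // "\x03aaaabb"
	fmt.Printf("%q\n", string(last))  // "abba\x03aa"
}
```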
```go
matrix, err := bwt.Matrix("abaaba") // 'matrix' is a [][]rune
```
```go
last, err := bwt.Last("abaaba") // 'last' is a []rune
fmt.Printf("%q\n", string(last)) // "abba\x03aa"
```
```go
first, last, err := bwt.FirstLast("abaaba") // 'first' is a []rune
fmt.Printf("%q\n", string(first)) // "\x03aaaabb"
fmt.Printf("%q\n", string(last))  // "abba\x03aa"
```
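The first column is simply the last column sorted, and together they are enough to undo the transform. The sketch below inverts the transform from the last column alone using the standard LF-mapping property: the k-th occurrence of a rune in the last column is the same character of the text as the k-th occurrence of that rune in the first column. `inverseBWT` is a hypothetical helper, not part of this package, and it assumes the ETX sentinel is unique and sorts first, so row 0 is the rotation that begins with ETX.

```go
package main

import (
	"fmt"
	"sort"
)

// inverseBWT rebuilds the original text (without the sentinel) from the last
// column of the Burrows-Wheeler matrix.
func inverseBWT(last []rune) string {
	n := len(last)
	// The first column is the last column sorted.
	first := append([]rune(nil), last...)
	sort.Slice(first, func(i, j int) bool { return first[i] < first[j] })

	// start[c] is the row at which rune c first appears in the first column.
	start := make(map[rune]int)
	for i := n - 1; i >= 0; i-- {
		start[first[i]] = i
	}

	// lf[i] is the row whose first rune is the same character as last[i].
	lf := make([]int, n)
	seen := make(map[rune]int)
	for i, c := range last {
		lf[i] = start[c] + seen[c]
		seen[c]++
	}

	// Row 0 starts with the sentinel, so its last rune is the final rune of
	// the text. Following lf from there yields the text right to left.
	out := make([]rune, n-1)
	row := 0
	for k := n - 2; k >= 0; k-- {
		out[k] = last[row]
		row = lf[row]
	}
	return string(out)
}

func main() {
	fmt.Println(inverseBWT([]rune("abba\x03aa"))) // abaaba
}
```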
```go
str := "abaaba"
text := []rune(str)
first, last, sa, err := bwt.FirstLastSuffix(str) // 'sa' is a suffixarray.Suffix
if err != nil {
	log.Fatal(err)
}
fmt.Printf("%q\n", string(first)) // "\x03aaaabb"
fmt.Printf("%q\n", string(last))  // "abba\x03aa"
// Find the original offset of the first 'b' in 'str'.
// Rows are sorted lexicographically, not by position in 'str', so the 'b'
// rows of the first column are row 5 ("ba" + ETX + "abaa", from offset 4)
// and row 6 ("baaba" + ETX + "a", from offset 1).
// The suffix array maps a row back to its offset in the original text.
offset := sa.Get(6)
fmt.Println(offset)               // 1
fmt.Println(string(text[offset])) // b
```