https://github.com/maskray/position-heap
Implementation of position heaps
- Host: GitHub
- URL: https://github.com/maskray/position-heap
- Owner: MaskRay
- Created: 2011-06-29T01:57:37.000Z (over 14 years ago)
- Default Branch: master
- Last Pushed: 2011-07-11T08:11:31.000Z (over 14 years ago)
- Last Synced: 2025-01-29T02:27:09.699Z (about 1 year ago)
- Language: C++
- Homepage:
- Size: 97.7 KB
- Stars: 4
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.org
#+TITLE: position-heap
#+AUTHOR: Ray Song (a.k.a. MaskRay)
#+DATE: 2011-06-29
#+LANGUAGE: en
#+OPTIONS: num:t toc:nil \n:nil @:t ::t |:t ^:t -:t f:t *:t <:t
* Description
/position-heap/ is an implementation of position heaps which is
inspired by _Ross McConnell_'s implementation:
http://www.cs.colostate.edu/PositionHeaps/. A position heap is a
simple and dynamic text indexing data structure.
* References
_Andrzej Ehrenfeucht_, _Ross M. McConnell_, _Nissa Osheim_,
_Sung-Whan Woo_, /Position heaps: A simple and dynamic text indexing
data structure/, J. Discrete Algorithms 9(1): 100-121 (2011).
* Usage
This project solves the problem of finding all occurrences of a
pattern string *P* in a text *T*. It will preprocess *T* (by
building a /position heap/) to speed up successive searches.
Let *n* denote the length of *T*, *m* denote the length of *P* and
*k* denote the number of occurrences.
The code implements two construction algorithms as follows:
- /naiveBuild/ is a simple algorithm that constructs the /position
heap/ in O(n^2) worst-case time (O(n log n) expected time for a
randomly generated text).
- /fastBuild/ is an algorithm that constructs the /position heap/
in Θ(n) time.
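As a rough illustration of the quadratic construction, the sketch below inserts the suffixes of the text into a trie from shortest to longest, adding exactly one node per suffix. It is not the code from this repository: the names ~Node~ and ~naiveBuildSketch~ are made up for illustration, the root is assumed to carry no position, and ~std::map~ stands in for the arrays this implementation uses.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One trie node (hypothetical layout). Index 0 of the vector is the root.
struct Node {
    int pos = -1;                // text position; -1 marks the root
    std::map<char, int> child;   // edge character -> index of child node
};

// Insert the suffixes T[i..n) from shortest to longest. Each insertion
// walks down from the root until it falls off the trie, then adds one
// node labeled with the suffix's start position.
std::vector<Node> naiveBuildSketch(const std::string& text) {
    std::vector<Node> heap(1);   // heap[0] is the root
    const int n = static_cast<int>(text.size());
    for (int i = n - 1; i >= 0; --i) {
        int cur = 0;
        int j = i;
        for (;;) {
            auto it = heap[cur].child.find(text[j]);
            if (it == heap[cur].child.end()) break;
            cur = it->second;
            ++j;
        }
        // Every existing path is shorter than the current suffix, so the
        // walk stops before the suffix is exhausted.
        assert(j < n);
        Node node;
        node.pos = i;
        heap.push_back(node);
        heap[cur].child[text[j]] = static_cast<int>(heap.size()) - 1;
    }
    return heap;
}
```

Each insertion can walk a path of length up to n, which is where the O(n^2) worst case comes from. For the text ~aabcabc~ the sketch produces a root plus seven nodes, one per position.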
And two search algorithms:
- /naiveSearch/ is a simple search algorithm with an O(m^2+k)
worst-case time bound. This bound is pessimistic: the expected time is
O(m+k) when either the text or the pattern is randomly generated.
- /fastSearch/ is a search algorithm with a Θ(m+k) time bound. It
requires some extra work when constructing the /position heap/.
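The simple search can be sketched as follows, under the assumption that the heap is a trie whose nodes each carry one text position. The search walks down the trie along the pattern. A node whose path spells only a proper prefix of the pattern is a candidate that must be checked against the text, while every node at or below the point where the whole pattern has been spelled is an occurrence. The names ~Node~, ~buildSketch~ and ~naiveSearchSketch~ are hypothetical, and a compact quadratic build is included only so the listing stands alone. This is not the repository's code.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct Node {
    int pos = -1;                // text position; -1 marks the root
    std::map<char, int> child;   // edge character -> index of child node
};

// Quadratic construction, included only so the listing is self-contained.
std::vector<Node> buildSketch(const std::string& text) {
    std::vector<Node> heap(1);   // heap[0] is the root
    for (int i = static_cast<int>(text.size()) - 1; i >= 0; --i) {
        int cur = 0;
        int j = i;
        for (;;) {
            auto it = heap[cur].child.find(text[j]);
            if (it == heap[cur].child.end()) break;
            cur = it->second;
            ++j;
        }
        Node node;
        node.pos = i;
        heap.push_back(node);
        heap[cur].child[text[j]] = static_cast<int>(heap.size()) - 1;
    }
    return heap;
}

std::vector<int> naiveSearchSketch(const std::vector<Node>& heap,
                                   const std::string& text,
                                   const std::string& pattern) {
    std::vector<int> hits;
    const std::size_t m = pattern.size();
    if (m == 0) return hits;     // this sketch reports nothing for an empty pattern
    int cur = 0;
    for (std::size_t depth = 0; depth < m; ) {
        auto it = heap[cur].child.find(pattern[depth]);
        if (it == heap[cur].child.end()) return hits;  // fell off the trie
        cur = it->second;
        ++depth;
        // The path spells a proper prefix of the pattern: verify in the text.
        if (depth < m &&
            text.compare(static_cast<std::size_t>(heap[cur].pos), m, pattern) == 0) {
            hits.push_back(heap[cur].pos);
        }
    }
    // cur spells the whole pattern: its entire subtree consists of occurrences.
    std::vector<int> stack{cur};
    while (!stack.empty()) {
        const int u = stack.back();
        stack.pop_back();
        hits.push_back(heap[u].pos);
        for (const auto& edge : heap[u].child) stack.push_back(edge.second);
    }
    return hits;
}
```

At most m candidates are verified at O(m) each, and the subtree walk costs O(k), which gives the O(m^2+k) bound. For the text ~aabcabc~ and the pattern ~ab~ the sketch returns 4 and then 1, which also shows why the positions are not necessarily in ascending order.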
Note: The factor for the size of the alphabet *Σ* is omitted from the time
bounds. This project is just a simple implementation, so it uses arrays to
keep the code simple. You may replace the arrays with balanced binary search
trees, hash tables or some other data structures.
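To make the trade-off concrete, here is a minimal sketch of two interchangeable child containers for a trie node. Both types are hypothetical and do not reflect this repository's actual layout. In particular, the 26-letter lowercase alphabet is an assumption made only for the example.

```cpp
#include <array>
#include <cstddef>
#include <map>

// Array-backed children: constant-time lookup, one slot per alphabet
// symbol in every node. Assumes characters in 'a'..'z' (illustrative only).
struct ArrayChildren {
    std::array<int, 26> slot;
    ArrayChildren() { slot.fill(-1); }            // -1 means "no child"
    int find(char c) const { return slot[static_cast<std::size_t>(c - 'a')]; }
    void set(char c, int node) { slot[static_cast<std::size_t>(c - 'a')] = node; }
};

// Map-backed children (a balanced binary search tree): logarithmic lookup
// in the number of children, space only for the edges that exist.
struct MapChildren {
    std::map<char, int> edges;
    int find(char c) const {
        auto it = edges.find(c);
        return it == edges.end() ? -1 : it->second;
    }
    void set(char c, int node) { edges[c] = node; }
};
```

Because both types expose the same ~find~ and ~set~ operations, construction and search code written against that pair would work with either one.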
For each step (construction and search) you have two choices, so
you can combine them freely to get four different programs:
- fastBuildFastSearch
- fastBuildNaiveSearch
- naiveBuildFastSearch
- naiveBuildNaiveSearch
The names are self-explanatory.
Each program reads two strings separated by whitespace, the text *T*
followed by the pattern *P*, and prints all occurrences of *P* in
*T*. Positions are counted from *0* at the leftmost character of *T*.
They are not necessarily printed in ascending order.
* Examples
~% make fastBuildFastSearch~
~g++ -g -D LINEAR_BUILD -D LINEAR_SEARCH main.cpp -o fastBuildFastSearch~
~% ./fastBuildFastSearch~
~aabcabc~
~ab~
~4~
~1~
/fastBuildFastSearch/ prints *4* and *1*, which means the pattern
*ab* has two occurrences in the text *aabcabc*, at positions *4*
and *1*.
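One way to cross-check output like this is a brute-force scan that does not use a position heap at all. The helper below is a hypothetical sketch built on ~std::string::find~ and is not part of this project. It reports 0-based positions in ascending order, so its result should be compared as a set with what the programs print.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Collect every 0-based start position of pattern in text, including
// overlapping occurrences, by repeatedly searching one character further.
std::vector<int> bruteForceOccurrences(const std::string& text,
                                       const std::string& pattern) {
    std::vector<int> hits;
    if (pattern.empty()) return hits;   // nothing to report for an empty pattern
    for (std::size_t p = text.find(pattern);
         p != std::string::npos;
         p = text.find(pattern, p + 1)) {
        hits.push_back(static_cast<int>(p));
    }
    return hits;
}
```

For the text ~aabcabc~ and the pattern ~ab~ it returns 1 and 4, the same two positions as above in a different order.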