https://github.com/lodestone/uss
- Host: GitHub
- URL: https://github.com/lodestone/uss
- Owner: lodestone
- License: mit
- Created: 2015-11-05T08:05:58.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2015-11-05T08:07:24.000Z (over 9 years ago)
- Last Synced: 2025-01-12T11:49:22.116Z (4 months ago)
- Language: HTML
- Size: 211 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README-ALSO.adoc
- License: LICENSE.txt
README
= README-ALSO
By Matthew Petty
:numbered!:

== _Because ASCIIDOC is more fun!_
== USS: Universal String Searcher
=== Unique Searcher:
Given a
corpus,footnote:[A corpus should consist of words or phrases separated by a new line.]
this library writes out two new files:

* `uniques.txt`
* `stems.txt`

The `uniques.txt` file contains every sequence of
four footnote:[By default this library searches for a unique string length of 4, but it can be customized.]
letters `[A-z]` that appears in exactly one word of the dictionary corpus, one sequence per line.

The `stems.txt` file contains the corresponding words that contain the
sequences/uniques, in the same order, again one per line.

For example, given the trivial dictionary containing only:

[source,plain]
arrows
carrots
give
me

The outputs should be:

[source,plain]
'sequences'  'words'
carr         carrots
give         give
rots         carrots
rows         arrows
rrot         carrots
rrow         arrows

Of course, `arro` does not appear in the output, since it is found in more than one word.
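
The behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the library's actual implementation: the names `find_uniques` and `write_outputs` are hypothetical, matching is assumed to be case-sensitive, and letters are taken to mean ASCII `A-Z` and `a-z`. The output is sorted by sequence so that it matches the ordering in the example.

```python
import re
from collections import defaultdict


def find_uniques(words, length=4):
    """Return sorted (sequence, word) pairs for every letter sequence of the
    given length that occurs in exactly one word of the corpus.

    Hypothetical helper: assumes case-sensitive matching on ASCII letters.
    """
    owners = defaultdict(set)  # sequence -> set of words that contain it
    for word in words:
        # Split on non-letters so a sequence never spans a space or digit.
        for run in re.findall(r"[A-Za-z]+", word):
            for i in range(len(run) - length + 1):
                owners[run[i:i + length]].add(word)
    # Keep only sequences owned by a single word, sorted by sequence.
    return sorted(
        (seq, next(iter(found_in)))
        for seq, found_in in owners.items()
        if len(found_in) == 1
    )


def write_outputs(pairs, uniques_path="uniques.txt", stems_path="stems.txt"):
    """Write sequences and their words to two files, line for line in step."""
    with open(uniques_path, "w") as uniques, open(stems_path, "w") as stems:
        for seq, word in pairs:
            uniques.write(seq + "\n")
            stems.write(word + "\n")


if __name__ == "__main__":
    # The trivial dictionary from the example above.
    for seq, word in find_uniques(["arrows", "carrots", "give", "me"]):
        print(seq, word)
```

Tracking the set of words per sequence, rather than a simple count, means a sequence that repeats inside a single word still counts as unique to that word. Passing a different `length` mirrors the customizable sequence length mentioned in the footnote.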