https://github.com/stewsquared/rosalind-long-python
Solution to http://rosalind.info/problems/long/
- Host: GitHub
- URL: https://github.com/stewsquared/rosalind-long-python
- Owner: stewSquared
- Created: 2016-10-20T16:24:54.000Z (over 9 years ago)
- Default Branch: optimized
- Last Pushed: 2016-11-29T00:51:15.000Z (over 9 years ago)
- Last Synced: 2025-03-05T00:41:29.981Z (over 1 year ago)
- Language: Python
- Size: 326 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
## Usage
Mark the script executable, or run it with `python2 rosalind.py`. It
takes a filename (defaults to `rosalind_long.txt`) and prints the
assembled strand.
## Caveats
`find_overlap` assumes (without loss of correctness) that `left` and
`right` are roughly the same length. Its complexity is O(len(left))
rather than O(min_overlap). Superstring relationships (one read fully
containing another) are not detected.
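
To make the caveat above concrete, here is a minimal sketch of a suffix/prefix overlap search. It is not the repository's `find_overlap`: the name `find_overlap_sketch`, the choice to return the overlapping string, and the "more than half a read" threshold (taken from the Rosalind LONG problem statement) are all assumptions made for illustration.

```python
def find_overlap_sketch(left, right):
    """Return the longest suffix of `left` that is also a prefix of `right`.

    Assumption: an overlap only counts if it is longer than half of the
    shorter read. Returns None when no such overlap exists.
    """
    min_overlap = min(len(left), len(right)) // 2 + 1
    # The number of candidate start positions depends on len(left), which
    # is why reads of similar length are the comfortable case.
    for start in range(1, len(left) - min_overlap + 1):
        suffix = left[start:]
        # A suffix longer than `right` can never match, so a read that
        # fully contains another (a superstring) is not detected here.
        if right.startswith(suffix):
            return suffix
    return None
```

Scanning from the smallest start position means the first match is also the longest overlap.
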
`adjacency_list` is not technically an adjacency list, whose type
would map `String -> List[String]` rather than `String -> String`.
This is fine for our case, where the path is linear and unique.
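
The `String -> String` shape described above could look roughly like the following. This is a sketch under assumptions, not the repository's `adjacency_list`: `overlaps` and `successor_map` are hypothetical names, and the overlap threshold of more than half a read comes from the Rosalind LONG problem statement.

```python
def overlaps(left, right):
    """Hypothetical predicate: True if a suffix of `left` longer than half
    of the shorter read is also a prefix of `right`."""
    shortest = min(len(left), len(right))
    return any(left[-size:] == right[:size]
               for size in range(shortest, shortest // 2, -1))


def successor_map(reads):
    """Map each read to the single read that follows it (String -> String).

    A general adjacency list would instead map each read to a list of
    neighbours (String -> List[String]).
    """
    successors = {}
    for left in reads:  # up to O(N^2) pairs are compared
        for right in reads:
            if left != right and overlaps(left, right):
                successors[left] = right  # one value, not a list
                break
    return successors
```

The last read of the strand never appears as a key, because nothing follows it.
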
I could return some sort of overlap object/tuple/data structure that
stores the length of the overlap, rather than just a string, so I
wouldn't have to recompute the overlap during assembly. I don't, for
two reasons:
- `find_overlap` is only called O(N) times during assembly vs. O(N^2)
  times while computing adjacency. Not likely to make a 1% difference.
- By not overdesigning for this special-case optimization, the code is
  cleaner, more intuitive, and more reusable.
I call `find_overlap` in `assemble` without first performing a None
check. This is safe, because of the pre-computed `adj`. But the
compiler doesn't know that. Renegade for life ;)
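
The unchecked call described above can be sketched as follows. This is an illustration under assumptions rather than the repository's `assemble`: `overlap_len` and `assemble_sketch` are hypothetical names, and `adj` is assumed to be a dict mapping each read to its unique successor.

```python
def overlap_len(left, right):
    """Hypothetical stand-in for the overlap search: size of the longest
    suffix of `left` that is a prefix of `right`, or None if no overlap
    longer than half of the shorter read exists."""
    shortest = min(len(left), len(right))
    for size in range(shortest, shortest // 2, -1):
        if left[-size:] == right[:size]:
            return size
    return None


def assemble_sketch(adj):
    """Walk a read -> successor dict and merge the reads into one strand."""
    # The first read is the only key that never appears as a successor.
    (current,) = set(adj) - set(adj.values())
    strand = current
    while current in adj:
        nxt = adj[current]
        # No None check: `adj` only holds pairs already known to overlap,
        # so the overlap is recomputed here and used directly.
        strand += nxt[overlap_len(current, nxt):]
        current = nxt
    return strand
```

If `adj` ever contained a pair without a qualifying overlap, the slice would silently use `None` and append the whole read, which is the risk the missing check accepts.
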
There are no tests; I used the REPL to develop. That said, code is
modular and testable, since this is TDD with tests thrown out.
Documentation is just comments with Scala type signatures. Sorry.
### Performance
(`time` 50, 500, and 5000 reads on a Xeon processor)
```
reads  50         500        5000
real   0m0.016s   0m0.232s   0m20.159s
user   0m0.010s   0m0.223s   0m20.140s
sys    0m0.003s   0m0.007s   0m0.017s
```