https://github.com/adamtaranto/te-dispersion-metric
Proposed metric for measuring the clustering of transposons within a genome
- Host: GitHub
- URL: https://github.com/adamtaranto/te-dispersion-metric
- Owner: Adamtaranto
- License: mit
- Created: 2017-09-14T03:05:54.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2017-09-19T04:56:09.000Z (over 7 years ago)
- Last Synced: 2024-11-21T02:33:38.087Z (2 months ago)
- Language: Python
- Size: 5.86 KB
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# TE-dispersion-metric
**Aim:**
To measure the degree to which repetitive elements are clustered or dispersed
within a chromosome. It has been widely observed that large genomic islands of
repetitive sequences (transposons, etc.) make up isolate-specific regions in
fungal genomes and often contain virulence determinants. Similarly, many fungal
pathogens possess accessory chromosomes that are predominantly composed of
repetitive elements.

Currently, transposon content is crudely summarised as a proportion of the
chromosome or genome space. This measure is agnostic to how repetitive
sequences are distributed across the genome, and is therefore unable to
differentiate between TE-poor genomes that carry a small number of repeat
islands or accessory chromosomes and genomes with a diffuse distribution of
small repeats that lack any isolated clusters.

**Proposed metric:**
Within a chromosome: the mean distance from each repeat to all other repeats,
weighted by each target feature's contribution to the total feature space.

```
## Chrom length
S = Total sequence length
## Distance between features scaled to chromosome length
d = distance between features / S
## Total space within S occupied by repeat features
T = Sum length of all transposons in S
## Weight of repeat feature as proportion of total repeat space T
w = feature-length / T
## Count of non-overlapping features in Chrom
n = number of discrete features
```

Mean weighted distance from transposon $i$ to all other features $j$:

$$M_i = \frac{\sum_{j \neq i} d(i,j)\, w_j}{n - 1}$$

Mean weighted distance across all features $i$:

$$\frac{1}{n} \sum_{i=1}^{n} M_i$$
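A minimal Python sketch of the metric as defined above. It rests on several assumptions that are not in the original. Features on one chromosome are 0-based, half-open `(start, end)` coordinates in bp. Overlapping or abutting features are merged first, so that `n` counts discrete features. "Distance between features" is taken to be the edge-to-edge gap in bp. The function names `merge_intervals`, `feature_distance` and `te_dispersion` are hypothetical and are not taken from `TEDcalc.py`.

```python
from typing import List, Tuple

# Hypothetical representation: 0-based, half-open (start, end) in bp.
Interval = Tuple[int, int]


def merge_intervals(features: List[Interval]) -> List[Interval]:
    """Merge overlapping or abutting features so n counts discrete features."""
    merged: List[Interval] = []
    for start, end in sorted(features):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous feature: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def feature_distance(a: Interval, b: Interval) -> int:
    """Edge-to-edge gap in bp between two features (assumed definition)."""
    return max(b[0] - a[1], a[0] - b[1], 0)


def te_dispersion(features: List[Interval], chrom_length: int) -> float:
    """Mean over features i of M_i, following the definitions above."""
    feats = merge_intervals(features)
    n = len(feats)
    if n < 2:
        raise ValueError("the metric needs at least two discrete features")
    S = chrom_length
    T = sum(end - start for start, end in feats)  # total repeat space
    M = []
    for i, fi in enumerate(feats):
        total = 0.0
        for j, fj in enumerate(feats):
            if i == j:
                continue
            d = feature_distance(fi, fj) / S  # distance scaled to chrom length
            w = (fj[1] - fj[0]) / T           # weight of target feature j
            total += d * w
        M.append(total / (n - 1))             # M_i
    return sum(M) / n


if __name__ == "__main__":
    # Two hypothetical 100 kb chromosomes, each with ten 1 kb repeats (10% TE).
    clustered = [(2000 * k, 2000 * k + 1000) for k in range(10)]
    dispersed = [(10000 * k, 10000 * k + 1000) for k in range(10)]
    print("clustered:", round(te_dispersion(clustered, 100_000), 5))
    print("dispersed:", round(te_dispersion(dispersed, 100_000), 5))
```

Both example layouts have the same TE proportion (10%), so a proportion-based summary cannot tell them apart. This sketch gives about 0.0063 for the clustered layout and about 0.0357 for the dispersed one. The nested loop is O(n²) in the number of features. That is fine per chromosome for modest annotations, but large repeat sets may call for a vectorised version.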
## Example implementation
Requirements:
- biopython
- pybedtools

```bash
./TEDcalc.py --gff repeats.gff3 --genome genome.fa
```
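The command above takes a GFF3 file of repeats and a genome FASTA. The stdlib-only sketch below shows, hypothetically, how those inputs map onto the per-chromosome length `S` and the list of repeat features. The function names `fasta_lengths` and `gff3_intervals` are illustrative. The sketch assumes every feature line in the GFF3 is a repeat, and it converts GFF3's 1-based inclusive coordinates to 0-based half-open intervals. `TEDcalc.py` itself lists biopython and pybedtools and may parse these files differently. With biopython installed, for example, `len(record)` for each record from `Bio.SeqIO.parse(path, "fasta")` gives the same lengths.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {sequence id: length} from FASTA text (id = first word of header)."""
    lengths: Dict[str, int] = {}
    current: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def gff3_intervals(lines: Iterable[str]) -> Dict[str, List[Tuple[int, int]]]:
    """Group GFF3 features by seqid as 0-based half-open (start, end) intervals."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no more feature lines
        if line.startswith("#") or not line.strip():
            continue  # directives, comments, blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a valid 9-column feature line
        seqid, start, end = cols[0], int(cols[3]), int(cols[4])
        # GFF3 is 1-based inclusive; convert to 0-based half-open.
        by_chrom[seqid].append((start - 1, end))
    return dict(by_chrom)
```

For each chromosome, `fasta_lengths(...)[chrom]` then supplies `S`, and `gff3_intervals(...)[chrom]` supplies the features whose lengths sum to `T`.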