Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/sammyjava/pdcut

Routines for the PDcut project

Host: GitHub
URL: https://github.com/sammyjava/pdcut
Owner: sammyjava
Created: 2022-02-01T17:23:47.000Z (almost 3 years ago)
Default Branch: main
Last Pushed: 2022-03-08T18:19:34.000Z (almost 3 years ago)
Last Synced: 2024-11-11T09:50:13.531Z (2 months ago)
Language: Shell
Size: 13.6 MB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# PDcut routines

## reformat-pho
This routine reformats the orthogroups in the `N0.tsv` file into a tab-delimited row format with only protein names.
For example, if you run OrthoFinder on the `At_Mus` directory (which contains the FASTA files), it produces the following file:
```
At_Mus/OrthoFinder/Results_Feb01/Phylogenetic_Hierarchical_Orthogroups/N0.tsv
```
In this case, run reformat-pho as follows:
```
$ cd At_Mus/OrthoFinder/Results_Feb01/
$ ../../../reformat-pho > pho.tsv
```
This creates a file `pho.tsv` in which each row holds the orthogroup identifiers followed by the proteins in that group, e.g.
```
HOG OG Gene Tree Parent Clade Arabidopsis_thaliana.TAIR10.pep.all Mus_musculus-Cilia_proteome
N0.HOG0000000 OG0000000 n0 AT5G09950.2 AT5G09950.3 AT5G09950.1 AT4G33170.1 AT1G16480.1 ...
```
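To make the transformation concrete, here is a minimal sketch of one way such a flattening could be done with `awk`. It is not the repository's `reformat-pho` script. It assumes the usual OrthoFinder layout for `N0.tsv`: tab-separated columns, with the genes inside each species column separated by `, `. The function name `flatten_hogs` is hypothetical.

```shell
# Hypothetical helper, NOT the repository's reformat-pho script.
# Assumption: N0.tsv is tab-separated and the genes within each
# species column are separated by ", ".
# Usage: flatten_hogs Phylogenetic_Hierarchical_Orthogroups/N0.tsv
flatten_hogs() {
  awk 'BEGIN { FS = OFS = "\t" }
       NR == 1 { print; next }          # keep the header row unchanged
       {
         line = $1 OFS $2 OFS $3        # HOG, OG, Gene Tree Parent Clade
         for (i = 4; i <= NF; i++) {
           if ($i == "") continue       # species with no genes in this group
           gsub(/, */, OFS, $i)         # put each gene in its own field
           line = line OFS $i
         }
         print line
       }' "$1"
}
```

The sketch only illustrates the row format shown above; it is not necessarily how `pho.tsv` is actually produced.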
Note that the top lines of `pho.tsv` are *very long* because they contain large *Arabidopsis thaliana* (At) orthogroups. Groups that contain *Mus musculus* genes are further down in the file.
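Because the first rows are so long, it can be easier to look at group sizes than at the raw lines. The following is a small sketch, assuming `pho.tsv` is tab-delimited with three leading identifier columns (HOG, OG, Gene Tree Parent Clade) as in the example above. The function name `hog_sizes` is hypothetical.

```shell
# Hypothetical helper: print each group's HOG identifier and the number
# of protein fields on its row.
# Assumption: the first three tab-separated columns are identifiers.
hog_sizes() {
  awk -F'\t' 'NR > 1 { print $1 "\t" (NF - 3) }' "$1"
}
```

For example, `hog_sizes pho.tsv | head` lists the sizes of the first few groups. To page through the file itself without line wrapping, `less -S pho.tsv` truncates long lines at the screen edge.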