{"id":16511356,"url":"https://github.com/pbenner/tfbayes","last_synced_at":"2026-02-09T20:10:37.001Z","repository":{"id":8383097,"uuid":"9956642","full_name":"pbenner/tfbayes","owner":"pbenner","description":"Bayesian analysis of ChIP-Seq data for the identification of transcription factor binding sites.","archived":false,"fork":false,"pushed_at":"2016-09-01T12:14:29.000Z","size":37374,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-12-08T04:58:30.792Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Mathematica","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pbenner.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"COPYING","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2013-05-09T09:50:57.000Z","updated_at":"2023-07-07T19:11:07.000Z","dependencies_parsed_at":"2022-08-24T17:50:26.552Z","dependency_job_id":null,"html_url":"https://github.com/pbenner/tfbayes","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/pbenner/tfbayes","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pbenner%2Ftfbayes","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pbenner%2Ftfbayes/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pbenner%2Ftfbayes/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pbenner%2Ftfbayes/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pbenner","download_url":"https://codeload.github.com/pbenner/tfbayes/tar.gz/refs/heads/master","sbom_url":"https://repos.eco
syste.ms/api/v1/hosts/GitHub/repositories/pbenner%2Ftfbayes/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29279383,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-09T19:05:41.198Z","status":"ssl_error","status_checked_at":"2026-02-09T19:05:37.449Z","response_time":56,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-11T15:59:54.450Z","updated_at":"2026-02-09T20:10:36.986Z","avatar_url":"https://github.com/pbenner.png","language":"Mathematica","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Documentation\n\nPlease read our [paper](http://arxiv.org/abs/1305.3692) on inference of phylogenetic trees.\n\n## Configuration of local installations\n\nIt is necessary to export some environment variables for a local installation. Here is an example:\n\n\texport PATH=$HOME/.usr/bin:$PATH\n\texport CPATH=$HOME/.usr/include\n\texport LD_LIBRARY_PATH=$HOME/.usr/lib:$LD_LIBRARY_PATH\n\texport LIBRARY_PATH=$HOME/.usr/lib:$LIBRARY_PATH\n\texport MANPATH=$HOME/.usr/share/man:$MANPATH\n\texport PYTHONPATH=$HOME/.usr/lib/python2.7/site-packages/:$PYTHONPATH\n\nwhere of course the system's python version has to be used. 
The definitions can, for instance, be placed in the local *.profile* or *.bash_profile*.\n\n## Requirements\n\nThe following libraries are required for tfbayes:\n\n\tboost (\u003e= 1.48)\n\tboost_python\n\tboost_system\n\tboost_serialization\n\tboost_thread\n\tboost_regex\n\tglpk\n\tgsl\n\tpthread\n\nRequirements for parsing phylogenetic trees in newick format:\n\n\tbison (\u003e= 2.7)\n\tflex\n\nSome of the scripts are written in *python* and require:\n\n\tbiopython\n\tnumpy\n\tmatplotlib\n\n## Installation\n\nFirst create all autoconf and automake files with\n\n\tautoreconf\n\nFor local installations to *$HOME/.usr* use\n\n\t./configure --prefix=$HOME/.usr\n\nand otherwise simply\n\n\t./configure\n\nThe preferred compiler is *clang*; to use it, type\n\n\tCXX=clang++ ./configure\n\nNow the source can be compiled and installed with\n\n\tmake\n\tmake install\n\n### Link time optimization (LTO)\n\nLTO can significantly improve the performance of TFBayes. It is recommended to use it with *clang* and *clang++*. It is disabled by default; to switch it on, use\n\n\tCXX=clang++ ./configure --enable-lto\n\n### Known errors\n\n * **error: unknown type name '__extern_always_inline'**: The macro *__extern_always_inline* may not always be defined and in this case causes an error. If this happens, it is necessary to declare *CXXFLAGS='-D__extern_always_inline=inline'*.\n * **no archive symbol table (run ranlib)**: Your linker is not using the LLVMgold plugin. Either you are not using the gold linker or the plugin is not found.\n\n## Example: Phylogenetic tree inference\n\nThe *data* directory contains some data sets for phylogenetic tree inference. We consider *a subsequence of the MT-RNR2 alignment* for this example. 
The following command runs *10* Markov chains in parallel, each generating *10000* tree samples:\n\n\ttfbayes-treespace-sampler --steps=10000 --chains=10 --save-posterior=test.posterior.dat metropolis-hastings data/trees/ucsc-hg19-multiz46.nh data/alignments/ucsc-hg19-multiz46-U25123.fa | gzip -f \u003e test.nh.gz\n\nThe file *test.posterior.dat* contains the (unnormalized) posterior values for the samples. It is formatted such that it can be easily read and plotted with *GNU R*. To compute the posterior expectation, use\n\n\tzcat test.nh.gz | tfbayes-treespace-estimate -n 100 -r -f -d 8000 --verbose=3 mean \u003e test.mean.nh\n\nwhich drops the first *8000* samples and performs *100* iterations of the algorithm. Similarly, use\n\n\tzcat test.nh.gz | tfbayes-treespace-estimate -n 100 -r -d 8000 --verbose=3 median \u003e test.median.nh\n\nto compute the median. The convergence should be checked carefully! If the set of samples is large, it might be a good idea to increase the step size parameter (e.g. *-s 1000*). To compare the result with the majority rule consensus tree, use\n\n\tzcat test.nh.gz | tfbayes-treespace-estimate -d 8000 -v majority-consensus\n\nWith *tfbayes-treespace-histogram* several summary statistics can be computed. For instance, to obtain a histogram of tree topologies, use\n\n\tzcat test.nh.gz | tfbayes-treespace-histogram -d 8000 topology \u003e test.topology.dat\n\nwhich can be visualized with R\n\n\t\u003e attach(read.table(\"test.topology.dat\", header=T))\n\t\u003e hist(topology)\n\nThe topologies are sorted according to their frequencies. 
With\n\n\tzcat test.nh.gz | tfbayes-treespace-histogram -d 8000 edges \u003e test.edges.dat\n\na table of edge lengths is printed, which can be visualized with\n\n\thist.edges \u003c- function(t, s1, s2, from=-0.2, to=0.2, n=50, main=\"\", ...)\n\t{\n\t  x \u003c- c(-t[[s1]], t[[s2]])\n\t  x \u003c- x[x \u003e from \u0026 x \u003c to]\n\t  hist(x, breaks=seq(from=from,to=to, length.out=n), freq=F,\n\t       ylab=\"Density estimate\", main=main, ...)\n\t  lines(density(x,  na.rm=T, adjust=2))\n\t}\n\tt \u003c- read.table(\"test.edges.dat\", header=T)\n\thist.edges(t, \"s14\", \"s15\")\n\nThe histogram shows edge lengths of split *s14* as negative values and lengths of split *s15* as positive values. Split identifiers are declared in the header of *test.edges.dat*. After learning a tree it can be used to generate alignments. By comparing generated alignments to some real data one may assess the goodness of the learned tree. For instance, use\n\n\ttfbayes-generate-alignment -a 0.2:0.2:0.2:0.2:0.2 simple test.mean.nh\n\nto generate a conserved region\n\n\tochPri2: TCAATACGAG-C-CCACA--GCG--GC-T--GGCTTGCAA-CAA-CA-ATACC-AGTAAGCCGA-TCT-AGGT-G-GACATTCCAGCT-GTCACT-ATAG\n\tcalJac1: TCAAT-CGTGA--CGAC---GCGT-GCTTAGGGCTTACAA-TTA-CTGA-GCC-AAAAGGCCGA-TCACAGCA-GTCACATTC-AGCTCGTAAGA-GTCG\n\tpapHam1: TCAAT-CGAGA--CGACG-CGCGT-GCTT-GGGCTTACAA-TTA-CT-A--CA-CGAAGGCCGA-TCACAGCA-G-GCCACTCC-GCTCGTCACT-ATCG\n\tpanTro2: TCAAT-CGAGA--CGACAG-GCGT-GATT--GGCTTTCAA-CTA-CT-A-ACACCGAAGGCCGA-TCA-ATCA-G-GACATTCCAGCT-GTCACT-ATAG\n\t   hg19: TCAAT-CGAGA--C-ACA--GCGT-GCTT--GGCTTTCAA-CTA-CT-A-GCACCGAAGGCCGA-TCA-ATCA-G-CCCATTCCAGCT-GTCACT-ATAG\n\tgorGor1: TCAATCCGAGA--CGACAG-GCGT-GCTT-GGGCTTTCAA-CTA-CT-A-ACACCGAAGGCCGA-TCA-ATCA-G-GCCATTCCAGCTCGTCACT-ATAG\n\tponAbe2: TCAAT-CGAGA--CGACA--GCG--GCTT--GGCTTTCAA-CAA-CT-A-ACACCGAAGGCCGA-TCACAGCA-GGGCCATTCCAGCT-GTCACT-ATAG\n\t    mm9: TCAATACGAGAC-CGAAA--GCGTTGCTT--GGCTATC-A-CAA-CT-G--CACTGAAGGCCGA-TCA-AGCTCG-CACATTCCAGCT-GTCACT-ATAG\n\t    rn4: 
TCAATCCGAGAC-CGACA--GCGT-GC-T--GGCTATC-A-CAA-CT-GGACACCGAAGGCCGA-TCACAGCTCG-CACATTC-AGCT-GTCACT-ATTG\n\tdipOrd1: ACAAT-CGAGAC-CGAAA--GCGT-GC-T--GGCTTTTAG-CAA-CT-GTGCCCCGAACGCCTA-TCA-AGCT-G-GACATTCC-GCT-GT-ACT-ATAG\n\tcavPor3: ACAAT-CGAG-C-CGACA--G-GT-GC-T--TGCTTTCAA--TA-CTGG-ACACCGAAGGCCGAGTCACCCCT-G-GACAGTCGAGCTCGTCACT-GTAG\n\tspeTri1: TCAAT-CGAGAC-CGACA--GCCT-GCTT--TGCTTACAA-CTA-CT-G-ACACCGAAGGACGA-TCA-AGCT-GGGAAATTCCAGCT-GTCACT-ATAG\n\toryCun2: TCAATACGAGAC-CGACA--G-GT-GATT--GGCTTGCAA-CAA-CA-G-ACACAGAACGCCTA-TCT-AGCT-GGGACATTCCAGCT-GTCAC--ATAG\n\nOn the other hand, the command\n\n\ttfbayes-generate-alignment -a 10:10:10:10:20 simple test.mean.nh\n\ngenerates a less conserved region with plenty of gaps, i.e.\n\n\tochPri2: --T-G-AG--GC-T-T-CC-CACGAAAGA-CCT-GTA-T-CTCGGTGG-GA--CTGGTGA-AA---C--CCAG-GT--G-CT-G-AA--CAGGATAC--C\n\tcalJac1: C-AACC-CT----T--G-T---CC--AGC-GCC-TT-G--AGAGGG----G--C-GAAGA-TGT-GA-GAGAG-GTT-G-GGCGTAA-TTCAG-CA-G--\n\tpapHam1: C-AACAATT-C--G-GGAT-A-CCA-TGC--C-CTTAG-GA--GGGGG-G--AC-GTAGA-GAT--CCGAGAG-GTT-T-GGTGT-A-TCAA--CA---T\n\tpanTro2: T-AACAGGT-GC-G-GGA--AACCA-TGC-----GT-G-GAT-GGGGG-GAAAC-GGAGA-AAG--CCGCGACGGTT-T-GGAGT-A-TCAT--CA---T\n\t   hg19: T-AACAGGT-G--G-GGA--AACCA-TGC-----GT-G-GAT-GGGCG-GAAAC-GGAGA-AAG--CC-CGACGGTT-T-GGAGT-A-TCAG--CA---T\n\tgorGor1: T-CACAGGT-G--G-GGA--AACCA-T-C-----GT-G-GAT-GGGGG-GAAAC-GGAGA-A-G--CCGCGACGGTA-T-GG-GT-A-TCAA--CA---T\n\tponAbe2: T--ACAGGT-G--G-GGA--CACCA-TGC--C--TT-G-GAT-GGCAG-GAAAC-GGAGA-AA---CCGCGAC-GTT-T-GGTGT-A-TCAA--CA----\n\t    mm9: ATAAG-G---G--TGT-TCG-AACAAAGC--C---T-GT-CTCTGCAG-G--ACTGC-GA-AA--GATGCCAG-GT--G-C-T---TC-CAGG-TA--TT\n\t    rn4: -TAAG-G-T-G--T-TGCCCCAGCAAGGC--C-----GT-CTCAGGAG-GC-ACT-C-GA-AA--G-TGCCAG-GT--G-C-T---G--CAGG-TA--TT\n\tdipOrd1: A-AA--GGCGG-C--TGCCGATGCTAAGC--C--CT-GTTC-CAA-GG-GA-TCCGGAGA-AA--G-CG-CAG-GT----C-TG-GA--CAG-CTAC-TT\n\tcavPor3: --AGG-GGT-GC-G--GCC-CA-CTAGCCA-C---T-G--CTCAGGGG----ACTGTAGAGAA--G-CTC-AGGGG--G-C-AGA--A-CAGG-TTC--T\n\tspeTri1: 
---GGCG--------T-CG-CAGATAAGC--CC-GT-GT-CTCGG-GG-GG-ACTG-AGACGAA---TACCCC-GT--G-C-TG--AA-CATG-TTC-TT\n\toryCun2: ---AGGAGA-G--T-T-CC-CACCGACGC-GCG--CTGT-CTCGGCGA-GA-ACTGGTGA-GA---C-GACAG-GT--G-CT-A-AA-T-AGGATAC-TC\n\n\n## Example: ChIP-Seq data analysis\n\nSequences from a ChIP-Seq experiment must be available in *maf* or *mfa* format. In a first step, the training data is preprocessed. For each ChIP-Seq peak in our target species (e.g. DroMel) we are given the nucleotide sequence around this location as a multiple sequence alignment. The purpose of the analysis is to find the motif for our target species and we regard the sequences of all other species as additional information. Therefore, we first remove all columns in the alignment where the target species has a gap ('-'):\n\n\ttfbayes-preprocess-alignment -v -s DroMel -m 50 training-set.orig.maf \u003e training-set.filtered.maf\n\nIn addition, the command masks all sites in a sequence as missing data ('N') if more than 50 consecutive gaps appear. 
The filtered data is then used to compute the phylogenetic approximation:\n\n\ttfbayes-approximate -v $(PHYLOTREE) \u003c training-set.filtered.maf \u003e training-set.approximation.fa\n\nThe sampler requires the alignment data in *fasta* format; we convert the *maf* file with\n\n\ttfbayes-maf-to-fasta training-set.filtered.maf training-set.filtered.fa\n\nBefore running the sampler, we need to specify a configuration file (*training-set.cfg*):\n\n\t[TFBS-Sampler]\n\talignment-file    = training-set.filtered.fa\n\tphylogenetic-file = training-set.approximation.fa\n\tsave = training-set.result\n\tsocket-file = training-set.srv\n\tprocess-prior = pitman-yor process\n\tsamples = 1000:100\n\tpopulation-size = 4\n\talpha = 10\n\tdiscount = 0.0\n\tlambda = 0.00000000000001\n\tinitial-temperature = 5\n\ttfbs-length = 10\n\tbackground-model = independence-dirichlet\n\tbackground-alpha =\n\t                 10.0\n\t                 10.0\n\t                 10.0\n\t                 10.0\n\t                 10.0\n\tbaseline-priors = baseline-default\n\tbaseline-default =\n\t                 0.200 0.200 0.200 0.200 0.200 0.200 0.200 0.200 0.200 0.200\n\t                 0.200 0.200 0.200 0.200 0.200 0.200 0.200 0.200 0.200 0.200\n\t                 0.200 0.200 0.200 0.200 0.200 0.200 0.200 0.200 0.200 0.200\n\t                 0.200 0.200 0.200 0.200 0.200 0.200 0.200 0.200 0.200 0.200\n\t                 0.100 0.100 0.100 0.100 0.100 0.100 0.100 0.100 0.100 0.100\n\nThe sampler is executed with\n\n\ttfbayes-sampler training-set.cfg\n\nwhich first generates a sequence of 100 burn-in samples with a temperature greater than one and afterwards starts the actual MCMC simulation. The sampler runs 4 Markov chains in parallel, each generating a set of 1000 samples. Once the sampling process has finished, we may plot the posterior probabilities, number of clusters and temperature with\n\n\ttfbayes-plot training-set.cfg\n\nA point estimate (i.e. 
map, mean or median) is computed with the *tfbayes-estimate* command. The computation of the mean and median might take a while and it might be reasonable to only take a subset of the posterior samples, e.g.\n\n\ttfbayes-estimate -v --take=1000 mean training-set.cfg\n\nA point estimate can be converted to a logo with\n\n\ttfbayes-partition -v -j mean training-set.cfg\n\nwhich generates a *training-set.pdf* that contains a motif for each cluster (requires *pdftk*).\n\n## Alignment gaps\n\nThe library supports two ways of handling alignment gaps. Which one is used is encoded in the alignment data:\n\n+ 'N': The gap is considered as missing data, which means that a nucleotide should be present at this position, but we simply do not know which one (wildcard). It is equivalent to treating the species as if it were not present in the data set, i.e. the species is removed from the phylogenetic tree. An 'N' is commonly used by repeat masking software.\n+ '-': The gap is interpreted as an additional character in the alphabet (i.e. a fifth nucleotide). Note that if this is not used in the alignment, the prior counts for this character should be set to zero.\n\n## Newick format\n\nTFBayes uses the following grammar to parse trees in newick format:\n\n\ttree_list -\u003e tree_list tree \";\"\n\ttree_list -\u003e tree \";\"\n\ttree      -\u003e \"(\" node_list \",\" outgroup \")\"\n\tnode_list -\u003e node_list \",\" node\n\tnode_list -\u003e node\n\tnode      -\u003e \"(\" node_list \"):\" distance\n\tnode      -\u003e name \":\" distance\n\toutgroup  -\u003e name \":\" distance\n\tname      -\u003e [a-zA-Z_][a-zA-Z0-9_]*\n\tdistance  -\u003e -?{[0-9]}+(\".\"{[0-9]}*)?\n\nThe grammar shows that trees are not required to have binary branching points. However, the root is expected to have at least three nodes attached to it, i.e. trees are required to have the structure of unrooted trees. 
The last node attached to the root is required to be a leaf, similar to the convention of MrBayes. Internal edges do not have labels in TFBayes. A valid tree is for instance\n\n\t((speTri1:0.322352,cavPor3:0.294901):0.117009,(dipOrd1:0.396332,mm9:0.243578):0.135282,ochPri2:0.340420);\n\nor\n\n\t(speTri1:0.322352,cavPor3:0.294901,(dipOrd1:0.396332,mm9:0.243578):0.135282,ochPri2:0.340420);\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpbenner%2Ftfbayes","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpbenner%2Ftfbayes","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpbenner%2Ftfbayes/lists"}