# TreeMatcher: A new tool for creating Python-based queries on trees

Repository: https://github.com/etetoolkit/treematcher

TreeMatcher is a subproject of ete3 and will eventually be merged into it. Until then, the treematcher repository must be on your `PYTHONPATH` environment variable.

If you use the bash shell, add the following lines to your `~/.bashrc` file so that they are loaded automatically on every login:

```
path_to_treematcher='path_to_treematcher_repository'
export PYTHONPATH=$PYTHONPATH":${path_to_treematcher}"
```

### Program Description

In computing and phylogenetics, a standard way to represent trees with edge lengths as text is the Newick format. The TreeMatcher module extends the Newick format to define a tree pattern, including rules and filters written in a Python-based vocabulary. These patterns are then searched for using a tree traversal algorithm. A pattern can be written by accessing the attributes and functions available to an ETE Tree (see the Tree and PhyloTree classes), by using Python code directly, or through custom functions and syntax.

### How to use treematcher

The simplest way to begin using treematcher is to create a pattern on a single node. In the following example, a string defines the pattern and a TreePattern instance is created from it. If no attribute is specified, the node name is assumed by default.

```
# Example 1: Find a node named "sample_1"
pattern1 = ' sample_1 ; '         # begin with a string
pattern1 = TreePattern(pattern1)  # create a TreePattern instance
```

Now that you know how to search for the name of a single node, you may be tempted to reach other nodes through constraints like `@.children[0].name=="sample_1" and @.children[1].name=="sample_2"`. However, addressing a node's descendants this way fixes the order in which they are considered a match: the permutation `@.children[0].name=="sample_2" and @.children[1].name=="sample_1"` would not be returned as a match. Using the Newick format ensures that both permutations of children are matched.

Note that the format type is set to 1 by default, which does not allow internal node names. Use the `format` argument to select another Newick format type.

```
# Example 2: Find a tree where sample_1 and sample_2 are siblings under ancestor_a
from ete3 import Tree

tree = Tree("((sample_1,sample_2)ancestor_a,(sample_1,sample_2)ancestor_b)root;", format=8)
pattern2 = TreePattern(' (sample_1, sample_2)ancestor_a ; ', format=8)
```

### Quoted node names and the node symbol @
[More about quoted_node_names](sdoc/tutorial/tutorial_treematcher.md#quoted-node-names-and-the-node-symbol-)

To distinguish the parentheses of a function call from the parentheses that define the Newick structure, use quoted node names: enclose each node name in quotes and pass `quoted_node_names=True`. To access an attribute or method of a node, use the `@` symbol to represent the node.

Make sure these quotes differ from the ones delimiting the overall pattern string. For example, write
`TreePattern(""" ('the_quoted_pattern'); """, quoted_node_names=True)`
and not
`TreePattern(" ("the_quoted_pattern") ;", quoted_node_names=True)`

```
# Example 3: Find a tree where sample_2 and another leaf are siblings, identifying a leaf by its number of children.
pattern3 = TreePattern(""" ('len(@.children)==0', 'sample_2')ancestor_a ; """, quoted_node_names=True)

# Example 4: Find a tree where sample_1 and another leaf are siblings by calling the is_leaf method.
pattern4 = TreePattern(""" ('sample_1', '@.is_leaf()')ancestor_a ; """, quoted_node_names=True)
```

### Relax matches
[More about relax matches](sdoc/tutorial/tutorial_treematcher.md#relax_patterns)

Treematcher can also test against relaxed patterns.

Setting `~` as the ancestor enables loose connection: the children of a `~` node may be connected to it through any number of intermediate nodes.

```
# Example 5: test whether tips A, B and C exist in the same tree
pattern5 = TreePattern(" (A, B, C)~ ;")

# Example 6: test whether (A, B) and (C, D) are connected via any number of nodes
pattern6 = TreePattern(" ((A,B)~), ((C,D)~) ;")
```

You can also test for a variable number of children using `*`, `+` and `{min, max}`. These borrow their meaning from regular expressions: `*` means zero or more occurrences, `+` one or more, and `{min, max}` between `min` and `max` occurrences.

```
# Example 7: zero or more occurrences of a node
TreePattern(" ('A', '@.dist > 0.5*')'~' ;", quoted_node_names=True)

# Example 8: between 2 and 5 occurrences of a node
TreePattern(" ('A', '@.dist > 0.5{2, 5}') ;", quoted_node_names=True)
```

### To Run
To run a search, call the `find_match()` method of a TreePattern on a target tree. By default, `find_match()` looks for one match. To count the matches returned, convert the result to a list and use `len()`.

```
# Example 9: Find the parent nodes of the siblings sample_1 and sample_2
from ete3 import Tree

tree = Tree("((sample_1,sample_2)ancestor_a,(sample_1,sample_2)ancestor_b)root;", format=8)
pattern = TreePattern(' (sample_1, sample_2) ; ', quoted_node_names=False)
solution = list(pattern.find_match(tree))
print("The number of solutions is:", len(solution))
```

For more details on how to use treematcher, read the tutorial:
[Treematcher tutorial](sdoc/tutorial/tutorial_treematcher.md)

### Advanced Topics

To make treematcher perform faster, break complex patterns into smaller searches. When a constraint combines several conditions, put the one you expect to be cheapest to evaluate first: Python's `and` and `or` short-circuit, so later conditions are skipped once the result is already decided.

#### Custom Functions
You can use your own custom functions and syntax in treematcher. In the following example, a custom function is defined in a custom class called MySyntax, a subclass of PatternSyntax.

```
# Example 10: Expanding vocabulary
class MySyntax(PatternSyntax):
    def my_nice_function(self, node):
        return node.species == 'Chimp'

my_syntax = MySyntax()

pattern = """ 'my_nice_function(@)'; """
t_pattern = TreePattern(pattern, syntax=my_syntax)
for match in t_pattern.find_match(t):  # t: the target tree, e.g. an ete3 PhyloTree with species information
    print(list(match))
```

### Command line tool

`ete_search` is the command-line interface to treematcher. With ete_search you can apply multiple patterns to multiple trees, given either as files or as text, retrieve some basic statistics, and save the results for later use.

[Tutorial on command line tool](sdoc/tutorial/tutorial_treematcher.rst#ete_search-command-line-tool)

Examples:

Read patterns from a file called MyPatterns.txt and apply them to each tree in MyTargetTrees.txt, writing the results of each pattern to separate files called treematches0.txt, treematches1.txt, etc. If there is only one pattern, the result file is not numbered.

`python -m treematcher.tools.ete_search --pattern_tree_list "MyPatterns.txt" --tree_format 8 --src_tree_list "MyTargetTrees.txt" -o treematches.txt`

Provide the pattern and tree as strings and print the result to the terminal:

`python -m treematcher.tools.ete_search -p "(e,d);" --tree_format 8 -t "(c,(d,e)b)a;"`

Count how many trees from a list of trees match a pattern:

`python -m treematcher.tools.ete_search -p "(the, pattern)" --src_tree_list trees.file --root | wc -l`

The render option saves each match as an image. If there are multiple patterns, numbers starting from 0 designate each pattern. If there are multiple matches, an underscore followed by a number (starting from 0) designates each match. If I had two

`python -m treematcher.tools.ete_search --pattern_tree_list "MyPatterns.txt" --tree_format 8 --src_tree_list "MyTargetTrees.txt" --render treematches.png`
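### Sketch: how an `@` constraint can be evaluated

The *Quoted node names and the node symbol @* section says that inside a quoted node name, `@` stands for the node being tested. The sketch below is a rough illustration of that idea under stated assumptions, not treematcher's actual implementation: it uses a hypothetical minimal `Node` class instead of an ete3 `Tree`, and a hypothetical `eval_constraint` helper that naively substitutes `@` with a variable name and calls Python's built-in `eval`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for an ete3 node; only the fields used below."""
    name: str
    dist: float = 1.0
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def eval_constraint(expr: str, node: Node) -> bool:
    """Hypothetical helper: evaluate a quoted constraint such as '@.is_leaf()'.

    '@' is replaced textually by a local variable name bound to the node.
    This naive substitution would also rewrite an '@' inside a string literal,
    and eval() must never be given untrusted input.
    """
    code = expr.replace("@", "_node")
    return bool(eval(code, {}, {"_node": node}))


leaf = Node("sample_2", dist=0.7)
parent = Node("ancestor_a", children=[Node("sample_1"), leaf])

print(eval_constraint("len(@.children)==0", leaf))  # True
print(eval_constraint("@.is_leaf()", parent))       # False
print(eval_constraint("@.dist > 0.5", leaf))        # True
```

Textual substitution keeps the sketch short; a real pattern engine would more likely parse the expression once and reuse it for every node.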
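### Sketch: order-sensitive versus order-insensitive child matching

The *How to use treematcher* section explains that constraints such as `@.children[0].name=="sample_1"` fix the order of children, while a Newick pattern matches both permutations. The sketch below contrasts the two behaviours using a hypothetical `Node` class and hypothetical `positional_match` / `unordered_match` helpers. It illustrates the semantics only and makes no claim about how treematcher performs the matching internally.

```python
from itertools import permutations
from typing import Callable, List, Sequence


class Node:
    """Hypothetical minimal tree node: a name and an ordered list of children."""

    def __init__(self, name: str, children: Sequence["Node"] = ()):
        self.name = name
        self.children: List["Node"] = list(children)


Predicate = Callable[[Node], bool]


def named(name: str) -> Predicate:
    """Build a predicate that tests a node's name."""
    return lambda node: node.name == name


def positional_match(node: Node, preds: Sequence[Predicate]) -> bool:
    """children[i] must satisfy preds[i], like chained @.children[i] constraints."""
    return len(node.children) == len(preds) and all(
        p(c) for p, c in zip(preds, node.children)
    )


def unordered_match(node: Node, preds: Sequence[Predicate]) -> bool:
    """Accept any assignment of children to predicates (order-insensitive).

    Trying every permutation is factorial in the number of children, which is
    fine for the small fan-out of this sketch but not a general strategy.
    """
    if len(node.children) != len(preds):
        return False
    return any(
        all(p(c) for p, c in zip(preds, perm))
        for perm in permutations(node.children)
    )


preds = [named("sample_1"), named("sample_2")]
a = Node("ancestor_a", [Node("sample_1"), Node("sample_2")])
b = Node("ancestor_b", [Node("sample_2"), Node("sample_1")])  # children swapped

print(positional_match(a, preds), positional_match(b, preds))  # True False
print(unordered_match(a, preds), unordered_match(b, preds))    # True True
```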
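### Sketch: interpreting the `*`, `+` and `{min, max}` quantifiers

The *Relax matches* section says the quantifiers `*`, `+` and `{min, max}` borrow their meaning from regular expressions, and its examples attach them to the end of a quoted constraint (for example `'@.dist > 0.5*'`). Assuming that placement, the sketch below splits a constraint from its trailing quantifier and maps the quantifier to a (min, max) count range. The helper names `split_quantifier`, `quantifier_bounds` and `count_allowed` are hypothetical, and this is not treematcher's own parser.

```python
import re
from typing import Optional, Tuple

# Hypothetical grammar: an optional *, + or {min, max} at the very end of a constraint.
_SPLIT = re.compile(r"^(.*?)\s*(\*|\+|\{\s*\d+\s*,\s*\d+\s*\})?\s*$", re.S)
_RANGE = re.compile(r"\{\s*(\d+)\s*,\s*(\d+)\s*\}")


def split_quantifier(constraint: str) -> Tuple[str, str]:
    """Split '@.dist > 0.5{2, 5}' into ('@.dist > 0.5', '{2, 5}')."""
    m = _SPLIT.match(constraint)
    return m.group(1), m.group(2) or ""


def quantifier_bounds(quantifier: str) -> Tuple[int, Optional[int]]:
    """Map a quantifier to (min, max); max=None means unbounded."""
    if quantifier == "":
        return 1, 1          # no quantifier: exactly one occurrence
    if quantifier == "*":
        return 0, None       # zero or more
    if quantifier == "+":
        return 1, None       # one or more
    m = _RANGE.fullmatch(quantifier)
    if m is None:
        raise ValueError(f"unsupported quantifier: {quantifier!r}")
    lo, hi = int(m.group(1)), int(m.group(2))
    if lo > hi:
        raise ValueError(f"min greater than max in {quantifier!r}")
    return lo, hi


def count_allowed(count: int, quantifier: str) -> bool:
    """True if `count` matching children satisfies the quantifier."""
    lo, hi = quantifier_bounds(quantifier)
    return count >= lo and (hi is None or count <= hi)


expr, q = split_quantifier("@.dist > 0.5{2, 5}")
print(expr, "|", q, "|", quantifier_bounds(q))  # @.dist > 0.5 | {2, 5} | (2, 5)
print(count_allowed(0, "*"), count_allowed(0, "+"), count_allowed(6, "{2, 5}"))  # True False False
```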
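### Sketch: why cheap conditions should come first

The *Advanced Topics* section recommends putting the part of a search you expect to be faster first. Assuming constraints are evaluated as ordinary Python boolean expressions, the gain comes from short-circuiting of `and`. The plain-Python sketch below involves no treematcher code: dictionaries stand in for nodes, and `expensive_test` is a hypothetical placeholder for a costly check. It counts how often the expensive test runs under each ordering.

```python
from typing import Callable, Dict, List, Tuple

calls = {"expensive": 0}  # how many times expensive_test has run


def cheap_is_leaf(node: Dict) -> bool:
    """Cheap check: a node without children is a leaf."""
    return not node["children"]


def expensive_test(node: Dict) -> bool:
    """Hypothetical placeholder for a costly check (e.g. scanning descendants)."""
    calls["expensive"] += 1
    return node["dist"] > 0.5


def count_matches(nodes: List[Dict], first: Callable[[Dict], bool],
                  second: Callable[[Dict], bool]) -> Tuple[int, int]:
    """Return (number of matches, number of expensive_test calls)."""
    calls["expensive"] = 0
    hits = sum(1 for n in nodes if first(n) and second(n))
    return hits, calls["expensive"]


# 100 nodes; every tenth one is a leaf, and dist grows from 0.00 to 0.99.
nodes = [{"children": [] if i % 10 == 0 else ["child"], "dist": i / 100}
         for i in range(100)]

print(count_matches(nodes, expensive_test, cheap_is_leaf))  # (4, 100)
print(count_matches(nodes, cheap_is_leaf, expensive_test))  # (4, 10)
```

Both orderings find the same matches; only the number of expensive evaluations changes.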