{"id":30430491,"url":"https://github.com/shobrook/syntaxis","last_synced_at":"2025-08-22T18:22:32.422Z","repository":{"id":55096784,"uuid":"137977263","full_name":"shobrook/syntaxis","owner":"shobrook","description":"Analyze usage patterns of imported modules in a Python program","archived":false,"fork":false,"pushed_at":"2024-11-20T17:21:49.000Z","size":6538,"stargazers_count":16,"open_issues_count":0,"forks_count":2,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-08-17T04:33:30.144Z","etag":null,"topics":["abstract-syntax-tree","ast","python","tree-algorithms","trees"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/shobrook.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-06-20T03:49:46.000Z","updated_at":"2025-01-28T01:36:49.000Z","dependencies_parsed_at":"2024-11-20T18:39:33.524Z","dependency_job_id":null,"html_url":"https://github.com/shobrook/syntaxis","commit_stats":null,"previous_names":["shobrook/syntaxis"],"tags_count":7,"template":false,"template_full_name":null,"purl":"pkg:github/shobrook/syntaxis","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shobrook%2Fsyntaxis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shobrook%2Fsyntaxis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shobrook%2Fsyntaxis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shobrook%2Fsyntaxis/manifests","owner_url":"https://repos.ecosyste.m
s/api/v1/hosts/GitHub/owners/shobrook","download_url":"https://codeload.github.com/shobrook/syntaxis/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shobrook%2Fsyntaxis/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271680836,"owners_count":24802077,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-22T02:00:08.480Z","response_time":65,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["abstract-syntax-tree","ast","python","tree-algorithms","trees"],"created_at":"2025-08-22T18:22:30.693Z","updated_at":"2025-08-22T18:22:32.387Z","avatar_url":"https://github.com/shobrook.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# syntaxis\n\n\u003e Note: this tool used to be called saplings but has since been renamed to syntaxis. [Saplings now refers to an LLM agent library.](https://github.com/shobrook/saplings)\n\n`syntaxis` is a static analysis tool for Python. Given a program, `syntaxis` will build object hierarchies for every module imported in the program. Object hierarchies are dependency trees where the root node represents a module and each child represents an attribute of its parent. 
These can be useful for making inferences about a module's API, mining patterns in how a module is used, and [duck typing](https://en.wikipedia.org/wiki/Duck_typing).\n\n\u003cimg src=\"img/demo.gif\" /\u003e\n\n\u003c!-- This library also provides simple methods for calculating software metrics, including:\n\n- Halstead Metrics (Volume, Difficulty, Estimated Length, etc.)\n- Afferent and Efferent Couplings\n- Abstractness\n- Instability\n- Function Rankings\n- Cyclomatic Complexity\n- Maintainability Index --\u003e\n\n## Installation\n\n\u003e Requires Python 3.X.\n\nYou can install `syntaxis` with `pip`:\n\n```bash\n$ pip install syntaxis\n```\n\n## Usage\n\nUsing syntaxis takes only two steps. First, convert your input program into an [Abstract Syntax Tree (AST)](https://en.wikipedia.org/wiki/Abstract_syntax_tree) using the `ast` module. Then, import the `Syntaxis` object and initialize it with the root node of the AST.\n\n```python\nimport ast\nfrom syntaxis import Syntaxis\n\nmy_program = open(\"path_to_your_program.py\", \"r\").read()\nprogram_ast = ast.parse(my_program)\nmy_syntaxis = Syntaxis(program_ast)\n```\n\nThat's it. To access the object hierarchies, simply call the `get_trees` method in your `Syntaxis` object, like so:\n\n```python\nmy_syntaxis.get_trees() # =\u003e [ObjectNode(), ObjectNode(), ..., ObjectNode()]\n```\n\nFor more advanced usage of the `Syntaxis` object, read the docstring [here]().\n\n### Printing an Object Hierarchy\n\n`get_trees` returns a list of `ObjectNode`s, each representing the root node of an object hierarchy and which has the following attributes:\n* **`name` _(str)_:** Name of the object\n* **`is_callable` _(bool)_:** Whether the object is callable (i.e. has `__call__` defined)\n* **`order` _(int)_:** Indicates the type of connection to the parent node (e.g. 
`0` is an attribute of the parent, `1` is an attribute of the output of the parent when called, etc.); `-1` if node is root\n* **`frequency` _(int)_:** Number of times the object is used in the program\n* **`children` _(list)_:** List of child nodes\n\nTo pretty-print a tree, simply pass its root node into the `render_tree` generator, like so:\n\n```python\nfrom syntaxis import render_tree\n\ntrees = my_syntaxis.get_trees()\nroot_node = trees[0]\nfor branches, node in render_tree(root_node):\n  print(f\"{branches}{node}\")\n```\n```\nnumpy (NC, -1)\n +-- random (NC, 0)\n |   +-- randn (C, 0)\n |       +-- __sub__ (C, 1)\n |       |   +-- shape (NC, 1)\n |       |   +-- __index__ (C, 1)\n |       +-- sum (C, 1)\n +-- matmul (C, 0)\n +-- expand_dims (C, 0)\n     +-- T (NC, 1)\n```\n\n(Here, `NC` indicates a non-callable node and `C` a callable node. `-1`/`0`/`1` indicate the order of the node's connection to its parent.)\n\nTo create a dictionary representation of a tree, pass its root node into the `dictify_tree` function, like so:\n\n```python\nfrom syntaxis import dictify_tree\n\ndictify_tree(root_node)\n```\n```python\n{\n  \"numpy\": {\n    \"is_callable\": False,\n    \"order\": -1,\n    \"frequency\": 1,\n    \"children\": [\n      {\"random\": ...},\n      {\"matmul\": ...},\n      {\"expand_dims\": ...}\n    ]\n  }\n}\n```\n\n### Interpreting the Object Hierarchy\n\nEach node is an _object_ and an object can either be _callable_ (i.e. has `__call__` defined) or _non-callable_. Links between nodes each have an _order_ –– a number which describes the relationship between a node and its parent. If a node is a 0th-order child of its parent object, then it's an attribute of that object. If it's a 1st-order child, then it's an attribute of the output of the parent object when it's called, and so on. 
For example:\n\n```python\nmy_parent = module.my_obj\n\nmy_parent.attr # attr is a 0th-order child of my_obj\nmy_parent().attr # attr is a 1st-order child of my_obj\nmy_parent()().attr # attr is a 2nd-order child of my_obj\n```\n\n#### What counts as a function?\n\nIn Python, subscripts, comparisons, and binary operations are all just syntactic sugar for function calls, and are treated by syntaxis as such. Here are some common \"translations\":\n\n```python\nmy_obj['my_sub'] # =\u003e my_obj.__index__('my_sub')\nmy_obj + 10 # =\u003e my_obj.__add__(10)\nmy_obj == None # =\u003e my_obj.__eq__(None)\n```\n\n## Limitations\n\nSyntaxis _[statically analyzes](https://en.wikipedia.org/wiki/Static_program_analysis)_ the usage of a module in a program, meaning it doesn't actually execute any code. Instead, it traverses the program's AST and tracks \"object flow,\" i.e. how an object is passed through a program via variable assignments and calls of user-defined functions and classes. To demonstrate this idea, consider this example of [function composition](https://en.wikipedia.org/wiki/Function_composition_(computer_science)) and the tree syntaxis produces:\n\n```python\nimport torch\n\ndef compose(g, f):\n  def h(x):\n    return g(f(x))\n\n  return h\n\ndef F(x):\n  return x.T\n\ndef G(x):\n  return x.sum()\n\ncomposed_func = compose(F, G)\ncomposed_func(torch.tensor())\n```\n\n\u003cp align=\"center\"\u003e\n  \u003cimg width=\"25%\" src=\"img/currying.png\" /\u003e\n\u003c/p\u003e\n\nSyntaxis identifies `tensor` as an attribute of `torch`, then follows the object as it's passed into `composed_func`. Because syntaxis has an understanding of how `composed_func` is defined, it can analyze the object flow within the function and capture the `T` and `sum` sub-attributes.\n\nWhile syntaxis can track object flow through many complex paths in a program, I haven't tested every edge case, and there are some situations where syntaxis produces inaccurate trees. 
Below is a list of all the failure modes I'm aware of (and currently working on fixing). If you discover a bug or missing feature that isn't listed here, please [create an issue](https://github.com/shobrook/syntaxis/issues/new) for it.\n\n### Data Structures\n\nAs of right now, syntaxis can't track _assignments_ to comprehensions, generator expressions, dictionaries, lists, tuples, or sets. It can, however, track object flow _inside_ these data structures. For example, consider the following:\n\n```python\nimport numpy as np\n\nvectors = [np.array([0]), np.array([1]), np.array([2])]\nvectors[0].mean()\n```\n\nSyntaxis can capture `array` and add it to the `numpy` object hierarchy, but it cannot capture `mean`, and thus produces the following tree:\n\n\u003cp align=\"center\"\u003e\n  \u003cimg width=\"25%\" src=\"img/data_structures.png\" /\u003e\n\u003c/p\u003e\n\nThis limitation can have some unexpected consequences. For example, functions that return multiple values with one `return` statement (e.g. `return a, b, c`) are actually returning tuples. Therefore, the output of those functions won't be tracked by syntaxis. The same logic applies to variable unpacking with `*` and `**`.\n\n### Control Flow\n\nHandling control flow is tricky. Tracking object flow in loops and conditionals requires making assumptions about what code actually executes. For example, consider the following:\n\n```python\nimport numpy as np\n\nfor x in np.array([]):\n  print(x.mean())\n```\n\nBecause syntaxis only does _static_ analysis and doesn't do type inference, it doesn't know that `np.array([])` is an empty list, and that therefore the loop never executes. In this situation, capturing `mean` and adding the `__index__ -\u003e mean` subtree to `numpy -\u003e array` would be a false positive, since `x` (i.e. the output of `np.array().__index__()`) is never defined. 
To handle this, syntaxis _should_ branch out and produce two possible trees for this module –– one that assumes the loop doesn't execute, and one that assumes it does:\n\n\u003cp align=\"center\"\u003e\n  \u003cimg width=\"50%\" src=\"img/for_loop.png\" /\u003e\n\u003c/p\u003e\n\nBut as of right now, syntaxis will only produce the tree on the right –– that is, we assume the bodies of `for` loops are always executed (because they usually are).\n\nBelow are the assumptions syntaxis makes for other control flow elements.\n\n#### `while` loops\n\n`while` loops are processed under the same assumption as `for` loops –– that is, the body of the loop is assumed to execute.\n\n#### `if`/`else` blocks\n\nSyntaxis processes `if` and `else` blocks more conservatively than loops. It tracks object flow within these blocks but doesn't allow changes to the namespace to persist into the parent scope. For example, given:\n\n```python\nimport numpy as np\n\nX = np.array([1, 2, 3])\n\nif condition:\n  X = np.matrix([1, 2, 3])\nelse:\n  print(X.mean())\n  X = None\n  Y = np.array([1, 2, 3])\n\nprint(X.sum())\nprint(Y.max())\n```\n\nsyntaxis will produce the following tree:\n\n\u003cp align=\"center\"\u003e\n  \u003cimg width=\"40%\" src=\"img/if_else_1.png\" /\u003e\n\u003c/p\u003e\n\nNotice how the value of `X` is unreliable since we don't know if `condition` is `True` or `False`. To handle this, syntaxis simply stops tracking any variable that's defined in the outer scope, like `X`, if it's modified inside an `if`/`else` block. Similarly, notice how there exists an execution path where `Y` is never defined and `Y.max()` throws an error. To handle this, syntaxis assumes that any variable defined inside an `if`/`else` block, such as `Y`, doesn't persist into the outer scope.\n\nBoth of these assumptions are made in an attempt to reduce false positives and false negatives. 
But ideally, syntaxis would branch out and produce two separate trees for this module –– one that assumes the `if` block executes and another that assumes the `else` block executes, like so:\n\n\u003cp align=\"center\"\u003e\n  \u003cimg width=\"65%\" src=\"img/if_else_2.png\" /\u003e\n\u003c/p\u003e\n\n#### `try`/`except` blocks\n\n`try`/`except` blocks are handled similarly to `if`/`else` blocks –– that is, changes to the namespace made in either block do not persist in the outer scope.\n\nNotably, `try` and `else` blocks are treated as a single block, since `else` is only executed if `try` executes without exceptions. And `finally` blocks are treated as separate from the control flow, since code in here always executes regardless of whether an exception is thrown.\n\n#### `return`, `break`, and `continue` statements\n\nAll code underneath a `return`, `break`, or `continue` statement is assumed not to execute and will not be analyzed. For example, consider this:\n\n```python\nimport numpy as np\n\nfor x in range(10):\n  y = np.array([x])\n  continue\n  y.mean()\n```\n\nIt may be the case that `mean` is actually an attribute of `np.array`, but syntaxis will not capture this since `y.mean()` is never executed.\n\nNotably, syntaxis doesn't apply this assumption to statements inside control flow blocks. For example, if the `continue` statement above was changed to:\n\n```python\nif condition:\n  continue\n```  \n\nThen `mean` _would_ be captured by syntaxis as an attribute of `np.array`.\n\n### Functions\n\n\u003c!--#### Conditional return types\n\n`syntaxis` can generally track module and user-defined functions, but there are some edge cases it cannot handle. For example, because module functions must be treated as black-boxes to `syntaxis`, conditional return types cannot be accounted for. 
Consider the following code and trees that syntaxis produces:\n\n```python\nimport my_module\n\nmy_module.foo(5).attr1()\nmy_module.foo(10).attr2()\n```\n\nHowever, suppose `my_module.foo` is defined in the backend as:\n\n```python\ndef foo(x):\n  if x \u003c= 5:\n    return ObjectA()\n  else:\n    return ObjectB()\n```\n\nand `ObjectB` doesn't have `attr1` as an attribute. Then, syntaxis will have incorrectly treated `attr1` and `attr2` as attributes of the same object.--\u003e\n\n#### Recursion\n\nSyntaxis cannot process recursive function calls. Consider the following example:\n\n```python\nimport some_module\n\ndef my_recursive_func(input):\n  if input \u003e 5:\n    return my_recursive_func(input - 1)\n  elif input \u003e 1:\n    return some_module.foo\n  else:\n    return some_module.bar\n\noutput = my_recursive_func(5)\noutput.attr()\n```\n\nWe know this function returns `some_module.foo`, but syntaxis cannot tell which base case is hit, and therefore can't track the output. To avoid false positives, we assume this function returns nothing, and thus `attr` will not be captured and added to the object hierarchy. The tree syntaxis produces is:\n\n\u003cp align=\"center\"\u003e\n  \u003cimg width=\"35%\" src=\"img/recursion.png\" /\u003e\n\u003c/p\u003e\n\n#### Generators\n\nGenerators aren't processed as iterables. Instead, syntaxis ignores `yield`/`yield from` statements and treats the generator like a normal function. 
For example, given:\n\n```python\nimport some_module\n\ndef my_generator():\n  yield from some_module.some_items\n\nfor item in my_generator():\n  print(item.name)\n```\n\n`__index__ -\u003e name` won't be added as a subtree to `some_module -\u003e some_items`, and so the tree produced by syntaxis will look like this:\n\n\u003cp align=\"center\"\u003e\n  \u003cimg width=\"35%\" src=\"img/generators.png\" /\u003e\n\u003c/p\u003e\n\nNotably, this limitation will only produce false negatives –– not false positives.\n\n#### Anonymous Functions\n\nWhile the _bodies_ of anonymous (`lambda`) functions are processed, object flow through assignments and calls of those functions is not tracked. For example, given:\n\n```python\nimport numpy as np\n\ntrans_diag = lambda x: np.diagonal(x.T)\ntrans_diag(np.random.randn(5, 5))\n```\n\nsyntaxis will produce the following tree:\n\n\u003cp align=\"center\"\u003e\n  \u003cimg width=\"45%\" src=\"img/anonymous.png\" /\u003e\n\u003c/p\u003e\n\nNotice that `T` is not captured as an attribute of `numpy.random.randn`, but `diagonal` is captured as an attribute of `numpy`. This is because the body of the `lambda` function is processed by syntaxis, but the assignment to `trans_diag` is not recognized, and therefore the function call is not processed.\n\n### Classes\n\nSyntaxis can track object flow in static, class, and instance methods, getter and setter methods, class and instance variables, classes defined within classes, and class closures (i.e. functions that return classes). Notably, it can keep track of the state of each instance of a user-defined class. 
Consider the following program and the tree syntaxis produces:\n\n```python\nimport torch.nn as nn\nfrom torch import tensor\n\nclass Perceptron(nn.Module):\n  loss = None\n\n  def __init__(self, in_channels, out_channels):\n    super(Perceptron, self).__init__()\n    self.layer = nn.Linear(in_channels, out_channels)\n    self.output = Perceptron.create_output_layer()\n\n  @staticmethod\n  def create_output_layer():\n    def layer(x):\n      return x.mean()\n\n    return layer\n\n  @classmethod\n  def calculate_loss(cls, output, target):\n    cls.loss = output - target\n    return cls.loss\n\n  def __call__(self, x):\n    x = self.layer(x)\n    return self.output(x)\n\nmodel = Perceptron(1, 8)\noutput = model(tensor([10]))\nloss = Perceptron.calculate_loss(output, 8)\n```\n\n\u003cp align=\"center\"\u003e\n  \u003cimg width=\"50%\" src=\"img/class.png\" /\u003e\n\u003c/p\u003e\n\nWhile syntaxis can handle many common usage patterns for user-defined classes, such as the ones above, there are some things syntaxis can't handle yet. Below are all the limitations I'm aware of:\n\n#### Class Modifications\n\nIn the example above, calling the class method `Perceptron.calculate_loss` should change the value of the class variable `loss`. However, syntaxis cannot track modifications to a class when it's passed into a function. Syntaxis _can_ handle when a class is modified in the scope in which it was defined, like so:\n\n```python\nPerceptron.loss = tensor()\nPerceptron.loss.item()\n```\n\nHere, `item` would be captured and added to the tree as an attribute of `tensor`. But if the class is modified via an alias, like so:\n\n```python\nNeuralNet = Perceptron\nNeuralNet.loss = tensor()\nPerceptron.loss.item()\n```\n\nThen syntaxis won't capture `item`. Syntaxis also can't propagate class modifications to existing instances of the class. 
For example, continuing the code above:\n\n```python\nmodel = Perceptron(1, 8)\nPerceptron.loss = tensor()\nmodel.loss.item()\n```\n\nBecause the change to `loss`, a class variable, won't propagate to `model`, an instance of `Perceptron`, `item` won't be captured as an attribute of `tensor`.\n\n#### Inheritance\n\nSyntaxis cannot recognize inherited methods or variables in user-defined classes. For example, given:\n\n```python\nimport some_module\n\nclass MyClass(some_module.Foo):\n  def __init__(self, x):\n    self.bar(x)\n```\n\nsyntaxis will not recognize `bar` as an attribute of `some_module.Foo`, despite `bar` being an inherited method. This limitation also holds true when the base class is user-defined.\n\n#### Metaclasses\n\nOnce I learn what metaclasses actually are and how to use them, I'll get around to handling them in syntaxis. But for now this is at the bottom of my to-do list since 99.9% of Python users also don't know what the hell metaclasses are.\n\n### Miscellaneous\n\n#### `global` and `nonlocal` statements\n\n`global` statements are used inside functions to declare a variable to be in the global namespace. But syntaxis doesn't recognize these statements or change the namespace accordingly. For example, given:\n\n```python\nimport some_module\n\nmy_var = some_module.foo\n\ndef my_func():\n  global my_var\n  my_var = None\n\nmy_func()\nmy_var.bar()\n```\n\nsyntaxis will produce a tree with `bar` as an attribute of `foo`. This would be a false positive since calling `my_func` sets `my_var` to `None`, and of course `None` doesn't have `bar` as an attribute.\n\n`nonlocal` statements are similar to `global`s, except they allow you to modify variables declared in enclosing (non-global) scopes. And as with `global`s, syntaxis doesn't recognize `nonlocal` statements.\n\n#### Built-in functions\n\nNone of Python's [built-in functions](https://docs.python.org/3/library/functions.html) are recognized by syntaxis. 
For example, consider the `enumerate` function:\n\n```python\nimport some_module\n\nfor index, item in enumerate(some_module.items):\n  print(item.some_attr)\n```\n\nsyntaxis won't capture `some_attr` as an attribute of `some_module.items.__iter__`, which it would have if `some_module.items` weren't wrapped by `enumerate`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshobrook%2Fsyntaxis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fshobrook%2Fsyntaxis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshobrook%2Fsyntaxis/lists"}