{"id":16863597,"url":"https://github.com/vbrazo/data-structures-archives","last_synced_at":"2025-03-18T16:37:43.149Z","repository":{"id":50761644,"uuid":"319854358","full_name":"vbrazo/data-structures-archives","owner":"vbrazo","description":"My data structures archives for an engineering management journey","archived":false,"fork":false,"pushed_at":"2021-06-05T18:46:43.000Z","size":236,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-24T20:29:58.245Z","etag":null,"topics":["algorithms","data-structures","tech-interviews"],"latest_commit_sha":null,"homepage":"","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vbrazo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-12-09T05:41:47.000Z","updated_at":"2024-03-28T04:58:17.000Z","dependencies_parsed_at":"2022-09-10T18:03:00.620Z","dependency_job_id":null,"html_url":"https://github.com/vbrazo/data-structures-archives","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vbrazo%2Fdata-structures-archives","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vbrazo%2Fdata-structures-archives/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vbrazo%2Fdata-structures-archives/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vbrazo%2Fdata-structures-archives/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vbrazo","download_url":"https://codeload.github.com
/vbrazo/data-structures-archives/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244261197,"owners_count":20424897,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["algorithms","data-structures","tech-interviews"],"created_at":"2024-10-13T14:39:05.629Z","updated_at":"2025-03-18T16:37:43.122Z","avatar_url":"https://github.com/vbrazo.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Data Structures Archives\n\nThis is my personal data structures archives and it's where I store my data structures research that aims to provide resources to better interview developers in my engineering management journey.\n\n- [Data Structures](#data-structures)\n  - [Arrays](#arrays)\n  - [Strings](#strings)\n  - [Linked Lists](#linked-lists)\n  - [Stack and Queues](#stack-and-queues)\n  - [Hash Tables](#hash-tables)\n    - [Hash Set](#hash-set)\n    - [Hash Map](#hash-map)\n    - [Performance of Hash Table](#performance-of-hash-table)\n      - [Complexity Analysis](#complexity-analysis)\n      - [The Principle of Built-in Hash Table](#the-principle-of-built-in-hash-table)\n  - [Trees](#trees)\n  - [Binary Search Trees](#binary-search-trees)\n    - [Height-Balanced BST](#height-balanced-bst)\n      - [What is a Height-Balanced BST?](#what-is-a-height-balanced-bst)\n      - [Why Using a Height-Balanced BST?](#why-using-a-height-balanced-bst)\n      - [How to Implement a Height-Balanced BST?](#how-to-implement-a-height-balanced-bst)\n  - [N-ary Trees](#n-ary-trees)\n    - [Top-down 
solution](#top-down-solution)\n    - [Bottom-up solution](#bottom-up-solution)\n  - [Tries](#tries)\n    - [Performance of a Trie](#performance-of-a-trie)\n      - [Time complexity](#time-complexity)\n      - [Space complexity](#space-complexity)\n    - [How to represent a Trie](#how-to-represent-a-trie)\n      - [Array](#array)\n      - [Hash Map](#hash-map)\n        - [Complexity Analysis Comparison](#complexity-analysis-comparison)\n  - [Heaps](#heaps)\n  - [Graphs](#graphs)\n\n# Data Structures\n\n![data-structures/data-structures.png](data_structures.png)\n\n## Arrays\n\nArrays are without a doubt the most fundamental data structure in computer science. Under the hood, an array is represented as a fixed-size, contiguous block of memory with `O(1)` time to store and access an element. Because of this efficiency, many other data structures frequently use arrays for their implementation, such as strings, stacks, queues, and hash tables.\n\nYou can picture an array as a bounded row of labeled containers, starting at 0, where you can quickly put items in, take them out, or look up a value from an index (or label).\n\nArrays do have a few limitations. Looking up an element by value typically requires traversing the entire array, unless it is sorted in some way. Deleting an element from an array means that all subsequent elements have to be shifted left by one, leading to an `O(n)` time operation. If possible, it is better to overwrite the value. Similarly, inserting an element early in the array requires the rest of the elements to be shifted right, so this should be done sparingly.\n\nFinally, arrays have a fixed bound, which means that they may not be suitable for applications where the size of the collection of elements is not known ahead of time. 
In an interview setting, you should be careful of off-by-one errors that lead to trying to access an element outside the range of the array.\n\nPython does not have native support for fixed-size arrays; typically, you'll use the `list` data structure, which dynamically resizes under the hood. What this means is that to you, the developer, the list seems unbounded. In reality, as the list grows, the data structure may allocate a larger array (often described as twice the current size, though the exact growth factor depends on the implementation), copy its elements to the larger one, and then use that as the underlying array.\n\n## Strings\n\nStrings are an unavoidable part of programming. Every word in this sentence can be considered a string!\n\nBehind the scenes, in languages such as Python, the contents of a string are typically stored in a read-only sequential array in memory, meaning that strings are immutable. In other words, you can reassign a string variable to a new value, but you cannot change a particular character in the underlying array.\n\nThe most common operations performed on strings are indexing to get a particular character or substring, joining two strings together by concatenation, and splitting by a delimiter.\n\nCommon cases of string questions:\n\n- string rotations\n- string reversals\n- prefixes\n- suffixes\n- sorting\n\n## Linked Lists\n\nOne way you can think of a linked list is as a music playlist, where each item contains the song to be played and a `next song` button. In this abstract playlist, you cannot play any song you want; to play a given song you must first play through all the songs before it.\n\nThere are two main kinds of linked lists. Singly linked lists only contain a pointer to the next node, typically called `next`, and are implemented as follows:\n\n```python\nclass Node:\n  def __init__(self, data, next=None):\n    self.data = data\n    self.next = next\n```\n\nLinked lists are a recursive data structure: the type of `next` is another linked list node. 
Because of this, linked lists have no fixed size like arrays do: a new node can be initialized and appended to a linked list on the fly.\n\nDoubly linked lists, meanwhile, have pointers to the previous and next nodes. They take up more space, but allow you to traverse backwards. The implementation for a doubly linked list looks like this:\n\n```python\nclass Node:\n  def __init__(self, data, next=None, prev=None):\n    self.data = data\n    self.next = next\n    self.prev = prev\n```\n\nReturning to the analogy above, a doubly linked list would mean that each song has both a `previous song` and `next song` button.\n\nCommon operations on linked lists include searching, appending, prepending, and removing nodes.\n\n## Stack and Queues\n\nWhen you find yourself needing to frequently add and remove items from a list, stacks and queues are two data structures that you should consider.\n\nTo understand how a stack works, imagine a literal stack of cafeteria trays. Adding a new one to the top, and removing the top one can be done quickly, whereas it is difficult (read: not allowed) to change trays from the middle. This property is known by the shorthand `last in, first out`, or `LIFO`.\n\nThe traditional names for these operations, as well as a method for checking the value of the top `tray`, are given in the following implementation, in which all methods are `O(1)`:\n\n```python\nclass Stack:\n  def __init__(self):\n    self.stack = []\n\n  def push(self, x):\n    # Add an item to the stack.\n    self.stack.append(x)\n\n  def pop(self):\n    # Remove and return the top element.\n    return self.stack.pop()\n\n  def peek(self):\n    return self.stack[-1]\n```\n\nNote that a pop operation on an empty stack will result in an exception, unless there is proper error handling.\n\nIn the above implementation we have used a Python list as the underlying data structure, meaning the size of the stack will dynamically resize as necessary. 
Alternatively we could have used a linked list, so that new elements would be added to, and removed from, the head of the existing chain.\n\nA queue, on the other hand, can be thought of as a group of people standing in line, perhaps waiting to buy a book. Each person enters the line from the back, and leaves in exactly the order that they entered it, a property known as `first in, first out`, or `FIFO`.\n\nQueues are commonly implemented as linked lists, where we enqueue an item by adding a tail node and dequeue an item by removing the head node and moving our head pointer forward.\n\nIn a double-ended queue, one can efficiently append items to, and remove items from, either end of the list.\n\n```python\nfrom collections import deque\n\nqueue = deque()\n\nqueue.append(4)\nqueue.append(5)\nqueue.appendleft(6)\n\nprint(queue)        # deque([6, 4, 5])\n\nqueue.popleft()     # 6\nqueue.pop()         # 5\n\nprint(queue)        # deque([4])\n```\n\nThe `append` and `popleft` operations above are more traditionally called enqueue and dequeue, so in the following questions we will frequently use the latter terminology. Along with `pop` and `appendleft`, these operations run in `O(1)` time.\n\nWhen the most recent item examined is the most important, a stack is frequently a good choice. For this reason stacks often feature in depth-first search, backtracking, and syntax parsing applications.\n\nWhen the order of the items you are dealing with needs to be preserved, on the other hand, a queue is preferable. Queues can be found, for example, in breadth-first search, buffers, and scheduling applications.\n\n## Hash Tables\n\nA hash table is a crucial tool to keep in your data structure arsenal. Simply put, hash tables associate keys with values using a hash function, allowing for `O(1)` lookup, insert, and delete times on average. In other words, a hash table organizes data using hash functions in order to support quick insertion and search.\n\nYou may be wondering, what's the catch? 
For one, not everything can be hashed. Keys must be hashable, which in Python generally means immutable, so for example Python lists cannot be used as keys. Additionally, under the hood there may be a lot of work needed to implement a rigorous hash function.\n\n```python\nd = {}\n\nd['key'] = 'value'\nprint(d['key']) # 'value'\n\ndel d['key']\nprint(d['key']) # KeyError: 'key'\n\nif 'key' in d:\n  print(d['key'])\nelse:\n  print(\"key doesn't exist\")\n```\n\nNote from above that if a key does not exist in a dictionary, simply trying to get the value will cause a `KeyError`.\n\nThe last few lines show one way of getting around this. In the solutions that follow, we will instead use the `defaultdict` class from the `collections` module, which allows you to pass in a callable when declaring a dictionary to supply the default value for missing keys.\n\nA common motivating example for using hash tables is the two-sum problem, stated as follows:\n\nGiven a list of numbers and a number `k`, return whether any two numbers from the list add up to `k`. For example, given `[10, 15, 3, 7]` and `k = 17`, we should return `True`, since `10 + 7 = 17`.\n\nInstead of a brute force solution which checks all pairs of integers to search for this total, we can use the following strategy. For each value we come across, we first check whether `k - value` already exists in the table, and if so we can return `True`. Otherwise, we store the value in the hash table with the value `True`.\n\n```python\ndef two_sum(nums, k):\n  # Maps each number seen so far to True.\n  seen = {}\n  for num in nums:\n    if k - num in seen:\n      return True\n    seen[num] = True\n  return False\n```\n\nThis implementation cuts our time complexity down from `O(n^2)` to `O(n)`, since each lookup is `O(1)` on average.\n\nAs the problem above demonstrates, if you want to make a solution more efficient, a dictionary should be the first tool you look for.\n\nThere are two different kinds of hash tables: `hash set` and `hash map`.\n\n1. The **hash set** is one of the implementations of a set data structure, which stores no repeated values.\n2. 
The **hash map** is one of the implementations of a map data structure, which stores `(key, value)` pairs.\n\nBy choosing a proper hash function, the hash table can achieve wonderful performance in both insertion and search.\n\nThere is no single best way to design a hash function. The idea is to assign keys to buckets as uniformly as you can. Ideally, a perfect hash function will be a one-to-one mapping between keys and buckets. However, in most cases a hash function is not perfect, and there is a tradeoff between the number of buckets and the capacity of a bucket.\n\n### Hash Set\n\nThe `hash set` is one of the implementations of a `set`, which is a data structure that stores `no repeated values`.\n\n```python\n# 1. initialize the hash set\nhashset = set()\n\n# 2. add a new key\nhashset.add(3)\nhashset.add(2)\nhashset.add(1)\n\n# 3. remove a key\nhashset.remove(2)\n\n# 4. check if the key is in the hash set\nif 2 not in hashset:\n  print(\"Key 2 is not in the hash set.\")\n\n# 5. get the size of the hash set\nprint(\"Size of hashset is:\", len(hashset))\n\n# 6. iterate the hash set\nfor x in hashset:\n  print(x, end=\" \")\n\nprint(\"are in the hash set.\")\n\n# 7. clear the hash set\nhashset.clear()\nprint(\"Size of hashset:\", len(hashset))\n```\n\n#### An Example\n\n\u003e Given an array of integers, find if the array contains any duplicates.\n\nThis is a typical problem which can be solved by a hash set.\n\nYou can simply iterate over each value and insert the value into the set. 
If a value is already in the hash set, there is a duplicate.\n\n```java\n/*\n * Template for using hash set to find duplicates.\n */\nboolean findDuplicates(List\u003cType\u003e keys) {\n  // Replace Type with actual type of your key\n  Set\u003cType\u003e hashset = new HashSet\u003c\u003e();\n\n  for (Type key : keys) {\n    if (hashset.contains(key)) {\n      return true;\n    }\n    hashset.add(key);\n  }\n\n  return false;\n}\n```\n\n### Hash Map\n\nThe hash map is one of the implementations of a `map`, which is used to store `(key, value)` pairs.\n\n```python\n# 1. initialize a hash map\nhashmap = {0 : 0, 2 : 3}\n\n# 2. insert a new (key, value) pair or update the value of an existing key\nhashmap[1] = 1\nhashmap[1] = 2\n\n# 3. get the value of a key\nprint(\"The value of key 1 is: \" + str(hashmap[1]))\n\n# 4. delete a key\ndel hashmap[2]\n\n# 5. check if a key is in the hash map\nif 2 not in hashmap:\n  print(\"Key 2 is not in the hash map.\")\n\n# 6. keys and values can have different types in a hash map\nhashmap[\"pi\"] = 3.1415\n\n# 7. get the size of the hash map\nprint(\"The size of hash map is: \" + str(len(hashmap)))\n\n# 8. iterate the hash map\nfor key in hashmap:\n  print(\"(\" + str(key) + \",\" + str(hashmap[key]) + \")\", end=\" \")\n\nprint(\"are in the hash map.\")\n\n# 9. get all keys in hash map\nprint(hashmap.keys())\n\n# 10. clear the hash map\nhashmap.clear()\nprint(\"The size of hash map is: \" + str(len(hashmap)))\n```\n\n#### Scenario I - Provide more information\n\nThe first scenario for using a hash map is when we `need more information` than just the key. 
Then we can `build a mapping relationship between key and information` using a hash map.\n\n#### An Example\n\n\u003e Given an array of integers, return indices of the two numbers such that they add up to a specific target.\n\nIn this example, if we only want to return true if there is a solution, we can use a hash set to store all the values when we iterate the array and check if `target - current_value` is in the hash set or not.\n\nHowever, we are asked to `return more information`, which means we care not only about the value but also about the index. We need to store not only the number as the key but also the index as the value. Therefore, we should use a hash map rather than a hash set.\n\n#### What's More\n\nIn some cases, we need the extra information not only to return it but also to help us make decisions.\n\nIn the previous examples, when we meet a duplicated key, we return the corresponding information immediately. But sometimes, we might want to check if the value of the key is acceptable first.\n\n#### Template\n\nHere we provide a template for you to solve this kind of problem:\n\n```c++\n/*\n * Template for using hash map to find duplicates.\n * Replace ReturnType with the actual type of your return value.\n */\nReturnType aggregateByKey_hashmap(vector\u003cType\u003e\u0026 keys) {\n  // Replace Type and InfoType with actual type of your key and value\n  unordered_map\u003cType, InfoType\u003e hashmap;\n  for (Type key : keys) {\n    if (hashmap.count(key) \u003e 0) {\n      if (hashmap[key] satisfies the requirement) {\n        return needed_information;\n      }\n    }\n    // Value can be any information you needed (e.g. index)\n    hashmap[key] = value;\n  }\n  return needed_information;\n}\n```\n\n#### Scenario II - Aggregate by Key\n\nAnother frequent scenario is to `aggregate all the information by key`. 
We can also use a hash map to achieve this goal.\n\n##### Example\n\nHere is an example:\n\nGiven a string, find the first non-repeating character in it and return its index. If it doesn't exist, return -1.\n\nA simple way to solve this problem is to `count the occurrences` of each character first, and then go through the results to find the first unique character.\n\nTherefore, we can maintain a hashmap whose key is the character while the value is a counter for the corresponding character. Each time we encounter a character, we just increment the corresponding value by 1.\n\n##### What's more\n\nThe key to solving this kind of problem is to `decide your strategy when you encounter an existing key`.\n\nIn the example above, our strategy is to count the occurrences. Sometimes, we might sum all the values up. And sometimes, we might replace the original value with the newest one. The strategy depends on the problem, and practice will help you make the right decision.\n\n##### Template\n\nHere we provide a template for you to solve this kind of problem:\n\n```java\n/*\n * Template for using hash map to aggregate information by key.\n * Replace ReturnType with the actual type of your return value.\n */\nReturnType aggregateByKey_hashmap(List\u003cType\u003e keys) {\n  // Replace Type and InfoType with actual type of your key and value\n  Map\u003cType, InfoType\u003e hashmap = new HashMap\u003c\u003e();\n  for (Type key : keys) {\n    if (hashmap.containsKey(key)) {\n      // Existing key: apply your aggregation strategy\n      hashmap.put(key, updated_information);\n    } else {\n      // New key: value can be any information you needed (e.g. index)\n      hashmap.put(key, value);\n    }\n  }\n  return needed_information;\n}\n```\n\n#### Design a key\n\nIn the previous problems, the choice of key is comparatively straightforward. 
Unfortunately, sometimes you have to think it over to `design a suitable key` when using a hash table.\n\n##### Example\n\nLet's look at an example:\n\n\u003e Given an array of strings, group anagrams together.\n\nAs we know, a hash map can perform really well in grouping information by key. But we cannot use the original string as the key directly. We have to design a proper key to represent each group of anagrams. For instance, the two strings \"eat\" and \"ate\" should be in the same group, while \"eat\" and \"act\" should not be grouped together.\n\n##### Solution\n\nActually, `designing a key` means `building a mapping relationship by yourself` between the original information and the actual key used by the hash map. When you design a key, you need to guarantee that:\n\n1. All values that belong to the same group will be mapped to the same group.\n2. Values which need to be separated into different groups will not be mapped into the same group.\n\nThis process is similar to designing a hash function, but there is an essential difference. `A hash function satisfies the first rule but might not satisfy the second one`. Your mapping function should satisfy both of them.\n\nIn the example above, our mapping strategy can be: sort the string and use the sorted string as the key. That is to say, both \"eat\" and \"ate\" will be mapped to \"aet\".\n\nThe mapping strategy can be really `tricky` sometimes.\n\n### Performance of Hash Table\n\n#### Complexity Analysis\n\nIf there are `M` keys in total, we can achieve a space complexity of `O(M)` easily when using a hash table.\n\nHowever, you might have noticed that the time complexity of a hash table has a strong relationship with its design.\n\nMost of us might have used an array in each bucket to store the values that fall into that bucket. Ideally, the bucket size is small enough to be regarded as a constant. The time complexity of both insertion and search will then be `O(1)`.\n\nBut in the worst case, the maximum bucket size will be `N`. 
And the time complexity will be `O(1)` for insertion but `O(N)` for search.\n\n#### The Principle of Built-in Hash Table\n\nA typical design of a built-in hash table is:\n\nThe key can be of any hashable type. A value which belongs to a hashable type has a hashcode. This code is used in the mapping function to get the bucket index.\n\nEach bucket initially contains an array to store all the values in the same bucket.\n\nIf there are too many values in the same bucket, these values will be maintained in a height-balanced binary search tree instead.\n\nThe average time complexity of both insertion and search is still `O(1)`. And the time complexity in the worst case is `O(log N)` for both insertion and search by using a height-balanced BST. It is a trade-off between insertion and search.\n\n## Trees\n\nA tree is a recursive data structure consisting of a root node (typically shown at the top) with zero or more `child` nodes, where each child node acts as the root of a new tree.\n\nFor example, below is a binary tree rooted at 7. Binary here means simply that each node is only allowed to have up to two `child` nodes.\n\n\u003cpre\u003e\n  5 - -1\n /\n7 - 10 - 4\n     \\\n      40\n\u003c/pre\u003e\n\nNote that we make no restriction at the moment as to the values of the tree.\n\nTrees are directed and acyclic: the connections between parents and children always flow downward, so that it is impossible to form a loop. 
Further, in contrast to a typical family tree, two parents can never have the same child.\n\nCommon operations on trees include:\n\n- inserting, searching for, and deleting a particular node\n- finding subtrees, or a subset of nodes that form their own tree\n- determining the distance or relationship between two nodes\n\nTypically, to answer these questions you will need to perform a recursive tree traversal, which comes in three flavors:\n\n- in-order: Traverse left node, then current node, then right\n- pre-order: Traverse current node, then left node, then right\n- post-order: Traverse left node, then right node, then current\n\nFor the tree above, for example, the three traversals would generate the following orders, respectively:\n\n- [4, 10, 40, 7, -1, 5]\n- [7, 10, 4, 40, 5, -1]\n- [4, 40, 10, -1, 5, 7]\n\nOther terminology to watch for in trees:\n\n- A node A is called an `ancestor` of a node B if it can be found on the path from the root to B.\n- The height or depth of a tree is the length of the longest path from the root to any leaf.\n- A full binary tree is a binary tree in which every non-leaf node has exactly two children.\n- A complete binary tree is one in which all levels except for the bottom one are full, and all nodes on the bottom level are filled in left to right.\n\nTo implement a tree, we begin by defining a `Node` class, which can then be used to build a `Tree` class:\n\n```python\nclass Node:\n  def __init__(self, data, left=None, right=None):\n    self.data = data\n    self.left = left\n    self.right = right\n```\n\nThe implementation of a given tree will often depend on the tree's application and the particular traversal algorithm chosen.\n\nTrees can represent a wide variety of objects: animal classification schemas, an HTML document object model, moves in a chess game, or a Linux file system, to name a few. 
In general, when you are faced with hierarchical data, trees are a great data structure to choose.\n\n## Binary Search Trees\n\nA binary search tree, or BST, is a binary tree whose node values are guaranteed to stay in sorted order; that is, an in-order traversal of its nodes will create a sorted list. For example, here is a BST of integers rooted at 7:\n\n\u003cpre\u003e\n    6\n   /\n  5 - -1\n /\n7 - 10 - 25\n\u003c/pre\u003e\n\nSimilar to how a sorted array offers more efficient search times over unsorted arrays, BSTs provide several improvements over standard binary trees.\n\nIn particular, insert, find, and delete operations all run in `O(h)` time, where h is the height of the tree. If an efficient implementation is used to maintain the height of the tree around `O(log n)`, where n is the number of nodes, then these operations will all be logarithmic in n.\n\n```python\nclass Node:\n  def __init__(self, data, left=None, right=None):\n    self.data = data\n    self.left = left\n    self.right = right\n\nclass BST:\n  def __init__(self):\n    self.root = None\n\n  def insert(self, x):\n    if not self.root:\n      self.root = Node(x)\n    else:\n      self._insert(x, self.root)\n\n  def _insert(self, x, root):\n    # Smaller values go left; equal or larger values go right.\n    if x \u003c root.data:\n      if not root.left:\n        root.left = Node(x)\n      else:\n        self._insert(x, root.left)\n    else:\n      if not root.right:\n        root.right = Node(x)\n      else:\n        self._insert(x, root.right)\n\n  def find(self, x):\n    if not self.root:\n      return False\n    else:\n      return self._find(x, self.root)\n\n  def _find(self, x, root):\n    if not root:\n      return False\n    elif x == root.data:\n      return True\n    elif x \u003c root.data:\n      return self._find(x, root.left)\n    else:\n      return self._find(x, root.right)\n```\n\nNote that, as is common in recursive implementations, we use helper functions to properly define our insert and find methods.\n\nThe most common 
questions on binary search trees will ask you to search for elements, add and remove elements, and determine whether a tree is indeed a BST.\n\n- [Strings](#strings)\n  - [In-order](data-structures/binary_tree/README.md)\n  - [Post-order](data-structures/binary_tree/README.md)\n  - [Pre-order](data-structures/binary_tree/README.md)\n\n### Height-Balanced BST\n\nA height-balanced BST is a special form of BST which aims at improving performance. It is useful to understand the general idea of a height-balanced BST and how height-balanced BSTs can help you in your algorithm designs.\n\n#### What is a Height-Balanced BST\n\nTerminology used in trees:\n\n- Depth of node - the number of edges from the tree's root node to the node\n- Height of node - the number of edges on the longest path between that node and a leaf\n- Height of Tree - the height of its root node\n\nA height-balanced (or self-balancing) binary search tree is a binary search tree that automatically keeps its height small in the face of arbitrary item insertions and deletions. That is, the height of a balanced BST with `N` nodes is always `O(log N)`. Also, the heights of the two subtrees of every node never differ by more than 1.\n\nAs we mentioned before, the height of a balanced BST with `N` nodes is always `O(log N)`. We can calculate the total number of nodes and the height of the tree to determine if this BST is a height-balanced BST.\n\nAlso, in the definition, we mentioned a property of a height-balanced BST: the heights of the two subtrees of every node never differ by more than 1. We can also validate the tree recursively according to this rule.\n\n#### Why Using a Height-Balanced BST\n\nWhen we analyze the time complexity of search, insertion and deletion operations, it is worth noting that the height of the tree is the most important factor. Taking the search operation as an example, if the height of the BST is `h`, the time complexity will be `O(h)`. 
The height of the BST really matters.\n\nSo let's discuss the relationship between the number of nodes `N` and the height of the tree `h`. For a height-balanced BST, as we discussed in the previous section, `h = O(log N)`. But a normal BST, in the worst case, can degenerate into a chain.\n\nTherefore, the height of a BST with `N` nodes can vary from `log N` to `N`. That is, the time complexity of the search operation can vary from `O(log N)` to `O(N)`. That is a huge difference in performance.\n\nTherefore, a height-balanced BST plays an important role in improving performance.\n\n#### How to Implement a Height-Balanced BST\n\nThere are several different implementations of height-balanced BSTs. The details of these implementations differ, but they have similar goals:\n\n- The data structure should satisfy the binary search property and the height-balanced property.\n- The data structure should support the basic operations of a BST, including search, insertion and deletion, within `O(log N)` time even in the worst case.\n\n## N-ary Trees\n\nA binary tree can be traversed in preorder, inorder, postorder or level-order. Among these traversal methods, preorder, postorder and level-order traversal are suitable to be extended to an N-ary tree.\n\n### Top-down solution\n\n\"Top-down\" means that in each recursion level, we visit the node first to come up with some values, and pass these values to its children when calling the function recursively.\n\nA typical \"top-down\" recursion function `top_down(root, params)` works like this:\n\n```python\n# 1. return specific value for null node\n# 2. update the answer if needed                              // answer \u003c-- params\n# 3. for each child node root.children[k]:\n# 4.      ans[k] = top_down(root.children[k], new_params[k])  // new_params \u003c-- root.val, params\n# 5. 
return the answer if needed                              // answer \u003c-- all ans[k]\n```\n\n### Bottom-up solution\n\n\"Bottom-up\" means that in each recursion level, we first call the function recursively for all the children nodes and then come up with the answer according to the return values and the value of the root node itself.\n\nA typical \"bottom-up\" recursion function `bottom_up(root)` works like this:\n\n```python\n# 1. return specific value for null node\n# 2. for each child node root.children[k]:\n# 3.      ans[k] = bottom_up(root.children[k])    // call function recursively for all children\n# 4. return answer                                // answer \u003c- root.val, all ans[k]\n```\n\n## Tries\n\nA `Trie` is a special form of an `N-ary tree`. Typically, a trie is used to `store strings`. Each Trie node represents `a string` (`a prefix`). Each node might have several children nodes, and the paths to different children nodes represent different characters. The string a child node represents is the `original string` represented by its parent node plus `the character on the path`.\n\nA trie is a kind of tree whose nodes typically represent strings, where every descendant of a node shares a common prefix. For this reason tries are often referred to as prefix trees.\n\n\u003cpre\u003e\n  d - o - g\n /\n  - c - a - t\n \\   \\\n  \\   o - a - t\n   \\\n    b - e - a - r\n\u003c/pre\u003e\n\nFollowing all paths from the root to each leaf spells out all the words that this trie contains, in this case \"bear\", \"cat\", \"coat\", and \"dog\".\n\nThere are two main methods used with tries:\n\n- `insert(word)`: add a word to the trie.\n- `find(word)`: check if a word or prefix exists in the trie.\n\nOne important property of a trie is that all the descendants of a node have a common prefix of the string associated with that node. 
That's why a Trie is also called a prefix tree.\n\nEach one of these methods runs in `O(k)` time, where `k` is the length of the word.\n\nTries can be implemented in several ways, but in an interview setting the simplest way is to use a nested dictionary, where each key maps to a dictionary whose keys are successive letters in a given word.\n\nHere is a basic implementation of a trie in Python:\n\n```python\nENDS_HERE = '#'  # marker key flagging the end of a complete word\n\nclass Trie:\n  def __init__(self):\n    self._trie = {}\n\n  def insert(self, text):\n    # Walk down the nested dicts, creating a level per missing character.\n    trie = self._trie\n    for char in text:\n      if char not in trie:\n        trie[char] = {}\n      trie = trie[char]\n    trie[ENDS_HERE] = True\n\n  def find(self, prefix):\n    # Return the sub-trie reached by following prefix, or None if it is absent.\n    trie = self._trie\n    for char in prefix:\n      if char in trie:\n        trie = trie[char]\n      else:\n        return None\n    return trie\n```\n\nTries are widely used in applications such as autocomplete and spell checking.\n\n### Performance of a Trie\n\nLet's discuss the complexity of this data structure.\n\n### Time complexity\n\nIf the longest word has length `N`, the height of the Trie will be `N + 1`. Therefore, the time complexity of the insert, search and prefix-lookup operations will be `O(N)`.\n\n### Space complexity\n\nIf we have `M` words to insert in total and the length of each word is at most `N`, there will be at most `M*N` nodes in the worst case (when no two words share a prefix).\n\nLet's assume that there are at most `K` different characters (`K` is 26 for lowercase English letters, but might differ in other cases). 
So each node will maintain a map whose size is at most K.\n\nTherefore, the space complexity will be `O(M*N*K)`.\n\nA Trie looks very space-consuming, but in practice it uses much less space than this estimate, especially when the distribution of words is dense and many of them share prefixes.\n\nYou can also implement each node with an array, which gives slightly better time performance at the cost of somewhat more space.\n\n### How to represent a Trie\n\n#### Array\n\nThe first solution is to use an `array` to store children nodes.\n\nFor instance, if we store strings that contain only the letters a to z, we can declare an array of size 26 in each node to store its children nodes. For a specific character c, we can use c - 'a' as the index to find the corresponding child node in the array.\n\nIt is really fast to visit a child node. It is also comparatively easy to visit a specific child, since in most cases we can easily convert a character to an index. But not all children nodes are needed, so some space may be wasted.\n\n#### Hash Map\n\nThe second solution is to use a `hashmap` to store children nodes.\n\nWe can declare a hashmap in each node. The keys of the hashmap are characters and the values are the corresponding child nodes.\n\nIt is even easier to visit a specific child directly by the corresponding character, although it might be a little slower than using an array. However, it saves some space since we only store the children nodes we need. It is also more flexible because we are not limited by a fixed length and a fixed range.\n\nYou might wonder why we don't just use a hash table to store strings. Let's do a brief comparison between these two data structures. 
We assume there are N keys and the maximum length of a key is M.\n\n#### Complexity Analysis Comparison\n\n###### Time Complexity\n\nThe time complexity to search in a hash table is typically `O(1)`, but will be `O(log N)` in the worst case if there are too many collisions and we resolve collisions using a height-balanced BST.\n\nThe time complexity to search in a Trie is `O(M)`.\n\nThe hash table wins in most cases.\n\n###### Space Complexity\n\nThe space complexity of a hash table is `O(M * N)`. If you want the hash table to support the same functionality as a Trie, you might need to store several copies of each key. For instance, you might want to store \"a\", \"ap\", \"app\", \"appl\" and also \"apple\" for the keyword \"apple\" in order to search by prefix. The space complexity can be much larger in that case.\n\nThe space complexity of a Trie is `O(M * N)`, as we estimated above. In practice it is far smaller than this estimate, since many words share prefixes in real cases.\n\n## Heaps\n\nA heap is a tree that satisfies the (aptly named) heap property, which comes in two flavors:\n- In a max-heap, the parent node's value is always greater than or equal to its child node(s)\n- In a min-heap, the parent node's value is always smaller than or equal to its child node(s)\n\nNote that, unlike with BSTs, it is possible for a left child to have a greater value than its parent (in the case of a min-heap) or for a right child to have a smaller value than its parent (in the case of a max-heap).\n\nWhile it is possible for parent nodes to have more than two children, almost all interview questions will deal with binary heaps, so we will make that assumption throughout the following problems. 
In the following explanation we will also assume that we are dealing with a min-heap, but the same principles apply for max-heaps.\n\nFor example, here is a heap of integers:\n\n\u003cpre\u003e\n        31\n       /\n10 - 14 - 26\n \\\n  19 - 42\n\u003c/pre\u003e\n\nWe can represent a heap in a more space-efficient way by using an array. In this style, the two child nodes of a parent node located at index `i` can be found at indices `2i + 1` and `2i + 2`, like so:\n\n` 10 | 14 | 19 | 26 | 31 | 42 `\n\nWhen using an array to represent a heap, the heap must be filled in level by level, left to right.\n\nWhenever you are asked to find the top k or minimum k values, a heap should be the first thing that comes to mind. Heaps are closely tied to the heapsort sorting algorithm, priority queue implementations, and graph algorithms such as Dijkstra's algorithm, which we will explore in later chapters.\n\nYou should be familiar with the following heap operations:\n\n- insert(heap, x): add an element x to the heap, `O(log n)`\n- delete-min(heap): remove and return the smallest element, `O(log n)`\n- heapify(array): convert an array into a heap by repeated insertions, `O(n log n)`\n\nIn the solutions that follow we will make use of Python's `heapq` module to implement the methods above. The corresponding operations are as follows:\n\n- heapq.heappush(heap, x)\n- heapq.heappop(heap)\n- heapq.heapify(array), which builds the heap in place in `O(n)` time rather than by repeated insertion\n\n## Graphs\n\nGraphs are one of the most important and widely used data structures. Website links, friend connections, and map routes all rely on graph representations, along with countless other applications.\n\nFormally, graphs are defined as a set of vertices connected by edges. If these edges go in one direction, the graph is said to be directed; otherwise it is undirected. Each edge can additionally be associated with a number that represents its \"cost\" or \"benefit\".\n\nAn example of a directed graph would be followers on Twitter. 
Just because you follow Elon Musk does not mean he follows you back. On the other hand, friend connections on Facebook are undirected.\n\nMathematicians working in graph theory have names for many different graph concepts. We don't need to know all of them, but a few will be useful for the explanations that follow:\n\n- neighbor of X: any vertex connected to X by an edge\n- path: a route of edges that connects two vertices\n- cycle: a path that begins and ends on the same vertex\n- directed acyclic graph (DAG): a directed graph that does not contain any cycles\n- connected graph: a graph in which there is always a path between any two vertices\n\nAnother classic example of a (directed) graph is airline routes. In the following diagram, we see that there are flights between JFK and SFO, ORL and LAX, and so on, and each one has an associated plane ticket cost. This graph has several cycles, since it is possible to start and end at JFK after following several edges.\n\n\u003cpre\u003e\nLAX -- DFW\n | \\    |\n |  \\   |\n |   \\  |\nJFK-- ORL\n |    /\n |   /\n |  /\n | /\nSFO\n\u003c/pre\u003e\n\nGraphs can be represented in two main ways: adjacency lists and adjacency matrices.\n\nAn adjacency list is essentially a dictionary mapping each vertex to the vertices it has an edge to. 
For the airline diagram above this would be as follows:\n\n```python\n{\n  'JFK': ['SFO', 'LAX'],\n  'SFO': ['ORL'],\n  'ORL': ['JFK', 'LAX', 'DFW'],\n  'LAX': ['DFW'],\n  'DFW': []  # no outgoing edges; listed so traversals can look it up\n}\n```\n\nOn the other hand, in an adjacency matrix, each vertex is associated with a row and a column of an N x N matrix, and `matrix[i][j]` will be 1 if there is an edge from i to j, else 0.\n\nThis would look like the following:\n\n```python\nindices = {\n  'JFK': 0,\n  'SFO': 1,\n  'ORL': 2,\n  'LAX': 3,\n  'DFW': 4\n}\n\ngraph = [\n  [0, 1, 0, 1, 0],\n  [0, 0, 1, 0, 0],\n  [1, 0, 0, 1, 1],\n  [0, 0, 0, 0, 1],\n  [0, 0, 0, 0, 0]\n]\n```\n\nIn general, the adjacency list representation is more space efficient if there are not that many edges (also known as a sparse graph), whereas an adjacency matrix has faster lookup times to check if a given edge exists but uses more space.\n\nYou should know the two main traversal methods for graphs: depth-first search (DFS) and breadth-first search (BFS).\n\nBelow is a typical DFS implementation. Note the recursive aspect: for each vertex we visit, we call our function again on each of its neighbors.\n\n```python\ndef DFS(graph, start, visited=None):\n  # Create the set per top-level call; a mutable default would be shared across calls.\n  if visited is None:\n    visited = set()\n  visited.add(start)\n  for neighbor in graph[start]:\n    if neighbor not in visited:\n      DFS(graph, neighbor, visited)\n  return visited\n```\n\nBFS, on the other hand, relies on a queue. 
For each item that we pop off the queue, we find its unvisited neighbors and add them to the end of the queue.\n\n```python\nfrom collections import deque\n\ndef BFS(graph, start):\n  visited = {start}  # a set; mark vertices when enqueued so none is queued twice\n  queue = deque([start])\n\n  while queue:\n    vertex = queue.popleft()\n    for neighbor in graph[vertex]:\n      if neighbor not in visited:\n        visited.add(neighbor)\n        queue.append(neighbor)\n  return visited\n```\n\nBoth of these algorithms run in `O(V + E)` time and `O(V)` space in the worst case.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvbrazo%2Fdata-structures-archives","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvbrazo%2Fdata-structures-archives","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvbrazo%2Fdata-structures-archives/lists"}