{"id":16370835,"url":"https://github.com/integralist/data-structures","last_synced_at":"2025-10-26T07:32:07.222Z","repository":{"id":20504287,"uuid":"23782818","full_name":"Integralist/Data-Structures","owner":"Integralist","description":"List of common abstract data structures","archived":false,"fork":false,"pushed_at":"2014-12-24T23:11:36.000Z","size":220,"stargazers_count":7,"open_issues_count":0,"forks_count":12,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-07T17:23:30.583Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Integralist.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-09-08T07:43:18.000Z","updated_at":"2021-02-17T19:51:20.000Z","dependencies_parsed_at":"2022-09-09T11:52:18.229Z","dependency_job_id":null,"html_url":"https://github.com/Integralist/Data-Structures","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Integralist/Data-Structures","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Integralist%2FData-Structures","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Integralist%2FData-Structures/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Integralist%2FData-Structures/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Integralist%2FData-Structures/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Integralist","download_url":"https://codeload.github.com/Integralist/Data-Structures/
tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Integralist%2FData-Structures/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":281074244,"owners_count":26439422,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-26T02:00:06.575Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-11T03:06:14.100Z","updated_at":"2025-10-26T07:32:06.975Z","avatar_url":"https://github.com/Integralist.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Data Structures\n\nThis is a repository of abstract data structures written in Ruby.\n\nI do not attempt to make a canonical list of all different API variations or \nimplementations across different languages. This repo provides the basic requirements \nto understand and implement these data structures using any language of your choosing.\n\nThe inspiration for this repo was acquired through the O'Reilly book: \n\"Practical Data Structures and Algorithms using JavaScript\". Although the basic \ninformation in the book was useful, it unfortunately suffered from quite a few \ncode and syntax errors, as well as non-idiomatic (almost poor) JavaScript code.\n\nBecause of this, I decided to re-implement the data structures using my language \nof choice: Ruby. 
But please do read the book yourself, as it presents much more information \nthan I replicate here. This repo is a mere outline of what the book teaches.\n\nWhat I've used from the book is the basic outline of the different data structure \nAPIs, along with a simple explanation of each data structure (indicated by a \"quote\" block). \nAll other information and code is my own.\n\n- [Array](#array)\n- [List](#list)\n- [Stack](#stack)\n- [Queue](#queue)\n- [Dictionary](#dictionary)\n- [Hash Table](#hash-table)\n- [Set](#set)\n- [Tree](#tree)\n\n## Array\n\n\u003e An Array is a linear collection of elements,  \nwhere the elements can be accessed via indices,  \nwhich are usually integers used to compute offsets\n\nArrays are one of the more contentious data structures as they have a variety \nof different implementations, depending on the host language. For example, in the \nclassical sense an Array is a \"set-size\" data structure (meaning it can't expand beyond \nits set dimensions, which might conflict with many readers' understanding of \na typical Array), and this data structure can be the basis of other data structures \nsuch as Lists and Strings.\n\nBut what I have found in many languages is that the Array data structure is \nimplemented using *other* data structures such as a Dictionary (or Object/Hash \nTable depending on your background), Linked List and even Search Trees.\n\nJavaScript's implementation of an Array, for example, is actually an Object, \nwhere the numeric indices are converted into String keys.\n\nThe difference in implementations means that the properties of the data structure  \nwill change depending on the implementation. 
So in a classical sense an Array is \na fixed-size data structure, but in a language such as Ruby, PHP or JavaScript \n(to name a few) it is considered to be a \"Dynamic Array\", meaning its size \ncan change to allow additional elements to be added to the Array. For the \npurposes of this repository I'll take the definition to be that of a Dynamic Array.\n\n### API\n\n- `index` (return index of specified element)\n- `pop` (remove last element)\n- `push` (append new element)\n- `shift` (remove first element)\n- `unshift` (prepend new element)\n\n### Non-standard API\n\n- `slice!` (remove specified index with optional length/range)\n- `insert` (insert new elements at specified index)\n\n\u003e Note: we're using the Ruby programming language, but other languages can  \nprovide slightly different APIs. JavaScript, for example, includes `slice`  \nbut not `insert` (instead it implements `splice`)\n\n### Properties\n\n- Sequential/Ordered (i.e. consistent element ordering based on collection population)\n- Random access\n- Element indices computed at run time\n- Duplicates allowed\n\n### Usefulness\n\nReference: [http://en.wikipedia.org/wiki/Dynamic_array#Performance](http://en.wikipedia.org/wiki/Dynamic_array#Performance)\n\n#### Pros\n\n- Faster lookups (due to index access, rather than looping entire data structure)\n  - This is possible because an index maps directly to a storage location (a computed offset, or a key lookup when the underlying implementation is a Hash Table)\n- Inserting/deleting elements at the end of the Array (due to index access)\n\n#### Cons\n\n- Inserting/deleting elements at the beginning/middle of the Array (due to all elements needing to be re-indexed)\n\n### Notes\n\nThere is a mathematical variant of the Array called a [Vector](http://en.wikipedia.org/wiki/Vector_space) \nand you may be confused as to its use in the [Clojure programming language](http://clojure.org/data_structures#Data%20Structures-Vectors%20\\(IPersistentVector\\)) but in practical \nterms they are implemented as a Dynamic Array data 
structure.\n\nI've not provided an implementation of the Array as it already exists as part of the Ruby language.\n\n---\n\n## List\n\n[Code](Linked-List/implementation.rb)\n\nA List is a simple collection of elements, which is different from an Array in \nthat the elements within the List can only be accessed by looping through the \nlist, node by node, searching for the element of interest.\n\nThe List is made up of a collection of \"nodes\". Each node will hold:\n\n- data for the current node\n- pointer to the next node\n\nThis type of List is known as a Linked List, meaning each node is linked to the \nnext node (hence there is no index access). The starting node is referred to as \nthe \"head\" and then from the head you can loop iteratively through each node.\n\nThere are also different types of Linked List. The standard type is a \"singly\" \nLinked List, whereby the last node's pointer points to `null` (to indicate \nthe end of the list). But there is also a \"doubly\" Linked List, in which each \nnode also holds a pointer to the *previous* node (not just to the following node).\n\n\u003e Note: the following API is based on the \"singly\" Linked List\n\nThere are also \"circularly\" Linked Lists which simply loop around and around the \ndata structure (i.e. 
when it reaches the last node, the pointer will point \nto the starting node and vice-versa for the first node moving backwards).\n\nFor performance, new items are inserted before the node at the head, which avoids traversing the list.\n\n### API\n\n- `insert` (prepend new element or insert new element at specified index)\n- `remove` (remove element at specified index)\n- `clear` (remove all elements from the list, leaving an empty list)\n- `next` (move to next item in the list)\n- `move_to` (move to specified index)\n- `position` (returns the current index position)\n- `get` (returns the element at the specified index)\n- `front` (change index position to be the starting position)\n- `end` (change index position to be the ending position)\n- `next?` (returns Boolean value checking if a next index exists)\n\n### Properties\n\n- Sequential/Ordered (i.e. consistent element ordering based on collection population)\n- Duplicates allowed\n\n### Usefulness\n\nReference: [http://en.wikipedia.org/wiki/Dynamic_array#Performance](http://en.wikipedia.org/wiki/Dynamic_array#Performance)\n\n#### Pros\n\n- Inserting/deleting elements at the beginning of the List (only the head pointer needs updating)\n- Better at inserting/deleting elements in the middle of the List (although still requires search time)\n- Useful if the order of your data doesn't matter (if it does then a Stack might be preferable)\n- Useful if you don't need to search your data in a performant manner\n\n#### Cons\n\n- Slower lookups (due to sequential iteration access, rather than having direct index access)\n- Inserting/deleting elements at the end of the List (due to sequential iteration access)\n- Long searches or complex sorting can be a performance concern\n\n---\n\n## Stack\n\n\u003e A stack is a list of elements that are accessible only  \nfrom one end of the list, which is called the `top`  \nFor example, the stack of trays at a cafeteria.  
\nTrays are always removed from the top,  \nand when trays are put back on the stack after being washed,  \nthey are placed on the top of the stack\n\nBecause this data structure is LIFO (Last In, First Out), you can \nonly ever access the element that is on the top of the stack. If you want to \naccess an element further down the stack then you'll need to remove the elements \nthat currently sit above it.\n\nThe underlying implementation of the Stack will be an Array wrapped in an Object. \nThis is because the Array by default provides the relevant `pop` and `push` methods \n(see API below).\n\nAn example of a Stack's usage could be determining whether a word is a \"Palindrome\" (a word that \nis spelt the same forwards and backwards, like \"dad\"). You split the word into \nletters and store each letter into the stack and then process the stack (i.e. pop \neach element into an Array; if the word is a palindrome, the resulting Array spells the original word).\n\nExample:\n\nStack -\u003e `[h, e, l, l, o]` when `pop`'ed into an Array would look like `[o, l, l, e, h]` where we \ncan see the word \"hello\" **isn't** a palindrome. But if we try again using the word \"dad\": \n`[d, a, d]`, we can see that you end up with a new Array `[d, a, d]` which **is** \nthe same word.\n\n### API\n\n- `pop` (remove top element)\n- `push` (append new element)\n- `peek` (return the top element but don't `pop` it from the Stack)\n- `clear` (remove all elements from the Stack, leaving an empty Stack)\n\n### Properties\n\n- Sequential/Ordered (i.e. consistent element ordering based on collection population)\n- LIFO (Last In, First Out)\n\n### Usefulness\n\n- Useful when the order of your data matters\n- Useful for searching the collection for specific data\n- Fast (as Stacks only allow `push` and `pop` functionality)\n- Used a lot in underlying language implementations (e.g. 
expression evaluation and handling function calls)\n\n---\n\n## Queue\n\n\u003e A queue is a type of list where data is  \ninserted at the end and is removed from the front.  \nQueues are used to store data in  \nthe order in which they occur, as opposed to a stack,  \nin which the last piece of data entered  \nis the first element used for processing\n\n\u003e Think of a queue like the line at your bank,  \nwhere the first person into the line  \nis the first person served,  \nand as more customers enter a line,  \nthey wait in the back until it is their turn to be served\n\nLuckily for us, Ruby already implements a `Queue` class ([http://ruby-doc.org/stdlib-2.0.0/libdoc/thread/rdoc/Queue.html](http://ruby-doc.org/stdlib-2.0.0/libdoc/thread/rdoc/Queue.html)), \nalthough the API doesn't quite match up to the definition of what a Queue should \nimplement. I've listed the correct API below and implemented my own Queue in \nthe code samples, although I've had to name it differently so as not to clash \nwith the official `Queue` class.\n\n### API\n\n- `enqueue` (append new element)\n- `dequeue` (remove first element)\n- `peek` (return the first element but don't `dequeue` it from the Queue)\n- `clear` (remove all elements from the Queue, leaving an empty Queue)\n\n### Properties\n\n- Sequential/Ordered (i.e. consistent element ordering based on collection population)\n- FIFO (First In, First Out)\n\n### Usefulness\n\n- Processing data that needs to be handled in a sequential order\n- Can be useful for sorting data (depending on the sort)\n\n### Notes\n\nThere are some situations where a Queue needs to be processed in something other \nthan FIFO order. This scenario is known as a \"Priority Queue\" and it works like this: \nelements are removed from the Queue based on a priority *constraint*.\n\nTo implement this, instead of storing raw data inside the \nQueue you would store Objects that contain the data along with a priority field. 
\nWe would then redefine our `dequeue` method to loop through the Queue looking \nfor the element with the highest priority and to process that. By nature of the \nbasic implementation, if we have multiple objects with the same priority then we\nstill continue to process elements on a FIFO basis (as we start looping from the \nstart of the Queue).\n\n---\n\n## Dictionary\n\nA Dictionary is simply a collection of key-value pairs. Any time we wish to \nretrieve content from the Dictionary we use the \"key\" to search for the content. \nWhen the key is found, the associated \"value\" is returned.\n\n\u003e Note: Dictionaries are often referred to as a `Map`  \nand in some languages as a `Hash` (although that  \nterminology gets confusing when considering the  \n`Hash Table` data structure). Dictionary is likely  \nthe best term to use for this type of key/value structure\n\nThe following is an example of a basic Dictionary lookup in Ruby:\n\n```ruby\ndictionary = {\n  :foo =\u003e \"bar\",\n  :baz =\u003e \"qux\"\n}\n\ndictionary[:foo] # =\u003e \"bar\"\n```\n\n### API\n\n- `add` (takes a key and value as two separate arguments and adds them to the Dictionary)\n- `find` (takes a key as a single argument and returns the associated value)\n- `remove` (takes a key as a single argument and removes it, and its associated value, from the Dictionary)\n- `all` (returns a custom display of all key-value pairs)\n\n### Properties\n\n- Unordered (i.e. random key ordering regardless of collection population)\n- The `all` method should sort the keys (i.e. `dictionary.keys.sort`)\n\n### Usefulness\n\n#### Pros\n\n- Fast insertion, deletion and retrieval\n\n#### Cons\n\n- Slow to search by value (every entry has to be inspected)\n\n### Notes\n\nSome languages implement ordered key-value pairs (such as Ruby 1.9+), meaning \nthat when enumerating over the Dictionary, the keys are read in the order they \nwere inserted.\n\n---\n\n## Hash Table\n\nA Hash Table is fundamentally a Dictionary data structure. 
The keys for the \nDictionary are determined by a function that creates a hash of the key's \nassociated data.\n\n\u003e Note: the hash algorithm can be anything you like;  \nI just made up the `hash_it` function defined below,  \nas it created a seemingly varied index based on each  \ncharacter byte from my data\n\nData | Hash Algorithm    | Hash Value\n---- | ----------------- | ----------\nMark | `hash_it(\"Mark\")` | 395\nJohn | `hash_it(\"John\")` | 399\nKaty | `hash_it(\"Katy\")` | 409\nkraM | `hash_it(\"kraM\")` | 395\n\n```ruby\n# Sum the byte values of every character in the String\ndef hash_it(name)\n  name.each_byte.reduce(:+)\nend\n```\n\nThe \"hash value\" (as seen in the table above) is the key under which we'll store \nour data in our Dictionary. What you should also notice is that there is a collision \nbetween our indices. The data \"Mark\" and \"kraM\" contain the same bytes, so with our \nparticular algorithm they produce the same hash value, and this collision needs to be \nresolved somehow (we'll come back to this).\n\n\u003e Note: I'm using a Dictionary in this explanation  \nbut if your language's implementation of an Array  \nallows any type of data for its indices then you  \nshould really use an Array instead.\n\n### What data structure *should* I use?\n\nFor the purposes of the implementation code in this repo I've decided to use an \nArray and to make sure that, regardless of the data, my hashing algorithm generates \na numerical value.\n\nThe reason for this is simply that the workarounds for collisions are centered \naround the data structure being an Array.\n\n### Hashing algorithm\n\nIt's probably easier if I just quote from the book here:\n\n\u003e The choice of a hash function depends on the data type of the key.  \nIf your key is an integer, then the simplest hash function is  \nto return the key modulo the size of the array.  
\nThere are circumstances when this function is not recommended,  \nsuch as when the keys all end in 0 and the array size is 10.  \nThis is one reason the array size should always be a prime number,  \nsuch as 137\n\n\u003e Also, if the keys are random integers,  \nthen the hash function should more evenly distribute the keys.  \nThis type of hashing is known as modular hashing\n\n\u003e To avoid collisions, you first need to make sure the array  \nyou are using for the hash table is sized to a prime number.  \nThis is necessary due to the use of modular arithmetic in  \ncomputing the key.  \nThe size of the array needs to be greater than 100 in order  \nto more evenly disperse the keys in the table.  \n\nIn the above quote, they state they're using \"modular arithmetic\" for their \nalgorithm. This means that the size of your Array data structure will depend \nslightly on the hashing algorithm results and can only be decided upon after \nsome experimentation.\n\n### Hashing collisions\n\nThere are two popular techniques for resolving hashing collisions:\n\n1. Separate Chaining\n2. Linear Probing\n\n#### Separate Chaining\n\nEffectively, each key holds a sub data structure, and we need a technique \nfor storing our data in that sub data structure, allowing us to store several items under the same hashed \nvalue key in the top level Dictionary.\n\nThe sub data structure you use could be another Dictionary or an Array; it depends \non the technique you plan on using for storing the data within the sub data structure.\n\n#### Linear Probing\n\nThis technique dictates that when a collision is found, the Hash Table will check \nwhether the next index is empty and, if so, place the data there \n(otherwise it keeps moving on to the following index until an empty one is found).\n\nThe idea behind this technique is that because Hash Table keys are generally quite \ndistributed (e.g. they're rarely sequential 0, 1, 2, 3, 4) it's likely that you'll \nhave many empty elements and you can use that empty space to store your \ncolliding data. 
\n\n\u003e Note: Linear Probing is suggested over Separate Chaining  \nif your data structure is expected to be quite large\n\nPersonally I don't like the idea of the Linear Probing technique as it feels like \nit'll introduce more complexity and bugs. Also, a problem with this \ntechnique is that it relies on the top level data structure being an Array. \nThat is fine if the key we're constructing is numerical, but if you want to have \nStrings for your keys then that won't work very well and so you'll need to be clever \nwith how you implement this using a Dictionary.\n\n### API\n\n- `put` (takes \"data\" and passes it to our hashing function, then stores the data in the Dictionary under the resulting hash value, returning the generated key)\n- `get` (takes the hash value our data was assigned to and retrieves the data from the Dictionary)\n- `all` (returns a custom display of our Dictionary, including the hash value keys)\n\n---\n\n## Set\n\nA Set is simply a collection of unique values.\n\n### Terminology\n\n- The elements of a Set are typically referred to as \"members\"\n- A Set with no members is called an \"empty\" Set\n- The Set containing all possible members is referred to as the \"universe\"\n\nMultiple Sets are considered equal if they contain the same members.\n\nA Set is considered a \"subset\" of another Set if all the members of the first Set \nare contained in the second Set.\n\n### API\n\n- `add` (appends new element)\n- `remove` (removes specified element)\n- `union` (returns a new Set combining the elements from two provided Sets)\n- `intersection` (returns a new Set containing elements that exist in both provided Sets)\n- `difference` (returns a new Set containing elements from the current Set that DON'T exist in the provided Set)\n- `subset?` (returns a Boolean indicating whether the provided Set is a subset of the current Set)\n- `show` (returns a custom display of the Set)\n\n### Properties\n\n- Unordered (i.e. 
random element ordering regardless of collection population)\n- Unique list of elements (no duplicates)\n\n### Usefulness\n\n- Ensuring a collection of unique data\n\n---\n\n## Tree\n\nA Tree is a non-linear data structure used to store hierarchical data, \nsuch as an Operating System file system.\n\nA Tree contains \"nodes\" (a node has a value associated with it) and each node is \nconnected by a line called an \"edge\". These lines represent the relationship \nbetween the nodes.\n\nThe top level node is known as the \"root\" and a node with no children is a \"leaf\".\n\nIf a node is connected to other nodes, then the preceding node is referred to \nas the \"parent\" and nodes following it are \"child\" nodes.\n\n### Binary Trees\n\nThere is a specific type of tree referred to as a \"Binary Tree\", which restricts \neach node to no more than two child nodes.\n\nThe children of a node are referred to as the \"left\" node and the \"right\" node.\n\nWhen creating a Binary Tree object you'll pass in the first node to become the \nroot node and each node is itself an object with an associated value, along with \nlinks to its left and right nodes.\n\n### Binary Search Trees\n\nThere is another type of tree called a \"Binary Search Tree\" and this is an \nextension of the Binary Tree, with the addition that the child nodes are stored \nin a specific order depending on a custom calculation.\n\nThe calculation is very simple: if a node has a lesser value than its parent it \nis placed in the \"left\" node position. 
If, on the other hand, it has a greater value \nthan its parent it is placed in the \"right\" node position.\n\n### API\n\n- `insert` (adds new node to the tree, which can require traversal of the tree)\n- `in_order` (traversal method: visits all nodes in ascending order†)\n- `pre_order` (traversal method: same as `in_order` but prints the current node *before* checking the left/right child nodes)\n- `post_order` (traversal method: same as `in_order` but prints the current node *after* checking the left/right child nodes)\n- `show` (display the current node's value)\n\n\u003e Note: traversal methods rely on recursion  \nand need to be carefully implemented to ensure  \nthey do not cause a Stack Overflow error\n\n#### † `in_order`\n\nThe `in_order` function would theoretically look something like:\n\n```ruby\n# Recursively visit the left subtree, then the node itself, then the right subtree\ndef in_order(node)\n  unless node.nil?\n    in_order node.left\n    puts node.show\n    in_order node.right\n  end\nend\n```\n\nNow imagine we have the following Binary Search Tree structure:\n\n```\n     23\n     |\n   -------\n   |     |\n  16     45\n  |      |\n ----   ----\n |  |   |  |\n03  22  37 99\n```\n\nIf we executed `in_order` on the root node (23) we would see the following output:\n\n`3 16 22 23 37 45 99`\n\nThis is how it works...\n\nIf we passed in the root node (23) to `in_order` then that function would check \nif the root node had a value and if it did it would check to see if it had a left \nchild node (which it does: 16).\n\nThrough recursion the function would check if that node had a value (it does: 16) \nand then check if it had a left child node (it does: 03).\n\nFrom there it would check if the `03` node had a left child node (it doesn't). 
\nSo it would fail the check for `nil?` and move back up the function call stack, \nprint `03` (using the `node.show` function) and then see if the `03` node had a \nright node (it doesn't).\n\nNow the code again moves back up the function call stack, prints `16` and then \nchecks if the node `16` has a right node (it does: 22).\n\nDoes `22` have a value? Yes, so see if it has a left node (it doesn't), so move \nback up the function call stack and print `22`. Now check if it has a right \nnode (it doesn't), so again move up the function call stack, which takes us \nback to the root node `23`, which we print before seeing if it has a right node \n(it does: 45)... and so forth until the end.\n\n#### Function Output\n\nInorder traversal:  \n3 16 22 23 37 45 99\n\nPreorder traversal:  \n23 16 3 22 45 37 99\n\nPostorder traversal:  \n3 22 16 37 99 45 23\n\n### Properties\n\n- No direct access to nodes (start from the head/root and navigate down)\n\n### Usefulness\n\n- Quick to search\n- Quick insertion/deletion\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fintegralist%2Fdata-structures","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fintegralist%2Fdata-structures","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fintegralist%2Fdata-structures/lists"}