{"id":19512990,"url":"https://github.com/fdifrison/java-data-structures","last_synced_at":"2025-06-25T03:33:58.004Z","repository":{"id":255652650,"uuid":"851155747","full_name":"fdifrison/java-data-structures","owner":"fdifrison","description":"Review of ADS principles in java","archived":false,"fork":false,"pushed_at":"2024-09-16T13:13:43.000Z","size":41,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-25T23:28:46.486Z","etag":null,"topics":["algorithms","data-structures","java"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fdifrison.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-09-02T14:27:02.000Z","updated_at":"2024-09-16T13:13:46.000Z","dependencies_parsed_at":"2024-09-06T15:02:00.210Z","dependency_job_id":"5642d3bd-c17f-4e2a-a289-b6d4f9a8e0e4","html_url":"https://github.com/fdifrison/java-data-structures","commit_stats":null,"previous_names":["fdifrison/java-data-structures"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/fdifrison/java-data-structures","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fdifrison%2Fjava-data-structures","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fdifrison%2Fjava-data-structures/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fdifrison%2Fjava-data-structures/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositor
ies/fdifrison%2Fjava-data-structures/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fdifrison","download_url":"https://codeload.github.com/fdifrison/java-data-structures/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fdifrison%2Fjava-data-structures/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261798725,"owners_count":23211248,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["algorithms","data-structures","java"],"created_at":"2024-11-10T23:28:25.974Z","updated_at":"2025-06-25T03:33:57.975Z","avatar_url":"https://github.com/fdifrison.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Algorithms and Data structures in Java\n\n## Array\n\nArrays are container objects of fixed size that hold elements of a single type.\n\n* **access by index: O(1)**\n* **insert at last available position: O(1)**\n* **remove last item: O(1)**\n* **search for arbitrary item: O(n)**\n* **insert at random position: O(n)**\n* **remove at random position: O(n)**\n\nThe items in the array are located next to each other in the heap memory and can therefore be accessed in O(1)\ntime through their index (**random indexing**). The reference to the array (i.e. the local variable to which the array\nis assigned) is stored in the stack memory. Because of its fixed size (static DS), resizing an array is an\nexpensive operation: it requires copying the current data into a new, larger array, which takes O(n)\ntime. 
Retrieving an arbitrary item by value is also O(n), since we may need to scan the whole\narray. Removing or inserting an element at the end of the array is O(1), while performing the same operation at an arbitrary\nposition is O(n), since we need to shift all the following elements in the array.\n\n## Linked List\n\n* **find/insert/remove item first position: O(1)**\n* **find/insert/remove item arbitrary position: O(n)**\n\nLinked lists are dynamic DS: they can acquire memory at runtime without resizing.\nThe items are not next to each other in the heap memory, so we cannot rely on random access as we do\nfor arrays, but we can potentially store items of different sizes. Each element contains the data and a reference (link)\nto the next element in the list (but not to the previous one); this causes each element to occupy more space. We have direct\naccess to the first node (head node), and the last node is recognized because it points to null. Inserting an element\nin first position is now O(1): we don't need to shift the other elements, just update the reference to the head node.\nOn the contrary, inserting an element at the end of the list is O(n), since we have to traverse the whole list to update\nthe reference of the former last node. Running time is therefore hard to predict, since it depends greatly on the\noperation performed by the user. **Java's `LinkedList` implementation is actually a doubly linked list.**\n\n## Doubly Linked List\n\n* **find/insert/remove item first/last position: O(1)**\n* **find/insert/remove item arbitrary position: O(n)**\n\nVery similar to a Linked List, a Doubly Linked List is a dynamic DS in which each node has a pointer to both the next and the\nprevious element; this increases the size in memory but gives some advantages. 
We have a reference to both the head and\nthe tail of the list; each has one null reference, the previous for the head and the next for the tail.\nThanks to the double reference we can traverse the list in both directions, and insertion/deletion operations are O(1)\nboth at the head and at the tail.\n\n## Stack\n\nStack is an **abstract data type**, meaning that it defines a behavior that the algorithmic implementation must satisfy.\nThe stack is based on the **LIFO** principle: the **L**ast **I**tem to be added is the **F**irst item to go **O**ut.\nThe main operations it has to define are **pop()** to get and remove the last inserted item, **push()** to add an item,\nand **peek()**, which returns the top item (the last inserted one) without removing it.\n\n### Stack memory\n\nThe stack memory is the part of the RAM that the JVM uses to store local variables and method calls (the method's\nstack frame, which contains the method's variables). When a method is called, its frame is pushed onto the stack; when the method returns, the frame is popped\nfrom the stack. Stack memory is small in size but fast to access. If the stack memory fills up, we get the infamous\n`StackOverflowError`, meaning that there is no more space to push another method's frame.\n\n### Heap memory\n\nThe heap memory is another part of the RAM, which the JVM uses for dynamic memory allocation, i.e.\nstoring objects (the object itself is stored in the heap, while the reference, i.e. the address of the object, is\nstored in the stack when it is held by a local variable, even if the object is created inside a method). Its size is\nmuch larger than the stack, but allocating and accessing it is generally slower. The heap memory is freed by the JVM's\nGarbage Collector, which disposes of an object once it is no longer reachable (i.e. 
there is no live reference to it in the\nstack memory).\n\n## Queue\n\nQueue is an **abstract data type**, meaning that it defines a behavior that the algorithmic implementation must satisfy.\nThe queue is based on the **FIFO** principle: the **F**irst **I**tem to be added is the **F**irst item to go **O**ut.\nThe main operations it has to define are **dequeue()** to get and remove the first inserted item, **enqueue()** to add\nan item, and **peek()**, which returns the first inserted item without removing it. Common applications of queues are\nmultithreading, async tasks (to ensure consistency) and the breadth-first search algorithm.\n\n## Trees\n\nIn graph theory, trees are undirected graphs in which any two vertices are connected by exactly one path. We have the\ndefinition of a **root node**, which is the only access point of the tree and from which every other node has a path (every\nnode can be reached via the root node). We can also define a parent-child relation between nodes; the\nnodes at the bottom of the tree are called **leaf nodes** (i.e. the nodes that have a parent but no children). The height\nof a tree is represented by the number of layers, or the longest path between the root and a leaf.\n\n### Binary Search Trees\n\n* **find/insert/remove item: O(logN)** (with a worst case of O(N) in the case of an imbalanced tree)\n\nBinary search trees try to solve a problem common to arrays and linked lists, which is the O(n) complexity of\nretrieving an arbitrary value, since we potentially have to traverse the whole container. BSTs instead reach O(logN)\ncomplexity by storing the elements in sorted order. 
Having a sorted collection, we can always start by looking at the\nmiddle element, check whether the value we are looking for is greater or smaller, and discard the half of the collection that\ncannot contain our element.\n\nBSTs are a particular tree DS in which the edges have a direction and every node can have at most 2 children: the left child is\nsmaller and the right child is bigger than the parent node. This is how, starting from the root node, we can always discard\none branch of the tree (about half of the nodes when the tree is balanced) while searching for a value, since the right branch contains values greater than the root while\nthe left branch contains smaller ones.\n\nA tree is called imbalanced if there is a high skewness between left and right nodes. This is a problem, since the\nO(logN) running time for searching an arbitrary element goes up to O(n) (we are basically traversing a linked list).\n\nDue to the structure of a BST, finding the min and the max value of the container also has O(logN) running time, since they\nwill always be the leftmost and the rightmost node, respectively.\n\n## Associative arrays (maps or dictionaries)\n\nAssociative arrays are abstract data types composed of a collection of key-value pairs where each key has to be unique.\nAt the cost of being an unsorted collection of elements, we can reach O(1) running time for adding, removing and searching\nan element with the use of a hash function. The idea is to obtain random access using an underlying array-like\nstructure: a hash function **h(x)** maps the unique key (of any type) of each\nvalue to an index of the array. A good hash function generates few **collisions**, i.e. it rarely produces the same\nindex for two different keys. However, collisions can't be eliminated entirely and therefore must be handled.\nOne way to resolve collisions is called **Chaining**: essentially a linked list is used as a support DS for each\nindex that holds multiple values. 
However, we end up consuming extra space to store the linked lists, and if many\ncollisions happen we end up with the linked-list search running time of O(n) instead of the desired O(1).\nAn alternative way to handle collisions is **Open Addressing**, which has many implementations; the\nbasic idea is to find a new bucket (index) for the item whose key collides.\nThe easiest implementation is called **linear probing**, where we keep looking at the following indexes until we find an empty one.\nThis approach has good performance since items are close to each other, but it is easily subject to clustering, i.e.\nkeys end up colliding more and more often, creating an uneven distribution of the keys in the underlying array.\n\nThe **Load Factor** indicates how likely a collision is, and it naturally increases as\nmore items are inserted in the hashtable. It is defined as the ratio **number of items / size of the keys-array**\nand, with open addressing, it ranges from 0 to 1. A high load factor can increase the running time up to O(n) due to clustering and collisions.\nThe LF is generally the criterion by which we decide when to resize the hashtable, keeping the best balance between\nmemory usage and running time complexity.\n\nN.B. the size of the underlying array of a hashtable should be a prime number in order to minimize the number\nof collisions, since a prime number has no factors other than 1 and itself (i.e. only the keys that are a multiple of\nthe array size are forced into the same bucket). 
If the array size is 10 and the index is computed modulo the array size, then keys that are multiples of 2 or 5 (the two prime factors\nof 10) can only be hashed into a subset of the buckets, generating more collisions.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffdifrison%2Fjava-data-structures","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffdifrison%2Fjava-data-structures","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffdifrison%2Fjava-data-structures/lists"}