{"id":17749910,"url":"https://github.com/0nom4d/binarysearchtree","last_synced_at":"2025-04-01T08:52:32.877Z","repository":{"id":104242792,"uuid":"571888043","full_name":"0Nom4D/BinarySearchTree","owner":"0Nom4D","description":"Implementation of a Binary Search Tree algorithm to find the number of occurrence of a word in a text. - Assignment from Fontys University of Applied Sciences","archived":false,"fork":false,"pushed_at":"2022-11-29T05:10:33.000Z","size":15305,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2023-03-04T01:41:30.191Z","etag":null,"topics":["benchmark","binary","bst","epitech","fontys","fontys-ict-students","python","search","tree"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/0Nom4D.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-11-29T05:08:14.000Z","updated_at":"2022-12-12T15:50:23.000Z","dependencies_parsed_at":null,"dependency_job_id":"07997ce7-2080-4f95-bd6c-05cac96f3fd9","html_url":"https://github.com/0Nom4D/BinarySearchTree","commit_stats":null,"previous_names":[],"tags_count":0,"template":null,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/0Nom4D%2FBinarySearchTree","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/0Nom4D%2FBinarySearchTree/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/0Nom4D%2FBinarySearchTree/releases","manifests_url":"https://repos.ecosyst
e.ms/api/v1/hosts/GitHub/repositories/0Nom4D%2FBinarySearchTree/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/0Nom4D","download_url":"https://codeload.github.com/0Nom4D/BinarySearchTree/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246612494,"owners_count":20805354,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmark","binary","bst","epitech","fontys","fontys-ict-students","python","search","tree"],"created_at":"2024-10-26T11:41:49.083Z","updated_at":"2025-04-01T08:52:32.859Z","avatar_url":"https://github.com/0Nom4D.png","language":"Python","readme":"# Assignment n°2: Searching Algorithm\n\n*All code is written in Python.*\n\nFirst, I developed the two required functions:\n\n- LinearCount\n- BinaryCount\n\nThe first function uses a linear algorithm to find the number of occurrences of the searched word, looping over a list of words.\n\nEach time the word is found, a counter is incremented, and the loop continues until the end of the list.\n\n```py\nfrom typing import List\n\ndef LinearCount(wordList: List[str], searchedWord: str) -\u003e int:\n    word_occurrence = int(0)\n\n    for word in wordList:\n        if word.casefold() == searchedWord.casefold():\n            word_occurrence += 1\n    print(f\"Occurence of word '{searchedWord}': {word_occurrence}\")\n    return word_occurrence\n```\n\nThe second function I developed uses a Binary Tree and a Node class.\n\nThe Binary Tree has some utility functions used when building the binary tree, such as 
insertion or as searching.\n\n```py\nfrom enum import IntEnum\n\nclass AddOnSide(IntEnum):\n    UNKNOWN = -1\n    LEFT = 0\n    RIGHT = 1\n\nclass Node:\n    def __init__(self, word):\n        self.occurrence = 1\n        self.leftTree = None\n        self.rightTree = None\n        self.word = word\n\nclass BTree:\n    def __init__(self):\n        self.root = None\n\n    def isEmpty(self) -\u003e bool:\n        return self.root is None\n\n    def getOccurrenceOfWord(self, word) -\u003e int:\n        if self.isEmpty():\n            return 0\n        currentNode = self.root\n\n        while currentNode is not None:\n            if word.casefold() == currentNode.word:\n                return currentNode.occurrence\n            elif word.casefold() \u003c currentNode.word:\n                currentNode = currentNode.leftTree\n            else:\n                currentNode = currentNode.rightTree\n        return 0\n\n    def insertInTree(self, word) -\u003e None:\n        sideToAdd = AddOnSide.UNKNOWN\n        if self.isEmpty():\n            self.root = Node(word.casefold())\n            return\n        parentNode = None\n        currentNode = self.root\n\n        while currentNode is not None:\n            if word.casefold() \u003c currentNode.word:\n                parentNode = currentNode\n                currentNode = currentNode.leftTree\n                sideToAdd = AddOnSide.LEFT\n            elif word.casefold() \u003e currentNode.word:\n                parentNode = currentNode\n                currentNode = currentNode.rightTree\n                sideToAdd = AddOnSide.RIGHT\n            elif word.casefold() == currentNode.word:\n                currentNode.occurrence += 1\n                return\n        if sideToAdd == AddOnSide.LEFT:\n            parentNode.leftTree = Node(word.casefold())\n        elif sideToAdd == AddOnSide.RIGHT:\n            parentNode.rightTree = Node(word.casefold())\n\ndef BinaryCount(bTree: BTree, searchedWord: str, wordList: List[str] = 
[]) -\u003e int:\n    if bTree.isEmpty():\n        for word in wordList:\n            bTree.insertInTree(word)\n\n    occurrenceOfWord = bTree.getOccurrenceOfWord(searchedWord)\n    print(f\"Occurence of word '{searchedWord}': {occurrenceOfWord}\")\n    return occurrenceOfWord\n```\n\n## Question a\n\nTo compare both functions across documents of various lengths and structures, I wrote some unit tests. These unit tests use Pytest-Benchmark, a Pytest plugin that provides a benchmark fixture.\n\nPytest-Benchmark benchmarks the function under test. With 10 iterations per test, I arrived at this result:\n\n![Alt Text](./BenchmarkResult.png)\n\nThe results show 3 types of benchmark:\n\n- Linear Search\n- Binary Search with Tree Creation Time\n- Binary Search without Tree Creation Time\n\nI used texts of increasing size (Phrase, Football Commentary, Lyrics of French Music, Lorem Ipsum of 2 MB, Bible).\n\n![Alt Text](./Timeplot.png)\n\n- The actual values for '10x Bible' are available in the benchmark screenshot (in microseconds).\n\n- The time taken by each function varies with the file size. 
The larger the file, the longer it takes to search.\n- For each test, we can see that the Binary Tree Search is faster than the Linear Search.\n\nWe can also see that the time needed to create the tree is a parameter to take into account.\n\nIn some tests on tiny files, creating the binary tree and then searching it is faster than the linear search.\n\nWith larger files, creating the Binary Search Tree and then searching it takes too much time.\n\n## Question b\n\nConcerning the maximum handled file size, the largest file I processed without crashing contained 40 copies of the Bible (1.74 GB).\n\nBecause the functions are written in Python, I think they have no file size limit, or that the limit would take a very long time to find.\n\nThe reason is that Python's memory management makes a program almost impossible to crash.\n\nMoreover, neither of my functions uses recursion, so they cannot hit the recursion limit either.\n\nThe only thing that could make these functions crash would be adding a '/dev/zero' value to the end of a line in a file, which would make the program run out of memory because the line never ends.\n\n## Question c\n\nThe diversity of a document's vocabulary has an influence on search performance.\n\nUsing the binary search tree, we can ask ourselves \"How can the diversity influence the search performance?\".\n\nThe more distinct words we have, the larger the binary tree will be and the longer it will take to go through it.\n\nIf some words come up frequently, the number of nodes will be smaller than the number of words in the text. 
Therefore, it will take less time to go through the tree.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F0nom4d%2Fbinarysearchtree","html_url":"https://awesome.ecosyste.ms/projects/github.com%2F0nom4d%2Fbinarysearchtree","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F0nom4d%2Fbinarysearchtree/lists"}