{"id":19546433,"url":"https://github.com/lucasste/havina","last_synced_at":"2026-05-12T13:40:39.631Z","repository":{"id":203299612,"uuid":"709279165","full_name":"LucasSte/havina","owner":"LucasSte","description":"A tool to generate Knowledge Graphs from sentences or evaluate language models' text comprehension.","archived":false,"fork":false,"pushed_at":"2024-01-11T20:58:16.000Z","size":193,"stargazers_count":0,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-02-08T23:18:26.031Z","etag":null,"topics":["bert","bert-embeddings","knowledge-graph","language-model","python","pytorch"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/LucasSte.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-10-24T11:55:59.000Z","updated_at":"2024-06-22T05:20:39.000Z","dependencies_parsed_at":"2023-12-14T19:49:28.176Z","dependency_job_id":"f4822f6f-f3d1-465d-8c33-2e2f9bb1b7c6","html_url":"https://github.com/LucasSte/havina","commit_stats":null,"previous_names":["lucasste/havina"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LucasSte%2Fhavina","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LucasSte%2Fhavina/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LucasSte%2Fhavina/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LucasSte%2Fhavina/manifests","owner
_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/LucasSte","download_url":"https://codeload.github.com/LucasSte/havina/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240801107,"owners_count":19859729,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bert","bert-embeddings","knowledge-graph","language-model","python","pytorch"],"created_at":"2024-11-11T03:45:13.627Z","updated_at":"2026-05-12T13:40:34.601Z","avatar_url":"https://github.com/LucasSte.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Havina\n\nHavina is a Python library that generates knowledge graph triplets from an input text. Its implementation\nis based on the paper \"[Language models are open knowledge graphs](https://arxiv.org/abs/2010.11967)\" with some\ntweaks to improve performance. Most notably, instead of summing the attention scores of each word in a relation,\nI calculate their average.\n\nThe reasoning behind this change is that a simple sum of scores favors longer relations even if the extra words\ndo not carry any relevant meaning.\n\nHavina can be used to evaluate the language comprehension of AI models or as a tool to extract triplets from text\nand build knowledge graphs.\n\n## How to use it\n\nRun `pip install havina` to install the library. Then, after importing the `GraphGenerator` class\nfrom havina, create a generator object and call it with the sentence to evaluate and an optional number of workers. 
\nEach worker will spawn a separate process and the algorithm will split the work between them.\n\nFor more information about the constructor parameters, check the \n[Constructor parameters section](#constructor-parameters).\n\n```python\nfrom havina import GraphGenerator\n\ntext = 'John Lennon is a famous singer.'\ngenerator = GraphGenerator(\n    top_k=4,\n    contiguous_token=False\n)\n\ntriplets = generator(text, workers=1)\nprint(triplets)\n```\n\nThe code above will print the following:\n```\n[\n    HeadTailRelations(\n        head=Entity(text='john lennon', wikidata_id=None), \n        tail=Entity(text='a famous singer', wikidata_id=None), \n        relations=['be'])\n]\n```\n\nThe returned value is a list of `HeadTailRelations` objects, each of which contains\nthe head and tail entities and the possible relations between them. Relations are\nPython strings.\n\n## Example sentence\n\n\nTaking the following paragraph from [Wikipedia](https://en.wikipedia.org/wiki/Amsterdam) and using the library as in [example.py](example.py),\nwe obtain the graph depicted below. It contains only the relations for the `amsterdam` and\n`the netherlands` nodes to avoid cluttering the image.\n\n\u003e Amsterdam was founded at the mouth of the Amstel River that was dammed to control flooding; the city's name \n\u003e derives from a local linguistic variation of the word dam. Originally a small fishing village in the late 12th \n\u003e century, Amsterdam became a major world port during the Dutch Golden Age of the 17th century, when the Netherlands \n\u003e was an economic powerhouse. Amsterdam was the leading centre for finance and trade, as well as a hub of \n\u003e production of secular art. In the 19th and 20th centuries, the city expanded and many new neighborhoods and \n\u003e suburbs were planned and built. The canals of Amsterdam and the 19-20th century Defence Line of Amsterdam are both \n\u003e on the UNESCO World Heritage List. 
Sloten, annexed in 1921 by the municipality of Amsterdam, is the oldest part of \n\u003e the city, dating to the 9th century. The city has a long tradition of openness, liberalism, and tolerance. Cycling \n\u003e is key to the city's modern character, and there are numerous biking paths and lanes spread throughout \n\u003e the entire city.\n\n![alt text](example_graph.png \"Example\")\n\n## How it works\n\nThe last layer of a transformer-based language model, like BERT, outputs attention matrices for all its attention heads.\nThe algorithm operates on the average of all these matrices. If we are looking for head-tail relationships\nwith tokens from left to right, we would have a matrix like the following. We disregard\nthe attention scores below the diagonal because they represent a word-to-word relationship from right to left.\n\nWe assume that each attention score represents the probability of two words being related in the sentence.\nWe show below an example of the beam-search input for the sentence \"Joe is curious about cars\".\n\n|         | Joe | is  | curious | about | cars |\n|---------|-----|-----|---------|-------|------|\n| Joe     | X   | 0.1 | 0.4     | 0.2   | 0.3  |\n| is      | X   | X   | 0.1     | 0.3   | 0.1  |\n| curious | X   | X   | X       | 0.4   | 0.2  |\n| about   | X   | X   | X       | X     | 0.4  |\n| cars    | X   | X   | X       | X     | X    | \n\n\nWe use spaCy to determine the noun chunks and link them to form head-tail pairs. One possible\nhead-tail pair for this example would be \"(Joe, cars)\". Taking \"Joe\" as the first word, we traverse\nthe first row of the matrix, forming candidate relationships. 
\"is\", \"curious\", \"about\" are all candidates.\n\nThe beam search passes only the top-k candidates to its next iteration, based on their average attention scores.\nIf k is one, the only candidate for the second iteration is \"curious\" with a score of 0.4.\n\nWe now traverse the third row of the matrix, looking for possible next tokens given \"curious\".\n\"curious about\" is the only possibility here because \"cars\" belongs to the tail chunk. \"curious about\"\nhas a score of `(0.4+0.4)/2=0.4`, the average of the \"Joe\"-to-\"curious\" and \"curious\"-to-\"about\" scores.\n\nThe relationship we found for the head-tail pair \"(Joe, cars)\" is \"curious about\", so the triplet\nlooks like `(Joe, curious about, cars)`.\n\nIn a later stage, we remove prepositions from the relations and lowercase the text, so the final triplet is\n`(joe, curious, cars)`.\n\n\n## Using other language models\n\nHavina can be a tool for evaluating the text understanding of language models. As it finds the relationship between\nwords from the attention matrices, checking the resulting triplets indicates whether the model is learning correct\nparameters.\n\nThe easiest way to use the library with another language model is to derive a class from `LanguageModel` for it, as in\nthe examples in [language_model.py](havina/language_model.py).\n\n```python\nimport havina\n\nclass MyModel(havina.LanguageModel):\n    # Implement the functions here\n    pass\n\ngenerator = havina.GraphGenerator(model=MyModel)\n```\n\n## Constructor parameters\n\n\nThe `GraphGenerator` has many parameters that change the algorithm's behavior. 
Use the table below to understand\nhow each of them may affect the results.\n\n\n| Parameter name      | Effect on results                                                                                                                                                                                                                        | Default value |\n|---------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------|\n| `top_k`             | Determines the number of candidates for the next iteration of beam search. A larger number improves the quality of relations, but takes longer to run.                                                                                   | 4             |\n| `threshold`         | Sets the score limit for the algorithm to eliminate low-quality relations. A higher threshold returns fewer results, but of better quality.                                                                                              | 0.015         |\n| `link_entity`       | Whether to link nouns to Wikidata entities. If set to true, the algorithm will only find relations for nouns linked to an entity. Setting to false will increase the number of results.                                                  | `False`       |\n| `model`             | Language model to use. Currently, BERT and Llama2 (MPT-7B) are supported.                                                                                                                                                                | `bert`        |\n| `contiguous_tokens` | Whether the relation tokens must be adjacent in the original sentence. If set to `True`, `(Joe, considers a friend, Anne)` will not be a triplet for \"Joe considers Anne a friend\", because the relation tokens are not adjacent.        
| `True`        |\n| `forward_tokens`    | Whether the relation tokens can follow only a left-to-right order. If set to `False`, `(Joe, nice friend, Anne)` will be a potential relation for \"Joe finds Anne nice and is her friend.\" Note that `nice` is on the left of `friend`.  | `True`        |\n| `frequency`         | How many times a relation should appear in the text corpus not to be eliminated by the algorithm.                                                                                                                                        | 1             |\n| `relation_length`   | How long the relation can be. If the number is higher, better results may arise, but the algorithm will take more time to converge.                                                                                                      | 8             |\n| `resolve_reference` | Resolve noun and pronoun references. For example, replace \"he\" and \"she\" in the triplet head or tail with the noun they refer to.                                                                                                        | `True`        |\n| `device`            | PyTorch device to execute the language model. If set to `None`, the model is executed on CPU.                                                                                                                                            | `None`        |\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flucasste%2Fhavina","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flucasste%2Fhavina","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flucasste%2Fhavina/lists"}