{"id":22622859,"url":"https://github.com/msrocka/autoprox","last_synced_at":"2025-08-24T22:06:23.858Z","repository":{"id":81680444,"uuid":"221247544","full_name":"msrocka/autoprox","owner":"msrocka","description":"generates bridge processes in openLCA","archived":false,"fork":false,"pushed_at":"2019-11-14T07:22:44.000Z","size":149,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-02-03T12:50:50.581Z","etag":null,"topics":["life-cycle-assessment","openlca","tooling"],"latest_commit_sha":null,"homepage":"","language":"Kotlin","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/msrocka.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2019-11-12T15:16:15.000Z","updated_at":"2022-01-10T09:33:30.000Z","dependencies_parsed_at":"2023-04-28T14:48:35.289Z","dependency_job_id":null,"html_url":"https://github.com/msrocka/autoprox","commit_stats":{"total_commits":19,"total_committers":1,"mean_commits":19.0,"dds":0.0,"last_synced_commit":"c4268998b8456911b62a881ac729bacc2227b2f7"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/msrocka%2Fautoprox","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/msrocka%2Fautoprox/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/msrocka%2Fautoprox/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/msrocka%2Fautoprox/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/msrocka","download_url":"htt
ps://codeload.github.com/msrocka/autoprox/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246131242,"owners_count":20728299,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["life-cycle-assessment","openlca","tooling"],"created_at":"2024-12-08T23:19:04.795Z","updated_at":"2025-03-29T02:43:06.445Z","avatar_url":"https://github.com/msrocka.png","language":"Kotlin","funding_links":[],"categories":[],"sub_categories":[],"readme":"# autoprox\n`autoprox` automatically generates bridge processes as described in\n[Ingwersen et al. 2018](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6463304/)\ndirectly in [openLCA](https://github.com/GreenDelta/olca-app). \nFor a process `p` in a database with a set of background processes `Q`,\n`autoprox` generates a set of bridge processes `B` that connect the product\ninputs and waste outputs of `p` with corresponding product outputs and waste\ninputs provided by the processes in `Q`. 
This is done by a\n[Generator](src/main/kotlin/autoprox/Generator.kt) that takes the ID of the\nprocess `p` and a [Matcher](src/main/kotlin/autoprox/Matcher.kt) `M` as input.\nFor a product input or waste output `fp` of `p` that does not yet have a\nprovider process in `Q`, the matcher `M` generates a set of flow-score pairs\nfor the product outputs and waste inputs `fq` with a provider process in `Q`:\n\n```\nM: fp -\u003e {(fq, sq) | fq in Q, sq in [0, 1]} \n```\n\nThe generator then selects the top matching flows of `fq` with the following\nrule, where `epsilon` can be configured:\n\n```\nabs(1.0 - (sq_i / max(sq))) \u003c= epsilon\n```\n\nA bridge process `b` is then generated that has a corresponding exchange for\neach of these matching product outputs or waste inputs. The quantitative\nreference of `b` is set to one unit of `fp` and the amount of a matching flow\n`fq_i` is set to:\n\n```\nsq_i^2 / (sum(sq) * max(sq))\n```\n\nCurrently, only flows that have the same reference flow property as `fp` are\nselected, so that every amount in `b` has the same unit. The name of `b` is\nset to the name of the reference flow with a `_bridge:` prefix and all processes\nof `B` are stored in the `_bridge` category so that it is easy to identify\n(and delete) them:\n\n![](images/the_bridge_category.png)\n\nFor `p`, it should then be possible to create a product system that uses the\ngenerated bridge processes `B` to connect `p` with `Q`:\n\n![](images/product_system_of_p.png)\n\n## Implemented matchers\n\n### The `BigramsDiceMatcher`\nThis matcher extracts the [bigrams](https://en.wikipedia.org/wiki/Bigram) from\nthe words of the names of the flows that are compared and computes the\n[Sørensen–Dice coefficient](https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient)\nof these sets of bigrams. 
It is fast and simple and gives good results for\nflow names that are relatively specific:\n\n![](images/asphalt_dice.png)\n\nHowever, flow names in LCA databases often contain terms like `at plant` or\n`production mix` that lead to imprecise results when this matcher is used\nwithout a filter:\n\n![](images/concrete_dice.png)\n\n\n### The `InfoContentMatcher`\nThe `InfoContentMatcher` computes the information content `I(w)` of a word `w`\nas:\n\n```\nI(w) = |w| * e^(-alpha * freq(w))\n```\n\n`|w|` is the number of characters of `w` and `freq(w)` is the absolute frequency of\n`w` in the flow names of `fq`. With this, long words that are less frequent get\na higher weight than terms like `at plant` when calculating the similarity\nbetween two flow names. This fixes the `concrete` example above:\n\n![](images/concrete_info.png)\n\nHowever, words that have a high information content can describe completely\ndifferent things:\n\n![](images/shaker_screen_info.png)\n\n### The `WordNetPathMatcher`\nThis matcher calculates the similarities between flow names based on the\ninformation content of the contained words as described above and a semantic\nsimilarity score that is calculated as the path distance between two words\nin [WordNet](https://wordnet.princeton.edu). It uses the\n[WS4j](https://code.google.com/archive/p/ws4j) API to calculate this\ndistance. The WordNet database that comes with WS4j may be somewhat\noutdated. Also, technical terms that are common in LCA databases are often not\npresent in WordNet. This is why this matcher currently does not give much\nbetter results than the `InfoContentMatcher`. However, combining lexical\nmatching, corpus statistics, and semantic similarities could in principle\ngive good results (see e.g. [this paper](https://arxiv.org/pdf/1802.05667.pdf)). \n\n\n## Running / building from source\nThe easiest way to run this project is to load it into a current version\nof [IntelliJ IDEA](https://www.jetbrains.com/idea/) (e.g. 
the open source\ncommunity version). Adapt the process ID of `p` and the database path\nin the [main function](src/main/kotlin/autoprox/Main.kt) and run it. In order\nto use the `WordNetPathMatcher` you need to set up the WS4j database as described\nbelow.\n\n### WS4j\nWS4j is an archived Google Code project that is a bit complicated to set up (see\nbelow) and is compatible with a relatively old version of WordNet. An alternative\ncould be [JWI](http://projects.csail.mit.edu/jwi/), which supports loading\na current WordNet database from a folder (just download and extract the\n[WordNet database files](http://wordnetcode.princeton.edu/3.0/WNdb-3.0.tar.gz)\nto that folder):\n\n```kotlin\nimport edu.mit.jwi.RAMDictionary\nimport edu.mit.jwi.data.ILoadPolicy\nimport edu.mit.jwi.item.POS\nimport java.io.File\n\n// folder with the extracted WordNet database files\nval wordNetPath = \"C:/Users/ms/Downloads/WNdb-3.0/dict\"\nval dict = RAMDictionary(File(wordNetPath), ILoadPolicy.NO_LOAD)\ndict.open()\n// look up the noun and take its first sense, if there is one\nval idxWord = dict.getIndexWord(\"asphalt\", POS.NOUN)\nif (idxWord != null) {\n    val word = dict.getWord(idxWord.wordIDs[0])\n    val relSynsets = word.synset.relatedSynsets\n    ...\n}\n```\n\nHowever, WS4j provides a lot of features and\n[algorithms](http://ws4jdemo.appspot.com) that can be used easily, while JWI\nprovides a lower-level API (but with a nice\n[tutorial](http://projects.csail.mit.edu/jwi/download.php?f=edu.mit.jwi_2.4.0_manual.pdf)).\n\nWS4j is an archived project\non [Google Code](https://code.google.com/archive/p/ws4j) but there is also a\n[GitHub clone available](https://github.com/Sciss/ws4j) which seems to be the\nversion that is published in the Maven central repository. In order to run WS4j,\nyou need to put the configuration files\n[jawjaw.conf and similarity.conf](https://github.com/Sciss/ws4j/tree/master/config)\nand the database file `wnjpn.db` on the classpath. 
The `wnjpn.db` file can\nbe extracted from the distribution packages from the\n[WS4j Google Code download pages](https://code.google.com/archive/p/ws4j/downloads).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmsrocka%2Fautoprox","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmsrocka%2Fautoprox","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmsrocka%2Fautoprox/lists"}