{"id":23632198,"url":"https://github.com/harmonydata/matching","last_synced_at":"2026-01-24T13:36:45.978Z","repository":{"id":227853246,"uuid":"772094287","full_name":"harmonydata/matching","owner":"harmonydata","description":null,"archived":false,"fork":false,"pushed_at":"2025-01-25T21:17:12.000Z","size":6853,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-01-25T21:21:45.574Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/harmonydata.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-03-14T14:24:04.000Z","updated_at":"2025-01-25T21:17:16.000Z","dependencies_parsed_at":"2024-04-25T14:57:02.029Z","dependency_job_id":"1ddb80fa-26f9-428c-8200-69b93aa04871","html_url":"https://github.com/harmonydata/matching","commit_stats":null,"previous_names":["harmonydata/matching"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/harmonydata%2Fmatching","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/harmonydata%2Fmatching/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/harmonydata%2Fmatching/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/harmonydata%2Fmatching/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/harmonydata","download_url":"https://codeload.github.com/harmonydata/matching/tar.gz/refs/heads/main",
"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239558913,"owners_count":19658927,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-28T03:27:59.522Z","updated_at":"2025-11-08T14:30:36.166Z","avatar_url":"https://github.com/harmonydata.png","language":"Jupyter Notebook","readme":"# Exploring ways to improve Harmony's matching\n\nI took the cosine vs. correlations dataset by McElroy et al. and tried changing Harmony's underlying model to see which one improves the R² value between Harmony's cosine similarity scores and the correlations in the data.\n\n1. The original Harmony model gives R² = 0.25\n2. OpenAI's Ada 2 model gives R² = 0.38\n3. OpenAI's Ada 3 model gives R² = 0.34\n\nThe scripts for evaluating different variants of Harmony against the cosine vs. correlations dataset and other datasets are in this repo.\n\nSo how can we improve the matching? I guess the easiest way would be to switch the underlying model rather than trying anything clever.\n\nI explored which kinds of questions led Harmony to deviate from the correlations in the data. For example, these two appear to correlate highly, but, as expected, Harmony gives them a low score:\n\n_Have had difficulty concentrating? =  Had feelings of worthlessness or guilt?_\n\nHere are the words that are most common in Harmony's false positives and false negatives. 
Harmony tends to rate questions as similar when they share words like \"sleep\", and it misses matches containing words like \"worthlessness\" and \"guilt\".\n\n```\nStrongest predictors for class 0 harmony false positive\n\n0       sleeping\n1       asleep\n2       falling\n3       staying\n4       upsetting\n5       clearly\n6       dreams\n7       part\n8       replay\n9       related\n10      alert\n11      watchful\n12      super\n13      guard\n14      moving\n15      come\n16      into\nStrongest predictors for class 1 harmony missed\n\n0       reading\n1       newspaper\n2       such\n3       television\n4       watching\n5       normal\n6       from\n7       less\n8       worthlessness\n9       guilt\n10      relaxing\n11      concentrating\n12      doing\n13      worried\n14      interest\n15      pleasure\n16      reduced\n```\n\nMaybe the solution is to add an option to select the model in the UI. The current model runs on Ulster's servers, whereas OpenAI's models would need an API key. We don't have many users, so it would probably cost only a few pounds a month, but I'm not sure how that fits with Wellcome's conditions.\n\n![mockup](mockup.png)\n\nHere are some of Harmony's false positives and false negatives.\n\n```\nCORRELATIONS WHICH HARMONY DIDN'T PREDICT (FALSE NEGATIVES)\n\nExperienced less interest or pleasure from normal activities for most of the day? \n        =  Felt hopeless?\nLittle interest or pleasure in doing things? \n        =  Feeling down, depressed, or hopeless?\nFelt down or depressed for most of the day? \n        =  Experienced less interest or pleasure from normal activities for most of the day?\nWorried a lot about different things? \n        =  Felt physically tense or agitated?\nHad feelings of worthlessness or guilt? \n        =  Felt hopeless?\nHave had difficulty concentrating? \n        =  Had feelings of worthlessness or guilt?\nExperienced less interest or pleasure from normal activities for most of the day? 
\n        =  Have had difficulty concentrating?\nHad difficulty concentrating? \n        =  Been easily annoyed by different things?\nFelt nervous or anxious? \n        =  Worried a lot about different things?\nFelt “on edge”? \n        =  Had difficulty concentrating?\n\nCORRELATIONS WHICH HARMONY INCORRECTLY PREDICTED (FALSE POSITIVES)\n\nFeeling tired or having little energy? \n        =  Having upsetting dreams that replay part of the experience or are clearly related to the experience? \nFeeling tired or having little energy? \n        =  Being so restless that it is hard to sit still?\nExperienced sleep disturbances? \n        =  Having upsetting dreams that replay part of the experience or are clearly related to the experience? \nTrouble falling or staying asleep, or sleeping too much? \n        =  Being so restless that it is hard to sit still?\nTrouble falling or staying asleep, or sleeping too much? \n        =  Having upsetting dreams that replay part of the experience or are clearly related to the experience? \nFeeling tired or having little energy? \n        =  Being “super-alert”, watchful, or on guard?\nHad recurrent thoughts of death or suicide? \n        =  Trouble falling or staying asleep, or sleeping too much?\nTrouble falling or staying asleep, or sleeping too much? \n        =  Moving or speaking so slowly that other people could have noticed? Or the opposite - being so fidgety or restless that you have been moving around a lot more than usual?\nTrouble falling or staying asleep, or sleeping too much? \n        =  Thoughts that you would be better off dead, or of hurting yourself in some way?\nTrouble falling or staying asleep, or sleeping too much? 
\n        =  Being “super-alert”, watchful, or on guard?\n\n```","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fharmonydata%2Fmatching","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fharmonydata%2Fmatching","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fharmonydata%2Fmatching/lists"}