{"id":23632214,"url":"https://github.com/harmonydata/harmony_original","last_synced_at":"2026-04-13T22:34:09.338Z","repository":{"id":174040219,"uuid":"550946721","full_name":"harmonydata/harmony_original","owner":"harmonydata","description":"The Harmony project","archived":false,"fork":false,"pushed_at":"2023-06-09T20:52:24.000Z","size":2906,"stargazers_count":1,"open_issues_count":3,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-05-19T00:38:10.228Z","etag":null,"topics":["ai","data-science","data-visualization","harmonisation","harmonization","machine-learning","mentalhealth","multilingual","multilingual-nlp","natural-language-processing","natural-language-understanding","naturallanguageprocessing","nlp","psychology","python","scikit-learn"],"latest_commit_sha":null,"homepage":"https://harmonydata.org/","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/harmonydata.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-10-13T15:26:57.000Z","updated_at":"2023-06-09T20:17:57.000Z","dependencies_parsed_at":null,"dependency_job_id":"22804649-43b1-4b85-9822-5c2d2b1f04cc","html_url":"https://github.com/harmonydata/harmony_original","commit_stats":null,"previous_names":["harmonydata/harmony_original"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/harmonydata/harmony_original","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/harmonydata%2Fharmony_original","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repo
sitories/harmonydata%2Fharmony_original/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/harmonydata%2Fharmony_original/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/harmonydata%2Fharmony_original/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/harmonydata","download_url":"https://codeload.github.com/harmonydata/harmony_original/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/harmonydata%2Fharmony_original/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31774073,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-13T20:17:16.280Z","status":"ssl_error","status_checked_at":"2026-04-13T20:17:08.216Z","response_time":93,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","data-science","data-visualization","harmonisation","harmonization","machine-learning","mentalhealth","multilingual","multilingual-nlp","natural-language-processing","natural-language-understanding","naturallanguageprocessing","nlp","psychology","python","scikit-learn"],"created_at":"2024-12-28T03:28:00.140Z","updated_at":"2026-04-13T22:34:09.320Z","avatar_url":"https://github.com/harmonydata.png","language":"Jupyter Notebook","readme":"# Harmony version 0.1.0\n\n\u003c!-- badges: start --\u003e\n![my 
badge](https://badgen.net/badge/Status/In%20Development/orange)\n\u003c!-- badges: end --\u003e\n\n*A second version of Harmony is in development as an API at https://github.com/harmonydata/harmony*\n\nHarmony is a data harmonisation project that uses Natural Language Processing to help researchers make better use of existing data from different studies by supporting the harmonisation of the measures and items those studies use. Harmony is a collaboration project between the University of Ulster, University College London, the Universidade Federal de Santa Maria in Brazil, and Fast Data Science Ltd.\n\nYou can read more at https://harmonydata.org.\n\nThere is a live demo at: https://app.harmonydata.org/\n\n![Screenshot](images/screenshot1.png)\n\nThis front end is based on the Dash Food Footprint demo: https://dash.gallery/dash-food-footprint/\n\nIt runs on the Dash interactive Python framework developed by [Plotly](https://plot.ly/).\n\nDeveloped by Thomas Wood / Fast Data Science\nthomas@fastdatascience.com\n\nThis tool is written in Python using the Dash front end library and the Java library Tika for reading PDFs. It runs on Linux, Mac, and Windows, and can be deployed as a web app using Docker.\n\n# How does Harmony work in layman's terms?\n\nHarmony compares questions from different instruments by converting them to a vector representation and calculating their similarity. You can read more at https://harmonydata.org/how-does-harmony-work/\n\n# FAIR data schema\n\nWe have defined a data schema in accordance with the [FAIR principles](https://harmonydata.org/fair-data/).\n\nQuestionnaires are represented within Harmony in a tabular format.\n\nThe file name is the unique identifier of a questionnaire, e.g. 
`GAD-7 English.csv`.\n\nFiles are tab-separated with the following columns:\n\n* Question No: Alphanumeric, the question ID from the original questionnaire.\n* Question: The text of the question.\n* Options: Any options or Likert scale such as \"very often\", \"more than usual\", etc.\n\n# Very quick guide to running the tool on your computer\n\n1. Install [Docker](https://docs.docker.com/get-docker/).\n2. Open a command line or Terminal window. Change folder to where you downloaded and unzipped the repository, and go to the folder `front_end`. Run the following commands:\n```\ndocker build -t harmony .\ndocker run harmony\n```\n3. Open your browser at `http://localhost:80`. You will see the web app running.\n\n# Deploying the tool to Azure using the Azure Command Line Interface via Azure Container Registry\n\nOn the command line, if you have installed the Azure CLI, log into both the Azure Portal and Azure Container Registry:\n\n```\naz login\naz acr login --name regprotocolsfds\n```\n\nIf the admin user is not yet enabled, you can use the command:\n```\naz acr update -n regprotocolsfds --admin-enabled true\n```\n\nRun this script:\n\n```\n./build_deploy.sh\n```\n\n## Developer's guide: Running the tool on your computer in Python and without using Docker\n\n### Architecture\n\n![Tool architecture](images/harmony_architecture.png)\n\n### Downloading PDF data\n\n`cd` into `data/raw_pdf` and run `download_raw_pdfs.sh`.\n\n### Installing requirements\n\nDownload and install Java if you don't have it already. 
Download and install Apache Tika and run it on your computer https://tika.apache.org/download.html\n\n```\njava -jar tika-server-standard-2.3.0.jar\n```\n\n(the version number of your Jar file name may differ.)\n\nInstall everything in `requirements.txt`:\n\n```\npip install -r requirements.txt\n```\n\n### Running the front end app locally\n\nGo into `front_end` and run\n\n```\npython application.py\n```\n\nYou can then open your browser at `localhost:8050` and you will see the tool.\n\n## Built With\n\n- [Dash](https://dash.plot.ly/) - Main server and interactive components\n- [Plotly Python](https://plot.ly/python/) - Used to create the interactive plots\n- [Docker](https://docs.docker.com/) - Used for deployment to the web\n- [Apache Tika](https://tika.apache.org/) - Used for parsing PDFs to text\n- [spaCy](https://spacy.io/) - Used for NLP analysis\n- [NLTK](https://www.nltk.org/) - Used for NLP analysis\n- [Scikit-Learn](https://scikit-learn.org/) - Used for machine learning\n\n## Licences of Third Party Software\n\n- Apache Tika: [Apache 2.0 License](https://tika.apache.org/license.html)\n- spaCy: [MIT License](https://github.com/explosion/spaCy/blob/master/LICENSE)\n- NLTK: [Apache 2.0 License](https://github.com/nltk/nltk/blob/develop/LICENSE.txt)\n- Scikit-Learn: [BSD 3-Clause](https://github.com/scikit-learn/scikit-learn/blob/main/COPYING)\n\n## References\n\n* Deploying a Dash webapp via Docker to Azure: https://medium.com/swlh/deploy-a-dash-application-in-azure-using-docker-ed46c4b9d2b2\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fharmonydata%2Fharmony_original","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fharmonydata%2Fharmony_original","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fharmonydata%2Fharmony_original/lists"}