{"id":17223301,"url":"https://github.com/slowkow/harmonypy","last_synced_at":"2026-04-24T22:01:21.451Z","repository":{"id":37978819,"uuid":"229105533","full_name":"slowkow/harmonypy","owner":"slowkow","description":"🎼 Integrate multiple high-dimensional datasets with fuzzy k-means and locally linear adjustments.","archived":false,"fork":false,"pushed_at":"2026-04-10T19:18:55.000Z","size":140681,"stargazers_count":263,"open_issues_count":2,"forks_count":27,"subscribers_count":4,"default_branch":"master","last_synced_at":"2026-04-10T20:26:31.267Z","etag":null,"topics":["bioinformatics","data-integration","data-science","single-cell-analysis"],"latest_commit_sha":null,"homepage":"https://portals.broadinstitute.org/harmony/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/slowkow.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2019-12-19T17:25:59.000Z","updated_at":"2026-04-04T16:25:47.000Z","dependencies_parsed_at":"2023-02-18T14:40:15.084Z","dependency_job_id":"0c2544e4-9b9d-4aaa-870f-98fa420bbd27","html_url":"https://github.com/slowkow/harmonypy","commit_stats":{"total_commits":64,"total_committers":4,"mean_commits":16.0,"dds":0.140625,"last_synced_commit":"182a5c6e0fc954cb0b4db5074e507adb7a8293f3"},"previous_names":[],"tags_count":9,"template":false,"template_full_name":null,"purl":"pkg:github/slowkow/harmonypy","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories
/slowkow%2Fharmonypy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/slowkow%2Fharmonypy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/slowkow%2Fharmonypy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/slowkow%2Fharmonypy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/slowkow","download_url":"https://codeload.github.com/slowkow/harmonypy/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/slowkow%2Fharmonypy/sbom","scorecard":{"id":832035,"data":{"date":"2025-08-11","repo":{"name":"github.com/slowkow/harmonypy","commit":"6daf80d794e65b0a4da2ccba8369b5eb4b3f5ca4"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":3.5,"checks":[{"name":"Code-Review","score":1,"reason":"Found 3/24 approved changesets -- score normalized to 1","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Dangerous-Workflow","score":10,"reason":"no dangerous workflow patterns detected","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Token-Permissions","score":0,"reason":"detected GitHub workflow tokens with excessive 
permissions","details":["Warn: no topLevel permission defined: .github/workflows/python-package.yml:1","Info: no jobLevel write permissions found"],"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Pinned-Dependencies","score":0,"reason":"dependency not pinned by hash detected -- score normalized to 0","details":["Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/python-package.yml:22: update your workflow using https://app.stepsecurity.io/secureworkflow/slowkow/harmonypy/python-package.yml/master?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/python-package.yml:24: update your workflow using 
https://app.stepsecurity.io/secureworkflow/slowkow/harmonypy/python-package.yml/master?enable=pin","Warn: pipCommand not pinned by hash: .github/workflows/python-package.yml:29","Warn: pipCommand not pinned by hash: .github/workflows/python-package.yml:30","Warn: pipCommand not pinned by hash: .github/workflows/python-package.yml:32","Info:   0 out of   2 GitHub-owned GitHubAction dependencies pinned","Info:   0 out of   3 pipCommand dependencies pinned"],"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: GNU General Public License v3.0: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release 
artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 9 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}}]},"last_synced_at":"2025-08-23T17:59:46.063Z","repository_id":37978819,"created_at":"2025-08-23T17:59:46.063Z","updated_at":"2025-08-23T17:59:46.063Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32242315,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-24T13:21:15.438Z","status":"ssl_error","status_checked_at":"2026-04-24T13:21:15.005Z","response_time":64,"last_error":"SSL_read: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics","data-integration","data-science","single-cell-analysis"],"created_at":"2024-10-15T04:08:03.274Z","updated_at":"2026-04-24T22:01:21.438Z","avatar_url":"https://github.com/slowkow.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# harmonypy\n\n[![PyPI][pb]][pypi] [![Downloads][db]][pypi] [![Tests][gb]][yml] [![DOI][zb]][zen]\n\n[pb]: https://img.shields.io/pypi/v/harmonypy.svg\n[pypi]: https://pypi.org/project/harmonypy/\n[db]: https://img.shields.io/pypi/dm/harmonypy?label=downloads\n[gb]: https://github.com/slowkow/harmonypy/actions/workflows/python-package.yml/badge.svg\n[yml]: https://github.com/slowkow/harmonypy/actions/workflows/python-package.yml\n[zb]: https://img.shields.io/badge/DOI-10.5281/zenodo.4531400-blue\n[zen]: https://doi.org/10.5281/zenodo.4531400\n\n**harmonypy** is a Python implementation of the [Harmony] algorithm for integrating multiple high-dimensional datasets. It uses a C++ backend (Armadillo) for fast linear algebra and matches the [R harmony2 package][Harmony] step by step.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://github.com/user-attachments/assets/018f82a7-ebb2-47a7-a340-dc9427c51b50\"\u003e\n\u003c/p\u003e\n\nThis animation shows Harmony aligning three single-cell RNA-seq datasets from different donors. [→ How to make this animation](https://slowkow.com/notes/harmony-animation/). Before Harmony, you can clearly distinguish cells from each of the three donors. 
After Harmony, the cells from different donors are mixed while preserving the overall shape of the data.\n\n\n## Installation\n\nInstall from PyPI (pre-built wheels for Linux and macOS):\n\n```bash\npip install harmonypy\n```\n\n### Building from source\n\nBuilding from source requires a C++ compiler, CMake, and a BLAS library:\n\n**macOS** (uses Apple Accelerate, no extra dependencies):\n\n```bash\npip install .\n```\n\n**Linux** (requires OpenBLAS):\n\n```bash\n# Debian/Ubuntu\nsudo apt install libopenblas-dev cmake\n\n# RHEL/Fedora\nsudo dnf install openblas-devel cmake\n\npip install .\n```\n\n\n## Quick Start\n\n```python\nimport harmonypy as hm\nimport pandas as pd\n\n# Load the principal components and metadata\npcs = pd.read_csv(\"data/pbmc_3500_pcs.tsv.gz\", sep=\"\\t\")\nmeta = pd.read_csv(\"data/pbmc_3500_meta.tsv.gz\", sep=\"\\t\")\n\n# Run Harmony to correct for batch effects (donor)\nharmony_out = hm.run_harmony(pcs, meta, \"donor\")\n\n# Save corrected PCs (same shape as input)\nresult = pd.DataFrame(harmony_out.Z_corr, columns=pcs.columns)\nresult.to_csv(\"pbmc_3500_pcs_harmony.tsv\", sep=\"\\t\", index=False)\n```\n\n\n## Usage with Scanpy\n\n```python\nimport scanpy as sc\nimport harmonypy as hm\n\n# Load and preprocess your data\nadata = sc.read_h5ad(\"my_data.h5ad\")\nsc.pp.pca(adata)\n\n# Get PCs from the AnnData object\npcs = adata.obsm['X_pca']\nprint(pcs.shape)  # (n_cells, n_pcs)\n\n# Run Harmony on the PCA embedding\nharmony_out = hm.run_harmony(pcs, adata.obs, \"batch\")\n\n# Store corrected PCs back in the AnnData object\nadata.obsm['X_pca_harmony'] = harmony_out.Z_corr\n\n# Use harmonized PCs for downstream analysis\nsc.pp.neighbors(adata, use_rep='X_pca_harmony')\nsc.tl.umap(adata)\nsc.tl.leiden(adata)\n```\n\n\n## Parameters\n\n`run_harmony` accepts the same parameters as the R package:\n\n| Parameter | Default | Description |\n|-----------|---------|-------------|\n| `theta` | 2 | Diversity penalty per batch variable |\n| `sigma` | 
0.1 | Kernel bandwidth for soft clustering |\n| `nclust` | min(N/30, 100) | Number of clusters (N = number of cells) |\n| `max_iter_harmony` | 10 | Maximum Harmony iterations |\n| `max_iter_kmeans` | 4 | K-means iterations per Harmony round |\n| `epsilon_harmony` | 1e-2 | Convergence threshold |\n| `ncores` | 0 | BLAS threads (0 = all cores) |\n| `lamb` | None | Ridge penalty (None = auto-estimate) |\n\nThe `ncores` parameter sets the number of BLAS threads (Accelerate on macOS, OpenBLAS on Linux). The default, 0, uses all available cores; set `ncores=1` for single-threaded execution.\n\n\n## Performance\n\nRunning `tests/test_harmony.py` on an Apple M1 (2022) chip reports the following run time and increase in resident set size (RSS delta) for each dataset:\n\n```\n  Dataset                    Time    RSS delta\n  ---------------------- -------- ------------\n  Small (3.5k cells)        0.23s     45.2 MB\n  Medium (69k cells)        4.76s    262.3 MB\n  Large (858k cells)       29.29s   1969.5 MB\n```\n\n\n## Citation\n\nIf you use Harmony in your work, please cite the original paper:\n\n\u003e Korsunsky, I., Millard, N., Fan, J. et al. **Fast, sensitive and accurate integration of single-cell data with Harmony.** *Nat Methods* 16, 1289–1296 (2019). https://doi.org/10.1038/s41592-019-0619-0\n\nThe [Supplementary Information PDF][supp] provides detailed mathematical descriptions and implementation notes.\n\nTo learn more about Harmony2, please see the preprint:\n\n\u003e Patikas, Nikolaos, Hongcheng Yao, Roopa Madhu, Soumya Raychaudhuri, Martin Hemberg, and Ilya Korsunsky. 2026. **Integration of Large, Complex Single-Cell Datasets with Harmony2.** *bioRxiv*. 
https://doi.org/10.64898/2026.03.16.711825\n\n[Harmony]: https://github.com/immunogenomics/harmony\n[supp]: https://static-content.springer.com/esm/art%3A10.1038%2Fs41592-019-0619-0/MediaObjects/41592_2019_619_MOESM1_ESM.pdf\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fslowkow%2Fharmonypy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fslowkow%2Fharmonypy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fslowkow%2Fharmonypy/lists"}