{"id":13520315,"url":"https://github.com/jware-solutions/ggca","last_synced_at":"2025-03-31T16:31:12.167Z","repository":{"id":57633690,"uuid":"319796319","full_name":"jware-solutions/ggca","owner":"jware-solutions","description":"Blazing fast Gene/GEM Correlation Analysis for Rust and Python","archived":false,"fork":false,"pushed_at":"2024-09-12T15:02:58.000Z","size":12154,"stargazers_count":8,"open_issues_count":0,"forks_count":2,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-09-13T03:25:24.951Z","etag":null,"topics":["bioinformatic","correlation","correlation-analysis","gene-expression","gene-expression-modulation","python","rust"],"latest_commit_sha":null,"homepage":"https://crates.io/crates/ggca","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jware-solutions.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-12-09T00:25:33.000Z","updated_at":"2024-09-12T15:03:10.000Z","dependencies_parsed_at":"2024-08-08T20:35:35.954Z","dependency_job_id":"d1073438-1fe4-48c3-9f04-b53e4b54b9d1","html_url":"https://github.com/jware-solutions/ggca","commit_stats":{"total_commits":104,"total_committers":1,"mean_commits":104.0,"dds":0.0,"last_synced_commit":"80660a54bab8f8b1b4ea59a45f3408ebf82bd0fd"},"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jware-solutions%2Fggca","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jware-solutions%2Fggca/tags","releases_url":"https://repos.eco
syste.ms/api/v1/hosts/GitHub/repositories/jware-solutions%2Fggca/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jware-solutions%2Fggca/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jware-solutions","download_url":"https://codeload.github.com/jware-solutions/ggca/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":222670691,"owners_count":17020513,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatic","correlation","correlation-analysis","gene-expression","gene-expression-modulation","python","rust"],"created_at":"2024-08-01T05:02:17.478Z","updated_at":"2024-11-02T03:31:14.357Z","avatar_url":"https://github.com/jware-solutions.png","language":"Rust","readme":"# Gene GEM Correlation Analysis (GGCA)\n\n[![CI](https://github.com/jware-solutions/ggca/actions/workflows/ci.yml/badge.svg)](https://github.com/jware-solutions/ggca/actions/workflows/ci.yml)\n\nEfficiently computes the correlation (Pearson, Spearman or Kendall) and the two-sided p-value for all pairs of elements across two datasets. It also supports [CpG Site IDs][cpg-site].\n\n**IMPORTANT**: GGCA is the heart of a platform called Multiomix. On the official website you will be able to use this library in a fast and agile way through a friendly graphical interface (along with many extra features!). 
Go to https://multiomix.org/ to get started now!\n\n[Python PyPI][pypi-site] | [Rust Crate][crate-site]\n\n\n## Index\n\n- [Usage](#usage)\n\t- [Python](#python)\n\t- [Rust](#rust)\n- [Development and contributions](#development-and-contributions)\n- [Troubleshooting](#troubleshooting)\n- [Considerations](#considerations)\n\n\n## Usage\n\nThere are a few examples for both languages in the `examples` folder.\n\n\n### Python\n\n1. Install: `pip install ggca`\n1. Configure and call the `correlate` function:\n\n```python\nimport ggca\n\n\nmrna_file_path = \"mrna.csv\"\ngem_file_path = \"mirna.csv\"\n\ntry:\n\t(result_combinations, evaluated_combinations) = ggca.correlate(\n\t\tmrna_file_path,\n\t\tgem_file_path,\n\t\tcorrelation_method=ggca.CorrelationMethod.Pearson,\n\t\tcorrelation_threshold=0.5,\n\t\tsort_buf_size=2_000_000,\n\t\tadjustment_method=ggca.AdjustmentMethod.BenjaminiHochberg,\n\t\tall_vs_all=True,\n\t\tgem_contains_cpg=False,\n\t\tcollect_gem_dataset=None,\n\t\tkeep_top_n=2  # Keeps only the top 2 combinations\n\t)\n\n\tprint(f'Number of resulting combinations: {len(result_combinations)} of {evaluated_combinations} evaluated combinations')\n\tfor combination in result_combinations:\n\t\tprint(\n\t\t\tcombination.gene,\n\t\t\tcombination.gem,\n\t\t\tcombination.correlation,\n\t\t\tcombination.p_value,\n\t\t\tcombination.adjusted_p_value\n\t\t)\nexcept ggca.GGCADiffSamplesLength as ex:\n\tprint('Raised GGCADiffSamplesLength:', ex)\nexcept ggca.GGCADiffSamples as ex:\n\tprint('Raised GGCADiffSamples:', ex)\nexcept ggca.InvalidCorrelationMethod as ex:\n\tprint('Raised InvalidCorrelationMethod:', ex)\nexcept ggca.InvalidAdjustmentMethod as ex:\n\tprint('Raised InvalidAdjustmentMethod:', ex)\nexcept ggca.GGCAError as ex:\n\tprint('Raised GGCAError:', ex)\n```\n\n\n### Rust\n\n1. Add the crate to your `Cargo.toml`: `ggca = { version = \"1.0.1\", default-features = false }`\n1. 
Create an analysis and run it:\n\n```rust\nuse ggca::adjustment::AdjustmentMethod;\nuse ggca::analysis::Analysis;\nuse ggca::correlation::CorrelationMethod;\n\n// File paths\nlet df1_path = \"mrna.csv\";\nlet df2_path = \"mirna.csv\";\n\n// Some parameters\nlet gem_contains_cpg = false;\nlet is_all_vs_all = true;\nlet keep_top_n = Some(10); // Keeps the top 10 correlations (sorted by absolute value)\nlet collect_gem_dataset = None; // Better performance. Keep small GEM files in memory\n\nlet analysis = Analysis::new_from_files(df1_path.to_string(), df2_path.to_string(), gem_contains_cpg);\nlet (result, number_of_elements_evaluated) = analysis.compute(\n\tCorrelationMethod::Pearson,\n\t0.7, // Correlation threshold\n\t2_000_000, // Sort buffer size\n\tAdjustmentMethod::BenjaminiHochberg,\n\tis_all_vs_all,\n\tcollect_gem_dataset,\n\tkeep_top_n,\n)?;\n\nprintln!(\"Number of elements -\u003e {} of {} combinations evaluated\", result.len(), number_of_elements_evaluated);\n\nfor cor_p_value in result.iter() {\n\tprintln!(\"{}\", cor_p_value);\n}\n```\n\nNote that the [env_logger][env-logger] crate is used to emit warnings when some mRNA/GEM combinations produce NaN values (for instance, because an input array has a standard deviation of 0). To print these warnings to stderr, prefix your command with `RUST_LOG=warn`. E.g.:\n\n`RUST_LOG=warn cargo test --tests`\n\nor\n\n`RUST_LOG=warn cargo run --example basic`\n\n\n### Development and contributions\n\nAll kinds of help are welcome! 
Feel free to submit an issue or a PR.\n\n- Build for Rust: `cargo build [--release]`, or run an example from the `examples` folder with `cargo run --example [name of the example]`.\n- Build and run in Python: run `cargo build [--release]` and follow [the official instructions][pyo3-python-import] to import the compiled library in your Python script.\n- Build for Python: the Python packages are built with Maturin and generated by the CI through [maturin-action][maturin-actions].\n\n\n### Tests\n\nAll expected correlations, p-values and adjusted p-values used in the tests were obtained with the [cor.test][r-cor-test] and [p.adjust][r-p-adjust] functions of the R programming language and the [statsmodels][statsmodels] package for Python.\n\nData in the `small_files` folder was obtained by random sampling from the *Colorectal Adenocarcinoma (TCGA, Nature 2012)* dataset, which can be downloaded from the [cBioPortal datasets page][cbioportal-datasets-page] or via [this direct link][colorectal-dataset].\n\nAll correlation results were also compared directly with the output of R-Multiomics (the old version of [multiomix.org][multiomix], available only for R).\n\n\n### Performance\n\nWe use [criterion.rs][criterion] to run benchmarks. If you have made a contribution, you can check that it did not introduce a performance regression. 
Just run `cargo bench` before and after your changes to perform a statistical analysis of performance.\n\n\n## Troubleshooting\n\n### Undefined References During Compilation (Ubuntu)\n\nIf you encounter errors related to undefined references when compiling the project on Ubuntu, such as:\n\n```\nundefined reference to `_Py_Dealloc`\nundefined reference to `PyGILState_Release`\nundefined reference to `PyUnicode_AsUTF8AndSize`\n```\n\nOr linking errors like:\n\n\u003e error: linking with `cc` failed: exit status: 1\\\n\u003e ...\\\n\u003e = note: some `extern` functions couldn't be found; some native libraries may need to be installed or have their path specified\\\n\u003e = note: use the `-l` flag to specify native libraries to link\\\n\u003e = note: use the `cargo:rustc-link-lib` directive to specify the native libraries to link with Cargo (see https://doc.rust-lang.org/cargo/reference/build-scripts.html#cargorustc-link-libkindname)\n\nThis typically happens because the necessary Python development libraries are either not installed or cannot be found by the linker. The steps below resolve this issue.\n\n### Steps to Resolve\n\n1. **Install Python Development Libraries**:\n   Ensure that the Python development headers and libraries are installed. You can install them using the following command:\n   \n   ```bash\n   sudo apt-get install python3-dev\n   ```\n\n   This provides the files needed to link Rust with Python.\n\n2. **Ensure Correct Python Version**:\n   Make sure the Python version you build with matches the libraries you are linking against. You can check your Python version by running:\n\n   ```bash\n   python3 --version\n   ```\n\n   If you have multiple Python versions installed, ensure that your environment is set up to use the correct one. You can do this using virtual environments or by explicitly setting the `PYO3_PYTHON` (the interpreter PyO3 builds against) and `LD_LIBRARY_PATH` environment variables.\n\n3. 
**Set Up Correct Linker Flags**:\n   If you are still encountering issues, make sure the linker can find the necessary libraries. Declare the PyO3 dependency in your `Cargo.toml` and put the linker flags in `.cargo/config.toml` (Cargo reads the `[build]` table from its configuration files, not from the manifest):\n\n   ```toml\n   # Cargo.toml\n   [dependencies]\n   pyo3 = { version = \"0.15\", features = [\"extension-module\"] }\n\n   # .cargo/config.toml\n   [build]\n   rustflags = [\"-L\", \"/usr/lib/python3.x/config-3.x-x86_64-linux-gnu\", \"-lpython3.x\"]\n   ```\n\n   Replace `3.x` with the version of Python you're using, e.g., `3.8` for Python 3.8.\n\n   Alternatively, set the `RUSTFLAGS` environment variable before compiling:\n\n   ```bash\n   export RUSTFLAGS=\"-L /usr/lib/python3.x/config-3.x-x86_64-linux-gnu -lpython3.x\"\n   ```\n\n4. **Verify Library Paths**:\n   Ensure that the paths to the Python libraries are set correctly. You can export these variables before compiling:\n\n   ```bash\n   export LD_LIBRARY_PATH=/usr/lib/python3.x/config-3.x-x86_64-linux-gnu:$LD_LIBRARY_PATH\n   export LIBRARY_PATH=/usr/lib/python3.x/config-3.x-x86_64-linux-gnu:$LIBRARY_PATH\n   ```\n\n   This helps the linker (`LIBRARY_PATH`) and, at run time, the dynamic loader (`LD_LIBRARY_PATH`) locate the necessary Python libraries.\n\nBy following these steps, the undefined reference errors should be resolved and your compilation should complete successfully. 
If the issue persists, consider checking your system’s Python installation and ensuring all dependencies are properly installed.\n\n\n## Considerations\n\nIf you use any part of our code, or the tool itself is useful for your research, please consider citing:\n\n```\n@article{camele2022multiomix,\n  title={Multiomix: a cloud-based platform to infer cancer genomic and epigenomic events associated with gene expression modulation},\n  author={Camele, Genaro and Menazzi, Sebastian and Chanfreau, Hern{\\'a}n and Marraco, Agustin and Hasperu{\\'e}, Waldo and Butti, Matias D and Abba, Martin C},\n  journal={Bioinformatics},\n  volume={38},\n  number={3},\n  pages={866--868},\n  year={2022},\n  publisher={Oxford University Press}\n}\n```\n\n\n[pypi-site]: https://pypi.org/project/ggca/\n[crate-site]: https://crates.io/crates/ggca\n[cpg-site]: https://en.wikipedia.org/wiki/CpG_site\n[pyo3-issue]: https://github.com/PyO3/pyo3/issues/1084\n[r-cor-test]: https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/cor.test\n[r-p-adjust]: https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/p.adjust\n[statsmodels]: https://www.statsmodels.org/dev/generated/statsmodels.stats.multitest.multipletests.html\n[cbioportal-datasets-page]: https://www.cbioportal.org/datasets\n[colorectal-dataset]: https://cbioportal-datahub.s3.amazonaws.com/coadread_tcga_pub.tar.gz\n[multiomix]: https://www.multiomix.org\n[env-logger]: https://docs.rs/env_logger/latest/env_logger/\n[maturin-actions]: https://github.com/PyO3/maturin-action\n[criterion]: https://github.com/bheisler/criterion.rs\n[pyo3-python-import]: 
https://pyo3.rs/v0.22.1/building-and-distribution#manual-builds\n","funding_links":[],"categories":["Rust"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjware-solutions%2Fggca","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjware-solutions%2Fggca","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjware-solutions%2Fggca/lists"}