{"id":29428679,"url":"https://github.com/urbslab/lcs-visualization-pipeline","last_synced_at":"2025-07-12T15:19:18.442Z","repository":{"id":39003958,"uuid":"271126592","full_name":"UrbsLab/LCS-Visualization-Pipeline","owner":"UrbsLab","description":"LCS Discovery and Visualization Environment (LCS-DIVE)","archived":false,"fork":false,"pushed_at":"2025-01-17T22:18:01.000Z","size":15023,"stargazers_count":4,"open_issues_count":0,"forks_count":2,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-07-10T23:16:56.115Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/UrbsLab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-06-09T22:55:42.000Z","updated_at":"2025-01-17T14:33:01.000Z","dependencies_parsed_at":"2024-12-30T13:37:21.378Z","dependency_job_id":null,"html_url":"https://github.com/UrbsLab/LCS-Visualization-Pipeline","commit_stats":{"total_commits":37,"total_committers":2,"mean_commits":18.5,"dds":"0.027027027027026973","last_synced_commit":"0a7455f14cd8a5c3300c4a84f5c164ce4f3fbcc7"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/UrbsLab/LCS-Visualization-Pipeline","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UrbsLab%2FLCS-Visualization-Pipeline","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UrbsLab%2FLCS-Visualization-Pipeline/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UrbsLab%2F
LCS-Visualization-Pipeline/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UrbsLab%2FLCS-Visualization-Pipeline/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/UrbsLab","download_url":"https://codeload.github.com/UrbsLab/LCS-Visualization-Pipeline/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UrbsLab%2FLCS-Visualization-Pipeline/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265009519,"owners_count":23697191,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-07-12T15:19:13.685Z","updated_at":"2025-07-12T15:19:18.436Z","avatar_url":"https://github.com/UrbsLab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# LCS Discovery and Visualization Environment (LCS-DIVE)\n\n## Installation\nLCS-DIVE is written in Python 3. First, clone or download this repository to your local machine. 
You will also need to install the Python packages that LCS-DIVE depends on:\n\n```sh\ngit clone https://github.com/UrbsLab/LCS-Visualization-Pipeline\ncd LCS-Visualization-Pipeline\npip install -r requirements.txt\n```\n\nThe **LCSDIVE** folder contains five scripts that can be run from the command line: **AnalysisPhase1.py**, **AnalysisPhase1_pretrained.py**, **AnalysisPhase1_fromstreamline.py**, **AnalysisPhase2.py**, and **NetworkVisualization.py**. Run all of them from within the LCSDIVE folder.\n\n## AnalysisPhase1.py\nThis script trains ExSTraCS on your dataset and is the first script to run on a new dataset. If you have already trained ExSTraCS elsewhere, use **AnalysisPhase1_fromstreamline.py** instead if the models come from a STREAMLINE run, or **AnalysisPhase1_pretrained.py** if they come from any other ExSTraCS run. It accepts the following command-line arguments:\n\n| Argument | Description | Default |\n| ---------- | --------------------  | ---------- |\n| --d | path to your dataset file (txt, csv, or gz) | MANDATORY |\n| --o | path to the output directory where LCS-DIVE writes its output files | MANDATORY |\n| --e | experiment name (alphanumeric) | MANDATORY |\n| --class | name of the outcome (class) column in the dataset | Class |\n| --inst | name of the instance (row) ID column in the dataset; omit if there is none | None |\n| --group | name of the group ID column in the dataset; omit if there is none | None |\n| --match | name of the match ID column used for matched CV; omit if there is none | None |\n| --cv | number of cross-validation (CV) partitions used in training | 3 |\n| --iter | number of ExSTraCS learning iterations | 16000 |\n| --N | maximum ExSTraCS rule population size (in micro-classifiers) | 1000 |\n| --nu | ExSTraCS nu hyperparameter (power that sets how strongly high rule accuracy is rewarded in fitness) | 1000 |\n| --at-method | feature tracking method | wh |\n| --rc | rule compaction method | None |\n| --random-state | random seed for reproducible results | None |\n| --fssample | skrebate feature selection sample size | 1000 |\n| --cluster | setting 
to run LCS-DIVE Phase 1 on a compute cluster | 0 for No Cluster, 1 for LSF, 2 for SLURM |\n| --m1 | cluster job soft memory limit (GB) | 2 |\n| --m2 | cluster job hard memory limit (GB) | 3 |\n\nIn most cases, only **--d**, **--o**, and **--e** need to be set; the remaining arguments can be left at their defaults. Cluster mode will not work until you adapt this script's submitClusterJob method to your own cluster (it is currently configured for a UPenn cluster), so either configure that method or set **--cluster** to 0 to run locally in serial. Running this phase on a compute cluster is highly recommended, as it speeds up training significantly.\n\n**Sample Run Command**:\n```sh\npython AnalysisPhase1.py --d ../Datasets/demodata.csv --o ../Outputs --e lcs --class Class --inst InstanceID --iter 200000 --N 500 --nu 10\n```\n\n## AnalysisPhase1_pretrained.py\nRun this script instead of **AnalysisPhase1.py** if you already have presplit train/test datasets **and** pretrained ExSTraCS models. It reorganizes those existing files into the same format as the outputs of **AnalysisPhase1.py**, so that **AnalysisPhase2.py** can run on them directly. 
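
Before running this script, it can help to confirm that the inputs follow the naming convention described under **--d** and **--m** below. The following is a minimal bash sketch and not part of LCS-DIVE: the function check_pretrained_inputs is hypothetical, and it assumes that CV in the file names stands for the 0-based partition index (as in the ExStraCS_0 example), that each model is a file or folder named exactly `ExStraCS_0`, `ExStraCS_1`, and so on, and that the paths contain no spaces.

```sh
#!/usr/bin/env bash
# Hypothetical pre-flight check for AnalysisPhase1_pretrained.py inputs (not part of LCS-DIVE).
# Assumptions: CV in the file names is the 0-based partition index; paths contain no spaces.
shopt -s nullglob  # a glob with no matches expands to an empty list

# Usage: check_pretrained_inputs DATA_DIR MODEL_DIR CV_COUNT
# Prints one line per missing item and returns 1 if anything is missing.
check_pretrained_inputs() {
  local data_dir=$1 model_dir=$2 cv=$3
  local i split status=0
  local -a matches
  for ((i = 0; i < cv; i++)); do
    # Each partition needs a *_<i>_Train.csv and a *_<i>_Test.csv file.
    for split in Train Test; do
      matches=($data_dir/*_${i}_${split}.csv)
      if ((${#matches[@]} == 0)); then
        echo missing: ${split} dataset for partition $i
        status=1
      fi
    done
    # Each partition also needs a pretrained model named ExStraCS_<i>.
    if [[ ! -e $model_dir/ExStraCS_$i ]]; then
      echo missing: model ExStraCS_$i
      status=1
    fi
  done
  return $status
}
```

For example, calling `check_pretrained_inputs ../Datasets ../Models 3` before the sample command prints nothing and succeeds when every partition has its train file, test file, and model.
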
**AnalysisPhase1_pretrained.py** accepts the following command-line arguments:\n\n| Argument | Description | Default |\n| ---------- | --------------------  | ---------- |\n| --d | path to the directory containing the presplit train/test datasets, whose file names end in **\\_CV_Train.csv** or **\\_CV_Test.csv** (e.g., dataset1_CV_Test.csv) | MANDATORY |\n| --m | path to the directory containing the pretrained ExSTraCS models, named ExStraCS_CV (e.g., ExStraCS_0) | MANDATORY |\n| --o | path to the output directory where LCS-DIVE writes its output files | MANDATORY |\n| --e | experiment name (alphanumeric) | MANDATORY |\n| --class | name of the outcome (class) column in the datasets | Class |\n| --inst | name of the instance (row) ID column in the datasets; omit if there is none | None |\n| --cv | number of cross-validation (CV) partitions used in training | 3 |\n| --random-state | random seed for reproducible results | None |\n| --cluster | setting to run LCS-DIVE Phase 1 on a compute cluster | 0 for No Cluster, 1 for LSF, 2 for SLURM |\n| --m1 | cluster job soft memory limit (GB) | 2 |\n| --m2 | cluster job hard memory limit (GB) | 3 |\n\nIn most cases, only **--d**, **--m**, **--o**, and **--e** need to be set. Also make sure that **--class**, **--inst**, and **--cv** are consistent with your datasets. Cluster mode will not work until you adapt this script's submitClusterJob method to your own cluster (it is currently configured for a UPenn cluster), so either configure that method or set **--cluster** to 0 to run locally in serial. Running this script on a compute cluster is rarely necessary, since it finishes quickly.\n\n**Sample Run Command**:\n```sh\npython AnalysisPhase1_pretrained.py --d ../Datasets --m ../Models --o ../Outputs --e test --inst InstanceID --cluster 0\n```\n\n## AnalysisPhase1_fromstreamline.py\nRun this script instead of **AnalysisPhase1.py** if you have already trained ExSTraCS models through STREAMLINE. 
It reorganizes the STREAMLINE outputs into the same format as the outputs of **AnalysisPhase1.py**, so that **AnalysisPhase2.py** can run on them directly. It accepts the following command-line arguments:\n\n| Argument | Description | Default |\n| ---------- | --------------------  | ---------- |\n| --s | path to your STREAMLINE output directory | MANDATORY |\n| --e | name of the STREAMLINE experiment | MANDATORY |\n| --d | name of the STREAMLINE dataset to run LCS-DIVE on | MANDATORY |\n| --o | path to your LCS-DIVE output directory | MANDATORY |\n| --cluster | setting to run LCS-DIVE Phase 1 on a compute cluster | 0 for No Cluster, 1 for LSF, 2 for SLURM |\n| --m1 | cluster job soft memory limit (GB) | 2 |\n| --m2 | cluster job hard memory limit (GB) | 3 |\n\nIn most cases, only **--s**, **--e**, **--d**, and **--o** need to be set. Cluster mode will not work until you adapt this script's submitClusterJob method to your own cluster (it is currently configured for a UPenn cluster), so either configure that method or set **--cluster** to 0 to run locally in serial. Running this script on a compute cluster is rarely necessary, since it finishes quickly.\n\n**Sample Run Command**:\n```sh\npython AnalysisPhase1_fromstreamline.py --s /home/bandheyh/STREAMLINE/lcs/ --e lcs --d demodata --o ../Outputs/ --cluster 0\n```\n\n## AnalysisPhase2.py\nThis script runs the visualization phase of LCS-DIVE (feature tracking visualization, rule population visualization, and network visualization). For a given experiment, it must be run after **AnalysisPhase1.py**, **AnalysisPhase1_pretrained.py**, or **AnalysisPhase1_fromstreamline.py**. 
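
Because Phase 2 must reuse the same **--o** and **--e** values as Phase 1, it can be convenient to drive both phases from one script. The following bash sketch is illustrative and not part of LCS-DIVE: run_lcs_dive is a hypothetical helper that assumes it is used from the LCSDIVE folder with paths that contain no spaces, and it runs both phases locally with **--cluster** 0. The PYTHON variable is also an assumption, added so that the commands can be dry-run.

```sh
#!/usr/bin/env bash
# Hypothetical helper (not part of LCS-DIVE): runs Phase 1 and then Phase 2 locally.
# Assumes it is used from the LCSDIVE folder and that the paths contain no spaces.

# Interpreter to call; set PYTHON=echo to print the commands instead of running them.
PYTHON=${PYTHON:-python}

# Usage: run_lcs_dive DATASET OUTPUT_DIR EXPERIMENT
run_lcs_dive() {
  local data=$1 out=$2 exp=$3
  # Phase 1: ExSTraCS training in serial (--cluster 0); stop here if it fails.
  $PYTHON AnalysisPhase1.py --d $data --o $out --e $exp --cluster 0 || return
  # Phase 2: visualizations, reusing --o and --e so they match Phase 1.
  $PYTHON AnalysisPhase2.py --o $out --e $exp --cluster 0
}
```

Once the function is defined (for example, by sourcing the file), `run_lcs_dive ../Datasets/demodata.csv ../Outputs lcs` runs both phases on the demo dataset with every other argument at its default; prefixing the call with `PYTHON=echo` prints the two commands instead of running them. Returning early when Phase 1 fails keeps Phase 2 from running on incomplete output.
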
**AnalysisPhase2.py** accepts the following command-line arguments:\n\n| Argument | Description | Default |\n| ---------- | --------------------  | ---------- |\n| --o | output directory; must match the one used in Phase 1 | MANDATORY |\n| --e | experiment name; must match the one used in Phase 1 | MANDATORY |\n| --rheight | height-to-width ratio of the rule population heatmaps | 1 |\n| --aheight | height-to-width ratio of the feature tracking heatmaps | 1 |\n| --cluster | setting to run LCS-DIVE Phase 2 on a compute cluster | 0 for No Cluster, 1 for LSF, 2 for SLURM |\n| --am1 | feature tracking cluster job soft memory limit (GB) | 2 |\n| --am2 | feature tracking cluster job hard memory limit (GB) | 3 |\n| --rm1 | rule population cluster job soft memory limit (GB) | 5 |\n| --rm2 | rule population cluster job hard memory limit (GB) | 6 |\n| --nm1 | network cluster job soft memory limit (GB) | 2 |\n| --nm2 | network cluster job hard memory limit (GB) | 3 |\n| --dorule | whether to run the rule population visualization; set to 0 to skip it when it is too computationally expensive | 1 |\n\nIn most cases, only **--o** and **--e** need to be set. Cluster mode will not work until you adapt this script's submitClusterJob method to your own cluster (it is currently configured for a UPenn cluster), so either configure that method or set **--cluster** to 0 to run locally in serial. Running this phase on a compute cluster is highly recommended, as it speeds things up significantly.\n\nOnce this phase finishes, the LCS-DIVE run is complete.\n\n**Sample Run Command**:\n```sh\npython AnalysisPhase2.py --o ../Outputs --e lcs\n```\n\n## NetworkVisualization.py\nThe default network diagram that LCS-DIVE generates for a dataset may not always look the way you want. This script opens a GUI in which you can drag nodes around, resize elements, and save the result again. For a given experiment, it must be run after **AnalysisPhase2.py**. 
It accepts the following command-line arguments:\n\n| Argument | Description | Default |\n| ---------- | --------------------  | ---------- |\n| --o | output directory; must match the one used in Phases 1 and 2 | MANDATORY |\n| --e | experiment name; must match the one used in Phases 1 and 2 | MANDATORY |\n| --nodep | Node Power: relative node size follows the power function occurrence^node_power; the larger this is, the faster less relevant nodes vanish | 3 |\n| --edgep | Edge Power: edge thickness follows the power function co-occurrence^edge_power; the larger this is, the faster less relevant edges vanish | 3 |\n| --nodes | Node Size: maximum node size | 50 |\n| --edges | Edge Size: maximum edge thickness | 30 |\n| --labelshow | show all node labels by default (leaving this off is useful when there are many features) | 0 |\n| --labels | Label Size: maximum node label size | 50 |\n| --from_save | open from a previously saved configuration file, if one exists | 1 |\n\n**Sample Run Command**:\n```sh\npython NetworkVisualization.py --o ../Outputs --e lcs\n```\n\nOnce this command is run, a GUI window opens. From there you can:\n1) Drag nodes around.\n2) Press **l** while hovering over a node to show or hide its label.\n3) Press X to close the window. Closing the window automatically saves your configuration and creates a new visualization. The next time you run the script, your saved configuration is loaded so that you can continue working (unless **--from_save** is 0).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Furbslab%2Flcs-visualization-pipeline","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Furbslab%2Flcs-visualization-pipeline","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Furbslab%2Flcs-visualization-pipeline/lists"}