{"id":34097478,"url":"https://github.com/xinglab-ai/genomap","last_synced_at":"2026-03-08T21:38:22.957Z","repository":{"id":179264761,"uuid":"589035404","full_name":"xinglab-ai/genomap","owner":"xinglab-ai","description":"Cartography of Genomic Interactions Enables Deep Analysis of Single-Cell Expression Data (Nature Communications, 2023)","archived":false,"fork":false,"pushed_at":"2024-05-27T19:45:53.000Z","size":45821,"stargazers_count":19,"open_issues_count":0,"forks_count":5,"subscribers_count":3,"default_branch":"main","last_synced_at":"2026-03-05T20:18:23.770Z","etag":null,"topics":["bioinfomatics-pipeline","bioinformatics","bioinformatics-tool","biomarker-discovery","biomarkers","cell-annotation","classification-algorithm","data-classification","deep-learning","genomic-data-analysis","genomics","genomics-visualization","multi-omic-integration","regression-algorithms","single-cell","tabular-data","trajectory-inference","visualization"],"latest_commit_sha":null,"homepage":"https://www.nature.com/articles/s41467-023-36383-6","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/xinglab-ai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-01-14T21:05:28.000Z","updated_at":"2026-02-12T05:38:15.000Z","dependencies_parsed_at":null,"dependency_job_id":"589f3cf3-656b-42bf-95b7-5e805520666c","html_url":"https://github.com/xinglab-ai/genomap","commit_stats":{"total_commits":108,"total_committers":6,"mean_commits":18.0,"dds":0.6944444444444444,"last_synced_commit":"5bb146203f4622bdcdca9360b09ef17319af6a52"},"previ
ous_names":["xinglab-ai/genomap"],"tags_count":47,"template":false,"template_full_name":null,"purl":"pkg:github/xinglab-ai/genomap","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xinglab-ai%2Fgenomap","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xinglab-ai%2Fgenomap/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xinglab-ai%2Fgenomap/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xinglab-ai%2Fgenomap/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/xinglab-ai","download_url":"https://codeload.github.com/xinglab-ai/genomap/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xinglab-ai%2Fgenomap/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30274841,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-08T20:45:49.896Z","status":"ssl_error","status_checked_at":"2026-03-08T20:45:49.525Z","response_time":56,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinfomatics-pipeline","bioinformatics","bioinformatics-tool","biomarker-discovery","biomarkers","cell-annotation","classification-algorithm","data-classification","deep-learning","genomic-data-analysis","genomics","genomics-visualization","multi-omic-integration","regression-algorithms","single-cell","tabular-data","trajectory-inference","visualization"],"created_at":"2025-12-14T15:57:50.718Z","updated_at":"2026-03-08T21:38:22.951Z","avatar_url":"https://github.com/xinglab-ai.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Genomap creates images from gene expression data and offers high-performance dimensionality reduction and visualization, data clustering, classification and regression, automatic cell annotation, gene signature extraction, multi-omics data integration, and trajectory analysis\n\nGenomap is an entropy-based cartography strategy that converts high-dimensional gene expression data into a configured image format with explicit integration of genomic interactions. This cartography casts gene-gene interactions into a spatial configuration, enabling us to extract deep genomic-interaction features and discover the underlying discriminative patterns in the data. 
For a wide variety of applications (cell clustering and recognition, gene signature extraction, single-cell data integration, cellular trajectory analysis, dimensionality reduction, and visualization), genomap substantially improves the accuracy of data analysis compared with state-of-the-art techniques, as reported in our Nature Communications paper.\n\n## How to use genomap\n\nThe easiest way to get started with genomap is to install it from PyPI:\n\n```bash\npip install genomap\n```\n\nThe data should be in cell (row) x gene (column) format. Genomap construction needs only one parameter: the size of the genomap (its number of rows and columns). The row and column numbers can be any positive integers, so you can create square or rectangular genomaps. The number of genes in your dataset should be less than or equal to the number of pixels in the genomap (rows x columns); for example, a 33 x 33 genomap holds up to 1089 genes. Genomap construction is fast; you should get the genomaps within a few seconds. \n\nPlease run our Code Ocean capsules (https://codeocean.com/capsule/4321565/tree/v1 and https://codeocean.com/capsule/6967747/tree/v1) to reproduce the results with a single click. 
Please check the environment section of the Code Ocean capsules if you run into any issues with the packages.\n\n## Sample data\n\nTo run the example code below, you will need to download the data files from [here](https://drive.google.com/drive/folders/1xq3bBgVP0NCMD7bGTXit0qRkL8fbutZ6?usp=drive_link).\n\n## Example code\n\n### Example 1 - Construct genomaps\n\n```python\nimport pandas as pd # Please install pandas and matplotlib before you run this example\nimport matplotlib.pyplot as plt\nimport numpy as np\nimport scipy.stats\nimport genomap as gp\n\n# Load the data (cells x genes, comma-separated, no header row)\ndata = pd.read_csv('TM_data.csv', header=None)\ncolNum=33 # Column number of genomap\nrowNum=33 # Row number of genomap\n\ndataNorm=scipy.stats.zscore(data,axis=0,ddof=1) # Z-score normalization of each gene (column)\n\ngenoMaps=gp.construct_genomap(dataNorm,rowNum,colNum,epsilon=0.0,num_iter=200) # Construction of genomaps\n\nfindI=genoMaps[10,:,:,:] # Genomap of the cell at index 10\n\nplt.figure(1) # Plot the selected genomap\nplt.imshow(findI, origin = 'lower',  extent = [0, 10, 0, 10], aspect = 1)\nplt.title('Genomap of a cell from TM dataset')\nplt.show()\n```\n\n### Example 2 - Try genoVis for data visualization and clustering\n\n```python\nimport scipy.io as sio\nimport numpy as np\nimport pandas as pd\nimport genomap.genoVis as gp\nimport matplotlib.pyplot as plt\n\ndata = pd.read_csv('TM_data.csv', header=None)\ndata=data.values\ngt_data = sio.loadmat('GT_TM.mat')\ny = np.squeeze(gt_data['GT'])\nn_clusters = len(np.unique(y))\n\n\nresVis=gp.genoVis(data,n_clusters=n_clusters, colNum=33,rowNum=33)\n# If you don't know the number of classes in the data, use\n# resVis=gp.genoVis(data, colNum=32,rowNum=32)\n\nresVisEmb=resVis[0] # Visualization result\nclusIndex=resVis[1] # Clustering result\n\nplt.figure(figsize=(15, 10))\nplt.rcParams.update({'font.size': 28})    
\nh1=plt.scatter(resVisEmb[:, 0], resVisEmb[:, 1], c=y,cmap='jet', marker='o', s=18)\nplt.xlabel('genoVis1')\nplt.ylabel('genoVis2')\nplt.tight_layout()\nplt.colorbar(h1)\nplt.show()\n\n# Evaluate the clustering result against the ground-truth labels\nimport genomap.utils.metrics as metrics\nprint('acc=%.4f, nmi=%.4f, ari=%.4f' % (metrics.acc(y, clusIndex), metrics.nmi(y, clusIndex), metrics.ari(y, clusIndex)))\n```\n\n### Example 3 - Try genoDR for dimensionality reduction\n\n```python\nimport scipy.io as sio\nimport numpy as np\nimport genomap.genoDR as gp\nimport matplotlib.pyplot as plt\nimport umap.umap_ as umap\n\ndx = sio.loadmat('reducedData_divseq.mat')\ndata=dx['X']\ngt_data = sio.loadmat('GT_divseq.mat')\ny = np.squeeze(gt_data['GT'])\nn_clusters = len(np.unique(y))\n\nreduced_dim=32 # Number of reduced dimensions\nresDR=gp.genoDR(data, n_dim=reduced_dim, n_clusters=n_clusters, colNum=33,rowNum=33)\n# If you don't know the number of classes in the data, use\n# resDR=gp.genoDR(data, n_dim=reduced_dim, colNum=33,rowNum=33)\n\n# Project the reduced data to 2D with UMAP for visualization\nembedding2D = umap.UMAP(n_neighbors=30,min_dist=0.3,n_epochs=200).fit_transform(resDR)\n\nplt.figure(figsize=(15, 10))\nplt.rcParams.update({'font.size': 28})    \nh1=plt.scatter(embedding2D[:, 0], embedding2D[:, 1], c=y,cmap='jet', marker='o', s=18)\nplt.xlabel('UMAP1')\nplt.ylabel('UMAP2')\nplt.tight_layout()\nplt.colorbar(h1)\nplt.show()\n```\n\n### Example 4 - Try genoTraj for cell trajectory analysis\n\n```python\nimport scipy.io as sio\nimport numpy as np\nimport pandas as pd\nimport matplotlib.pyplot as plt\nfrom sklearn.decomposition import PCA\nimport phate\nimport genomap.genoTraj as gp\n\n# Load data\ndx = sio.loadmat('organoidData.mat')\ndata=dx['X3']\ngt_data = sio.loadmat('cellsPsudo.mat')\nY_time = np.squeeze(gt_data['newGT'])\n\n# Apply genoTraj to obtain an embedding showing cell trajectories\noutGenoTraj=gp.genoTraj(data)\n\nplt.figure(figsize=(15, 
10))\nplt.rcParams.update({'font.size': 28})    \nh1=plt.scatter(outGenoTraj[:, 0], outGenoTraj[:, 1], c=Y_time,cmap='jet', marker='o', s=18)\nplt.xlabel('genoTraj1')\nplt.ylabel('genoTraj2')\nplt.tight_layout()\nplt.colorbar(h1)\nplt.show()\n\n# Comparison with PHATE: reduce to 100 principal components, then apply PHATE\npca = PCA(n_components=100)\nresPCA=pca.fit_transform(data)\n\nphate_op = phate.PHATE()\nres_phate = phate_op.fit_transform(resPCA)\n\nplt.figure(figsize=(15, 10))\nplt.rcParams.update({'font.size': 28})    \nh1=plt.scatter(res_phate[:, 0], res_phate[:, 1], c=Y_time,cmap='jet', marker='o', s=18)\nplt.xlabel('PHATE1')\nplt.ylabel('PHATE2')\nplt.tight_layout()\nplt.colorbar(h1)\nplt.show()\n```\n\n### Example 5 - Try genoMOI for multi-omic data integration\n\n```python\nimport scanpy as sc\nimport matplotlib.pyplot as plt\nimport scipy.io as sio\nimport numpy as np\nimport pandas as pd\nimport genomap.genoMOI as gp\n\n# Load five different pancreatic datasets\ndx = sio.loadmat('dataBaronX.mat')\ndata=dx['dataBaron']\ndx = sio.loadmat('dataMuraroX.mat')\ndata2=dx['dataMuraro']\ndx = sio.loadmat('dataScapleX.mat')\ndata3=dx['dataScaple']\ndx = sio.loadmat('dataWangX.mat')\ndata4=dx['dataWang']\ndx = sio.loadmat('dataXinX.mat')\ndata5=dx['dataXin']\n\n# Load class and batch labels\ndx = sio.loadmat('classLabel.mat')\ny = np.squeeze(dx['classLabel'])\ndx = sio.loadmat('batchLabel.mat')\nybatch = np.squeeze(dx['batchLabel'])\n\n# Apply genomap-based multi-omic integration and visualize the integrated data with local structure for cluster analysis\n# Returns the 2D visualization, cluster labels, and integrated data\nresVis,cli,int_data=gp.genoMOIvis(data, data2, data3, data4, data5, colNum=12, rowNum=12, n_dim=32, epoch=10, prealign_method='scanorama')\n\n# Plot colored by cell class labels\nplt.figure(figsize=(15, 10))\nplt.rcParams.update({'font.size': 28})    \nh1=plt.scatter(resVis[:, 0], resVis[:, 1], 
c=y,cmap='jet', marker='o', s=18)\nplt.xlabel('genoVis1')\nplt.ylabel('genoVis2')\nplt.tight_layout()\nplt.colorbar(h1)\nplt.show()\n\n# Plot colored by batch labels\nplt.figure(figsize=(15, 10))\nplt.rcParams.update({'font.size': 28})    \nh1=plt.scatter(resVis[:, 0], resVis[:, 1], c=ybatch,cmap='jet', marker='o', s=18)\nplt.xlabel('genoVis1')\nplt.ylabel('genoVis2')\nplt.tight_layout()\nplt.colorbar(h1)\nplt.show()\n\n# Apply genomap-based multi-omic integration and visualize the integrated data with global structure for trajectory analysis\n# Returns the 2D embedding, cluster labels, and integrated data\nresTraj,cli,int_data=gp.genoMOItraj(data, data2, data3, data4, data5, colNum=12, rowNum=12, n_dim=32, epoch=10, prealign_method='scanorama')\n\n# Plot colored by cell class labels\nplt.figure(figsize=(15, 10))\nplt.rcParams.update({'font.size': 28})    \nh1=plt.scatter(resTraj[:, 0], resTraj[:, 1], c=y,cmap='jet', marker='o', s=18)\nplt.xlabel('genoTraj1')\nplt.ylabel('genoTraj2')\nplt.tight_layout()\nplt.colorbar(h1)\nplt.show()\n\n# Plot colored by batch labels\nplt.figure(figsize=(15, 10))\nplt.rcParams.update({'font.size': 28})    \nh1=plt.scatter(resTraj[:, 0], resTraj[:, 1], c=ybatch,cmap='jet', marker='o', s=18)\nplt.xlabel('genoTraj1')\nplt.ylabel('genoTraj2')\nplt.tight_layout()\nplt.colorbar(h1)\nplt.show()\n```\n\n### Example 6 - Try genoAnnotate for cell annotation\n\n```python\nimport scanpy as sc\nimport pandas as pd\nimport genomap.genoAnnotate as gp\nimport matplotlib.pyplot as plt\n# Load the PBMC dataset\nadata = sc.read_10x_mtx(\"./pbmc3k_filtered_gene_bc_matrices/\")\n\n# Input: adata: AnnData object containing the raw gene counts\n# tissue_type: e.g. 
Immune system, Pancreas, Liver, Eye, Kidney, Brain, Lung, Adrenal, Heart, Intestine, Muscle, Placenta, Spleen, Stomach, Thymus\n\nadataP=gp.genoAnnotate(adata,species=\"human\", tissue_type=\"Immune system\")\ncell_annotations=adataP.obs['cell_type'].values # NumPy array containing the\n# cell annotations\n# Compute t-SNE\nsc.tl.tsne(adataP)\n# Create a t-SNE plot colored by cell type labels\nsc.pl.tsne(adataP, color='cell_type')\n```\n\n### Example 7 - Try genoSig for finding gene signatures for cell/data classes\n\n```python\nimport numpy as np\nimport scipy.io as sio\nfrom genomap.utils.util_Sig import createGenomap_for_sig\nimport pandas as pd\nimport genomap.genoSig as gp\n\n# Load data\ndx = sio.loadmat('reducedData_divseq.mat')\ndata=dx['X']\n# Load data labels\nlabel = pd.read_csv('groundTruth_divseq.csv',header=None)\n# Load gene names corresponding to the columns of the data\n# Here we create placeholder gene names (Gene_1, Gene_2, ...); replace them with your own gene names\ngene_names = ['Gene_' + str(i) for i in range(1, data.shape[1]+1)]\ngene_names=np.array(gene_names)\n\n# The cell classes for which gene signatures will be computed\nuserPD = np.array(['DG'])\n\ncolNum=32 # genomap column number\nrowNum=32 # genomap row number\n# Create genomaps\ngenoMaps,gene_namesRe,T=createGenomap_for_sig(data,gene_names,rowNum,colNum)\n# Compute the gene signatures\nresult=gp.genoSig(genoMaps,T,label,userPD,gene_namesRe, epochs=50)\n\nprint(result.head())\n```\n\n### Example 8 - Try genoClassification for tabular data classification\n\n```python\nimport pandas as pd\nimport numpy as np\nimport scipy.io as sio\nimport genomap.genoClassification as gp\nfrom genomap.utils.util_genoClassReg import select_random_values\n\n# First, we load the TM data. 
Data should be in cells x genes format.\ndata = pd.read_csv('TM_data.csv', header=None)\n\n# Creation of genomaps\n# Selection of the row and column numbers of the genomaps\n# To create square genomaps, the row and column numbers are set to be the same.\ncolNum=33\nrowNum=33\n\n# Load ground-truth cell labels of the TM dataset\ngt_data = sio.loadmat('GT_TM.mat')\nGT = np.squeeze(gt_data['GT'])\nGT=GT-1 # to ensure the labels begin with 0 to conform with PyTorch\n\n# Randomly select 80% of the data for training and use the rest for testing\nindxTrain, indxTest= select_random_values(start=0, end=GT.shape[0], perc=0.8)\ngroundTruthTest = GT[indxTest-1]\n\ntraining_data=data.values[indxTrain-1]\ntraining_labels=GT[indxTrain-1]\ntest_data=data.values[indxTest-1]\n\nest=gp.genoClassification(training_data, training_labels, test_data, rowNum=rowNum, colNum=colNum, epoch=150)\n\nprint('Classification accuracy of genomap approach: '+str(np.sum(est==groundTruthTest) / est.shape[0]))\n```\n\n### Example 9 - Try genoRegression for tabular data regression\n\n```python\nimport pandas as pd\nimport numpy as np\nimport scipy.io as sio\nimport genomap.genoRegression as gp\nfrom sklearn.metrics import mean_squared_error\nfrom genomap.utils.util_genoClassReg import select_random_values\n\n# Load data and labels\ndx = sio.loadmat('organoidData.mat')\ndata=dx['X3']\ngt_data = sio.loadmat('GT_Org.mat')\nY_time = np.squeeze(gt_data['GT'])\nY_time = Y_time - 1 # to ensure the labels begin with 0 to conform with PyTorch\n\n# Randomly select 80% of the data for training and use the rest for testing\nindxTrain, indxTest= select_random_values(start=0, end=Y_time.shape[0], perc=0.8)\ngroundTruthTest = Y_time[indxTest-1]\ntraining_data=data[indxTrain-1]\ntraining_labels=Y_time[indxTrain-1]\ntest_data=data[indxTest-1]\n\n# Run genoRegression\nest=gp.genoRegression(training_data, training_labels, test_data, rowNum=40, colNum=40, epoch=200)\n\n# Calculate MSE\nmse = 
mean_squared_error(groundTruthTest, est)\nprint(f'MSE: {mse}')\n```\n\n# Citation\n\nIf you use the genomap code, please cite our Nature Communications paper: https://www.nature.com/articles/s41467-023-36383-6\n\nIslam, M.T., Xing, L. Cartography of Genomic Interactions Enables Deep Analysis of Single-Cell Expression Data. Nat Commun 14, 679 (2023). https://doi.org/10.1038/s41467-023-36383-6\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxinglab-ai%2Fgenomap","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fxinglab-ai%2Fgenomap","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxinglab-ai%2Fgenomap/lists"}