{"id":22039505,"url":"https://github.com/mpolinowski/multi-dimensional-scaling","last_synced_at":"2026-05-09T06:12:30.734Z","repository":{"id":234831429,"uuid":"626925038","full_name":"mpolinowski/multi-dimensional-scaling","owner":"mpolinowski","description":"Multidimensional Scaling is a family of statistical methods that focus on creating mappings of items based on distance.","archived":false,"fork":false,"pushed_at":"2023-04-12T12:43:03.000Z","size":468,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-08-19T08:50:55.628Z","etag":null,"topics":["matplotlib-pyplot","multi-dimensional-scaling","python","scikit-learn"],"latest_commit_sha":null,"homepage":"https://mpolinowski.github.io/docs/IoT-and-Machine-Learning/ML/2023-04-13-multi-dimensional-scaling/2023-04-13","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mpolinowski.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2023-04-12T12:39:53.000Z","updated_at":"2023-04-12T12:49:42.000Z","dependencies_parsed_at":"2024-04-21T02:03:15.946Z","dependency_job_id":"dad3da2e-05f4-4892-a5f1-0387a24922ce","html_url":"https://github.com/mpolinowski/multi-dimensional-scaling","commit_stats":null,"previous_names":["mpolinowski/multi-dimensional-scaling"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/mpolinowski/multi-dimensional-scaling","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mpolinowski%2Fmulti-dimensional-scaling","tags_url":"https://repos.ecosyste.ms/api/
v1/hosts/GitHub/repositories/mpolinowski%2Fmulti-dimensional-scaling/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mpolinowski%2Fmulti-dimensional-scaling/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mpolinowski%2Fmulti-dimensional-scaling/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mpolinowski","download_url":"https://codeload.github.com/mpolinowski/multi-dimensional-scaling/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mpolinowski%2Fmulti-dimensional-scaling/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32809149,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-08T08:22:46.396Z","status":"online","status_checked_at":"2026-05-09T02:00:06.633Z","response_time":123,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["matplotlib-pyplot","multi-dimensional-scaling","python","scikit-learn"],"created_at":"2024-11-30T11:11:04.611Z","updated_at":"2026-05-09T06:12:30.717Z","avatar_url":"https://github.com/mpolinowski.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\njupyter:\n  jupytext:\n    formats: ipynb,md\n    text_representation:\n      extension: .md\n      format_name: markdown\n      format_version: '1.3'\n      jupytext_version: 1.14.4\n  kernelspec:\n    
display_name: Python 3 (ipykernel)\n    language: python\n    name: python3\n---\n\n# Multidimensional Scaling (MDS)\n\nMultidimensional Scaling is a family of statistical methods that focus on creating mappings of items based on distance. It projects your dataset into a space of fewer dimensions while attempting to preserve the distances of your data points in the original space.\n\n```python\nimport matplotlib.pyplot as plt\nfrom mpl_toolkits.mplot3d import Axes3D\nimport pandas as pd\nimport seaborn as sns\nfrom sklearn.manifold import MDS\nfrom sklearn.preprocessing import MinMaxScaler\n```\n\n```python\nraw_data = pd.read_csv('data/A_multivariate_study_of_variation_in_two_species_of_rock_crab_of_genus_Leptograpsus.csv')\n\ndata = raw_data.rename(columns={\n    'sp': 'Species',\n    'sex': 'Sex',\n    'index': 'Index',\n    'FL': 'Frontal Lobe',\n    'RW': 'Rear Width',\n    'CL': 'Carapace Midline',\n    'CW': 'Maximum Width',\n    'BD': 'Body Depth'})\n\ndata['Species'] = data['Species'].map({'B':'Blue', 'O':'Orange'})\ndata['Sex'] = data['Sex'].map({'M':'Male', 'F':'Female'})\ndata['Class'] = data.Species + data.Sex\n\ndata_columns = ['Frontal Lobe',\n                'Rear Width',\n                'Carapace Midline',\n                'Maximum Width',\n                'Body Depth']\n\ndata.head()\n```\n\n```python\n# generate a class variable for all 4 classes\ndata['Class'] = data.Species + data.Sex\n\nprint(data['Class'].value_counts())\ndata.head(5)\n```\n\n|    | Species | Sex | Index | Frontal Lobe | Rear Width | Carapace Midline | Maximum Width | Body Depth | Class |\n| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |\n| 0 | Blue | Male | 1 | 8.1 | 6.7 | 16.1 | 19.0 | 7.0 | BlueMale |\n| 1 | Blue | Male | 2 | 8.8 | 7.7 | 18.1 | 20.8 | 7.4 | BlueMale |\n| 2 | Blue | Male | 3 | 9.2 | 7.8 | 19.0 | 22.4 | 7.7 | BlueMale |\n| 3 | Blue | Male | 4 | 9.6 | 7.9 | 20.1 | 23.1 | 8.2 | BlueMale |\n| 4 | Blue | Male | 5 | 9.8 | 8.0 | 20.3 | 23.0 | 8.2 | BlueMale 
|\n\n```python\n# normalize data columns\ndata_norm = data.copy()\ndata_norm[data_columns] = MinMaxScaler().fit_transform(data[data_columns])\n\ndata_norm.describe()\n```\n\n|  | Index | Frontal Lobe | Rear Width | Carapace Midline | Maximum Width | Body Depth |\n| -- | -- | -- | -- | -- | -- | -- |\n| count | 200.000000 | 200.000000 | 200.000000 | 200.000000 | 200.000000 | 200.000000 |\n| mean | 25.500000 | 0.527233 | 0.455365 | 0.529043 | 0.515053 | 0.511645 |\n| std | 14.467083 | 0.219832 | 0.187835 | 0.216382 | 0.209919 | 0.220953 |\n| min | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |\n| 25% | 13.000000 | 0.358491 | 0.328467 | 0.382219 | 0.384000 | 0.341935 |\n| 50% | 25.500000 | 0.525157 | 0.459854 | 0.528875 | 0.525333 | 0.503226 |\n| 75% | 38.000000 | 0.682390 | 0.569343 | 0.684650 | 0.664000 | 0.677419 |\n| max | 50.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |\n\n\n## 2-Dimensional Plot\n\n```python\nno_components = 2\nn_init = 15\nmetric = True\nn_stress='auto'\n\nmds = MDS(\n    n_components=no_components,\n    n_init=n_init, metric=metric,\n    normalized_stress=n_stress)\n\ndata_mds = mds.fit_transform(data_norm[data_columns])\n# stress_ is the final stress of the embedding\n# (a sum of squared distance errors), not an MSE\nprint('Stress: ', mds.stress_)\n# Stress:  3.886582480465905\n# the more components you add, the smaller\n# the stress becomes, meaning the embedding\n# preserves the original distances better\n\ndata_norm[['MDS1', 'MDS2']] = data_mds\ndata_norm.head(1)\n```\n\n|    | Species | Sex | Index | Frontal Lobe | Rear Width | Carapace Midline | Maximum Width | Body Depth | Class | MDS1 | MDS2 |\n| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |\n| 0 | Blue | Male | 1 | 0.056604 | 0.014599 | 0.042553 | 0.050667 | 0.058065 | BlueMale | -0.482199 | -0.917839 |\n\n```python\nfig = plt.figure(figsize=(10, 8))\nsns.scatterplot(x='MDS1', y='MDS2', hue='Class', data=data_norm)\n```\n\n![Multidimensional Scaling 
(MDS)](https://github.com/mpolinowski/multi-dimensional-scaling/blob/master/assets/Multi-Dimensional_Scaling_01.png)\n\n\n## 3-Dimensional Plot\n\n```python\nno_components = 3\nn_init = 15\nmetric = True\nn_stress='auto'\n\nmds = MDS(\n    n_components=no_components,\n    n_init=n_init, metric=metric,\n    normalized_stress=n_stress)\n\ndata_mds = mds.fit_transform(data_norm[data_columns])\nprint('Stress: ', mds.stress_)\n# Stress:  2.4601741009431457\n\ndata_norm[['MDS1', 'MDS2', 'MDS3']] = data_mds\ndata_norm.head(1)\n```\n\n|    | Species | Sex | Index | Frontal Lobe | Rear Width | Carapace Midline | Maximum Width | Body Depth | Class | MDS1 | MDS2 | MDS3 |\n| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |\n| 0 | Blue | Male | 1 | 0.056604 | 0.014599 | 0.042553 | 0.050667 | 0.058065 | BlueMale | -0.093961 | 0.80491 | 0.645809 |\n\n```python\nclass_colours = {\n    'BlueMale': '#0027c4', #blue\n    'BlueFemale': '#f18b0a', #orange\n    'OrangeMale': '#0af10a', # green\n    'OrangeFemale': '#ff1500', #red\n}\n\ncolours = data_norm['Class'].apply(lambda x: class_colours[x])\n\nx=data_norm.MDS1\ny=data_norm.MDS2\nz=data_norm.MDS3\n\nfig = plt.figure(figsize=(10,10))\n# create the 3D axes first, then set the title on them, so that\n# plt.title does not add an extra 2D axes underneath the plot\nax = fig.add_subplot(projection='3d')\nax.set_title('MDS Data Analysis')\n\nax.scatter(xs=x, ys=y, zs=z, s=50, c=colours)\n```\n\n![Multidimensional Scaling (MDS)](https://github.com/mpolinowski/multi-dimensional-scaling/blob/master/assets/Multi-Dimensional_Scaling_02.png)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmpolinowski%2Fmulti-dimensional-scaling","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmpolinowski%2Fmulti-dimensional-scaling","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmpolinowski%2Fmulti-dimensional-scaling/lists"}