{"id":22039509,"url":"https://github.com/mpolinowski/local-linear-embedding","last_synced_at":"2026-05-08T15:49:18.035Z","repository":{"id":234831426,"uuid":"626287063","full_name":"mpolinowski/local-linear-embedding","owner":"mpolinowski","description":"Improve Data Quality by discarding non-correlating, noisy Dimensions","archived":false,"fork":false,"pushed_at":"2023-04-11T06:59:12.000Z","size":1610,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-01-28T19:17:55.283Z","etag":null,"topics":["locally-linear-embedding","pyplot","python","scikit-learn"],"latest_commit_sha":null,"homepage":"https://mpolinowski.github.io/docs/IoT-and-Machine-Learning/ML/2023-04-11-locally-linear-embedding/2023-04-11","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mpolinowski.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-04-11T06:59:05.000Z","updated_at":"2023-04-11T07:07:01.000Z","dependencies_parsed_at":null,"dependency_job_id":"bed4ff8a-5f13-4df1-8e80-582e26a670c2","html_url":"https://github.com/mpolinowski/local-linear-embedding","commit_stats":null,"previous_names":["mpolinowski/local-linear-embedding"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mpolinowski%2Flocal-linear-embedding","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mpolinowski%2Flocal-linear-embedding/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hos
ts/GitHub/repositories/mpolinowski%2Flocal-linear-embedding/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mpolinowski%2Flocal-linear-embedding/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mpolinowski","download_url":"https://codeload.github.com/mpolinowski/local-linear-embedding/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245104529,"owners_count":20561380,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["locally-linear-embedding","pyplot","python","scikit-learn"],"created_at":"2024-11-30T11:11:05.132Z","updated_at":"2026-05-08T15:49:13.006Z","avatar_url":"https://github.com/mpolinowski.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\njupyter:\n  jupytext:\n    formats: ipynb,md\n    text_representation:\n      extension: .md\n      format_name: markdown\n      format_version: '1.3'\n      jupytext_version: 1.14.4\n  kernelspec:\n    display_name: Python 3 (ipykernel)\n    language: python\n    name: python3\n---\n\n# Local Linear Embedding\n\n\u003e [An Introduction to Locally Linear Embedding](https://cs.nyu.edu/~roweis/lle/papers/lleintro.pdf): Many problems in information processing involve some form of dimensionality reduction. Here we describe locally linear embedding (LLE), an unsupervised learning algorithm that computes low dimensional, neighborhood preserving embeddings of high dimensional data. 
LLE attempts to discover nonlinear structure in high dimensional data by exploiting the local symmetries of linear reconstructions. Notably, LLE maps its inputs into a single global coordinate system of lower dimensionality, and its optimizations—though capable of generating highly nonlinear embeddings—do not involve local minima. We illustrate the method on images of lips used in audiovisual speech synthesis.\n\u003e `Lawrence K. Saul`, `Sam T. Roweis`\n\n```python\nimport matplotlib.pyplot as plt\nfrom mpl_toolkits.mplot3d import Axes3D\nimport pandas as pd\nimport plotly.express as px\nimport seaborn as sns\nfrom sklearn.preprocessing import MinMaxScaler\nfrom sklearn.manifold import LocallyLinearEmbedding\n```\n\n## Dataset\n\n_(see introduction in: [Principal Component Analysis PCA](https://mpolinowski.github.io/docs/IoT-and-Machine-Learning/ML/2023-04-09-principal-component-analysis/2023-04-09))_\n\n```python\nraw_data = pd.read_csv('data/A_multivariate_study_of_variation_in_two_species_of_rock_crab_of_genus_Leptograpsus.csv')\n\ndata = raw_data.rename(columns={\n    'sp' : 'Species',\n    'sex' : 'Sex',\n    'index' : 'Index',\n    'FL' : 'Frontal Lobe',\n    'RW' : 'Rear Width',\n    'CL' : 'Carapace Midline',\n    'CW' : 'Maximum Width',\n    'BD' : 'Body Depth'\n})\n\ndata['Species'] = data['Species'].map({'B':'Blue', 'O':'Orange'})\ndata['Sex'] = data['Sex'].map({'M':'Male', 'F':'Female'})\n\ndata.head(5)\n```\n\n|    | Species | Sex | Index | Frontal Lobe | Rear Width | Carapace Midline | Maximum Width | Body Depth |\n| -- | -- | -- | -- | -- | -- | -- | -- | -- |\n| 0 | Blue | Male | 1 | 8.1 | 6.7 | 16.1 | 19.0 | 7.0 |\n| 1 | Blue | Male | 2 | 8.8 | 7.7 | 18.1 | 20.8 | 7.4 |\n| 2 | Blue | Male | 3 | 9.2 | 7.8 | 19.0 | 22.4 | 7.7 |\n| 3 | Blue | Male | 4 | 9.6 | 7.9 | 20.1 | 23.1 | 8.2 |\n| 4 | Blue | Male | 5 | 9.8 | 8.0 | 20.3 | 23.0 | 8.2 |\n\n```python\n# generate a class variable for all 4 classes\ndata['Class'] = data.Species + 
data.Sex\n\nprint(data['Class'].value_counts())\ndata.head(1)\n```\n\n* BlueMale: `50`\n* BlueFemale: `50`\n* OrangeMale: `50`\n* OrangeFemale: `50`\n\n|    | Species | Sex | Index | Frontal Lobe | Rear Width | Carapace Midline | Maximum Width | Body Depth | Class |\n| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |\n| 0 | Blue | Male | 1 | 8.1 | 6.7 | 16.1 | 19.0 | 7.0 | BlueMale |\n\n```python\ndata_columns = ['Frontal Lobe', 'Rear Width', 'Carapace Midline', 'Maximum Width', 'Body Depth']\n```\n\n```python\n# normalizing each feature to the range [0, 1] to make them comparable\ndata_norm = data.copy()\ndata_norm[data_columns] = MinMaxScaler().fit_transform(data[data_columns])\n\ndata_norm.head()\n```\n\n|    | Species | Sex | Index | Frontal Lobe | Rear Width | Carapace Midline | Maximum Width | Body Depth | Class |\n| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |\n| 0 | Blue | Male | 1 | 0.056604 | 0.014599 | 0.042553 | 0.050667 | 0.058065 | BlueMale |\n| 1 | Blue | Male | 2 | 0.100629 | 0.087591 | 0.103343 | 0.098667 | 0.083871 | BlueMale |\n| 2 | Blue | Male | 3 | 0.125786 | 0.094891 | 0.130699 | 0.141333 | 0.103226 | BlueMale |\n| 3 | Blue | Male | 4 | 0.150943 | 0.102190 | 0.164134 | 0.160000 | 0.135484 | BlueMale |\n| 4 | Blue | Male | 5 | 0.163522 | 0.109489 | 0.170213 | 0.157333 | 0.135484 | BlueMale |\n\n\n## Dimensionality Reduction\n\nThe standard [LLE algorithm](https://scikit-learn.org/stable/modules/manifold.html#locally-linear-embedding) has the following stages:\n\n* __Nearest Neighbors Search__: For each sample, the `n_neighbors` closest samples in the original feature space are identified.\n* __Weight Matrix Construction__: Each sample is expressed as a weighted linear combination of its neighbors; the weight matrix stores these reconstruction weights.\n* __Partial Eigenvalue Decomposition__: The lower dimensional embedding is computed from the weight matrix so that each sample is still reconstructed from its neighbors with the same weights.\n\n```python\n# the data set has 5 feature columns\n# to reduce dimensionality we keep 3 components and discard 2\nno_components = 3\nno_neighbors = 15\nlle = LocallyLinearEmbedding(n_components = 
no_components, n_neighbors = no_neighbors)\n\ndata_lle = lle.fit_transform(data_norm[data_columns])\n\n# Note that the reconstruction error increases when adding dimensions\nprint('Reconstruction Error: ', lle.reconstruction_error_)\n# with no_components=3 I get:\n# Reconstruction Error:  1.5214133597467682e-05\n# with no_components=2:\n# Reconstruction Error:  2.1530288023162284e-06\n\n# data_lle contains 1 column for each component\n# we can add them to our normalized data set\ndata_norm[['LLE1', 'LLE2', 'LLE3']] = data_lle\n\ndata_norm.head()\n```\n\n|    | Species | Sex | Index | Frontal Lobe | Rear Width | Carapace Midline | Maximum Width | Body Depth | Class | LLE1 | LLE2 | LLE3 |\n| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |\n| 0 | Blue | Male | 1 | 0.056604 | 0.014599 | 0.042553 | 0.050667 | 0.058065 | BlueMale | -0.145449 | 0.060973 | 0.092920 |\n| 1 | Blue | Male | 2 | 0.100629 | 0.087591 | 0.103343 | 0.098667 | 0.083871 | BlueMale | -0.133111 | 0.057664 | 0.059493 |\n| 2 | Blue | Male | 3 | 0.125786 | 0.094891 | 0.130699 | 0.141333 | 0.103226 | BlueMale | -0.126506 | 0.053316 | 0.053484 |\n| 3 | Blue | Male | 4 | 0.150943 | 0.102190 | 0.164134 | 0.160000 | 0.135484 | BlueMale | -0.118650 | 0.028331 | 0.059578 |\n| 4 | Blue | Male | 5 | 0.163522 | 0.109489 | 0.170213 | 0.157333 | 0.135484 | BlueMale | -0.117088 | 0.022013 | 0.060005 |\n\n\n### 2D Plot\n\n```python\nfig = plt.figure(figsize=(10, 8))\nsns.scatterplot(data=data_norm, x='LLE1', y='LLE2', hue='Class')\n```\n\nThe 2D projection already lets us distinguish between the two species, Orange and Blue:\n\n![Local Linear Embedding](https://github.com/mpolinowski/local-linear-embedding/blob/master/assets/Local_Linear_Embedding_01.png)\n\n\n### 3D Plot\n\n```python\nclass_colours = {\n    'BlueMale': '#0027c4', #blue\n    'BlueFemale': '#f18b0a', #orange\n    'OrangeMale': '#0af10a', # green\n    'OrangeFemale': '#ff1500', #red\n}\n\ncolours = data['Class'].apply(lambda x: 
class_colours[x])\n\nx=data_norm.LLE1\ny=data_norm.LLE2\nz=data_norm.LLE3\n\nfig = plt.figure(figsize=(10,10))\nax = fig.add_subplot(projection='3d')\n\nax.scatter(xs=x, ys=y, zs=z, s=50, c=colours)\n```\n\n![Local Linear Embedding](https://github.com/mpolinowski/local-linear-embedding/blob/master/assets/Local_Linear_Embedding_02.png)\n\n```python\nplot = px.scatter_3d(\n    data_norm,\n    x = 'LLE1',\n    y = 'LLE2',\n    z='LLE3',\n    color='Class')\n\nplot.show()\n```\n\n![Local Linear Embedding](https://github.com/mpolinowski/local-linear-embedding/blob/master/assets/Local_Linear_Embedding_03.png)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmpolinowski%2Flocal-linear-embedding","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmpolinowski%2Flocal-linear-embedding","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmpolinowski%2Flocal-linear-embedding/lists"}