{"id":13718966,"url":"https://github.com/gbuesing/pca","last_synced_at":"2025-05-07T10:34:10.485Z","repository":{"id":29673090,"uuid":"33215362","full_name":"gbuesing/pca","owner":"gbuesing","description":"Principal component analysis (PCA) in Ruby","archived":false,"fork":false,"pushed_at":"2017-01-05T19:06:44.000Z","size":480,"stargazers_count":26,"open_issues_count":1,"forks_count":9,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-04-27T02:47:30.531Z","etag":null,"topics":["pca","principal-component-analysis","ruby","rubyml"],"latest_commit_sha":null,"homepage":"","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gbuesing.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"MIT-LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-03-31T23:17:26.000Z","updated_at":"2024-08-10T18:01:44.000Z","dependencies_parsed_at":"2022-09-03T18:22:59.807Z","dependency_job_id":null,"html_url":"https://github.com/gbuesing/pca","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gbuesing%2Fpca","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gbuesing%2Fpca/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gbuesing%2Fpca/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gbuesing%2Fpca/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gbuesing","download_url":"https://codeload.github.com/gbuesing/pca/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252860
074,"owners_count":21815458,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["pca","principal-component-analysis","ruby","rubyml"],"created_at":"2024-08-03T01:00:40.093Z","updated_at":"2025-05-07T10:34:10.017Z","avatar_url":"https://github.com/gbuesing.png","language":"Ruby","readme":"# Principal Component Analysis (PCA)\n\n[Principal component analysis](http://setosa.io/ev/principal-component-analysis/) in Ruby. Uses [GSL](http://www.gnu.org/software/gsl/) for calculations.\n\nPCA can be used to map data to a lower dimensional space while minimizing information loss. \nIt's useful for data visualization, where you're limited to 2-D and 3-D plots.\n\nFor example, here's a plot of the 4-D iris flower dataset mapped to 2-D via PCA:\n\n![iris](https://raw.githubusercontent.com/gbuesing/pca/master/examples/data/iris_small.png)\n\nPCA is also used to compress the features of a dataset before feeding it into a machine learning algorithm,\npotentially speeding up training time with a minimal loss of data detail.\n\n\n## Install\n\n**GSL must be installed first**. 
On OS X it can be installed via homebrew: ```brew install gsl```\n\n    gem install pca\n\n\n## Example Usage\n\n```ruby\nrequire 'pca'\n\npca = PCA.new components: 1\n\ndata_2d = [ \n  [2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],\n  [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]\n]\n\ndata_1d = pca.fit_transform data_2d\n\n# Transforms 2d data into 1d:\n# data_1d ~= [\n#   [-0.8], [1.8], [-1.0], [-0.3], [-1.7],\n#   [-0.9], [0.1], [1.1], [0.4], [1.2]\n# ]\n\nmore_data_1d = pca.transform [ [3.1, 2.9] ]\n\n# Transforms new data into previously fitted 1d space:\n# more_data_1d ~= [ [-1.6] ]\n\nreconstructed_2d = pca.inverse_transform data_1d\n\n# Reconstructs original data (approximate, because the compression is lossy):\n# reconstructed_2d ~= [\n#   [2.4, 2.5], [0.6, 0.6], [2.5, 2.6], [2.0, 2.1], [2.9, 3.1],\n#   [2.4, 2.6], [1.7, 1.8], [1.0, 1.1], [1.5, 1.6], [1.0, 1.0]\n# ]\n\nevr = pca.explained_variance_ratio\n\n# Proportion of data variance explained by each component\n# Here, the first component explains 99.85% of the data variance:\n# evr ~= [0.99854]\n```\n\nSee [examples](examples/) for more. Also, peruse the [source code](lib/pca.rb) (~ 100 loc.)\n\n\n### Options\n\nThe following options can be passed in to ```PCA.new```:\n\noption | default | description\n------ | ------- | -----------\n:components | nil | number of components to extract. 
If nil, will just rotate data onto first principal component\n:scale_data | false | scales features before running PCA by dividing each feature by its standard deviation.\n\n\n### Working with Returned GSL::Matrix\n\n```PCA#transform```, ```#fit_transform```, ```#inverse_transform``` and ```#components``` return instances of ```GSL::Matrix```.\n\nSome useful methods to work with these are the ```#each_row``` and ```#each_col``` iterators,\nand the ```#row(i)``` and ```#col(i)``` accessors.\n\nOr if you'd prefer to work with a standard Ruby ```Array```, you can just call ```#to_a``` and get an array of row arrays.\n\nSee [GSL::Matrix RDoc](http://blackwinter.github.io/rb-gsl/rdoc/matrix_rdoc.html) for more.\n\n\n### Plotting Results With GNUPlot\n\nRequires [GNUPlot](http://www.gnuplot.info/) and [gnuplot gem](https://github.com/rdp/ruby_gnuplot/tree/master).\n\n```ruby\nrequire 'pca'\nrequire 'gnuplot'\n\npca = PCA.new components: 2\ndata_2d = pca.fit_transform data\n\nGnuplot.open do |gp|\n  Gnuplot::Plot.new(gp) do |plot|\n    plot.title \"Transformed Data\"\n    plot.terminal \"png\"\n    plot.output \"out.png\"\n\n    # Use #col accessor to get separate x and y arrays\n    # #col returns a GSL::Vector, so be sure to call #to_a before passing to DataSet\n    xy = [data_2d.col(0).to_a, data_2d.col(1).to_a]\n\n    plot.data \u003c\u003c Gnuplot::DataSet.new(xy) do |ds|\n      ds.title = \"Points\"\n    end\n  end\nend\n```\n\n\n## Sources and Inspirations\n\n- [A tutorial on Principal Components Analysis](http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf) (PDF) a great introduction to PCA\n- [Principal Component Analysis Explained Visually](http://setosa.io/ev/principal-component-analysis/)\n- [scikit-learn PCA](http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html)\n- [Lecture video](https://www.coursera.org/learn/machine-learning/lecture/ZYIPa/principal-component-analysis-algorithm) and 
[notes](https://share.coursera.org/wiki/index.php/ML:Dimensionality_Reduction) (requires Coursera login) from Andrew Ng's Machine Learning Coursera class\n- [Implementing a Principal Component Analysis (PCA) in Python step by step](http://sebastianraschka.com/Articles/2014_pca_step_by_step.html)\n- [Dimensionality Reduction: Principal Component Analysis in-depth](http://nbviewer.ipython.org/github/jakevdp/sklearn_pycon2015/blob/master/notebooks/04.1-Dimensionality-PCA.ipynb)\n","funding_links":[],"categories":["Statistics","Ruby"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgbuesing%2Fpca","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgbuesing%2Fpca","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgbuesing%2Fpca/lists"}