{"id":32199410,"url":"https://github.com/zjwang1013/sparsegfm","last_synced_at":"2025-10-22T03:15:20.214Z","repository":{"id":312527125,"uuid":"1047774538","full_name":"zjwang1013/sparseGFM","owner":"zjwang1013","description":"sparseGFM implements sparse generalized factor models for dimension reduction and variable selection in high-dimensional continuous, count, and binary data. Stable release available on CRAN (https://cran.r-project.org/package=sparseGFM); development version hosted on GitHub.","archived":false,"fork":false,"pushed_at":"2025-09-21T02:34:05.000Z","size":85,"stargazers_count":3,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-21T04:21:54.447Z","etag":null,"topics":["dimension-reduction","factor-models","penalized-regression","variable-selection"],"latest_commit_sha":null,"homepage":"https://cran.r-project.org/package=sparseGFM","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zjwang1013.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-08-31T07:40:55.000Z","updated_at":"2025-09-20T00:42:54.000Z","dependencies_parsed_at":"2025-08-31T10:13:03.410Z","dependency_job_id":"ab5ce346-5d78-4e26-a1ad-f91f3889d3a5","html_url":"https://github.com/zjwang1013/sparseGFM","commit_stats":null,"previous_names":["zjwang1013/sparsegfm"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/zjwang1013/sparseGFM","repository_url":"https://repo
s.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjwang1013%2FsparseGFM","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjwang1013%2FsparseGFM/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjwang1013%2FsparseGFM/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjwang1013%2FsparseGFM/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zjwang1013","download_url":"https://codeload.github.com/zjwang1013/sparseGFM/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjwang1013%2FsparseGFM/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":280371888,"owners_count":26319523,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-22T02:00:06.515Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dimension-reduction","factor-models","penalized-regression","variable-selection"],"created_at":"2025-10-22T03:15:18.162Z","updated_at":"2025-10-22T03:15:20.208Z","avatar_url":"https://github.com/zjwang1013.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"# sparseGFM: Sparse Generalized Factor Models with Multiple Penalty Functions\n\n[![Lifecycle: 
stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)\n[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)\n[![CRAN status](https://www.r-pkg.org/badges/version/sparseGFM)](https://CRAN.R-project.org/package=sparseGFM)\n[![](https://cranlogs.r-pkg.org/badges/grand-total/sparseGFM?color=orange)](https://cran.r-project.org/package=sparseGFM)\n\n## Overview\n\nThe `sparseGFM` package provides methods for fitting sparse generalized factor models with various penalty functions. The package is designed to handle high-dimensional data and can adapt to weak factor scenarios, making it suitable for a wide range of applications in statistics and machine learning. \n\n## Installation\n\nThe package is now available on [CRAN](https://cran.r-project.org/package=sparseGFM) under the name **sparseGFM**.  
\nOn GitHub, the development version is hosted under the repository name **sparseGFM**.\n\n### From CRAN (stable version)\n\n```r\ninstall.packages(\"sparseGFM\")\n\n# Load the package\nlibrary(sparseGFM)\n```\n\n### From GitHub (development version)\n\nYou can also install the development version of sparseGFM from GitHub:\n\n```r\n# Install devtools if you haven't already\ninstall.packages(\"devtools\")\n\n# Install sparseGFM from GitHub\ndevtools::install_github(\"zjwang1013/sparseGFM\")\n\n# Load the package\nlibrary(sparseGFM)\n```\n\n## Key Features\n\n- **Sparse Loading Matrix Estimation**: Efficiently estimates row-sparse loading matrices\n- **Multiple Penalty Functions**: Supports lasso, adaptive group lasso (aglasso), and other penalties\n- **Weak Factor Adaptation**: Automatically adapts to scenarios with weak factor structures\n- **Cross-Validation**: Built-in cross-validation for optimal parameter selection\n- **Determine the Number of Factors**: Automatic selection of the optimal number of factors using multiple information criteria\n\n## Functions Overview\n\n### Core Functions\n- `sparseGFM()`: Main function for fitting sparse generalized factor models\n- `cv.sparseGFM()`: Cross-validation for lambda selection\n- `facnum.sparseGFM()`: Information criterion-based selection of factor number\n\n### Supporting Functions\n- `add_identifiability()`: Apply identifiability constraints to factor/loading matrices\n- `evaluate_performance()`: Evaluate variable selection performance metrics\n\n## Penalty Functions\n\nThe package implements 12 different penalty functions:\n\n| Penalty | Description | Adaptive Version |\n|---------|-------------|------------------|\n| `lasso` | L1 penalty | `alasso` |\n| `SCAD` | Smoothly Clipped Absolute Deviation | `agSCAD` |\n| `MCP` | Minimax Concave Penalty | `agMCP` |\n| `group`/`glasso` | Group Lasso | `agroup`/`aglasso` |\n| `gSCAD` | Group SCAD | `agSCAD` |\n| `gMCP` | Group MCP | `agMCP` |\n\n## Missing Data\n\nThe 
package automatically handles missing values.\n\n## Quick Start\n\n### Load the Package\n\n```r\nlibrary(sparseGFM)\nset.seed(123)\n```\n\n### Data Generation with Sparse Loading Matrix\n\n```r\n# Parameters\nn \u003c- 200          # number of observations\np \u003c- 200          # number of variables\na_param \u003c- 0.9    # sparsity parameter\nq \u003c- 2            # number of factors\n\n# Generate sparse structure\ns \u003c- ceiling(p^a_param)  # number of non-zero loadings\n\n# Generate factor matrix\nFF \u003c- matrix(runif(n * q, min = -3, max = 3), nrow = n, ncol = q)\n\n# Generate sparse loading matrix (row-sparse)\nBB \u003c- rbind(matrix(runif(s * q, min = 1, max = 2), nrow = s, ncol = q),\n            matrix(0, nrow = (p - s), ncol = q))\n\n# Generate intercepts\nalpha_true \u003c- runif(p, min = -1, max = 1)\n\n# Add identifiability constraints\nident_res \u003c- add_identifiability(FF, BB, alpha_true)\nFF0 \u003c- ident_res$H\nBB0 \u003c- ident_res$B\nalpha0 \u003c- ident_res$mu\n\n# Generate data matrix\nmat_para \u003c- FF0 %*% t(BB0) + as.matrix(rep(1, n)) %*% t(as.matrix(alpha0))\nx \u003c- matrix(nrow = n, ncol = p)\nfor (i in 1:n) {\n  for (j in 1:p) {\n    x[i, j] \u003c- rnorm(1, mean = mat_para[i, j])\n  }\n}\n\n# True variable selection indicator\nindex_true \u003c- numeric(p)\nif (s \u003e 0 \u0026\u0026 s \u003c= p) {\n  index_true[1:s] \u003c- 1\n}\n```\n\n## Examples\n\n### Example 1: Cross-Validation with Adaptive Group Lasso\n\n```r\n# Perform cross-validation to select optimal lambda\ncv_result \u003c- cv.sparseGFM(x,\n                          type = \"continuous\",\n                          q = 2,\n                          penalty = \"aglasso\",\n                          C = 5,\n                          lambda_range = seq(0.1, 1, by = 0.1),\n                          verbose = FALSE)\n\n# Extract optimal model\noptimal_model \u003c- cv_result$optimal_model\nBB_hat \u003c- optimal_model$BB_hat\nFF_hat \u003c- 
optimal_model$FF_hat\nalpha_hat \u003c- optimal_model$alpha_hat\n\n# Evaluate model performance\nmat_para_hat \u003c- FF_hat %*% t(BB_hat) + as.matrix(rep(1, n)) %*% t(as.matrix(alpha_hat))\nrelative_error \u003c- base::norm((mat_para_hat - mat_para), type = \"F\") / base::norm(mat_para, type = \"F\")\n\nprint(paste(\"Optimal lambda:\", cv_result$optimal_lambda))\nprint(paste(\"Relative estimation error:\", round(relative_error, 4)))\n\n# Variable selection performance\nindex_pred \u003c- rep(1, p)\nindex_pred[optimal_model$index] \u003c- 0\nperformance \u003c- evaluate_performance(index_true, index_pred)\nprint(performance)\n```\n\n### Example 2: Direct Model Fitting\n\n```r\n# Fit sparse GFM with fixed lambda\nresult \u003c- sparseGFM(x, \n                    type = \"continuous\", \n                    q = 2,\n                    penalty = \"aglasso\",\n                    lambda = 0.1,\n                    C = 5)\n\n# Extract results\nBB_direct \u003c- result$BB_hat\nFF_direct \u003c- result$FF_hat\nalpha_direct \u003c- result$alpha_hat\n\n# View selected variables\nselected_vars \u003c- setdiff(1:p, result$index)\nprint(paste(\"Number of selected variables:\", length(selected_vars)))\nprint(paste(\"Number of zero loadings:\", length(result$index)))\n\n# Calculate space angles for evaluation\nBB_vcc \u003c- eval.space(BB_direct, BB0)[1]\nBB_tcc \u003c- eval.space(BB_direct, BB0)[2]\nFF_vcc \u003c- eval.space(FF_direct, FF0)[1] \nFF_tcc \u003c- eval.space(FF_direct, FF0)[2]\n\nprint(paste(\"Loading matrix VCC:\", round(BB_vcc, 4)))\nprint(paste(\"Loading matrix TCC:\", round(BB_tcc, 4)))\n```\n\n### Example 3: Determine the Number of Factors\n\n```r\n# Select optimal number of factors using multiple criteria\nfacnum_result \u003c- facnum.sparseGFM(x,\n                                  type = \"continuous\",\n                                  q_range = 1:5,\n                                  penalty = \"aglasso\",\n                                  lambda_range = 
c(0.1),\n                                  sic_type = \"sic1\",\n                                  C = 6,\n                                  verbose = FALSE)\n\n# Extract information criteria results\ndf_dd \u003c- facnum_result$df_dd\ndf_as \u003c- facnum_result$df_as\n\n# Get optimal factor numbers from different criteria\noptimal_q_sic1 \u003c- facnum_result$optimal_q\noptimal_q_sic2 \u003c- which.min(facnum_result$sic2)\noptimal_q_sic3 \u003c- which.min(facnum_result$sic3)\noptimal_q_sic4 \u003c- which.min(facnum_result$sic4)\n\nprint(\"Optimal number of factors:\")\nprint(paste(\"SIC1:\", optimal_q_sic1))\nprint(paste(\"SIC2:\", optimal_q_sic2))\nprint(paste(\"SIC3:\", optimal_q_sic3))\nprint(paste(\"SIC4:\", optimal_q_sic4))\n\n# Plot information criteria (if plotting functions are available)\nplot(1:5, facnum_result$sic1, type = \"b\", col = \"red\", \n     xlab = \"Number of Factors\", ylab = \"Information Criterion\",\n     main = \"Determine the Number of Factors\")\nlines(1:5, facnum_result$sic2, type = \"b\", col = \"blue\")\nlines(1:5, facnum_result$sic3, type = \"b\", col = \"green\")\nlines(1:5, facnum_result$sic4, type = \"b\", col = \"purple\")\nlegend(\"topright\", c(\"SIC1\", \"SIC2\", \"SIC3\", \"SIC4\"), \n       col = c(\"red\", \"blue\", \"green\", \"purple\"), lty = 1)\n```\n\n## Parameters\n\n### Main Parameters\n\n- **`x`**: Data matrix (n × p)\n- **`type`**: Data type (\"continuous\" for Gaussian data)\n- **`q`**: Number of factors\n- **`penalty`**: Penalty type (\"lasso\", \"aglasso\", etc.)\n- **`lambda`**: Regularization parameter\n- **`C`**: Constraint constant\n\n### Cross-Validation Parameters\n\n- **`lambda_range`**: Range of lambda values for cross-validation\n- **`verbose`**: Whether to print progress information\n\n### Determine the Number of Factors Parameters\n\n- **`q_range`**: Range of factor numbers to consider\n- **`sic_type`**: Type of information criterion for primary selection\n\n## Methodological Details\n\nThe 
sparseGFM algorithm employs:\n\n1. **Initialization**: Using the GFM package for initial estimates\n2. **Alternating minimization**: Iteratively updating factors (F) and loadings (B)\n3. **Sparsity induction**: Applying various penalty functions to achieve variable selection\n4. **Identifiability**: Ensuring unique solutions through SVD-based constraints\n5. **Convergence**: Monitoring objective function changes until convergence\n\n## Model Characteristics\n\nThe sparseGFM package is specifically designed to handle:\n\n1. **High-dimensional data** where the number of variables may exceed the number of observations\n2. **Sparse loading structures** with row-wise sparsity patterns\n3. **Weak factor scenarios** where factors have relatively small eigenvalues\n4. **Flexible penalty structures** including adaptive penalties that can provide better variable selection\n\nThe adaptive group lasso (aglasso) penalty is particularly effective for row-sparse loading matrices, as it can select entire rows (variables) rather than individual elements.\n\n## Dependencies\n\n- R (≥ 3.5.0)\n- GFM\n- MASS\n- irlba\n- stats\n\n## Bug Reports and Issues\n\nPlease report any bugs or issues on the [GitHub Issues page](https://github.com/zjwang1013/sparseGFM/issues). When reporting, include:\n\n1. A clear description of the issue\n2. Reproducible code example\n3. Your session information (`sessionInfo()`)\n4. Any relevant error messages\n\n## Contributing\n\nContributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss proposed modifications.\n\n## References\n\nFor more detailed information about the methodology and theoretical properties, please refer to the associated research papers and documentation. The related papers are currently under review; we will update the information as soon as they are accepted and published.\n\n## License\n\nThis package is licensed under GPL-3. 
See the [LICENSE](LICENSE) file for details.\n\n## Issues and Support\n\nFor bug reports, feature requests, or questions, please visit the GitHub repository issues page.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzjwang1013%2Fsparsegfm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzjwang1013%2Fsparsegfm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzjwang1013%2Fsparsegfm/lists"}