{"id":13858127,"url":"https://github.com/Laurae2/LauraeDS","last_synced_at":"2025-07-13T23:31:22.882Z","repository":{"id":74918387,"uuid":"112105981","full_name":"Laurae2/LauraeDS","owner":"Laurae2","description":"Laurae's Data Science R Package","archived":false,"fork":false,"pushed_at":"2018-02-10T13:44:47.000Z","size":110,"stargazers_count":6,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-11-22T16:39:14.442Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://laurae2.github.io/LauraeDS/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Laurae2.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2017-11-26T18:21:58.000Z","updated_at":"2020-07-20T07:24:42.000Z","dependencies_parsed_at":null,"dependency_job_id":"472c66ef-3ca9-41ca-b8ce-965c0d5f8aa7","html_url":"https://github.com/Laurae2/LauraeDS","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Laurae2/LauraeDS","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Laurae2%2FLauraeDS","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Laurae2%2FLauraeDS/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Laurae2%2FLauraeDS/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Laurae2%2FLauraeDS/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Laurae2","download_url":"https://codeload.github.com/Laurae2/LauraeDS/tar.gz/refs/heads/master
","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Laurae2%2FLauraeDS/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265220323,"owners_count":23729792,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-05T03:01:57.606Z","updated_at":"2025-07-13T23:31:17.872Z","avatar_url":"https://github.com/Laurae2.png","language":"R","funding_links":[],"categories":["R"],"sub_categories":[],"readme":"# LauraeDS: Laurae's Data Science Package\n\nThis package is the sequel to [Laurae2/Laurae](https://github.com/Laurae2/Laurae) R package.\n\nIt is meant to require less stuff and more robust.\n\n## Installation\n\n```r\ndevtools::install_github(\"Laurae2/LauraeDS\", dep = FALSE)\n```\n\nDependencies installation:\n\n```r\ninstall.packages(c(\"Matrix\", \"sparsio\", \"fst\", \"data.table\", \"pbapply\", \"parallel\"))\ndevtools::install_github(\"fstpackage/fst@e060e62\")\ndevtools::install_github(\"Laurae2/ez_xgb/R-package@2017-02-15-v1\")\ndevtools::install_github(\"Microsoft/LightGBM/R-package@fc59fce\") # Jul 14 2017, v2.0.4\n\n```\n\n---\n\n## TO-DO\n\n* [x] add fold generation\n* [x] add sparse handling\n* [x] add parallel fast csv/fst converter\n* [x] add parallel handling (cluster)\n* [ ] add parallel xgboost\n* [ ] add parallel LightGBM\n* [ ] add metrics\n* [x] add metric optimizations\n* [x] xgb.DMatrix generation\n* [x] lgb.Dataset generation\n* [x] xgboost trainer\n* [ ] LightGBM trainer\n* [ ] easy GLM (xgboost)\n* [ ] easy Random Forest (xgboost)\n* [ ] easy Random Forest (LightGBM)\n* [ ] 
easy Gradient Boosted Trees (xgboost)\n* [ ] easy Gradient Boosted Trees (LightGBM)\n* [ ] grid learning (\"grid search\")\n* [ ] Random Patches feature generation (Subsampling + Colsampling from feature groups)\n* [ ] stacker\n* [ ] add a lot of stuff\n\n---\n\n## Available functions\n\n---\n\n### Parallel functions\n\nParallel functions are provided to make R fly on multi-core and multi-socket systems, provided there is enough RAM.\n\n| Function | Packages | Description |\n| :--- | :--- | :--- |\n| parallel.csv | data.table, fst, parallel | Parallelizes and multithreads the reading of CSV files and writes them to the fst file format for fast reading. |\n| parallel.threading | parallel | Sets processor affinity correctly on Windows machines. Provides a boost of up to 200% in memory-bound applications. |\n| parallel.destroy | parallel | Stops a parallel cluster, or destroys any available clusters bound to the current R session. |\n\n---\n\n### I/O functions\n\nI/O functions read and write sparse matrices quickly.\n\n| Function | Packages | Description |\n| :--- | :--- | :--- |\n| sparse.read | sparsio, Matrix | Reads SVMLight file format (sparse matrices) |\n| sparse.write | sparsio, Matrix | Writes SVMLight file format (sparse matrices) |\n\n---\n\n### Fold functions\n\nFold functions generate folds for cross-validation very quickly.\n\n| Function | Packages | Description |\n| :--- | :--- | :--- |\n| kfold | None | Generates cross-validated folds (stratified, treatment, pseudo-random, random) |\n| nkfold | None | Generates repeated cross-validated folds (stratified, treatment, pseudo-random, random) |\n\n---\n\n### Optimized Metrics\n\nOptimized metrics might help you get an edge when you can.\n\n| Function | Packages | Description |\n| :--- | :--- | :--- |\n| metrics.acc.max | data.table | Maximum Binary Accuracy |\n| metrics.f1.max | data.table | Maximum F1 Score (Harmonic Mean of Precision and Sensitivity) |\n| metrics.fallout.max | data.table | Minimum Fall-Out (False 
Positive Rate) |\n| metrics.kappa.max | data.table | Maximum Kappa Statistic |\n| metrics.mcc.max | data.table | Maximum Matthews Correlation Coefficient |\n| metrics.missrate.max | data.table | Minimum Miss-Rate (False Negative Rate) |\n| metrics.precision.max | data.table | Maximum Precision (Positive Predictive Value) |\n| metrics.sensitivity.max | data.table | Maximum Sensitivity (True Positive Rate) |\n| metrics.specifity.max | data.table | Maximum Specificity (True Negative Rate) |\n\n---\n\n### Metric Computation/Solving\n\nComputing and/or solving metrics might help you understand which default values are best for the metric.\n\n| Function | Packages | Description |\n| :--- | :--- | :--- |\n| metrics.logloss | None | Logarithmic Loss (logloss) |\n| metrics.logloss.unsafe | None | Logarithmic Loss (logloss) without bound checking |\n| metrics.logloss.solve | stats | Logarithmic Loss Solver |\n\n---\n\n### Machine Learning, Binary Matrices\n\nGenerating binary matrices has never been easier: you can pass lists and data.frames directly.\n\n| Function | Packages | Description |\n| :--- | :--- | :--- |\n| Laurae.xgb.dmat | xgboost, Matrix | Wrapper for extensible xgb.DMatrix generation. |\n| Laurae.lgb.dmat | lightgbm, Matrix | Wrapper for extensible lgb.Dataset generation. |\n\n---\n\n### Machine Learning, Supervised\n\nCan't remember every hyperparameter? Press Tab to autocomplete them.\n\n| Function | Packages | Description |\n| :--- | :--- | :--- |\n| Laurae.xgb.train | xgboost, Matrix | Wrapper for xgboost models |\n\n---\n\n### Machine Learning, Loss/Metrics Helpers\n\nCreating losses/metrics can be tedious without templates. Use these as template wrappers: focus on the loss/metric, then wrap it with a template quickly.\n\n| Function | Packages | Description |\n| :--- | :--- | :--- |\n| xgb.wrap.loss | xgboost | Wrapper to quickly make an xgboost loss function. 
|\n| xgb.wrap.metric | xgboost | Wrapper to quickly make an xgboost metric function. |\n| lgb.wrap.loss | lightgbm | Wrapper to quickly make a LightGBM loss function. |\n| lgb.wrap.metric | lightgbm | Wrapper to quickly make a LightGBM metric function. |\n\n---\n\n### Machine Learning, Loss/Metrics Functions\n\nNeed functions that compute metrics quickly? Here are some.\n\n| Function | Packages | Description |\n| :--- | :--- | :--- |\n| metrics.logloss | None | Computes the logarithmic loss. |\n| metrics.logloss.unsafe | None | Computes the logarithmic loss faster by skipping out-of-bounds checks. |\n| metrics.logloss.solve | stats | Solves for a parameter involving the logarithmic loss (minimal loss, constant prediction value, ratio). |\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FLaurae2%2FLauraeDS","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FLaurae2%2FLauraeDS","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FLaurae2%2FLauraeDS/lists"}