{"id":16511320,"url":"https://github.com/pbenner/kmerlr","last_synced_at":"2025-03-02T08:13:52.053Z","repository":{"id":57535504,"uuid":"190181462","full_name":"pbenner/kmerLr","owner":"pbenner","description":null,"archived":false,"fork":false,"pushed_at":"2021-09-23T19:50:34.000Z","size":743,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-01-12T20:22:57.924Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pbenner.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-06-04T10:39:14.000Z","updated_at":"2021-09-23T19:50:38.000Z","dependencies_parsed_at":"2022-08-29T00:40:55.120Z","dependency_job_id":null,"html_url":"https://github.com/pbenner/kmerLr","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pbenner%2FkmerLr","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pbenner%2FkmerLr/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pbenner%2FkmerLr/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pbenner%2FkmerLr/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pbenner","download_url":"https://codeload.github.com/pbenner/kmerLr/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241476458,"owners_count":19968916,"icon_url":"https://github.com/github.png","version
":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-11T15:59:49.655Z","updated_at":"2025-03-02T08:13:52.015Z","avatar_url":"https://github.com/pbenner.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Installation\n\nPre-compiled binaries are available [here](https://github.com/pbenner/kmerLr-binary).\n\n## Documentation\n\nBenner, Philipp. *Computing Leapfrog Regularization Paths with Applications to Large-Scale K-mer Logistic Regression.* Journal of Computational Biology (2021).\n\n## KmerLr Example\n\nForeground and background data must be stored in seperate files where the rows are the samples and the columns the features:\n```bash\n$ cat test_fg.table \n0.4,1.4,10.5,0.21, 5.1,0.22,4.21\n0.2,1.2,33.5,0.23, 1.6,0.11,5.23\n0.3,1.3,20.3,0.26, 4.4,0.13,4.25\n0.4,1.2,36.5,0.29,10.1,0.11,2.21\n0.2,1.2,65.1,0.22,50.1,0.31,4.23\n0.1,1.3,70.2,0.21, 0.2,0.21,2.21\n0.3,1.2,43.5,0.22, 0.1,0.22,4.22\n0.2,1.4,10.3,0.23, 0.4,0.31,2.32\n$ cat test_bg.table \n0.2,22.4,30.5,0.53,13.3,0.11,14.21\n0.4,11.5,13.1,0.54, 3.6,0.42,45.23\n0.5,15.4,24.2,0.44, 1.6,0.31,24.25\n0.2,21.3,26.5,0.20, 0.3,0.14,21.21\n0.4,11.1,10.3,0.20,10.4,0.61,14.23\n0.2,31.4,50.2,0.11,65.3,0.18,12.21\n1.1,11.2,32.5,0.13, 2.1,0.32,14.22\n1.3,13.4,70.3,0.45, 6.4,0.99,12.32\n```\n\nEstimate a logistic regression model with two features:\n```bash\n$ ./kmerLr -v --type=scoresLr learn --lambda-auto=2 --save-trace test_{fg,bg}.table test\nReading scores from `test_fg.table'... done\nReading scores from `test_bg.table'... 
done\nEstimating classifier with 2 non-zero coefficients...\nEstimating parameters with lambda=2.496875e+00...\nEstimated classifier has 2 non-zero coefficients, selecting 0 new features...\nExporting trace to `test.trace'... done\nExporting model to `test_2.json'... done\n```\n\nPrint the coefficients of the estimated model:\n```bash\n$ ./kmerLr -v --type=scoresLr coefficients test_2.json\nImporting distribution from `test.json'... done\n     1  -5.466291e-02 2\n     2  -3.026280e-02 7\n```\n\n## Regularization Paths\n\nEstimation of regularization paths:\n```bash\n$ ./kmerLr -v --type=scoresLr learn --lambda-auto=2,5 --save-path test_{fg,bg}.table test\nReading scores from `test_fg.table'... done\nReading scores from `test_bg.table'... done\nEstimating classifier with 2 non-zero coefficients...\nEstimating parameters with lambda=2.496875e+00...\nEstimated classifier has 2 non-zero coefficients, selecting 0 new features...\nEstimating classifier with 5 non-zero coefficients...\nEstimating parameters with lambda=4.870316e-02...\nEstimated classifier has 3 non-zero coefficients, selecting 2 new features...\nEstimating parameters with lambda=1.891759e-03...\nEstimated classifier has 3 non-zero coefficients, selecting 2 new features...\nEstimating parameters with lambda=1.482768e-04...\nEstimated classifier has 5 non-zero coefficients, selecting 0 new features...\nExporting regularization path to `test.path'... done\nExporting model to `test_2.json'... done\nExporting model to `test_5.json'... 
done\n```\nPlot regularization path:\n```R\nlibrary(RColorBrewer)\n\nplot.path \u003c- function(filename, col=brewer.pal(n = 8, name = \"RdBu\")) {\n    t \u003c- read.table(filename, header=TRUE)\n    x \u003c- t$norm\n    y \u003c- read.csv(textConnection(as.character(t$theta)), header=FALSE)\n    matplot(x, y, type=\"l\", lty=1, lwd=1.5, col=col, xlab=expression(paste(\"||\", theta, \"||\")[1]), ylab=expression(theta[i]), xlim=c(min(x), max(x)))\n    abline(v=x, lty=2, col=\"lightgray\")\n}\n\nplot.path(\"test.path\")\n```\n\n## Cross-Validation\n\nUse a logistic regression model with two features in a 5-fold cross-validation:\n```bash\n$ ./kmerLr -v --type=scoresLr learn --epsilon-loss=1e-4 --lambda-auto=2,5 --save-trace --save-path --k-fold-cv=5 test_{fg,bg}.table test\nReading scores from `test_fg.table'... done\nReading scores from `test_bg.table'... done\nEstimating classifier with 2 non-zero coefficients...\nEstimating parameters with lambda=3.185208e+00...\nEstimated classifier has 1 non-zero coefficients, selecting 1 new features...\nEstimating parameters with lambda=2.976290e+00...\nEstimated classifier has 2 non-zero coefficients, selecting 0 new features...\nEstimating classifier with 5 non-zero coefficients...\nEstimating parameters with lambda=6.695171e-02...\nEstimated classifier has 4 non-zero coefficients, selecting 1 new features...\nEstimating parameters with lambda=1.331785e-02...\nEstimated classifier has 4 non-zero coefficients, selecting 1 new features...\nEstimating parameters with lambda=1.012429e-02...\nEstimated classifier has 4 non-zero coefficients, selecting 1 new features...\nEstimating parameters with lambda=1.978671e-02...\nEstimated classifier has 5 non-zero coefficients, selecting 0 new features...\nExporting trace to `test_0.trace'... done\nExporting regularization path to `test_0.path'... done\nExporting model to `test_2_0.json'... done\nExporting model to `test_5_0.json'... 
done\nEstimating classifier with 2 non-zero coefficients...\nEstimating parameters with lambda=3.678846e+00...\nEstimated classifier has 2 non-zero coefficients, selecting 0 new features...\nEstimating classifier with 5 non-zero coefficients...\nEstimating parameters with lambda=3.088860e-02...\nEstimated classifier has 4 non-zero coefficients, selecting 1 new features...\nEstimating parameters with lambda=9.727014e-03...\nEstimated classifier has 5 non-zero coefficients, selecting 0 new features...\nExporting trace to `test_1.trace'... done\nExporting regularization path to `test_1.path'... done\nExporting model to `test_2_1.json'... done\nExporting model to `test_5_1.json'... done\nEstimating classifier with 2 non-zero coefficients...\nEstimating parameters with lambda=3.032692e+00...\nEstimated classifier has 2 non-zero coefficients, selecting 0 new features...\nEstimating classifier with 5 non-zero coefficients...\nEstimating parameters with lambda=1.763804e-02...\nEstimated classifier has 4 non-zero coefficients, selecting 1 new features...\nEstimating parameters with lambda=6.953048e-03...\nEstimated classifier has 5 non-zero coefficients, selecting 0 new features...\nExporting trace to `test_2.trace'... done\nExporting regularization path to `test_2.path'... done\nExporting model to `test_2_2.json'... done\nExporting model to `test_5_2.json'... done\nEstimating classifier with 2 non-zero coefficients...\nEstimating parameters with lambda=2.848077e+00...\nEstimated classifier has 2 non-zero coefficients, selecting 0 new features...\nEstimating classifier with 5 non-zero coefficients...\nEstimating parameters with lambda=3.217164e-02...\nEstimated classifier has 4 non-zero coefficients, selecting 1 new features...\nEstimating parameters with lambda=6.634468e-03...\nEstimated classifier has 5 non-zero coefficients, selecting 0 new features...\nExporting trace to `test_3.trace'... done\nExporting regularization path to `test_3.path'... 
done\nExporting model to `test_2_3.json'... done\nExporting model to `test_5_3.json'... done\nEstimating classifier with 2 non-zero coefficients...\nEstimating parameters with lambda=2.543269e+00...\nEstimated classifier has 1 non-zero coefficients, selecting 1 new features...\nEstimating parameters with lambda=2.581854e+00...\nEstimated classifier has 2 non-zero coefficients, selecting 0 new features...\nEstimating classifier with 5 non-zero coefficients...\nEstimating parameters with lambda=4.322065e-02...\nEstimated classifier has 4 non-zero coefficients, selecting 1 new features...\nEstimating parameters with lambda=7.162611e-03...\nEstimated classifier has 5 non-zero coefficients, selecting 0 new features...\nExporting trace to `test_4.trace'... done\nExporting regularization path to `test_4.path'... done\nExporting model to `test_2_4.json'... done\nExporting model to `test_5_4.json'... done\nExporting cross-validation results to `test_2.table'... done\nExporting cross-validation results to `test_5.table'... 
done\n```\n\nPlot cross-validation result:\n```R\nlibrary(ROCR)\n\nplot.roc \u003c- function(filename) {\n    t \u003c- read.table(filename, header=T)\n    p \u003c- prediction(t$prediction, t$labels)\n    p \u003c- performance(p, \"tpr\", \"fpr\")\n    plot(p)\n}\n\nplot.roc(\"test_2.table\")\n```\n\n## Objective function\n\n\u003ca href=\"https://www.codecogs.com/eqnedit.php?latex=\\omega(\\theta)\u0026space;=\u0026space;-\\frac{1}{n}\\sum_{i=1}^n\u0026space;\\left\\{y_i\u0026space;\\log\\sigma(x_i\\theta)\u0026space;\u0026plus;\u0026space;(1-y_i)\\log(1-\\sigma(x_i\\theta))\\right\\}\u0026space;\u0026plus;\u0026space;\\lambda\u0026space;\\left\u0026space;\\Vert\u0026space;\\theta\u0026space;\\right\\Vert_1\" target=\"_blank\"\u003e\u003cimg src=\"https://latex.codecogs.com/gif.latex?\\omega(\\theta)\u0026space;=\u0026space;-\\frac{1}{n}\\sum_{i=1}^n\u0026space;\\left\\{y_i\u0026space;\\log\\sigma(x_i\\theta)\u0026space;\u0026plus;\u0026space;(1-y_i)\\log(1-\\sigma(x_i\\theta))\\right\\}\u0026space;\u0026plus;\u0026space;\\lambda\u0026space;\\left\u0026space;\\Vert\u0026space;\\theta\u0026space;\\right\\Vert_1\" title=\"\\omega(\\theta) = -\\frac{1}{n}\\sum_{i=1}^n \\left\\{y_i \\log\\sigma(x_i\\theta) + (1-y_i)\\log(1-\\sigma(x_i\\theta))\\right\\} + \\lambda \\left \\Vert \\theta \\right\\Vert_1\" /\u003e\u003c/a\u003e\n\n## Symbolic regression\n\nPrimary input features:\n```bash\n$ cat scoresLr_test_primary_fg.table\na,b,c,d\n0.4,1.4,10.5,0.21\n0.2,1.2,33.5,0.23\n0.3,1.3,20.3,0.26\n0.4,1.2,36.5,0.29\n0.2,1.2,65.1,0.22\n0.1,1.3,70.2,0.21\n0.3,1.2,43.5,0.22\n0.2,1.4,10.3,0.23\n$ cat scoresLr_test_primary_bg.table\na,b,c,d\n0.2,22.4,30.5,0.53\n0.4,11.5,13.1,0.54\n0.5,15.4,24.2,0.44\n0.2,21.3,26.5,0.20\n0.4,11.1,10.3,0.20\n0.2,31.4,50.2,0.11\n1.1,11.2,32.5,0.13\n1.3,13.4,70.3,0.45\n```\n\nUse *kmerLr* to expand feature set:\n```bash\nkmerLr --type=scoresLr expand --max-features=20 --header scoresLr_test_primary_fg.table,scoresLr_test_primary_bg.table 
scoresLr_test_expanded\n```\n\nResult:\n```bash\n$ cat scoresLr_test_expanded_0.table\na,b,c,d,exp(a),exp(b),exp(c),exp(d),exp(-a),exp(-b),exp(-c),exp(-d),log(a),log(b),log(c),log(d),(a^2),(b^2),(c^2),(d^2)\n4.000000e-01,1.400000e+00,1.050000e+01,2.100000e-01,1.491825e+00,4.055200e+00,3.631550e+04,1.233678e+00,6.703200e-01,2.465970e-01,2.753645e-05,8.105842e-01,-9.162907e-01,3.364722e-01,2.351375e+00,-1.560648e+00,1.600000e-01,1.960000e+00,1.102500e+02,4.410000e-02\n2.000000e-01,1.200000e+00,3.350000e+01,2.300000e-01,1.221403e+00,3.320117e+00,3.538874e+14,1.258600e+00,8.187308e-01,3.011942e-01,2.825757e-15,7.945336e-01,-1.609438e+00,1.823216e-01,3.511545e+00,-1.469676e+00,4.000000e-02,1.440000e+00,1.122250e+03,5.290000e-02\n3.000000e-01,1.300000e+00,2.030000e+01,2.600000e-01,1.349859e+00,3.669297e+00,6.549045e+08,1.296930e+00,7.408182e-01,2.725318e-01,1.526940e-09,7.710516e-01,-1.203973e+00,2.623643e-01,3.010621e+00,-1.347074e+00,9.000000e-02,1.690000e+00,4.120900e+02,6.760000e-02\n4.000000e-01,1.200000e+00,3.650000e+01,2.900000e-01,1.491825e+00,3.320117e+00,7.108019e+15,1.336427e+00,6.703200e-01,3.011942e-01,1.406862e-16,7.482636e-01,-9.162907e-01,1.823216e-01,3.597312e+00,-1.237874e+00,1.600000e-01,1.440000e+00,1.332250e+03,8.410000e-02\n2.000000e-01,1.200000e+00,6.510000e+01,2.200000e-01,1.221403e+00,3.320117e+00,1.873142e+28,1.246077e+00,8.187308e-01,3.011942e-01,5.338623e-29,8.025188e-01,-1.609438e+00,1.823216e-01,4.175925e+00,-1.514128e+00,4.000000e-02,1.440000e+00,4.238010e+03,4.840000e-02\n1.000000e-01,1.300000e+00,7.020000e+01,2.100000e-01,1.105171e+00,3.669297e+00,3.072364e+30,1.233678e+00,9.048374e-01,2.725318e-01,3.254823e-31,8.105842e-01,-2.302585e+00,2.623643e-01,4.251348e+00,-1.560648e+00,1.000000e-02,1.690000e+00,4.928040e+03,4.410000e-02\n3.000000e-01,1.200000e+00,4.350000e+01,2.200000e-01,1.349859e+00,3.320117e+00,7.794889e+18,1.246077e+00,7.408182e-01,3.011942e-01,1.282892e-19,8.025188e-01,-1.203973e+00,1.823216e-01,3.772761e+00,-1.514128e+00,9.00
0000e-02,1.440000e+00,1.892250e+03,4.840000e-02\n2.000000e-01,1.400000e+00,1.030000e+01,2.300000e-01,1.221403e+00,4.055200e+00,2.973262e+04,1.258600e+00,8.187308e-01,2.465970e-01,3.363310e-05,7.945336e-01,-1.609438e+00,3.364722e-01,2.332144e+00,-1.469676e+00,4.000000e-02,1.960000e+00,1.060900e+02,5.290000e-02\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpbenner%2Fkmerlr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpbenner%2Fkmerlr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpbenner%2Fkmerlr/lists"}