{"id":15519893,"url":"https://github.com/ceteri/pattern","last_synced_at":"2025-03-05T06:30:36.650Z","repository":{"id":7692040,"uuid":"9055962","full_name":"ceteri/pattern","owner":"ceteri","description":"\"Pattern\" sub-project for Cascading, which uses Cascading flows as containers for machine learning models, importing PMML model descriptions from R, SAS, KNIME, Weka, RapidMiner, etc.","archived":false,"fork":false,"pushed_at":"2013-03-17T02:11:59.000Z","size":1644,"stargazers_count":8,"open_issues_count":0,"forks_count":2,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-02-24T12:22:58.944Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"http://cascading.org/","language":"Java","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ceteri.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2013-03-27T14:43:51.000Z","updated_at":"2024-02-24T01:03:03.000Z","dependencies_parsed_at":"2022-07-09T16:46:58.167Z","dependency_job_id":null,"html_url":"https://github.com/ceteri/pattern","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ceteri%2Fpattern","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ceteri%2Fpattern/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ceteri%2Fpattern/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ceteri%2Fpattern/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ceteri","download_url":"https://codeload.github.com/ceteri/patte
rn/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241979203,"owners_count":20052093,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-02T10:23:28.007Z","updated_at":"2025-03-05T06:30:35.630Z","avatar_url":"https://github.com/ceteri.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"cascading.pattern\n=================\n_Pattern_ sub-project for http://Cascading.org/ which uses flows as\ncontainers for machine learning models, importing\n[PMML](http://en.wikipedia.org/wiki/Predictive_Model_Markup_Language)\nmodel descriptions from _R_, _SAS_, _Weka_, _RapidMiner_, _KNIME_,\n_SQL Server_, etc.\n\nCurrent support for PMML includes:\n\n * [Random Forest](http://en.wikipedia.org/wiki/Random_forest) in [PMML 4.0+](http://www.dmg.org/v4-0-1/MultipleModels.html) exported from [R/Rattle](http://cran.r-project.org/web/packages/rattle/index.html)\n * [Linear Regression](http://en.wikipedia.org/wiki/Linear_regression) in [PMML 1.1+](http://www.dmg.org/v1-1/generalregression.html)\n * [Hierarchical Clustering](http://en.wikipedia.org/wiki/Hierarchical_clustering) and [K-Means Clustering](http://en.wikipedia.org/wiki/K-means_clustering) in [PMML 2.0+](http://www.dmg.org/v2-0/ClusteringModel.html)\n * [Logistic Regression](http://en.wikipedia.org/wiki/Logistic_regression) in [PMML 4.0.1+](http://www.dmg.org/v4-0-1/Regression.html)\n\n\nBuild Instructions\n------------------\nTo build _Pattern_ and then run its unit tests:\n\n    gradle --info --stacktrace clean 
test\n\nThe following scripts generate a baseline (model+data) for the _Random\nForest_ algorithm. This baseline includes a reference data set -- \n1000 independent variables, 500 rows of simulated ecommerce orders --\nplus a predictive model in PMML:\n\n    ./src/py/gen_orders.py 500 1000 \u003e orders.tsv\n    R --vanilla \u003c ./src/r/rf_pmml.R \u003e model.log\n\nThis will generate `huge.rf.xml` as the PMML export for a Random\nForest classifier plus `huge.tsv` as a baseline data set for\nregression testing.\n\nTo build _Pattern_ and run a regression test:\n\n    gradle clean jar\n    rm -rf out\n    hadoop jar build/libs/pattern.jar data/sample.tsv out/classify out/trap \\\n     --pmml data/sample.rf.xml --measure out/measure --assert\n\nFor each tuple in the data, a _stream assertion_ tests whether the\n`predicted` field matches the `score` field generated by the\nmodel. Tuples which fail that assertion get trapped into\n`out/trap/part*` for inspection.\n\nAlso, the _confusion matrix_ shown in `out/measure/part*` should\nmatch the one logged in `model.log` from baseline generated in _R_.\n\nTo run on Amazon AWS, take a look at the `emr.sh` script.\n\n\nClassifier vs. 
Predictive Model\n-------------------------------\nHere's how to run an example _classifier_ using Random Forest:\n\n    gradle clean jar\n    rm -rf out\n    hadoop jar build/libs/pattern.jar data/iris.rf.tsv out/classify out/trap \\\n     --pmml data/iris.rf.xml --measure out/measure --label species\n\nHere's how to run an example _predictive model_ using Linear Regression:\n\n    gradle clean jar\n    rm -rf out\n    hadoop jar build/libs/pattern.jar data/iris.lm_p.tsv out/classify out/trap \\\n     --pmml data/iris.lm_p.xml --rmse out/measure\n\n\nUse in Cascading Apps\n---------------------\nAlternatively, if you want to re-use this assembly for your own\nCascading app, remove the parts related to `verifyPipe` and\n`measurePipe` from the sample code.\n\nThe following snippet in R shows how to train a Random Forest model,\nthen generate PMML as a file called `sample.rf.xml`:\n\n    f \u003c- as.formula(\"as.factor(label) ~ .\")\n    fit \u003c- randomForest(f, data_train, ntree=50)\n    saveXML(pmml(fit), file=\"sample.rf.xml\")\n\nIn this example, the PMML file used by your Cascading app is\nreferenced as a command line argument called `pmmlPath`:\n\n    // define a \"Classifier\" model from PMML to evaluate the orders\n    ClassifierFunction classFunc = new ClassifierFunction( new Fields( \"score\" ), pmmlPath );\n    Pipe classifyPipe = new Each( new Pipe( \"classify\" ), classFunc.getFields(), classFunc, Fields.ALL );\n\nNow when you run that Cascading app, provide a reference to\n`sample.rf.xml` for the `pmmlPath` argument.\n\nAn architectural diagram for common use case patterns is shown in\n`docs/pattern.graffle` which is an OmniGraffle document.\n\n\nExample Models\n--------------\nCheck the `src/r/rattle_pmml.R` script for examples of predictive\nmodels which are created in R, then exported using _Rattle_.\nThese examples use the popular\n[Iris](http://en.wikipedia.org/wiki/Iris_flower_data_set) data set.\n\n * random forest (rf)\n * linear regression 
(lm)\n * hierarchical clustering (hclust)\n * k-means clustering (kmeans)\n * logistic regression (glm)\n * multinomial model (multinom)\n * single hidden-layer neural network (nnet)\n * support vector machine (ksvm)\n * recursive partition classification tree (rpart)\n * association rules\n\nTo execute the R script:\n\n    R --vanilla \u003c src/r/rattle_pmml.R\n\nIt is possible to extend PMML support for other kinds of modeling in R\nand other analytics platforms.  Contact the developers to discuss on\nthe [cascading-user](https://groups.google.com/forum/?fromgroups#!forum/cascading-user)\nemail forum.\n\n\nPMML Resources\n--------------\n * [Data Mining Group](http://www.dmg.org/) XML standards and supported vendors\n * [PMML In Action](http://www.amazon.com/dp/1470003244) book \n * [PMML validator](http://www.zementis.com/pmml_tools.htm)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fceteri%2Fpattern","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fceteri%2Fpattern","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fceteri%2Fpattern/lists"}