{"id":23029729,"url":"https://github.com/antononcube/raku-ml-associationrulelearning","last_synced_at":"2025-04-02T20:25:45.982Z","repository":{"id":71383788,"uuid":"504353780","full_name":"antononcube/Raku-ML-AssociationRuleLearning","owner":"antononcube","description":"Raku package for association rule learning. (Apriori, Eclat, confidence, lift, conviction.)","archived":false,"fork":false,"pushed_at":"2024-03-11T03:05:43.000Z","size":377,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-02-08T11:13:22.692Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Raku","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"artistic-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/antononcube.png","metadata":{"files":{"readme":"README-work.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2022-06-17T01:15:06.000Z","updated_at":"2023-03-30T17:30:18.000Z","dependencies_parsed_at":"2024-03-11T04:24:00.510Z","dependency_job_id":"91f701b7-a183-4e26-82d5-ac35cc04bf90","html_url":"https://github.com/antononcube/Raku-ML-AssociationRuleLearning","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antononcube%2FRaku-ML-AssociationRuleLearning","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antononcube%2FRaku-ML-AssociationRuleLearning/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antononcube%2FRaku-ML-AssociationRuleLearning/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/
repositories/antononcube%2FRaku-ML-AssociationRuleLearning/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/antononcube","download_url":"https://codeload.github.com/antononcube/Raku-ML-AssociationRuleLearning/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246886504,"owners_count":20849878,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-15T14:16:58.027Z","updated_at":"2025-04-02T20:25:45.963Z","avatar_url":"https://github.com/antononcube.png","language":"Raku","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Raku ML::AssociationRuleLearning\n\n[![SparkyCI](https://ci.sparrowhub.io/project/gh-antononcube-Raku-ML-AssociationRuleLearning/badge)](https://ci.sparrowhub.io)\n[![License: Artistic-2.0](https://img.shields.io/badge/License-Artistic%202.0-0298c3.svg)](https://opensource.org/licenses/Artistic-2.0)\n\nThis repository has the code of a Raku package for\n[Association Rule Learning (ARL)](https://en.wikipedia.org/wiki/Association_rule_learning)\nfunctions, [Wk1].\n\nThe ARL framework includes the algorithms \n[Apriori](https://en.wikipedia.org/wiki/Apriori_algorithm) \nand \n[Eclat](https://en.wikipedia.org/wiki/Association_rule_learning#Eclat_algorithm), \nand the measures \n[confidence](https://en.wikipedia.org/wiki/Association_rule_learning#Confidence),\n[lift](https://en.wikipedia.org/wiki/Association_rule_learning#Lift), and \n[conviction](https://en.wikipedia.org/wiki/Association_rule_learning#Conviction), \n(and others.)\n\nFor 
a computational introduction to ARL utilization (in Mathematica), see the article\n[\"Movie genre associations\"](https://mathematicaforprediction.wordpress.com/2013/10/06/movie-genre-associations/),\n[AA1].\n\nThe examples below use the packages\n[\"Data::Generators\"](https://raku.land/cpan:ANTONOV/Data::Generators),\n[\"Data::Reshapers\"](https://raku.land/cpan:ANTONOV/Data::Reshapers), and\n[\"Data::Summarizers\"](https://raku.land/cpan:ANTONOV/Data::Summarizers), described in the article\n[\"Introduction to data wrangling with Raku\"](https://rakuforprediction.wordpress.com/2021/12/31/introduction-to-data-wrangling-with-raku/),\n[AA2].\n\n-------\n\n## Installation\n\nVia zef-ecosystem:\n\n```shell\nzef install ML::AssociationRuleLearning\n```\n\nFrom GitHub:\n\n```shell\nzef install https://github.com/antononcube/Raku-ML-AssociationRuleLearning\n```\n\n-------\n\n## Frequent sets finding \n\nHere we get the Titanic dataset (from \"Data::Reshapers\") and summarize it:\n\n```perl6\nuse Data::Reshapers;\nuse Data::Summarizers;\nmy @dsTitanic = get-titanic-dataset();\nrecords-summary(@dsTitanic);\n```\n\n**Problem:** Find all combinations of values of the variables \"passengerAge\", \"passengerClass\", \"passengerSex\", and\n\"passengerSurvival\" that appear more than 200 times in the Titanic dataset.\n\nHere is how we use the function `frequent-sets` to give an answer:\n\n```perl6\nuse ML::AssociationRuleLearning;\nmy @freqSets = frequent-sets(@dsTitanic, min-support =\u003e 200, min-number-of-items =\u003e 2, max-number-of-items =\u003e Inf):counts;\n@freqSets.elems\n```\n\nThe function `frequent-sets` returns the frequent sets together with their support.\n\nHere we tabulate the result:\n\n```perl6\nsay to-pretty-table(@freqSets.map({ %( Frequent-set =\u003e $_.key.join(' '), Count =\u003e $_.value) }), align =\u003e 'l');\n```\n\nWe can verify the result by looking into these group counts, [AA2]:\n\n```perl6\nmy $obj = group-by( @dsTitanic, \u003cpassengerClass 
passengerSex\u003e);\n.say for $obj\u003e\u003e.elems.grep({ $_.value \u003e= 200 });\n$obj = group-by( @dsTitanic, \u003cpassengerClass passengerSurvival passengerSex\u003e);\n.say for $obj\u003e\u003e.elems.grep({ $_.value \u003e= 200 });\n```\n\nOr these contingency tables:\n\n```perl6\nmy $obj = group-by( @dsTitanic, \"passengerClass\") ;\n$obj = $obj.map({ $_.key =\u003e cross-tabulate( $_.value, \"passengerSex\", \"passengerSurvival\" ) });\n.say for $obj.Array;\n```\n\n**Remark:** For datasets -- i.e. arrays of hashes -- `frequent-sets` preprocesses the data by concatenating\ncolumn names with corresponding column values. This is done in order to prevent \"collisions\" of same values \ncoming from different columns. If that concatenation is not desired then manual preprocessing like this can be used:\n\n```{perl6, eval=FALSE}\n@dsTitanic.map({ $_.values.List }).Array\n```\n\n**Remark:** `frequent-sets`'s argument `min-support` can take both integers greater than 1 and frequencies between 0 and 1.\n(If an integer greater than one is given, then the corresponding frequency is derived.)\n\n**Remark:** By default `frequent-sets` uses the Eclat algorithm. 
The functions `apriori` and `eclat`\ncall `frequent-sets` with the option settings `method=\u003e'Apriori'` and `method=\u003e'Eclat'` respectively.\n\n-------\n\n## Association rules finding\n\nHere we find association rules with min support 0.3 and min confidence 0.7:\n\n```perl6\nassociation-rules(@dsTitanic, min-support =\u003e 0.3, min-confidence =\u003e 0.7)\n==\u003e to-pretty-table\n```\n\n### Reusing found frequent sets\n\nThe function `frequent-sets` takes the adverb \":object\" that makes `frequent-sets` return an object of type\n`ML::AssociationRuleLearning::Apriori` or `ML::AssociationRuleLearning::Eclat`, \nwhich can be \"pipelined\" to find association rules.\n\nHere we find frequent sets, return the corresponding object, and retrieve the result:\n\n```perl6\nmy $eclatObj = frequent-sets(@dsTitanic.map({ $_.values.List }).Array, min-support =\u003e 0.12, min-number-of-items =\u003e 2, max-number-of-items =\u003e 6):object;\n$eclatObj.result.elems\n```\n\nHere we find association rules and pretty-print them:\n\n```perl6\n$eclatObj.find-rules(min-confidence=\u003e0.7)\n==\u003e to-pretty-table \n```\n\n**Remark:** Note that because of the specified min confidence, the number of association rules is \"contained\" --\na (much) larger number of rules would be produced with, say, `min-confidence=\u003e0.2`.\n\n\n-------\n\n## Implementation considerations\n\n### UML diagram\n\nHere is a UML diagram that shows the package's structure:\n\n![](./resources/class-diagram.png)\n\n\nThe\n[PlantUML spec](./resources/class-diagram.puml)\nand\n[diagram](./resources/class-diagram.png)\nwere obtained with the CLI script `to-uml-spec` of the package \"UML::Translators\", [AAp6].\n\nHere we get the [PlantUML spec](./resources/class-diagram.puml):\n\n```shell\nto-uml-spec ML::AssociationRuleLearning \u003e ./resources/class-diagram.puml\n```\n\nHere we get the [diagram](./resources/class-diagram.png):\n\n```shell\nto-uml-spec ML::AssociationRuleLearning | java -jar 
~/PlantUML/plantuml-1.2022.5.jar -pipe \u003e ./resources/class-diagram.png\n```\n\n**Remark:** Maybe it is a good idea to have an abstract class named, say,\n`ML::AssociationRuleLearning::AbstractFinder` that is a parent of both\n`ML::AssociationRuleLearning::Apriori` and `ML::AssociationRuleLearning::Eclat`,\nbut I have not found it to be necessary at this point of development.\n\n### Eclat\n\nWe can say that Eclat uses a \"vertical database representation\" of the transactions.\n\nEclat is based on Raku's \n[sets, bags, and mixes](https://docs.raku.org/language/setbagmix)\nfunctionalities.\n\nEclat represents the transactions as a hash of sets:\n\n- The keys of the hash are items.\n\n- The elements of the sets are transaction identifiers.\n\n(In other words, for each item an inverted index is made.)\n\nThis representation allows for quick calculation of the support of item combinations.\n\n### Apriori \n\nApriori uses the standard, horizontal representation of the database transactions.\n\nWe can say that Apriori:\n\n- Generates candidates for frequent item sets using the routine \n  [`combinations`](https://docs.raku.org/routine/combinations)\n\n- Filters candidates by \n  [Tries with frequencies](https://github.com/antononcube/Raku-ML-TriesWithFrequencies) \n  creation and removal by threshold\n\nApriori is usually (much) slower than Eclat. \nHistorically, Apriori is the first ARL method, and its implementation in the package is didactic.\n\n### Association rules\n\nWe can say that the association rule finding function is a general one, but that function\ndoes require fast computation of confidence, lift, etc. Hence Eclat's transactions representation\nis used.\n\nAssociation rules finding with Apriori is the same as with Eclat. 
\nThe package function `association-rules` with the option setting `method=\u003e'Apriori'`\nsimply sends frequent sets found with Apriori to the Eclat-based association rule finding.\n\n-------\n\n## References\n\n### Articles\n\n[Wk1] Wikipedia entry, [\"Association Rule Learning\"](https://en.wikipedia.org/wiki/Association_rule_learning).\n\n[AA1] Anton Antonov,\n[\"Movie genre associations\"](https://mathematicaforprediction.wordpress.com/2013/10/06/movie-genre-associations/),\n(2013),\n[MathematicaForPrediction at WordPress](https://mathematicaforprediction.wordpress.com).\n\n[AA2] Anton Antonov,\n[\"Introduction to data wrangling with Raku\"](https://rakuforprediction.wordpress.com/2021/12/31/introduction-to-data-wrangling-with-raku/),\n(2021),\n[RakuForPrediction at WordPress](https://rakuforprediction.wordpress.com).\n\n### Packages\n\n[AAp1] Anton Antonov,\n[Implementation of the Apriori algorithm in Mathematica](https://github.com/antononcube/MathematicaForPrediction/blob/master/AprioriAlgorithm.m),\n(2014-2016),\n[MathematicaForPrediction at GitHub/antononcube](https://github.com/antononcube/MathematicaForPrediction/).\n\n[AAp1a] Anton Antonov,\n[Implementation of the Apriori algorithm via Tries in Mathematica](https://github.com/antononcube/MathematicaForPrediction/blob/master/Misc/AprioriAlgorithmViaTries.m),\n(2022),\n[MathematicaForPrediction at GitHub/antononcube](https://github.com/antononcube/MathematicaForPrediction/).\n\n[AAp2] Anton Antonov,\n[Implementation of the Eclat algorithm in Mathematica](https://github.com/antononcube/MathematicaForPrediction/blob/master/EclatAlgorithm.m),\n(2022),\n[MathematicaForPrediction at GitHub/antononcube](https://github.com/antononcube/MathematicaForPrediction/).\n\n[AAp3] Anton Antonov,\n[Data::Generators Raku package](https://raku.land/cpan:ANTONOV/Data::Generators),\n(2021),\n[GitHub/antononcube](https://github.com/antononcube).\n\n[AAp4] Anton Antonov,\n[Data::Reshapers Raku 
package](https://raku.land/cpan:ANTONOV/Data::Reshapers),\n(2021),\n[GitHub/antononcube](https://github.com/antononcube).\n\n[AAp5] Anton Antonov,\n[Data::Summarizers Raku package](https://raku.land/cpan:ANTONOV/Data::Summarizers),\n(2021),\n[GitHub/antononcube](https://github.com/antononcube).\n\n[AAp6] Anton Antonov,\n[UML::Translators Raku package](https://raku.land/zef:antononcube/UML::Translators),\n(2022),\n[GitHub/antononcube](https://github.com/antononcube).\n\n[AAp7] Anton Antonov,\n[ML::TriesWithFrequencies Raku package](https://raku.land/cpan:ANTONOV/ML::TriesWithFrequencies),\n(2021),\n[GitHub/antononcube](https://github.com/antononcube).\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fantononcube%2Fraku-ml-associationrulelearning","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fantononcube%2Fraku-ml-associationrulelearning","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fantononcube%2Fraku-ml-associationrulelearning/lists"}