{"id":21458535,"url":"https://github.com/hibayesian/spark-fim","last_synced_at":"2025-10-08T09:56:11.200Z","repository":{"id":85150677,"uuid":"92003965","full_name":"hibayesian/spark-fim","owner":"hibayesian","description":"A library of scalable frequent itemset mining algorithms based on Spark","archived":false,"fork":false,"pushed_at":"2017-06-07T03:46:14.000Z","size":33,"stargazers_count":8,"open_issues_count":0,"forks_count":2,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-10-08T09:56:08.195Z","etag":null,"topics":["frequent-itemset-mining","machine-learning","spark"],"latest_commit_sha":null,"homepage":null,"language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hibayesian.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-05-22T02:36:10.000Z","updated_at":"2025-08-07T12:22:00.000Z","dependencies_parsed_at":"2023-03-04T20:15:30.754Z","dependency_job_id":null,"html_url":"https://github.com/hibayesian/spark-fim","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/hibayesian/spark-fim","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hibayesian%2Fspark-fim","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hibayesian%2Fspark-fim/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hibayesian%2Fspark-fim/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hibayesian%2Fspark-fim/manifests","owner_url":"https://repos
.ecosyste.ms/api/v1/hosts/GitHub/owners/hibayesian","download_url":"https://codeload.github.com/hibayesian/spark-fim/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hibayesian%2Fspark-fim/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278924144,"owners_count":26069400,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-08T02:00:06.501Z","response_time":56,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["frequent-itemset-mining","machine-learning","spark"],"created_at":"2024-11-23T06:23:09.334Z","updated_at":"2025-10-08T09:56:11.144Z","avatar_url":"https://github.com/hibayesian.png","language":"Scala","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Spark-FIM\nSpark-FIM is a library of scalable frequent itemset mining algorithms based on Spark. It includes:\n  + PHybridFIN - A parallel frequent itemset mining algorithm based on a novel data structure named HybridNodeset to represent itemsets. 
It achieves significantly better performance than the FP-Growth algorithm implemented in Spark MLlib on different datasets, particularly as the minimum support decreases.\n\n# Examples\n## Scala API\n```scala\nimport org.apache.spark.sql.SparkSession\nimport org.apache.spark.sql.types.{StringType, StructField, StructType}\n// PHybridFIN comes from Spark-FIM; import it from the library's package.\n\nval minSupport = 0.85\nval numPartitions = 4\n\nval spark = SparkSession\n  .builder()\n  .appName(\"PHybridFINExample\")\n  .master(\"local[*]\")\n  .getOrCreate()\n\n// Each line of the input file is one transaction, with items separated by spaces.\nval schema = new StructType(Array(\n  StructField(\"features\", StringType)))\nval transactions = spark.read.schema(schema).text(\"data/chess.csv\").cache()\nval numTransactions = transactions.count()\n\n// Time the mining step; count() forces evaluation.\nval startTime = System.currentTimeMillis()\nval freqItemsets = new PHybridFIN()\n  .setMinSupport(minSupport)\n  .setNumPartitions(numPartitions)\n  .setDelimiter(\" \")\n  .transform(transactions)\n\nval numFreqItemsets = freqItemsets.count()\nval endTime = System.currentTimeMillis()\nval totalTime: Double = endTime - startTime\n\nprintln(\"====================== PHybridFIN - STATS ===========================\")\nprintln(s\" minSupport = $minSupport    numPartitions = $numPartitions\")\nprintln(s\" Number of transactions: $numTransactions\")\nprintln(s\" Number of frequent itemsets: $numFreqItemsets\")\nprintln(s\" Total time = ${totalTime / 1000}s\")\nprintln(\"=====================================================================\")\n\nspark.stop()\n```\n\n# Requirements\nSpark-FIM is built against Spark 2.1.1.\n\n# Build From Source\n```shell\nsbt package\n```\n\n# Licenses\nSpark-FIM is available under the Apache License 2.0.\n\n# Contact \u0026 Feedback\nIf you encounter bugs, feel free to submit an issue or pull request. You can also email:\n+ hibayesian (hibayesian@gmail.com).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhibayesian%2Fspark-fim","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhibayesian%2Fspark-fim","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhibayesian%2Fspark-fim/lists"}