https://github.com/hibayesian/spark-fim
A library of scalable frequent itemset mining algorithms based on Spark
- Host: GitHub
- URL: https://github.com/hibayesian/spark-fim
- Owner: hibayesian
- License: apache-2.0
- Created: 2017-05-22T02:36:10.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2017-06-07T03:46:14.000Z (over 8 years ago)
- Last Synced: 2025-10-08T09:56:08.195Z (5 months ago)
- Topics: frequent-itemset-mining, machine-learning, spark
- Language: Scala
- Size: 32.2 KB
- Stars: 8
- Watchers: 0
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Spark-FIM
Spark-FIM is a library of scalable frequent itemset mining algorithms based on Spark. It includes:
+ PHybridFIN - A parallel frequent itemset mining algorithm based on a novel data structure, HybridNodeset, for representing itemsets. As the minimum support decreases, it achieves significantly better performance across different datasets than the FP-Growth implementation in Spark MLlib.
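To make the terminology concrete, here is a tiny brute-force sketch (not the library's algorithm: PHybridFIN uses the HybridNodeset structure and Spark parallelism, neither of which appears here) of what "frequent itemset" and "minimum support" mean:

```scala
// Illustrative only: enumerate every itemset and keep those whose support
// (fraction of transactions containing it) meets the minSupport threshold.
def frequentItemsets(transactions: Seq[Set[String]],
                     minSupport: Double): Map[Set[String], Int] = {
  // A fractional threshold translates to an absolute transaction count.
  val minCount = math.ceil(minSupport * transactions.size).toInt
  val items = transactions.flatten.toSet
  // Enumerate every non-empty subset of the item universe (fine for toy data,
  // exponential in general -- which is why dedicated algorithms exist).
  items.subsets()
    .filter(_.nonEmpty)
    .map(itemset => itemset -> transactions.count(t => itemset.subsetOf(t)))
    .filter(_._2 >= minCount)
    .toMap
}

val txns = Seq(Set("a", "b", "c"), Set("a", "b"), Set("a", "c"), Set("b"))
// With minSupport = 0.5, an itemset must occur in at least 2 of the 4 transactions.
val freq = frequentItemsets(txns, 0.5)
```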
# Examples
## Scala API
```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val minSupport = 0.85
val numPartitions = 4
val spark = SparkSession
.builder()
  .appName("PHybridFINExample")
.master("local[*]")
.getOrCreate()
val schema = new StructType(Array(
StructField("features", StringType)))
val transactions = spark.read.schema(schema).text("data/chess.csv").cache()
val numTransactions = transactions.count()
val startTime = System.currentTimeMillis()
val freqItemsets = new PHybridFIN()
.setMinSupport(minSupport)
  .setNumPartitions(numPartitions)
.setDelimiter(" ")
.transform(transactions)
val numFreqItemsets = freqItemsets.count()
val endTime = System.currentTimeMillis()
val totalTime: Double = endTime - startTime
println("====================== PHybridFIN - STATS ===========================")
println(s"  minSupport = $minSupport, numPartitions = $numPartitions")
println(s"  Number of transactions: $numTransactions")
println(s"  Number of frequent itemsets: $numFreqItemsets")
println(s"  Total time = ${totalTime / 1000} s")
println("=====================================================================")
spark.stop()
```
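The example reads `data/chess.csv` with `setDelimiter(" ")`, which implies an input format of one transaction per line with space-separated item identifiers. A small sketch (the helper names are ours, not part of the library's API) of how such a line maps to an itemset, and how the fractional `minSupport` becomes an absolute transaction count:

```scala
// Assumed input format, inferred from setDelimiter(" ") in the example above:
// one transaction per line, items separated by the delimiter.
def parseTransaction(line: String, delimiter: String = " "): Set[String] =
  line.trim.split(delimiter).filter(_.nonEmpty).toSet

// A fractional minSupport over N transactions means an itemset must appear
// in at least ceil(minSupport * N) of them.
def minCount(minSupport: Double, numTransactions: Long): Long =
  math.ceil(minSupport * numTransactions).toLong

val itemset = parseTransaction("1 3 5 7")
val threshold = minCount(0.85, 100)
```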
# Requirements
Spark-FIM is built against Spark 2.1.1.
# Build From Source
```shell
sbt package
```
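Since the library targets Spark 2.1.1 and is built from source with `sbt package`, a project consuming the resulting jar needs matching Spark artifacts on its classpath. A minimal `build.sbt` sketch under those assumptions (Scala 2.11 is assumed because that is what Spark 2.1.x was published for; the unmanaged-jar line is illustrative, as spark-fim is not published to a public artifact repository):

```scala
// build.sbt (sketch): depend on Spark 2.1.1, matching what Spark-FIM is built against.
name := "spark-fim-example"
scalaVersion := "2.11.8" // Spark 2.1.x artifacts were published for Scala 2.11

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "2.1.1" % "provided",
  "org.apache.spark" %% "spark-sql"   % "2.1.1" % "provided",
  "org.apache.spark" %% "spark-mllib" % "2.1.1" % "provided"
)

// Then add the jar produced by `sbt package` in the spark-fim checkout, e.g.:
// unmanagedJars in Compile += file("lib/spark-fim_2.11-<version>.jar")
```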
# Licenses
Spark-FIM is available under the Apache License 2.0.
# Contact & Feedback
If you encounter bugs, feel free to submit an issue or pull request. You can also email:
+ hibayesian (hibayesian@gmail.com).