https://github.com/hibayesian/spark-fim
A library of scalable frequent itemset mining algorithms based on Spark
- Host: GitHub
- URL: https://github.com/hibayesian/spark-fim
- Owner: hibayesian
- License: apache-2.0
- Created: 2017-05-22T02:36:10.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2017-06-07T03:46:14.000Z (over 8 years ago)
- Last Synced: 2025-10-08T09:56:08.195Z (5 months ago)
- Topics: frequent-itemset-mining, machine-learning, spark
- Language: Scala
- Size: 32.2 KB
- Stars: 8
- Watchers: 0
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Spark-FIM
Spark-FIM is a library of scalable frequent itemset mining algorithms based on Spark. It includes:
+ PHybridFIN - A parallel frequent itemset mining algorithm based on a novel data structure, HybridNodeset, for representing itemsets. As the minimum support decreases, it achieves significantly better performance across different datasets than the FP-Growth implementation in Spark MLlib.
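To make the terminology concrete, here is a tiny brute-force sketch (not the library's algorithm: PHybridFIN uses the HybridNodeset structure and Spark parallelism, neither of which appears here) of what "frequent itemset" and "minimum support" mean:

```scala
// Illustrative only: enumerate every itemset and keep those whose support
// (fraction of transactions containing it) meets the minSupport threshold.
def frequentItemsets(transactions: Seq[Set[String]],
                     minSupport: Double): Map[Set[String], Int] = {
  // A fractional threshold translates to an absolute transaction count.
  val minCount = math.ceil(minSupport * transactions.size).toInt
  val items = transactions.flatten.toSet
  // Enumerate every non-empty subset of the item universe (fine for toy data,
  // exponential in general -- which is why dedicated algorithms exist).
  items.subsets()
    .filter(_.nonEmpty)
    .map(itemset => itemset -> transactions.count(t => itemset.subsetOf(t)))
    .filter(_._2 >= minCount)
    .toMap
}

val txns = Seq(Set("a", "b", "c"), Set("a", "b"), Set("a", "c"), Set("b"))
// With minSupport = 0.5, an itemset must occur in at least 2 of the 4 transactions.
val freq = frequentItemsets(txns, 0.5)
```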
# Examples
## Scala API
```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val minSupport = 0.85
val numPartitions = 4
val spark = SparkSession
.builder()
  .appName("PHybridFINExample")
.master("local[*]")
.getOrCreate()
val schema = new StructType(Array(
StructField("features", StringType)))
val transactions = spark.read.schema(schema).text("data/chess.csv").cache()
val numTransactions = transactions.count()
val startTime = System.currentTimeMillis()
val freqItemsets = new PHybridFIN()
.setMinSupport(minSupport)
  .setNumPartitions(numPartitions)
.setDelimiter(" ")
.transform(transactions)
val numFreqItemsets = freqItemsets.count()
val endTime = System.currentTimeMillis()
val totalTime: Double = endTime - startTime
println("====================== PHybridFIN - STATS ===========================")
println(s"  minSupport = $minSupport, numPartitions = $numPartitions")
println(s"  Number of transactions: $numTransactions")
println(s"  Number of frequent itemsets: $numFreqItemsets")
println(s"  Total time = ${totalTime / 1000} s")
println("=====================================================================")
spark.stop()
```
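The example reads `data/chess.csv` with `setDelimiter(" ")`, which implies an input format of one transaction per line with space-separated item identifiers. A small sketch (the helper names are ours, not part of the library's API) of how such a line maps to an itemset, and how the fractional `minSupport` becomes an absolute transaction count:

```scala
// Assumed input format, inferred from setDelimiter(" ") in the example above:
// one transaction per line, items separated by the delimiter.
def parseTransaction(line: String, delimiter: String = " "): Set[String] =
  line.trim.split(delimiter).filter(_.nonEmpty).toSet

// A fractional minSupport over N transactions means an itemset must appear
// in at least ceil(minSupport * N) of them.
def minCount(minSupport: Double, numTransactions: Long): Long =
  math.ceil(minSupport * numTransactions).toLong

val itemset = parseTransaction("1 3 5 7")
val threshold = minCount(0.85, 100)
```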
# Requirements
Spark-FIM is built against Spark 2.1.1.
# Build From Source
```shell
sbt package
```
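Since the library targets Spark 2.1.1 and is built from source with `sbt package`, a project consuming the resulting jar needs matching Spark artifacts on its classpath. A minimal `build.sbt` sketch under those assumptions (Scala 2.11 is assumed because that is what Spark 2.1.x was published for; the unmanaged-jar line is illustrative, as spark-fim is not published to a public artifact repository):

```scala
// build.sbt (sketch): depend on Spark 2.1.1, matching what Spark-FIM is built against.
name := "spark-fim-example"
scalaVersion := "2.11.8" // Spark 2.1.x artifacts were published for Scala 2.11

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "2.1.1" % "provided",
  "org.apache.spark" %% "spark-sql"   % "2.1.1" % "provided",
  "org.apache.spark" %% "spark-mllib" % "2.1.1" % "provided"
)

// Then add the jar produced by `sbt package` in the spark-fim checkout, e.g.:
// unmanagedJars in Compile += file("lib/spark-fim_2.11-<version>.jar")
```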
# Licenses
Spark-FIM is available under the Apache License 2.0.
# Contact & Feedback
If you encounter bugs, feel free to submit an issue or pull request. You can also email:
+ hibayesian (hibayesian@gmail.com).