https://github.com/mikemybytes/spark-fim

Frequent Itemsets Mining algorithms for Apache Spark

Host: GitHub
URL: https://github.com/mikemybytes/spark-fim
Owner: mikemybytes
License: mit
Created: 2015-09-21T20:00:23.000Z (almost 10 years ago)
Default Branch: master
Last Pushed: 2015-10-01T00:05:34.000Z (almost 10 years ago)
Last Synced: 2024-10-30T08:51:31.455Z (9 months ago)
Language: Scala
Size: 176 KB
Stars: 5
Watchers: 3
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# Spark-FIM

Spark-FIM provides frequent itemset mining algorithms implemented in Scala for the Apache Spark platform.

Currently the following algorithms are available:

* _DistEclat_ (_Moens, Sandy, Emin Aksehirli, and Bart Goethals. "Frequent itemset mining for big data." Big Data, 2013 IEEE International Conference on. IEEE, 2013_; a MapReduce implementation is available [here](https://gitlab.com/adrem/bigfim-sa))

* _BigFIM_ (published and distributed alongside _DistEclat_).
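
As described in the cited paper, both algorithms build on Eclat: DistEclat distributes Eclat's depth-first search, while BigFIM first mines the shorter itemsets Apriori-style (breadth-first) and then switches to Eclat. The following is a minimal, sequential, single-machine sketch of the Eclat idea only (vertical tidsets plus tidset intersection), assuming transactions are given as `Set[String]`. The names `EclatSketch` and `eclat` are hypothetical; this is not Spark-FIM's implementation or API, and it does not use Spark.

```scala
// Hypothetical, sequential sketch of Eclat-style mining; not Spark-FIM's code.
object EclatSketch {
  type Itemset = List[String]

  /** Returns every itemset contained in at least `minSupport` transactions,
    * mapped to its support (the number of transactions containing it). */
  def eclat(transactions: Seq[Set[String]], minSupport: Int): Map[Itemset, Int] = {
    // Vertical layout: item -> ids of the transactions (tidset) containing it.
    val tidsets: Map[String, Set[Int]] =
      transactions.zipWithIndex
        .flatMap { case (items, tid) => items.map(item => item -> tid) }
        .groupBy(_._1)
        .map { case (item, pairs) => item -> pairs.map(_._2).toSet }

    // Frequent single items, sorted to fix the order in which items extend a prefix.
    val frequent1 = tidsets.filter(_._2.size >= minSupport).toList.sortBy(_._1)

    val result = scala.collection.mutable.Map.empty[Itemset, Int]

    // Depth-first search: the support of `prefix :+ item` is the size of its tidset.
    def extend(prefix: Itemset, candidates: List[(String, Set[Int])]): Unit =
      candidates.zipWithIndex.foreach { case ((item, tids), i) =>
        val itemset = prefix :+ item
        result(itemset) = tids.size
        // Intersect with later candidates only, so each itemset is generated once.
        val next = candidates.drop(i + 1).flatMap { case (other, otherTids) =>
          val common = tids intersect otherTids
          if (common.size >= minSupport) Some(other -> common) else None
        }
        if (next.nonEmpty) extend(itemset, next)
      }

    extend(Nil, frequent1)
    result.toMap
  }

  def main(args: Array[String]): Unit = {
    val transactions = Seq(
      Set("a", "b", "c"),
      Set("a", "c"),
      Set("a", "d"),
      Set("b", "c")
    )
    eclat(transactions, minSupport = 2).toList
      .sortBy(_._1.mkString(","))
      .foreach { case (itemset, support) =>
        val label = itemset.mkString("{", ",", "}")
        println(s"$label -> $support")
      }
  }
}
```

On these four toy transactions with `minSupport = 2`, `main` prints `{a} -> 3`, `{a,c} -> 2`, `{b} -> 2`, `{b,c} -> 2` and `{c} -> 3`; `d` is pruned before any intersection because it occurs only once. Working on tidsets is what makes Eclat attractive here: support counting becomes a set intersection instead of a rescan of the whole dataset.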

### Building the project

The project uses [sbt](http://www.scala-sbt.org/) as its build tool. To build the _.jar_ package, run:

    sbt package

### Running tests

Spark can simulate a distributed environment on a single local machine (local mode). To run all the tests, including those that run Spark in local mode, run:

    sbt test

### Running algorithms

All available algorithms can be configured through command-line arguments. To list the available parameters, run the driver class without any application arguments, passing the _.jar_ built with `sbt package` as the application resource:

    # DistEclat
    spark-submit --class "net.mkowalski.sparkfim.driver.DistEclatDriver" <path-to-spark-fim-jar>

    # BigFIM
    spark-submit --class "net.mkowalski.sparkfim.driver.BigFimDriver" <path-to-spark-fim-jar>
