https://github.com/mikemybytes/spark-fim
Frequent Itemsets Mining algorithms for Apache Spark
- Host: GitHub
- URL: https://github.com/mikemybytes/spark-fim
- Owner: mikemybytes
- License: mit
- Created: 2015-09-21T20:00:23.000Z (about 9 years ago)
- Default Branch: master
- Last Pushed: 2015-10-01T00:05:34.000Z (about 9 years ago)
- Last Synced: 2024-08-01T16:38:45.726Z (3 months ago)
- Language: Scala
- Size: 176 KB
- Stars: 5
- Watchers: 3
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Spark-FIM
Spark-FIM provides Frequent Itemsets Mining algorithms implemented in Scala for the Apache Spark platform.

Currently the following algorithms are available:

* _DistEclat_ (_Moens, Sandy, Emin Aksehirli, and Bart Goethals. "Frequent itemset mining for big data." Big Data, 2013 IEEE International Conference on. IEEE, 2013_; a MapReduce implementation is available [here](https://gitlab.com/adrem/bigfim-sa))
* _BigFIM_ (published and distributed together with _DistEclat_).

### Building project
The project uses [sbt](http://www.scala-sbt.org/) as its build tool. To build the _.jar_ package, run:

    sbt package

### Running tests

Spark can simulate a distributed environment on a local machine. To run all the tests (including those that start Spark in local mode), run:

    sbt test

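As an illustration of what "local mode" means, here is a minimal sketch, not taken from this project's test suite. It assumes Spark 1.3 or later on the classpath; the names `LocalModeSketch` and `itemSupports` are hypothetical. Setting the master to `local[2]` runs Spark inside the current JVM with two worker threads, so no cluster is required.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; it is not part of Spark-FIM.
object LocalModeSketch {

  /** Counts in how many transactions each item occurs (its support). */
  def itemSupports(sc: SparkContext, transactions: Seq[Set[String]]): Map[String, Int] =
    sc.parallelize(transactions)
      .flatMap(items => items.map(item => (item, 1))) // one (item, 1) pair per occurrence
      .reduceByKey(_ + _)                              // sum the occurrences per item
      .collect()
      .toMap

  def main(args: Array[String]): Unit = {
    // "local[2]": run Spark in this JVM with two worker threads.
    val conf = new SparkConf().setMaster("local[2]").setAppName("local-mode-sketch")
    val sc = new SparkContext(conf)
    try {
      val supports = itemSupports(sc, Seq(Set("a", "b"), Set("a", "c"), Set("a", "b", "c")))
      supports.toSeq.sortBy(_._1).foreach { case (item, n) => println(s"$item -> $n") }
    } finally {
      sc.stop() // always release the local Spark context
    }
  }
}
```

Wrapping the context in `try`/`finally` matters in tests: a leaked `SparkContext` can interfere with the next test that tries to create one in the same JVM.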
### Running algorithms

All available algorithms are configured through command line arguments. To see the available parameters, run the driver class without arguments:

    # DistEclat
    spark-submit --class "net.mkowalski.sparkfim.driver.DistEclatDriver"
    # BigFIM
    spark-submit --class "net.mkowalski.sparkfim.driver.BigFimDriver"
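
For readers unfamiliar with what these drivers compute, the following is a minimal single-machine sketch of the Eclat idea that DistEclat builds on: store, for each item, the set of transaction ids containing it, and grow itemsets depth-first by intersecting those sets. It is plain Scala with no Spark dependency and is not this project's implementation; `EclatSketch`, `toVertical`, `mine`, and `minSupport` are hypothetical names chosen for illustration.

```scala
// Hypothetical illustration of sequential Eclat; not Spark-FIM's code.
object EclatSketch {
  type Item = String
  type TidSet = Set[Int]

  /** Vertical layout: each item maps to the ids of the transactions containing it. */
  def toVertical(transactions: Seq[Set[Item]]): Map[Item, TidSet] =
    transactions.zipWithIndex
      .flatMap { case (items, tid) => items.map(item => item -> tid) }
      .groupBy(_._1)
      .map { case (item, pairs) => item -> pairs.map(_._2).toSet }

  /** Returns every itemset occurring in at least `minSupport` transactions, with its support. */
  def mine(transactions: Seq[Set[Item]], minSupport: Int): Map[Set[Item], Int] = {
    // `candidates` holds items that can extend `prefix`; their tid-sets are
    // already restricted to the transactions that contain `prefix`.
    def extend(prefix: Set[Item], candidates: List[(Item, TidSet)]): Map[Set[Item], Int] =
      candidates match {
        case Nil => Map.empty
        case (item, tids) :: rest =>
          val itemset = prefix + item
          // Intersect tid-sets to get the supports of the one-item-longer itemsets.
          val narrowed = rest
            .map { case (other, otherTids) => other -> (tids intersect otherTids) }
            .filter { case (_, t) => t.size >= minSupport }
          Map(itemset -> tids.size) ++ extend(itemset, narrowed) ++ extend(prefix, rest)
      }

    val frequentItems = toVertical(transactions)
      .filter { case (_, tids) => tids.size >= minSupport }
      .toList
      .sortBy(_._1) // fixed item order, so each itemset is generated exactly once

    extend(Set.empty, frequentItems)
  }
}
```

For example, with the five transactions `{a,b,c}`, `{a,b}`, `{a,c}`, `{b,c}`, `{a,b,c}` and `minSupport = 3`, `EclatSketch.mine` returns the three single items (support 4 each) and the three pairs (support 3 each); `{a,b,c}` occurs only twice and is pruned.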