https://github.com/lucivpav/bachelors-thesis
Source code of my Bachelor's thesis that was made at CTU FIT.
ctu-fit dnbc fit-ctu hmm mle scala spark thesis
- Host: GitHub
- URL: https://github.com/lucivpav/bachelors-thesis
- Owner: lucivpav
- License: gpl-3.0
- Created: 2018-05-09T19:15:44.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2018-05-09T20:20:47.000Z (over 7 years ago)
- Last Synced: 2025-02-02T05:26:17.668Z (10 months ago)
- Topics: ctu-fit, dnbc, fit-ctu, hmm, mle, scala, spark, thesis
- Language: TeX
- Size: 8.35 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Parallel implementation of dynamic naive Bayesian classifier
## Abstract
The dynamic naive Bayesian classifier (DNBC) has many applications, such as
speech recognition, handwriting recognition, and weather prediction. DNBC
extends a hidden Markov model by supporting multiple observed variables. These
variables are assumed to be mutually statistically independent. This
assumption greatly simplifies computation and avoids the curse
of dimensionality. I have implemented the classifier in
Scala on top of Apache Spark. The implementation can be parallelized
using the MapReduce paradigm. I achieved a twofold speedup when
using 15 processor cores. I have further demonstrated that the speedup can
be achieved not only by increasing the number of cores, but also by increasing
the number of machines in a cluster.
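
As a rough illustration of what the independence assumption buys, the joint distribution of a DNBC can be written in factored form. The notation below is an assumption made for this sketch and is not taken from the thesis: $S_t$ is the hidden state at time $t$, $O_t^{(k)}$ is the $k$-th of $K$ observed variables at time $t$, and $T$ is the sequence length. The independence is read here as holding between the observed variables given the hidden state, which is the usual formulation.

```latex
P\bigl(S_{1:T},\, O_{1:T}^{(1)}, \dots, O_{1:T}^{(K)}\bigr)
  = P(S_1)\,
    \prod_{t=2}^{T} P(S_t \mid S_{t-1})\,
    \prod_{t=1}^{T} \prod_{k=1}^{K} P\bigl(O_t^{(k)} \mid S_t\bigr)
```

If, for the sake of example, each observed variable is discrete with $V$ values and there are $N$ hidden states, an unfactored emission model would need $N(V^K - 1)$ free parameters, whereas the factored form needs only $N K (V - 1)$. The count grows linearly rather than exponentially in $K$, which is one way to see why the curse of dimensionality is avoided.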
## Paper
Download the paper [here](https://github.com/lucivpav/bachelors-thesis/raw/master/Pavel%20Lu%C4%8Div%C5%88%C3%A1k%20BP.pdf).