https://github.com/lucivpav/dnbc-scala

Parallel implementation of dynamic naive Bayesian classifier
Host: GitHub
URL: https://github.com/lucivpav/dnbc-scala
Owner: lucivpav
Created: 2018-02-13T14:54:15.000Z (almost 7 years ago)
Default Branch: master
Last Pushed: 2018-05-09T19:24:33.000Z (over 6 years ago)
Last Synced: 2024-11-01T14:36:54.592Z (3 months ago)
Topics: apache-spark, bayesian-networks, ctu-fit, dnbc, fit-ctu, naive-bayes-classifier, scala, spark
Language: Java
Size: 2.6 MB
Stars: 2
Watchers: 3
Forks: 0
Open Issues: 8
Metadata Files:
- Readme: README.md


# dnbc-scala

Parallel implementation of dynamic naive Bayesian classifier

Download the [paper](https://github.com/lucivpav/bachelors-thesis/raw/0cc4b877f1c41fdd7a91b923ce885bec820b7f0b/Pavel%20Lu%C4%8Div%C5%88%C3%A1k%20BP.pdf).

## Accuracy 

The data sets are based on the [Toy Robot data set](https://www.cs.princeton.edu/courses/archive/fall06/cos402/hw/hw5/hw5.html).

|Data set type                   |Average success rate [%]|
|--------------------------------|------------------------|
|Discrete                        |65                      |
|Continuous                      |42                      |
|Bivariate                       |76                      |
|Gaussian mixture (without hint) |96                      |
|Gaussian mixture (with hint)    |99                      |

The average success rate means the average percentage of hidden states inferred correctly.
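
As a minimal sketch of how such a metric could be computed (this is not code from the repository): the object `SuccessRate` and its methods are hypothetical names, hidden states are assumed to be encoded as `Int`, and the average is assumed to be taken over per-sequence accuracies rather than over all positions pooled together.

```scala
object SuccessRate {
  /** Fraction of positions where the inferred hidden state matches the true one. */
  def sequenceAccuracy(inferred: Seq[Int], actual: Seq[Int]): Double = {
    require(actual.nonEmpty && inferred.length == actual.length,
      "sequences must be non-empty and of equal length")
    val hits = inferred.zip(actual).count { case (i, a) => i == a }
    hits.toDouble / actual.length
  }

  /** Mean per-sequence accuracy over a testing set, expressed in percent.
    * Assumption: each element pairs an inferred sequence with its ground truth. */
  def averageSuccessRate(testSet: Seq[(Seq[Int], Seq[Int])]): Double = {
    require(testSet.nonEmpty, "testing set must not be empty")
    val accuracies = testSet.map { case (inferred, actual) =>
      sequenceAccuracy(inferred, actual)
    }
    100.0 * accuracies.sum / accuracies.length
  }
}
```

When all sequences have the same length, as in the performance data set below, averaging per sequence and averaging over all positions give the same number.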

There are two main reasons for the relatively low overall success rate:

1) Only about 90% of observed symbols are accurate

2) There are multiple transitions to hidden states with the same observed symbol

## Performance

### Data set

|Property                        |Value|
|--------------------------------|-----|
|Number of hidden states         |10   |
|Sequence length                 |200  |
|Observed discrete variables     |5    |
|Observed continuous variables   |5    |
|Learning set length (#sequences)|1000 |
|Testing set length (#sequences) |200  |
|Max Gaussians per mixture       |3    |
|Transitions per hidden state    |5    |

### Machine

|Property |Value                                  |
|---------|---------------------------------------|
|Processor|2× 8-core Intel Xeon E5-2650 v2 2.6 GHz|
|Memory   |15 GB                                  |
|Disk     |10 GB HDD                              |

### Results

|Property                |Workers=1|Workers=2|Workers=4|Workers=8|Workers=15|
|------------------------|---------|---------|---------|---------|----------|
|Learning time speed up  |1        |1.3      |1.5      |1.8      |2         |