https://github.com/lucivpav/dnbc-scala
Parallel implementation of dynamic naive Bayesian classifier
apache-spark bayesian-networks ctu-fit dnbc fit-ctu naive-bayes-classifier scala spark
- Host: GitHub
- URL: https://github.com/lucivpav/dnbc-scala
- Owner: lucivpav
- Created: 2018-02-13T14:54:15.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2018-05-09T19:24:33.000Z (over 6 years ago)
- Last Synced: 2024-11-01T14:36:54.592Z (3 months ago)
- Topics: apache-spark, bayesian-networks, ctu-fit, dnbc, fit-ctu, naive-bayes-classifier, scala, spark
- Language: Java
- Size: 2.6 MB
- Stars: 2
- Watchers: 3
- Forks: 0
- Open Issues: 8
Metadata Files:
- Readme: README.md
# dnbc-scala
Parallel implementation of dynamic naive Bayesian classifier

Download the [paper](https://github.com/lucivpav/bachelors-thesis/raw/0cc4b877f1c41fdd7a91b923ce885bec820b7f0b/Pavel%20Lu%C4%8Div%C5%88%C3%A1k%20BP.pdf).
## Accuracy
Data sets based on [Toy Robot data set](https://www.cs.princeton.edu/courses/archive/fall06/cos402/hw/hw5/hw5.html)

|Data set type |Average success rate [%]|
|--------------------------------|------------------------|
|Discrete |65 |
|Continuous |42 |
|Bivariate |76 |
|Gaussian mixture (without hint) |96 |
|Gaussian mixture (with hint) |99 |

The average success rate means the average percentage of hidden states inferred correctly.
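
As a rough illustration of the success rate metric, the sketch below compares inferred hidden states with the true ones, position by position. It is not taken from this repository: the `SuccessRate` object, its method names, and the encoding of hidden states as `Int` are assumptions made for the example. It also assumes the average is taken per sequence and then over the test set, which gives the same result as pooling all states when every sequence has the same length.

```scala
object SuccessRate {
  /** Percentage of positions where the inferred hidden state equals the true one. */
  def sequenceSuccessRate(inferred: Seq[Int], actual: Seq[Int]): Double = {
    require(inferred.nonEmpty && inferred.length == actual.length,
      "sequences must be non-empty and of equal length")
    val correct = inferred.zip(actual).count { case (i, a) => i == a }
    100.0 * correct / inferred.length
  }

  /** Mean of the per-sequence success rates over a test set of (inferred, actual) pairs. */
  def averageSuccessRate(testSet: Seq[(Seq[Int], Seq[Int])]): Double = {
    require(testSet.nonEmpty, "test set must not be empty")
    testSet.map { case (inferred, actual) => sequenceSuccessRate(inferred, actual) }.sum /
      testSet.length
  }
}
```

For example, `SuccessRate.sequenceSuccessRate(Seq(1, 2, 3, 4), Seq(1, 2, 0, 4))` counts three matches out of four positions.
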
There are two main reasons for the relatively low overall success rate:
1) Only about 90% of observed symbols are accurate
2) There are multiple transitions to hidden states with the same observed symbol

## Performance
### Data set
|Property |Value|
|--------------------------------|-----|
|Number of hidden states |10 |
|Sequence length |200 |
|Observed discrete variables |5 |
|Observed continuous variables |5 |
|Learning set length (#sequences)|1000 |
|Testing set length (#sequences) |200 |
|Max Gaussians per mixture |3 |
|Transitions per hidden state |5 |

### Machine
|Property |Value |
|---------|---------------------------------------|
|Processor|2× 8-core Intel Xeon E5-2650 v2 2.6 GHz|
|Memory |15 GB |
|Disk |10 GB HDD |

### Results
|Property |Workers=1|Workers=2|Workers=4|Workers=8|Workers=15|
|------------------------|---------|---------|---------|---------|----------|
|Learning time speed up |1 |1.3 |1.5 |1.8 |2 |
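
One way to read the speed-up figures is as parallel efficiency, that is, the measured speed-up divided by the number of workers. The sketch below only does this arithmetic on the values from the table above. The `SpeedUp` object and its member names are made up for the example and are not part of this project.

```scala
object SpeedUp {
  // (workers, learning time speed-up) pairs copied from the results table.
  val measured: Seq[(Int, Double)] =
    Seq(1 -> 1.0, 2 -> 1.3, 4 -> 1.5, 8 -> 1.8, 15 -> 2.0)

  /** Parallel efficiency: fraction of the ideal linear speed-up that was achieved. */
  def efficiency(workers: Int, speedUp: Double): Double = {
    require(workers > 0, "worker count must be positive")
    speedUp / workers
  }

  def main(args: Array[String]): Unit =
    measured.foreach { case (workers, speedUp) =>
      val percent = efficiency(workers, speedUp) * 100
      println(f"workers=$workers%2d speed-up=$speedUp%.1f efficiency=$percent%.1f%%")
    }
}
```

With these numbers, efficiency drops from 100% with one worker to about 13% with 15 workers (2 / 15).
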