https://github.com/banool/comp30018-assn2
Assignment 2 for COMP30018 - Knowledge Technologies. The report is the main part, python scikit code also included.
- Host: GitHub
- URL: https://github.com/banool/comp30018-assn2
- Owner: banool
- Created: 2016-10-01T06:28:55.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2017-10-19T03:53:23.000Z (over 8 years ago)
- Last Synced: 2025-03-14T23:43:22.575Z (over 1 year ago)
- Language: HTML
- Homepage:
- Size: 21 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: readme.md
# Knowledge Technologies Assignment 2
`ml.py` is the main script. The layout is a bit messy, but even though the code
is not marked, quite a bit of work went into making it easy to extend with both
additional classifiers and additional evaluation methods. It also pickles the
processed data pulled from the arff files, which cuts the time of each run
enormously (from around 5 minutes for the 446 training dataset down to a few
seconds).
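The caching idea can be sketched roughly as follows. This is a minimal illustration, not the code in `ml.py`: the function name `load_cached`, the `parse` callable and the `.pickle` suffix are all assumptions made for the example.

```python
import os
import pickle


def load_cached(source_path, parse, cache_path=None):
    """Return parse(source_path), reusing a pickled copy when one is fresh.

    `parse` is any callable that turns the source file into Python objects.
    Both the name and the signature are hypothetical, not taken from ml.py.
    """
    if cache_path is None:
        cache_path = source_path + ".pickle"

    # Reuse the cache only if it is at least as new as the source file.
    if (os.path.exists(cache_path)
            and os.path.getmtime(cache_path) >= os.path.getmtime(source_path)):
        with open(cache_path, "rb") as handle:
            return pickle.load(handle)

    # Slow path: parse the source once, then store the result for next time.
    data = parse(source_path)
    with open(cache_path, "wb") as handle:
        pickle.dump(data, handle, protocol=pickle.HIGHEST_PROTOCOL)
    return data
```

The first call pays the full parsing cost. Later calls only unpickle the stored result, which is where a saving like the one described above would come from.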
Run `ml.py` on its own and it will print usage information. Use it like this:
`./ml.py 446 nb macro`
The averaging type (e.g. `macro`) is optional.
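For readers unfamiliar with averaging types, the sketch below shows how macro and micro averaging differ for the F1 score. It assumes the averaging type has its usual meaning for multi-class metrics. The function `f1_scores` is purely illustrative and is not part of `ml.py`.

```python
def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for two equal-length label sequences."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0, 0.0

    per_class_f1 = []
    total_tp = total_fp = total_fn = 0
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # F1 = 2TP / (2TP + FP + FN), taken as 0 when the denominator is 0.
        denom = 2 * tp + fp + fn
        per_class_f1.append(2 * tp / denom if denom else 0.0)
        total_tp, total_fp, total_fn = total_tp + tp, total_fp + fp, total_fn + fn

    # Macro: average the per-class scores, so every class counts equally.
    macro = sum(per_class_f1) / len(per_class_f1)
    # Micro: pool the counts first, so every instance counts equally.
    micro_denom = 2 * total_tp + total_fp + total_fn
    micro = 2 * total_tp / micro_denom if micro_denom else 0.0
    return macro, micro
```

With true labels `["a", "a", "a", "a", "b"]` and every prediction `"a"`, the macro F1 is 4/9 (about 0.44) and the micro F1 is 0.8. Macro averaging penalises the missed minority class far more heavily.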
The supported datasets are 35 and 446. These represent the development datasets.
If you want to use the test sets instead, the syntax is 35test and 446test.
If you run the script with a test set, it will crash after generating the
predictions (since there are no labels to evaluate against), but everything
up to that point, including the predictions, will work fine.
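The command-line shape described above, plus one way of skipping evaluation for the unlabelled test sets, could look like the sketch below. It is not how `ml.py` is written. The helpers `parse_args`, `has_labels` and `report`, and the `evaluate` callable, are hypothetical names introduced for illustration.

```python
import argparse

# Dataset names come from the README; the "test" variants have no labels.
DATASETS = ("35", "446", "35test", "446test")


def parse_args(argv):
    """Parse `<dataset> <classifier> [averaging]` from a list of strings."""
    parser = argparse.ArgumentParser(prog="ml.py")
    parser.add_argument("dataset", choices=DATASETS)
    parser.add_argument("classifier")
    # nargs="?" makes the third positional argument optional.
    parser.add_argument("averaging", nargs="?", default=None)
    return parser.parse_args(argv)


def has_labels(dataset):
    """Only the development sets can be evaluated."""
    return not dataset.endswith("test")


def report(dataset, predictions, evaluate):
    """Return evaluation results, or None for an unlabelled test set.

    `evaluate` is a hypothetical callable that scores the predictions.
    """
    if not has_labels(dataset):
        return None
    return evaluate(predictions)
```

A guard like `has_labels` would let a run on `446test` finish cleanly after the predictions are written, instead of crashing in the evaluation step.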
The classifiers are `nb`, `svm` and `nbBoosted`. The last of these is left over
from an earlier idea where I would experiment with AdaBoost boosting and with
changing the kernel for the SVM.
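One way to keep classifiers easy to add is a small name-to-factory registry, sketched below with scikit-learn estimators. The mapping is an assumption for illustration only. In particular, reading `nb` as multinomial Naive Bayes, the linear default kernel and the helper `get_classifier` are guesses, not details confirmed by this README.

```python
def make_nb():
    # Assumption: "nb" is multinomial Naive Bayes; ml.py may use another variant.
    from sklearn.naive_bayes import MultinomialNB
    return MultinomialNB()


def make_svm(kernel="linear"):
    # The kernel is a parameter so that alternatives are easy to try.
    from sklearn.svm import SVC
    return SVC(kernel=kernel)


def make_nb_boosted(n_estimators=50):
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.naive_bayes import MultinomialNB
    # The base estimator is passed positionally because its keyword name
    # changed between scikit-learn releases.
    return AdaBoostClassifier(MultinomialNB(), n_estimators=n_estimators)


# Supporting a new classifier only requires one more entry here.
CLASSIFIERS = {
    "nb": make_nb,
    "svm": make_svm,
    "nbBoosted": make_nb_boosted,
}


def get_classifier(name):
    """Build a fresh, unfitted estimator for a command-line name."""
    if name not in CLASSIFIERS:
        raise ValueError("unknown classifier: " + repr(name))
    return CLASSIFIERS[name]()
```

The scikit-learn imports sit inside the factories so that only the chosen classifier's module is loaded.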
`plot_confusion_matrix.py` is a script from the scikit-learn documentation
which I modified. All it does is plot a graphical confusion matrix.
Source here: https://goo.gl/XwMr6N
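For context, the counts behind such a plot can be computed in a few lines. The sketch below is a plain-Python illustration, not the modified script. It assumes the common convention of rows for true labels and columns for predicted labels.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels=None):
    """Return (labels, matrix) where matrix[i][j] counts instances whose
    true label is labels[i] and whose predicted label is labels[j]."""
    if labels is None:
        labels = sorted(set(y_true) | set(y_pred))
    # Count each (true, predicted) pair once; missing pairs count as 0.
    counts = Counter(zip(y_true, y_pred))
    matrix = [[counts[(t, p)] for p in labels] for t in labels]
    return labels, matrix
```

The diagonal holds the correct predictions, so a good classifier produces a matrix whose weight sits mostly on the diagonal.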
Predictions have been included for the four primary datasets considered. If
marking requires only one set, use `predictedNB446test`, as it has the best
results. These are of course for the test data, not the dev data.
## Results
Final result: 14.5/15
Critical analysis: 6.5/7 || Technical Tasks: 1/1 || Creativity: 1/1 || Report quality: 3/3 || Reviews: 3/3