https://github.com/pnb/dlwed17
Unsupervised Deep Autoencoders for Feature Extraction with Educational Data
edm educational-data-mining
- Host: GitHub
- URL: https://github.com/pnb/dlwed17
- Owner: pnb
- License: mit
- Created: 2017-06-27T15:32:08.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2017-07-05T15:51:40.000Z (almost 9 years ago)
- Last Synced: 2025-08-04T10:07:05.497Z (11 months ago)
- Topics: edm, educational-data-mining
- Language: Python
- Homepage: http://pnigel.com
- Size: 4.36 MB
- Stars: 14
- Watchers: 1
- Forks: 4
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
# Unsupervised Deep Autoencoders for Feature Extraction with Educational Data
This repository contains the code for the paper (see `bosch-dlwed17-camera.pdf`) presented at the
_Deep Learning with Educational Data_ workshop at the
[2017 Educational Data Mining](http://educationaldatamining.org/EDM2017/) conference.
#### Citation
Bosch, N., & Paquette, L. (2017). Unsupervised deep autoencoders for feature extraction with educational data. In Deep Learning with Educational Data Workshop at the 10th International Conference on Educational Data Mining.
## Requirements
The code was tested with the Keras 2.0.3 and TensorFlow 1.1.0 neural network libraries.
Data were from [Betty's Brain](http://www.teachableagents.org/research/bettysbrain.php). These data
are required for the code to run, and are not publicly available. However, the code could be
(relatively) easily adapted to another dataset.
## Model-building steps
Model building generally consists of data preprocessing, autoencoder feature extraction, and
supervised learning phases.
### Data preprocessing
1. `preprocess_bromp.py` - takes raw BROMP (Baker Rodrigo Ocumpaugh Monitoring Protocol) files
created by the HART (Human Affect Recording Tool) application and combines them into an
easy-to-use format
2. `preprocess_timeseries.py` - creates timeseries (evenly spaced in time) data from Betty's Brain
interaction logs
3. `preprocess_seq.py` - creates sequences suitable for training RNN models from the timeseries
data; sequences are saved to numpy binary files for faster loading later
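
The two steps after BROMP preprocessing can be sketched as follows. This is a minimal illustration in plain Python, not the code in `preprocess_timeseries.py` or `preprocess_seq.py`: the event format, the action names, the 5-second step, and the helper names `to_timeseries` and `to_sequences` are all assumptions made for the example.

```python
from collections import Counter


def to_timeseries(events, step_secs, actions):
    """Bin (timestamp_secs, action) events into evenly spaced steps.

    Each output row holds the per-action counts for one time step.
    The event format is hypothetical, not the Betty's Brain log format.
    """
    if not events:
        return []
    start = min(t for t, _ in events)
    end = max(t for t, _ in events)
    n_steps = int((end - start) // step_secs) + 1
    bins = [Counter() for _ in range(n_steps)]
    for t, action in events:
        bins[int((t - start) // step_secs)][action] += 1
    # Steps with no events become rows of zeros, keeping the spacing even.
    return [[b[a] for a in actions] for b in bins]


def to_sequences(rows, seq_len, stride=1):
    """Cut a time series into fixed-length, overlapping sequences."""
    return [rows[i:i + seq_len]
            for i in range(0, len(rows) - seq_len + 1, stride)]


# Made-up interaction log: (seconds since session start, action name).
events = [(0.0, "read"), (1.5, "read"), (4.0, "quiz"), (11.0, "edit_map")]
actions = ["read", "quiz", "edit_map"]

series = to_timeseries(events, step_secs=5, actions=actions)
sequences = to_sequences(series, seq_len=2)
print(series)          # one row per 5-second step
print(len(sequences))  # number of length-2 sequences
```

Nested lists like `sequences` can be converted to an array and written with `numpy.save` if a binary file is wanted for faster loading.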
### Autoencoder feature extraction
1. `ae_lstm.py` - this and similar files (e.g., `vae_lstm.py`) train the autoencoders
2. `extract_embeddings.py` - takes a trained model, feeds in data sequences, and saves the
embeddings generated by the model to be used as features for supervised models
3. `align_embeddings+labels.py` - matches up BROMP affect/behavior labels to the embeddings
extracted from a model, saving only the rows with labels to create a file with features and labels
which can be used for supervised learning
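
The label-alignment step can be pictured as a keyed join that drops unlabeled rows. The sketch below is an assumption-laden illustration, not the logic of `align_embeddings+labels.py`: it assumes embeddings and labels share an exact `(student_id, timestamp)` key, and the `align` helper and the sample values are hypothetical.

```python
def align(embeddings, labels):
    """Keep only embedding rows that have a matching label.

    embeddings: iterable of (student_id, timestamp, vector)
    labels:     iterable of (student_id, timestamp, label)
    Matching on an exact (student_id, timestamp) key is an assumption.
    """
    label_by_key = {(sid, t): lab for sid, t, lab in labels}
    rows = []
    for sid, t, vector in embeddings:
        lab = label_by_key.get((sid, t))
        if lab is not None:
            rows.append((sid, t, list(vector), lab))
    return rows


# Made-up embeddings and observer labels for illustration only.
embeddings = [
    ("s1", 0, [0.1, 0.9]),
    ("s1", 20, [0.4, 0.2]),
    ("s2", 0, [0.7, 0.3]),
]
labels = [("s1", 20, "bored"), ("s2", 0, "engaged"), ("s3", 0, "confused")]

aligned = align(embeddings, labels)
for row in aligned:
    print(row)
```

Real observation timestamps rarely line up exactly with model time steps, so a practical version would likely match each label to the nearest or enclosing time step instead.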
### Supervised learning
1. `supervised/ae_feats_test.py` - trains a decision tree (CART) model with the autoencoder features
2. `supervised/expert_feats_extract.py` - extracts some simple features using the traditional
approach to feature extraction for model building (manual design by experts)
3. `supervised/expert_feats_test.py` - builds a model using the expert features to serve as a
baseline
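
For readers unfamiliar with CART, the core of the algorithm is choosing, for each feature, the threshold that minimizes the weighted Gini impurity of the two resulting groups. The sketch below shows that single step for one feature. It is an illustration only: the README does not say which decision tree implementation the scripts use, and the `gini` and `best_split` helpers and the sample data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(values, labels):
    """Find the 'value <= threshold' split with the lowest weighted Gini.

    Returns (threshold, weighted_gini); threshold is None if no split
    improves on the unsplit impurity.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate identical values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = ((lo + hi) / 2, score)
    return best


# One made-up embedding dimension and made-up affect labels.
feature = [0.1, 0.2, 0.3, 0.8, 0.9]
affect = ["bored", "bored", "bored", "engaged", "engaged"]

threshold, impurity = best_split(feature, affect)
print(threshold, impurity)
```

A full tree repeats this search over every feature and recurses into each side of the chosen split.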
## Visualization
`visualize_activations.py` generates images of model activations by feeding a random subset of
samples into a trained autoencoder and creating histograms of the activations of every layer in
the network. For layers with more than 15 neurons, a subset of neurons is sampled to create a more
tractable image.
The model structure is also visualized (requires the `pydot` package).
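
The neuron subsampling and per-neuron histograms described above can be sketched without any plotting library. This is a hypothetical illustration, not the code in `visualize_activations.py`: the activations are random numbers standing in for a real layer's output, sampling exactly 15 neurons from wide layers is an assumption (the README only gives the > 15 cutoff), and the helper names are invented.

```python
import random

MAX_NEURONS = 15  # cutoff from the README; the sampled count is assumed


def sample_neurons(n_neurons, max_neurons=MAX_NEURONS, seed=0):
    """Return sorted neuron indices, subsampled when the layer is wide."""
    if n_neurons <= max_neurons:
        return list(range(n_neurons))
    return sorted(random.Random(seed).sample(range(n_neurons), max_neurons))


def histogram(values, n_bins=10, lo=-1.0, hi=1.0):
    """Count values per equal-width bin; out-of-range values are clamped."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        idx = int((v - lo) / width)
        counts[min(max(idx, 0), n_bins - 1)] += 1
    return counts


# Fake activations for one layer: 50 samples x 40 neurons.
rng = random.Random(1)
activations = [[rng.uniform(-1, 1) for _ in range(40)] for _ in range(50)]

chosen = sample_neurons(len(activations[0]))
histograms = {j: histogram([row[j] for row in activations]) for j in chosen}
print(len(chosen), "neurons plotted out of", len(activations[0]))
```

Each entry of `histograms` could then be drawn as one panel of the layer's image.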