Unsupervised Deep Autoencoders for Feature Extraction with Educational Data
https://github.com/pnb/dlwed17

Host: GitHub
URL: https://github.com/pnb/dlwed17
Owner: pnb
License: mit
Created: 2017-06-27T15:32:08.000Z (almost 9 years ago)
Default Branch: master
Last Pushed: 2017-07-05T15:51:40.000Z (almost 9 years ago)
Last Synced: 2025-08-04T10:07:05.497Z (11 months ago)
Topics: edm, educational-data-mining
Language: Python
Homepage: http://pnigel.com
Size: 4.36 MB
Stars: 14
Watchers: 1
Forks: 4
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE.txt

README

# Unsupervised Deep Autoencoders for Feature Extraction with Educational Data
This repository contains the code for the paper (see `bosch-dlwed17-camera.pdf`) presented at the
_Deep Learning with Educational Data_ workshop at the
[2017 Educational Data Mining](http://educationaldatamining.org/EDM2017/) conference.

#### Citation
Bosch, N., & Paquette, L. (2017). Unsupervised deep autoencoders for feature extraction with educational data. In Deep Learning with Educational Data Workshop at the 10th International Conference on Educational Data Mining.

## Requirements
The code was tested with the Keras 2.0.3 and TensorFlow 1.1.0 neural network libraries.
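
The tested versions could be captured in a `requirements.txt` along these lines. This is a hypothetical sketch: the README does not say the repository ships such a file, and `pydot` is listed only because the visualization step needs it. TensorFlow 1.1.0 wheels were built for Python releases of that era, so an older interpreter will likely be needed.

```text
# Hypothetical requirements.txt matching the tested versions
keras==2.0.3
tensorflow==1.1.0
pydot  # only needed for model-structure visualization
```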

Data were from [Betty's Brain](http://www.teachableagents.org/research/bettysbrain.php). These data
are required for the code to run and are not publicly available, although the code could be adapted
to another dataset with relatively little effort.

## Model-building steps
Model building generally consists of data preprocessing, autoencoder feature extraction, and
supervised learning phases.

### Data preprocessing
1. `preprocess_bromp.py` - takes raw BROMP files created by the HART application and combines them
into an easy-to-use format
2. `preprocess_timeseries.py` - creates timeseries (evenly spaced in time) data from Betty's Brain
interaction logs
3. `preprocess_seq.py` - creates sequences suitable for training RNN models from the timeseries
data; sequences are saved to numpy binary files for faster loading later
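
The timeseries and sequence steps can be illustrated with a minimal, stdlib-only sketch. It assumes each log event is a `(timestamp_seconds, action_name)` pair, counts actions in evenly spaced 20-second bins, and then slides a fixed-length window over the bins to form RNN input sequences. The event format, bin width, action names, and the functions `to_timeseries` and `to_sequences` are all hypothetical and are not taken from `preprocess_timeseries.py` or `preprocess_seq.py`; the real scripts also save sequences as numpy binary files rather than returning lists.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical log event format: (timestamp in seconds, action name)
Event = Tuple[float, str]


def to_timeseries(events: Sequence[Event], actions: Sequence[str],
                  bin_seconds: float = 20.0) -> List[List[int]]:
    """Bin events into evenly spaced time steps of per-action counts."""
    if not events:
        return []
    start = min(t for t, _ in events)
    end = max(t for t, _ in events)
    n_bins = int((end - start) // bin_seconds) + 1
    bins = [Counter() for _ in range(n_bins)]
    for t, action in events:
        bins[int((t - start) // bin_seconds)][action] += 1
    # One row per time step, one column per action; empty bins become all zeros
    return [[b[a] for a in actions] for b in bins]


def to_sequences(series: List[List[int]], seq_len: int) -> List[List[List[int]]]:
    """Cut the series into overlapping windows of seq_len steps (stride 1)."""
    return [series[i:i + seq_len] for i in range(len(series) - seq_len + 1)]


# Hypothetical action names, for illustration only
events = [(0.0, "read"), (5.0, "edit_map"), (31.0, "quiz"), (65.0, "read")]
series = to_timeseries(events, actions=["read", "edit_map", "quiz"])
print(series)  # [[1, 1, 0], [0, 0, 1], [0, 0, 0], [1, 0, 0]]
print(len(to_sequences(series, seq_len=2)))  # 3
```

A stride of 1 maximizes the number of training sequences at the cost of heavily overlapping samples; a larger stride is a reasonable alternative when data are plentiful.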

### Autoencoder feature extraction
1. `ae_lstm.py` - this and similar files (e.g., `vae_lstm.py`) train the autoencoders
2. `extracy_embeddings.py` - takes a trained model, feeds in data sequences, and saves the
embeddings generated by the model to be used as features for supervised models
3. `align_embeddings+labels.py` - matches up BROMP affect/behavior labels to the embeddings
extracted from a model, saving only the rows with labels to create a file with features and labels
which can be used for supervised learning
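
The alignment step amounts to a join between labeled observations and embedding rows. The sketch below is a minimal illustration under stated assumptions: embeddings are keyed by student and by the time at which each sequence ends, and a BROMP observation is matched to that student's latest embedding at or before the observation, within a tolerance. The data layouts, the 20-second tolerance, and the function `align` are hypothetical and not taken from `align_embeddings+labels.py`; as in the README, anything without a match is dropped.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence, Tuple

# Hypothetical layouts: embeddings per student as (end-of-sequence time, vector);
# labels as (student id, observation time, affect/behavior label)
Embeddings = Dict[str, List[Tuple[float, List[float]]]]
Label = Tuple[str, float, str]


def align(embeddings: Embeddings, labels: Sequence[Label],
          tolerance: float = 20.0) -> List[Tuple[List[float], str]]:
    """Pair each label with the latest prior embedding; drop anything unmatched."""
    # Sort each student's embeddings by time once, up front
    index: Dict[str, Tuple[List[float], List[List[float]]]] = {}
    for student, rows in embeddings.items():
        rows = sorted(rows, key=lambda r: r[0])
        index[student] = ([t for t, _ in rows], [vec for _, vec in rows])
    aligned = []
    for student, obs_time, label in labels:
        times, vecs = index.get(student, ([], []))
        i = bisect_right(times, obs_time) - 1  # latest embedding at or before obs_time
        if i >= 0 and obs_time - times[i] <= tolerance:
            aligned.append((vecs[i], label))
    return aligned


embeddings = {"s1": [(40.0, [0.1, 0.9]), (20.0, [0.5, 0.5])],
              "s2": [(100.0, [0.3, 0.2])]}
labels = [("s1", 45.0, "engaged"), ("s2", 150.0, "bored"), ("s3", 10.0, "confused")]
print(align(embeddings, labels))  # [([0.1, 0.9], 'engaged')]
```

Here `s2` is dropped because its nearest embedding is 50 seconds old, and `s3` has no embeddings at all.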

### Supervised learning
1. `supervised/ae_feats_test.py` - trains a decision tree (CART) model with the autoencoder features
2. `supervised/expert_feats_extract.py` - extracts some simple features using the traditional
approach to feature extraction for model building (manual design by experts)
3. `supervised/expert_feats_test.py` - builds a model using the expert features to serve as a
baseline
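
CART grows a decision tree by repeatedly choosing the split that most reduces Gini impurity. As a rough illustration of that criterion only (this README does not describe how the scripts build their trees), the sketch below finds the best threshold on a single numeric feature, such as one autoencoder embedding dimension or one expert feature. The functions `gini` and `best_split` are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def gini(labels: Sequence[str]) -> float:
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(values: Sequence[float],
               labels: Sequence[str]) -> Tuple[Optional[float], float]:
    """Return (threshold, weighted Gini) of the best 'value <= threshold' split."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_threshold, best_score = None, gini(labels)  # not splitting is the baseline
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot place a threshold between identical values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        # Impurity of each side, weighted by the fraction of samples it holds
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint
            best_score = score
    return best_threshold, best_score


print(best_split([1.0, 2.0, 8.0, 9.0], ["bored", "bored", "engaged", "engaged"]))  # (5.0, 0.0)
```

This version recomputes impurities for every candidate threshold, which is quadratic in the number of samples; library implementations instead update class counts incrementally as the threshold moves.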

## Visualization
`visualize_activations.py` generates images of model activations by feeding a random subset of
samples to a trained autoencoder and creating histograms of the activations of every layer in the
network. For layers with many neurons (more than 15), a subset of neurons is sampled to keep the
image manageable.
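
The neuron-subsampling rule can be sketched without a deep learning library. Assuming a layer's activations are available as per-sample rows (one value per neuron), the sketch keeps at most 15 randomly chosen neurons and computes an equal-width histogram for each. The subset size of 15 is an assumption (the README only says that layers with more than 15 neurons are subsampled), as are the bin count, seed, and the functions `sample_neurons`, `histogram`, and `layer_histograms`; the real script renders images, whereas this only computes the counts such histograms would show.

```python
import random
from typing import Dict, List, Sequence

MAX_NEURONS = 15  # assumed subset size; wider layers are subsampled


def sample_neurons(n_neurons: int, rng: random.Random) -> List[int]:
    """Return all neuron indices, or a sorted random subset of MAX_NEURONS."""
    if n_neurons <= MAX_NEURONS:
        return list(range(n_neurons))
    return sorted(rng.sample(range(n_neurons), MAX_NEURONS))


def histogram(values: Sequence[float], bins: int = 10) -> List[int]:
    """Count values into equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant input: avoid division by zero
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1  # max lands in last bin
    return counts


def layer_histograms(activations: Sequence[Sequence[float]],
                     seed: int = 0) -> Dict[int, List[int]]:
    """Map each sampled neuron index to a histogram of its activations."""
    rng = random.Random(seed)
    neurons = sample_neurons(len(activations[0]), rng)
    return {j: histogram([row[j] for row in activations]) for j in neurons}


# 100 samples from a hypothetical 32-neuron layer
acts = [[float(i * j) for j in range(32)] for i in range(100)]
hists = layer_histograms(acts)
print(len(hists))  # 15
```

Sorting the sampled indices keeps the panels in layer order, and seeding the generator makes the chosen subset reproducible across runs.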

The model structure is also visualized (requires the `pydot` package).
