https://github.com/networks-learning/memorize

Code and real data for "Enhancing Human Learning via Spaced Repetition Optimization", PNAS 2019
algorithm control duolingo machine-learning pnas point-processes spaced-repetition

Host: GitHub
URL: https://github.com/networks-learning/memorize
Owner: Networks-Learning
License: mit
Created: 2019-01-08T17:41:12.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2023-01-10T15:40:38.000Z (over 2 years ago)
Last Synced: 2024-12-12T19:44:14.360Z (6 months ago)
Topics: algorithm, control, duolingo, machine-learning, pnas, point-processes, spaced-repetition
Language: Jupyter Notebook
Homepage: http://learning.mpi-sws.org/memorize/
Size: 1.01 MB
Stars: 181
Watchers: 14
Forks: 28
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# Memorize

This is a repository containing code and data for the paper:

> B. Tabibian, U. Upadhyay, A. De, A. Zarezade, B. Schölkopf, and M. Gomez-Rodriguez. _Enhancing Human Learning via Spaced Repetition Optimization._ Proceedings of the National Academy of Sciences (PNAS), March, 2019.

The paper is available [on the PNAS website](https://www.pnas.org/content/116/10/3988), and the [supporting website](http://learning.mpi-sws.org/memorize/) describes our algorithm in a nutshell.

As a follow-up to this work, we tested a variant of the algorithm presented here (named [Select](https://github.com/Networks-Learning/spaced-selection)) in the wild in a randomized trial and found that it performed significantly better than competitive baselines. We present those findings in the following [paper](https://www.nature.com/articles/s41539-021-00105-8):

> U. Upadhyay, G. Lancashire, C. Moser, and M. Gomez-Rodriguez. _Large-scale randomized experiment reveals machine learning helps people learn and remember more effectively._ npj Science of Learning, 6, Article number: 26 (2021).

## Pre-requisites

This code depends on the following packages:

1. `numpy`
2. `pandas`
3. `matplotlib`
4. `seaborn`
5. `scipy`
6. `dill`
7. `click`

Apart from this, the instructions assume that the [Duolingo dataset](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/N8XJME) has been downloaded, extracted, and saved at `./data/raw/duolingo.csv`.
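
Before running anything, it can help to confirm that the packages listed above are importable and that the dataset sits where the instructions expect it. The following is a minimal sketch and is not part of the repository; the helper names `missing_packages` and `dataset_present` are hypothetical.

```python
import importlib.util
from pathlib import Path

# Packages listed in the pre-requisites above.
REQUIRED = ["numpy", "pandas", "matplotlib", "seaborn", "scipy", "dill", "click"]

# Location of the dataset assumed by the instructions in this README.
DATASET = Path("./data/raw/duolingo.csv")


def missing_packages(names=REQUIRED):
    """Return the names in `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def dataset_present(path=DATASET):
    """Return True if a regular file exists at `path`."""
    return Path(path).is_file()


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    if not dataset_present():
        print("Dataset not found at", DATASET)
    if not missing and dataset_present():
        print("Environment looks ready.")
```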

## Code structure

- `memorize.py` contains the memorize algorithm.
- `preprocesed_weights.csv` contains estimated model parameters for the [HLR model](https://github.com/duolingo/halflife-regression), as described in Section 8 of the supplementary materials.
- `observations_1k.csv` contains a set of 1K user-item pairs, together with the number of total and correct attempts each user made on each item. This dataset has been curated from a larger dataset released by Duolingo, available [here](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/N8XJME).

## Execution

The code can be executed as follows:

`python memorize.py`

By default, the script uses the value of the parameter `q` that is set in the code.
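
To give a sense of what `q` controls, the sketch below follows the paper's main result that the reviewing intensity is u(t) = q^(-1/2) (1 - m(t)), where m(t) is the probability of recall. Here m(t) is taken to be exp(-n t), with forgetting rate n and t the time since the last review, and the next review time is sampled by thinning. This is an illustrative, standard-library-only sketch and not the code in `memorize.py`; the function names and the parameter values in the demo are assumptions.

```python
import math
import random


def recall_probability(t, n):
    """Exponential forgetting: probability of recall t time units after the last review."""
    return math.exp(-n * t)


def intensity(t, n, q):
    """Reviewing intensity u(t) = (1 - m(t)) / sqrt(q)."""
    return (1.0 - recall_probability(t, n)) / math.sqrt(q)


def sample_next_review(n, q, horizon, rng=random):
    """Sample the next review time in (0, horizon] by thinning; None if no review occurs."""
    max_intensity = 1.0 / math.sqrt(q)  # upper bound on u(t), since 1 - m(t) <= 1
    t = 0.0
    while True:
        # Propose the next candidate time from a homogeneous Poisson process.
        t += rng.expovariate(max_intensity)
        if t > horizon:
            return None
        # Accept the candidate with probability u(t) / max_intensity = 1 - m(t).
        if rng.random() < intensity(t, n, q) / max_intensity:
            return t


if __name__ == "__main__":
    rng = random.Random(0)
    for q in (0.1, 1.0, 10.0):
        print(q, sample_next_review(n=0.5, q=q, horizon=30.0, rng=rng))
```

Under these assumptions, a smaller `q` raises the intensity and so tends to schedule reviews sooner, while a larger `q` makes reviews rarer.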

----

# Experiments with Duolingo data

## Pre-processing

Convert the raw data to a Python `dict` keyed by `user_id, lexeme_id`, and prune it so that it is easier to load:

    python dataset2dict.py ./data/raw/duolingo.csv ./data/duo_dict.dill --success_prob 0.99 --max_days 30
    python process_raw_data.py ./data/raw/duolingo.csv ./data/duolingo_reduced.csv
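
The idea of keying the data by user and item can be sketched as follows. This is not the logic of `dataset2dict.py`: it only groups rows by `(user_id, lexeme_id)`, and it does not implement the pruning controlled by `--success_prob` and `--max_days`, whose exact behaviour is not described here. The column names are those of the public Duolingo dataset; the function name `group_by_user_item` and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def group_by_user_item(lines):
    """Group review rows by (user_id, lexeme_id), ordered by timestamp within each pair."""
    grouped = defaultdict(list)
    for row in csv.DictReader(lines):
        key = (row["user_id"], row["lexeme_id"])
        grouped[key].append(
            {"timestamp": int(row["timestamp"]), "p_recall": float(row["p_recall"])}
        )
    for reviews in grouped.values():
        reviews.sort(key=lambda review: review["timestamp"])
    return dict(grouped)


# Tiny made-up sample in the same column layout (not real data).
SAMPLE = """user_id,lexeme_id,timestamp,p_recall
u1,w1,200,1.0
u1,w1,100,0.5
u2,w1,150,0.0
"""

if __name__ == "__main__":
    data = group_by_user_item(io.StringIO(SAMPLE))
    for key, reviews in data.items():
        print(key, len(reviews))
```

For the real file, the same function could be called on an open file handle, for example `open("./data/raw/duolingo.csv", newline="")`, and the resulting dictionary could then be serialized with `dill.dump`.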

## Plots

See the notebook `plots.ipynb`.
