Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/networks-learning/memorize
Code and real data for "Enhancing Human Learning via Spaced Repetition Optimization", PNAS 2019
- Host: GitHub
- URL: https://github.com/networks-learning/memorize
- Owner: Networks-Learning
- License: mit
- Created: 2019-01-08T17:41:12.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2023-01-10T15:40:38.000Z (almost 2 years ago)
- Last Synced: 2024-04-16T02:14:56.070Z (7 months ago)
- Topics: algorithm, control, duolingo, machine-learning, pnas, point-processes, spaced-repetition
- Language: Jupyter Notebook
- Homepage: http://learning.mpi-sws.org/memorize/
- Size: 1.01 MB
- Stars: 173
- Watchers: 14
- Forks: 28
- Open Issues: 0
- Metadata Files:
  - Readme: README.md
  - License: LICENSE
README
# Memorize
This repository contains the code and data for the paper:
> B. Tabibian, U. Upadhyay, A. De, A. Zarezade, B. Schölkopf, and M. Gomez-Rodriguez. _Enhancing Human Learning via Spaced Repetition Optimization._ Proceedings of the National Academy of Sciences (PNAS), March 2019.
The paper is available [from the PNAS website](https://www.pnas.org/content/116/10/3988), and the [supporting website](http://learning.mpi-sws.org/memorize/) gives a description of our algorithm in a nutshell.
As a follow-up to this work, we tested a variant of the algorithm presented here (named [Select](https://github.com/Networks-Learning/spaced-selection)) in the wild by means of a randomized trial and found that it performed significantly better than competitive baselines. We present those findings in the following [paper](https://www.nature.com/articles/s41539-021-00105-8):
> U. Upadhyay, G. Lancashire, C. Moser, and M. Gomez-Rodriguez. _Large-scale randomized experiment reveals machine learning helps people learn and remember more effectively._ npj Science of Learning, 6, Article number 26 (2021).
## Pre-requisites
This code depends on the following packages:
1. `numpy`
2. `pandas`
3. `matplotlib`
4. `seaborn`
5. `scipy`
6. `dill`
7. `click`
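These can typically be installed with `pip` (a one-line sketch; adjust to your own environment, e.g. a virtualenv or conda):

`pip install numpy pandas matplotlib seaborn scipy dill click`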
Apart from this, the instructions assume that the [Duolingo dataset](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/N8XJME) has been downloaded, extracted, and saved at `./data/raw/duolingo.csv`.

## Code structure
- `memorize.py` contains the memorize algorithm.
- `preprocesed_weights.csv` contains estimated model parameters for the [HLR model](https://github.com/duolingo/halflife-regression), as described in section 8 of the supplementary materials (see the sketch after this list for how such weights are typically used).
- `observations_1k.csv` contains a set of 1K user-item pairs and the associated number of total/correct attempts by each user for the given items. This dataset has been curated from a larger dataset released by Duolingo, available [here](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/N8XJME).
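For orientation, here is a minimal sketch of how recall probability is computed under half-life regression (HLR), the model these weights parameterize. The feature names and file layout below are illustrative assumptions, not the exact format of `preprocesed_weights.csv`:

```python
import numpy as np

def hlr_recall_probability(weights, features, delta_days):
    """Half-life regression (HLR): half-life h = 2^(theta . x),
    recall probability p = 2^(-delta / h).

    `weights` and `features` are dicts keyed by feature name; the
    feature names used here are illustrative, not the repo's exact ones.
    """
    theta_dot_x = sum(weights[k] * features[k] for k in features)
    half_life = 2.0 ** theta_dot_x              # in days
    return 2.0 ** (-delta_days / half_life)

# Hypothetical example: a learner with some practice history,
# reviewed again 3 days after the last session.
weights = {"bias": 1.0, "sqrt_seen": 0.5, "sqrt_correct": 0.8}
features = {"bias": 1.0, "sqrt_seen": np.sqrt(10), "sqrt_correct": np.sqrt(8)}
print(hlr_recall_probability(weights, features, delta_days=3.0))
```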
## Execution

The code can be executed as follows:
`python memorize.py`
The script uses the default value of the parameter `q` set in the code.
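For intuition, below is a minimal sketch of the kind of thinning-based sampler MEMORIZE relies on, assuming an exponential forgetting curve m(t) = exp(-n · (t - t_last)) and the reviewing intensity u(t) = (1 - m(t)) / sqrt(q) from the paper; `memorize.py` is the authoritative implementation.

```python
import numpy as np

def sample_memorize_review_time(n, q, T, last_review=0.0, rng=None):
    """Sample the next review time on (last_review, last_review + T] by thinning.

    Assumes recall probability m(t) = exp(-n * (t - last_review)) and
    reviewing intensity u(t) = (1/sqrt(q)) * (1 - m(t)), which is bounded
    above by 1/sqrt(q), so standard thinning applies.
    """
    rng = rng or np.random.default_rng()
    max_intensity = 1.0 / np.sqrt(q)
    t = last_review
    while True:
        # Candidate event from a homogeneous Poisson process at the upper bound.
        t += rng.exponential(1.0 / max_intensity)
        if t > last_review + T:
            return None  # no review scheduled within the horizon T
        recall_prob = np.exp(-n * (t - last_review))
        # Accept with probability u(t) / max_intensity = 1 - m(t).
        if rng.random() < (1.0 - recall_prob):
            return t

# Hypothetical example: forgetting rate n = 0.1 per day, q = 1.0, 30-day horizon.
print(sample_memorize_review_time(n=0.1, q=1.0, T=30.0))
```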
----
# Experiments with Duolingo data
## Pre-processing
Convert the raw CSV to a Python `dict` keyed by `(user_id, lexeme_id)` and prune it for faster reading:
`python dataset2dict.py ./data/raw/duolingo.csv ./data/duo_dict.dill --success_prob 0.99 --max_days 30`
`python process_raw_data.py ./data/raw/duolingo.csv ./data/duolingo_reduced.csv`
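As a rough illustration of the kind of grouping `dataset2dict.py` performs (not the script itself), such a dictionary could be built with pandas; the column names below follow the public Duolingo dataset and are assumptions here:

```python
import dill
import pandas as pd

# Illustrative sketch only: group review events by (user_id, lexeme_id).
# Column names are assumed from the public Duolingo spaced-repetition dataset.
df = pd.read_csv("./data/raw/duolingo.csv")

duo_dict = {
    key: group.sort_values("timestamp").to_dict("records")
    for key, group in df.groupby(["user_id", "lexeme_id"])
}

with open("./data/duo_dict.dill", "wb") as f:
    dill.dump(duo_dict, f)
```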
## Plots

See the notebook `plots.ipynb`.