https://github.com/paveles/Machine_Learning_and_Equity_Index_Returns
Machine Learning and Equity Index Returns
- Host: GitHub
- URL: https://github.com/paveles/Machine_Learning_and_Equity_Index_Returns
- Owner: paveles
- License: mit
- Created: 2019-04-05T08:55:41.000Z (about 7 years ago)
- Default Branch: release
- Last Pushed: 2023-02-25T22:28:13.000Z (over 3 years ago)
- Last Synced: 2024-10-31T18:38:32.515Z (over 1 year ago)
- Topics: cross-validation, machine-learning, predictive-modeling, stock-market
- Language: Jupyter Notebook
- Homepage:
- Size: 62.1 MB
- Stars: 4
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: Readme.md
- License: LICENSE
Machine Learning and Equity Index Returns
==============================
## A time-series predictive framework that features:
- Application of advanced machine learning algorithms.
- Usage of scikit-learn pipelines that simplify automation of the analysis.
- New scikit-learn transformers.
- New scikit-learn time-series cross-validation methods (one step forward expanding window nested cross-validation).
- Domain-tailored statistical tests on the significance of improvement in prediction accuracy.
- [Jupyter notebooks and a report that explain and visualize obtained findings.](/reports/Results.ipynb)
- Clear project structure with a makefile based on a data science template.
- Dockerfile to run the project on any platform or in the cloud.
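To illustrate what "one step forward expanding window" cross-validation means, here is a minimal sketch in plain Python. The class name `ExpandingWindowSplit` and its `min_train_size` parameter are hypothetical and are not taken from this project's `transform_cv.py`; the sketch only follows the `split` / `get_n_splits` convention that scikit-learn splitters use. The nested part (tuning hyperparameters inside each training window) is not shown.

```python
class ExpandingWindowSplit:
    """Illustrative one-step-ahead expanding-window splitter.

    Hypothetical helper, not the project's own implementation.
    """

    def __init__(self, min_train_size):
        if min_train_size < 1:
            raise ValueError("min_train_size must be at least 1")
        self.min_train_size = min_train_size

    def get_n_splits(self, X=None, y=None, groups=None):
        if X is None:
            raise ValueError("X is required to count the splits")
        # One split per observation that has enough history before it.
        return max(len(X) - self.min_train_size, 0)

    def split(self, X, y=None, groups=None):
        n_samples = len(X)
        for test_start in range(self.min_train_size, n_samples):
            train_idx = list(range(test_start))  # all history up to now
            test_idx = [test_start]              # the single next period
            yield train_idx, test_idx


splitter = ExpandingWindowSplit(min_train_size=3)
splits = list(splitter.split(range(6)))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

Each training set starts at the first observation and grows by one period per step, so the model never sees data from the period it is asked to predict.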
## Project Organization
├── LICENSE
├── Makefile <- Makefile with commands like `make data` or `make train`
├── README.md <- The top-level README for using this project.
│
├── out
│ ├── expanding <- Output results for the models with one step ahead expanding window nested cross-validation.
│ └── rolling <- Output results for the models with one step ahead fixed rolling window nested cross-validation.
│
├── data
│ ├── processed <- The final, canonical data sets for modeling.
│ └── raw <- The original, immutable data dump.
│
│
├── notebooks <- Jupyter notebooks. Naming convention is a number (for ordering),
│ the creator's initials, and a short `-` delimited description, e.g.
│ `1.0-jqp-initial-data-exploration`.
│
├── reports <- Generated analysis as HTML, PDF, LaTeX, etc.
│ └── figures <- Generated graphics and figures to be used in reporting
│
├── requirements.txt <- The requirements file for reproducing the analysis environment, e.g.
│ generated with `pip freeze > requirements.txt`
│
├── setup.py <- Makes the project pip-installable (`pip install -e .`) so that `src` can be imported.
├── src <- Source code for use in this project.
│ ├── __init__.py <- Makes src a Python module.
│ │
│ ├── data.py <- Scripts to generate data.
│ │
│ ├── train.py <- Script to train models and then use trained models to make
│ │ predictions.
│ ├── visualize.py <- Scripts to create exploratory and results-oriented visualization.
│ │
│ ├── model_configs.py <- Configurations, GridSearch methods, cross-validation methods of the models.
│ ├── settings.py <- Global settings and variables + loads `model_configs`.
│ ├── transform_cv.py <- Transformation and cross-validation methods used in the analysis.
│ └── walkforward_functions.py <- Main functions used to estimate and evaluate trained models.
│
│
└── tox.ini <- tox file with settings for running tox; see tox.testrun.org
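The `out` folder above keeps results for expanding and fixed rolling windows apart. As a rough sketch of how a fixed rolling window differs, the generator below drops the oldest observation each time the window moves forward. The function name `rolling_window_splits` and its `window` parameter are hypothetical, chosen for illustration only.

```python
def rolling_window_splits(n_samples, window):
    """Yield (train_idx, test_idx) pairs for a one-step-ahead fixed rolling window.

    Hypothetical helper for illustration; not the project's implementation.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    for test_pos in range(window, n_samples):
        # Train on exactly the `window` most recent observations.
        train_idx = list(range(test_pos - window, test_pos))
        yield train_idx, [test_pos]


rolling = list(rolling_window_splits(n_samples=6, window=3))
for train_idx, test_idx in rolling:
    print("train:", train_idx, "test:", test_idx)
```

Unlike an expanding window, the training set keeps a constant length, which lets the model adapt to recent data at the cost of discarding older history.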
## Workflow
- Setup:
  - Clone or download this repository.
  - Install Make. See the [website](https://www.gnu.org/software/make/).
  - You can access help for the Makefile by typing `make` in the project folder.
  - Run `make create_environment` to create a new virtual environment. This new environment will be called "epml", an abbreviation for Equity Premium and Machine Learning.
  - Activate the new environment. In Anaconda, `conda activate epml`.
  - Add new packages to `requirements.txt` if needed.
  - Run `make requirements` to install packages.
- Analysis:
  - Activate the new environment before starting your analysis. In Anaconda, `conda activate epml`.
  - Run `make data` to prepare the data.
  - Change settings in `settings.py` to choose the models to be estimated and evaluated (for the first run, one simple model is already chosen).
  - Run `make train` to train the chosen models (please note that some models take many hours to run).
  - Run `make visualize` to get prediction accuracy and produce a figure summarizing strategy performance.
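As a rough illustration of what a prediction-accuracy figure can look like in return forecasting, the sketch below computes an out-of-sample R² against an expanding historical-mean benchmark. This is a common measure in the equity premium literature, but whether `make visualize` reports exactly this statistic is an assumption. The function names and the sample returns are hypothetical.

```python
def expanding_mean_forecasts(returns, min_train_size):
    """Benchmark forecast for period t: the mean of all returns before t."""
    return [sum(returns[:t]) / t for t in range(min_train_size, len(returns))]


def out_of_sample_r2(actual, predicted, benchmark):
    """1 - SSE(model) / SSE(benchmark); positive means the model beats the benchmark."""
    if not (len(actual) == len(predicted) == len(benchmark)):
        raise ValueError("all inputs must have the same length")
    sse_model = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sse_bench = sum((a - b) ** 2 for a, b in zip(actual, benchmark))
    if sse_bench == 0:
        raise ValueError("benchmark errors are all zero; R2 is undefined")
    return 1 - sse_model / sse_bench


# Made-up monthly returns, for illustration only.
returns = [0.01, -0.02, 0.03, 0.00, 0.02]
min_train_size = 2
actual = returns[min_train_size:]
benchmark = expanding_mean_forecasts(returns, min_train_size)

r2_perfect = out_of_sample_r2(actual, actual, benchmark)        # perfect foresight
r2_benchmark = out_of_sample_r2(actual, benchmark, benchmark)   # no improvement
print(r2_perfect, r2_benchmark)
```

A model that only matches the historical mean scores zero, so any positive value indicates an improvement over that naive forecast.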
## Setup Details for Windows
Installing Make on Windows can be challenging. These steps might help:
- To install Make on Windows, use a prebuilt [Installer](https://github.com/swcarpentry/windows-installer/releases/tag/v0.3) from Software Carpentry. Please add the Make directory to the system environment variable PATH, e.g. `C:\Users\Admin\.swc\lib\make`.
- You can test Make by typing `make` in the project folder, which shows the help for the Makefile.
- If Make still does not work, please install [MSYS2](https://www.msys2.org/) and add its `bin` directory to the system environment variable PATH, e.g. `C:\msys64\usr\bin`.
- Try restarting your computer.
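If it is unclear whether the PATH change took effect, a quick check from Python can help. The sketch below uses the standard library's `shutil.which`; the helper name `locate_executable` is hypothetical.

```python
import shutil


def locate_executable(name):
    """Return the full path of `name` if it is found on PATH, otherwise None."""
    return shutil.which(name)


make_path = locate_executable("make")
if make_path is None:
    print("make was not found on PATH; check the environment variable and reopen the terminal")
else:
    print(f"make found at {make_path}")
```

Run it from a newly opened terminal, since terminals that were already open usually do not pick up changes to PATH.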
## Alternatively, use Docker
To run this project via Docker:
- Install [Docker](https://www.docker.com/) on your computer.
- Build the Docker image in the project folder containing the Dockerfile: `docker image build --tag=epml .`.
- To run the container with the predefined code in `main.py`, type `docker run -v Absolute/Path/On/Localhost/out:/code/out/ epml:latest`. The `-v Absolute/Path/On/Localhost/out:/code/out/` option ensures that output produced in the Docker container is saved in the `/out/` folder of the project on your computer (please create this folder beforehand).
- Alternatively, run the container interactively (with the ability to modify files and run the make commands described in the workflow above inside the container): `docker container run -v Absolute/Path/On/Localhost/out:/code/out/ -it epml:latest bash`.
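Because the `-v` option needs an absolute host path and the `out` folder must exist beforehand, it can be convenient to assemble the command programmatically. The following is a minimal sketch under those assumptions; the helper name `docker_run_args` is hypothetical, and the temporary directory merely stands in for the project folder.

```python
import tempfile
from pathlib import Path


def docker_run_args(project_dir, interactive=False, image="epml:latest"):
    """Build the argument list for `docker run` with the out folder mounted.

    Hypothetical helper: it creates <project_dir>/out if missing and
    resolves it to an absolute path for the -v option.
    """
    out_dir = (Path(project_dir) / "out").resolve()
    out_dir.mkdir(parents=True, exist_ok=True)
    args = ["docker", "run", "-v", f"{out_dir}:/code/out/"]
    if interactive:
        # Open a shell in the container instead of running the default command.
        args += ["-it", image, "bash"]
    else:
        args.append(image)
    return args


# A temporary directory stands in for the project folder in this demo.
with tempfile.TemporaryDirectory() as tmp:
    batch = docker_run_args(tmp)
    interactive = docker_run_args(tmp, interactive=True)
    out_created = (Path(tmp) / "out").is_dir()

print(" ".join(batch))
print(" ".join(interactive))
```

The resulting list can be passed to `subprocess.run` once Docker is installed and the `epml` image has been built.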
--------
Project based on the cookiecutter data science project template. #cookiecutterdatascience