https://github.com/jeff1evesque/fin-654
Syracuse FIN-654 Final Project
- Host: GitHub
- URL: https://github.com/jeff1evesque/fin-654
- Owner: jeff1evesque
- Created: 2019-01-16T02:32:27.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2019-04-19T03:44:24.000Z (about 6 years ago)
- Last Synced: 2025-02-09T22:46:07.468Z (4 months ago)
- Language: R
- Homepage:
- Size: 1.87 MB
- Stars: 3
- Watchers: 2
- Forks: 1
- Open Issues: 3
Metadata Files:
- Readme: README.md
# fin-654
This project originally set out to study companies that had recently been hacked or otherwise made vulnerable. As the course progressed, however, the analysis shifted to the portfolio as a whole. The data also spans less than 1.5 years, a constraint inherited from the original data sources and the original project goal:
* [Privacy Rights Clearinghouse](https://www.privacyrights.org/data-breaches)
* [World's Biggest Data Breaches & Hacks](https://informationisbeautiful.net/visualizations/worlds-biggest-data-breaches-hacks/)

Specifically, the above data was merged with historical stock prices. To reduce development effort, the corresponding [stock data](https://github.com/jeff1evesque/fin-654/tree/master/data) was collected and stored locally. However, an [untested feature](https://github.com/jeff1evesque/fin-654/blob/8d3606fee63c0b4c9ad16f633c73c6287211a94b/app.R#L315) was also coded, allowing stock prices to be collected from [Quandl](https://www.quandl.com/).
## Dashboard
The dashboard shows the overall variance for selected companies within the portfolio. Since variance is a measure of risk, the smallest overall variance is preferred by more risk-averse investors:

However, if the overall time series shows a seasonal pattern, and a model with good predictive ability can be trained on it, then high volatility can become an investment opportunity.
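
The variance comparison described above can be sketched as follows. This is an illustration in Python rather than the project's R code; the tickers and prices are hypothetical, and measuring variance on simple returns (rather than on raw prices) is an assumption.

```python
from statistics import variance


def simple_returns(prices):
    """Convert a price series into period-over-period simple returns."""
    return [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]


def rank_by_variance(price_table):
    """Return (ticker, sample variance of returns) pairs, least volatile first."""
    scored = {
        ticker: variance(simple_returns(prices))
        for ticker, prices in price_table.items()
    }
    return sorted(scored.items(), key=lambda item: item[1])


# Hypothetical closing prices, for illustration only.
closing_prices = {
    "AAA": [100.0, 101.0, 100.5, 101.5, 102.0],
    "BBB": [50.0, 55.0, 48.0, 56.0, 47.0],
}

ranking = rank_by_variance(closing_prices)
for ticker, var in ranking:
    print(f"{ticker}: variance of returns = {var:.6f}")
```

The first entry of `ranking` is the least volatile stock, which is the one a risk-averse investor would tend to prefer.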
## Exploratory
Some exploratory analysis was conducted on individual company stocks. Specifically, time series plots were made, along with [autocorrelation function](https://en.wikipedia.org/wiki/Autocorrelation) (ACF) and [partial autocorrelation function](https://people.duke.edu/~rnau/411arim3.htm) (PACF) plots. Later analysis, however, focused on the collective portfolio rather than on individual time series. An overall decomposed time series was generated:

The decomposition consists of the following [components](https://machinelearningmastery.com/decompose-time-series-data-trend-seasonality/):
* Level: The average value in the series.
* Trend: The increasing or decreasing value in the series.
* Seasonality: The repeating short-term cycle in the series.
* Noise: The random variation in the series.

If more time were allocated to this project, an overall ACF and PACF would be computed to [determine](https://www.youtube.com/watch?v=R-oWTWdS1Jg) the autoregressive (AR) and moving average (MA) components of the ARIMA model below.
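
The four components listed above can be illustrated with a minimal additive decomposition. This is a Python sketch, not the project's R code: it assumes an additive model (series = trend + seasonality + noise), an odd seasonal period, and a synthetic series built for the example.

```python
def moving_average(series, window):
    """Centered moving average for an odd window; None where it does not fit."""
    half = window // 2
    smoothed = [None] * len(series)
    for i in range(half, len(series) - half):
        smoothed[i] = sum(series[i - half:i + half + 1]) / window
    return smoothed


def decompose_additive(series, period):
    """Split a series into (trend, seasonal, noise) for an odd period."""
    trend = moving_average(series, period)
    detrended = [
        x - t if t is not None else None for x, t in zip(series, trend)
    ]
    # Average the detrended values at each position within the cycle.
    seasonal_means = []
    for k in range(period):
        values = [d for d in detrended[k::period] if d is not None]
        seasonal_means.append(sum(values) / len(values))
    # Center the seasonal effect so it sums to zero over one cycle.
    offset = sum(seasonal_means) / period
    seasonal = [seasonal_means[i % period] - offset for i in range(len(series))]
    noise = [
        x - t - s if t is not None else None
        for x, t, s in zip(series, trend, seasonal)
    ]
    return trend, seasonal, noise


# Synthetic example: linear trend plus a repeating three-step pattern.
pattern = [1.0, 0.0, -1.0]
series = [10 + 0.5 * i + pattern[i % 3] for i in range(12)]

level = sum(series) / len(series)  # the average value in the series
trend, seasonal, noise = decompose_additive(series, period=3)
```

Because the synthetic series contains no random variation, the recovered noise component is zero wherever the trend is defined.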
## Generalized Pareto Distribution
A [generalized Pareto distribution](https://www.mathworks.com/help/stats/examples/modelling-tail-data-with-the-generalized-pareto-distribution.html) (GPD) was computed as a risk measure for the overall portfolio. Though some components were visually minimized, the GPD was computed for the overall opening price, closing price, and volume. Moreover, the [value at risk](https://github.com/jeff1evesque/fin-654/blob/master/resources/VAR.pdf) (VaR) is a measure of potential loss for a given portfolio, while the [expected shortfall](https://en.wikipedia.org/wiki/Expected_shortfall) (ES) is the average of all losses greater than the VaR. Both measures are provided with the below GPD distributions:

Since this project made some significant simplifications, the portfolio was equally distributed (one share each) among the selected stocks. Therefore, the corresponding risk measures are small in absolute terms.
**Note:** the user interface allows different segments to be toggled. Additionally, content on the above VaR was borrowed from [Professor Damodaran](http://people.stern.nyu.edu/adamodar/), from the Stern School of Business at New York University.
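
To make the two risk measures concrete, the sketch below estimates VaR and ES directly from a sample of losses. Note that this is an empirical (historical) estimate, swapped in for simplicity, and not the GPD fit used in the project; the function name, the quantile convention, and the loss values are assumptions for illustration.

```python
import math


def var_es(losses, confidence=0.95):
    """Empirical VaR and ES, where a positive number means a loss.

    VaR is taken as the smallest observed loss that at least a
    `confidence` share of the observations do not exceed. ES is the
    average of all losses strictly greater than the VaR; if no loss
    exceeds the VaR, ES falls back to the VaR itself.
    """
    ordered = sorted(losses)
    index = min(len(ordered) - 1, math.ceil(confidence * len(ordered)) - 1)
    value_at_risk = ordered[index]
    tail = [loss for loss in ordered if loss > value_at_risk]
    expected_shortfall = sum(tail) / len(tail) if tail else value_at_risk
    return value_at_risk, expected_shortfall


# Hypothetical daily portfolio losses (negative values are gains).
daily_losses = [-1.2, 0.4, 2.1, -0.3, 0.9, 3.5, -0.8, 1.7, 0.2, 5.0]
var_90, es_90 = var_es(daily_losses, confidence=0.90)
print(f"VaR(90%) = {var_90}, ES(90%) = {es_90}")
```

By construction ES is never smaller than VaR, since it averages only the losses beyond it.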
## Efficient Frontier
A general [efficient frontier](https://www.youtube.com/watch?v=PiXrLGMZr1g) was created, along with the tangency portfolio from the Markowitz model, which marks the most efficient portfolio. Moreover, individual stocks were also plotted:

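For intuition, the frontier and the tangency portfolio can be sketched for just two assets. This is a simplified Python illustration, not the project's R code: the expected returns, volatilities, correlation, and risk-free rate are hypothetical, the portfolio is restricted to long-only weights, and the tangency portfolio is found by a plain grid search over the Sharpe ratio.

```python
import math


def portfolio_stats(weight, mu, sigma, rho):
    """Expected return and standard deviation with `weight` in asset 0."""
    other = 1 - weight
    expected = weight * mu[0] + other * mu[1]
    var = (
        (weight * sigma[0]) ** 2
        + (other * sigma[1]) ** 2
        + 2 * weight * other * rho * sigma[0] * sigma[1]
    )
    return expected, math.sqrt(var)


def tangency_weight(mu, sigma, rho, risk_free=0.0, steps=1000):
    """Grid-search the long-only weight that maximizes the Sharpe ratio.

    Assumes the portfolio standard deviation is never zero on the grid.
    """
    best_weight, best_sharpe = 0.0, -math.inf
    for i in range(steps + 1):
        weight = i / steps
        expected, std = portfolio_stats(weight, mu, sigma, rho)
        sharpe = (expected - risk_free) / std
        if sharpe > best_sharpe:
            best_weight, best_sharpe = weight, sharpe
    return best_weight, best_sharpe


# Hypothetical inputs, for illustration only.
mu = (0.08, 0.12)      # expected annual returns
sigma = (0.10, 0.20)   # annual standard deviations
rho = 0.2              # correlation between the two assets

# Risk/return pairs along the long-only frontier.
frontier = [portfolio_stats(i / 10, mu, sigma, rho) for i in range(11)]
weight, sharpe = tangency_weight(mu, sigma, rho, risk_free=0.02)
```

With more than two assets, the same idea is usually solved with a quadratic optimizer rather than a grid search.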
## ARIMA Model
A general [ARIMA model](https://machinelearningmastery.com/arima-for-time-series-forecasting-with-python/) was computed for the overall portfolio:

A [stationarity](https://www.youtube.com/watch?v=ZIWyGjrAlks) test was implemented using the [augmented Dickey-Fuller test](https://www.youtube.com/watch?v=X8nGZ2UCJsk). Moreover, ACF and PACF measures suggest values for the AR and MA arguments, as an approach to reducing [seasonal patterns](https://github.com/jeff1evesque/fin-654/blob/master/resources/Slides_on_ARIMA_models--Robert_Nau.pdf). Furthermore, a general [mean squared error](https://en.wikipedia.org/wiki/Mean_squared_error) (MSE) was computed to allow comparison with the recurrent neural network below.
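
As a rough illustration of the ACF values mentioned above, the sketch below computes a sample autocorrelation function by hand, along with the differencing step that ARIMA applies to non-stationary data. It is a Python sketch with hypothetical helper names and data, not the project's R code.

```python
def difference(series, lag=1):
    """Difference a series: x[t] - x[t - lag]."""
    return [later - earlier for earlier, later in zip(series, series[lag:])]


def autocorrelation(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag (lag 0 is always 1)."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denominator = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t - k] for t in range(k, n)) / denominator
        for k in range(max_lag + 1)
    ]


# Hypothetical portfolio values with an upward drift.
values = [100.0, 102.0, 101.0, 104.0, 103.0, 107.0, 105.0, 109.0, 108.0, 112.0]

# Difference once to remove the drift, then inspect the first few lags.
acf = autocorrelation(difference(values), max_lag=3)
print([round(a, 3) for a in acf])
```

Lags where the ACF stays large after differencing are candidates for the MA order; the PACF plays the same role for the AR order.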
## Recurrent Neural Network
A [long short-term memory](https://www.youtube.com/watch?v=QuELiw8tbx8) (LSTM) recurrent neural network was created:
