{"id":18336035,"url":"https://github.com/maxim5/time-series-machine-learning","last_synced_at":"2025-04-06T16:14:52.578Z","repository":{"id":41192402,"uuid":"93741618","full_name":"maxim5/time-series-machine-learning","owner":"maxim5","description":"Machine learning models for time series analysis","archived":false,"fork":false,"pushed_at":"2021-08-19T11:42:03.000Z","size":186,"stargazers_count":376,"open_issues_count":14,"forks_count":104,"subscribers_count":31,"default_branch":"master","last_synced_at":"2025-03-30T15:09:18.260Z","etag":null,"topics":["bitcoin","blockchain","cryptocurrency","deep-learning","ethereum","financial-engineering","machine-learning","neural-network","poloniex-api","poloniex-trade-bot","python","quantitative-finance","recurrent-neural-networks","statistics","tensorflow","time-series","time-series-prediction","xgboost"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/maxim5.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-06-08T11:27:12.000Z","updated_at":"2025-03-16T22:09:16.000Z","dependencies_parsed_at":"2022-09-13T05:02:24.684Z","dependency_job_id":null,"html_url":"https://github.com/maxim5/time-series-machine-learning","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxim5%2Ftime-series-machine-learning","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxim5%2Ftime-series-machine-learning/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxim5%2
Ftime-series-machine-learning/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxim5%2Ftime-series-machine-learning/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/maxim5","download_url":"https://codeload.github.com/maxim5/time-series-machine-learning/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247509237,"owners_count":20950232,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bitcoin","blockchain","cryptocurrency","deep-learning","ethereum","financial-engineering","machine-learning","neural-network","poloniex-api","poloniex-trade-bot","python","quantitative-finance","recurrent-neural-networks","statistics","tensorflow","time-series","time-series-prediction","xgboost"],"created_at":"2024-11-05T20:05:40.614Z","updated_at":"2025-04-06T16:14:52.560Z","avatar_url":"https://github.com/maxim5.png","language":"Python","funding_links":[],"categories":["📦 Legacy \u0026 Inactive Projects"],"sub_categories":[],"readme":"# Time Series Prediction with Machine Learning\n\nA collection of Machine Learning models for time series prediction, \nspecifically for predicting the market price for a given currency chart and target.\n\n\u003cp align=\"center\"\u003e \n  \u003cimg src=\".images/btc_eth_prediction.png\" alt=\"BTC_ETH chart\" width=\"100%\"/\u003e\n  \u003cimg src=\".images/btc_ltc_prediction.png\" alt=\"BTC_LTC chart\" width=\"100%\"/\u003e\n\u003c/p\u003e\n\nRequirements\n------------\n\nRequired dependency: `numpy`. 
Other dependencies are optional, but to diversify the final models ensemble, \nit's recommended to install these packages: `tensorflow`, `xgboost`.\n\nTested with Python versions: 2.7.14, 3.6.0.\n\nFetching data\n-------------\n\nThere is one built-in data provider, which fetches the data from [Poloniex exchange](https://poloniex.com/exchange).\nCurrently, all models have been tested with crypto-currencies' charts.\n\nThe fetched data format is the standard security [OHLC trading info](https://en.wikipedia.org/wiki/Open-high-low-close_chart): \ndate, high, low, open, close, volume, quoteVolume, weightedAverage.\nHowever, the models are agnostic to the particular time series features and can be trained with a subset or superset of these features.\n\nTo fetch the data, run the [`run_fetch.py`](run_fetch.py) script from the root directory:\n\n```sh\n# Fetches the default tickers: BTC_ETH, BTC_LTC, BTC_XRP, BTC_ZEC for all time periods.\n$ ./run_fetch.py\n```\n\nBy default, the data is fetched for all time periods available in Poloniex (day, 4h, 2h, 30m, 15m, 5m) \nand is stored in the `_data` directory. 
You can specify the tickers and periods via command-line arguments.\n\n```sh\n# Fetches just BTC_ETH ticker data for only 3 time periods.\n$ ./run_fetch.py BTC_ETH --period=2h,4h,day\n```\n\n**Note**: subsequent runs *won't* fetch all charts from scratch, only the updates since the last run.\n\nTraining the models\n-------------------\n\nTo start training, run the [`run_train.py`](run_train.py) script from the root directory:\n\n```sh\n# Trains all models until stopped.\n# The defaults: \n# - tickers: BTC_ETH, BTC_LTC, BTC_XRP, BTC_ZEC\n# - period: day\n# - target: high\n$ ./run_train.py\n\n# Trains the models for specified parameters.\n$ ./run_train.py --period=4h --target=low BTC_BCH\n```\n\nBy default, the script trains all available methods (see below) with random hyper-parameters, cross-validates each model and\nsaves the resulting weights if the performance is better than the current average (the limit can be configured). \n\nAll models are placed in the `_zoo` directory (note: models saved early may perform much worse than \nlater ones, so feel free to clean up the models you're definitely not interested in, because they can only spoil \nthe final ensemble).\n\n**Note 1**: specifying multiple periods and targets will force the script to train all combinations of those. \nCurrently, the models *do not* reuse weights for different targets. In other words, if you set `--target=low,high`, \nthe script will train *different* models for `low` and for `high`.\n\n**Note 2**: under the hood, the models work with transformed data, \nin particular `high`, `low`, `open`, `close`, `volume` are transformed to *percent changes*. Hence, the predictions for these\ncolumns are also *percent changes*.\n\nMachine Learning methods\n------------------------\n\nCurrently supported methods:\n- Ordinary linear model. 
Even though it's very simple, as it turns out, the linear regression shows pretty good results\n  and complements the more complex models in the final ensemble.\n- Gradient boosting (using `xgboost` implementation).\n- Deep neural network (in `tensorflow`).\n- Recurrent neural network: LSTM, GRU, single- or multi-layered (in `tensorflow` as well).\n- Convolutional neural network for 1-dimensional data (in `tensorflow` as well).\n\nAll models take as input a window of a certain size (named `k`) and predict a single target value for the next time step. \nExample:\nwindow size `k=10` means that the model accepts the `(x[t-10], x[t-9], ..., x[t-1])` array to predict `x[t].target`. \nEach `x[i]` includes a number of features (open, close, volume, etc). Thus, the model takes `10 * features` values in\nand outputs a single value - percent change for the target column.\n\nInspecting the model\n--------------------\n\nSaved models consist of the following files:\n - `run-params.txt`: each model has the following run parameters:\n    - Ticker name, e.g., `BTC_ETH`.\n    - Time period, e.g., `4h`.\n    - Target column, e.g., `high` (means the model is predicting the next high price).\n    - Model class, e.g., `RecurrentModel`.\n    - The `k` value, which denotes the input length, \n      e.g., `k=16` with `period=day` means the model needs 16 days to predict the next one.\n - `model-params.txt`: holds the specific hyper-parameters that the model was trained with.\n - `stats.txt`: evaluation statistics (for both training and test sets, see the details below).\n - One or several files holding the internal weights.\n \nEach model is evaluated on both the training and test sets, but the final evaluation score is computed *only from the test set*.\n  \nHere's an example report:\n\n```\n# Test results:\nMean absolute error: 0.019528\nSD absolute error:   0.023731\nSign accuracy:       0.635158\nMean squared error:  0.000944\nSqrt of MSE:         0.030732\nMean error:          
-0.001543\nResiduals stats:     mean=0.0195 std=0.0238 percentile=[0%=0.0000 25%=0.0044 50%=0.0114 75%=0.0252 90%=0.0479 100%=0.1917]\nRelative residuals:  mean=1.1517 std=0.8706 percentile=[0%=0.0049 25%=0.6961 50%=0.9032 75%=1.2391 90%=2.3504 100%=4.8597]\n```\n\nYou should read it like this: \n - The model is on average `0.019528` or about 2% away from the ground truth percent change (absolute difference),\n   but only `-0.001543` away when the sign is taken into account. In other words, the model underestimates and overestimates the\n   target about equally, usually by 2%.\n - The standard deviation of residuals is also about 2%: `0.023731`, so it's rarely far off the target.\n - The model is right about the sign of the change 63% of the time: `0.635158`. \n   For example, this means that when the model says *\"Buy!\"*,\n   it may be wrong about how high the predicted price will be, but the price will go up in 63% of the cases.\n - Residuals and relative residuals show the percentiles of the error distribution. In particular, in 75% of the cases\n   the residual is less than 2.5% away from the ground truth and the relative residual is no more than 124%.\n   \n   Example: if `truth=0.01` and `prediction=0.02`, then `residual=0.01` (1% away) and `relative_residual=1.0` (100% larger).\n   \nFinally, the report is summarized into a single evaluation result, which is `mean_abs_error + risk_factor * sd_abs_error`.\nYou can vary the `risk_factor` to trade off average error against error spread: a higher value penalizes models whose errors vary more. \nBy default, `risk_factor=1.0`, hence the model above is evaluated at `0.0433`. Lower evaluation is better.\n\nRunning predictions\n-------------------\n\nThe [`run_predict.py`](run_predict.py) script downloads the current trading data for the selected currencies and runs an \nensemble of the best models (5 by default) that have been saved for these currencies, period and target. 
\nThe resulting prediction is the aggregated value of the constituent models' predictions.\n\n```sh\n# Runs an ensemble of the best models for the BTC_ETH ticker and outputs the aggregated prediction.\n# Default period: day, default target: high.\n$ ./run_predict.py BTC_ETH\n```\n\nLicense\n-------\n\n[Apache 2.0](LICENSE)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaxim5%2Ftime-series-machine-learning","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmaxim5%2Ftime-series-machine-learning","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaxim5%2Ftime-series-machine-learning/lists"}