{"id":13510744,"url":"https://github.com/jmrichardson/tuneta","last_synced_at":"2025-03-30T17:31:21.361Z","repository":{"id":38349873,"uuid":"331153611","full_name":"jmrichardson/tuneta","owner":"jmrichardson","description":"Intelligently optimizes technical indicators and optionally selects the least intercorrelated for use in machine learning models","archived":false,"fork":false,"pushed_at":"2023-10-13T21:26:28.000Z","size":730,"stargazers_count":411,"open_issues_count":5,"forks_count":66,"subscribers_count":13,"default_branch":"main","last_synced_at":"2024-11-01T11:34:49.777Z","etag":null,"topics":["correlation","finance","hyperparameter-optimization","machine-learning","optimize","optuna","pareto-front","stock-market","stocks","technical-analysis","technical-indicators","trading","trading-systems","tune"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jmrichardson.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2021-01-20T01:05:51.000Z","updated_at":"2024-10-26T12:16:53.000Z","dependencies_parsed_at":"2023-10-14T21:59:39.939Z","dependency_job_id":null,"html_url":"https://github.com/jmrichardson/tuneta","commit_stats":{"total_commits":165,"total_committers":5,"mean_commits":33.0,"dds":"0.10909090909090913","last_synced_commit":"9469c7268355cc749615df9bcde16afd2204695f"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jmrichardson%2Ftuneta","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/
repositories/jmrichardson%2Ftuneta/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jmrichardson%2Ftuneta/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jmrichardson%2Ftuneta/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jmrichardson","download_url":"https://codeload.github.com/jmrichardson/tuneta/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246355383,"owners_count":20763990,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["correlation","finance","hyperparameter-optimization","machine-learning","optimize","optuna","pareto-front","stock-market","stocks","technical-analysis","technical-indicators","trading","trading-systems","tune"],"created_at":"2024-08-01T02:01:52.590Z","updated_at":"2025-03-30T17:31:21.353Z","avatar_url":"https://github.com/jmrichardson.png","language":"Python","funding_links":[],"categories":["Python","trading"],"sub_categories":["Trading \u0026 Backtesting","交易与回测"],"readme":"\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://github.com/jmrichardson/tuneta\"\u003e\n    \u003cimg src=\"images/logo.png\" alt=\"tuneTA\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n[TuneTA](https://github.com/jmrichardson/tuneta) optimizes technical indicators using a [distance correlation](https://towardsdatascience.com/introducing-distance-correlation-a-superior-correlation-metric-d569dc8900c7) measure to a user defined target feature such as next day return.  
Indicator parameter(s) are selected using clustering techniques to avoid \"peak\" or \"lucky\" values.  The set of tuned indicators can be pruned by choosing those most correlated with the target while minimizing correlation with each other (based on a user-defined maximum correlation). TuneTA maintains its state so that the same tuned indicators can be added to multiple data sets (train, validation, test).\n\n### Features\n\n* Given financial prices (OHLCV) and a target feature such as return, TuneTA optimizes the parameter(s) of technical indicator(s) using distance correlation to the target feature. Distance correlation captures both linear and non-linear dependence, which is a significant benefit over the popular Pearson correlation.\n* Optimal indicator parameters are selected in a multi-step clustering process to avoid values which are not consistent with neighboring values, providing a more robust parameter selection.\n* Prunes indicators so that their correlation to each other stays within a user-defined maximum.  This is helpful for machine learning models which generally perform better with lower feature intercorrelation.\n* Supports tuning indicator(s) for single or multiple equities.  
Multiple equities can be combined into a market basket where indicator parameters are optimized across the entire basket of equities.\n* Multiple time ranges (e.g. short, medium and long)\n* Supports pruning preexisting features\n* Persists state to generate identical indicators on multiple datasets (train, validation, test)\n* Parallel processing for technical indicator optimization as well as correlation pruning\n* Supports technical indicators produced from the following packages:\n  * [Pandas TA](https://github.com/twopirllc/pandas-ta)\n  * [TA-Lib](https://github.com/mrjbq7/ta-lib)\n  * [FinTA](https://github.com/peerchemist/finta)\n* Correlation report of target and features\n* Early stopping\n\n### Overview\n\nTuneTA simplifies the process of optimizing many technical indicators while avoiding \"peak\" values, and optionally selecting the best indicators with minimal correlation to each other. At a high level, TuneTA performs the following steps:\n\n1.  For each indicator, [Optuna](https://optuna.org) searches for parameter(s) which maximize its correlation to a user-defined target (for example, next day return).\n2.  After the specified Optuna trials are complete, a 3-step KMeans clustering method is used to select the optimal parameter(s):\n\n    1. Each trial is placed in its nearest neighbor cluster based on its distance correlation to the target.  The optimal number of clusters is determined using the elbow method.  The cluster with the highest average correlation is selected with respect to its membership.  In other words, a weighted score is used to select the cluster that combines high correlation with a large number of trials.\n    2. After the best correlation cluster is selected, the parameters of the trials within the cluster are also clustered. Again, the best cluster of indicator parameter(s) is selected with respect to its membership.\n    3. Finally, the centered best trial is selected from the best parameter cluster.\n    \n3.  
Optionally, the tuned indicators can be pruned so that the remaining indicators stay within a user-defined maximum correlation to all other indicators.\n4.  Finally, TuneTA generates all optimized indicators.\n---\n\n### Installation\n\nNote: TA-Lib is force-reinstalled as the last step to ensure it is compiled correctly for the environment.  \n\n```python\npip install -U git+https://github.com/jmrichardson/tuneta\npip install --force-reinstall --no-cache-dir --no-deps TA-Lib\n```\n\nInstall the latest release:\n\n```python\npip install -U tuneta\npip install --force-reinstall --no-cache-dir --no-deps TA-Lib\n```\n\nInstall using Colab:\n\n```python\n!wget http://prdownloads.sourceforge.net/ta-lib/ta-lib-0.4.0-src.tar.gz\n!tar -xzvf ta-lib-0.4.0-src.tar.gz\n%cd ta-lib\n!./configure --prefix=/usr\n!make\n!make install\n!pip install Ta-Lib\n!pip install -U git+https://github.com/jmrichardson/tuneta\n!pip install -U git+https://github.com/DistrictDataLabs/yellowbrick.git\n!pip install numpy==1.20.3\n!pip install numba==0.54.1\n!pip install pandas==1.3.4\n!pip install scikit-learn==1.0.1\n```\n---\n\n### Examples\n\n* [Tune RSI Indicator](#tune-rsi-indicator)\n* [Tune Multiple Indicators](#tune-multiple-indicators)\n* [Tune and Prune all Indicators](#tune-and-prune-all-indicators)\n* [TuneTA fit usage](#tuneta-fit-usage)\n* [Tune Market](#tune-market)\n* [Prune Existing Features](#prune-existing-features)\n\n### Tune RSI Indicator\n\nFor simplicity, let's optimize a single indicator:\n\n* RSI Indicator\n* Two time periods (short and long term): 4-30 and 31-180\n* Maximum of 100 trials per time period to search for the best indicator parameter\n* Stop after 20 trials per time period without improvement\n\nThe following is a snippet of the complete example found in the examples directory:\n\n```python\ntt = TuneTA(n_jobs=4, verbose=True)\ntt.fit(X_train, y_train,\n    indicators=['tta.RSI'],\n    ranges=[(4, 30), (31, 180)],\n    trials=100,\n    early_stop=20,\n)\n```\n\nTwo studies are 
created for each time period with up to 100 trials to test different indicator length values.  The correlation values are displayed based on the trial parameter.  The best trial with its respective parameter value is saved for both time ranges. \n\nTo view the correlation of both indicators to the target return as well as each other:\n```python\ntt.report(target_corr=True, features_corr=True)\n```\n```csharp\nIndicator Correlation to Target:\n\n                         Correlation\n---------------------  -------------\ntta_RSI_timeperiod_19       0.23393\ntta_RSI_timeperiod_36       0.227434\n\nIndicator Correlation to Each Other:\n\n                         tta_RSI_timeperiod_19    tta_RSI_timeperiod_36\n---------------------  -----------------------  -----------------------\ntta_RSI_timeperiod_19                  0                        0.93175\ntta_RSI_timeperiod_36                  0.93175                  0\n```\nTo generate both RSI indicators on a data set:\n```python\nfeatures = tt.transform(X_train)\n```\n\n```csharp\n            tta_RSI_timeperiod_19  tta_RSI_timeperiod_36\nDate                                                    \n2011-10-03                    NaN                    NaN\n2011-10-04                    NaN                    NaN\n2011-10-05                    NaN                    NaN\n2011-10-06                    NaN                    NaN\n2011-10-07                    NaN                    NaN\n...                           ...                    
...\n2018-09-25              62.173261              60.713051\n2018-09-26              59.185666              59.362731\n2018-09-27              61.026238              60.210235\n2018-09-28              61.094793              60.241806\n2018-10-01              63.384824              61.305540\n```\n\n### Tune Multiple Indicators\n\nBuilding from the previous example, lets optimize a handful of indicators:\n\n* Basket of indicators from 3 different packages ([TA-Lib](https://github.com/mrjbq7/ta-lib), [Pandas-TA](https://github.com/twopirllc/pandas-ta), [FinTA](https://github.com/peerchemist/finta))\n* One time period: 4-60\n\n```python\ntt.fit(X_train, y_train,\n    indicators=['pta.slope', 'pta.stoch', 'tta.MACD', 'tta.MOM', 'fta.SMA'],\n    ranges=[(4, 60)],\n    trials=100,\n    early_stop=20,\n)\n```\n\nYou can view how long it took to optimize each indicator:\n```python\ntt.fit_times()\n```\n```csharp\n    Indicator      Times\n--  -----------  -------\n 1  pta.stoch      23.56\n 0  tta.MACD       12.03\n 2  pta.slope       6.82\n 4  fta.SMA         6.42\n 3  tta.MOM         5.7\n```\n\nLet's have a look at each indicator's distance correlation to target as well as each other:\n```python\n    tt.report(target_corr=True, features_corr=True)\n```\n```csharp\nIndicator Correlation to Target:\n                                                       Correlation\n---------------------------------------------------  -------------\ntta_MACD_fastperiod_43_slowperiod_4_signalperiod_52       0.236575\npta_stoch_k_57_d_29_smooth_k_2                            0.231091\npta_slope_length_15                                       0.215603\ntta_MOM_timeperiod_15                                     0.215603\nfta_SMA_period_30                                         0.080596\n\nIndicator Correlation to Each Other:\n                                                       tta_MACD_fastperiod_43_slowperiod_4_signalperiod_52    pta_stoch_k_57_d_29_smooth_k_2    pta_slope_length_15    
tta_MOM_timeperiod_15    fta_SMA_period_30\n---------------------------------------------------  -----------------------------------------------------  --------------------------------  ---------------------  -----------------------  -------------------\ntta_MACD_fastperiod_43_slowperiod_4_signalperiod_52                                               0                                 0.886265               0.779794                 0.779794             0.2209\npta_stoch_k_57_d_29_smooth_k_2                                                                    0.886265                          0                      0.678311                 0.678311             0.110129\npta_slope_length_15                                                                               0.779794                          0.678311               0                        1                    0.167069\ntta_MOM_timeperiod_15                                                                             0.779794                          0.678311               1                        0                    0.167069\nfta_SMA_period_30                                                                                 0.2209                            0.110129               0.167069                 0.167069             0\n\n```\n\nNotice above that both slope(15) and mom(15) are perfectly correlated in the intercorrelation report (indicated by value of 1) as well as having the same correlation to the target.  Initially, I thought this had to be a bug, but they are indeed identically correlated on a different scale (notice the same heat color coding):\n\n![](images/slope_mom.jpg)\n\nLets remove correlated indicators with a maximum threshold of .85 for demonstration purposes. Based on the above correlation report, the two indicator pairs that have a correlation of greater than .85 are MACD/Stoch and Slope/Mom.  
From each pair, we can easily remove the indicator that is less correlated to the target: Stoch is removed because MACD is more correlated to the target, and either slope or mom can be removed because both are identically correlated to the target.  Notice that all remaining indicators now have an intercorrelation less than .85:\n\n```python\ntt.prune(max_inter_correlation=.85)\n```\n```csharp\nIndicator Correlation to Target:\n                                                       Correlation\n---------------------------------------------------  -------------\ntta_MACD_fastperiod_43_slowperiod_4_signalperiod_52       0.236576\npta_slope_length_15                                       0.215603\nfta_SMA_period_6                                          0.099375\nIndicator Correlation to Each Other:\n                                                       tta_MACD_fastperiod_43_slowperiod_4_signalperiod_52    pta_slope_length_15    fta_SMA_period_6\n---------------------------------------------------  -----------------------------------------------------  ---------------------  ------------------\ntta_MACD_fastperiod_43_slowperiod_4_signalperiod_52                                               0                      0.779794            0.252834\npta_slope_length_15                                                                               0.779794               0                   0.188658\nfta_SMA_period_6                                                                                  0.252834               0.188658            0\n```\n\nAs in the previous example, we can easily create features:\n\n```python\nfeatures = tt.transform(X_train)\n```\n\n### Tune and Prune all Indicators\n\nBuilding from the previous examples, let's optimize all available indicators.  
Note the addition of min_target_correlation which removes indicators below target correlation threshold:\n\n\n```python\ntt.fit(X_train, y_train,\n    indicators=['all'],\n    ranges=[(4, 30)],\n    trials=500,\n    early_stop=100,\n    min_target_correlation=.05,\n)\n```\nAs in the previous examples we can see the correlation to the target with the report function:\n\n```python\ntt.report(target_corr=True, features_corr=False)\n```\nFor brevity, only showing the top 10 of the many results:\n```csharp\nIndicator Correlation to Target:\n                                                                              Correlation\n--------------------------------------------------------------------------  -------------\npta_natr_length_4_scalar_27                                                      0.253049\ntta_NATR_timeperiod_6                                                            0.247999\ntta_MACD_fastperiod_3_slowperiod_29_signalperiod_25                              0.240217\npta_macd_fast_3_slow_29_signal_25                                                0.240217\npta_pgo_length_26                                                                0.239584\npta_tsi_fast_28_slow_2_signal_25_scalar_15                                       0.238303\npta_smi_fast_29_slow_2_signal_20_scalar_26                                       0.238294\nfta_TSI_long_3_short_29_signal_26                                                0.234654\ntta_RSI_timeperiod_19                                                            0.23393\npta_rsi_length_19_scalar_26                                                      0.23393\n...\n```\n\nLet's prune the indicators to have a maximum of .7 correlation with any of the other indicators:\n\n```python\ntt.prune(max_inter_correlation=.7)\n```\nShow the correlation for both target and intercorrelation after prune:\n```python\ntt.report(target_corr=True, features_corr=True)\n```\nAgain, showing only top 10 rows of each for brevity 
(intercorrelation omitted as well):\n```csharp\n                                                       Correlation\n---------------------------------------------------  -------------\npta_natr_length_4_scalar_27                               0.253049\ntta_MACD_fastperiod_3_slowperiod_29_signalperiod_25       0.240217\npta_pvol_                                                 0.199302\npta_kc_length_3_scalar_27                                 0.193162\nfta_VZO_period_20                                         0.171986\nfta_DMI_period_4                                          0.148614\npta_pvo_fast_27_slow_28_signal_29_scalar_15               0.14692\npta_cfo_length_28_scalar_26                               0.141013\nfta_IFT_RSI_rsi_period_28_wma_period_4                    0.140977\npta_stc_fast_18_slow_27                                   0.140789\n...\n```\n\n### Tune Market\n\nTuneTA supports tuning indicators across a market of equities. Simply index the input dataframe by both date and symbol, similar to the following.  Notice that the dataframe still contains OHLCV but is indexed by both date and symbol (see tune_market.py in the examples folder):\n\n![](images/market_dataframe.png)\n\nUse TuneTA in the same way as in the previous examples.\n\n\n### Prune Existing Features\n\nIf you have preexisting features in your dataframe (regardless of whether you use TuneTA to create new ones), the prune_df helper function prunes all of the features based on intercorrelation.  This is helpful, for example, if you have custom features that you would like to combine with TuneTA features, keeping only those with maximum correlation to the target and minimal intercorrelation.  The prune_df helper function takes a dataframe and returns the column names of the features to keep.  
The column names can then be used to filter your datasets:\n\n```python\n# Features to keep\nfeature_names = tt.prune_df(X_train, y_train, min_target_correlation=.05, max_inter_correlation=.7, report=False)\n\n# Filter datasets\nX_train = X_train[feature_names]\nX_test = X_test[feature_names]\n```\n\nSee prune_dataframe.py in the examples folder.\n\n\n### TuneTA fit usage\n\ntt.fit(X, y, indicators, ranges, trials, early_stop, min_target_correlation)\n\nParameters:\n\n* indicators: List of indicators to optimize\n    * ['all']: All indicators\n    * ['pta']: All pandas-ta indicators\n    * ['tta']: All ta-lib indicators\n    * ['fta']: All fin-ta indicators\n    * ['tta.RSI']: RSI indicator from ta-lib\n    * See config.py for available indicators and the parameters that are optimized\n* ranges: List of time periods to optimize, each given as a (min, max) tuple\n    * [(2, 30)]: Single time period (2 to 30 days)\n    * [(2, 30), (31, 90)]: Two time periods (short and long term)\n* trials: Number of trials to search for optimal parameters\n* early_stop: Max number of trials without improvement\n* min_target_correlation: Minimum correlation to target required\n---\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjmrichardson%2Ftuneta","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjmrichardson%2Ftuneta","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjmrichardson%2Ftuneta/lists"}