https://github.com/khaledsharif/quantopian-ensemble-methods

Assisting repository for the published paper investigating ensemble methods in algorithmic trading.
Host: GitHub
URL: https://github.com/khaledsharif/quantopian-ensemble-methods
Owner: KhaledSharif
License: mit
Created: 2016-01-15T00:41:46.000Z (about 10 years ago)
Default Branch: master
Last Pushed: 2018-02-06T08:22:11.000Z (almost 8 years ago)
Last Synced: 2025-04-09T07:11:51.019Z (10 months ago)
Topics: classification-methods, day-trading, machine-learning, quantopian, simulation, trading-algorithms
Language: Python
Size: 8.79 KB
Stars: 43
Watchers: 4
Forks: 14
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

# Investigating Algorithmic Stock Market Trading using Efficient Ensemble Techniques

This is a companion repository for the published paper investigating ensemble methods in algorithmic trading. [The paper is publicly available at this link](https://drive.google.com/file/d/0B9hY6ZTULtEcbFdrQjhBbWh5ZXM/view?usp=sharing). It was written by Khaled Sharif and Mohammad Abu-Ghazaleh, and was supervised by Dr Ramzi Saifan. Below is a brief overview of the paper's contents, followed by a summary of its code and results.

## Abstract

Recent advances in the machine learning field have given rise to efficient ensemble methods that accurately forecast time-series. In this paper, we will use the Quantopian algorithmic stock market trading simulator to assess ensemble method performance in daily prediction and trading; simulation results show significant returns relative to the benchmark and strengthen the role of machine learning in stock market trading.



Figure 1: The graph above shows the cumulative returns of each of the three algorithms when working with 100 automatically selected stocks (selected at the start of each month) and using one classifier to predict the trading stocks. 

    

## Simulation Code    

The graph above can be produced by following the series of steps outlined by the Quantopian API. We begin by initializing the simulator with both machine-learning-specific variables and stock-trading-specific variables.

```python

import numpy as np
import talib
from sklearn.ensemble import ExtraTreesClassifier

def initialize(context):
    set_symbol_lookup_date('2012-01-01')

    # Parameters to be changed
    context.model = ExtraTreesClassifier(n_estimators=300)
    context.lookback = 14                # days per feature window
    context.history_range = 1000         # days of history used for training
    context.beta_coefficient = 0.0       # scales the SPY hedge in trade()
    context.percentage_change = 0.025    # used when labelling training samples
    context.maximum_leverage = 2.0
    context.number_of_stocks = 150
    context.maximum_pe_ratio = 8
    context.maximum_market_cap = 0.1e9   # 100 million
    context.starting_probability = 0.5   # base confidence threshold in trade()
    # End of parameters

    # Monthly: retrain the model and rebalance (rebalance is not shown here)
    schedule_function(create_model, date_rules.month_start(), time_rules.market_open())
    schedule_function(rebalance, date_rules.month_start(), time_rules.market_open())
    # Daily: trade at market open
    schedule_function(trade, date_rules.every_day(), time_rules.market_open())

    context.algorithm_returns = []
    context.longs = []
    context.shorts = []
    context.training_stocks = symbols('SPY')
    context.trading_stocks  = []
    context.beta = 1.0
    context.beta_list = []
    context.completed = False

```

Following the variable initialization, we define two functions that run monthly and daily, respectively. The first function creates the model and is outlined below. It runs once every month, on the first trading day of that month, and uses the last 1000 days (or whatever is defined by the `history_range` context variable) as training data for the machine learning algorithm.

```python

def create_model(context, data):
    X = []  # feature vectors, one per training window
    Y = []  # labels: +1 (up), -1 (down), 0 (no significant move)

    for S in context.training_stocks:
        recent_prices = history(context.history_range, '1d', 'price')[S].values
        recent_lows   = history(context.history_range, '1d', 'low')[S].values
        recent_highs  = history(context.history_range, '1d', 'high')[S].values
        recent_closes = history(context.history_range, '1d', 'close_price')[S].values

        # Average True Range signals relative to an earlier close
        atr = talib.ATR(recent_highs, recent_lows, recent_closes, timeperiod=14)
        prev_close = np.roll(recent_closes, 2)
        upside_signal = (recent_prices - (prev_close + atr)).tolist()
        downside_signal = (prev_close - (recent_prices + atr)).tolist()
        price_changes = np.diff(recent_prices).tolist()

        # Bollinger Bands; matype=1 selects an exponential moving average
        upper, middle, lower = talib.BBANDS(recent_prices,timeperiod=10,nbdevup=2,nbdevdn=2,matype=1)
        upper = upper.tolist()
        middle = middle.tolist()
        lower = lower.tolist()

        # Start at 15 to skip the indicator warm-up period (the first ATR values are NaN)
        for i in range(15, context.history_range-context.lookback-1):
            # One sample: `lookback` days of each of the six feature series
            Z = price_changes[i:i+context.lookback] + \
                upside_signal[i:i+context.lookback] + \
                downside_signal[i:i+context.lookback] + \
                upper[i:i+context.lookback] + \
                middle[i:i+context.lookback] + \
                lower[i:i+context.lookback]

            if (np.any(np.isnan(Z)) or not np.all(np.isfinite(Z))): continue
            X.append(Z)

            # Label by the price change that follows the window
            if abs(price_changes[i+context.lookback]) > abs(price_changes[i]*(1+context.percentage_change)):
                if price_changes[i+context.lookback] > 0:
                    Y.append(+1)
                else:
                    Y.append(-1)
            else:
                Y.append(0)
    context.model.fit(X, Y)

    

```
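The sliding-window construction used in `create_model` can be illustrated without Quantopian or TA-Lib. The following is a minimal sketch under stated assumptions: the function name `build_windows` and the sample prices are invented for illustration, and only daily price changes are used as features, whereas the code above also appends the ATR-based signals and the Bollinger Bands.

```python
def build_windows(prices, lookback=14, percentage_change=0.025):
    """Turn a price series into (features, labels) with a sliding window.

    Simplification (assumption): features are daily price changes only.
    """
    # changes[i] is the price difference between day i + 1 and day i
    changes = [later - earlier for earlier, later in zip(prices, prices[1:])]
    X, Y = [], []
    for i in range(len(changes) - lookback):
        window = changes[i:i + lookback]
        target = changes[i + lookback]  # the change right after the window
        # Same labelling rule as create_model: compare the next change with
        # the first change of the window, scaled by (1 + percentage_change)
        if abs(target) > abs(changes[i] * (1 + percentage_change)):
            label = 1 if target > 0 else -1
        else:
            label = 0
        X.append(window)
        Y.append(label)
    return X, Y


# Hypothetical prices, invented for illustration only
prices = [100, 105, 104, 109, 115, 114, 124]
X, Y = build_windows(prices, lookback=3)
print(X)  # three windows of three price changes each
print(Y)  # one label per window
```

Each row of `X` holds `lookback` consecutive changes, and the matching entry of `Y` describes the change on the following day, so the classifier only ever sees information that was available before the day it is asked to predict.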

Automatic stock selection is also done monthly, at the start of every month, and uses basic fundamental analysis to automatically select 100 stocks from the NYSE and NASDAQ stock markets in a way free of survivorship bias. 

```python

def before_trading_start(context):
    # Skip if the universe has already been selected
    if context.completed: return
    fundamental_df = get_fundamentals(query(fundamentals.valuation.market_cap)
        # Python's `or` does not build a SQL OR clause, so in_() is used to match either exchange
        .filter(fundamentals.company_reference.primary_exchange_id.in_(['NAS', 'NYSE']))
        .filter(fundamentals.valuation_ratios.pe_ratio < context.maximum_pe_ratio)
        .filter(fundamentals.valuation.market_cap < context.maximum_market_cap)
        .order_by(fundamentals.valuation.market_cap.desc())
        .limit(context.number_of_stocks))
    update_universe(fundamental_df.columns.values)

    # The columns of the returned DataFrame are the selected securities
    context.trading_stocks = [stock for stock in fundamental_df]
    context.completed = True

```
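The screening logic above (exchange filter, P/E ceiling, market-cap ceiling, largest first, capped count) can be sketched in plain Python. This is only an illustration: `select_universe`, the dictionary keys and the sample companies are hypothetical and are not part of the Quantopian API.

```python
def select_universe(companies, max_pe_ratio=8, max_market_cap=0.1e9, limit=150):
    """Filter by exchange, P/E and market cap, then keep the largest `limit` companies."""
    eligible = [
        c for c in companies
        if c["exchange"] in ("NAS", "NYSE")
        and c["pe_ratio"] < max_pe_ratio
        and c["market_cap"] < max_market_cap
    ]
    # Largest market capitalization first, mirroring order_by(...desc())
    eligible.sort(key=lambda c: c["market_cap"], reverse=True)
    return [c["symbol"] for c in eligible[:limit]]


# Hypothetical companies, invented for illustration only
companies = [
    {"symbol": "AAA", "exchange": "NYSE", "pe_ratio": 5.0, "market_cap": 80e6},
    {"symbol": "BBB", "exchange": "NAS", "pe_ratio": 7.5, "market_cap": 95e6},
    {"symbol": "CCC", "exchange": "NAS", "pe_ratio": 12.0, "market_cap": 50e6},  # P/E too high
    {"symbol": "DDD", "exchange": "NYSE", "pe_ratio": 6.0, "market_cap": 2e9},   # too large
    {"symbol": "EEE", "exchange": "OTC", "pe_ratio": 4.0, "market_cap": 60e6},   # other exchange
    {"symbol": "FFF", "exchange": "NYSE", "pe_ratio": 3.0, "market_cap": 40e6},
]
selected = select_universe(companies, limit=2)
print(selected)
```

Because the list is sorted in descending order before it is truncated, the limit keeps the largest companies that pass the small-cap and low-P/E filters.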

The final part of the simulator is the actual day-to-day trading that occurs in the simulator at the start of every trading day. When the one classifier method is used, the trading algorithm only takes action when the classifier is confident in its classification over a certain threshold (this was fixed to 60%). When the two classifier method is used, the trading algorithm only takes action when the two classifiers agree on a classification. For both methods, inaction meant that the current portfolio remained unchanged.
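The two decision rules described above can be sketched in isolation. This is a minimal illustration, assuming three classes ordered as -1, 0, +1 and a fixed 60% threshold; the function names are hypothetical, and `None` is used here to mean "take no action".

```python
def one_classifier_signal(probabilities, classes=(-1, 0, 1), threshold=0.6):
    """Return the most likely class if its probability exceeds the threshold, else None."""
    best = max(range(len(classes)), key=lambda k: probabilities[k])
    if probabilities[best] > threshold:
        return classes[best]
    return None  # not confident enough: leave the portfolio unchanged


def two_classifier_signal(prediction_a, prediction_b):
    """Return the shared prediction if both classifiers agree, else None."""
    return prediction_a if prediction_a == prediction_b else None


print(one_classifier_signal([0.1, 0.2, 0.7]))  # confident upward prediction
print(one_classifier_signal([0.3, 0.3, 0.4]))  # below the threshold
print(two_classifier_signal(1, 1))             # agreement
print(two_classifier_signal(1, -1))            # disagreement
```

Note that a returned `0` is a genuine prediction of "no significant move" and is different from `None`, which signals that no decision was reached.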

```python

def trade(context, data):
    # Do nothing while leverage is above the configured maximum
    if (context.account.leverage > context.maximum_leverage): return

    if not context.model: return

    # Keep only stocks that have data, are still listed and are not leveraged ETFs.
    # A new list is built because removing items while iterating skips elements.
    now = get_datetime()
    context.trading_stocks = [
        stock for stock in context.trading_stocks
        if stock in data
        and stock.security_end_date >= now
        and stock not in security_lists.leveraged_etf_list
    ]

    for one_stock in context.trading_stocks:
        if get_open_orders(one_stock): continue
        recent_prices = history(context.lookback+30, '1d', 'price')[one_stock].values
        recent_lows   = history(context.lookback+30, '1d', 'low')[one_stock].values
        recent_highs  = history(context.lookback+30, '1d', 'high')[one_stock].values
        recent_closes = history(context.lookback+30, '1d', 'close_price')[one_stock].values

        # Skip stocks with missing or non-finite data
        if (np.any(np.isnan(recent_prices)) or not np.all(np.isfinite(recent_prices))): continue
        if (np.any(np.isnan(recent_lows)) or not np.all(np.isfinite(recent_lows))): continue
        if (np.any(np.isnan(recent_highs)) or not np.all(np.isfinite(recent_highs))): continue
        if (np.any(np.isnan(recent_closes)) or not np.all(np.isfinite(recent_closes))): continue

        # Same features as in create_model
        atr = talib.ATR(recent_highs, recent_lows, recent_closes, timeperiod=14)
        prev_close = np.roll(recent_closes, 2)
        upside_signal = (recent_prices - (prev_close + atr)).tolist()
        downside_signal = (prev_close - (recent_prices + atr)).tolist()
        price_changes = np.diff(recent_prices).tolist()
        upper, middle, lower = talib.BBANDS(recent_prices,timeperiod=10,nbdevup=2,nbdevdn=2,matype=1)
        upper = upper.tolist()
        middle = middle.tolist()
        lower = lower.tolist()

        # Use the most recent `lookback` values of each series
        L = context.lookback
        Z = price_changes[-L:] + upside_signal[-L:] + downside_signal[-L:] + \
            upper[-L:] + middle[-L:] + lower[-L:]

        if (np.any(np.isnan(Z)) or not np.all(np.isfinite(Z))): continue

        # scikit-learn expects a 2-D array with one row per sample
        prediction = context.model.predict([Z])[0]
        predict_proba = context.model.predict_proba([Z])
        # Look up the probability of the predicted class through classes_
        probability = predict_proba[0][list(context.model.classes_).index(prediction)]

        # The required confidence rises with the portfolio's returns
        p_desired = context.starting_probability + 0.1*context.portfolio.returns

        if probability > p_desired:
            if prediction > 0:
                if one_stock in context.shorts:
                    order_target_percent(one_stock, 0)
                    context.shorts.remove(one_stock)
                elif one_stock not in context.longs:
                    context.longs.append(one_stock)

            elif prediction < 0:
                if one_stock in context.longs:
                    order_target_percent(one_stock, 0)
                    context.longs.remove(one_stock)
                elif one_stock not in context.shorts:
                    context.shorts.append(one_stock)

            else:
                # No significant move predicted: close the position
                order_target_percent(one_stock, 0)
                if one_stock in context.longs:    context.longs.remove(one_stock)
                elif one_stock in context.shorts: context.shorts.remove(one_stock)

    if get_open_orders(): return

    # Iterate over copies so that stale entries can be removed safely
    for one_stock in list(context.longs):
        if one_stock not in context.trading_stocks:
            context.longs.remove(one_stock)
        else:
            order_target_percent(one_stock, \
               context.maximum_leverage/(len(context.longs)+len(context.shorts)))

    for one_stock in list(context.shorts):
        if one_stock not in context.trading_stocks:
            context.shorts.remove(one_stock)
        else:
            order_target_percent(one_stock, \
              (-1.0)*context.maximum_leverage/(len(context.longs)+len(context.shorts)))

    # Hedge against the market with a short SPY position scaled by beta
    order_target_percent(symbol('SPY'),  \
       (-1.0)*context.maximum_leverage*(context.beta*context.beta_coefficient))

```
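The position sizing at the end of `trade` splits the maximum leverage equally across all open long and short positions and adds an SPY hedge scaled by beta. The arithmetic can be sketched as follows; `target_weights` and the ticker names are hypothetical and stand in for the `order_target_percent` calls above.

```python
def target_weights(longs, shorts, maximum_leverage=2.0, beta=1.0, beta_coefficient=0.0):
    """Equal-weight target percentages for longs and shorts, plus the SPY hedge."""
    weights = {}
    positions = len(longs) + len(shorts)
    if positions:
        per_position = maximum_leverage / positions
        for stock in longs:
            weights[stock] = per_position
        for stock in shorts:
            weights[stock] = -per_position
    # The hedge is a short SPY position proportional to beta * beta_coefficient
    weights["SPY"] = -maximum_leverage * beta * beta_coefficient
    return weights


# Hypothetical tickers, invented for illustration only
weights = target_weights(["A", "B"], ["C", "D"])
print(weights)
hedged = target_weights(["A"], [], beta=1.2, beta_coefficient=0.5)
print(hedged)
```

With the default `beta_coefficient` of 0.0 from `initialize`, the hedge term evaluates to zero, so the SPY order only has an effect once that parameter is raised.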

## Simulation Results

Table 1: The table below compares the average values of the alpha and beta coefficients over 12-month periods for each of the three classification methods when used in simulation over the time-period 2010 to 2015.

|               Classifier              | 12-month Alpha (One Classifier) | 12-month Alpha (Two Classifiers) | 12-month Beta (One Classifier) | 12-month Beta (Two Classifiers) |
|:-------------------------------------:|:-------------------------------:|:--------------------------------:|:------------------------------:|:-------------------------------:|
|        Random Forest Classifier       |               0.40              |               1.29               |              1.89              |               2.79              |
| Extremely Randomized Trees Classifier |               0.40              |               1.05               |              1.25              |               2.77              |
|      Gradient Boosting Classifier     |               0.62              |               1.37               |              1.74              |               4.70              |

Table 2: The table below compares the average values of the Sharpe, Sortino and Information ratios over 12-month periods for each of the three classification methods when used in simulation over the time-period 2010 to 2015.

    

    

|               Classifier              | Sharpe Ratio (One Classifier) | Sharpe Ratio (Two Classifiers) | Sortino Ratio (One Classifier) | Sortino Ratio (Two Classifiers) | Information Ratio (One Classifier) | Information Ratio (Two Classifiers) |
|:-------------------------------------:|:-----------------------------:|:------------------------------:|:------------------------------:|:-------------------------------:|:----------------------------------:|:-----------------------------------:|
|        Random Forest Classifier       |              2.26             |              3.42              |              4.06              |               5.28              |                0.11                |                 0.10                |
| Extremely Randomized Trees Classifier |              2.68             |              3.24              |              4.07              |               3.25              |                0.10                |                 0.10                |
|      Gradient Boosting Classifier     |              3.61             |              3.84              |              5.28              |               5.73              |                0.15                |                 0.13                |

Table 3: The table below compares the average values of the volatility and maximum draw-down indicators over 12-month periods for each of the three classification methods when used in simulation over the time-period 2010 to 2015.

|               Classifier              | Volatility (One Classifier) | Volatility (Two Classifiers) | Maximum Draw-down (One Classifier) | Maximum Draw-down (Two Classifiers) |
|:-------------------------------------:|:---------------------------:|:----------------------------:|:----------------------------------:|:-----------------------------------:|
|        Random Forest Classifier       |             0.24            |             0.35             |               11.55%               |                21.45%               |
| Extremely Randomized Trees Classifier |             0.23            |             0.49             |               11.69%               |                25.25%               |
|      Gradient Boosting Classifier     |             0.22            |             0.38             |               24.00%               |                24.02%               |
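
For readers unfamiliar with the two indicators in Table 3, the sketch below shows one common way to compute them from a series of periodic returns. It is an assumption-laden illustration: the function names, the 252-day annualization factor and the sample returns are not taken from the paper, and the simulator's exact conventions may differ.

```python
import math


def max_drawdown(returns):
    """Largest peak-to-trough decline of the cumulative value, as a fraction."""
    value, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        value *= 1.0 + r
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def annualized_volatility(returns, periods_per_year=252):
    """Sample standard deviation of returns, scaled to a yearly figure."""
    mean = sum(returns) / len(returns)
    variance = sum((r - mean) ** 2 for r in returns) / (len(returns) - 1)
    return math.sqrt(variance * periods_per_year)


# Hypothetical returns, invented for illustration only
drawdown = max_drawdown([0.10, -0.20, 0.05])
volatility = annualized_volatility([0.01, -0.01, 0.01, -0.01])
print(round(drawdown, 4), round(volatility, 4))
```

In the first example the value rises to 1.10 and then falls to 0.88, a decline of about 20% from the peak, which is the kind of figure reported in the draw-down columns above.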