{"id":18567108,"url":"https://github.com/diovisgood/qgen_lua","last_synced_at":"2025-07-20T01:05:08.470Z","repository":{"id":75449225,"uuid":"189212044","full_name":"diovisgood/QGEN_Lua","owner":"diovisgood","description":"Competing Genetic Algorithm to find profitable Trading Strategies on a financial market","archived":false,"fork":false,"pushed_at":"2019-05-29T11:40:31.000Z","size":56791,"stargazers_count":13,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-24T16:47:01.157Z","etag":null,"topics":["competing-genetic-algorithm","genetic-algorithm","lua","machine-learning","moex","trading-strategies"],"latest_commit_sha":null,"homepage":"","language":"Lua","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/diovisgood.png","metadata":{"files":{"readme":"readme.md","changelog":"history/SPFB.SBRF-12.10.txt","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-05-29T11:27:10.000Z","updated_at":"2024-11-02T10:59:00.000Z","dependencies_parsed_at":"2023-03-09T15:15:18.910Z","dependency_job_id":null,"html_url":"https://github.com/diovisgood/QGEN_Lua","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/diovisgood%2FQGEN_Lua","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/diovisgood%2FQGEN_Lua/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/diovisgood%2FQGEN_Lua/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/diovisgood%2FQGEN_Lua/manife
sts","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/diovisgood","download_url":"https://codeload.github.com/diovisgood/QGEN_Lua/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248163251,"owners_count":21057894,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["competing-genetic-algorithm","genetic-algorithm","lua","machine-learning","moex","trading-strategies"],"created_at":"2024-11-06T22:25:28.694Z","updated_at":"2025-07-20T01:05:08.458Z","avatar_url":"https://github.com/diovisgood.png","language":"Lua","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"left\"\u003e\n    \u003ca href=\"https://opensource.org/licenses/MIT\"\u003e\n        \u003cimg src=\"https://img.shields.io/badge/License-MIT-brightgreen.svg?style=flat-square\"\n            alt=\"MIT license\"\u003e\u003c/a\u003e \u0026nbsp;\n\u003c/p\u003e\n\n# QGEN Project - Competing Genetic Algorithm\n\n## Abstract\n\nIn this article I describe **Competing Genetic Algorithm** that was created during my work on **QGEN Project**.\n\nIt was a project for the task of finding new profitable _trading strategies_ in a financial market.\n\nIn this project by _trading strategy_ I mean: a **mathematical formula** that uses some datasets of past market data\nand produces a number.\nIf the number is positive we should trade **LONG** for a given instrument.\nIf it is negative - we should trade **SHORT**.\nOtherwise - **CLOSE** any position.\n\nAt first I describe well known **Genetic Algorithm** in details.\n\nThen I 
show its limitations for tasks of a particular kind,\n**when you can expect many different good solutions**.\nIn this case a simple Genetic Algorithm will most probably get stuck in a local optimum with one result.\n\nUsing an analogy with an imaginary example, I show the way to solve this problem: **competition for resources**.\n\nThen I describe the proposed **Competing Genetic Algorithm**.\nIts main principles are:\n- Evolve several species in parallel, each with its own population.\n- Make species compete for resources with each other.\nThis is achieved by adding a competing penalty (or regularizing) term to the scoring function.\n\n\n## Genetic Algorithm\n\n[Genetic Algorithm](https://en.wikipedia.org/wiki/Genetic_algorithm) has been well known since the 1960s.\nIt is used in optimization and search problems.\n\nIn particular, this project is a _search problem_.\nThere is an enormous space of all possible trading strategies, some of which are profitable, others are not.\n\nA brute-force search through this space seems impossible.\n\nThis is a good task for a Genetic Algorithm if we manage to set up all the conditions for it to work.\n\nA genetic algorithm requires several conditions to work:\n1. You **must** specify how to write down your formula as a sequence of 'symbols',\n either of fixed or variable length.\n2. Most important: you **must** specify the scoring (or fitness) function for any sequence.\n3. You **must** specify a _mutate_ operation for any such sequence.\n4. You **may** also specify a _crossover_ operation for any two (three, four, ...) such sequences.\n\nAs you will see later, a _sequence_ is not necessarily a _string_ and\n _symbols_ could be other than _letters_.\n\nAll living creatures on Earth have the same building 'symbols' for their DNA 'sequences':\n- Adenine (A)\n- Thymine (T)\n- Guanine (G)\n- Cytosine (C)\n\nFour 'letters': A, T, G, C - are combined into sequences of any length that describe all organisms of our planet.\n\nWhy only four? Nobody knows... 
I bet it happened by chance!\n\n**But we are not obliged to repeat the same limitations for our simulated evolution!**\n(Thanks to my friend Eldar Mukhamedzyanov for this idea)  \n\nWe don't need to emulate **chromosomes**, **start/stop codons**, **complementary mRNA strands**, or even **codons** themselves!\nInstead we can use an **array or list** as a _sequence_ and use **keywords** as _symbols_.\n\n### Formula as sequence\n\nAs you know, any formula can be written as a sequence of symbols.\nTake, for example, a formula in LaTeX format:\n```\n\\sqrt{ \\frac{ a ^ {2*x^3 - 5} } {(b*x - 5)}}\n```\n\nWhich is equivalent to:\n\n![math](img/math_formula_sample.png)\n\nThe problem with such a string is the **parentheses**.\nThey make it hard to _mutate_ or _crossover_ sequences,\nas you can't simply change any symbol in the string at any place or cut the sequence at any random place.\n\nLuckily there is a solution - [reverse Polish notation](https://en.wikipedia.org/wiki/Reverse_Polish_notation).\nIt helps to get rid of parentheses.\n\nWith it the formula sequence looks like this:\n```\na 2 x 3 ^ * 5 - ^ b x * 5 - / sqrt\n```\n\nA good example of reverse Polish notation usage is the [PostScript](https://en.wikipedia.org/wiki/PostScript) language,\na close relative of the PDF format you read every day.\n\nThis language also uses a **stack** as a way to get rid of variables.\nLet's see how our formula could look in a PostScript-like language:\n\n```\npush a\npush 2\npush x\npush 3\npow\nmul\npush 5\nsub\npow\npush b\npush x\nmul\npush 5\nsub\ndiv\nsqrt\n```\n\nAfter execution there will be one item on the stack, containing the result of this formula.\n\nThe same principle is used in this project.\nIn [`syntax.lua`](syntax.lua) you may find sequence generation, mutation and crossover operations.\nAnd in [`processor.lua`](processor.lua) you may find all operations.\n\n\u003e Note: each line here should be treated like one **symbol**.\n\u003e For instance, when mutating, you can change 
`pow` to `add` or `mul`,\n\u003e but not `pow` to `poZ`.\n\u003e Now you understand why I'm using **sequences**, not **strings**.\n\nI don't use a `push` command for operands like `a`, `b` or `x`. I use `Val` instead,\nwhile constants are simply written without any `push` or `Val`.\nI had some reasons to do it this way:\n\n```\na Val 2 x Val 3 Pow Mul 5 Sub Pow b Val x Val Mul 5 Sub Div Sqrt\n```\n\nThe string above should be treated like a sequence, a list of keywords,\neach keyword representing a _symbol_ recognizable by the syntax **processor**.\n\n### Datasets\n\nInstead of `a`, `b` and `x` in the previous example we need to operate on some real market data.\n\nThis data in most cases comes as a time series.\n\nThat is why I use **vectors** as operands,\nwhere the first element of a vector is the earliest one, and the last element is the newest one.\n\n**Datasets** of past market data are as follows:\n  - `po`: vector of open prices for intervals (10min, 1hour, 1day)\n  - `ph`: vector of highest prices for intervals (10min, 1hour, 1day)\n  - `pl`: vector of lowest prices for intervals (10min, 1hour, 1day)\n  - `pc`: vector of close prices for intervals (10min, 1hour, 1day)\n  - `ptr`: vector of price true range at intervals (10min, 1hour, 1day)\n  - `vwap`: vector of Volume-Weighted-Average-Prices for intervals (10min, 1hour, 1day)\n  - `vol`: vector of traded volumes for intervals (10min, 1hour, 1day)\n  - `time`: vector of close times for intervals (10min, 1hour), where each value is in [0..1]\n    such that 0 = start of trading session, 1 = end of trading session.\n\nFor more details dig into [`data.lua`](data.lua).\n\n### Operations\n\nTo build up any formula we need some **operations**.\nAll operations can be divided into several categories:\n- operations on a single operand\n- operations on two operands\n- stack control and other special operations\n\nList of single operand operations: `Delta`, `Min`, `Max`, `Sum`, `Prod`, `Rank`, `iRank`, `Std`, `SMA`, `WMA`,\n 
`Neg`, `Abs`, `Sign`, `Exp`, `Log`.\n\nList of two operands operations: `Add`, `Sub`, `Mul`, `Div`, `Mod`, `Pow`, `Min2`, `Max2`, `Covar`.\n\nList of stack control and other special operations: `Swap`, `Dup`, `Rep`, `RepN`, `RepM`, `If`, `Lt`, `Lte`, `Gt`, `Gte`.\n\n**Why are there so many operations?**\n\nOf course, some operations could be implemented as a sequence of other basic operations.\nFor instance: `x 5 *` can be written as `x x x x x + + + +`.\n\nBut my experiments show that it takes much longer for a genetic algorithm to reach a solution\nwhen there are fewer operations available.\n**The longer the target sequence - the lower the probability to get it randomly.**\n\nIt seems reasonable to have a good variety of operations,\nso that your algorithm can easily find new solutions.\n\nFor details you are encouraged to dig inside [`processor.lua`](processor.lua).\n\n### Library of Sequences\n\nI also developed an additional approach to speed up the evolution process.\nSome sequences are too large to wait for evolution to create them,\nbut they _seem to be useful_ in many trading strategies.\n\nFor example, the following sequence calculates the body of a candle, i.e. `|close_price - open_price|`:\n```\npc Val po Val Sub Abs\n```\n\nThe following example calculates the upper shadow of a candle:\n```\nph Val po Val Sub ph Val pc Val Sub Min2\n```\n\nThe next example acts like a filter that keeps only the first values of each trading session,\nzeroing other values out:\n```\nI time Delta 0 Lt Mul\n``` \n\nI came to a solution where, in some rare cases, instead of a regular _mutate_ operation\nthe genetic algorithm **inserts a random record from the library into a sequence**.\n\nThis approach is widespread in nature. 
For example,\n[antimicrobial resistance](https://en.wikipedia.org/wiki/Antimicrobial_resistance)\ncould be caused by\n[horizontal gene transfer](https://en.wikipedia.org/wiki/Horizontal_gene_transfer).\nIn this process microbes exchange some short portions of genetic information as\n_packages_ of DNA/RNA.\n\n![horizontal gene transfer](img/horiz_gen_trans.png)\nImage from (C) University of Leicester\n\nI believe that this method is helpful for my task though I didn't perform any measurements to prove it.\nPlease let me know your opinion, if any.\n\n### Genetic Algorithm and its parameters\n\nA naive approach for a genetic algorithm could be as follows:\n1. Start with an initially random sequence.\n2. Measure the score of the sequence.\n3. Mutate the current sequence to get a new mutant sequence.\n4. Measure the score of the mutant sequence.\n5. If it is greater - then make it the current sequence.\n6. Goto 3.\n\nThe problem with such an approach is that **it usually requires more than one mutation\nto achieve the next level**. \n\nFor instance, the deadly influenza virus H5N1 needs only\n[5 mutations](https://www.latimes.com/science/sciencenow/la-sci-sn-bird-flu-five-mutations-20140410-story.html)\nto become transmissible through coughing or sneezing, like regular flu viruses.\n\n![](img/evolution_unlikely.png)\n\nThis has not happened yet, because the probability of 5 exact mutations at once is low,\nthough it is not impossible.\n\n![](img/evolution_probable.png)\n\nIn order to speed up evolution we need to maintain a **variety of genes**.\nThat is why we need to work not with one sequence only, but with a **population of sequences**.\n\nSo our algorithm becomes as follows:\n1. Start with a population of initially random sequences.\n2. Measure the score of each sequence in the population.\n3. Drop the worst 10% of the population's sequences, keeping the other 90%.\n4. Use the remaining sequences to create new sequences via mutations and crossover,\nrestoring the population to its predefined size.\n5. 
Goto 2.\n\nFor this algorithm you have to specify several parameters:\n- population size,\n- preservation rate,\n- mutation factor.\n\n**Population size**: the larger - the better the variety of genes.\nBut large populations require more memory and computational resources.\n\nWhat would be the optimal population size?\nI would say it strongly depends on your task.\nFor example, there are still debates about the\n[optimal human population size](https://en.wikipedia.org/wiki/Optimum_population).\nChoose according to the computing power you have available.\nUsually you can choose a population size from 10 to 1000.\n\n**Preservation rate** (90% in the example above) is an adjustable parameter which determines the speed of population renewal vs. gene preservation.\nIf it is big the variety will be good. But when it is too big - the evolution slows down.\nI have found that values from 50% to 90% are good enough. \n\n**Mutation factor** specifies the probability of any symbol to change\nwhen new sequences are created from existing ones using the _mutation_ operation.\n\nNote that **mutation factor** does not have to be a constant value.\nI made it a function of overall score: the better the score the lower the mutation probability.\nSo it starts at 0.15 in the beginning and goes down to 0.01 for 'mature' sequences. 
\n\n**Population size**, **preservation rate** and **mutation factor** all together influence the speed of your evolution.\nI came to the conclusion that they don't have to be precisely tuned to achieve good results.\nThe evolution algorithm is pretty robust and can work in a wide range of these parameters' values.\n\nBut in the worst case, when population size is too small, or preservation rate is too low,\nor mutation factor is too high, you may run into a **genetic drift** problem.\n\n\u003e [Genetic drift](https://www.khanacademy.org/science/biology/her/heredity-and-genetics/a/genetic-drift-founder-bottleneck)\n\u003e is a mechanism of evolution in which allele frequencies of a population change over generations due to chance (sampling error).\n\u003e Genetic drift occurs in all populations of non-infinite size, but its effects are strongest in small populations.\n\nIf you notice that:\n- your population does not evolve;\n- all sequences are very similar to each other (low diversity),\n\nthen you should adjust your parameters.\n\n### Limitations of Genetic Algorithm\n\nThe algorithm presented above is good enough for most cases.\nFor example:\n- to find the best hyperparameters for a neural network in a large space of possible values,\n- to find a strong and lightweight structural design of some element,\n- to dynamically organize routing in telecommunication networks,\n- for trip, traffic and shipment routing,\n- to find the 3D shape of a long molecule,\n- etc.\n\n[These are tasks](https://www.brainz.org/15-real-world-applications-genetic-algorithms/)\nin which **you have a pretty good idea of what the result will be**.\n\nThe task for this project (finding profitable trading strategies), however, is more complicated,\nas it could have **a lot of good solutions**.\n\nTake for example an imaginary world.\nIt has an island with flies and grass, and algae in the sea.\nFlies, grass and algae are the available renewable resources.\n\n![](img/available_resources.png)\n\nSay you want 
to use an evolutionary algorithm to design new creatures that could feed on the available resources.\n\nIf you apply the genetic algorithm presented above you will most likely end up with creatures\nthat could eat either grass or flies or algae, but not all kinds of resources.\n\nThe algorithm will quickly evolve creatures to consume one kind of resource and it will ignore all other kinds.\nIt easily stops at a local optimum and has a low chance of escaping from it.\n**Evolution chooses the easiest way, not the best one**.\n\nYou will probably never get a creature that could feed on all kinds of resources, like this:\n\n![](img/magic_creature.png)\n\nIn the real world different organisms **compete with each other for food to survive**.\nThis led to the development of a large variety of species, where **each specie specializes in a particular resource**.\n\nThe best solution for this imaginary world would be\nto **evolve different species in parallel while making them compete for resources**.\n\nThen we could, for example, get to a solution with three species: **birds**, **cows** and **fish**,\nwhere:\n- birds eat flies,\n- cows eat grass,\n- fish eat algae.\n\n![](img/optimal_species.png)\n\nThe same logic can be applied to the financial market.\nThere could be many profitable trading strategies (= resources).\nThe problem is that the simple genetic algorithm described above can lead you to only one strategy per run.\nAnd if you make another run from scratch, it will most probably lead you to the same strategy again.\n\nWe need a modified **Competing Genetic Algorithm\nwhich evolves different species in parallel while making them compete for resources**.\n\n## Competing Genetic Algorithm\n\nIn this algorithm we should evolve not only one population,\nbut several species, each with its own population of sequences.\n\nBesides mutation and crossover operations, you should also specify a **competing function**,\nwhich compares the behaviour of two sequences and returns a value in the range [0...1],\nwhere 0 
means that sequences use completely different resources,\nand 1 means both sequences use exactly the same resources.\n\nThis competing function is needed to prevent two species from using the same resources.\nYou can use it as a **regularizing or penalizing term in the overall score** for each sequence, as follows:\n1. Compare the behaviour of the current sequence with the behaviour of the _best sequence of each other specie_\n and compute the **competing function** for each pair.\n2. Use the maximum result of the **competing function** in a regularizing term to calculate the final score of the current sequence.\n\nYou can either multiply the overall score by the regularizing term or subtract the term from it:\n\n![](img/score_mul_competing.gif)\n\nor\n\n![](img/score_sub_competing.gif)\n\nwhere gamma is the weight of the regularizing term. \n\nThe logic of this regularizing term is simple:\nwhen the current sequence behaves like the best sequence of any other specie\nit means that they are using the same 'resources'.\nThen its score should be low.\n\nPlease note that this regularizing term does not kick out the specie\nthat first took over this particular resource,\nbecause we don't compare each sequence with all other sequences in all species,\nbut with the _best sequences of each specie_ only.\n\nIt means that if any specie has evolved a formula to exploit some resource,\nthen it is protected from future claims for this resource.\n\nI.e. 
when a new mutant that wants to utilize the same resource later appears in some other specie,\nit is penalized by the **competing function** and, most probably, is thrown away.\n\n**How to implement the competing function?**\n\nIt depends on your task and the definition of its 'resources'.\nFor this project I used the\n[Pearson correlation coefficient](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient).\n\nJust to remind you: a trading strategy in this project is a formula that outputs a number for each time period.\nHence we get a vector of numbers as a result of each sequence.\n\n```\n[ 12 11 4  0  0 -10  -5  -7  4  -5  0  0 ]\n```\n\nFirst we need to apply a `sign` function to this vector, as we need only the sign of a number:\npositive means 'BUY', negative means 'SELL', zero means 'CLOSE position'.\n\n```\nsign( [ 12 11 4  0  0 -10  -5  -7  4  -5  0  0 ] )\n\n    = [ 1  1  1  0  0  -1  -1  -1  1  -1  0  0 ]\n```\n\nThen we can calculate the **Pearson correlation coefficient** for any two output vectors\nand use its absolute value as the result of the **competing function**.\n\n```\ncorr( [ 1  1  1  0  0  -1  -1  -1  1  -1  0  0 ],\n      [ 1  0  0  1  1  -1   0  -1  0  -1  0  1 ] )\n\n    = 0.538\n```\n\nPlease note that you should manually treat the special cases when a vector is constant (all zeroes, all 1 or all -1),\nsince its variance is zero and the correlation is undefined.\n\n**Now the algorithm itself.**\n\nYou can start with several species, each with a randomly initialized population.\nBut I suggest you start with one specie, adding new species one by one\nwhen old species have reached some level of maturity (= profitability in my case).\n\nSo, the proposed algorithm is as follows:\n1. If there are no species, or all species have reached some required level of maturity (= min_score)\n  then create a new specie with a randomly initialized population.\n2. 
For each specie:\n   - Run the simple genetic algorithm for several generations.\n Spend more time (= cycles) on younger species with low scores, and less time on mature species with high scores.\n   - Save the output signals of the best sequence of each specie to be used later in the competing function.\n   - During score calculation for each sequence take into account the output signals from other species.\n3. Delete species that have evolved for a long period of time (= iterations)\n but still have not reached the required maturity (= profitability).\n4. Goto 1.\n\nThis algorithm is implemented in [`main.lua`](main.lua).\nNote that it is designed to run in parallel in multiple instances\nand it synchronizes different processes through a file-locking mechanism.\nThis way it is able to evolve different species in parallel and speed up the whole evolution process.\n\n## Scoring Function\n\nThe Competing Genetic Algorithm was not hard to invent and to implement.\nI'm almost sure that someone else has already published a similar algorithm under another name,\nas all the ideas come from evolution in the real world.\n(Do you know of any?)\n\nIt was the creation of the **scoring function** that was the _hardest_ task in this project.\n\nWhen I say: **Evolution chooses the easiest way, not the best one**, I mean:\nyou will have to trace and close all the easiest paths to make evolution work as intended. \n\n**What are these easiest paths?**\n\nIf you did not remove the trend from the price time-series - the evolution will simply stick to this trend.\nAfter that it won't try to find some other trading strategy.\n\nIf you have a mistake in a dataset - be sure that the algorithm will exploit it.\n\nIf you made a mistake in some operation - the algorithm will find and exploit it.\n\nWhen you have fixed all errors in the datasets and program code,\nyou can start adjusting the scoring function.\n\nWhy does it need adjusting? 
Can I simply set:\n\n```\nScore(x) = NetProfit(x)\n```\n\nWell, unfortunately, no.\n\nBecause when you initialize random sequences they do not make any profit at all.\nThey all contain logical errors and most of them cannot be executed properly.\n\nSo you might want to support any sequence that performs at least some operations and has fewer computational errors:\n\n```\nScore(x) = NetProfit(x) + nOperations(x) - nErrors(x)\n```\n\nBut the result would be disappointing, as the genetic algorithm will quickly create a sequence that\nperforms a very large number of operations without any errors... and without any profit too!\n\nBecause it is much easier to increase the number of correct _meaningless_ operations and get a higher score\nthan to find any trading strategy and get a higher score.\nAnd like I said: **evolution chooses the easiest way, not the best one**.\n\nThen you may decide that **nOperations** should be around 10..100, not more.\nThat **nErrors** is less important.\nAnd that **Net Profit** is the most important part of the equation.\n\nAfter that you realize that **Net Profit** is not the best way to score a trading strategy.\nThat **Profit Factor** is better. 
And also **Annual Rate** and **Max Drawdown**.\n\nYou append a whole bunch of coefficients that specify the importance of this or that factor.\nSo your equation quickly becomes large, unreadable and ugly.\n\nBut somehow I managed to make it work.\nHere is a short list of factors that influence the score:\n- Annual Rate of Return\n- Max Drawdown Percent\n- Profit Factor\n- nOperations should be about 10..100\n- nErrors\n- nTrades performed should be about 25..500\n- sequence length is not too big\n- sequence does not address past data too far back\n- sequence outputs more than one item on stack after execution\n\nOf course, do not forget about the **competition penalty term** described above.\n\nFor details you may dig inside [`score.lua`](score.lua)\n\n## Running Evolution\n\nThis algorithm was applied to [Moscow Exchange](https://www.moex.com/) historical data.\n\nIn particular: futures on Sberbank stock ([SBRF](https://www.moex.com/ru/contract.aspx?code=SRU8\u0026utm_source=www.moex.com\u0026utm_term=sbrf)).\n\nAll settings for this instrument are located in the [`SBRF/config.lua`](SBRF/config.lua) file.\nFor instance:\n\n- Minimal frame interval: 600 seconds (= 10 minutes)\n- Population size: 100\n- Species count limit: 70\n- etc.\n\n### Creating datasets\n\nDatasets are created from raw market data: **history text files in CSV format with OHLCV values for 1 minute interval**.\n\nAll history files can be found in the [`history`](history) folder.\nSince a futures contract runs for a limited period of time there are 36 files for different contracts.\nThey cover the period: **Nov 2009 - Dec 2018**.\nBut the dataset is created only for the latest 24 contracts,\nas in trading the recent data is valued the most.\n\nYou can create datasets with this command:\n\n```bash\n$ luajit data.lua SBRF\nChanged dir to SBRF\nSearching for history files...\nScanning ../history for history files...\nPattern=^SPFB%.SBRF%-(%d+)%.(%d+)%.txt$\nFound 36 files:\n ../history/SPFB.SBRF-3.10.txt\n 
../history/SPFB.SBRF-6.10.txt\n ../history/SPFB.SBRF-9.10.txt\n\n...\n\n ../history/SPFB.SBRF-6.18.txt\n ../history/SPFB.SBRF-9.18.txt\n ../history/SPFB.SBRF-12.18.txt\nLoading latest 24 files...\nLoading ../history/SPFB.SBRF-3.13.txt...\n55101/55101 loaded 49865 candles, 10370 frames, 928 hours, 68 days\nLoading ../history/SPFB.SBRF-6.13.txt...\n63966/63966 loaded 56065 candles, 11430 frames, 1003 hours, 73 days\n\n...\n\nLoading ../history/SPFB.SBRF-6.18.txt...\n61708/61708 loaded 59576 candles, 12307 frames, 1096 hours, 80 days\nLoading ../history/SPFB.SBRF-9.18.txt...\n59820/59820 loaded 58763 candles, 12277 frames, 1113 hours, 81 days\nLoading ../history/SPFB.SBRF-12.18.txt...\n61686/61686 loaded 58630 candles, 12155 frames, 1087 hours, 79 days\nSaving dataset to dataset.gz file...\n```\n\n### Running evolution\n\nWhen you run `main.lua` it starts Competing Genetic Algorithm.\n\n```bash\n$ luajit main.lua SBRF\nLoaded library of 42 elements from library.dat\nLoaded 197 names from names.dat\nChanged dir to SBRF\nLoading Dataset...\nLoaded Dataset from dataset.gz\nDataset table: 24 array elements\nAquiring index...\nIndex has 70 records\nChecking if all species are profitable... False\n```\n\nAt this point algorithm takes a random specie - **Urania**\n(I used viking and greek gods names to identify different species, see [`names.dat`](names.dat)) -\nand displays current info about this specie:\n\n```bash\nOpening urania ... Ok\nspecie info table:\n  annual_rate = 0.0032012154172002\n  annual_rate_std = 0.010049071912475\n  best_code = \"I I I I I I I vwap fUp ... Lt Mul Neg Mul\"\n  best_exec = \"I I I I I I I vwap fUp ... Lt Mul Neg Mul\"\n  best_rate = -0.006847856495275\n  best_score = -10147.275735904\n  best_serial = 9420\n  best_text = \"fUp(vwap,15) 0.84771650492816 ... 
Lt Mul Neg Mul\"\n  dataset_last_time = 1533934208\n  epochs_count = 15\n  iterations_count = 300\n  last_iteration_time = 1534031248\n  last_serial = 12200\n  max_drawdown_percent = 0.0038353712914283\n  max_proto_corr = 0.38551992908031\n  name = \"urania\"\n  profit_factor = 1.3446103287993\n  signal_neg = 0.0055944781884644\n  signal_pos = 0.00047131879392101\n  update_count = 1\nUnlocking index\n```\n\nAfter that the script starts one epoch of evolution for the selected specie:\n\n```bash\n\n----------\n  URANIA\n----------\nInitializing specie urania\nDatabase has changed since last epoch. Purging cache...\nInitializing competitors:\nProcessing specie 70/70\nInitialized 62 concurrents\nCalculating mean entity length:\nEntities sequence length mean=40.72 std=3.6251344802641\nStarting epoch of evolution (Autosave_Interval=2 Iterations_In_Epoch=20)\nSBRF urania Iter: 301 Epochs: 15 Update: 0 Score: -10057.292644127 Winner: 70\nSBRF urania Iter: 302 Epochs: 15 Update: 0 Score: -10057.292644127 Winner: 71\nAutosaving specie urania\nSBRF urania Iter: 303 Epochs: 15 Update: 0 Score: -10057.292644127 Winner: 72\n...\n```\n\nI recommend spending more computing time on new species with low scores\nand less time on mature species with high scores.\nI achieve this by **adjusting the probability** of selecting a specie for the next round.\n\nYou may run several `main.lua` scripts on one system.\nEach script uses 2 CPU cores.\nSay, if you have 8 virtual cores - you can run 4 processes in parallel to achieve maximum speed of evolution on this PC.\n\nAs stated before, processes interact with each other using a file-locking mechanism.\nThat is why no two scripts can ever evolve the same specie at the same time.\n\nThere is a special file: [`SBRF/index.dat`](SBRF/index.dat) which is regularly updated by running scripts.\nIt contains the latest information about all species and their best sequences with highest scores.\n\nAfter many weeks of evolution there are in total 50 species (=trading 
strategies) with positive outcome,\nand 20 species which have not evolved to a positive outcome yet.\n\nFor example:\n\n```\n  {\n    name='aphrodite',\n    best_score=419.3722137887,\n    best_code='pc Val po Val Sub I time Val vol I Delta vwap Val 0.6506024096 Gt Mul ptr Min 0.41427952546279 Gt time Val I I I vwap I I Rank ptr SMA Sub Days po Val pc Val Sub I I Days po Val time Delta 0.32497333249298 Lt Mul Rep po Max Sub Sub -0.0062526521217403 Gt Mul po Val 9.320244570715 pl Val Sub Delta 0 Lt Mul Sub 0.19505921812091 Lt Mul ph Hours Val Days po SMA Sub -0.35232823715884 Lt Mul 0.48087209546602 Lt Mul Mul Days vwap SMA 1.6236306970182 Mul ptr Max I I I I I ptr SMA Gt Mul Gt Mul 0.62809943857406 Gt Mul Rep pc Sum Hours oh Max I I I I I I ph I Val pl Val Sub Abs Add Gt ph Val I Days pc Val I Days po Val Sub 0 Gt Mul Rep Mul I RepN I I I I I I I I I time Val 0.59345924286126 Gt Mul Val ptr SMA Gt Mul I time Delta I time po Val pl Val Sub pc Val pl Prod Sub Log Min2 0.60342306191225 Gt Mul -0.24898811846294 Gt RepM',\n    best_exec='pc Val po Val Sub I time Val I vol Delta vwap Val 0.6506024096 Gt Mul ptr Val 0.41427952546279 Gt time Val I I I I I vwap Rank ptr Val Sub Days po Val pc Val Sub I I Days po Val time Val 0.32497333249298 Lt Mul Rep po Val Sub Sub -0.0062526521217403 Gt Mul po Val 9.320244570715 pl Val Sub Val 0 Lt Mul Sub 0.19505921812091 Lt Mul Hours ph Val Days po Val Sub -0.35232823715884 Lt Mul 0.48087209546602 Lt Mul Mul Days vwap Val 1.6236306970182 Mul ptr Val I I I I I ptr SMA Gt Mul Gt Mul 0.62809943857406 Gt Mul Rep pc Val Hours oh Val I I I I I I I ph Val pl Val Sub Abs Add Gt ph Val I Days pc Val I Days po Val Sub 0 Gt Mul Rep Mul RepN I I I I I I I I I time Val 0.59345924286126 Gt Mul ptr Val Gt Mul I time Delta I po Val pl Val Sub pc Val pl Val Sub Log Min2 0.60342306191225 Gt Mul -0.24898811846294 Gt RepM',\n    best_text='Val(pc,0) Val(po,0) Sub Val(time,1) Delta(vol,1) Val(vwap,0) 0.6506024096 Gt Mul Val(ptr,0) 0.41427952546279 Gt 
Val(time,0) Rank(vwap,9) Val(ptr,0) Sub Val(Days.po,0) Val(pc,0) Sub Val(Days.po,2) Val(time,0) 0.32497333249298 Lt Mul Rep Val(po,0) Sub Sub -0.0062526521217403 Gt Mul Val(po,0) 9.320244570715 Val(pl,0) Sub Val(0) 0 Lt Mul Sub 0.19505921812091 Lt Mul Val(Hours.ph,0) Val(Days.po,0) Sub -0.35232823715884 Lt Mul 0.48087209546602 Lt Mul Mul Val(Days.vwap,0) 1.6236306970182 Mul Val(ptr,0) SMA(ptr,9) Gt Mul Gt Mul 0.62809943857406 Gt Mul Rep Val(pc,0) Val(Hours.oh,0) Val(ph,15) Val(pl,0) Sub Abs Add Gt Val(ph,0) Val(Days.pc,1) Val(Days.po,1) Sub 0 Gt Mul Rep Mul RepN Val(time,21) 0.59345924286126 Gt Mul Val(ptr,0) Gt Mul Delta(time,1) Val(po,1) Val(pl,0) Sub Val(pc,0) Val(pl,0) Sub Log Min2 0.60342306191225 Gt Mul -0.24898811846294 Gt RepM',\n    best_rate=0.019430623099697,\n    annual_rate=0.050484096291059,\n    annual_rate_std=0.031053473191362,\n    max_drawdown_percent=0.014741135160058,\n    profit_factor=1.9935291706949,\n    max_proto_corr=0.46565787023241,\n    signal_pos=0.17645083955317,\n    signal_neg=0.12332460925269,\n    best_serial=124067,\n    last_serial=124200,\n    dataset_last_time=1533934208, -- 2018-08-11 01:50:08\n    last_iteration_time=1534195215, -- 2018-08-14 02:20:15\n    iterations_count=3100,\n    epochs_count=155,\n    update_count=3,\n  },\n```\n\nThis trading strategy, named 'aphrodite', has achieved the following results:\n- Annual rate of return: 5%\n- Max drawdown: 1.4%\n- Profit factor: 1.99\n- Long signals: 17.6%\n- Short signals: 12.3%\n\n## Results\n\n### Testing on historical data\n\nAnnual rates for individual trading strategies are in the range 1%...5%, which is not much.\n\nBut if you combine different independent trading strategies, their rates of return add up\nwhile the overall drawdown decreases.\n\nTo test all these strategies together on historical data you may run [`analyze.lua`](analyze.lua):\n\n```bash\n$ luajit analyse.lua SBRF\nChanged dir to SBRF\nLoading Dataset...\nLoaded Dataset from dataset.gz\nDataset table: 24 array 
elements\nIndex has 70 records\nLoading aphrodite...Yes\nLoading vali...Yes\n...\nLoading astraea...Neg\nLoading mokosh...Neg\nLoaded 50 prototypes\nProcessing 24/24\n##      #trades Rate%   DDown%  PF      RF\n01      289     18.91   0.40    3.32    47.57\n02      328     17.85   0.72    1.96    24.60\n03      308     20.79   0.52    2.16    39.69\n04      312     13.99   0.82    1.86    17.01\n05      274     16.43   0.51    2.39    32.36\n06      268     29.58   0.64    2.82    46.26\n07      281     8.30    0.94    1.55    8.80\n08      287     8.83    0.76    1.55    11.59\n09      271     32.46   0.50    2.88    64.59\n10      299     20.06   0.40    2.47    49.72\n11      333     13.39   0.79    1.99    16.97\n12      320     27.22   0.60    2.70    45.23\n13      299     22.69   0.63    2.18    36.10\n14      283     19.30   0.74    2.07    26.00\n15      300     26.53   0.52    2.19    51.32\n16      319     32.60   0.52    2.79    62.40\n17      266     31.46   0.65    2.51    47.98\n18      267     28.53   0.69    2.37    41.27\n19      364     29.42   0.88    2.12    33.33\n20      335     47.23   0.69    2.81    68.32\n21      279     49.96   1.39    2.65    36.03\n22      300     45.65   2.69    2.12    16.98\n23      301     18.46   1.79    1.38    10.29\n24      293     -3.76   3.71    0.94    -1.01\nAverage rate: 23.99\n\n```\n\nPlease note that only mature (=profitable) trading strategies are loaded for this analysis.\n\nThere are 24 lines for 24 futures contracts.\nEach contract was virtually traded for about 3 months.\n\nOn each line you can see:\n- Total number of trades during that period\n- Annual rate of return\n- Maximum drawdown\n- Profit factor\n- Recovery factor\n\nIt shows a 24% average annual rate of return when trading the combined profitable strategies.\nFor some periods the rate was as high as 50%, while in the last period it was negative.\n\n### Trade testing on a real brokerage account\n\nIn Nov 2017 I put about $800 onto 
my brokerage account\nand started a small trading script that executed trades through my broker's API.\n\nThe result was disappointing.\nInitially it showed good profitability, at about a 40% annual rate.\nBut later it started to slowly lose money.\n\n**I stopped the experiment in July 2018, when only 1/4 of the initial balance was left.**\n\nDuring testing I tried to figure out why strategies show good results on history and fail in real trading.\nI tested several hypotheses:\n- Possible errors in the source code that calculates profitability on historical data.\n- Possible errors in the source code that executes strategies on real-time market data.\n- Commissions for trades.\n- Slippage of order execution due to the delay of the algorithm and the broker's API.\n- Overfitting of strategies to historical data.\n\nI double-checked the source code and fixed every error I could find.\n\nThe commission for a futures contract trade is very low: about $0.20 on the Moscow Exchange.\nThe number of trades is a few orders of magnitude lower than in high-frequency trading.\nSo commissions are not the reason.\n\nThen I tested slippage. **Slippage** occurs when there is a delay between the moment your algorithm makes a decision and the moment\nthis decision is executed. 
During this period the price can change.\nTherefore your order will be executed at a different price, usually worse than you expected.\n\nInitially I believed that slippage had a negligible effect, as I'm using 10-minute intervals.\nBut when I added 0.1% slippage and tested on historical data again, I also got a negative rate of return!\n**Thus slippage could be the main reason for such a disappointing result.**\n\nIndeed, the algorithm takes a few seconds to compute the next predicted position.\nThere is also a delay of about 1..2 seconds in my broker's API when placing a trade order.\n\nThe solution is obvious: you should include expected slippage in the scoring function.\nHowever, it would take a lot of time (weeks) to evolve profitable trading strategies again.\n\n**I also came to the conclusion that the genetic algorithm overfits to training data.**\nThis becomes obvious if you think about it:\n1. You run strategies on particular historical data and compute scores.\n2. On each iteration you select the sequences with the highest scores.\n3. Thus sequences have no other way but to adapt to the historical data they are trained on.\n\nThe question is: how to evolve strategies that generalize knowledge, but don't overfit to the training dataset?\n\nIf you modify the scoring function to include the performance of a trading strategy on a test dataset,\nthen **the genetic algorithm will overfit to the test dataset the same way**.\n\nI believe we can use the same approach here that is used in training neural networks: _early-stopping_. 
\n\nWe can calculate the performance of a trading strategy on both the train and test datasets.\nPerformance on the train dataset is used in the scoring function, while performance on the test dataset is not.\nIf we plot both performances on one graph, they should grow over time.\n\nBut if after some time performance on the test dataset starts to decline,\nwe should stop the evolution of this trading strategy,\nas it has probably started to overfit.\n\nThis is how _early-stopping_ could be applied to genetic algorithms.\n\n### Conclusion\n\nWorking on this project was really interesting, because I developed a new algorithm.\nBut it was also challenging, as there were no ready-made solutions on the Internet.\n\nI have found a way to apply a Genetic Algorithm to search problems\nwhere you expect to find not one but many good solutions.\n\nThe proposed Competing Genetic Algorithm can evolve several solutions at once.\n\nIn order to apply the Competing Genetic Algorithm to your problem you need to describe:\n- solution as a sequence of symbols,\n- mutation and crossover operations for sequences,\n- scoring function for any sequence,\n- definition of a 'resource' in your task,\n- competing function for two sequences.\n\nThe algorithm has shown good results in searching for trading strategies\non given historical data.\nIt has evolved 50 trading strategies with annual rates from 1% to 5%.\n\nHowever, the discovered trading strategies showed poor results in real trading.\n\nI came to the conclusion that the main reasons for this are:\n- slippage due to the delay of the algorithm and the broker's API,\n- overfitting of trading strategies to historical data.\n\nI believe that slippage was the main reason.\nOverfitting is not a mistake, but rather natural behaviour for a genetic algorithm.\n\nSlippage could be addressed by adding an expected slippage percentage to the scoring function.\nThis would require re-evolving all species.\n\nOverfitting could be addressed by the _early-stopping_ technique.\n\nWhat could be further improved in the QGEN project:\n- Modify 
the scoring function to use expected slippage when calculating scores.\n- Split the data into train and test datasets, as is done for neural networks.\nTrack learning curves.\nUse the early-stopping technique to address overfitting.\n- A single instrument's price is not enough.\nExtend the datasets with global market indicators such as the S\u0026P500 index, the DAX index,\nmanufacturing activity, retail sales, etc.\n- Optimize operations.\nSome operations, for example `Rank` and `iRank`, are rarely used and could easily be removed.\n\n\u003e This project is no longer maintained, because the Torch/Lua library is no longer developed.\n\u003e It should be rewritten in Python with modern, actively developed frameworks:\n\u003e PyTorch, Pandas, Sklearn, ...\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdiovisgood%2Fqgen_lua","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdiovisgood%2Fqgen_lua","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdiovisgood%2Fqgen_lua/lists"}