{"id":19214777,"url":"https://github.com/garethjns/kaggle-integersequencelearning","last_synced_at":"2026-05-02T23:41:50.736Z","repository":{"id":111936576,"uuid":"69271152","full_name":"garethjns/Kaggle-IntegerSequenceLearning","owner":"garethjns","description":"R scripts for Kaggle's integer sequence learning","archived":false,"fork":false,"pushed_at":"2017-04-05T10:59:14.000Z","size":654,"stargazers_count":3,"open_issues_count":0,"forks_count":1,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-01-04T18:13:24.826Z","etag":null,"topics":["integer-sequence-prediction","kaggle","r"],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/garethjns.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2016-09-26T16:46:11.000Z","updated_at":"2018-12-22T13:26:50.000Z","dependencies_parsed_at":"2023-03-13T13:31:11.108Z","dependency_job_id":null,"html_url":"https://github.com/garethjns/Kaggle-IntegerSequenceLearning","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/garethjns%2FKaggle-IntegerSequenceLearning","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/garethjns%2FKaggle-IntegerSequenceLearning/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/garethjns%2FKaggle-IntegerSequenceLearning/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/garethjns%2FKaggle-IntegerSequenceLearning/manifests","owner_url":"https://repos.ecosyste.m
s/api/v1/hosts/GitHub/owners/garethjns","download_url":"https://codeload.github.com/garethjns/Kaggle-IntegerSequenceLearning/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240288248,"owners_count":19777634,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["integer-sequence-prediction","kaggle","r"],"created_at":"2024-11-09T14:11:18.696Z","updated_at":"2025-09-16T04:32:46.968Z","avatar_url":"https://github.com/garethjns.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Kaggle Integer Sequence Learning\n[Data and description](https://www.kaggle.com/c/integer-sequence-learning) | [Solutions post](http://blog.kaggle.com/2016/11/21/integer-sequence-learning-competition-solution-write-up-team-1-618-gareth-jones-laurent-borderie/)\n\n- Final position: 17/269 \n\n# Scripts\n\n**classifyingSequences.ipynb** - Notebook proposing automatic classification of basic properties of sequences, which can be used to guide which approaches are more likely to work for a sequence. \n\n**BestRun.R** - Final submission before merging in alternative public kernel solutions. See below and [Kaggle forums](https://www.kaggle.com/c/integer-sequence-learning/forums/t/24971/solutions) for details.\n\n\n# Aim\nPredict the next value in integer sequences from the [Online Encyclopedia of Integer Sequences](http://oeis.org/).\n\n# Methods\nSequentially test sequences with different solvers, ordered approximately by assumed reliability. 
\n\n## Solvers\nSequence solvers were applied in the order listed, based on assumed reliability. More detailed descriptions of each follow below. The numbers in brackets show the number of apparent solutions provided by each solver.\n\n - [Common differences](http://www.purplemath.com/modules/nextnumb.htm) (**2820**)\n - Common differences with variable step size (**2750** reliable and **9246** \"dodgy\")\n - Pattern search (**1785**)\n - Pattern search on common difference levels (**435**)\n - [Recurrence relation](https://en.wikipedia.org/wiki/Recurrence_relation) (**9849**)\n - Linear fitting using previous points (**37143**)\n - Non-linear fitting using previous points (**9000**)\n - Borrowed fallbacks (**\u003c4000**)\n - [Mode-fallback](https://www.kaggle.com/wcukierski/integer-sequence-learning/mode-benchmark/run/255053/code) (**40196**)\n\n### Common differences (diffTableSolve, diffTablePredict)\nTake the difference between each pair of adjacent terms in the sequence; if the differences are all the same, the next term can be predicted. 
It's a special case of the recurrence relation (and presumably produces redundant predictions when both are used).\n\nFor example: \n```R\nsequence \u003c- c(2, 4, 6, 8, 10)\nfirstDiffs \u003c- diff(sequence)  # 4-2, 6-4, 8-6, 10-8 = 2 2 2 2\n\nnextTerm \u003c- tail(sequence, 1) + firstDiffs[1]  # 10 + 2 = 12\n```\n\nIf the differences aren't the same, continue down: \n```R\nsequence \u003c- c(2, 4, 7, 11, 16)\nfirstDiffs \u003c- diff(sequence)     # 4-2, 7-4, 11-7, 16-11 = 2 3 4 5\nsecondDiffs \u003c- diff(firstDiffs)  # 3-2, 4-3, 5-4 = 1 1 1\n\nnextTerm \u003c- tail(sequence, 1) + tail(firstDiffs, 1) + secondDiffs[1]  # 16 + 5 + 1 = 22\n```\n\nEach level of differences decreases in length by 1, and false positives are a risk when too few values are available at a level to be sure that the level is constant.\n\n### Common differences with variable spacing (diffTableSolve2, diffTablePredict2)\nAn extension of the method of common differences is to take differences over a step size \u003e 1, instead of only between adjacent terms.\n\nFor example, with a step of 2: \n```R\nstep \u003c- 2\nsequence \u003c- c(2, 6, 8, 12)\nfirstDiffs \u003c- diff(sequence, lag = step)  # 8-2, 12-6 = 6 6\n\n# Next term = term at position (n+1)-step, plus the common difference\nnextTerm \u003c- sequence[length(sequence) + 1 - step] + firstDiffs[1]  # 8 + 6 = 14\n```\n\nIn this case, the length of each level reduces by the step size, making false positives a greater risk with this approach. \"Dodgy\" solutions were ones that were possible solutions, but were shorter than some threshold. If no solutions were proposed by pattern search or pattern search on difference levels, the dodgy solution was used.\n\n### Pattern search (patternSearch)\n- Starting with the second half of a sequence, pattern search compares it to the first half.\n - If the proportion of matched values is greater than some threshold, the match is used to extend the sequence.\n - If there's no match, terms are dropped one at a time from the middle of the sequence (not the end) and the remainder is compared to the (growing) start of the sequence.\n\n### Pattern search on common difference levels (diffTablePattern)\nThis does a pattern search on each common difference level. 
In theory it might be able to find patterns in different levels, even if the difference levels never converge to a constant value... Maybe.\n\n### Recurrence relation (RRSolve)\nBased on [https://www.kaggle.com/ncchen/integer-sequence-learning/recurrence-relation/notebook](https://www.kaggle.com/ncchen/integer-sequence-learning/recurrence-relation/notebook)\n\n### Linear fitting (fitModPP)\nAttempts to fit a polynomial to the sequence using a rolling window of n previous points. The last point of the sequence is held out and multiple functions are fit. The held-out point is used to assess the fits and determine the best; if accuracy is above a certain threshold, the winning function is refit on the entire sequence. The next (unknown) term is then predicted from this fit.\n\n### Mode-fallback\nIf no solution was proposed by any solver, the mode of the sequence was used, as per the [benchmark](https://www.kaggle.com/wcukierski/integer-sequence-learning/mode-benchmark/run/255053).\n\n# Conclusions\nThe application of specific solvers is effective to some extent, but still leaves a large proportion of the sequences unsolved (i.e. mode fallback used) or incorrectly predicted.\n\n## Limitations\n\n### False positives\nFalse positives were a significant danger. Fit an infinite number of non-linear functions and an infinite number will perfectly describe the sequence, but that doesn't mean any given one is the correct one, and if it isn't, it's very unlikely to have any predictive value. Competition scoring was binary accuracy: the predicted term was either exactly correct or wrong.\n\n### Generality\n\nLooking at the number of mode-fallbacks (~40,000) and linear fits (~37,000), it's clear that the majority of the sequences are constructed by still-unknown functions (unknown in a world where the OEIS doesn't exist, that is).\n\nFor the linear fits, 10496 of the fits were scored perfectly, meaning ~27,000 only represent polynomial estimations of another function. 
In addition, an unknown proportion of the 10496 \"perfect\" fits will be correct by chance, meaning that \u003c1/3rd of the ~37,000 linear fits are likely to be true linear polynomial functions.\n\nIt might be possible to implement more solvers that would find more of these unknown sequences.\n\n## Improvements\n\n### Reliability\nSolvers were applied on the basis of assumed reliability, which was measured (roughly) by how their implementation and priority affected the overall score. A better approach would be to run each solver individually on a set of data and assess it in isolation.\n\n### Sequence classification\nIt's also likely that solver reliability varies as a function of sequence class. In fact, a sensible classification mechanism for a sequence is the best approach to predicting the next term (although this is outside the scope of the Kaggle challenge). It might be sensible to determine certain basic properties of sequences first, then cluster the sequences in an unsupervised fashion. Solver reliability could then be assessed per cluster and solvers applied in a guided fashion depending on the cluster's properties. See **classifyingSequences.ipynb** / [here](https://www.kaggle.com/garethjns/integer-sequence-learning/classifying-tagging-sequences) for further discussion and examples.\n\n![Sequence classification](Images/figure2.png)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgarethjns%2Fkaggle-integersequencelearning","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgarethjns%2Fkaggle-integersequencelearning","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgarethjns%2Fkaggle-integersequencelearning/lists"}