{"id":16511258,"url":"https://github.com/pbenner/double-descent","last_synced_at":"2026-05-12T14:43:36.111Z","repository":{"id":69897584,"uuid":"497564935","full_name":"pbenner/double-descent","owner":"pbenner","description":"Simple examples of double descent (benign overfitting)","archived":false,"fork":false,"pushed_at":"2022-05-30T14:01:10.000Z","size":305,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-01-12T20:21:51.520Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pbenner.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-05-29T11:02:30.000Z","updated_at":"2024-09-03T15:21:53.000Z","dependencies_parsed_at":"2023-02-22T03:00:26.255Z","dependency_job_id":null,"html_url":"https://github.com/pbenner/double-descent","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pbenner%2Fdouble-descent","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pbenner%2Fdouble-descent/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pbenner%2Fdouble-descent/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pbenner%2Fdouble-descent/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pbenner","download_url":"https://codeload.github.com/pbenner/d
ouble-descent/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241476420,"owners_count":19968916,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-11T15:59:31.802Z","updated_at":"2026-05-12T14:43:36.084Z","avatar_url":"https://github.com/pbenner.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv class=\"cell markdown\"\u003e\n\n\u003ch1 align=\"center\"\u003eExamples of Double Descent\u003c/h1\u003e\n\u003chr style=\"border:2px solid gray\"\u003e\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell markdown\"\u003e\n\n##  Import python packages\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell code\" execution_count=\"3\"\u003e\n\n``` python\nimport numpy as np\nimport matplotlib.pyplot as plt\n\nfrom numpy.random import default_rng\nfrom sklearn.model_selection import LeaveOneOut, GridSearchCV\nfrom sklearn.linear_model import LinearRegression\n```\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell markdown\"\u003e\n\n------------------------------------------------------------------------\n\n##  Intuition behind double descent\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell code\" execution_count=\"67\"\u003e\n\n``` python\nimport matplotlib.pyplot as plt\n%matplotlib inline\nplt.axis('off')\nplt.arrow(0, 0, 1.0,  0.0, head_width=0.05, head_length=0.1, fc='k', ec='k'); plt.figtext(0.90, 0.25, r'$f_1$', fontsize=26)\nplt.arrow(0, 0, 0.4,  0.6, head_width=0.05, head_length=0.1, fc='k', ec='k'); plt.figtext(0.50, 0.70, r'$f_2$', 
fontsize=26)\nplt.arrow(0, 0, 0.9, -0.2, head_width=0.05, head_length=0.1, fc='tab:blue'  , ec='tab:blue'  ); plt.figtext(0.83, 0.10, r'$f_3$', fontsize=26)\nplt.arrow(0, 0, 0.0,  0.8, head_width=0.05, head_length=0.1, fc='tab:blue'  , ec='tab:blue'  ); plt.figtext(0.10, 0.90, r'$f_4$', fontsize=26)\nplt.arrow(0, 0, 0.7,  0.4, head_width=0.05, head_length=0.1, fc='tab:orange', ec='tab:orange'); plt.figtext(0.70, 0.55, r'$y$', fontsize=26)\nplt.show()\n```\n\n\u003cdiv class=\"output display_data\"\u003e\n\n![](https://raw.githubusercontent.com/pbenner/double-descent/master/README_files/7073d3ea7bf231aa5215cb205aa9b641d5e1ab5b.png)\n\n\u003c/div\u003e\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell markdown\"\u003e\n\nAccording to the above image, we define four feature vectors\n![f\\_1 = (1, 0, 0)^\\\\top](https://latex.codecogs.com/png.latex?f_1%20%3D%20%281%2C%200%2C%200%29%5E%5Ctop \"f_1 = (1, 0, 0)^\\top\"),\n![f\\_2 = (0, 1, 0)^\\\\top](https://latex.codecogs.com/png.latex?f_2%20%3D%20%280%2C%201%2C%200%29%5E%5Ctop \"f_2 = (0, 1, 0)^\\top\"),\n![f\\_3 = (1, -0.2, 0)^\\\\top](https://latex.codecogs.com/png.latex?f_3%20%3D%20%281%2C%20-0.2%2C%200%29%5E%5Ctop \"f_3 = (1, -0.2, 0)^\\top\"),\nand\n![f\\_4 = (0, 0, 1)^\\\\top](https://latex.codecogs.com/png.latex?f_4%20%3D%20%280%2C%200%2C%201%29%5E%5Ctop \"f_4 = (0, 0, 1)^\\top\").\nThe response\n![y = (1, 1, 0)^\\\\top](https://latex.codecogs.com/png.latex?y%20%3D%20%281%2C%201%2C%200%29%5E%5Ctop \"y = (1, 1, 0)^\\top\")\nlies in the plane defined by the first two feature vectors\n![f\\_1](https://latex.codecogs.com/png.latex?f_1 \"f_1\") and\n![f\\_2](https://latex.codecogs.com/png.latex?f_2 \"f_2\").\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell code\" execution_count=\"80\"\u003e\n\n``` python\nf1 = np.array([1.0,  0.0, 0.0])\nf2 = np.array([0.0,  1.0, 0.0])\nf3 = np.array([1.0, -0.2, 0.0])\nf4 = np.array([0.0,  0.0, 1.0])\ny  = np.array([1.0,  1.0, 0.0])\n```\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell 
markdown\"\u003e\n\nWe study the length of the OLS solution, which for\n![n \u0026lt; p](https://latex.codecogs.com/png.latex?n%20%3C%20p \"n \u003c p\"), or\nwhenever the feature vectors are linearly dependent, is given by the\nparameter vector with minimum\n![\\\\ell\\_2](https://latex.codecogs.com/png.latex?%5Cell_2 \"\\ell_2\")-norm.\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell code\" execution_count=\"76\"\u003e\n\n``` python\ndef OLSnorm(X, y):\n    return np.linalg.norm(np.linalg.pinv(X.T@X)@X.T@y)\n```\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell markdown\"\u003e\n\nAs a baseline, we begin with the OLS solution when only\n![f\\_1](https://latex.codecogs.com/png.latex?f_1 \"f_1\") and\n![f\\_2](https://latex.codecogs.com/png.latex?f_2 \"f_2\") are given:\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell code\" execution_count=\"81\"\u003e\n\n``` python\nOLSnorm(np.array([f1, f2]).T, y)\n```\n\n\u003cdiv class=\"output execute_result\" execution_count=\"81\"\u003e\n\n    1.4142135623730951\n\n\u003c/div\u003e\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell markdown\"\u003e\n\nThe length of the solution decreases if we add another feature vector to\nX:\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell code\" execution_count=\"82\"\u003e\n\n``` python\nOLSnorm(np.array([f1, f2, f3]).T, y)\n```\n\n\u003cdiv class=\"output execute_result\" execution_count=\"82\"\u003e\n\n    1.2985663286116427\n\n\u003c/div\u003e\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell markdown\"\u003e\n\nHowever, this is only the case if the feature vector is correlated with\n![y](https://latex.codecogs.com/png.latex?y \"y\"):\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell code\" execution_count=\"83\"\u003e\n\n``` python\nOLSnorm(np.array([f1, f2, f4]).T, y)\n```\n\n\u003cdiv class=\"output execute_result\" execution_count=\"83\"\u003e\n\n    1.4142135623730951\n\n\u003c/div\u003e\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell markdown\"\u003e\n\n------------------------------------------------------------------------\n\n##  Data 
set\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell code\" execution_count=\"4\"\u003e\n\n``` python\ndata = np.array([\n    0.001399613, -0.23436656,\n    0.971629779,  0.64689524,\n    0.579119475, -0.92635765,\n    0.335693937,  0.13000706,\n    0.736736086, -0.89294863,\n    0.492572335,  0.33854780,\n    0.737133774, -1.24171910,\n    0.563693769, -0.22523318,\n    0.877603280, -0.12962722,\n    0.141426545,  0.37632006,\n    0.307203910,  0.30299077,\n    0.024509308, -0.21162739,\n    0.843665029, -0.76468719,\n    0.771206067, -0.90455412,\n    0.149670258,  0.77097952,\n    0.359605608,  0.56466366,\n    0.049612895,  0.18897607,\n    0.409898906,  0.32531750,\n    0.935457898, -0.78703491,\n    0.149476207,  0.80585375,\n    0.234315216,  0.62944986,\n    0.455297119,  0.02353327,\n    0.102696671,  0.27621694,\n    0.715372314, -1.20379729,\n    0.681745393, -0.83059624 ]).reshape(25,2)\ny = data[:,1]\nX = data[:,0:1]\n\nplt.scatter(X[:,0], y)\nplt.show()\n```\n\n\u003cdiv class=\"output display_data\"\u003e\n\n![](https://raw.githubusercontent.com/pbenner/double-descent/master/README_files/8cc01adbe18ab36d6896cea353ce1220b4f3fdc0.png)\n\n\u003c/div\u003e\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell markdown\"\u003e\n\n------------------------------------------------------------------------\n\n##  Linear regressor class\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell markdown\"\u003e\n\nWe use the following linear regressor class for all double descent\nexamples. 
It takes only the first\n![p](https://latex.codecogs.com/png.latex?p \"p\") columns from the\nfeature matrix ![F](https://latex.codecogs.com/png.latex?F \"F\") and\ncomputes the minimum\n![\\\\ell\\_2](https://latex.codecogs.com/png.latex?%5Cell_2 \"\\ell_2\")-norm\nsolution when\n![n \u0026lt; p](https://latex.codecogs.com/png.latex?n%20%3C%20p \"n \u003c p\").\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell code\" execution_count=\"3\"\u003e\n\n``` python\nclass MyRidgeRegressor:\n    def __init__(self, p=3, alpha=0.0):\n        self.p     = p\n        self.theta = None\n        self.alpha = alpha\n    \n    def fit(self, F, y):\n        F = F[:, 0:self.p]\n        self.theta = np.linalg.pinv(F.transpose()@F + self.alpha*np.identity(F.shape[1]))@F.transpose()@y\n\n    def predict(self, F):\n        F = F[:, 0:self.p]\n        return F@self.theta\n\n    def set_params(self, **parameters):\n        for parameter, value in parameters.items():\n            setattr(self, parameter, value)\n        return self\n\n    def get_params(self, deep=True):\n        return {\"p\" : self.p, \"alpha\" : self.alpha}\n```\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell markdown\"\u003e\n\n------------------------------------------------------------------------\n\n##  Model evaluation\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell code\" execution_count=\"4\"\u003e\n\n``` python\ndef evaluate_model(fg, X, y, n, ps, runs=10):\n    estimator = MyRidgeRegressor()\n    result = None\n    for i in range(runs):\n        F, y = fg(X, y, n, np.max(ps), random_state=i)\n\n        clf = GridSearchCV(estimator=estimator,\n                            param_grid=[{ 'p': list(ps) }],\n                            cv=LeaveOneOut(),\n                            scoring=\"neg_mean_squared_error\")\n        clf.fit(F, y)\n\n        if result is None:\n            result  = -clf.cv_results_['mean_test_score']\n        else:\n            result += -clf.cv_results_['mean_test_score']\n    \n    return result 
/ runs\n```\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell markdown\"\u003e\n\n------------------------------------------------------------------------\n\n##  1 Random features\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell markdown\"\u003e\n\nThis example of double descent uses covariates generated from a\nmultivariate normal distribution, i.e. the\n![j](https://latex.codecogs.com/png.latex?j \"j\")-th column of X is given\nby\n\n![\n    f\\_j \\~ \\\\sim \\~ N(0, I\\_n)\n](https://latex.codecogs.com/png.latex?%0A%20%20%20%20f_j%20~%20%5Csim%20~%20N%280%2C%20I_n%29%0A \"\n    f_j ~ \\sim ~ N(0, I_n)\n\")\n\nTo obtain a double descent phenomenon, some\n![f\\_j](https://latex.codecogs.com/png.latex?f_j \"f_j\") must be highly\ncorrelated with ![y](https://latex.codecogs.com/png.latex?y \"y\") while the\nremaining features are nearly uncorrelated. Hence, we define\n![\\\\theta\\_j = 1/j](https://latex.codecogs.com/png.latex?%5Ctheta_j%20%3D%201%2Fj \"\\theta_j = 1/j\")\nand generate observations\n![y](https://latex.codecogs.com/png.latex?y \"y\") according to the linear\nmodel\n\n![\n    y = X \\\\theta + \\\\epsilon\n](https://latex.codecogs.com/png.latex?%0A%20%20%20%20y%20%3D%20X%20%5Ctheta%20%2B%20%5Cepsilon%0A \"\n    y = X \\theta + \\epsilon\n\")\n\nwhere\n![\\\\epsilon \\\\sim N(0, \\\\sigma^2 I\\_n)](https://latex.codecogs.com/png.latex?%5Cepsilon%20%5Csim%20N%280%2C%20%5Csigma%5E2%20I_n%29 \"\\epsilon \\sim N(0, \\sigma^2 I_n)\").\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell code\" execution_count=\"22\"\u003e\n\n``` python\nclass RandomFeatures():\n    def __init__(self, scale = 1):\n        self.scale = scale\n\n    def __call__(self, X, y, n, p, random_state=42):\n        rng = default_rng(seed=random_state)\n        mu = np.repeat(0, n)\n        sigma = np.identity(n)\n        F = rng.multivariate_normal(mu, sigma, size=p).T\n        theta = np.array([ 1/(j+1) for j in range(p) ])\n        y = F@theta + rng.normal(0, self.scale, size=n)\n        
return F, y\n```\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell markdown\"\u003e\n\n------------------------------------------------------------------------\n\n##  Results\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell code\" execution_count=\"23\"\u003e\n\n``` python\nps = range(3, 151)\nscales = [0.1, 1, 10]\nresult = [ evaluate_model(RandomFeatures(scale=scale), None, None, 20, ps, runs=100) for scale in scales ]\nresult = np.array(result)\n```\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell code\" execution_count=\"31\"\u003e\n\n``` python\np = plt.plot(ps, result.T)\n[ plt.axvline(x=ps[np.argmin(result[i])], color=p[i].get_color(), alpha=0.5, linestyle='--') for i in range(result.shape[0]) ]\nplt.legend(scales)\nplt.xscale(\"log\")\nplt.yscale(\"log\")\nplt.xlabel(\"p\")\nplt.ylabel(\"average mse\")\nplt.show()\n```\n\n\u003cdiv class=\"output display_data\"\u003e\n\n![](https://raw.githubusercontent.com/pbenner/double-descent/master/README_files/30caccee17aa08ef0dc22a7499e15f28c324c949.png)\n\n\u003c/div\u003e\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell markdown\"\u003e\n\n------------------------------------------------------------------------\n\n##  2 Noisy polynomial features\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell markdown\"\u003e\n\nIncreasing the number of features does not always lead to an increase in\nmodel complexity. There are several cases where adding more features\nactually constrains the model, which we call here *implicit\nregularization*. For instance, adding features to\n![F](https://latex.codecogs.com/png.latex?F \"F\") that are uncorrelated\nwith ![y](https://latex.codecogs.com/png.latex?y \"y\") will generally\nlead to stronger regularization (e.g. when adding columns to\n![F](https://latex.codecogs.com/png.latex?F \"F\") that are drawn from a\nnormal distribution). Here, we test a different strategy to increase\nimplicit regularization. 
We implement a function called\n*compute\\_noisy\\_polynomial\\_features* that computes noisy polynomial\nfeatures ![F](https://latex.codecogs.com/png.latex?F \"F\") from\n![X = (x)](https://latex.codecogs.com/png.latex?X%20%3D%20%28x%29 \"X = (x)\").\nThe ![j](https://latex.codecogs.com/png.latex?j \"j\")-th column of\n![F \\\\in \\\\mathbb{R}^{n \\\\times p}](https://latex.codecogs.com/png.latex?F%20%5Cin%20%5Cmathbb%7BR%7D%5E%7Bn%20%5Ctimes%20p%7D \"F \\in \\mathbb{R}^{n \\times p}\")\nis given by\n\n![\n    f\\_j\n    =\n    \\\\begin{cases}\n        (1, \\\\dots, 1)^\\\\top \u0026 \\\\text{if $j = 1$}\\\\\\\\\n        x^{k(j-2)} + \\\\epsilon\\_j \u0026 \\\\text{if $j \u0026gt; 1$}\n    \\\\end{cases}\n](https://latex.codecogs.com/png.latex?%0A%20%20%20%20f_j%0A%20%20%20%20%3D%0A%20%20%20%20%5Cbegin%7Bcases%7D%0A%20%20%20%20%20%20%20%20%281%2C%20%5Cdots%2C%201%29%5E%5Ctop%20%26%20%5Ctext%7Bif%20%24j%20%3D%201%24%7D%5C%5C%0A%20%20%20%20%20%20%20%20x%5E%7Bk%28j-2%29%7D%20%2B%20%5Cepsilon_j%20%26%20%5Ctext%7Bif%20%24j%20%3E%201%24%7D%0A%20%20%20%20%5Cend%7Bcases%7D%0A \"\n    f_j\n    =\n    \\begin{cases}\n        (1, \\dots, 1)^\\top \u0026 \\text{if $j = 1$}\\\\\n        x^{k(j-2)} + \\epsilon_j \u0026 \\text{if $j \u003e 1$}\n    \\end{cases}\n\")\n\nwhere\n![k(j) = (j\\\\mod m) + 1](https://latex.codecogs.com/png.latex?k%28j%29%20%3D%20%28j%5Cmod%20m%29%20%2B%201 \"k(j) = (j\\mod m) + 1\"),\n![m \\\\le p](https://latex.codecogs.com/png.latex?m%20%5Cle%20p \"m \\le p\")\ndenotes the maximum degree (*max\\_degree* parameter) and\n![\\\\epsilon\\_j](https://latex.codecogs.com/png.latex?%5Cepsilon_j \"\\epsilon_j\")\nis a vector of ![n](https://latex.codecogs.com/png.latex?n \"n\")\nindependent draws from a normal distribution with mean\n![\\\\mu = 0](https://latex.codecogs.com/png.latex?%5Cmu%20%3D%200 \"\\mu = 0\")\nand standard deviation\n![\\\\sigma](https://latex.codecogs.com/png.latex?%5Csigma \"\\sigma\"). 
With\n![x^k](https://latex.codecogs.com/png.latex?x%5Ek \"x^k\") we denote the\n![k](https://latex.codecogs.com/png.latex?k \"k\")-th power of each\nelement in ![x](https://latex.codecogs.com/png.latex?x \"x\").\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell code\" execution_count=\"26\"\u003e\n\n``` python\nclass NoisyPolynomialFeatures():\n    def __init__(self, max_degree = 15, scale = 0.1):\n        self.max_degree = max_degree\n        self.scale      = scale\n\n    def __call__(self, X, y, n, p, random_state=42):\n        x = X if len(X.shape) == 1 else X[:,0]\n        rng = default_rng(seed=random_state)\n        F = np.array([]).reshape(x.shape[0], 0)\n        F = np.insert(F, 0, np.repeat(1, len(x)), axis=1)\n        for k in range(p):\n            d = (k % self.max_degree)+1\n            f = x**d + rng.normal(size=len(x), scale=self.scale)\n            F = np.insert(F, k+1, f, axis=1)\n        return F, y\n```\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell markdown\"\u003e\n\n------------------------------------------------------------------------\n\n##  Results\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell markdown\"\u003e\n\nEvaluate the performance for\n![p = 3, 4, \\\\dots, 200](https://latex.codecogs.com/png.latex?p%20%3D%203%2C%204%2C%20%5Cdots%2C%20200 \"p = 3, 4, \\dots, 200\"),\nand\n![\\\\sigma \\\\in \\\\{0.01, 0.02, 0.05\\\\}](https://latex.codecogs.com/png.latex?%5Csigma%20%5Cin%20%5C%7B0.01%2C%200.02%2C%200.05%5C%7D \"\\sigma \\in \\{0.01, 0.02, 0.05\\}\").\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell code\" execution_count=\"397\"\u003e\n\n``` python\nps = range(3, 201)\nscales = [0.01, 0.02, 0.05]\nresult = [ evaluate_model(NoisyPolynomialFeatures(scale=scale), X, y, len(y), ps, runs=100) for scale in scales ]\nresult = np.array(result)\n```\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell code\" execution_count=\"406\"\u003e\n\n``` python\np = plt.plot(ps, result.T)\n[ plt.axvline(x=ps[np.argmin(result[i])], color=p[i].get_color(), alpha=0.5, 
linestyle='--') for i in range(result.shape[0]) ]\nplt.legend(scales)\nplt.xscale(\"log\")\nplt.xlabel(\"p\")\nplt.yscale(\"log\")\nplt.ylim(0.1,10)\nplt.ylabel(\"average mse\")\nplt.show()\n```\n\n\u003cdiv class=\"output display_data\"\u003e\n\n![](https://raw.githubusercontent.com/pbenner/double-descent/master/README_files/a7e914acc29f369cf6bbbfaaa4e246dd88ea744a.png)\n\n\u003c/div\u003e\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell markdown\"\u003e\n\n------------------------------------------------------------------------\n\n##  3 Polynomial features combined with random features\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell markdown\"\u003e\n\nAnother possibility to obtain double descent curves is to start off with\na standard polynomial regression task and to add random (uncorrelated)\nfeatures. The ![j](https://latex.codecogs.com/png.latex?j \"j\")-th column\nof\n![F \\\\in \\\\mathbb{R}^{n \\\\times p}](https://latex.codecogs.com/png.latex?F%20%5Cin%20%5Cmathbb%7BR%7D%5E%7Bn%20%5Ctimes%20p%7D \"F \\in \\mathbb{R}^{n \\times p}\")\nis given by\n\n![\n    f\\_j\n    =\n    \\\\begin{cases}\n        x^{j-1} \u0026 \\\\text{if $j \\\\le m+1$}\\\\\\\\\n        \\\\epsilon\\_j \u0026 \\\\text{if $j \u0026gt; m+1$}\n    \\\\end{cases}\n](https://latex.codecogs.com/png.latex?%0A%20%20%20%20f_j%0A%20%20%20%20%3D%0A%20%20%20%20%5Cbegin%7Bcases%7D%0A%20%20%20%20%20%20%20%20x%5E%7Bj-1%7D%20%26%20%5Ctext%7Bif%20%24j%20%5Cle%20m%2B1%24%7D%5C%5C%0A%20%20%20%20%20%20%20%20%5Cepsilon_j%20%26%20%5Ctext%7Bif%20%24j%20%3E%20m%2B1%24%7D%0A%20%20%20%20%5Cend%7Bcases%7D%0A \"\n    f_j\n    =\n    \\begin{cases}\n        x^{j-1} \u0026 \\text{if $j \\le m+1$}\\\\\n        \\epsilon_j \u0026 \\text{if $j \u003e m+1$}\n    \\end{cases}\n\")\n\nwhere ![m](https://latex.codecogs.com/png.latex?m \"m\") denotes the\nmaximum degree of the polynomial features and\n![\\\\epsilon\\_j \\~ \\\\sim \\~ N(0, \\\\sigma^2 
I\\_n)](https://latex.codecogs.com/png.latex?%5Cepsilon_j%20~%20%5Csim%20~%20N%280%2C%20%5Csigma%5E2%20I_n%29 \"\\epsilon_j ~ \\sim ~ N(0, \\sigma^2 I_n)\").\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell code\" execution_count=\"31\"\u003e\n\n``` python\nclass PolynomialWithRandomFeatures():\n    def __init__(self, max_degree = 15, scale = 0.1):\n        self.max_degree = max_degree\n        self.scale      = scale\n\n    def __call__(self, X, y, n, p, random_state=42):\n        x = X if len(X.shape) == 1 else X[:,0]\n        rng = default_rng(seed=random_state)\n        F = np.array([]).reshape(x.shape[0], 0)\n        # Generate polynomial features\n        for deg in range(np.min([p, self.max_degree+1])):\n            F = np.insert(F, deg, x**deg, axis=1)\n        if p \u003c= self.max_degree+1:\n            return F, y\n        # Generate random features\n        for j in range(p - self.max_degree - 1):\n            f = rng.normal(size=F.shape[0], scale=self.scale)\n            F = np.insert(F, F.shape[1], f, axis=1)\n        return F, y\n```\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell markdown\"\u003e\n\n------------------------------------------------------------------------\n\n##  Results\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell code\" execution_count=\"61\"\u003e\n\n``` python\nps = list(range(3, 20)) + [30, 50, 100, 150, 200, 300, 400, 500, 1000]\nscales = [0.01, 0.1, 1.0]\nresult = [ evaluate_model(PolynomialWithRandomFeatures(scale=scale), X, y, len(y), ps, runs=100) for scale in scales ]\nresult = np.array(result)\n```\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell code\" execution_count=\"62\"\u003e\n\n``` python\np = plt.plot(ps, result.T)\n[ plt.axvline(x=ps[np.argmin(result[i])], color=p[i].get_color(), alpha=0.5, linestyle='--') for i in range(result.shape[0]) ]\nplt.legend(scales)\nplt.xscale(\"log\")\nplt.xlabel(\"p\")\nplt.yscale(\"log\")\n#plt.ylim(0.1,10)\nplt.ylabel(\"average mse\")\nplt.show()\n```\n\n\u003cdiv class=\"output 
display_data\"\u003e\n\n![](https://raw.githubusercontent.com/pbenner/double-descent/master/README_files/4f8963836a2352107cc7e0ffaa89f18e29527ebe.png)\n\n\u003c/div\u003e\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell markdown\"\u003e\n\n------------------------------------------------------------------------\n\n##  4 Legendre polynomial\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell code\" execution_count=\"6\"\u003e\n\n``` python\nclass LegendrePolynomialFeatures():\n    def __call__(self, X, y, n, p, random_state=42):\n        x = X if len(X.shape) == 1 else X[:,0]\n        F = np.array([]).reshape(x.shape[0], 0)\n        # Generate polynomial features\n        for deg in range(p):\n            l = np.polynomial.legendre.Legendre([0]*deg + [1], domain=[0,1])\n            F = np.insert(F, deg, l(x), axis=1)\n        return F, y\n```\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell markdown\"\u003e\n\n------------------------------------------------------------------------\n\n##  Demonstration\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell code\" execution_count=\"101\"\u003e\n\n``` python\nF, _ = LegendrePolynomialFeatures()(X, y, len(y), 1000)\ng = np.linspace(0, 1, 10000)\nG, _ = LegendrePolynomialFeatures()(g, y, len(y), 1000)\n```\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell code\" execution_count=\"107\"\u003e\n\n``` python\nclf = MyRidgeRegressor(p=8)\nclf.fit(F, y)\nplt.plot(g, clf.predict(G))\nplt.scatter(X, y)\nplt.title(\"p = 8\")\nplt.xlabel(\"x\")\nplt.ylabel(\"y\")\nplt.show()\n```\n\n\u003cdiv class=\"output display_data\"\u003e\n\n![](https://raw.githubusercontent.com/pbenner/double-descent/master/README_files/15d9abf8bea424141547921f56c959835df26df8.png)\n\n\u003c/div\u003e\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell code\" execution_count=\"114\"\u003e\n\n``` python\nclf = MyRidgeRegressor(p=50)\nclf.fit(F, y)\nplt.plot(g, clf.predict(G))\nplt.scatter(X, y)\nplt.title(\"p = 50\")\nplt.xlabel(\"x\")\nplt.ylabel(\"y\")\nplt.show()\n```\n\n\u003cdiv 
class=\"output display_data\"\u003e\n\n![](https://raw.githubusercontent.com/pbenner/double-descent/master/README_files/845a7b46fa2ef8f5c4694544ed26e772dc1c770c.png)\n\n\u003c/div\u003e\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell code\" execution_count=\"108\"\u003e\n\n``` python\nclf = MyRidgeRegressor(p=1000)\nclf.fit(F, y)\nplt.plot(g, clf.predict(G))\nplt.scatter(X, y)\nplt.title(\"p = 1000\")\nplt.xlabel(\"x\")\nplt.ylabel(\"y\")\nplt.show()\n```\n\n\u003cdiv class=\"output display_data\"\u003e\n\n![](https://raw.githubusercontent.com/pbenner/double-descent/master/README_files/aab1daf89cfcc9db85e1405a643d0626ab3c4fb7.png)\n\n\u003c/div\u003e\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell markdown\"\u003e\n\n------------------------------------------------------------------------\n\n##  Results\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell code\" execution_count=\"115\"\u003e\n\n``` python\nps = list(range(3, 20)) + [30, 50, 100, 150, 200, 300, 400, 500, 1000]\nresult = evaluate_model(LegendrePolynomialFeatures(), X, y, len(y), ps, runs=1)\n```\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell code\" execution_count=\"119\"\u003e\n\n``` python\np = plt.plot(ps, result)\nplt.axvline(x=ps[np.argmin(result)], color=p[0].get_color(), alpha=0.5, linestyle='--')\nplt.xscale(\"log\")\nplt.xlabel(\"p\")\nplt.yscale(\"log\")\nplt.ylim(0.1,10)\nplt.ylabel(\"mse\")\nplt.show()\n```\n\n\u003cdiv class=\"output display_data\"\u003e\n\n![](https://raw.githubusercontent.com/pbenner/double-descent/master/README_files/66e43fdad7cb52784d8273bce359a220bb6c30b0.png)\n\n\u003c/div\u003e\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell markdown\"\u003e\n\n------------------------------------------------------------------------\n\n##  Correlation analysis\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell markdown\"\u003e\n\nTo understand why we are seeing a double descent curve for Legendre\npolynomials, we compute the correlation between 
features\n![f\\_j](https://latex.codecogs.com/png.latex?f_j \"f_j\") and response\n![y](https://latex.codecogs.com/png.latex?y \"y\"):\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell code\" execution_count=\"10\"\u003e\n\n``` python\ncor = []\nfor i in range(F.shape[1]):\n    # For equal-length inputs, np.correlate returns the (unnormalized) inner product\n    cor.append(np.abs(np.correlate(F[:,i], y)))\n```\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell markdown\"\u003e\n\nA log-log plot of the result, together with a fitted power law, shows\nthat the correlation decreases with the degree, so that we are\nessentially adding noise to the feature matrix\n![F](https://latex.codecogs.com/png.latex?F \"F\").\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell code\" execution_count=\"38\"\u003e\n\n``` python\ncor_x = np.log(list(range(1, len(cor)+1)))\ncor_x = np.array(cor_x).reshape(-1, 1)\ncor_y = np.log(cor)\ncor_z = np.linspace(1, len(cor), 100).reshape(-1, 1)\n\nclf = LinearRegression()\nclf.fit(cor_x, cor_y)\n\nplt.plot(cor)\nplt.plot(cor_z, np.exp(clf.predict(np.log(cor_z))))\nplt.xscale('log')\nplt.xlabel('degree')\nplt.yscale('log')\nplt.ylabel('absolute correlation')\nplt.show()\n```\n\n\u003cdiv class=\"output display_data\"\u003e\n\n![](https://raw.githubusercontent.com/pbenner/double-descent/master/README_files/28a48289a3cd6b3edbf6f113cff9c4abda3d6883.png)\n\n\u003c/div\u003e\n\n\u003c/div\u003e\n\n\u003cdiv class=\"cell code\"\u003e\n\n``` python\n```\n\n\u003c/div\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpbenner%2Fdouble-descent","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpbenner%2Fdouble-descent","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpbenner%2Fdouble-descent/lists"}