{"id":19150736,"url":"https://github.com/mantasu/cs224n","last_synced_at":"2025-05-07T05:23:58.045Z","repository":{"id":131593747,"uuid":"521419609","full_name":"mantasu/cs224n","owner":"mantasu","description":"Solutions for CS224n (2022)","archived":false,"fork":false,"pushed_at":"2024-04-19T20:41:06.000Z","size":37214,"stargazers_count":62,"open_issues_count":2,"forks_count":21,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-19T18:34:09.807Z","etag":null,"topics":["cs224n","deep-learning","natural-language-processing","pytorch","rnn","transformers"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mantasu.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-08-04T21:24:00.000Z","updated_at":"2025-04-11T12:35:53.000Z","dependencies_parsed_at":"2025-04-19T16:05:17.213Z","dependency_job_id":null,"html_url":"https://github.com/mantasu/cs224n","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mantasu%2Fcs224n","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mantasu%2Fcs224n/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mantasu%2Fcs224n/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mantasu%2Fcs224n/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mantasu","download_url":"https://codeload.github.com/ma
ntasu/cs224n/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252819412,"owners_count":21809021,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cs224n","deep-learning","natural-language-processing","pytorch","rnn","transformers"],"created_at":"2024-11-09T08:12:55.821Z","updated_at":"2025-05-07T05:23:58.012Z","avatar_url":"https://github.com/mantasu.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003eCS224n: Assignment Solutions\u003c/h1\u003e\n\u003cp align=\"center\"\u003e\u003cb\u003eNatural Language Processing with Deep Learning\u003c/b\u003e\u003c/p\u003e\n\u003cp align=\"center\"\u003e\u003ci\u003eStanford - Winter 2022\u003c/i\u003e\u003c/p\u003e\n\n## About\n\n### Overview\n\nThese are my solutions for the **CS224n** course assignments offered by _Stanford University_ (Winter 2022). Written questions are explained in detail, the code is brief and commented (see examples below). From what I investigated, these should be the most explained solutions.\n\n\u003e Check out my solutions for **[CS231n](https://github.com/mantasu/cs231n)**. 
From what I've checked, they should be the shortest.\n\n### Main sources (official)\n* [**Course page**](http://web.stanford.edu/class/cs224n/index.html)\n* [**Assignments**](http://web.stanford.edu/class/cs224n/index.html#schedule)\n* [**Lecture videos** (2021)](https://www.youtube.com/playlist?list=PLoROMvodv4rOSH4v6133s9LFPRHjEmbmJ)\n\n\u003cbr\u003e\n\n## Requirements\nFor **conda** users, the instructions on how to set up the environment are given in the handouts. For `pip` users, I've gathered all the requirements in one [file](requirements.txt). Please set up the virtual environment and install the dependencies (for _Linux_ users):\n\n```shell\n$ python -m venv venv\n$ source venv/bin/activate\n$ pip install -r requirements.txt\n```\n\nYou can install everything with **conda** too (see [this](https://stackoverflow.com/questions/51042589/conda-version-pip-install-r-requirements-txt-target-lib)). For code that requires **Azure** _Virtual Machines_, I was able to run everything successfully on **Google Colab** with a free account.\n\n\u003e Note: Python 3.8 or newer should be used\n\n\u003cbr\u003e\n\n## Solutions\n\n### Structure\n\nFor every assignment, i.e., for directories `a1` through `a5`, there are coding and written parts. 
The `solutions.pdf` files are generated from latex directories where the provided templates were filled while completing the questions in `handout.pdf` files and the code.\n\n### Assignments\n\n* [A1](a1): Exploring Word Vectors (_Done_)\n* [A2](a2): word2vec (_Done_)\n* [A3](a3): Dependency Parsing (_Done_)\n* [A4](a4): Neural Machine Translation with RNNs and Analyzing NMT Systems (_Done_)\n* [A5](a5): Self-Attention, Transformers, and Pretraining (_Done_)\n\n\u003cbr\u003e\n\n## Examples\n\n\u003cdetails\u003e\u003csummary\u003e\u003cb\u003eWritten (Attention Exploration)\u003c/b\u003e\u003c/summary\u003e\n\u003cbr\u003e\n\n**Question (b) ii.**\n\n\u003chr\u003e\n\n\u003csub\u003e\nAs before, let $v_a$ and $v_b$ be two value vectors corresponding to key vectors $k_a$ and $k_b$, respectively. Assume that \u003cb\u003e(1)\u003c/b\u003e all key vectors are orthogonal, so $k_i^\\top k_j = 0$ for all $i \\neq j$; and \u003cb\u003e(2)\u003c/b\u003e all key vectors have norm $1$ (recall that a vector $x$ has norm 1 iff $x^\\top x = 1$). 
\u003cb\u003eFind an expression\u003c/b\u003e for a query vector $q$ such that $c \\approx \\frac{1}{2}(v_a + v_b)$.\u003cbr\u003e\n\u003csub\u003e\n\u003cb\u003eHint\u003c/b\u003e: while the \u003ci\u003esoftmax\u003c/i\u003e function will never \u003ci\u003eexactly\u003c/i\u003e average the two vectors, you can get close by using a large scalar multiple in the expression.\n\u003c/sub\u003e\u003c/sub\u003e\n\n\u003chr\u003e\n\n\u003cbr\u003e\n\n**Answer**\n\n\u003chr\u003e\n\n\u003csub\u003e\nAssume that $\\mathbf{c}$ is approximated as follows:\n\u003c/sub\u003e\n\n\u003csub\u003e\n$$\\mathbf{c}\\approx 0.5 \\mathbf{v}_a + 0.5 \\mathbf{v}_b$$\n\u003c/sub\u003e\n\n\u003csub\u003e\nThis means we want $\\alpha_a\\approx0.5$ and $\\alpha_b\\approx0.5$, which can be achieved when, for every $i\\ne a$ and $i\\ne b$:\n\u003c/sub\u003e\n\n\u003csub\u003e\n$$\\mathbf{k}_a^{\\top}\\mathbf{q}\\approx\\mathbf{k}_b^{\\top}\\mathbf{q} \\gg \\mathbf{k}_i^{\\top}\\mathbf{q}$$\n\u003c/sub\u003e\n\n\u003csub\u003e\nAs explained in the previous question, a large dot product yields a large probability mass, and we want the mass balanced between $\\alpha_a$ and $\\alpha_b$. 
The dot product with $\\mathbf{q}$ will be largest for $\\mathbf{k}_a$ and $\\mathbf{k}_b$ when $\\mathbf{q}$ is a large multiple of a vector that has a component in the $\\mathbf{k}_a$ direction and in the $\\mathbf{k}_b$ direction:\n\u003c/sub\u003e\n\n\u003csub\u003e\n$$\\mathbf{q}=\\beta(\\mathbf{k}_a + \\mathbf{k}_b),\\quad\\text{where } \\beta \\gg 0$$\n\u003c/sub\u003e\n\n\u003csub\u003e\nNow, since the keys are orthogonal to each other and have unit norm, it is easy to see that:\n\u003c/sub\u003e\n\n\u003csub\u003e\n$$\\mathbf{k}_a^{\\top}\\mathbf{q}=\\beta; \\quad \\mathbf{k}_b^{\\top}\\mathbf{q}=\\beta; \\quad \\mathbf{k}_i^{\\top}\\mathbf{q}=0, \\text{ whenever }i\\ne a\\text{ and }i\\ne b$$\n\u003c/sub\u003e\n\n\u003csub\u003e\nThus when we exponentiate, only $\\exp(\\beta)$ will matter, because $\\exp(0)$ will be insignificant to the probability mass. We get that:\n\u003c/sub\u003e\n\n\u003csub\u003e\n$$\\alpha_a=\\alpha_b=\\frac{\\exp(\\beta)}{n-2 + 2\\exp(\\beta)}\\approx\\frac{\\exp(\\beta)}{2\\exp(\\beta)}=\\frac{1}{2}, \\text{ for }\\beta \\gg 0$$\n\u003c/sub\u003e\n\n\u003chr\u003e\n\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\u003csummary\u003e\u003cb\u003eCode (Negative Sampling)\u003c/b\u003e\u003c/summary\u003e\n\u003csub\u003e\n\n```python\ndef negSamplingLossAndGradient(\n    centerWordVec,\n    outsideWordIdx,\n    outsideVectors,\n    dataset,\n    K=10\n):\n    \"\"\" Negative sampling loss function for word2vec models\n\n    Implement the negative sampling loss and gradients for a centerWordVec\n    and a outsideWordIdx word vector as a building block for word2vec\n    models. K is the number of negative samples to take.\n\n    Note: The same word may be negatively sampled multiple times. For\n    example if an outside word is sampled twice, you shall have to\n    double count the gradient with respect to this word. 
Thrice if\n    it was sampled three times, and so forth.\n\n    Arguments/Return Specifications: same as naiveSoftmaxLossAndGradient\n    \"\"\"\n\n    # Negative sampling of words is done for you. Do not modify this if you\n    # wish to match the autograder and receive points!\n    negSampleWordIndices = getNegativeSamples(outsideWordIdx, dataset, K)\n    indices = [outsideWordIdx] + negSampleWordIndices\n\n    ### YOUR CODE HERE (~10 Lines)\n\n    ### Please use your implementation of sigmoid in here.\n\n    # We will multiply where same words are involved, avoiding recalculations\n    un, idx, n_reps = np.unique(indices, return_index=True, return_counts=True)\n    U_concat = outsideVectors[un]\n    \n    # For convenience\n    n_reps[idx==0] *= -1\n    U_concat[idx!=0] *= -1\n    S = sigmoid(centerWordVec @ U_concat.T)\n    \n    # Find loss and derivatives w.r.t. v_c, U\n    loss = -(np.abs(n_reps) * np.log(S)).sum()\n    gradCenterVec = np.abs(n_reps) * (1 - S) @ -U_concat\n    gradOutsideVecs = np.zeros_like(outsideVectors)\n    gradOutsideVecs[un] = n_reps[:, None] * np.outer(1 - S, centerWordVec)\n\n    ### END YOUR CODE\n\n    return loss, gradCenterVec, gradOutsideVecs\n```\n\n\u003c/sub\u003e\n\u003c/details\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmantasu%2Fcs224n","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmantasu%2Fcs224n","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmantasu%2Fcs224n/lists"}