{"id":19946115,"url":"https://github.com/quantco/vectorization-tutorial","last_synced_at":"2025-05-03T16:32:55.473Z","repository":{"id":188960949,"uuid":"679772843","full_name":"Quantco/vectorization-tutorial","owner":"Quantco","description":"This repo is used to illustrate the vectorization principle in a tutorial. It was created for a CEOI workshop in August 2023 and might not be kept up-to-date. ","archived":false,"fork":false,"pushed_at":"2025-04-07T04:11:28.000Z","size":85981,"stargazers_count":2,"open_issues_count":2,"forks_count":2,"subscribers_count":10,"default_branch":"main","last_synced_at":"2025-04-07T05:22:55.809Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Quantco.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-08-17T15:27:01.000Z","updated_at":"2025-01-23T14:31:40.000Z","dependencies_parsed_at":"2025-04-07T05:31:20.302Z","dependency_job_id":null,"html_url":"https://github.com/Quantco/vectorization-tutorial","commit_stats":null,"previous_names":["quantco/vectorization-tutorial"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Quantco%2Fvectorization-tutorial","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Quantco%2Fvectorization-tutorial/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Quantco%2Fvectorization-tutorial/releases","manifests_url":"https://repos.ecosyst
e.ms/api/v1/hosts/GitHub/repositories/Quantco%2Fvectorization-tutorial/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Quantco","download_url":"https://codeload.github.com/Quantco/vectorization-tutorial/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252216099,"owners_count":21713099,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-13T00:28:22.068Z","updated_at":"2025-05-03T16:32:50.459Z","avatar_url":"https://github.com/Quantco.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Introduction to the vectorization principle\n\nThis repo is used to illustrate the vectorization principle in a tutorial. Data science applications\nprocess large amounts of data, and dynamically typed, interpreted languages like Python, R\nor MATLAB are often used to get a job done in just a few lines by building on a large, fast-evolving ecosystem of libraries.\nThe heavy lifting is done inside those libraries, which are written in C, C++ or Fortran. To keep the interpreter's per-call overhead\nout of the inner loop, the user-facing API of those libraries must therefore be based on passing around large amounts of data instead of single values.\nThis is the basis of the vectorization principle.\n\nThis tutorial was created for a CEOI workshop in August 2023 and might not be kept up to date.\n\nDisclaimer: The term vectorization also refers to the SIMD-based instruction-level parallelism\nprovided by CPUs. 
Here, we use vectorization to mean a library design pattern for structural data transformation\ncode: applying operations to vectors instead of scalars.\n\n## Try it yourself in a GitHub Codespace (VS Code)\n\n[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://codespaces.new/Quantco/vectorization-tutorial/)\n\nThe default options for a small codespace are sufficient (GitHub provides a free tier).\n\nThe first time you enter the codespace, it takes some time to set up the environment and install all dependencies\n(the setup script may only start running about a minute after the codespace has started).\nPlease take a coffee break and wait.\n\nOnce the environment is installed, you should be able to open Jupyter notebooks (*.ipynb files) in VS Code.\nInstall extensions if VS Code asks you to, and choose the Python environment `vectorization`.\n\nTo run code in the terminal, activate the environment via\n\n```bash\nsource .devcontainer/.bashrc\n```\n\n## Try it yourself on your local machine\n\n### Setting up the environment for running the Python files and Jupyter notebooks in this repository\n\nFollow https://mamba.readthedocs.io/en/latest/installation/micromamba-installation.html to download the micromamba executable. 
\nPlease put it in `~/bin/micromamba` or adjust the instructions below accordingly.\nThen run the following commands to create a new environment and install the required packages:\n\n```bash\n# path to the micromamba executable\nMICROMAMBA=~/bin/micromamba\n# set up micromamba for the current bash session (needed for `micromamba activate`)\neval \"$(\"$MICROMAMBA\" shell hook -s bash)\"\n# create the environment from the lock file, then activate it\nmicromamba create -y -n vectorization -f conda-lock.yml\nmicromamba activate vectorization\n```\n\n## Table of Contents\n\n- [vectorization01.ipynb](vectorization01.ipynb): moving the loop into the library\n- [vectorization02.ipynb](vectorization02.ipynb): vectorized translation of conditional statements\n- [vectorization03.ipynb](vectorization03.ipynb): a slightly more complex example\n- [vectorization04.ipynb](vectorization04.ipynb): defining a data pipeline\n- [vectorization05a.ipynb](vectorization05a.ipynb): generating SQL and dataframe transformation code with one syntax\n- [vectorization05b.ipynb](vectorization05b.ipynb): generating SQL and dataframe transformation code with one syntax (an example pipeline)\n- [vectorization06.ipynb](vectorization06.ipynb): many ways to describe data transformations in Python\n- [vectorization07.ipynb](vectorization07.ipynb): aggregation functions\n- [vectorization08.ipynb](vectorization08.ipynb): window functions\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fquantco%2Fvectorization-tutorial","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fquantco%2Fvectorization-tutorial","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fquantco%2Fvectorization-tutorial/lists"}