{"id":17030251,"url":"https://github.com/vsoch/codeart","last_synced_at":"2026-03-11T13:13:13.394Z","repository":{"id":62563272,"uuid":"231155855","full_name":"vsoch/codeart","owner":"vsoch","description":"data visualization for code (experimental project)","archived":false,"fork":false,"pushed_at":"2020-01-09T16:14:28.000Z","size":23237,"stargazers_count":6,"open_issues_count":1,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-26T07:11:16.374Z","etag":null,"topics":["codeart","word2vec"],"latest_commit_sha":null,"homepage":"https://github.com/vsoch/codeart-examples","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vsoch.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-12-31T23:18:22.000Z","updated_at":"2021-10-26T22:49:36.000Z","dependencies_parsed_at":"2022-11-03T15:45:19.929Z","dependency_job_id":null,"html_url":"https://github.com/vsoch/codeart","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vsoch%2Fcodeart","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vsoch%2Fcodeart/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vsoch%2Fcodeart/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vsoch%2Fcodeart/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vsoch","download_url":"https://codeload.github.com/vsoch/codeart/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"githu
b","repositories_count":248119285,"owners_count":21050755,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["codeart","word2vec"],"created_at":"2024-10-14T08:05:26.686Z","updated_at":"2026-03-11T13:13:08.372Z","avatar_url":"https://github.com/vsoch.png","language":"Python","readme":"# Code Art\n\n[![PyPI version](https://badge.fury.io/py/codeart.svg)](https://pypi.org/project/codeart/)\n[![GitHub actions status](https://github.com/vsoch/codeart/workflows/ci/badge.svg?branch=master)](https://github.com/vsoch/codeart/actions?query=branch%3Amaster+workflow%3Aci)\n\nI wanted a way to culminate the last decade of programming that I've done, and realized\nit would be much more fun to create a tool that others could use as well. Specifically, I want to be able to:\n\n 1. Have a class that parses repos on my local machine and organizes files by extension.\n 2. Build a word2vec model, one per extension, the idea being that each language has its own model. The word2vec model should have three dimensions so that we can map it to an RGB colorspace. This will mean that the embeddings for each character, along with being unique for the language, will also have a unique color. We should also be able to build one model across extensions, and map each extension to it.\n 3. At the end, I should be able to visualize any particular script (or some other graphic) using these embeddings. 
I'd like to add in some variable / dimension to represent age / date too.\n\n## Progress\n\nWhat I have so far is an ability to do the above, and then to produce a tiny image for\neach code file, where the words are colored by their word2vec (three-dimensional, RGB) embedding. Here\nis a small example of a Python script from spack:\n\n![spack-repos-builtin-packages-hsakmt-package.py.png](https://vsoch.github.io/codeart/examples/spack/images/tmp-mpcq8hau-var-spack-repos-builtin-packages-hsakmt-package.py.png)\n\nAnd here is a randomly placed grid with all Python files from the spack repository:\n\n![examples/img/spack.png](examples/img/spack.png)\n\nAnd see [Example 5: Generate a Color Map](#example-5-generate-a-color-map) for a more organized rendering of just the color space.\nWhat I'm working towards is being able to derive and organize the above images as well,\npossibly based on an average color or similar.\n\n## Usage\n\n### Install\n\nYou can install from PyPI:\n\n```bash\npip install codeart\n```\n\nor install from the repository directly:\n\n```bash\n$ git clone https://github.com/vsoch/codeart\n$ cd codeart\n$ python setup.py install\n```\n\nThe following examples are also provided in the [codeart examples](https://github.com/vsoch/codeart-examples) repository.\n\n### Example 1: Galleries and Interactive Visualizations\n\n#### Generate an Abstract Gallery\n\nYou can generate a web gallery for a root folder (including all files beneath it) or\na repository:\n\n```python\nfrom codeart.main import CodeBase\ncode = CodeBase()\n\n# How to add a folder\ncode.add_folder('/home/vanessa/code')\n\n# How to add a repository\ncode.add_repo(\"https://github.com/spack/spack\")\n\n# See languages with \u003e100 files\ncode.threshold_files(100)\n\n# Generate a web report (one page per language above this threshold)\ngallery = code.make_gallery(groups=['', '.py', '.patch'])\nTraining model with 
groups |.py|.patch\nGenerating web output for ''\nGenerating web output for '.py'\nGenerating web output for '.patch'\nFinished web files are in /tmp/code-art-xp73v5ji\n```\n\nAnd then the example files for each of:\n\n - [all](https://vsoch.github.io/codeart-examples/parse_repo/codeartall.html)\n - [python](https://vsoch.github.io/codeart-examples/parse_repo/codeart.py.html)\n - [patch](https://vsoch.github.io/codeart-examples/parse_repo/codeart.patch.html)\n - [empty space](https://vsoch.github.io/codeart-examples/parse_repo/codeart.html)\n\nYou can also browse raw images [here](https://github.com/vsoch/codeart-examples/tree/master/parse_repo/images).\n\n### Generate RGB Vectors\n\nIf you want to generate RGB vectors for a code base, you can use these\nfor your own machine learning projects. Here is how to do that\nfor a repository.\n\n```python\nfrom codeart.main import CodeBase\n\ncode = CodeBase()\ncode.add_repo(\"https://github.com/spack/spack\")\n```\n\nThe codebase will have codefiles added, a lookup in code.codefiles for each\nextension found. 
You'll want to take a look at these and choose some subset above\na given threshold.\n\n```python\n# Look at extractors, one added per extension\ncode.codefiles\n\n# Find those with \u003e100 files\ncode.threshold_files(thresh=100)\n\n# {'': [codeart-files:115],\n# '.py': [codeart-files:4227],\n# '.patch': [codeart-files:531]}\n```\n\nAnd then train a word2vec model, one for each group / extension, with a vector size of 3\nso that we can map to the RGB colorspace.\n\n```python\ncode.train(groups=['.py', '', '.patch'])\n```\n\nYou can also train a single model for those extensions:\n\n```python\ncode.train_all(groups=['.py', '', '.patch'])\n```\n\nOr train using all groups:\n\n```python\ncode.train_all()\n```\n\nWe now have a model for each extension (and all):\n\n```python\ncode.models\ncode.models['all']\n```\n\nHere is how to get a pandas DataFrame for a particular extension (or all):\n\n```python\nvectors = code.get_vectors(\".py\")\nvectors = code.get_vectors(\"all\")\n```\n\n#### Generate Interactive Colormap\n\nLet's say that we extracted the spack codebase (shown above) and then\nwanted to visualize the vectors. What do I mean? I want to see how the\nterms that are part of the embeddings model (each corresponding to\na word/term found in a code file with some extension) correspond to each\nextension. 
To do this, I can extract counts and vectors:\n\n```python\nfrom codeart.graphics import generate_interactive_colormap\n\nvectors = code.get_vectors('all')\ncounts = code.get_color_percentages(groups=list(code.codefiles.keys()), vectors=vectors)\n\n# Save if desired!\nvectors.to_csv(\"spack-colormap-vectors.csv\")\ncounts.to_csv(\"spack-color-percentages.csv\")\n\ntmpdir = generate_interactive_colormap(vectors=vectors, counts=counts, width=1000)\n```\n\nThe early interactive version is [here](https://vsoch.github.io/codeart-examples/parse_repo/web/)\n\n![examples/codeart.png](examples/codeart.png)\n\nand this was updated to be better sorted, seen [here](https://vsoch.github.io/codeart-examples/parse_repo/sorted/).\n\n![examples/sorted.png](examples/sorted.png)\n\nBasically, each term in the model is represented with its color, and the color transparency\nis based on the prevalence of each term for any given extension. You can click on a\ndifferent extension to see the colors change.\n\n\nFor the second plot (sorted) since the color RGB values are sorted based on similarity,\nyou can also deduce that similar colors indicate similar terms, which indicates similar\ncontext in the code files.\n\n### Example 2: Generate a Plot\n\nYou might just want to get vectors and plot the RGB space. 
You can do:\n\n```python\nimport matplotlib.pyplot as plt\nfrom mpl_toolkits.mplot3d import Axes3D  # registers the 3d projection on older matplotlib\n\nfig = plt.figure()\nax = fig.add_subplot(111, projection=\"3d\")\nax.scatter(vectors[0].tolist(), vectors[1].tolist(), vectors[2].tolist(), c=vectors.to_numpy()/255)\n\n# Optionally add text (a bit crowded)\nfor term, row in vectors.iterrows():\n    ax.text(row[0], row[1], row[2], term)\n\n# Save before showing, since the figure may be empty after plt.show() returns\nplt.savefig(\"img/spack-python.png\")\nplt.show()\n```\n\n### Example 3: Generate Code Images\n\nIf you want to generate raw images for some extension, you can do that too.\nThese can be used in an image processing project, to generate other images,\nor for some kind of clustering analysis.\n\n```python\nimport os\n\n# Generate images for all files\nif not os.path.exists('images'):\n    os.mkdir('images')\n\n# Create folder of code images (if you want to work with them directly)\ncode.make_art(extension=\".py\", outdir='images', vectors=vectors)\n```\n\n### Example 4: Generate a Color Lookup Grid\n\nI haven't developed this in detail, but you can also generate a grid of colors\nfor a given image. The idea here would be to have a grid that\ncan be used as a legend. 
Here we again load in the spack code base,\nand generate a lookup for the \".py\" (python) extension.\n\n```python\nfrom codeart.main import CodeBase\nfrom codeart.graphics import save_vectors_gradient_grid\ncode = CodeBase()                      \ncode.add_repo(\"https://github.com/spack/spack\")\nvectors = code.get_vectors(\".py\")\nsave_vectors_gradient_grid(vectors=vectors, outfile='spack-image-gradient.png') \n```\n\nAn example gradient image is the following:\n\n![spack-image-gradient.png](https://raw.githubusercontent.com/vsoch/codeart-examples/master/parse_repo/spack-image-gradient.png)\n\nAnd a much larger one (you'll need to click and zoom in):\n\n![examples/img/colors-gradient.png](https://raw.githubusercontent.com/vsoch/codeart-examples/master/parse_folders/colors-gradient.png)\n\nwhich is generated from [this data](https://github.com/vsoch/codeart-examples/blob/master/parse_folders/vectors-gradients.tsv) that I created by building a model across all the Python code on my computer (note there are many different\nfile extensions in the model beyond Python!).\n\nYou can of course adjust the dimensions, the row height, and the column width\nand font size depending on your needs or matrix size.\n\n### Example 5: Generate a Color Map\n\nYou can [follow this notebook](https://github.com/vsoch/codeart-examples/blob/master/derive_colormap/derive_colormap.ipynb)\nto see how you might generate a colormap. 
For example, here is an entire colormap for my Python (and associated files)\ncode base, without altering opacity, but just plotting the colors (these are the colors\nthat align with the color lookup grid above).\n\n![colormap-3d.png](https://raw.githubusercontent.com/vsoch/codeart-examples/master/derive_colormap/colormap-3d.png)\n\nAnd if we do dimensionality reduction, we can plot this in 2d:\n\n![colormap-2d.png](https://raw.githubusercontent.com/vsoch/codeart-examples/master/parse_by_year/colormap-2d.png)\n\nAnd finally, we can assess counts for any particular extension across\nthe codebase to derive images that use transparency to show the prevalence of any given term.\nHere it is for .yml and .rst (reStructuredText) files.\n\n![colormap-yaml.png](https://raw.githubusercontent.com/vsoch/codeart-examples/master/derive_colormap/colormap-yml.png)\n\nWe can also generate an animation to loop through each language.\n\n![colormap-groups.gif](https://raw.githubusercontent.com/vsoch/codeart-examples/master/parse_by_year/colormap-groups.gif)\n\nBut note that this data would benefit greatly from interactive visualization,\nakin to the [interactive colormap example](#generate-interactive-colormap)\nmentioned previously.\n\n### Example 6: Parse Folders by a Custom Function\n\nYou might want to organize groups in some logical way that goes beyond an extension,\nor just using all files. For this reason, each of the functions add_folder and add_repo\ncan be given a custom \"func\" that will return a group name based on a file.\nFor example, from [utils](codeart/utils.py) we can import a function that will\ngroup files by year:\n\n```python\nimport os\nfrom datetime import datetime\n\ndef group_by_year_created(filename):\n    \"\"\"Given a filename, return the year it was created.\n       This function is called after testing the file for read access. 
\n    \"\"\"\n    stat = os.stat(filename)\n    return datetime.fromtimestamp(stat.st_ctime).year\n```\n\nAnd then when adding a folder or repo, we might do:\n\n```python\ncode.add_folder(folder, func=group_by_year_created)\ncode.add_repo(repo, func=group_by_year_created)\n```\n\nYou can also just provide a string or group directly (takes precedence over func):\n\n```python\ncode.add_repo(repo, group=\"2017\")\n```\n\nThe function you provide should take the file name as the main argument.\nA more complex example (parsing GitHub repos by creation date using the GitHub\nAPI) is provided at [parse_by_year.py](https://github.com/vsoch/codeart-examples/blob/master/parse_by_year/parse_by_year.py).\n\n### Example 7: Generate CodeArt Text\n\nA simple use case is to generate a text image from a given repository or folder.\n\n```python\nimport os\nfrom glob import glob\n\nfrom codeart.graphics import generate_codeart_text\nfrom codeart.colors import generate_color_lookup\n\n# either works here\ncode.add_repo(\"https://github.com/vsoch/codeart\")\ncode.add_folder(\"/path/to/folder\")\n\nimages = code.make_art(group=\"all\", outdir=os.getcwd())\nimages = glob(\"%s/*\" % os.path.join(os.getcwd(), \"images\"))\ncolor_lookup = generate_color_lookup(images)\n\n# Generate an image with text (dinosaur!)\ngenerate_codeart_text(\"dinosaur\", color_lookup, outfile=\"index.html\")\n```\n\nA similar function (not as well used / developed) generates a mapping\nto an image.\n\n```python\nfrom codeart.graphics import generate_codeart\ngenerate_codeart('sunset.jpg', color_lookup, sample=10, top=100, outfile=\"index.html\")\n```\n\nFor more details on text generation, see the\n\n\n### Example 8: Basic Client Usage\n\nFor the text generation, I've created a simple client that will drive the interaction:\n\n```bash\n\u003e codeart --help\nusage: codeart [-h] [--version] {textart} ...\n\nCode Art Generator\n\noptional arguments:\n  -h, --help  show this help message and exit\n  --version   print the version and 
exit.\n\nactions:\n  actions for Code Art generator\n\n  {textart}   codeart actions\n    textart   extract images from GitHub or local file system.\n```\n\nCurrently the only simple command is to generate a code art text graphic. \nBe aware that this generates one image per code file, and should be done\nfor small code bases, or larger ones with caution.\n\n```bash\ncodeart textart --help\nusage: codeart textart [-h] [--github GITHUB] [--root ROOT] [--outdir OUTDIR]\n                       [--text TEXT]\n\noptional arguments:\n  -h, --help       show this help message and exit\n  --github GITHUB  GitHub username to download repositories for.\n  --root ROOT      root directory to parse for files.\n  --outdir OUTDIR  output directory to extract images (defaults to temporary\n                   directory)\n```\n\nFor an example, you can see the .github actions workflow in [codeart-examples](https://github.com/vsoch/codeart-examples).\n\n\nDo you have a question? Or want to suggest a feature to make it better?\nPlease [open an issue!](https://www.github.com/vsoch/codeart)\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvsoch%2Fcodeart","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvsoch%2Fcodeart","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvsoch%2Fcodeart/lists"}