{"id":17087744,"url":"https://github.com/davidbau/baukit","last_synced_at":"2025-05-07T16:09:43.015Z","repository":{"id":47985011,"uuid":"459600155","full_name":"davidbau/baukit","owner":"davidbau","description":null,"archived":false,"fork":false,"pushed_at":"2024-02-22T14:24:02.000Z","size":119,"stargazers_count":182,"open_issues_count":4,"forks_count":14,"subscribers_count":11,"default_branch":"main","last_synced_at":"2025-01-03T07:19:59.269Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/davidbau.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-02-15T13:54:05.000Z","updated_at":"2025-01-02T14:12:07.000Z","dependencies_parsed_at":"2024-02-22T15:50:02.278Z","dependency_job_id":null,"html_url":"https://github.com/davidbau/baukit","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davidbau%2Fbaukit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davidbau%2Fbaukit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davidbau%2Fbaukit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davidbau%2Fbaukit/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/davidbau","download_url":"https://codeload.github.com/davidbau/baukit/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","reposit
ories_count":233352332,"owners_count":18663271,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-14T13:34:47.844Z","updated_at":"2025-01-10T13:34:48.199Z","avatar_url":"https://github.com/davidbau.png","language":"Python","funding_links":[],"categories":["Interpretability \u0026 Mechanistic Analysis","Mechanistic interpretability libraries"],"sub_categories":["Libraries \u0026 Frameworks"],"readme":"# baukit\n\nInstall using `pip install git+https://github.com/davidbau/baukit`.\n\nProvides the `baukit` package, a kit of David's secret tools to help\nwith productive research prototyping with pytorch.\n\nIncludes:\n * Methods for tracing and editing internal activations in a network.\n * Interactive UI widgets for quick data exploration in a notebook.\n * Online algorithms for computing running stats in pytorch.\n * Fast and feature-rich data set objects for images and text.\n * Utilities for simplifying the task of running many batch jobs.\n\nFull details can be found by reading the code.\nHere is a partial overview:\n\n## Trace library\n\n`Trace`, `TraceDict`, `subsequence`, `replace_module`; these simplify\nthe work of analyzing and altering internal computations of deep\nnetworks.  
A short example of tracing a specific layer in `net`:\n\n```\nfrom baukit import Trace\nwith Trace(net, 'layer.name') as ret:\n    _ = net(inp)\n    representation = ret.output\n```\n\nRead the [nethook Trace source code](https://github.com/davidbau/baukit/blob/main/baukit/nethook.py) for more information.\n\n## Widget library\n\n`show` is a feature-rich alternative to Jupyter notebook `display`;\nit allows for quickly producing HTML layouts by arranging data and\nimages in nested python arrays, and it knows how to directly display\nPIL images, matplotlib figure objects, and interactive widgets.\nHTML elements, attributes, and CSS styles can be controlled with\nfunctions like `show.style(color='red')`.\n\n```\nfrom baukit import show\nshow([[show.style(color=c), c] for c in ['red', 'green', 'blue']])\n```\n\nThere is a [notebook here](https://github.com/davidbau/baukit/blob/main/notebooks/using_show_and_widgets.ipynb) that shows off ways to use `show()`.\n\n`show` works with a set of `Widget` subclasses such as `Textbox`,\n`Numberbox`, `Range`, `Menu`, `PlotWidget`, and `PaintWidget` that provide\ndata-bound reactive objects for quickly making interactive\nHTML visualizations that work in a Jupyter or Colab notebook.  For\nexample, instead of using `matplotlib` directly to just draw a picture\nof a plot, you can lay out an interactive widget:\n\n```\nfrom baukit import PlotWidget, Range, show\nimport numpy\ndef how_to_draw_my_plot(fig, amp=1.0, freq=1.0):\n    [ax] = fig.axes\n    ax.clear()\n    x = numpy.linspace(0, 5, 100)\n    ax.plot(x, amp * numpy.sin(freq * x))\n\nplot = PlotWidget(how_to_draw_my_plot, figsize=(5, 5))\nra = Range(min=0.0, max=2.0, step=0.1, value=plot.prop('amp'))\nrf = Range(min=0.1, max=20.0, step=0.1, value=plot.prop('freq'))\nshow([plot, [show.style(textAlign='right'), 'Amp', ra,\n             show.style(textAlign='right'), 'Freq', rf]])\n```\n\nThis code shows the plot in a layout with two sliders.  
If you later\nexecute the code `plot.freq = 5.0`, the plot will update live, in place,\nto show the new curve, and the freq slider will also move to 5.  And\nof course, dragging the slider will also change the values live.\n\nThe [labwidget source code](https://github.com/davidbau/baukit/blob/main/baukit/labwidget.py) has much more detail.\n\n## Online statistics library\n\n`Covariance`, `Mean`, `Quantile`, `TopK`, and other data summarization\nmethods are provided as online, GPU-optimized algorithms.\n\n```\nfrom baukit import Quantile, TopK, CombinedStat, tally\ncs = CombinedStat(\n    qc=Quantile(),\n    tk=TopK(),\n)\n# MyDataset is a placeholder for your own dataset class.\nds = MyDataset()\n# Loads from my_stats.npz if already computed.\nfor [batch] in tally(cs, ds, cache='my_stats.npz', batch_size=50):\n    # .cuda() returns a new tensor, so keep the result.\n    batch = batch.cuda()\n    # Assumes dim=0 is the sampling axis; stats are per dim=1 feature.\n    cs.add(batch)\ncs.to_('cpu')\nmedian = cs.qc.quantile(0.5)\ntop_values, top_indexes = cs.tk.topk(10)\n```\n\nThe [runningstats source code](https://github.com/davidbau/baukit/blob/main/baukit/runningstats.py) shows other things you can do.\n\n## Improved basic dataset objects\n\n`ImageFolderSet` is faster and provides more features than\npytorch `ImageFolder` including the ability to gather multiple\nstreams of parallel data tensors (such as segmentations and images).\n\n`TokenizedDataset` tokenizes text through a provided tokenizer,\nproducing dictionaries designed to feed directly into `huggingface`\nlanguage models.  
It works with `length_collation` for creating\nuniform-length batches for fast training and inference.\n\n## Batch job utilities\n\n`pbar` is a more readable progress bar utility wrapper around `tqdm`\nthat simplifies the display of progress status strings during a\nlong-running operation; it also provides a way for a caller to\nsilence progress output.\n\n`reserve_dir` reserves a directory for results of a job and grabs a lock\nso that other processes running `reserve_dir` will not do the same job.\nThis allows very simple batch parallelism: just run many processes\nthat run all the jobs, and each job will only be done once.\n\n`WorkerPool` simplifies creation of worker threads for consuming output\ndata; this can dramatically speed up writing of many output files\nand is the output analogue of the torch DataLoader utility for inputs.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdavidbau%2Fbaukit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdavidbau%2Fbaukit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdavidbau%2Fbaukit/lists"}