https://github.com/hazyresearch/meerkat

Create interactive views of any dataset.

Host: GitHub
URL: https://github.com/hazyresearch/meerkat
Owner: HazyResearch
License: apache-2.0
Created: 2021-05-07T00:26:35.000Z (about 4 years ago)
Default Branch: main
Last Pushed: 2024-12-24T15:06:30.000Z (5 months ago)
Last Synced: 2025-05-15T02:09:29.131Z (4 days ago)
Topics: data-science, foundation-models, machine-learning, ml, pandas
Language: Python
Homepage:
Size: 66.5 MB
Stars: 839
Watchers: 13
Forks: 43
Open Issues: 11
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE.md

---

[![GitHub](https://img.shields.io/github/license/HazyResearch/meerkat)](https://img.shields.io/github/license/HazyResearch/meerkat)

[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://github.com/pre-commit/pre-commit)

Create interactive views of any dataset.

[**Website**](http://meerkat.wiki) | [**Quickstart**](http://meerkat.wiki/docs/start/quickstart-df.html) | [**Docs**](http://meerkat.wiki/docs/index.html) | [**Contributing**](CONTRIBUTING.md) | [**Discord**](https://discord.gg/pw8E4Q26Tq) | [**Blogpost**](https://hazyresearch.stanford.edu/blog/2023-03-01-meerkat)

## ⚡️ Quickstart

```bash
pip install meerkat-ml
```

**Next Steps**.

Check out our [Getting Started page](http://meerkat.wiki/docs/start/quickstart-df.html) and our [documentation](http://meerkat.wiki/docs/index.html) to start building with Meerkat.

## Why Meerkat?

Meerkat is an open-source Python library that helps users visualize, explore, and annotate any dataset. It is especially useful when processing unstructured data types (_e.g._ free text, PDFs, images, video) with machine learning models. 

### ✏️ Features and Design Principles

Here are four principles that inform Meerkat's design.

**(1) Low overhead.**  With four lines of Python, start interacting with any dataset. 

- Zero-copy integrations with your preferred data abstractions: Pandas, Arrow, HF Datasets, Ibis, SQL.

- Limited data movement. With Meerkat, you interact with your data where it already lives: no uploads to external databases and no reformatting.

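```python
import meerkat as mk

# Load a CSV file into a Meerkat DataFrame.
df = mk.from_csv("paintings.csv")

# Wrap the column of image paths/URLs as a file column so each image
# is loaded from where it lives when it is displayed.
df["image"] = mk.files(df["image_url"])

# Display the DataFrame (e.g. as the last line of a notebook cell) to view it.
df
```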
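
The "limited data movement" idea can be illustrated without Meerkat itself. Below is a minimal, hypothetical sketch of a column that holds only file paths and reads a file from disk when a row is accessed, which is roughly the behavior one would expect from a file-backed column. The `LazyFileColumn` class and its `load_count` counter are made up for illustration and are not Meerkat's implementation.

```python
import os
import tempfile
from typing import Callable, List, Optional


class LazyFileColumn:
    """Hypothetical sketch: stores file paths and reads file contents
    only when a row is accessed (not Meerkat's actual code)."""

    def __init__(self, paths: List[str], loader: Optional[Callable[[str], bytes]] = None):
        self.paths = list(paths)              # only the paths are held in memory
        self.loader = loader or self._read_bytes
        self.load_count = 0                   # how many file reads actually happened

    @staticmethod
    def _read_bytes(path: str) -> bytes:
        with open(path, "rb") as f:
            return f.read()

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, index: int) -> bytes:
        # The file is read from where it already lives, at access time.
        self.load_count += 1
        return self.loader(self.paths[index])


# Usage: write two small files, then wrap their paths in a lazy column.
tmp_dir = tempfile.mkdtemp()
paths = []
for i, payload in enumerate([b"first", b"second"]):
    path = os.path.join(tmp_dir, f"item_{i}.bin")
    with open(path, "wb") as f:
        f.write(payload)
    paths.append(path)

column = LazyFileColumn(paths)
print(len(column), column.load_count)  # 2 0 (nothing has been read yet)
print(column[1], column.load_count)    # b'second' 1
```

Deferring the read keeps the column cheap to build even for large files; the cost is paid only for the rows someone actually looks at.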


**(2) Diverse data types.** Visualize and annotate almost any data type in Meerkat interfaces: text, images, audio, video, MRI scans, PDFs, HTML, JSON. 

**(3) "Intelligent" user interfaces.** Meerkat makes it easy to embed **machine learning models** (e.g. LLMs) within user interfaces to enable intelligent functionality such as searching, grouping and autocomplete. 

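```python
import meerkat as mk

# Compute a CLIP embedding for every image in the "img" column of df.
df["embedding"] = mk.embed(df["img"], engine="clip")

# A query component that matches user input against the image embeddings.
match = mk.gui.Match(df, against="embedding", engine="clip")

# Sort rows by their match score for the current query, best match first.
sorted_df = mk.sort(df, by=match.criterion.name, ascending=False)

# Show the sorted images in a gallery below the query component.
gallery = mk.gui.Gallery(sorted_df)
mk.gui.html.div([match, gallery])
```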
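
Conceptually, the `Match` + `mk.sort` pattern ranks rows by how similar their embeddings are to an embedded query. The sketch below shows that ranking step in plain Python, assuming cosine similarity as the scoring function and using made-up 3-dimensional vectors in place of real CLIP embeddings. The helper names `cosine_similarity` and `rank_by_query` are hypothetical, and Meerkat's actual scoring may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_query(
    embeddings: Dict[str, List[float]], query: List[float]
) -> List[Tuple[str, float]]:
    """Score every row against the query and sort by descending similarity."""
    scores = [(row_id, cosine_similarity(vec, query)) for row_id, vec in embeddings.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy 3-d "embeddings" standing in for CLIP vectors (values are made up).
image_embeddings = {
    "sunset.jpg": [0.9, 0.1, 0.0],
    "forest.jpg": [0.1, 0.9, 0.1],
    "beach.jpg": [0.7, 0.2, 0.3],
}
query_embedding = [1.0, 0.0, 0.0]  # stands in for an embedded text query

for row_id, score in rank_by_query(image_embeddings, query_embedding):
    print(f"{row_id}: {score:.3f}")
```

Sorting with `reverse=True` plays the same role as `ascending=False` in the snippet above: the highest-scoring rows come first.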


**(4) Declarative (think: Seaborn), but also infinitely customizable and composable.**

Meerkat visualization components can be composed and customized to create new interfaces. 

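```python
import meerkat as mk

# Interactive scatter plot of a 2-D UMAP projection
# (plot_df is assumed to have "umap_1" and "umap_2" columns).
plot = mk.gui.plotly.Scatter(df=plot_df, x="umap_1", y="umap_2")

# A reactive function re-runs whenever its inputs (here, the plot selection) change.
@mk.gui.reactive
def filter_selected(selected: list, df: mk.DataFrame):
    # Keep only the rows whose primary key is among the selected points.
    return df[df.primary_key.isin(selected)]

filtered_df = filter_selected(plot.selected, plot_df)

# Show the selected rows in a table next to the plot.
table = mk.gui.Table(filtered_df, classes="h-full")
mk.gui.html.flex([plot, table], classes="h-[600px]")
```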
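
The `@mk.gui.reactive` decorator ties a function's output to its inputs so that the output is recomputed when an input changes. The following is a simplified, hypothetical sketch of that pattern in plain Python. `Store`, `reactive`, and `filter_selected` here are illustrative stand-ins, not Meerkat's classes, and rows are plain dicts rather than a Meerkat DataFrame.

```python
from typing import Any, Callable, List


class Store:
    """Hypothetical sketch: a value that notifies subscribers when it changes."""

    def __init__(self, value: Any):
        self._value = value
        self._subscribers: List[Callable[[], None]] = []

    @property
    def value(self) -> Any:
        return self._value

    def set(self, new_value: Any) -> None:
        self._value = new_value
        for callback in self._subscribers:
            callback()

    def subscribe(self, callback: Callable[[], None]) -> None:
        self._subscribers.append(callback)


def reactive(fn: Callable[..., Any]) -> Callable[..., Store]:
    """Wrap fn so its output is a Store recomputed whenever a Store argument changes."""

    def wrapper(*args: Any) -> Store:
        def compute() -> Any:
            # Unwrap Store arguments to their current values before calling fn.
            return fn(*(a.value if isinstance(a, Store) else a for a in args))

        output = Store(compute())

        def recompute() -> None:
            output.set(compute())

        for arg in args:
            if isinstance(arg, Store):
                arg.subscribe(recompute)
        return output

    return wrapper


@reactive
def filter_selected(selected: list, rows: list) -> list:
    # Keep only rows whose primary key is in the current selection.
    return [row for row in rows if row["id"] in selected]


rows = [{"id": 1, "label": "cat"}, {"id": 2, "label": "dog"}, {"id": 3, "label": "owl"}]
selection = Store([1])  # stands in for plot.selected
filtered = filter_selected(selection, rows)
print(filtered.value)   # [{'id': 1, 'label': 'cat'}]

selection.set([2, 3])   # e.g. the user selects two points in the plot
print(filtered.value)   # [{'id': 2, 'label': 'dog'}, {'id': 3, 'label': 'owl'}]
```

In this sketch recomputation is eager and runs on every `set`; a real UI framework may batch or defer updates.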


### ✨ Use cases where Meerkat shines

- _Exploratory analysis over unstructured data types._ [Demo](https://www.youtube.com/watch?v=a8FBT33QACQ)

- _Spot-checking the behavior of large language models (e.g. GPT-3)._  [Demo](https://www.youtube.com/watch?v=3ItA70qoe-o)

- _Identifying systematic errors made by machine learning models._ [Demo](https://youtu.be/4Kk_LZbNWNs)

- _Rapid labeling of validation data._

### 🤔 Use cases where Meerkat may not be the right fit

- _Are you only working with structured data (e.g. numerical and categorical variables)?_ Popular data visualization libraries (_e.g._ [Seaborn](https://seaborn.pydata.org/), [Matplotlib](https://matplotlib.org/)) are often sufficient. If you're looking for interactivity, [Plotly](https://plotly.com/) and [Streamlit](https://streamlit.io/) work well with structured data. Meerkat's differentiator is how it visualizes unstructured data types: long-form text, PDFs, HTML, images, video, and audio.

- _Are you trying to make a straightforward demo of a machine learning model (single input/output, chatbot) and share it with the world?_ [Gradio](https://gradio.app/) is likely a better fit! That said, if your demo involves visualizing lots of data, you may find Meerkat useful.

- _Are you trying to manually label tens of thousands of data points?_ If you are looking for a data labeling tool to use with a labeling team, there are great open-source labeling solutions designed for this (_e.g._ [LabelStudio](https://labelstud.io/)). In contrast, Meerkat is a great fit for teams or individuals without access to a large labeling workforce who are using pretrained models (_e.g._ GPT-3) and need to label validation data or in-context examples.

## ✉️ About

Meerkat is being built by Machine Learning PhD students in the [Hazy Research](https://hazyresearch.stanford.edu) lab at Stanford. We're excited to build for a future where models make it easier for teams to sift through and reason about large volumes of unstructured data.

Please reach out to `kgoel [at] cs [dot] stanford [dot] edu`, `eyuboglu [at] stanford [dot] edu`, and `arjundd [at] stanford [dot] edu` if you would like to use Meerkat for a project or at your company, or if you have any questions.
