Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/renumics/spotlight
Interactively explore unstructured datasets from your dataframe.
Last synced: about 20 hours ago
- Host: GitHub
- URL: https://github.com/renumics/spotlight
- Owner: Renumics
- License: mit
- Created: 2023-01-29T14:54:14.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-05-29T08:40:46.000Z (7 months ago)
- Last Synced: 2024-05-29T20:29:04.494Z (7 months ago)
- Topics: audio, computer-vision, data-centric-ai, data-curation, data-visualization, exploratory-data-analysis, hacktoberfest, images, machine-learning, meshes, timeseries, unstructured-data, video
- Language: TypeScript
- Homepage: https://renumics.com
- Size: 45.7 MB
- Stars: 1,020
- Watchers: 18
- Forks: 83
- Open Issues: 3
Metadata Files:
- Readme: README-PyPI.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project
- awesome-open-data-centric-ai - Renumics Spotlight - centric AI ecosystem. | ![GitHub stars](https://img.shields.io/github/stars/renumics/spotlight?style=social) | <a href="https://github.com/renumics/spotlight/blob/main/LICENSE"><img src="https://img.shields.io/github/license/renumics/spotlight" height="15"/></a> | (Visualization and Interaction)
README
# Renumics Spotlight
> Spotlight helps you **identify critical data segments and model failure modes**. It enables you to build and maintain reliable machine learning models by **curating high-quality datasets**.
## Introduction
Spotlight is built on the idea that you can only truly **understand unstructured datasets** if you can **interactively explore** them. Its core principle is to identify and fix critical data segments by leveraging **data enrichments** (e.g. features, embeddings, uncertainties). We are building Spotlight for cross-functional teams that want to be in **control of their data and data curation processes**. Currently, Spotlight supports many use cases based on image, audio, video and time series data.
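Enrichments are passed to Spotlight as ordinary dataframe columns; the MNIST quickstart example, for instance, includes an `embedding` column. As a minimal sketch of an uncertainty enrichment, assuming a model has already produced per-sample class probabilities, the predictive entropy of each prediction can be stored as a new column. Entropy is one common uncertainty measure, not necessarily what Spotlight itself computes, and the `probabilities` column, file names, and `add_uncertainty` helper are hypothetical:

```python
import numpy as np
import pandas as pd


def add_uncertainty(df: pd.DataFrame, prob_column: str = "probabilities") -> pd.DataFrame:
    """Return a copy of df with an 'uncertainty' column holding the
    Shannon entropy (in nats) of each row's class probabilities.

    Hypothetical helper for illustration; not part of the Spotlight API.
    """
    # Stack the per-row probability vectors into shape (n_samples, n_classes).
    probs = np.stack(df[prob_column].to_numpy())
    # Clip before taking the log to avoid log(0); zero-probability
    # classes then contribute (almost) nothing to the sum.
    entropy = -(probs * np.log(np.clip(probs, 1e-12, 1.0))).sum(axis=1)
    enriched = df.copy()
    enriched["uncertainty"] = entropy
    return enriched


df = pd.DataFrame(
    {
        "image": ["img_0.png", "img_1.png"],  # hypothetical file paths
        "probabilities": [np.array([1.0, 0.0]), np.array([0.5, 0.5])],
    }
)
df = add_uncertainty(df)
print(df[["image", "uncertainty"]])
```

The result is a plain numeric column, so the enriched dataframe can be handed to `spotlight.show` like any other.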
## Quickstart
Get started by installing Spotlight and loading your first dataset.
#### What you'll need
- [Python](https://www.python.org/downloads/) version 3.8-3.12
#### Install Spotlight via [pip](https://packaging.python.org/en/latest/key_projects/#pip)
```bash
pip install renumics-spotlight
```

> We recommend installing Spotlight and everything you need to work on your data in a separate [virtual environment](https://docs.python.org/3/tutorial/venv.html).
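After installing, a quick sanity check can confirm that the interpreter falls within the supported Python 3.8-3.12 range and that the `renumics-spotlight` distribution is present in the active environment. This is a minimal sketch using only the standard library; the helper names are hypothetical and not part of Spotlight:

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Supported range from the "What you'll need" requirements (Python 3.8-3.12).
MIN_PYTHON = (3, 8)
MAX_PYTHON = (3, 12)


def python_supported(version_info=sys.version_info) -> bool:
    """Check whether a (major, minor, ...) version lies in the supported range."""
    return MIN_PYTHON <= tuple(version_info[:2]) <= MAX_PYTHON


def spotlight_version() -> Optional[str]:
    """Return the installed renumics-spotlight version, or None if it is missing."""
    try:
        return version("renumics-spotlight")
    except PackageNotFoundError:
        return None


print("Python supported:", python_supported())
print("renumics-spotlight:", spotlight_version() or "not installed")
```

Run it with the interpreter of the virtual environment you installed into, so that both checks refer to the same environment.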
#### Load a dataset and start exploring
```python
import pandas as pd
from renumics import spotlight

df = pd.read_csv("https://spotlight.renumics.com/data/mnist/mnist-tiny.csv")
spotlight.show(df, dtype={"image": spotlight.Image, "embedding": spotlight.Embedding})
```

> `pd.read_csv` loads a sample CSV file as a pandas [DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html).
> `spotlight.show` opens Spotlight in the browser with the pandas dataframe ready for you to explore. The `dtype` argument maps column names to Spotlight column types (here `spotlight.Image` and `spotlight.Embedding`) so the viewer knows how to display those columns.
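Since `dtype` is keyed by column name, it can help to confirm that every key actually names a column of the dataframe before calling `spotlight.show`. The following is a minimal, hypothetical pre-flight check (not part of the Spotlight API); the plain strings are placeholders for `spotlight.Image` and `spotlight.Embedding` so the sketch runs without Spotlight installed:

```python
from typing import Any, Mapping

import pandas as pd


def check_dtype_columns(df: pd.DataFrame, dtype: Mapping[str, Any]) -> None:
    """Raise a KeyError listing any dtype keys that are not columns of df.

    Hypothetical helper: it only checks column names, not whether the
    column contents suit the requested Spotlight type.
    """
    missing = [name for name in dtype if name not in df.columns]
    if missing:
        raise KeyError(f"dtype refers to unknown columns: {missing}")


df = pd.DataFrame({"image": ["a.png"], "embedding": [[0.1, 0.2]], "label": [3]})

# Placeholder strings stand in for spotlight.Image / spotlight.Embedding.
check_dtype_columns(df, {"image": "Image", "embedding": "Embedding"})
print("dtype keys match dataframe columns")
```

A misspelled key such as `"img"` would raise a `KeyError` naming it, instead of silently leaving the intended column without a custom type.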
#### Load a [Hugging Face](https://huggingface.co/) dataset
```python
import datasets
from renumics import spotlight

dataset = datasets.load_dataset("olivierdehaene/xkcd", split="train")
df = dataset.to_pandas()
spotlight.show(df, dtype={"image_url": spotlight.Image})
```

> The `datasets` package can be installed via pip (`pip install datasets`).