{"id":13596150,"url":"https://github.com/Renumics/spotlight","last_synced_at":"2025-04-09T16:31:38.737Z","repository":{"id":70733293,"uuid":"594756461","full_name":"Renumics/spotlight","owner":"Renumics","description":"Interactively explore unstructured datasets from your dataframe.","archived":false,"fork":false,"pushed_at":"2024-05-29T08:40:46.000Z","size":47890,"stargazers_count":1020,"open_issues_count":3,"forks_count":83,"subscribers_count":18,"default_branch":"main","last_synced_at":"2024-05-29T20:29:04.494Z","etag":null,"topics":["audio","computer-vision","data-centric-ai","data-curation","data-visualization","exploratory-data-analysis","hacktoberfest","images","machine-learning","meshes","timeseries","unstructured-data","video"],"latest_commit_sha":null,"homepage":"https://renumics.com","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Renumics.png","metadata":{"files":{"readme":"README-PyPI.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-01-29T14:54:14.000Z","updated_at":"2024-06-07T12:02:24.575Z","dependencies_parsed_at":"2024-03-08T14:50:27.427Z","dependency_job_id":"a2e67e88-3c39-4534-859e-be6f0db96263","html_url":"https://github.com/Renumics/spotlight","commit_stats":{"total_commits":14,"total_committers":7,"mean_commits":2.0,"dds":0.7857142857142857,"last_synced_commit":"74ac7adf64520fe3b7a68c97a2f68eaea11aaff0"},"previous_names":[],"tags_count":45,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Renumics%2Fspotlight","tags_url":"https://repos.ecosyst
e.ms/api/v1/hosts/GitHub/repositories/Renumics%2Fspotlight/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Renumics%2Fspotlight/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Renumics%2Fspotlight/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Renumics","download_url":"https://codeload.github.com/Renumics/spotlight/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247550688,"owners_count":20956987,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["audio","computer-vision","data-centric-ai","data-curation","data-visualization","exploratory-data-analysis","hacktoberfest","images","machine-learning","meshes","timeseries","unstructured-data","video"],"created_at":"2024-08-01T16:02:10.184Z","updated_at":"2025-04-09T16:31:38.723Z","avatar_url":"https://github.com/Renumics.png","language":"TypeScript","funding_links":[],"categories":["TypeScript","Industry Strength Visualisation","⚖️ Evaluation"],"sub_categories":[],"readme":"# Renumics Spotlight\n\n\u003e Spotlight helps you **identify critical data segments and model failure modes**. It enables you to build and maintain reliable machine learning models by **curating high-quality datasets**.\n\n## Introduction\n\nSpotlight is built on the idea that you can only truly **understand unstructured datasets** if you can **interactively explore** them. Its core principle is to identify and fix critical data segments by leveraging **data enrichments** (e.g. 
features, embeddings, uncertainties). We are building Spotlight for cross-functional teams that want to be in **control of their data and data curation processes**. Currently, Spotlight supports many use cases based on image, audio, video, and time series data.\n\n## Quickstart\n\nGet started by installing Spotlight and loading your first dataset.\n\n#### What you'll need\n\n-   [Python](https://www.python.org/downloads/) version 3.8-3.12\n\n#### Install Spotlight via [pip](https://packaging.python.org/en/latest/key_projects/#pip)\n\n```bash\npip install renumics-spotlight\n```\n\n\u003e We recommend installing Spotlight and everything you need to work on your data in a separate [virtual environment](https://docs.python.org/3/tutorial/venv.html).\n\n#### Load a dataset and start exploring\n\n```python\nimport pandas as pd\nfrom renumics import spotlight\n\ndf = pd.read_csv(\"https://spotlight.renumics.com/data/mnist/mnist-tiny.csv\")\nspotlight.show(df, dtype={\"image\": spotlight.Image, \"embedding\": spotlight.Embedding})\n```\n\n\u003e `pd.read_csv` loads a sample CSV file as a pandas [DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html).\n\n\u003e `spotlight.show` opens Spotlight in the browser with the pandas DataFrame ready for you to explore. 
The `dtype` argument specifies custom column types for the browser viewer.\n\n#### Load a [Hugging Face](https://huggingface.co/) dataset\n\n```python\nimport datasets\nfrom renumics import spotlight\n\ndataset = datasets.load_dataset(\"olivierdehaene/xkcd\", split=\"train\")\ndf = dataset.to_pandas()\nspotlight.show(df, dtype={\"image_url\": spotlight.Image})\n```\n\n\u003e The `datasets` package can be installed via pip.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FRenumics%2Fspotlight","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FRenumics%2Fspotlight","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FRenumics%2Fspotlight/lists"}