https://github.com/privetin/dataset-viewer

MCP server for interacting with Hugging Face dataset viewer API, providing dataset browsing, filtering, and statistics capabilities

Host: GitHub
URL: https://github.com/privetin/dataset-viewer
Owner: privetin
Created: 2025-01-02T07:55:29.000Z (11 months ago)
Default Branch: main
Last Pushed: 2025-01-02T08:29:19.000Z (11 months ago)
Last Synced: 2025-01-02T09:19:22.899Z (11 months ago)
Language: Python
Size: 8.79 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

awesome-mcp-zh - Dataset Viewer
Awesome-MCP-Servers-directory - Dataset Viewer - Browse and analyze Hugging Face datasets with features like search, filtering, statistics, and data export (Data Science Tools)
metorial-index - Dataset Viewer - Interact with Hugging Face datasets by browsing and analyzing their contents, including configurations and splits. Supports searching, filtering, and retrieving dataset statistics, along with authentication for private datasets. (APIs and HTTP Requests)
toolsdk-mcp-registry - ❌ dataset-viewer
awesome-mcp-servers - Dataset Viewer - MCP server for interacting with Hugging Face dataset viewer API, providing dataset browsing, filtering, and statistics capabilities (Table of Contents / Data Science Tools)
awesome-mcp-servers - **Dataset Viewer** - Browse and analyze Hugging Face datasets with features like search, filtering, statistics, and data export `http` `git` `github` (📦 Other)

README

# Dataset Viewer MCP Server

An MCP server for interacting with the [Hugging Face Dataset Viewer API](https://huggingface.co/docs/dataset-viewer), providing capabilities to browse and analyze datasets hosted on the Hugging Face Hub.

## Features

### Resources

- Uses `dataset://` URI scheme for accessing Hugging Face datasets

- Supports dataset configurations and splits

- Provides paginated access to dataset contents

- Handles authentication for private datasets

- Supports searching and filtering dataset contents

- Provides dataset statistics and analysis

### Tools

The server provides the following tools:

1. **validate**

   - Check if a dataset exists and is accessible

   - Parameters:

     - `dataset`: Dataset identifier (e.g. 'stanfordnlp/imdb')

     - `auth_token` (optional): For private datasets

2. **get_info**

   - Get detailed information about a dataset

   - Parameters:

     - `dataset`: Dataset identifier

     - `auth_token` (optional): For private datasets

3. **get_rows**

   - Get paginated contents of a dataset

   - Parameters:

     - `dataset`: Dataset identifier

     - `config`: Configuration name

     - `split`: Split name

     - `page` (optional): Page number (0-based)

     - `auth_token` (optional): For private datasets

4. **get_first_rows**

   - Get first rows from a dataset split

   - Parameters:

     - `dataset`: Dataset identifier

     - `config`: Configuration name

     - `split`: Split name

     - `auth_token` (optional): For private datasets

5. **get_statistics**

   - Get statistics about a dataset split

   - Parameters:

     - `dataset`: Dataset identifier

     - `config`: Configuration name

     - `split`: Split name

     - `auth_token` (optional): For private datasets

6. **search_dataset**

   - Search for text within a dataset

   - Parameters:

     - `dataset`: Dataset identifier

     - `config`: Configuration name

     - `split`: Split name

     - `query`: Text to search for

     - `auth_token` (optional): For private datasets

7. **filter**

   - Filter rows using SQL-like conditions

   - Parameters:

     - `dataset`: Dataset identifier

     - `config`: Configuration name

     - `split`: Split name

     - `where`: SQL WHERE clause (e.g. "score > 0.5")

     - `orderby` (optional): SQL ORDER BY clause

     - `page` (optional): Page number (0-based)

     - `auth_token` (optional): For private datasets

8. **get_parquet**

   - Download entire dataset in Parquet format

   - Parameters:

     - `dataset`: Dataset identifier

     - `auth_token` (optional): For private datasets
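
As a rough illustration of how the `page` parameter could relate to the underlying API, the sketch below translates a 0-based page number into an offset-based request URL for the Dataset Viewer API's `/rows` endpoint. The base URL and the `offset`/`length` query parameters belong to the public API; the page size of 100 and the helper name `build_rows_url` are assumptions made for this example and are not taken from this server's source.

```python
from urllib.parse import urlencode

# Public Dataset Viewer API base URL.
API_BASE = "https://datasets-server.huggingface.co"

# Assumed page size for this sketch; /rows accepts at most 100 rows per request.
PAGE_SIZE = 100


def build_rows_url(dataset: str, config: str, split: str, page: int = 0) -> str:
    """Translate a 0-based page number into an offset-based /rows URL."""
    if page < 0:
        raise ValueError("page must be >= 0")
    query = urlencode(
        {
            "dataset": dataset,
            "config": config,
            "split": split,
            "offset": page * PAGE_SIZE,  # first row of the requested page
            "length": PAGE_SIZE,
        }
    )
    return f"{API_BASE}/rows?{query}"


if __name__ == "__main__":
    print(build_rows_url("stanfordnlp/imdb", "plain_text", "train", page=2))
```

With these assumptions, page 2 maps to `offset=200` and `length=100`.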

## Installation

### Prerequisites

- Python 3.12 or higher

- [uv](https://github.com/astral-sh/uv) - Fast Python package installer and resolver

### Setup

1. Clone the repository:

```bash
git clone https://github.com/privetin/dataset-viewer.git
cd dataset-viewer
```

2. Create a virtual environment and install:

```bash
# Create virtual environment
uv venv

# Activate virtual environment
# On Unix:
source .venv/bin/activate
# On Windows:
.venv\Scripts\activate

# Install in development (editable) mode
uv pip install -e .
```

## Configuration

### Environment Variables

- `HUGGINGFACE_TOKEN`: Your Hugging Face API token for accessing private datasets
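
The Hugging Face APIs accept tokens as a bearer token in the `Authorization` header. The sketch below shows one way a per-call `auth_token` and the `HUGGINGFACE_TOKEN` environment variable could be combined. The precedence (explicit argument first, then environment) and the helper name `build_auth_headers` are assumptions for illustration; the README does not specify how this server resolves them.

```python
import os
from typing import Optional


def build_auth_headers(auth_token: Optional[str] = None) -> dict[str, str]:
    """Return request headers for an optional Hugging Face token.

    Assumed precedence: an explicit auth_token wins over HUGGINGFACE_TOKEN.
    """
    token = auth_token or os.environ.get("HUGGINGFACE_TOKEN")
    if not token:
        # Public datasets need no Authorization header.
        return {}
    return {"Authorization": f"Bearer {token}"}
```

Returning an empty dictionary when no token is available keeps requests for public datasets unauthenticated.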

### Claude Desktop Integration

Add the following to your Claude Desktop config file:

On Windows: `%APPDATA%\Claude\claude_desktop_config.json`

On macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`

```json
{
  "mcpServers": {
    "dataset-viewer": {
      "command": "uv",
      "args": [
        "run",
        "dataset-viewer"
      ]
    }
  }
}
```
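
If the config file already lists other servers, the entry has to be merged in rather than pasted over the existing content. The following is a minimal sketch of such a merge using only the standard library. The helper name `add_server` is hypothetical, and the config path is passed in by the caller rather than detected.

```python
import json
from pathlib import Path


def add_server(config_path: Path, name: str = "dataset-viewer") -> dict:
    """Add the server entry to a Claude Desktop config, keeping other settings."""
    config: dict = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        # Treat an empty file like a missing one.
        config = json.loads(text) if text.strip() else {}
    # Create "mcpServers" if absent, then add or overwrite this server's entry.
    config.setdefault("mcpServers", {})[name] = {
        "command": "uv",
        "args": ["run", "dataset-viewer"],
    }
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

For example, `add_server(Path.home() / "Library/Application Support/Claude/claude_desktop_config.json")` would update the macOS location listed above.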

## Usage Examples

1. Validate a dataset:

```json
{
  "dataset": "stanfordnlp/imdb"
}
```

2. Get dataset information:

```json
{
  "dataset": "stanfordnlp/imdb"
}
```

3. Search dataset contents:

```json
{
  "dataset": "stanfordnlp/imdb",
  "config": "plain_text",
  "split": "train",
  "query": "great movie"
}
```

4. Filter and sort rows:

```json
{
  "dataset": "stanfordnlp/imdb",
  "config": "plain_text",
  "split": "train",
  "where": "label = 1",
  "orderby": "text DESC",
  "page": 0
}
```

5. Get dataset statistics:

```json
{
  "dataset": "stanfordnlp/imdb",
  "config": "plain_text",
  "split": "train"
}
```
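
The JSON objects in these examples are tool arguments. An MCP client sends them to the server inside a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for illustration; the helper name `make_tool_call` and the request id are arbitrary, and in practice an MCP client library handles this framing.

```python
import json
from typing import Any


def make_tool_call(name: str, arguments: dict[str, Any], request_id: int = 1) -> str:
    """Wrap tool arguments in a JSON-RPC 2.0 tools/call request."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


if __name__ == "__main__":
    print(
        make_tool_call(
            "search_dataset",
            {
                "dataset": "stanfordnlp/imdb",
                "config": "plain_text",
                "split": "train",
                "query": "great movie",
            },
        )
    )
```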

## License

MIT License - see [LICENSE](LICENSE) for details
