# AI-Mask
![AI-Mask Logo](/packages/extension/icons/icon-128.png)

**Bring local inference into web apps!**

[Download extension](https://chromewebstore.google.com/detail/lkfaajachdpegnlpikpdajccldcgfdde) | [Supported apps](#supported-apps) | [Integration guide](#for-integrators) | [SDK Documentation](/packages/sdk)

[![npm version](https://badge.fury.io/js/@ai-mask%2Fsdk.svg)](https://badge.fury.io/js/@ai-mask%2Fsdk)

> This is an experimental project at an MVP stage.
> **[Feedback](https://github.com/pacoccino/ai-mask/discussions) will be greatly appreciated for the future of this project.**

## What?

AI-Mask is a Chrome **web extension** that serves as a local provider for **AI** model execution. It runs models **on-device** for web apps that need them, for **free**, and with full **privacy**.

See it as the [Metamask](https://metamask.io/) of AI.

**Try it!**
[Install the extension](https://pacoccino.github.io/ai-mask/), then open the [Chat app](https://chatbot.opac.me)

[AI-Mask Demo.mp4](https://github.com/pacoccino/ai-mask/assets/1371207/f75e8b27-c91a-4bc6-bd14-8eae0d68050f)

## Why?

On-device AI inference has been gaining a lot of traction recently. Most of our devices are already capable of executing machine learning models, and software support is ready.

Thanks to some [amazing](https://github.com/mlc-ai/web-llm) [libraries](https://github.com/xenova/transformers.js), running **machine learning** models **in the browser** has become ridiculously easy, accelerated with **WASM** and **WebGPU**. This means they'll work and run at nearly **full performance** on virtually **any device**, hardware, and operating system.
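
For instance, in-browser inference with transformers.js boils down to a few lines (a minimal sketch; the first call downloads the default model for the task into the browser cache):

```typescript
// Minimal in-browser inference with transformers.js — runs fully on-device
import { pipeline } from '@xenova/transformers'

// The first call downloads the model into the browser cache
const classify = await pipeline('sentiment-analysis')
const result = await classify('Running models in the browser is easy!')
console.log(result) // e.g. [{ label: 'POSITIVE', score: 0.99 }]
```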

**But** state-of-the-art web inference libraries store models in the browser cache, which has been, for security reasons, [domain-partitioned](https://developer.chrome.com/blog/http-cache-partitioning). This means that if multiple web apps use the same model, it must be downloaded once per domain, which can use a **lot of disk space**.

With this extension, models are **cached only once** and served to websites conveniently through a **unified SDK**.

## Future

This project is an experiment to see whether it's interesting and gains traction among users and app developers.

Another major planned feature is to proxy requests to OpenAI-like APIs. Users would store their API keys in the extension, and apps would query the extension to run models.

This would solve several problems (sketched below):
- Users won't have to share API keys with untrusted apps anymore
- Users won't share private data with apps
- App developers won't need a backend server that proxies API requests to work around CORS issues and manipulate responses
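
A purely hypothetical sketch of what this could look like from an app's perspective (none of this is implemented yet; the remote `modelId` is an assumption about the future design):

```typescript
import { AIMaskClient } from '@ai-mask/sdk'

// Hypothetical: the extension holds the user's API key and proxies the request.
// 'openai/gpt-4' as a modelId is illustrative, not a shipped feature.
const aiMaskClient = new AIMaskClient()
const response = await aiMaskClient.chat(
  { messages: [{ role: 'user', content: 'Summarize this page' }] },
  { modelId: 'openai/gpt-4' },
)
// The app never sees the API key, and no app backend is involved
```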

## Supported Apps

Web apps that are compatible with this extension for local inference:

- [Demo App](https://pacoccino.github.io/ai-mask/) ([code](/examples/demo-app/))
- [chatbot-ui](https://chatbot.opac.me) ([code](https://github.com/mckaywrigley/chatbot-ui/pull/1590))
- [fully-local-pdf-chatbot](https://fully-local-pdf-chatbot-topaz.vercel.app/) ([code](https://github.com/jacoblee93/fully-local-pdf-chatbot/pull/19))

## Usage

### For Users

Enjoy **free** and **private** execution of AI models!

Do not pay for using models again, do not leak private data, and do not give your API keys to third-party apps.

**How To:**

1. [Install the extension](https://chromewebstore.google.com/detail/lkfaajachdpegnlpikpdajccldcgfdde)
2. Use a [Supported app](#supported-apps)

### For Integrators

Easily support AI-Mask in your AI apps and bring free, private local inference to your users! No more storing API keys, no more backend and server costs.

**Quick Start**

Install package:
```shell
npm install -S @ai-mask/sdk
```
Run inference:
```typescript
import { AIMaskClient } from '@ai-mask/sdk'

const messages = [{ role: 'user', content: 'What is the capital of France?' }]

// Inference runs on-device, inside the extension
const aiMaskClient = new AIMaskClient()
const response = await aiMaskClient.chat(
  { messages },
  { modelId: 'gemma-2b-it-q4f32_1' },
)
```

For full reference, see [AI-Mask SDK Documentation](/packages/sdk)

Check the [demo app code](/examples/demo-app/) and an [example pull request](https://github.com/pacoccino/chatbot-ui/pull/1/files) to see how easy it is to integrate into existing apps.

**Note**: App users must have the extension installed for this to work.
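
Since everything goes through the extension, apps should detect it and degrade gracefully. A minimal sketch, assuming a hypothetical `isExtensionAvailable()` helper (check the [SDK documentation](/packages/sdk) for the actual detection API):

```typescript
import { AIMaskClient } from '@ai-mask/sdk'

// Hypothetical helper — the actual detection API may differ
if (await AIMaskClient.isExtensionAvailable()) {
  const aiMaskClient = new AIMaskClient()
  // ... run local inference as in the quick start
} else {
  // Fall back to a remote API, or invite the user to install AI-Mask
  window.open('https://chromewebstore.google.com/detail/lkfaajachdpegnlpikpdajccldcgfdde')
}
```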

## Technology

AI-Mask is a Manifest V3 extension, heavily relying on the work of third-party libraries to execute model inference:

- [web-llm](https://github.com/mlc-ai/web-llm) Inference with WASM/WebGPU via Apache TVM
- [transformers.js](https://github.com/xenova/transformers.js) Inference with WASM via ONNX Runtime

Issues with service workers:
- [WebGPU is not exposed to service workers](https://github.com/gpuweb/gpuweb/issues/4197)
- For [various reasons](https://github.com/xenova/transformers.js/pull/462), transformers.js can only run single-threaded in service workers

To solve these issues, the engines run in an offscreen document.
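
For reference, a minimal sketch of how a Manifest V3 extension spins up an offscreen document (illustrative, not AI-Mask's actual code; requires the `offscreen` permission in `manifest.json`):

```typescript
// In the service worker — requires "offscreen" in manifest.json permissions
async function ensureOffscreenDocument(): Promise<void> {
  // Only one offscreen document can exist at a time
  if (await chrome.offscreen.hasDocument()) return

  await chrome.offscreen.createDocument({
    url: 'offscreen.html', // illustrative path to the page hosting the engines
    reasons: [chrome.offscreen.Reason.WORKERS],
    justification: 'Run inference engines with WebGPU and multithreaded WASM',
  })
}
```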

## Contribute

### Development

Requirements:
- Node 18+
- pnpm 8+ (for monorepo workspace management)

#### Start development server for all packages (sdk/extension/demo-app)
```shell
pnpm dev
```
#### Typecheck and build for production
```shell
pnpm build
```

## Roadmap

- [x] Documentation
- [x] Deploy demo app
- [x] Deploy extension
- [x] SDK Working in web workers
- [x] ReadableStream option
- [ ] Bring back computation in the service worker starting from Chrome 124, thanks to WebGPU support
- [ ] Proxy OpenAI-like API requests and store user keys
- [ ] Create Langchain community libs
- [ ] Interrupts
- [ ] Include React hooks/utilities in SDK
- [ ] Pull request in one popular AI app
- [ ] Implement more tasks
- [ ] Add more models
- [ ] Unload model from memory after being inactive for a while