https://github.com/pak-app/ai-powered-code-snippet-search

An AI-Powered Code Snippet Search Engine: Explain and Find :mag_right:

grpc mongoose nodejs typescript


# AI-Powered Code Snippet Search

This project is an AI-driven code snippet search engine. Users store code snippets and search for them using natural language descriptions, which are matched through vector embeddings generated by a sentence-transformer model. The system is built with a microservices architecture, featuring a Node.js/Express API for client interactions and a Python gRPC service for AI-powered embedding generation.

## Project Status

**This project is currently under development and is not yet complete.**

- The `GET /api/snippets` route for searching snippets is still under development: it is not fully implemented, contains a known bug, and is not yet functional.

## Features

- **Natural Language Search**: Find code snippets by describing what you're looking for, not just by keyword matching.
- **Store Code Snippets**: Save your most-used code snippets, complete with descriptions, language, and tags.
- **Microservices Architecture**: A scalable and maintainable design with separate services for the client-facing API and the AI embedding model.
- **gRPC Communication**: Efficient and high-performance communication between the Node.js and Python services.
- **MongoDB Database**: A flexible and scalable NoSQL database for storing code snippets and their vector embeddings.
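
To illustrate how natural language search over stored embeddings can work, here is a minimal sketch that ranks snippets by cosine similarity to a query embedding. Since the search route is not finished, this is not the project's implementation: the function names, the `embedding` field name, and the toy 3-dimensional vectors are assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def rank_snippets(query_embedding, snippets, top_k=3):
    """Return the top_k snippets, most similar to the query first."""
    scored = [
        (cosine_similarity(query_embedding, snippet["embedding"]), snippet)
        for snippet in snippets
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [snippet for _, snippet in scored[:top_k]]


# Toy embeddings invented for this example (real ones are much longer).
snippets = [
    {"description": "read a file line by line", "embedding": [0.9, 0.1, 0.0]},
    {"description": "sort a list of numbers", "embedding": [0.0, 0.2, 0.9]},
    {"description": "open and parse a text file", "embedding": [0.8, 0.3, 0.1]},
]
query_embedding = [1.0, 0.2, 0.0]

results = rank_snippets(query_embedding, snippets, top_k=2)
print([snippet["description"] for snippet in results])
```

With these toy vectors, the two file-related snippets rank ahead of the sorting snippet. A production setup would more likely delegate this ranking to a vector index than compute it in application code.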

## Architecture

The project is composed of two main microservices:

1. **Client API (`client-api`)**: A Node.js/Express application that provides a RESTful API for creating and searching for code snippets. It handles user requests, validates input, and communicates with the `embedding-api` to get vector embeddings for code descriptions. It then stores the snippets, along with their embeddings, in a MongoDB database.

2. **Embedding API (`embedding-api`)**: A Python application that exposes a gRPC service for generating vector embeddings. It uses the `sentence-transformers` library to convert text descriptions into high-dimensional vectors.

Here is a diagram illustrating the architecture:

```
+-----------------+      +-----------------+      +-------------------+
|     Client      |----->|   Client API    |----->|   Embedding API   |
| (e.g., Postman) |      |    (Node.js)    |      |     (Python)      |
+-----------------+      +-------+---------+      +-------------------+
                                 |
                                 |
                                 v
                         +-----------------+
                         |     MongoDB     |
                         |    Database     |
                         +-----------------+
```
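
The create flow in the diagram (validate, embed, store) can be sketched as follows. This is a simplified Python illustration, not the actual Node.js code: the in-memory `SnippetStore` stands in for MongoDB, `fake_embed` stands in for the gRPC call, and the set of required fields is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Assumed required fields; the real Joi schema may differ.
REQUIRED_FIELDS = ("code", "language", "description")


@dataclass
class SnippetStore:
    """In-memory stand-in for the MongoDB collection."""
    documents: List[dict] = field(default_factory=list)

    def insert(self, document: dict) -> dict:
        self.documents.append(document)
        return document


def create_snippet(payload: dict,
                   embed: Callable[[str], List[float]],
                   store: SnippetStore) -> dict:
    # 1. Validate the request body.
    missing = [name for name in REQUIRED_FIELDS if not payload.get(name)]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    # 2. Ask the embedding service for a vector (injected as a function here).
    embedding = embed(payload["description"])
    # 3. Store the snippet together with its embedding.
    document = {**payload, "tags": list(payload.get("tags", [])), "embedding": embedding}
    return store.insert(document)


def fake_embed(text: str) -> List[float]:
    """Hypothetical placeholder for the embedding-api call."""
    return [float(len(text)), 0.0]


store = SnippetStore()
saved = create_snippet(
    {"code": "print('hi')", "language": "python", "description": "print a greeting"},
    fake_embed,
    store,
)
print(len(store.documents), saved["embedding"])
```

Passing the embedding function in as a parameter keeps the flow testable without a running gRPC server.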

## Technologies

- **Client API**:
  - Node.js
  - Express.js
  - TypeScript
  - Mongoose (for MongoDB)
  - gRPC
  - Joi (for validation)

- **Embedding API**:
  - Python
  - gRPC
  - `sentence-transformers`

- **Database**:
  - MongoDB

- **API Specification**:
  - Protobuf (for gRPC)

## Sentence Transformer Model

The `embedding-api` uses the `all-MiniLM-L6-v2` model from the `sentence-transformers` library to generate vector embeddings. It is a lightweight model that maps sentences and short paragraphs to 384-dimensional dense vectors, which makes it well suited to semantic search over short descriptions.

- **Model**: `all-MiniLM-L6-v2`
- **Hugging Face URL**: [https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
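
As a rough sketch of how the model can be used, the snippet below loads it lazily and encodes one piece of text. It assumes the `sentence-transformers` package is installed; the helper names `get_model` and `embed_text` are hypothetical and not taken from `server.py`.

```python
MODEL_NAME = "all-MiniLM-L6-v2"
EMBEDDING_DIM = 384  # output vector size of all-MiniLM-L6-v2

_model = None  # cached so the model is loaded only once per process


def get_model():
    """Load the model on first use (downloads it if not cached locally)."""
    global _model
    if _model is None:
        # Imported lazily so this module can be imported without the package.
        from sentence_transformers import SentenceTransformer
        _model = SentenceTransformer(MODEL_NAME)
    return _model


def embed_text(text):
    """Return the embedding of `text` as a plain list of floats."""
    vector = get_model().encode(text)  # 1-D NumPy array for a single string
    return vector.tolist()


# Example (triggers the model download on first run):
# vector = embed_text("A simple hello world program in JavaScript.")
# len(vector) should equal EMBEDDING_DIM
```

Converting the NumPy array to a list makes the result easy to place in a protobuf `repeated float` field or a JSON document.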

## Getting Started

### Prerequisites

- Node.js and npm
- Python and pip
- MongoDB

### Installation

1. **Clone the repository:**

```bash
git clone https://github.com/pak-app/ai-powered-code-snippet-search.git
cd ai-powered-code-snippet-search
```

2. **Set up the Client API:**

```bash
cd client-api
npm install
```

3. **Set up the Embedding API:**

```bash
cd ../embedding-api
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -r requirements.txt
```

### Running the Services

1. **Start the Embedding API:**

```bash
cd embedding-api
python server.py
```

The gRPC server will start on `localhost:50051`.

2. **Start the Client API:**

In a new terminal window:

```bash
cd client-api
npm start
```

The REST API will be available at `http://localhost:3000`.
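
To confirm that both services are listening, a small check like the following may help. It is a sketch that only tests whether the TCP ports accept connections, using the default addresses mentioned above; the helper names are hypothetical.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Default addresses from this README.
SERVICES = {
    "embedding-api (gRPC)": ("localhost", 50051),
    "client-api (REST)": ("localhost", 3000),
}


def check_services(services=SERVICES):
    """Map each service name to whether its port accepts connections."""
    return {name: is_port_open(host, port) for name, (host, port) in services.items()}
```

Calling `check_services()` returns a dictionary such as `{"embedding-api (gRPC)": True, "client-api (REST)": False}`, depending on which services are running. An open port does not guarantee that the service is healthy, only that something is listening.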

## API Reference

### Create a Snippet

- **Endpoint**: `POST /api/snippets/create`
- **Description**: Creates a new code snippet.
- **Request Body**:

```json
{
  "code": "console.log('Hello, World!');",
  "language": "javascript",
  "description": "A simple hello world program in JavaScript.",
  "tags": ["hello-world", "javascript"]
}
```

- **Response**:
  - **201 Created**: If the snippet is created successfully.
  - **400 Bad Request**: If the request body is invalid.
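
A client call to this endpoint could look like the sketch below, which uses only the Python standard library. The base URL is the default from this README; the function names are hypothetical.

```python
import json
import urllib.request


def build_create_request(base_url, snippet):
    """Build (but do not send) the POST request for a new snippet."""
    body = json.dumps(snippet).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/snippets/create",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_snippet(base_url, snippet, timeout=10):
    """Send the request and return the HTTP status code (201 on success).

    urllib raises urllib.error.HTTPError for error responses such as 400.
    """
    request = build_create_request(base_url, snippet)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


snippet = {
    "code": "console.log('Hello, World!');",
    "language": "javascript",
    "description": "A simple hello world program in JavaScript.",
    "tags": ["hello-world", "javascript"],
}
request = build_create_request("http://localhost:3000", snippet)
print(request.get_method(), request.full_url)
```

With the Client API running, `post_snippet("http://localhost:3000", snippet)` sends the request.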

### Search for a Snippet

- **Endpoint**: `GET /api/snippets`
- **Description**: Searches for code snippets based on a natural language query.
- **Note**: This endpoint is currently under development and is not functional.
- **Query Parameters**:
  - `q` (string, required): The natural language search query.
- **Response**:
  - **200 OK**: Returns an array of matching snippets.
  - **400 Bad Request**: If the `q` query parameter is missing.
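
Once the endpoint works, a query could be issued as in this sketch. It assumes the response body is a JSON array, as described above; the function names are hypothetical, and the call will not succeed while the route is unfinished.

```python
import json
import urllib.parse
import urllib.request


def build_search_url(base_url, query):
    """Build the search URL, URL-encoding the natural language query."""
    if not query or not query.strip():
        raise ValueError("q must be a non-empty string")
    params = urllib.parse.urlencode({"q": query})
    return f"{base_url.rstrip('/')}/api/snippets?{params}"


def search_snippets(base_url, query, timeout=10):
    """Send the search request and decode the JSON array of snippets."""
    url = build_search_url(base_url, query)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


url = build_search_url("http://localhost:3000", "read a file line by line")
print(url)
```

Encoding the query with `urlencode` keeps spaces and punctuation in the search text from breaking the URL.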

## gRPC Service (`embedding.proto`)

The `embedding-api` exposes a gRPC service for generating vector embeddings.

### Service Definition

```proto
service EmbedService {
  rpc GenerateEmbedding (EmbedRequest) returns (EmbedResponse);
}
```

### Messages

```proto
message EmbedRequest {
  string code = 1;
  string language = 2;
  string description = 3;
  repeated string tags = 4;
}

message EmbedResponse {
  repeated float embedding = 1;
}
```
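
For testing the service directly, a Python gRPC client might look like the sketch below. It assumes `grpcio` is installed and that the generated modules `embedding_pb2` and `embedding_pb2_grpc` are importable from the `gRPCMethods` package. That import path is a guess and may need adjusting, and the helper names are hypothetical.

```python
def build_embed_request_fields(code, language, description, tags=()):
    """Return keyword arguments matching the EmbedRequest message fields."""
    return {
        "code": code,
        "language": language,
        "description": description,
        "tags": list(tags),
    }


def generate_embedding(fields, target="localhost:50051", timeout=10):
    """Call EmbedService.GenerateEmbedding and return the vector as a list."""
    # Imported lazily: these require grpcio and the generated stubs.
    import grpc
    from gRPCMethods import embedding_pb2, embedding_pb2_grpc  # assumed import path

    with grpc.insecure_channel(target) as channel:
        stub = embedding_pb2_grpc.EmbedServiceStub(channel)
        request = embedding_pb2.EmbedRequest(**fields)
        response = stub.GenerateEmbedding(request, timeout=timeout)
        return list(response.embedding)


fields = build_embed_request_fields(
    code="console.log('Hello, World!');",
    language="javascript",
    description="A simple hello world program in JavaScript.",
    tags=["hello-world", "javascript"],
)
# With the Embedding API running:
# vector = generate_embedding(fields)
```

The channel is insecure (no TLS), which matches a local development setup on `localhost:50051`.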

## Project Structure

```
.
├── client-api
│   ├── src
│   │   ├── controller
│   │   ├── middlewares
│   │   ├── models
│   │   ├── routes
│   │   └── ...
│   ├── package.json
│   └── ...
├── embedding-api
│   ├── gRPCMethods
│   │   ├── embedding_pb2.py
│   │   └── embedding_pb2_grpc.py
│   ├── server.py
│   └── ...
├── protos
│   └── embedding.proto
└── README.md
```

## Contributing

Contributions are welcome! Please feel free to submit a pull request or open an issue.

## License

This project is licensed under the MIT License. See the `LICENSE` file for details.