https://github.com/pak-app/ai-powered-code-snippet-search
An AI-Powered Code Snippet Search Engine: Explain and Find :mag_right:
- Host: GitHub
- URL: https://github.com/pak-app/ai-powered-code-snippet-search
- Owner: pak-app
- License: mit
- Created: 2025-07-27T15:25:22.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2025-09-06T11:58:17.000Z (10 months ago)
- Last Synced: 2025-09-06T13:23:04.538Z (10 months ago)
- Topics: grpc, mongoose, nodejs, typescript
- Language: TypeScript
- Homepage:
- Size: 167 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# AI-Powered Code Snippet Search
This project is a powerful, AI-driven code snippet search engine. It allows users to store and search for code snippets using natural language descriptions, thanks to vector embeddings generated by a sentence-transformer model. The system is built with a microservices architecture, featuring a Node.js/Express API for client interactions and a Python gRPC service for AI-powered embedding generation.
## Project Status
**This project is currently under development and is not yet complete.**
- The `GET /api/snippets` route for searching snippets is still under development: its current implementation contains a bug and is not functional.
## Features
- **Natural Language Search**: Find code snippets by describing what you're looking for, not just by keyword matching.
- **Store Code Snippets**: Save your most-used code snippets, complete with descriptions, language, and tags.
- **Microservices Architecture**: A scalable and maintainable design with separate services for the client-facing API and the AI embedding model.
- **gRPC Communication**: The Node.js and Python services communicate over gRPC, using messages defined in Protobuf.
- **MongoDB Database**: A flexible and scalable NoSQL database for storing code snippets and their vector embeddings.
## Architecture
The project is composed of two main microservices:
1. **Client API (`client-api`)**: A Node.js/Express application that provides a RESTful API for creating and searching for code snippets. It handles user requests, validates input, and communicates with the `embedding-api` to get vector embeddings for code descriptions. It then stores the snippets, along with their embeddings, in a MongoDB database.
2. **Embedding API (`embedding-api`)**: A Python application that exposes a gRPC service for generating vector embeddings. It uses the `sentence-transformers` library to convert text descriptions into high-dimensional vectors.
Here is a diagram illustrating the architecture:
```
+-----------------+      +-----------------+      +-------------------+
|     Client      |----->|   Client API    |----->|   Embedding API   |
| (e.g., Postman) |      |    (Node.js)    |      |     (Python)      |
+-----------------+      +-------+---------+      +-------------------+
                                 |
                                 |
                                 v
                         +-----------------+
                         |     MongoDB     |
                         |    Database     |
                         +-----------------+
```
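The create flow in the diagram can be sketched as follows. This is a minimal illustration, not the project's actual code: `SnippetInput`, `Embedder`, `SnippetStore`, and `createSnippet` are hypothetical names, and the gRPC client and MongoDB are replaced by in-memory fakes so the sketch runs on its own.

```typescript
// Hypothetical shapes; the real client-api models and gRPC client may differ.
interface SnippetInput {
  code: string;
  language: string;
  description: string;
  tags: string[];
}

interface StoredSnippet extends SnippetInput {
  embedding: number[];
}

// Stand-in for the gRPC call to the embedding-api.
type Embedder = (input: SnippetInput) => Promise<number[]>;

// Stand-in for the MongoDB collection.
interface SnippetStore {
  save(snippet: StoredSnippet): Promise<StoredSnippet>;
}

// Request flow: ask the embedding service for a vector, then persist
// the snippet together with that vector.
async function createSnippet(
  input: SnippetInput,
  embed: Embedder,
  store: SnippetStore,
): Promise<StoredSnippet> {
  const embedding = await embed(input);
  if (embedding.length === 0) {
    throw new Error("embedding service returned an empty vector");
  }
  return store.save({ ...input, embedding });
}

// In-memory fakes so the sketch runs without MongoDB or the Python service.
const saved: StoredSnippet[] = [];
const fakeStore: SnippetStore = {
  async save(snippet) {
    saved.push(snippet);
    return snippet;
  },
};
// Toy "embedding": two numbers derived from the input, for illustration only.
const fakeEmbed: Embedder = async (input) => [
  input.description.length,
  input.tags.length,
];
```

Passing the embedder and the store as parameters keeps the flow testable without a running `embedding-api` or database; in the real service these would presumably be the gRPC client and the Mongoose model.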
## Technologies
- **Client API**:
- Node.js
- Express.js
- TypeScript
- Mongoose (for MongoDB)
- gRPC
- Joi (for validation)
- **Embedding API**:
- Python
- gRPC
- `sentence-transformers`
- **Database**:
- MongoDB
- **API Specification**:
- Protobuf (for gRPC)
## Sentence Transformer Model
The `embedding-api` uses the `all-MiniLM-L6-v2` model from the `sentence-transformers` library to generate vector embeddings. It is a lightweight model that maps sentences and short paragraphs to 384-dimensional dense vectors, which makes it well suited to semantic search and other text embedding tasks.
- **Model**: `all-MiniLM-L6-v2`
- **Hugging Face URL**: [https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
## Getting Started
### Prerequisites
- Node.js and npm
- Python and pip
- MongoDB
### Installation
1. **Clone the repository:**
```bash
git clone https://github.com/pak-app/ai-powered-code-snippet-search.git
cd ai-powered-code-snippet-search
```
2. **Set up the Client API:**
```bash
cd client-api
npm install
```
3. **Set up the Embedding API:**
```bash
cd ../embedding-api
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
### Running the Services
1. **Start the Embedding API:**
```bash
cd embedding-api
python server.py
```
The gRPC server will start on `localhost:50051`.
2. **Start the Client API:**
In a new terminal window:
```bash
cd client-api
npm start
```
The REST API will be available at `http://localhost:3000`.
## API Reference
### Create a Snippet
- **Endpoint**: `POST /api/snippets/create`
- **Description**: Creates a new code snippet.
- **Request Body**:
```json
{
  "code": "console.log('Hello, World!');",
  "language": "javascript",
  "description": "A simple hello world program in JavaScript.",
  "tags": ["hello-world", "javascript"]
}
```
- **Response**:
- **201 Created**: If the snippet is created successfully.
- **400 Bad Request**: If the request body is invalid.
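As a rough illustration of the 400 path, the check below validates a request body in plain TypeScript. The project itself uses Joi; this dependency-free version is a stand-in, and the rules it applies (required non-empty `code`, `language`, and `description`; optional `tags`) are assumptions rather than the project's actual schema. `validateCreateSnippet` is a hypothetical name.

```typescript
// Assumed rules (not taken from the project's Joi schema): code, language and
// description are required non-empty strings; tags is an optional string array.
interface CreateSnippetBody {
  code: string;
  language: string;
  description: string;
  tags: string[];
}

type ValidationResult =
  | { ok: true; value: CreateSnippetBody }
  | { ok: false; status: 400; errors: string[] };

function validateCreateSnippet(body: unknown): ValidationResult {
  if (typeof body !== "object" || body === null) {
    return { ok: false, status: 400, errors: ["body must be a JSON object"] };
  }
  const record = body as Record<string, unknown>;
  const errors: string[] = [];

  // Collect one error per missing or empty required field.
  for (const field of ["code", "language", "description"]) {
    const value = record[field];
    if (typeof value !== "string" || value.trim() === "") {
      errors.push(`"${field}" must be a non-empty string`);
    }
  }

  // A missing tags field is treated as an empty list.
  const tags: unknown = record["tags"] ?? [];
  if (!Array.isArray(tags) || tags.some((tag) => typeof tag !== "string")) {
    errors.push('"tags" must be an array of strings');
  }

  if (errors.length > 0) {
    return { ok: false, status: 400, errors };
  }
  return {
    ok: true,
    value: {
      code: record["code"] as string,
      language: record["language"] as string,
      description: record["description"] as string,
      tags: tags as string[],
    },
  };
}
```

An Express handler could return `res.status(400).json({ errors })` when `ok` is `false` and continue to the embedding step otherwise.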
### Search for a Snippet
- **Endpoint**: `GET /api/snippets`
- **Description**: Searches for code snippets based on a natural language query.
- **Note**: This endpoint is currently under development and is not functional.
- **Query Parameters**:
- `q` (string, required): The natural language search query.
- **Response**:
- **200 OK**: Returns an array of matching snippets.
- **400 Bad Request**: If the `q` query parameter is missing.
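Since the search route is not yet functional, the following is only a sketch of one common way such a search could rank results: embed the query, then order stored snippets by cosine similarity to it. It is not the project's implementation. `EmbeddedSnippet`, `cosineSimilarity`, and `rankSnippets` are hypothetical names, and the ranking runs in memory rather than in MongoDB.

```typescript
// Hypothetical in-memory shape; the project stores these fields in MongoDB.
interface EmbeddedSnippet {
  description: string;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector is zero.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  return denominator === 0 ? 0 : dot / denominator;
}

// Rank snippets by similarity to the query embedding, best match first.
function rankSnippets(
  queryEmbedding: number[],
  snippets: EmbeddedSnippet[],
  limit = 5,
): EmbeddedSnippet[] {
  return snippets
    .map((snippet) => ({
      snippet,
      score: cosineSimilarity(queryEmbedding, snippet.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit)
    .map((entry) => entry.snippet);
}

// Toy 3-dimensional vectors for illustration; real embeddings are much longer.
const results = rankSnippets(
  [1, 0, 0],
  [
    { description: "unrelated", embedding: [0, 1, 0] },
    { description: "close match", embedding: [0.9, 0.1, 0] },
    { description: "opposite", embedding: [-1, 0, 0] },
  ],
  2,
);
```

A linear scan like this is fine for small collections; a larger deployment would more likely rely on a vector index in the database.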
## gRPC Service (`embedding.proto`)
The `embedding-api` exposes a gRPC service for generating vector embeddings.
### Service Definition
```proto
service EmbedService {
  rpc GenerateEmbedding (EmbedRequest) returns (EmbedResponse);
}
```
### Messages
```proto
message EmbedRequest {
  string code = 1;
  string language = 2;
  string description = 3;
  repeated string tags = 4;
}

message EmbedResponse {
  repeated float embedding = 1;
}
```
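On the Node.js side, these messages could be mirrored by TypeScript types along the following lines. The field names come from `embedding.proto`; the helper names `toEmbedRequest` and `isEmbedResponse` are hypothetical, and the sketch does not depend on any particular gRPC library.

```typescript
// Field names follow embedding.proto; the helper names are illustrative.
interface EmbedRequest {
  code: string;
  language: string;
  description: string;
  tags: string[]; // repeated string
}

interface EmbedResponse {
  embedding: number[]; // repeated float
}

// Build a request message, defaulting a missing tags list to empty.
function toEmbedRequest(snippet: {
  code: string;
  language: string;
  description: string;
  tags?: string[];
}): EmbedRequest {
  return {
    code: snippet.code,
    language: snippet.language,
    description: snippet.description,
    tags: snippet.tags ?? [],
  };
}

// Runtime check that a decoded response really carries a numeric vector.
function isEmbedResponse(value: unknown): value is EmbedResponse {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const embedding = (value as { embedding?: unknown }).embedding;
  return (
    Array.isArray(embedding) &&
    embedding.every((x) => typeof x === "number" && isFinite(x))
  );
}
```

A guard like `isEmbedResponse` is optional when the gRPC client already generates typed stubs, but it can catch malformed data before it is written to the database.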
## Project Structure
```
.
├── client-api
│   ├── src
│   │   ├── controller
│   │   ├── middlewares
│   │   ├── models
│   │   ├── routes
│   │   └── ...
│   ├── package.json
│   └── ...
├── embedding-api
│   ├── gRPCMethods
│   │   ├── embedding_pb2.py
│   │   └── embedding_pb2_grpc.py
│   ├── server.py
│   └── ...
├── protos
│   └── embedding.proto
└── README.md
```
## Contributing
Contributions are welcome! Please feel free to submit a pull request or open an issue.
## License
This project is licensed under the MIT License. See the `LICENSE` file for details.