
# LawBotics v2

An AI-powered legal contract analysis platform built with Next.js and fine-tuned machine learning models. It uses the Contract Understanding Atticus Dataset (CUAD) to support automated contract review and clause extraction.

## 📋 Table of Contents

- [Overview](#overview)
- [Features](#features)
- [Architecture](#architecture)
- [Installation](#installation)
- [Usage](#usage)
- [AI Model Training](#ai-model-training)
- [Project Structure](#project-structure)
- [Technology Stack](#technology-stack)
- [Development](#development)
- [API Documentation](#api-documentation)
- [Contributing](#contributing)
- [License](#license)

## 🎯 Overview

LawBotics v2 is a legal technology platform that applies AI to streamline contract review. It uses language models fine-tuned on the CUAD dataset to identify and extract key contract clauses, helping legal professionals save time and reduce errors in contract analysis.

## ✨ Features

### Core Functionality

- **AI-Powered Contract Analysis**: Automated clause identification and extraction from legal contracts
- **Multi-Format Support**: Process contracts in PDF and text formats
- **Real-time Analysis**: Instant contract review with immediate results
- **Clause Categorization**: Identify 41+ different types of legal clauses
- **Interactive UI**: Modern, responsive web interface built with Next.js 15

### AI & Machine Learning

- **Fine-tuned LLaMA Models**: Models fine-tuned on legal contract data
- **CUAD Dataset Integration**: Leverages 13,000+ labeled contract examples
- **LangChain Integration**: Advanced AI orchestration and processing
- **Google GenAI Support**: Integration with Google's generative AI models

### User Experience

- **Authentication**: Secure user management with Clerk
- **Dark/Light Mode**: Customizable theme support
- **Responsive Design**: Optimized for desktop and mobile devices
- **Real-time Notifications**: Toast notifications and progress tracking
- **PDF Viewer**: Built-in PDF document viewer and processor

## 🏗️ Architecture

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  Web Frontend   │    │  AI Processing  │    │  Data Storage   │
│  (Next.js 15)   │◄──►│  (Python/ML)    │◄──►│  (Convex DB)    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         ▼                      ▼                      ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│ Authentication  │    │ Model Training  │    │ Contract Storage│
│    (Clerk)      │    │   (Jupyter)     │    │    (Files)      │
└─────────────────┘    └─────────────────┘    └─────────────────┘
```

## 🚀 Installation

### Prerequisites

- **Node.js** (v18 or higher)
- **Python** (v3.8 or higher)
- **Git**
- **npm** or **yarn**

### Quick Start

1. **Clone the repository**

```bash
git clone https://github.com/hasnaintypes/lawbotics-v2.git
cd lawbotics-v2
```

2. **Install dependencies**

```bash
# Install web UI dependencies
cd apps/web-ui
npm install

# Return to root
cd ../..
```

3. **Set up environment variables**

```bash
# Copy environment template
cp apps/web-ui/.env.example apps/web-ui/.env.local
```

4. **Configure environment variables**
Edit `apps/web-ui/.env.local` with your keys:

```env
# Clerk Authentication
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY=your_clerk_publishable_key
CLERK_SECRET_KEY=your_clerk_secret_key

# Convex Database
NEXT_PUBLIC_CONVEX_URL=your_convex_url

# Google AI
GOOGLE_GENERATIVE_AI_API_KEY=your_google_ai_key

# Svix (webhooks)
SVIX_SECRET=your_svix_secret
```

5. **Start the development server**

```bash
cd apps/web-ui
npm run dev
```

6. **Access the application**
Open [http://localhost:3000](http://localhost:3000) in your browser.
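Missing or placeholder keys tend to surface as confusing runtime errors. A minimal sketch of a startup check for the variables listed in step 4 could look like the following. The `REQUIRED_ENV_VARS` list, `findMissingEnvVars`, and `validateEnv` are hypothetical helpers, not part of the repository.

```typescript
// Hypothetical helper: verifies that the variables from `.env.local` are set.
// Variable names come from the README; everything else is illustrative.
const REQUIRED_ENV_VARS = [
  "NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY",
  "CLERK_SECRET_KEY",
  "NEXT_PUBLIC_CONVEX_URL",
  "GOOGLE_GENERATIVE_AI_API_KEY",
  "SVIX_SECRET",
] as const;

type RequiredEnvVar = (typeof REQUIRED_ENV_VARS)[number];
type Env = Record<string, string | undefined>;

/** Returns required variables that are unset, empty, or still hold a template placeholder. */
function findMissingEnvVars(env: Env): RequiredEnvVar[] {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name]?.trim();
    // Treat empty strings and the template's "your_..." placeholders as unset.
    return !value || value.startsWith("your_");
  });
}

/** Throws one descriptive error listing every missing variable. */
function validateEnv(env: Env): void {
  const missing = findMissingEnvVars(env);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
}
```

In server-side code you would call `validateEnv(process.env)`. Next.js only inlines `NEXT_PUBLIC_*` variables into browser bundles, so a check that includes secrets such as `CLERK_SECRET_KEY` belongs on the server.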

## 🎮 Usage

### Web Interface

1. **Sign Up/Login**: Create an account or log in using the Clerk authentication system
2. **Upload Contract**: Upload a PDF or text file containing the legal contract
3. **AI Analysis**: The system will automatically process the contract and identify key clauses
4. **Review Results**: Examine the extracted clauses and their categorizations
5. **Export/Save**: Save results or export analysis reports

### Contract Analysis Features

- **Clause Detection**: Automatically identifies 41+ types of legal clauses
- **Risk Assessment**: Highlights potentially problematic clauses
- **Comparison**: Compare multiple contracts side by side
- **Search**: Full-text search within contracts and extracted clauses
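The README does not describe how risk assessment or search are implemented. As a purely illustrative sketch, assuming each extracted clause carries a CUAD category label, a rule-based pass could list the categories a reviewer usually checks first ahead of the rest, and a simple substring match can stand in for full-text search. The `ExtractedClause` shape and the `REVIEW_FIRST` set are hypothetical choices, not the project's logic.

```typescript
// Hypothetical shape of one extracted clause; not the project's actual type.
interface ExtractedClause {
  category: string; // a CUAD label such as "Governing Law"
  text: string;     // the clause text found in the contract
}

interface FlaggedClause extends ExtractedClause {
  needsReview: boolean;
}

// Illustrative rule set: CUAD categories a reviewer might want to see first.
const REVIEW_FIRST = new Set<string>([
  "Uncapped Liability",
  "Non-Compete",
  "Exclusivity",
  "Change Of Control",
  "Anti-Assignment",
  "Liquidated Damages",
]);

/** Marks clauses whose category is in REVIEW_FIRST and lists them before the rest. */
function flagClauses(clauses: ExtractedClause[]): FlaggedClause[] {
  const flagged = clauses.map((c) => ({ ...c, needsReview: REVIEW_FIRST.has(c.category) }));
  // Array.prototype.sort is stable, so original order is kept within each group.
  return flagged.sort((a, b) => Number(b.needsReview) - Number(a.needsReview));
}

/** Case-insensitive substring search over clause text. */
function searchClauses(clauses: ExtractedClause[], query: string): ExtractedClause[] {
  const q = query.toLowerCase();
  return clauses.filter((c) => c.text.toLowerCase().includes(q));
}
```

A fixed category list is easy to audit; a model-based risk score would be the natural next step but is beyond what the README documents.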

## 🧠 AI Model Training

The `ai-model/` directory contains a Jupyter notebook for fine-tuning models on the CUAD dataset.

### Dataset Information

- **CUAD v1**: 13,000+ labeled examples across 510 commercial contracts
- **41 Clause Categories**: Comprehensive coverage of legal contract elements
- **Multiple Formats**: CSV, JSON, Excel, PDF, and TXT formats available
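`CUAD_v1.json` follows a SQuAD 2.0-style layout (contracts, then paragraphs, then question/answer entries), and question IDs are assumed here to end in `__<Category>`. Under those assumptions, the sketch below counts labeled answer spans per clause category, which is a quick way to inspect the dataset before training. The type names are our own, not the dataset's.

```typescript
// Types for a SQuAD 2.0-style file such as CUAD_v1.json (assumed layout).
interface SquadAnswer { text: string; answer_start: number; }
interface SquadQA { id: string; question: string; answers: SquadAnswer[]; is_impossible?: boolean; }
interface SquadParagraph { context: string; qas: SquadQA[]; }
interface SquadDocument { title: string; paragraphs: SquadParagraph[]; }
interface SquadFile { version: string; data: SquadDocument[]; }

/** Extracts the category from an ID shaped like "<contract title>__<Category>". */
function categoryFromId(id: string): string {
  const sep = id.lastIndexOf("__");
  return sep >= 0 ? id.slice(sep + 2) : id;
}

/** Counts labeled answer spans per category across all contracts. */
function countLabelsByCategory(file: SquadFile): Map<string, number> {
  const counts = new Map<string, number>();
  for (const doc of file.data) {
    for (const paragraph of doc.paragraphs) {
      for (const qa of paragraph.qas) {
        const category = categoryFromId(qa.id);
        // Unanswered questions still register their category with a count of 0.
        counts.set(category, (counts.get(category) ?? 0) + qa.answers.length);
      }
    }
  }
  return counts;
}
```

With Node.js, the real file can be loaded with `JSON.parse(fs.readFileSync("ai-model/data-set/CUAD_v1.json", "utf8"))` and passed to `countLabelsByCategory`.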

### Training Process

1. **Navigate to AI model directory**

```bash
cd ai-model
```

2. **Open Jupyter Notebook**

```bash
jupyter notebook Fine_tuning_code.ipynb
```

3. **Follow the training steps** in the notebook:
   - Data preparation and preprocessing
   - Fine-tuning a LLaMA model
   - Evaluation and validation
   - Model export and deployment
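The notebook itself is Python, and its exact prompt format is not documented here. To illustrate what the data-preparation step involves, the TypeScript sketch below turns CUAD-style question/answer entries into instruction/response records serialized as JSON Lines, a common fine-tuning input format. The `InstructionRecord` shape and the `"None"` target for unanswered questions are assumptions.

```typescript
// Minimal CUAD/SQuAD-style input types (assumed layout of CUAD_v1.json).
interface QA { id: string; question: string; answers: { text: string }[]; }
interface Paragraph { context: string; qas: QA[]; }

// Hypothetical output record for instruction tuning.
interface InstructionRecord {
  instruction: string;
  input: string;
  output: string;
}

/** Builds one instruction/response pair per question; unanswered questions map to "None". */
function toInstructionRecords(paragraph: Paragraph): InstructionRecord[] {
  return paragraph.qas.map((qa) => {
    // Deduplicate answer spans so repeated annotations don't inflate the target text.
    const spans = Array.from(new Set(qa.answers.map((a) => a.text.trim())));
    return {
      instruction: qa.question,
      input: paragraph.context,
      output: spans.length > 0 ? spans.join("\n") : "None",
    };
  });
}

/** Serializes records as JSON Lines (one JSON object per line). */
function toJsonl(records: InstructionRecord[]): string {
  return records.map((r) => JSON.stringify(r)).join("\n");
}
```

Full contracts often exceed a model's context window, so a real pipeline would also need to chunk `input` (for example, around each answer span) before tokenization.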

### Supported Models

- **LLaMA 3.2**: Primary model for instruction tuning
- **Google GenAI**: Integration for additional AI capabilities
- **Custom Fine-tuned Models**: Specialized legal contract models

## 📁 Project Structure

```
lawbotics-v2/
├── 📁 ai-model/                    # AI/ML model training and data
│   ├── 📄 Fine_tuning_code.ipynb   # Jupyter notebook for model training
│   └── 📁 data-set/                # CUAD dataset and training data
│       ├── 📄 CUAD_v1.json         # Main dataset file
│       ├── 📄 master_clauses.csv   # Clause categorization data
│       ├── 📁 full_contract_pdf/   # Original contract PDFs
│       ├── 📁 full_contract_txt/   # Text versions of contracts
│       └── 📁 label_group_xlsx/    # Excel files with labeled data
├── 📁 apps/                        # Application modules
│   ├── 📁 dashboard/               # Admin dashboard (planned)
│   └── 📁 web-ui/                  # Main web application
│       ├── 📄 package.json         # Dependencies and scripts
│       ├── 📄 next.config.ts       # Next.js configuration
│       ├── 📄 tailwind.config.js   # Tailwind CSS configuration
│       ├── 📄 middleware.ts        # Authentication middleware
│       ├── 📁 src/                 # Source code
│       │   ├── 📁 app/             # Next.js App Router pages
│       │   ├── 📁 components/      # Reusable UI components
│       │   ├── 📁 lib/             # Utility functions and configs
│       │   ├── 📁 hooks/           # Custom React hooks
│       │   ├── 📁 services/        # API and external service integrations
│       │   ├── 📁 store/           # State management (Zustand)
│       │   └── 📁 convex/          # Convex database functions
│       └── 📁 public/              # Static assets
└── 📁 docs/                        # Documentation (planned)
```

## 🛠️ Technology Stack

### Frontend

- **Next.js 15**: React framework with App Router
- **React 19**: Latest React with concurrent features
- **TypeScript**: Type-safe development
- **Tailwind CSS 4**: Modern utility-first CSS framework
- **Radix UI**: Accessible component primitives
- **Lucide React**: Modern icon library

### Backend & Services

- **Convex**: Real-time database and backend
- **Clerk**: Authentication and user management
- **Svix**: Webhook management
- **Axios**: HTTP client for API requests

### AI & Machine Learning

- **LangChain**: AI application framework
- **Google Generative AI**: AI model integration
- **Python**: Model training and processing
- **Jupyter**: Interactive development environment

### State Management & Utils

- **Zustand**: Lightweight state management
- **React PDF**: PDF processing and viewing
- **Recharts**: Data visualization
- **Sonner**: Toast notifications

## 👩‍💻 Development

### Available Scripts

Run these from the `apps/web-ui` directory:

```bash
# Development server
npm run dev

# Production build
npm run build

# Start production server
npm run start

# Lint code
npm run lint
```

### Code Style

- **ESLint**: Code linting with Next.js configuration
- **TypeScript**: Strict type checking enabled
- **Prettier**: Code formatting (recommended)

### Development Guidelines

1. **Component Structure**: Use functional components written in TypeScript
2. **State Management**: Use Zustand for global state
3. **Styling**: Use Tailwind CSS utility classes with component variants
4. **API Integration**: Place external API calls in the `services` directory
5. **Error Handling**: Implement comprehensive error boundaries

## 📚 API Documentation

### Authentication Endpoints

- `POST /api/auth/signin` - User sign in
- `POST /api/auth/signup` - User registration
- `POST /api/auth/signout` - User sign out

### Contract Analysis Endpoints

- `POST /api/contracts/upload` - Upload contract for analysis
- `GET /api/contracts/:id` - Retrieve contract analysis
- `POST /api/contracts/analyze` - Perform AI analysis
- `GET /api/contracts/history` - User's contract history

### AI Model Endpoints

- `POST /api/ai/extract-clauses` - Extract clauses from contract
- `POST /api/ai/classify` - Classify contract clauses
- `GET /api/ai/models` - Available AI models
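The endpoint list gives paths but not payloads. Assuming JSON request and response bodies, a thin typed wrapper in the style of the `services` directory might look like this. The `ContractAnalysis` shape, the `{ contractId }` body for `/api/contracts/analyze`, and the injectable `fetchFn` parameter are hypothetical.

```typescript
// Hypothetical response shape; the README does not document payloads.
interface ContractAnalysis {
  id: string;
  status: "pending" | "completed" | "failed";
  clauses: { category: string; text: string }[];
}

// Minimal fetch-compatible signature so the client can be tested without a server.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

function createContractsClient(fetchFn: FetchLike, baseUrl = "") {
  // Sends a JSON request and throws on any non-2xx status.
  async function request<T>(method: "GET" | "POST", path: string, body?: unknown): Promise<T> {
    const res = await fetchFn(`${baseUrl}${path}`, {
      method,
      headers: { "Content-Type": "application/json" },
      body: body === undefined ? undefined : JSON.stringify(body),
    });
    if (!res.ok) {
      throw new Error(`${method} ${path} failed with HTTP ${res.status}`);
    }
    return (await res.json()) as T;
  }

  return {
    // GET /api/contracts/:id
    getAnalysis: (id: string) =>
      request<ContractAnalysis>("GET", `/api/contracts/${encodeURIComponent(id)}`),
    // POST /api/contracts/analyze (request body shape is assumed)
    analyze: (contractId: string) =>
      request<ContractAnalysis>("POST", "/api/contracts/analyze", { contractId }),
    // GET /api/contracts/history
    history: () => request<ContractAnalysis[]>("GET", "/api/contracts/history"),
  };
}
```

In the browser you would pass the global `fetch`. `POST /api/contracts/upload` is left out because a file upload would normally be sent as `multipart/form-data` rather than JSON.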

## 🤝 Contributing

We welcome contributions to LawBotics v2! Please follow these guidelines:

### Getting Started

1. Fork the repository
2. Create a feature branch: `git checkout -b feature/your-feature-name`
3. Make your changes and add tests
4. Commit your changes: `git commit -m 'Add some feature'`
5. Push to the branch: `git push origin feature/your-feature-name`
6. Submit a pull request

### Contribution Guidelines

- Follow the existing code style and conventions
- Add tests for new features
- Update documentation as needed
- Ensure all tests pass before submitting PR
- Provide clear commit messages and PR descriptions

### Issues and Bug Reports

Please use the GitHub issue tracker to report bugs or request features. Include:

- Detailed description of the issue
- Steps to reproduce
- Expected vs actual behavior
- Screenshots (if applicable)
- Environment details

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

- **The Atticus Project**: For providing the CUAD dataset
- **Next.js Team**: For the excellent React framework
- **Clerk**: For seamless authentication solutions
- **Radix UI**: For accessible component primitives
- **Tailwind CSS**: For the utility-first CSS framework

## 📞 Support

For support and questions:

- 📧 Email: support@lawbotics.com
- 💬 Discord: [LawBotics Community](https://discord.gg/lawbotics)
- 📖 Documentation: [docs.lawbotics.com](https://docs.lawbotics.com)
- 🐛 Issues: [GitHub Issues](https://github.com/hasnaintypes/lawbotics-v2/issues)

---

**Built with ❤️ by the LawBotics Team**

_Empowering legal professionals with AI-driven contract analysis_