# ๐Ÿ‘๏ธ ARGUS: The All-Seeing Document Intelligence Platform

[![Azure](https://img.shields.io/badge/Azure-0078D4?style=for-the-badge&logo=microsoft-azure&logoColor=white)](https://azure.microsoft.com)
[![OpenAI](https://img.shields.io/badge/GPT--4-412991?style=for-the-badge&logo=openai&logoColor=white)](https://openai.com)
[![FastAPI](https://img.shields.io/badge/FastAPI-009688?style=for-the-badge&logo=fastapi&logoColor=white)](https://fastapi.tiangolo.com)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg?style=for-the-badge)](https://opensource.org/licenses/MIT)

*Named after Argus Panoptes, the mythological giant with a hundred eyes: ARGUS never misses a detail in your documents.*

## Transform Document Processing with AI Intelligence

**ARGUS** revolutionizes how organizations extract, understand, and act on document data. By combining the precision of **Azure Document Intelligence** with the contextual reasoning of **GPT-4 Vision**, ARGUS doesn't just read documents; it *understands* them.

### Why ARGUS?

Traditional OCR solutions extract text but miss the context. AI-only approaches struggle with complex layouts. **ARGUS bridges this gap**, delivering enterprise-grade document intelligence that:

- **Extracts with Purpose**: Understands document context, not just text
- **Scales Effortlessly**: Process thousands of documents with cloud-native architecture
- **Secures by Design**: Enterprise security with managed identities and RBAC
- **Learns Continuously**: Configurable datasets adapt to your specific document types
- **Measures Success**: Built-in evaluation tools ensure consistent accuracy

---

## Key Capabilities

### ๐Ÿ” **Intelligent Document Understanding**
- **Hybrid AI Pipeline**: Combines OCR precision with LLM reasoning
- **Context-Aware Extraction**: Understands relationships between data points
- **Multi-Format Support**: PDFs, images, forms, invoices, medical records
- **Zero-Shot Learning**: Works on new document types without training
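To make the hybrid idea concrete, the sketch below shows one way OCR output and rendered page images can be merged into a single vision-model request. It is illustrative only: the message layout follows the OpenAI chat-completions convention, and the function and field names are not part of the ARGUS API.

```python
import base64

def build_hybrid_request(ocr_text: str, page_images: list[bytes], instructions: str) -> list[dict]:
    """Combine OCR text and page images into one vision-capable chat message.

    Illustrative sketch only; the real pipeline lives in src/containerapp/ai_ocr/.
    """
    content = [{"type": "text", "text": f"{instructions}\n\nOCR output:\n{ocr_text}"}]
    # Attach each rendered page so the model can resolve layout details
    # (tables, checkboxes, stamps) that plain OCR text loses.
    for img in page_images:
        b64 = base64.b64encode(img).decode("ascii")
        content.append({"type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{b64}"}})
    return [{"role": "user", "content": content}]
```

The key point is that the model sees both signals at once: exact OCR strings for fidelity, and pixels for layout.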

### **Enterprise-Ready Performance**
- **Cloud-Native Architecture**: Built on Azure Container Apps
- **Scalable Processing**: Handle document floods with confidence
- **Real-Time Processing**: API-driven workflows for immediate results
- **Event-Driven Automation**: Automatic processing on document upload

### ๐ŸŽ›๏ธ **Advanced Control & Customization**
- **Dynamic Configuration**: Runtime settings without redeployment
- **Custom Datasets**: Tailor extraction for your specific needs
- **Interactive Chat**: Ask questions about processed documents
- **Concurrency Management**: Fine-tune performance for your workload

### **Comprehensive Analytics**
- **Built-in Evaluation**: Multiple accuracy metrics and comparisons
- **Performance Monitoring**: Application Insights integration
- **Custom Evaluators**: Fuzzy matching, semantic similarity, and more
- **Visual Analytics**: Jupyter notebooks for deep analysis
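To give a flavor of what a custom evaluator looks like, here is a minimal fuzzy string evaluator built on the standard library's `difflib` (a stand-in for whatever fuzzy-matching library the project uses; the real implementations live under `src/evaluators/`):

```python
from difflib import SequenceMatcher

def fuzz_score(expected: str, actual: str) -> float:
    """Similarity ratio in [0, 1], ignoring case and surrounding whitespace."""
    return SequenceMatcher(None, expected.strip().lower(), actual.strip().lower()).ratio()

def evaluate_field(expected: str, actual: str, threshold: float = 0.8) -> bool:
    """A field passes when its fuzzy similarity meets the threshold."""
    return fuzz_score(expected, actual) >= threshold
```

With a 0.8 threshold, `evaluate_field("ACME Corporation", "ACME Corp0ration")` still passes (a single-character OCR slip), while an unrelated value fails.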

---

## ๐Ÿ—๏ธ Architecture: Built for Scale and Security

ARGUS employs a modern, cloud-native architecture designed for enterprise workloads:

```mermaid
graph TB
    subgraph "Document Input"
        A[Documents] --> B[Azure Blob Storage]
        C[Direct Upload API] --> D[FastAPI Backend]
    end

    subgraph "AI Processing Engine"
        B --> D
        D --> E[Azure Document Intelligence]
        D --> F[GPT-4 Vision]
        E --> G[Hybrid Processing Pipeline]
        F --> G
    end

    subgraph "Intelligence & Analytics"
        G --> H[Custom Evaluators]
        G --> I[Interactive Chat]
        H --> J[Results & Analytics]
    end

    subgraph "Data Layer"
        G --> K[Azure Cosmos DB]
        J --> K
        I --> K
        K --> L[Streamlit Frontend]
    end

    style A fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    style B fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
    style C fill:#e8f5e8,stroke:#388e3c,stroke-width:2px
    style D fill:#fff3e0,stroke:#f57c00,stroke-width:2px
    style E fill:#fce4ec,stroke:#c2185b,stroke-width:2px
    style F fill:#e0f2f1,stroke:#00695c,stroke-width:2px
    style G fill:#fff8e1,stroke:#ffa000,stroke-width:2px
    style H fill:#f1f8e9,stroke:#558b2f,stroke-width:2px
    style I fill:#e8eaf6,stroke:#3f51b5,stroke-width:2px
    style J fill:#fdf2e9,stroke:#e65100,stroke-width:2px
    style K fill:#e0f7fa,stroke:#0097a7,stroke-width:2px
    style L fill:#f9fbe7,stroke:#827717,stroke-width:2px
```

### Infrastructure Components

| Component | Technology | Purpose |
|-----------|------------|---------|
| **Backend API** | Azure Container Apps + FastAPI | High-performance document processing engine |
| **Frontend UI** | Streamlit (optional) | Interactive document management interface |
| **Document Storage** | Azure Blob Storage | Secure, scalable document repository |
| **Metadata Database** | Azure Cosmos DB | Results, configurations, and analytics |
| **OCR Engine** | Azure Document Intelligence | Structured text and layout extraction |
| **AI Reasoning** | Azure OpenAI (GPT-4 Vision) | Contextual understanding and extraction |
| **Container Registry** | Azure Container Registry | Private, secure container images |
| **Security** | Managed Identity + RBAC | Zero-credential architecture |
| **Monitoring** | Application Insights | Performance and health monitoring |

---

## Quick Start: Deploy in Minutes

### Prerequisites

**Required Tools**

1. **Docker**
```bash
# Install Docker (required for containerization during deployment)
# Visit https://docs.docker.com/get-docker/ for installation instructions
```

2. **Azure Developer CLI (azd)**
```bash
curl -fsSL https://aka.ms/install-azd.sh | bash
```

3. **Azure CLI**
```bash
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
```

4. **Azure OpenAI Resource**
   - Create an Azure OpenAI resource in a [supported region](https://docs.microsoft.com/azure/cognitive-services/openai/overview#regional-availability)
   - Deploy a vision-capable model: `gpt-4o`, `gpt-4-turbo`, or `gpt-4` (with vision)
   - Collect: endpoint URL, API key, and deployment name
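These values are usually wired in through environment variables. The names below are illustrative placeholders, not necessarily the keys ARGUS reads; check `.env.template` for the exact names:

```shell
# Illustrative variable names; confirm against .env.template
export AZURE_OPENAI_ENDPOINT="https://<your-resource>.openai.azure.com/"
export AZURE_OPENAI_API_KEY="<your-api-key>"
export AZURE_OPENAI_DEPLOYMENT_NAME="gpt-4o"
```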

### One-Command Deployment

```bash
# 1. Clone the repository
git clone https://github.com/Azure-Samples/ARGUS.git
cd ARGUS

# 2. Login to Azure
az login

# 3. Deploy everything with a single command
azd up
```

**That's it!** Your ARGUS instance is now running in the cloud.

### Verify Your Deployment

```bash
# Check system health
curl "$(azd env get-value BACKEND_URL)/health"

# Expected response:
# {
#   "status": "healthy",
#   "services": {
#     "cosmos_db": "connected",
#     "blob_storage": "connected",
#     "document_intelligence": "connected",
#     "azure_openai": "connected"
#   }
# }

# View live application logs
azd logs --follow
```

---

## Usage Examples: See ARGUS in Action

### Method 1: Upload via Frontend Interface (Recommended)

The easiest way to process documents is through the user-friendly web interface:

1. **Access the Frontend**:
```bash
# Get the frontend URL after deployment
azd env get-value FRONTEND_URL
```

2. **Upload and Process Documents**:
   - Navigate to the **"Process Files"** tab
   - Select your dataset from the dropdown (e.g., "default-dataset", "medical-dataset")
   - Use the **file uploader** to select PDF, image, or Office documents
   - Click **"Submit"** to upload files
   - Files are automatically processed using the selected dataset's configuration
   - Monitor processing status in the **"Explore Data"** tab

### Method 2: Direct Blob Storage Upload

For automation or bulk processing, upload files directly to Azure Blob Storage:

```bash
# Upload a document to be processed automatically
az storage blob upload \
  --account-name "$(azd env get-value STORAGE_ACCOUNT_NAME)" \
  --container-name "datasets" \
  --name "default-dataset/invoice-2024.pdf" \
  --file "./my-invoice.pdf" \
  --auth-mode login

# Files uploaded to blob storage are automatically detected and processed
# Results can be viewed in the frontend or retrieved via API
```

### Method 3: Interactive Document Chat

Ask questions about any processed document through the API:

```bash
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{
        "blob_url": "https://mystorage.blob.core.windows.net/datasets/default-dataset/contract.pdf",
        "question": "What are the key terms and conditions in this contract?"
      }' \
  "$(azd env get-value BACKEND_URL)/api/chat"

# Example response:
# {
#   "answer": "The key terms include: 1) 12-month service agreement, 2) $5000/month fee, 3) 30-day termination clause...",
#   "confidence": 0.91,
#   "sources": ["page 1, paragraph 3", "page 2, section 2.1"]
# }
```

---

## ๐ŸŽ›๏ธ Advanced Configuration

### Dataset Management

ARGUS uses **datasets** to define how different types of documents should be processed. A dataset contains:
- **Model Prompt**: Instructions telling the AI how to extract data from documents
- **Output Schema**: The target structure for extracted data (can be empty to let AI determine the structure)
- **Processing Options**: Settings for OCR, image analysis, summarization, and evaluation

**When to create custom datasets**: Create a new dataset when you have a specific document type that requires different extraction logic than the built-in datasets (e.g., contracts, medical reports, financial statements).

๐Ÿ—‚๏ธ Built-in Datasets

- **`default-dataset/`**: Invoices, receipts, general business documents
- **`medical-dataset/`**: Medical forms, prescriptions, healthcare documents

**Create Custom Datasets**

Datasets are managed through the Streamlit frontend interface (deployed automatically with azd):

1. **Access the frontend** (URL provided after azd deployment)
2. **Navigate to the Process Files tab**
3. **Scroll to the "Add New Dataset" section**
4. **Configure your dataset**:
   - Enter a dataset name (e.g., "legal-contracts")
   - Define a model prompt with extraction instructions
   - Specify an output schema (JSON format) or leave it empty
   - Set processing options (OCR, images, evaluation)
5. **Click "Add New Dataset"** - it's saved directly to Cosmos DB
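Conceptually, a dataset definition bundles those three parts. The JSON below is a hypothetical illustration of such a record (field names are invented for clarity and are not the exact Cosmos DB schema):

```json
{
  "dataset_name": "legal-contracts",
  "model_prompt": "Extract the parties, effective date, term length, and payment terms. Return valid JSON.",
  "output_schema": {
    "parties": [],
    "effective_date": "",
    "term_months": 0,
    "payment_terms": ""
  },
  "processing_options": {
    "include_ocr": true,
    "include_images": true,
    "enable_summarization": false,
    "enable_evaluation": true
  }
}
```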

---

## ๐Ÿ–ฅ๏ธ Frontend Interface: User-Friendly Document Management

The Streamlit frontend is **automatically deployed** with `azd up` and provides a user-friendly interface for document management.



### Frontend Features

| Tab | Functionality |
|-----|---------------|
| **Process Files** | Drag-and-drop document upload with real-time processing status |
| **Explore Data** | Browse processed documents, search results, view extraction details |
| **Settings** | Configure datasets, adjust processing parameters, manage connections |
| **Instructions** | Interactive help, API documentation, and usage examples |

---

## ๏ธ Development & Customization

### ๐Ÿ—๏ธ Project Structure Deep Dive

```
ARGUS/
├── azure.yaml                        # Azure Developer CLI configuration
├── README.md                         # Project documentation & setup guide
├── LICENSE                           # MIT license file
├── CONTRIBUTING.md                   # Contribution guidelines
├── sample-invoice.pdf                # Sample document for testing
├── .env.template                     # Environment variables template
├── .github/                          # GitHub Actions & workflows
├── .devcontainer/                    # Development container configuration
├── .vscode/                          # VS Code settings & extensions
│
├── infra/                            # Azure Infrastructure as Code
│   ├── main.bicep                    # Primary Bicep template for Azure resources
│   ├── main.parameters.json          # Infrastructure parameters & configuration
│   ├── main-containerapp.bicep       # Container App specific infrastructure
│   ├── main-containerapp.parameters.json # Container App parameters
│   └── abbreviations.json            # Azure resource naming abbreviations
│
├── src/                              # Core application source code
│   ├── containerapp/                 # FastAPI backend service
│   │   ├── main.py                   # FastAPI app lifecycle & configuration
│   │   ├── api_routes.py             # HTTP endpoints & request handlers
│   │   ├── dependencies.py           # Azure client initialization & management
│   │   ├── models.py                 # Pydantic data models & schemas
│   │   ├── blob_processing.py        # Document processing pipeline orchestration
│   │   ├── logic_app_manager.py      # Azure Logic Apps concurrency management
│   │   ├── Dockerfile                # Container image definition
│   │   ├── requirements.txt          # Python dependencies
│   │   ├── REFACTORING_SUMMARY.md    # Architecture documentation
│   │   │
│   │   ├── ai_ocr/                   # AI processing engine
│   │   │   ├── process.py            # Main processing orchestration & workflow
│   │   │   ├── chains.py             # LangChain integration & AI workflows
│   │   │   ├── model.py              # Configuration models & data structures
│   │   │   ├── timeout.py            # Processing timeout management
│   │   │   │
│   │   │   └── azure/                # Azure service integrations
│   │   │       ├── config.py         # Environment & configuration management
│   │   │       ├── doc_intelligence.py # Azure Document Intelligence OCR
│   │   │       ├── images.py         # PDF to image conversion utilities
│   │   │       └── openai_ops.py     # Azure OpenAI API operations
│   │   │
│   │   ├── example-datasets/         # Default dataset configurations
│   │   ├── datasets/                 # Runtime dataset storage
│   │   └── evaluators/               # Data quality evaluation modules
│   │
│   └── evaluators/                   # Evaluation framework
│       ├── field_evaluator_base.py   # Abstract base class for evaluators
│       ├── fuzz_string_evaluator.py  # Fuzzy string matching evaluation
│       ├── cosine_similarity_string_evaluator.py # Semantic similarity evaluation
│       ├── custom_string_evaluator.py # Custom evaluation logic
│       ├── json_evaluator.py         # JSON structure validation
│       └── tests/                    # Unit tests for evaluators
│
├── frontend/                         # Streamlit web interface
│   ├── app.py                        # Main Streamlit application entry point
│   ├── backend_client.py             # API client for backend communication
│   ├── process_files.py              # File upload & processing interface
│   ├── explore_data.py               # Document browsing & analysis UI
│   ├── document_chat.py              # Interactive document Q&A interface
│   ├── instructions.py               # Help & documentation tab
│   ├── settings.py                   # Configuration management UI
│   ├── concurrency_management.py     # Performance tuning interface
│   ├── concurrency_settings.py       # Concurrency configuration utilities
│   ├── Dockerfile                    # Frontend container definition
│   ├── requirements.txt              # Python dependencies for frontend
│   └── static/                       # Static assets (logos, images)
│       └── logo.png                  # ARGUS brand logo
│
├── demo/                             # Sample datasets & examples
│   ├── default-dataset/              # General business documents dataset
│   │   ├── system_prompt.txt         # AI extraction instructions
│   │   ├── output_schema.json        # Expected data structure
│   │   ├── ground_truth.json         # Validation reference data
│   │   └── Invoice Sample.pdf        # Sample document for testing
│   │
│   └── medical-dataset/              # Healthcare documents dataset
│       ├── system_prompt.txt         # Medical-specific extraction rules
│       ├── output_schema.json        # Medical data structure
│       └── eyes_surgery_pre_1_4.pdf  # Sample medical document
│
├── notebooks/                        # Analytics & evaluation tools
│   ├── evaluator.ipynb               # Comprehensive evaluation dashboard
│   ├── output.json                   # Evaluation results & metrics
│   ├── requirements.txt              # Jupyter notebook dependencies
│   ├── README.md                     # Notebook usage instructions
│   └── outputs/                      # Historical evaluation results
│
└── docs/                             # Documentation & assets
    └── ArchitectureOverview.png      # System architecture diagram
```

### Local Development Setup

```bash
# Setup development environment
cd src/containerapp
python -m venv venv
source venv/bin/activate # or `venv\Scripts\activate` on Windows
pip install -r requirements.txt

# Configure local environment
cp ../../.env.template .env
# Edit .env with your development credentials

# Run with hot reload
uvicorn main:app --reload --host 0.0.0.0 --port 8000

# Access API documentation
open http://localhost:8000/docs
```

### Key Technologies & Libraries

| Category | Technologies |
|----------|-------------|
| **API Framework** | FastAPI, Uvicorn, Pydantic |
| **AI/ML** | LangChain, OpenAI SDK, Azure AI SDK |
| **Azure Services** | Azure SDK (Blob, Cosmos, Document Intelligence) |
| **Document Processing** | PyMuPDF, Pillow, PyPDF2 |
| **Data & Analytics** | Pandas, NumPy, Matplotlib |
| **Security** | Azure Identity, managed identities |

---

## API Reference: Complete Documentation

### Core Processing Endpoints

#### `POST /api/process-blob` - Process Document from Storage

**Request**:
```json
{
  "blob_url": "https://storage.blob.core.windows.net/datasets/default-dataset/invoice.pdf",
  "dataset_name": "default-dataset",
  "priority": "normal",
  "webhook_url": "https://your-app.com/webhooks/argus",
  "metadata": {
    "source": "email_attachment",
    "user_id": "user123"
  }
}
```

**Response**:
```json
{
  "status": "success",
  "job_id": "job_12345",
  "extraction_results": {
    "invoice_number": "INV-2024-001",
    "total_amount": "$1,250.00",
    "confidence_score": 0.94
  },
  "processing_time": "2.3s",
  "timestamp": "2024-01-15T10:30:00Z"
}
```

#### `POST /api/process-file` - Direct File Upload

**Request** (multipart/form-data):
```
file: [PDF/Image file]
dataset_name: "default-dataset"
priority: "high"
```

**Response**:
```json
{
  "status": "success",
  "job_id": "job_12346",
  "blob_url": "https://storage.blob.core.windows.net/temp/uploaded_file.pdf",
  "extraction_results": {...},
  "processing_time": "1.8s"
}
```

#### `POST /api/chat` - Interactive Document Q&A

**Request**:
```json
{
  "blob_url": "https://storage.blob.core.windows.net/datasets/contract.pdf",
  "question": "What are the payment terms and penalties for late payment?",
  "context": "focus on financial obligations",
  "temperature": 0.1
}
```

**Response**:
```json
{
  "answer": "Payment terms are Net 30 days. Late payment penalty is 1.5% per month on outstanding balance...",
  "confidence": 0.91,
  "sources": [
    {"page": 2, "section": "Payment Terms"},
    {"page": 5, "section": "Default Provisions"}
  ],
  "processing_time": "1.2s"
}
```

### โš™๏ธ Configuration Management

#### `GET/POST /api/configuration` - System Configuration

**GET Response**:
```json
{
  "openai_settings": {
    "endpoint": "https://your-openai.openai.azure.com/",
    "model": "gpt-4o",
    "temperature": 0.1,
    "max_tokens": 4000
  },
  "processing_settings": {
    "max_concurrent_jobs": 5,
    "timeout_seconds": 300,
    "retry_attempts": 3
  },
  "datasets": ["default-dataset", "medical-dataset", "financial-reports"]
}
```

**POST Request**:
```json
{
  "openai_settings": {
    "temperature": 0.05,
    "max_tokens": 6000
  },
  "processing_settings": {
    "max_concurrent_jobs": 8
  }
}
```

### Monitoring & Analytics

#### `GET /api/metrics` - Performance Metrics

**Response**:
```json
{
  "period": "last_24h",
  "summary": {
    "total_documents": 1247,
    "successful_extractions": 1198,
    "failed_extractions": 49,
    "success_rate": 96.1,
    "avg_processing_time": "2.3s"
  },
  "performance": {
    "p50_processing_time": "1.8s",
    "p95_processing_time": "4.2s",
    "p99_processing_time": "8.1s"
  },
  "errors": {
    "ocr_failures": 12,
    "ai_timeouts": 8,
    "storage_issues": 3,
    "other": 26
  }
}
```

---

## Contributing & Community

### How to Contribute

We welcome contributions! Here's how to get started:

1. **๐Ÿด Fork & Clone**:
```bash
git clone https://github.com/your-username/ARGUS.git
cd ARGUS
```

2. **Create a Feature Branch**:
```bash
git checkout -b feature/amazing-improvement
```

3. **Develop & Test**:
```bash
# Setup development environment
./scripts/setup-dev.sh

# Run tests
pytest tests/ -v

# Lint code
black src/ && flake8 src/
```

4. **๐Ÿ“ Document Changes**:
```bash
# Update documentation
# Add examples to README
# Update API documentation
```

5. **Submit a PR**:
```bash
git commit -m "feat: add amazing improvement"
git push origin feature/amazing-improvement
# Create pull request on GitHub
```

### Contribution Guidelines

| Type | Guidelines |
|------|------------|
| **๐Ÿ› Bug Fixes** | Include reproduction steps, expected vs actual behavior |
| **โœจ New Features** | Discuss in issues first, include tests and documentation |
| **๐Ÿ“š Documentation** | Clear examples, practical use cases, proper formatting |
| **๐Ÿ”ง Performance** | Benchmark results, before/after comparisons |

### ๐Ÿ† Recognition

Contributors will be recognized in:
- ๐Ÿ“ Release notes for significant contributions
- ๐ŸŒŸ Contributors section (with permission)
- ๐Ÿ’ฌ Community showcase for innovative use cases

---

## Support & Resources

### Getting Help

| Resource | Description | Link |
|----------|-------------|------|
| **Documentation** | Complete setup and usage guides | [docs/](docs/) |
| **Issue Tracker** | Bug reports and feature requests | [GitHub Issues](https://github.com/Azure-Samples/ARGUS/issues) |
| **Discussions** | Community Q&A and ideas | [GitHub Discussions](https://github.com/Azure-Samples/ARGUS/discussions) |
| **Team Contact** | Direct contact for enterprise needs | See the Team section below |

### Additional Resources

- **Azure Document Intelligence**: [Official Documentation](https://docs.microsoft.com/azure/applied-ai-services/form-recognizer/)
- **Azure OpenAI**: [Service Documentation](https://docs.microsoft.com/azure/cognitive-services/openai/)
- **FastAPI**: [Framework Documentation](https://fastapi.tiangolo.com/)
- **LangChain**: [Integration Guides](https://python.langchain.com/)

---

## Team

- **Alberto Gallo**
- **Petteri Johansson**
- **Christin Pohl**
- **Konstantinos Mavrodis**

## License

This project is licensed under the **MIT License** - see the [LICENSE](LICENSE) file for details.

---

## Ready to Transform Your Document Processing?

**Deploy ARGUS in minutes and start extracting intelligence from your documents today!**

```bash
git clone https://github.com/Azure-Samples/ARGUS.git && cd ARGUS && azd up
```


[![Deploy to Azure](https://aka.ms/deploytoazurebutton)](https://portal.azure.com/#create/Microsoft.Template)
[![Open in Dev Container](https://img.shields.io/static/v1?label=Dev%20Containers&message=Open&color=blue&logo=visualstudiocode)](https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/Azure-Samples/ARGUS)


**โญ Star this repo if ARGUS helps your document processing needs!**