https://github.com/arakiss/cortex-poc

A proof of concept to validate the viability of security log analysis using LLMs within the RODELA platform.
https://github.com/arakiss/cortex-poc

Last synced: 4 months ago
JSON representation

A proof of concept to validate the viability of security log analysis using LLMs within the RODELA platform.

Host: GitHub
URL: https://github.com/arakiss/cortex-poc
Owner: Arakiss
Created: 2024-11-10T22:19:01.000Z (8 months ago)
Default Branch: main
Last Pushed: 2024-11-12T17:22:54.000Z (8 months ago)
Last Synced: 2025-03-10T15:19:35.261Z (4 months ago)
Language: Python
Homepage:
Size: 128 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Security: docs/security_specs.md

Awesome Lists containing this project

README

# Cortex POC

A proof of concept to validate the viability of security log analysis using LLMs within the RODELA platform.

## Overview

This POC aims to demonstrate the effectiveness of using GPT OpenAI models to analyze security logs and generate actionable intelligence. The focus is on processing historical log data and generating structured analysis that can be consumed by downstream systems.

## Development Framework & Scaffolding

### CoreSight Framework
The project utilizes CoreSight, a lightweight and transparent micro-framework for orchestrating LLM-based agents. CoreSight was chosen for its:
- Direct developer control
- Minimal complexity
- Transparent operation
- Efficient agent orchestration

CoreSight was created by Petru Arakiss ([email protected]) and is used under private license for this project.

### Development Abstractions
In addition to CoreSight, this POC leverages several personal abstractions and utilities developed by Petru Arakiss. These components serve as development scaffolding to:
- Accelerate initial development and prototyping
- Provide temporary structuring patterns
- Enable rapid iteration and testing
- Facilitate proof of concept validation

These additional abstractions are used purely for development purposes and are not intended to be part of the final production implementation. They serve as a temporary framework to validate concepts and patterns that will be properly implemented in the final version.

## Dataset Specifications

### Test Dataset
- Total events: 400 events
- Event types distribution:
- SQL injections (15.50%)
- Port scans (14.75%)
- Protocol abuse (14.75%)
- Unusual traffic (14.75%)
- DDoS attacks (14.75%)
- Brute force attempts (13.25%)
- Geographic pattern anomalies (12.25%)
- Multiple protocols (HTTP, TCP, UDP, ICMP, FILE)
- Various severity levels (1-5)

### Batch Processing
- Recommended batch size: 50-100 events
- Estimated tokens per event: ~300-400
- Total batches: 4-8

## POC Scope

### Core Objectives
- Validate OpenAI's GPT-4 LLM's model capabilities to understand and analyze security logs
- Demonstrate generation of useful, actionable insights
- Evaluate quality and structure of generated recommendations
- Assess processing costs and performance metrics
- Validate CoreSight framework effectiveness in security analysis

### Out of Scope
- Real-time processing
- External system integrations (except CoreSight)
- User interface implementation
- Automated rule deployment

## Key Features

### Log Analysis
- Processing of security events from JSONL files using CoreSight agents
- Pattern recognition across event types:
- Port scanning detection
- Brute force attack identification
- SQL injection attempts
- DDoS attack patterns
- Protocol abuse detection
- Geographic anomaly detection
- Unusual traffic analysis
- Severity assessment and prioritization
- Attack pattern identification

### Intelligence Generation
- Structured JSON output through CoreSight agents
- Attack pattern summaries
- Mitigation recommendations
- Risk level assessments
- Critical IP identification
- Protocol-specific analysis
- Geographic pattern analysis

## Technical Implementation

### Core Components
- CoreSight Framework: Orchestrates LLM-based agents
- Log Parser: Processes JSONL security logs
- Batch Processor: Groups related events (50-100 per batch)
- LLM Interface: Manages GPT-4 interactions
- Output Formatter: Structures analysis results

### Agent Architecture
- Orchestration Agent: Manages workflow and task distribution
- Analysis Agents: Specialized in different attack patterns
- Synthesis Agent: Combines findings and generates recommendations
- Validation Agent: Ensures output quality and consistency

### Data Flow
1. Input: JSONL security logs (400 events)
2. Processing: Event grouping and context building
3. Analysis: CoreSight-managed LLM-based pattern detection
4. Output: Structured JSON intelligence

## System Requirements

### Minimum Requirements
- Python 3.11+
- OpenAI API access with GPT-4 capabilities
- CoreSight framework installation
- ~2GB RAM
- Storage for log files

### Dependencies
- CoreSight micro-framework
- OpenAI Python library
- Basic JSON processing utilities
- Minimal logging framework

## POC Metrics

### Success Criteria
- Accurate pattern detection for all event types
- Actionable recommendations for identified threats
- Structured, parseable JSON output
- Processing cost under budget constraints
- Efficient agent orchestration via CoreSight

### Performance Targets
- Batch size: 50-100 events/request
- Response time: <30s per batch
- Cost efficiency: <$0.02 per event
- Total processing time: <5 minutes for full dataset
- Agent orchestration overhead: <5% of total processing time

## Getting Started

1. Clone the repository
2. Install CoreSight framework (private distribution)
3. Install dependencies: `pip install -r requirements.txt`
4. Set OpenAI API key: `export OPENAI_API_KEY="your-key"`
5. Run analysis: `python analyze_logs.py input.jsonl`

## Sample Output

```json
{
"analysis": {
"patterns": [
{
"type": "port_scan",
"severity_distribution": {"high": 20, "medium": 45, "low": 35},
"notable_ips": ["x.x.x.x", "y.y.y.y"],
"protocols": ["TCP", "UDP", "HTTP"]
},
{
"type": "ddos",
"protocols": ["ICMP", "UDP", "TCP"],
"severity_levels": [1, 2, 3, 4, 5]
}
],
"recommendations": [
{
"threat": "port_scanning",
"mitigation": "Implement rate limiting",
"priority": "high"
},
{
"threat": "geographic_pattern",
"mitigation": "Review and update geofencing rules",
"priority": "medium"
}
],
"risk_level": "medium",
"critical_ips": ["x.x.x.x", "y.y.y.y"]
},
"metrics": {
"events_processed": 50,
"processing_time": "25s",
"token_cost": "$0.10",
"batch_number": "1/4",
"agent_orchestration_time": "1.2s"
}
}
```

## Limitations

### Technical Limitations
- Batch processing only (no real-time)
- Fixed log format (JSONL only)
- Support for specific event types:
- Port scan
- Brute force
- SQL injection
- DDoS
- Protocol abuse
- Geographic pattern
- Unusual traffic
- Basic output structure
- No persistent storage
- No authentication/authorization

### Development Dependencies
- CoreSight framework required (private distribution)
- Temporary development abstractions and utilities
- Development scaffolding will be replaced in production
- Proof of concept patterns require production implementation

## Development Timeline

- Week 1: Core implementation
- Day 1: CoreSight framework setup and configuration
- Day 2: Agent development and orchestration setup
- Day 3-4: LLM integration and prompt engineering
- Day 5: Output formatting and testing
- Week 2: Testing and refinement
- Day 1-2: Performance testing and optimization
- Day 3-4: Cost analysis and batch size optimization
- Day 5: Documentation and final adjustments

## Security Note

This POC processes sensitive security logs. Ensure proper data handling procedures are followed.

## Framework Credits

CoreSight micro-framework created by Petru Arakiss ([email protected]). Used under private license.

## Copyright

---

**Note**: This is a proof of concept implementation. Not intended for production use.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/arakiss/cortex-poc

Awesome Lists containing this project

README