🔎 Natural Language Interface for Digital Forensic Evidence
https://github.com/areebahmeddd/cognito.ai

agentic-ai digital-forensics elasticsearch fastapi langchain neo4j t3-stack ufdr-tool

# 🧠 Project Description

**cognito.ai** is a natural‑language forensic evidence discovery engine for UFDR (Universal Forensic Extraction Device Report) data. It ingests UFDR exports, normalizes their heterogeneous schemas with deterministic IDs, and indexes the records into Elasticsearch using category‑aware mappings. [ [Project Demo](https://www.youtube.com/watch?v=nPmozZFyn9Q) | [Project Abstract](https://docs.google.com/document/d/1MFKM0IF8x_RIVlebnfSWHOk4HerEkuZhx3HQc1r5wIs/edit?usp=sharing) | [Project PPT](https://docs.google.com/presentation/d/1n7_xvl8xx3r6QOR7TCP-oiH9NknoiszoKVyfP6TVlBM/edit?usp=sharing) ]
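
As a rough illustration of what "deterministic IDs" can mean here, the sketch below hashes a canonical JSON form of a record so that re-ingesting the same export yields the same document ID. The helper name `deterministic_id`, its parameters, and the choice of SHA-256 are assumptions for illustration, not the project's actual scheme.

```python
import hashlib
import json


def deterministic_id(source_file: str, category: str, record: dict) -> str:
    """Derive a stable ID from a record's content (hypothetical helper)."""
    # Sorted keys and fixed separators give one canonical serialization,
    # so key order in the input dict does not change the resulting hash.
    canonical = json.dumps(
        {"source": source_file, "category": category, "record": record},
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same content with a different key order maps to the same ID.
a = deterministic_id("case1.ufdr", "message", {"from": "alice", "body": "hi"})
b = deterministic_id("case1.ufdr", "message", {"body": "hi", "from": "alice"})
print(a == b)  # prints True
```

A content-derived ID of this kind makes re-indexing idempotent: indexing the same record twice overwrites one document instead of creating a duplicate.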

**Built for** [Smart India Hackathon - 2025](https://sih.gov.in)

### Key Features

- **Real-time Search**: Elasticsearch with category-aware mappings and faceted filters.
- **UFDR Ingestion**: normalized schemas and deterministic IDs across heterogeneous exports.
- **Natural-Language Querying**: Gemini-powered NLQ translated to Elasticsearch DSL.
- **Secure Persistence**: MongoDB storage with JWT-based authentication.
- **Visual Analytics**: timeline and network views, case drill-downs, exportable reports.
- **Dev-Friendly Setup**: Docker Compose for ES + Mongo; FastAPI + Next.js local dev.
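
To make the search features above more concrete, here is a minimal sketch of the kind of Elasticsearch query body an NLQ layer might emit: a full-text match, an optional category filter, and a terms aggregation for facets. The function name and the field names `content` and `category` are assumptions; the project's real mappings may differ.

```python
import json
from typing import Any, Optional


def build_search_body(
    text: str, category: Optional[str] = None, size: int = 20
) -> dict[str, Any]:
    """Build a bool query with an optional category filter (hypothetical helper)."""
    filters = []
    if category:
        # Filters restrict results without affecting relevance scoring.
        filters.append({"term": {"category": category}})
    return {
        "size": size,
        "query": {
            "bool": {
                "must": [{"match": {"content": text}}],
                "filter": filters,
            }
        },
        # Terms aggregation that can back a per-category facet in the UI.
        "aggs": {"by_category": {"terms": {"field": "category"}}},
    }


print(json.dumps(build_search_body("wire transfer", category="message"), indent=2))
```

The resulting dictionary can be passed as the body of a search request; keeping the builder free of any client dependency makes it easy to unit test.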

**[📄 Sample AI-Generated Report](https://trycognito-ai.vercel.app/sample_report.pdf)** - See an example of our platform's comprehensive forensic analysis output.

## 🗂️ Project Structure

```
.
├── backend/
│   ├── app/
│   │   ├── core/               # Settings, DB/ES bootstrap
│   │   ├── models/             # Pydantic models (UFDRDocument)
│   │   ├── routes/             # FastAPI routes
│   │   ├── services/           # Elasticsearch, parser, classifier, AI intent planner, auth
│   │   ├── utils/              # Helpers
│   │   └── main.py             # FastAPI app entrypoint
│   ├── data/                   # Sample data (e.g., ufdr.jsonl)
│   ├── pyproject.toml          # Python deps
│   └── uv.lock                 # Locked dependency resolution
├── frontend/
│   ├── app/                    # Next.js app routes/pages
│   ├── components/             # UI & dashboard components
│   ├── hooks/, lib/, styles/   # Frontend utilities
│   └── package.json            # Web deps
├── scripts/                    # Utilities (reindex, test API)
├── docker/                     # Docker configuration files
├── docs/                       # Misc docs
└── docker-compose.yaml         # Local Elasticsearch
```

## 🏗️ Project Design

**Figure:** System Architecture

**Figure:** Sequence Diagram
## 🎯 Project Milestones

### Completed:

- [x] Add Elasticsearch indexing support for UFDR documents ([@areeb](https://github.com/areebahmeddd))
- [x] Define schema support in Elasticsearch for multiple file data types ([@areeb](https://github.com/areebahmeddd))
- [x] Build Gemini-powered NLQ layer to dynamically generate Elasticsearch DSL queries ([@areeb](https://github.com/areebahmeddd))
- [x] Integrate MongoDB ([@shivansh](https://github.com/SpaceTesla))
- [x] Implement JWT-based authentication system ([@shivansh](https://github.com/SpaceTesla))
- [x] Develop pipeline to extract TSV and convert to JSON from UFDR files ([@hamad](https://github.com/therealhamad))
- [x] Add Neo4j visualization support ([@avantika](https://github.com/avii09))
- [x] Enable report export generation ([@bhavana](https://github.com/bhaaaav))
- [x] Design and implement web UI ([@areeb](https://github.com/areebahmeddd), [@shivansh](https://github.com/SpaceTesla))
- [x] Set up CI/CD pipeline for DigitalOcean + Cloudflare Pages deployment ([@areeb](https://github.com/areebahmeddd))
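
The TSV-to-JSON step listed above can be sketched with the standard library alone. This is a simplified illustration, assuming each TSV file has a header row; the function name and the sample columns are hypothetical and not taken from the project's parser.

```python
import csv
import io
import json


def tsv_to_records(tsv_text: str) -> list[dict[str, str]]:
    """Parse tab-separated text with a header row into a list of dicts."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [dict(row) for row in reader]


# Hypothetical sample: the column names are made up for illustration.
sample = "sender\treceiver\tbody\nalice\tbob\thello\n"
print(json.dumps(tsv_to_records(sample), indent=2))
```

Every value comes back as a string, so type coercion (timestamps, numbers) would still need to happen in a later normalization step.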

### In Progress:

- [ ] Upgrade NLQ layer with Mistral NeMo (12B SLM) support ([@anish](https://github.com/Av7danger))
- [ ] Upgrade ETL pipeline to better sync with multiple services ([@areeb](https://github.com/areebahmeddd))
- [ ] Develop ETL pipeline for additional file types ([@hamad](https://github.com/therealhamad))
- [ ] Write Pytest tests ([@avantika](https://github.com/avii09))
- [ ] Write Cypress tests ([@shivansh](https://github.com/SpaceTesla))
- [ ] Configure NGINX ([@areeb](https://github.com/areebahmeddd))
- [ ] Set up CI workflow to generate dynamic docs on merges to `testing` branch ([@bhavana](https://github.com/bhaaaav))
- [ ] Add Redis caching for search results ([@avantika](https://github.com/avii09))

## 🖼️ Project Preview

- Landing Page
- Home Page
- Your Cases Page
- Create Case Modal
- UFDR Artifact Inspector
- Search Results Page
- Timeline Analysis Page
- Network Correlation Page
- Case Summary Page

## ⚙️ Setup for Development

1. Clone the repo:

```bash
git clone https://github.com/areebahmeddd/cognito.ai.git
cd cognito.ai
```

2. Start Elasticsearch (single-node) via Docker Compose:

```bash
docker compose up -d
```

3. Configure environment variables:

Create a `.env` file in `backend/`:

```
ELASTICSEARCH_URL=http://localhost:9200
ELASTICSEARCH_INDEX=cognito
JWT_SECRET_KEY=
JWT_ALGORITHM=
JWT_EXPIRE_MINUTES=
MONGODB_CONNECTION_STRING=mongodb://localhost:27017/cognito
GEMINI_API_KEY=
```

Create a `.env` file in `frontend/`:

```
NEXTAUTH_URL=http://localhost:3000
NEXTAUTH_SECRET=
NEXT_PUBLIC_API_URL=http://127.0.0.1:8000/api/v1
```
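
Several of the backend values above are left blank and have to be filled in before the API can start. A small pre-flight check along these lines can catch that early; the variable names come from the `.env` template above, while the helper `missing_vars` is a hypothetical sketch and not part of the repository.

```python
import os
from collections.abc import Mapping, Sequence

# Names taken from the backend .env template above.
REQUIRED_BACKEND_VARS = [
    "ELASTICSEARCH_URL",
    "ELASTICSEARCH_INDEX",
    "JWT_SECRET_KEY",
    "JWT_ALGORITHM",
    "JWT_EXPIRE_MINUTES",
    "MONGODB_CONNECTION_STRING",
    "GEMINI_API_KEY",
]


def missing_vars(
    env: Mapping[str, str], required: Sequence[str] = REQUIRED_BACKEND_VARS
) -> list[str]:
    """Return the required variables that are unset or blank (hypothetical helper)."""
    return [name for name in required if not env.get(name, "").strip()]


missing = missing_vars(os.environ)
if missing:
    print("Missing or empty variables:", ", ".join(missing))
else:
    print("All required backend variables are set.")
```

Note that this reads the process environment, so the `.env` file must already be loaded (or exported in the shell) for the check to see those values.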

## 🖥️ Backend (FastAPI)

Install dependencies and run the API:

```bash
cd backend
uv sync
# Assumes uvicorn is installed as a project dependency (typical for FastAPI apps)
uv run uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
```

Swagger UI: `http://localhost:8000/docs`

## 🌐 Frontend (Next.js)

```bash
cd frontend
npm clean-install
npm run dev
```

## 🧰 Scripts

### 1. Nuke Infra (Fresh Start)

Wipes the Elasticsearch index (and its wildcard matches), then the MongoDB database.

- Uses these environment variables, with the following defaults:
  - `ELASTICSEARCH_URL=http://localhost:9200`
  - `ELASTICSEARCH_INDEX=cognito`
  - `MONGODB_CONNECTION_STRING=mongodb://localhost:27017/cognito`

```bash
python scripts/nuke_infra.py
```
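
Before running a destructive script, it can help to see exactly what it would target. The sketch below only resolves the targets from the environment using the defaults listed above and deletes nothing. The `resolve_targets` helper is hypothetical, and treating the wildcard as `<index>*` is an assumption about how `nuke_infra.py` behaves.

```python
import os
from collections.abc import Mapping
from urllib.parse import urlparse

# Defaults as documented for scripts/nuke_infra.py.
DEFAULTS = {
    "ELASTICSEARCH_URL": "http://localhost:9200",
    "ELASTICSEARCH_INDEX": "cognito",
    "MONGODB_CONNECTION_STRING": "mongodb://localhost:27017/cognito",
}


def resolve_targets(env: Mapping[str, str]) -> dict[str, str]:
    """Work out which index and database a wipe would touch (hypothetical helper)."""
    es_url = env.get("ELASTICSEARCH_URL", DEFAULTS["ELASTICSEARCH_URL"])
    index = env.get("ELASTICSEARCH_INDEX", DEFAULTS["ELASTICSEARCH_INDEX"])
    mongo_uri = env.get(
        "MONGODB_CONNECTION_STRING", DEFAULTS["MONGODB_CONNECTION_STRING"]
    )
    # The database name is the path component of the connection string.
    db_name = urlparse(mongo_uri).path.lstrip("/")
    return {
        "es_url": es_url,
        "index": index,
        "index_pattern": f"{index}*",  # assumed form of the wildcard
        "mongo_db": db_name,
    }


for key, value in resolve_targets(os.environ).items():
    print(f"{key}: {value}")
```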

### 2. Generate Mock UFDR ZIPs (for testing)

Creates synthetic UFDR-like TSV bundles as ZIPs at the project root: `Test_UFDR-1.zip`, `Test_UFDR-2.zip`, `Test_UFDR-3.zip`.

```bash
python scripts/mock_zip.py
```
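
For a sense of what a synthetic bundle involves, the following sketch builds a ZIP containing a single TSV file entirely in memory. It is an illustration only: the function name, the member name `messages.tsv`, and the sample columns are assumptions, and the real `mock_zip.py` writes its archives to the project root instead.

```python
import csv
import io
import zipfile


def build_mock_ufdr_zip(
    rows: list[dict[str, str]], member_name: str = "messages.tsv"
) -> bytes:
    """Pack rows into a TSV file inside an in-memory ZIP (hypothetical helper)."""
    tsv = io.StringIO()
    writer = csv.DictWriter(
        tsv, fieldnames=list(rows[0].keys()), delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)

    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(member_name, tsv.getvalue())
    return archive.getvalue()


# Hypothetical sample rows; the column names are made up for illustration.
data = build_mock_ufdr_zip([{"sender": "alice", "receiver": "bob", "body": "hello"}])
with zipfile.ZipFile(io.BytesIO(data)) as zf:
    print(zf.namelist())
```

Writing the returned bytes to a file such as `Test_UFDR-1.zip` would produce an archive that can be uploaded through the normal ingestion flow.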

## 📜 License

This project is licensed under the [MIT License](LICENSE).

## 👥 Authors

- [Areeb Ahmed](https://github.com/areebahmeddd)
- [Hamad Hussain](https://github.com/therealhamad)
- [Shivansh Karan](https://github.com/SpaceTesla)
- [Anish Varma](https://github.com/Av7danger)
- [Avantika Kesarwani](https://github.com/avii09)
- [Bhavana Subramani](https://github.com/bhaaaav)