https://github.com/areebahmeddd/cognito.ai
Natural Language Interface for Digital Forensic Evidence
- Host: GitHub
- URL: https://github.com/areebahmeddd/cognito.ai
- Owner: areebahmeddd
- License: mit
- Created: 2025-09-17T21:29:43.000Z (9 months ago)
- Default Branch: testing
- Last Pushed: 2025-12-01T12:42:32.000Z (6 months ago)
- Last Synced: 2025-12-04T01:15:05.157Z (6 months ago)
- Topics: agentic-ai, digital-forensics, elasticsearch, fastapi, langchain, neo4j, t3-stack, ufdr-tool
- Language: TypeScript
- Homepage: https://trycognito-ai.vercel.app
- Size: 5.27 MB
- Stars: 3
- Watchers: 0
- Forks: 0
- Open Issues: 8
Metadata Files:
- Readme: README.md
- Contributing: docs/CONTRIBUTING.md
- License: LICENSE
- Code of conduct: docs/CODE_OF_CONDUCT.md
- Security: docs/SECURITY.md
# Project Description
**cognito.ai** is a natural-language forensic evidence discovery engine for UFDR (Universal Forensic Extraction Device Report) data. It ingests UFDR exports, normalizes heterogeneous schemas with deterministic IDs, and indexes the results into Elasticsearch using category-aware mappings. [ [Project Demo](https://www.youtube.com/watch?v=nPmozZFyn9Q) | [Project Abstract](https://docs.google.com/document/d/1MFKM0IF8x_RIVlebnfSWHOk4HerEkuZhx3HQc1r5wIs/edit?usp=sharing) | [Project PPT](https://docs.google.com/presentation/d/1n7_xvl8xx3r6QOR7TCP-oiH9NknoiszoKVyfP6TVlBM/edit?usp=sharing) ]
**Built for** [Smart India Hackathon - 2025](https://sih.gov.in)
### Key Features
- **Real-time Search**: Elasticsearch with category-aware mappings and faceted filters.
- **UFDR Ingestion**: normalized schemas and deterministic IDs across heterogeneous exports.
- **Natural-Language Querying**: Gemini-powered NLQ translated to Elasticsearch DSL.
- **Secure Persistence**: MongoDB storage with JWT-based authentication.
- **Visual Analytics**: timeline and network views, case drill-downs, exportable reports.
- **Dev-Friendly Setup**: Docker Compose for ES + Mongo; FastAPI + Next.js local dev.
**[Sample AI-Generated Report](https://trycognito-ai.vercel.app/sample_report.pdf)** - See an example of our platform's comprehensive forensic analysis output.
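The deterministic IDs mentioned above can be illustrated with a minimal sketch. It assumes an ID is derived by hashing a canonical form of a few identifying fields; the function name `deterministic_id` and the field names `case_id`, `category`, `timestamp`, and `body` are hypothetical and are not taken from the project's code.

```python
import hashlib
import json


def deterministic_id(record: dict, key_fields: tuple) -> str:
    """Derive a stable document ID from selected fields of a normalized record."""
    # Only the chosen key fields take part, so unrelated fields do not change the ID.
    key = {name: record.get(name) for name in key_fields}
    # sort_keys and fixed separators yield one canonical string per key.
    canonical = json.dumps(key, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical records: the same message exported twice with different field order.
FIELDS = ("case_id", "category", "timestamp", "body")
first = {"case_id": "case-1", "category": "message",
         "timestamp": "2025-01-01T10:00:00Z", "body": "hi"}
second = {"body": "hi", "timestamp": "2025-01-01T10:00:00Z",
          "category": "message", "case_id": "case-1"}

same = deterministic_id(first, FIELDS) == deterministic_id(second, FIELDS)
print(same)
```

With an ID built this way, re-ingesting the same export would overwrite the existing Elasticsearch document instead of creating a duplicate.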
## Project Structure
```
.
├── backend/
│   ├── app/
│   │   ├── core/              # Settings, DB/ES bootstrap
│   │   ├── models/            # Pydantic models (UFDRDocument)
│   │   ├── routes/            # FastAPI routes
│   │   ├── services/          # Elasticsearch, parser, classifier, AI intent planner, auth
│   │   ├── utils/             # Helpers
│   │   └── main.py            # FastAPI app entrypoint
│   ├── data/                  # Sample data (e.g., ufdr.jsonl)
│   ├── pyproject.toml         # Python deps
│   └── uv.lock                # Locked dependency resolution
├── frontend/
│   ├── app/                   # Next.js app routes/pages
│   ├── components/            # UI & dashboard components
│   ├── hooks/, lib/, styles/  # Frontend utilities
│   └── package.json           # Web deps
├── scripts/                   # Utilities (reindex, test API)
├── docker/                    # Docker configuration files
├── docs/                      # Misc docs
└── docker-compose.yaml        # Local Elasticsearch
```
## Project Design
- System Architecture
- Sequence Diagram
## Project Milestones
### Completed:
- [x] Add Elasticsearch indexing support for UFDR documents ([@areeb](https://github.com/areebahmeddd))
- [x] Define schema support in Elasticsearch for multiple file data types ([@areeb](https://github.com/areebahmeddd))
- [x] Build Gemini-powered NLQ layer to dynamically generate Elasticsearch DSL queries ([@areeb](https://github.com/areebahmeddd))
- [x] Integrate MongoDB ([@shivansh](https://github.com/SpaceTesla))
- [x] Implement JWT-based authentication system ([@shivansh](https://github.com/SpaceTesla))
- [x] Develop pipeline to extract TSV and convert to JSON from UFDR files ([@hamad](https://github.com/therealhamad))
- [x] Add Neo4j visualization support ([@avantika](https://github.com/avii09))
- [x] Enable report export generation ([@bhavana](https://github.com/bhaaaav))
- [x] Design and implement web UI ([@areeb](https://github.com/areebahmeddd), [@shivansh](https://github.com/SpaceTesla))
- [x] Set up CI/CD pipeline for DigitalOcean + Cloudflare Pages deployment ([@areeb](https://github.com/areebahmeddd))
### In Progress:
- [ ] Upgrade NLQ layer with Mistral NeMo (12B SLM) support ([@anish](https://github.com/Av7danger))
- [ ] Upgrade ETL pipeline to better sync with multiple services ([@areeb](https://github.com/areebahmeddd))
- [ ] Develop ETL pipeline for additional file types ([@hamad](https://github.com/therealhamad))
- [ ] Write Pytest tests ([@avantika](https://github.com/avii09))
- [ ] Write Cypress tests ([@shivansh](https://github.com/SpaceTesla))
- [ ] Configure NGINX ([@areeb](https://github.com/areebahmeddd))
- [ ] Set up CI workflow to generate dynamic docs on merges to `testing` branch ([@bhavana](https://github.com/bhaaaav))
- [ ] Add Redis caching for search results ([@avantika](https://github.com/avii09))
## Project Preview
- Landing Page
- Home Page
- Your Cases Page
- Create Case Modal
- UFDR Artifact Inspector
- Search Results Page
- Timeline Analysis Page
- Network Correlation Page
- Case Summary Page
## Setup for Development
1. Clone the repo:
```bash
git clone https://github.com/areebahmeddd/cognito.ai.git
cd cognito.ai
```
2. Start Elasticsearch (single-node) via Docker Compose:
```bash
docker compose up -d
```
3. Configure environment variables:
Create a `.env` file in `backend/`:
```
ELASTICSEARCH_URL=http://localhost:9200
ELASTICSEARCH_INDEX=cognito
JWT_SECRET_KEY=
JWT_ALGORITHM=
JWT_EXPIRE_MINUTES=
MONGODB_CONNECTION_STRING=mongodb://localhost:27017/cognito
GEMINI_API_KEY=
```
Create a `.env` file in `frontend/`:
```
NEXTAUTH_URL=http://localhost:3000
NEXTAUTH_SECRET=
NEXT_PUBLIC_API_URL=http://127.0.0.1:8000/api/v1
```
## Backend (FastAPI)
Install dependencies and run the API:
```bash
cd backend
uv sync
uv run uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
```
Swagger UI: `http://localhost:8000/docs`
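As a rough illustration of what the NLQ layer produces, the sketch below turns a structured intent into an Elasticsearch query body. It does not call Gemini and is not the project's planner: the `Intent` class and the field names `text`, `category`, and `timestamp` are assumptions made for this example. Only the `bool`, `match`, `term`, and `range` clauses are standard Elasticsearch Query DSL.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Intent:
    """Hypothetical structured intent that an LLM planner might emit."""
    keywords: str
    category: Optional[str] = None
    start: Optional[str] = None  # ISO 8601 lower bound, inclusive
    end: Optional[str] = None    # ISO 8601 upper bound, inclusive


def build_query(intent: Intent) -> dict:
    """Translate an intent into an Elasticsearch bool query body."""
    filters = []
    if intent.category:
        # Exact-match filter; assumes `category` is mapped as a keyword field.
        filters.append({"term": {"category": intent.category}})
    time_range = {}
    if intent.start:
        time_range["gte"] = intent.start
    if intent.end:
        time_range["lte"] = intent.end
    if time_range:
        filters.append({"range": {"timestamp": time_range}})
    return {
        "query": {
            "bool": {
                # Scored full-text match on the (assumed) `text` field.
                "must": [{"match": {"text": intent.keywords}}],
                # Filters narrow the result set without affecting relevance scores.
                "filter": filters,
            }
        }
    }


query = build_query(Intent(keywords="wire transfer", category="message", start="2025-01-01"))
```

Having the model emit a small, validated intent and building the DSL in code is one way to keep malformed queries from reaching the cluster. Whether the project does this or lets the model emit DSL directly is not stated here.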
## Frontend (Next.js)
```bash
cd frontend
npm clean-install
npm run dev
```
## Scripts
### 1. Nuke Infra (Fresh Start)
Wipes the Elasticsearch index (and its wildcard matches), then the MongoDB database.
- Uses these environment variables, with the following defaults:
  - `ELASTICSEARCH_URL=http://localhost:9200`
  - `ELASTICSEARCH_INDEX=cognito`
  - `MONGODB_CONNECTION_STRING=mongodb://localhost:27017/cognito`
```bash
python scripts/nuke_infra.py
```
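Because this script is destructive, it helps to know which targets it will resolve before running it. The following is a minimal sketch of environment lookup with the defaults listed above. The helper names `resolve_settings` and `mongo_database_name` are hypothetical, and the actual script may resolve its settings differently.

```python
import os
from urllib.parse import urlparse

# Defaults as documented for the nuke script.
DEFAULTS = {
    "ELASTICSEARCH_URL": "http://localhost:9200",
    "ELASTICSEARCH_INDEX": "cognito",
    "MONGODB_CONNECTION_STRING": "mongodb://localhost:27017/cognito",
}


def resolve_settings(env=None) -> dict:
    """Return each setting from the environment, falling back to its default."""
    source = os.environ if env is None else env
    # An empty value is treated as unset, so a blank .env entry does not win.
    return {name: source.get(name) or default for name, default in DEFAULTS.items()}


def mongo_database_name(connection_string: str) -> str:
    """Extract the database name from the path part of a MongoDB URI."""
    return urlparse(connection_string).path.lstrip("/")


# Passing a mapping instead of os.environ keeps the example reproducible.
settings = resolve_settings({"ELASTICSEARCH_INDEX": "cognito-dev"})
print(settings["ELASTICSEARCH_INDEX"])                             # cognito-dev
print(mongo_database_name(settings["MONGODB_CONNECTION_STRING"]))  # cognito
```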
### 2. Generate Mock UFDR ZIPs (for testing)
Creates synthetic UFDR-like TSV bundles as ZIPs at the project root: `Test_UFDR-1.zip`, `Test_UFDR-2.zip`, `Test_UFDR-3.zip`.
```bash
python scripts/mock_zip.py
```
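To show what a "UFDR-like TSV bundle" could look like, here is a small in-memory sketch that writes one TSV file into a ZIP and reads it back as JSON-ready dictionaries. It is not `scripts/mock_zip.py`: the member name `messages.tsv` and the columns `sender`, `receiver`, and `body` are made up for illustration, and real UFDR exports have a richer layout.

```python
import csv
import io
import zipfile

MEMBER = "messages.tsv"  # hypothetical file name inside the bundle


def build_mock_bundle(rows: list) -> bytes:
    """Pack rows into a ZIP containing a single tab-separated file."""
    tsv = io.StringIO()
    writer = csv.DictWriter(tsv, fieldnames=list(rows[0]),
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(MEMBER, tsv.getvalue())
    return buffer.getvalue()


def read_bundle(data: bytes) -> list:
    """Read the TSV back as a list of dicts, one per row."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        text = archive.read(MEMBER).decode("utf-8")
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


rows = [
    {"sender": "alice", "receiver": "bob", "body": "hello"},
    {"sender": "bob", "receiver": "alice", "body": "call me"},
]
restored = read_bundle(build_mock_bundle(rows))
```

The round trip mirrors the TSV-to-JSON step listed under the completed milestones: each row becomes a dictionary that can be serialized or indexed.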
## License
This project is licensed under the [MIT License](LICENSE).
## Authors
- [Areeb Ahmed](https://github.com/areebahmeddd)
- [Hamad Hussain](https://github.com/therealhamad)
- [Shivansh Karan](https://github.com/SpaceTesla)
- [Anish Varma](https://github.com/Av7danger)
- [Avantika Kesarwani](https://github.com/avii09)
- [Bhavana Subramani](https://github.com/bhaaaav)