https://github.com/lostluffyz/irtiqa-intelligence

AI-powered lead intelligence platform for technographic analysis, intent detection, and personalized outreach generation.

# Irtiqa Intelligence

[![CI](https://github.com/Luffyz/irtiqa-intelligence/actions/workflows/ci.yml/badge.svg)](https://github.com/Luffyz/irtiqa-intelligence/actions/workflows/ci.yml)
[![Tests](https://img.shields.io/badge/tests-633%20passing-success)](https://github.com/Luffyz/irtiqa-intelligence)
[![Python](https://img.shields.io/badge/python-3.11%2B-blue)](https://www.python.org/)
[![FastAPI](https://img.shields.io/badge/FastAPI-0.115%2B-009688)](https://fastapi.tiangolo.com/)

**Production-grade lead intelligence platform for B2B sales teams.**

Irtiqa Intelligence automates lead discovery, enrichment, and prioritization through intelligent web scraping, technographic analysis, intent signal detection, and personalized outreach generation. Built with FastAPI, SQLAlchemy, and a modular agent-based architecture.

---

## What is Irtiqa?

Irtiqa transforms raw company data into actionable sales intelligence through a **multi-agent pipeline**:

1. **Discover** companies matching your ideal customer profile (ICP)
2. **Scrape** and analyze company websites for technology signals
3. **Detect** buying intent from hiring, funding, and technology changes
4. **Score** leads using multi-factor intelligence scoring
5. **Personalize** outreach messages based on company intelligence

**Result:** Prioritized, scored leads with context-aware outreach recommendations.
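The stages above could be chained as plain functions over a shared lead record. The following is a minimal sketch under stated assumptions, not the project's agent code: the `Lead` dataclass, the stage functions, the stubbed values, and the toy scoring rule are all hypothetical, and discovery is skipped by passing the domain in directly.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Lead:
    """Hypothetical record passed between stages (not the project's model)."""
    domain: str
    technologies: list[str] = field(default_factory=list)
    intent_signals: list[str] = field(default_factory=list)
    score: float = 0.0
    message: str = ""


def scrape(lead: Lead) -> Lead:
    # Stub: a real stage would fetch and parse the company website.
    lead.technologies = ["fastapi", "postgresql"]
    return lead


def detect_intent(lead: Lead) -> Lead:
    # Stub: a real stage would look for hiring, funding, or tech changes.
    lead.intent_signals = ["hiring"]
    return lead


def score(lead: Lead) -> Lead:
    # Toy scoring rule, assumed for illustration only.
    raw = 0.2 * len(lead.technologies) + 0.3 * len(lead.intent_signals)
    lead.score = min(1.0, raw)
    return lead


def personalize(lead: Lead) -> Lead:
    signals = ", ".join(lead.intent_signals) or "no signals"
    lead.message = f"Hi {lead.domain} team, we noticed: {signals}."
    return lead


STAGES: list[Callable[[Lead], Lead]] = [scrape, detect_intent, score, personalize]


def run_pipeline(domain: str) -> Lead:
    """Run every stage in order, feeding each stage's output to the next."""
    lead = Lead(domain=domain)
    for stage in STAGES:
        lead = stage(lead)
    return lead
```

Keeping the stages in a list makes the order explicit and lets a workflow skip or reorder steps without touching the stage functions themselves.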

---

## Architecture

```mermaid
graph TD
subgraph Client["Client Layer"]
UI[Web UI / API Client]
end

subgraph API["API Layer (FastAPI)"]
Routes[REST Endpoints<br/>70+ endpoints]
Auth[JWT Authentication<br/>Multi-Tenancy]
end

subgraph Service["Service Layer"]
Services[Business Logic<br/>Transaction Boundaries]
end

subgraph Intelligence["Intelligence Layer"]
Workflows[3 Workflows<br/>Orchestration]
Agents[6 Agents<br/>Specialized Intelligence]
end

subgraph Data["Data Layer"]
Repos[Repositories<br/>Tenant Filtering]
ORM[SQLAlchemy ORM<br/>19 Tables]
end

subgraph Storage["Storage Layer"]
SQLite[(SQLite Dev)]
Postgres[(PostgreSQL Prod)]
end

subgraph Jobs["Background Jobs"]
Scheduler[Job Scheduler]
Runner[Job Runner]
end

UI -->|HTTP + JWT| Routes
Routes -->|Verify & Inject| Auth
Auth -->|Call Methods| Services
Services -->|Orchestrate| Workflows
Services -->|Execute| Agents
Workflows -->|Chain| Agents
Agents -->|Query/Persist| Repos
Services -->|Query/Persist| Repos
Repos -->|Map Entities| ORM
ORM -->|Connect| SQLite
ORM -->|Connect| Postgres

Services -.->|Schedule Async| Scheduler
Scheduler -->|Poll & Execute| Runner
Runner -->|Invoke| Workflows
Runner -->|Invoke| Agents

style API fill:#e1f5ff
style Service fill:#fff4e1
style Intelligence fill:#ffe1e1
style Data fill:#f0e1ff
style Storage fill:#e1ffe1
style Jobs fill:#ffe1f5
```

**[📖 Complete Architecture Guide](docs/architecture_overview.md)**

---

## Features

### 🔐 Authentication & Multi-Tenancy
- **RS256 JWT Authentication** with JWKS endpoint for secure API access
- **Email Verification** and password reset workflows
- **Organization Management** with role-based access control (Owner, Admin, Member, Viewer)
- **Tenant Isolation** across all data and API endpoints
- **Rate Limiting** with database-backed tracking
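Role-based access control combined with tenant isolation can be pictured as two checks: the resource must belong to the caller's organization, and the caller's role must be high enough for the action. This is a minimal sketch, assuming a simple role ranking; the four role names come from the list above, while the numeric ordering, the action names, and `can_access` are hypothetical.

```python
from enum import IntEnum


class Role(IntEnum):
    """Roles named in the README; the numeric ranking is an assumption."""
    VIEWER = 1
    MEMBER = 2
    ADMIN = 3
    OWNER = 4


# Hypothetical action names and minimum roles, for illustration only.
REQUIRED_ROLE = {
    "read_leads": Role.VIEWER,
    "edit_leads": Role.MEMBER,
    "manage_members": Role.ADMIN,
    "delete_organization": Role.OWNER,
}


def can_access(user_org_id: str, resource_org_id: str, role: Role, action: str) -> bool:
    """Allow an action only inside the user's own organization and only
    when the user's role meets the minimum required for that action."""
    if user_org_id != resource_org_id:
        return False  # tenant isolation: never cross organizations
    required = REQUIRED_ROLE.get(action)
    if required is None:
        return False  # unknown actions are denied by default
    return role >= required
```

Checking the organization first means a high-privilege role in one tenant never grants anything in another.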

**[📖 Authentication Design](docs/authentication_multitenancy_v2_design.md)**

### 🔍 Lead Discovery Engine

```mermaid
flowchart LR
ICP[ICP Search<br/>Industry + Size + Tech] --> Discovery[Discovery Agent]

Discovery -->|Query| EDGAR[SEC EDGAR<br/>US Companies]
Discovery -->|Query| News[Google News RSS<br/>Funding Signals]
Discovery -->|Query| OC[OpenCorporates<br/>Global Registry]

EDGAR --> Dedupe[Deduplication<br/>Domain Matching]
News --> Dedupe
OC --> Dedupe

Dedupe --> Score[Discovery Score<br/>0.0-1.0]
Score --> Companies[(Companies<br/>needs_review)]

Companies -.->|Manual Trigger| Pipeline[Intelligence Pipeline]

style Discovery fill:#e1f5ff
style Dedupe fill:#fff4e1
style Score fill:#ffe1e1
style Companies fill:#e1ffe1
```

- **ICP Search Management**: Define and save ideal customer profile criteria
- **Multi-Source Discovery**: Automated searches across SEC EDGAR, Google News RSS, and OpenCorporates
- **Smart Deduplication**: Domain-based duplicate detection with fuzzy matching
- **Discovery Scoring**: Lightweight match quality scores (0.0-1.0) for prioritization
- **Evidence Provenance**: Full audit trail of discovery sources
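Domain-based deduplication with a fuzzy fallback might look roughly like the sketch below. It is an illustration under stated assumptions rather than the project's implementation: the function names, the candidate dictionary shape (`name`, `domain`), and the 0.9 similarity threshold are all hypothetical, and `difflib.SequenceMatcher` stands in for whatever fuzzy matcher the project actually uses.

```python
from difflib import SequenceMatcher
from urllib.parse import urlparse


def normalize_domain(url_or_domain: str) -> str:
    """Reduce a URL or bare domain to a lowercase host without 'www.'."""
    value = url_or_domain.strip().lower()
    if "://" not in value:
        value = "//" + value  # lets urlparse treat the value as a host
    host = urlparse(value).hostname or ""
    return host[4:] if host.startswith("www.") else host


def is_duplicate(name_a: str, name_b: str, threshold: float = 0.9) -> bool:
    """Fuzzy name comparison; the 0.9 threshold is an assumed default."""
    ratio = SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()
    return ratio >= threshold


def deduplicate(candidates: list[dict]) -> list[dict]:
    """Keep the first candidate per normalized domain, then drop
    domain-less candidates whose name closely matches a kept one."""
    kept: list[dict] = []
    seen_domains: set[str] = set()
    for candidate in candidates:
        domain = normalize_domain(candidate.get("domain", ""))
        if domain:
            if domain in seen_domains:
                continue
            seen_domains.add(domain)
        elif any(is_duplicate(candidate["name"], k["name"]) for k in kept):
            continue
        kept.append(candidate)
    return kept
```

Exact domain matching is tried first because it is cheap and unambiguous; name similarity is only consulted when a source returns no domain at all.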

**[📖 Discovery Engine Design](docs/lead_discovery_engine_final.md)**

### 🤖 Intelligence Pipeline

```mermaid
flowchart LR
Input[Company Domain] --> DS[Deep Scraper<br/>Web Extraction]
DS -->|HTML + Text| Tech[Technographic<br/>Tech Detection]
Tech -->|40+ Signatures| Intent[Intent Signal<br/>Buying Signals]
Intent -->|8 Signal Types| Score[Intelligence Scoring<br/>Multi-Factor]
Score -->|Weighted Score| Person[Personalization<br/>Outreach Generation]
Person --> Output[Scored Lead<br/>+ Messages]

style DS fill:#e1f5ff
style Tech fill:#fff4e1
style Intent fill:#ffe1e1
style Score fill:#f0e1ff
style Person fill:#e1ffe1
```
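The technographic step in the diagram above matches scraped page content against known signatures. A rough sketch of that idea follows; the four signatures, their categories, and `detect_technologies` are illustrative assumptions, since the project's real signature set is not shown in this README.

```python
import re

# Illustrative signatures only (assumed, not the project's actual set).
# Each entry maps a technology name to (category, compiled pattern).
SIGNATURES: dict[str, tuple[str, re.Pattern[str]]] = {
    "Google Analytics": (
        "analytics",
        re.compile(r"googletagmanager\.com|google-analytics\.com", re.I),
    ),
    "WordPress": ("cms", re.compile(r"/wp-content/|/wp-includes/", re.I)),
    "React": ("frontend", re.compile(r"data-reactroot|react-dom", re.I)),
    "Stripe": ("payments", re.compile(r"js\.stripe\.com", re.I)),
}


def detect_technologies(html: str) -> list[dict[str, str]]:
    """Return one record per signature whose pattern appears in the HTML."""
    found = []
    for name, (category, pattern) in SIGNATURES.items():
        match = pattern.search(html)
        if match:
            found.append(
                {"name": name, "category": category, "evidence": match.group(0)}
            )
    return found
```

Returning the matched substring as `evidence` keeps each detection traceable to the text that triggered it.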

**6 Production Agents:**
1. **Deep Scraper Agent**: Web content extraction and parsing
2. **Technographic Agent**: Technology detection (40+ signatures across 8 categories)
3. **Intent Signal Agent**: Buying signal detection (8 signal families with deterministic rules)
4. **Intelligence Scoring Agent**: Multi-factor lead scoring (fit, intent, technographic, engagement)
5. **Personalization Agent**: Multi-variant outreach message generation
6. **Discovery Agent**: ICP-based company discovery from external sources
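Multi-factor scoring over the four factors named for the Intelligence Scoring Agent can be sketched as a weighted sum. The weights below, the 0.0-1.0 range for each factor, and the function name are assumptions made for illustration; the README does not state the actual formula.

```python
# Assumed weights for the four factors named above; they sum to 1.0.
WEIGHTS = {"fit": 0.35, "intent": 0.30, "technographic": 0.20, "engagement": 0.15}


def intelligence_score(factors: dict[str, float]) -> float:
    """Weighted sum of factor scores, each clamped to the 0.0-1.0 range.
    Missing factors count as 0.0."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = min(1.0, max(0.0, factors.get(name, 0.0)))
        total += weight * value
    return round(total, 4)
```

Because the rule is a pure function of its inputs, recomputing a score from the same factors always yields the same result, which fits the deterministic score refresh described later in this README.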

**[📖 Agent System](docs/agents.md)** | **[📖 Workflow System](docs/workflows.md)**

### 📊 Lead Retrieval API
- **Aggregated Intelligence**: Single endpoint returns companies with technologies, intent signals, scores, and outreach messages
- **Smart Filtering**: Filter by minimum score, with pagination support
- **Tenant-Scoped**: Automatic organization isolation
- **N+1 Prevention**: Batch loading strategy for optimal performance
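The batch loading idea behind N+1 prevention is to fetch a page of parent rows, then load all related rows in one additional query and group them in memory. The sketch below uses the standard-library `sqlite3` module for brevity rather than SQLAlchemy; the table names follow the schema in this README, but the column names and the function itself are assumptions.

```python
import sqlite3
from collections import defaultdict


def load_companies_with_technologies(
    conn: sqlite3.Connection, org_id: int, limit: int = 50
) -> list[dict]:
    """Fetch one page of companies, then all of their technologies in a
    single second query (two queries total instead of one per company)."""
    companies = conn.execute(
        "SELECT id, name FROM companies "
        "WHERE organization_id = ? ORDER BY id LIMIT ?",
        (org_id, limit),
    ).fetchall()
    if not companies:
        return []

    # One IN (...) query covers every company on the page.
    ids = [row[0] for row in companies]
    placeholders = ",".join("?" for _ in ids)
    tech_rows = conn.execute(
        "SELECT company_id, name FROM technologies "
        f"WHERE company_id IN ({placeholders}) ORDER BY id",
        ids,
    ).fetchall()

    # Group child rows by parent id in memory.
    by_company: dict[int, list[str]] = defaultdict(list)
    for company_id, tech_name in tech_rows:
        by_company[company_id].append(tech_name)

    return [
        {"id": cid, "name": name, "technologies": by_company[cid]}
        for cid, name in companies
    ]
```

The same pattern extends to intent signals, scores, and outreach messages: one extra query per related table, regardless of how many companies are on the page.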

### ⚙️ Background Job System
- **Async Execution**: Agent and workflow scheduling with status tracking
- **Retry Policies**: Exponential backoff with configurable limits
- **Job Management**: Schedule, cancel, retry, and monitor background tasks
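A retry policy with exponential backoff usually doubles the wait after each failed attempt, up to a cap. The helper below is a minimal sketch of that calculation; the base delay, cap, attempt limit, and function names are assumed defaults, not values taken from the project.

```python
import random


def backoff_delay(
    attempt: int, base: float = 2.0, cap: float = 300.0, jitter: float = 0.0
) -> float:
    """Delay in seconds before retry number `attempt` (1-based):
    base * 2**(attempt - 1), capped, plus optional random jitter."""
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    delay = min(cap, base * 2 ** (attempt - 1))
    return delay + random.uniform(0.0, jitter)


def should_retry(attempt: int, max_attempts: int = 5) -> bool:
    """True while the job has attempts left under the configured limit."""
    return attempt < max_attempts
```

A non-zero `jitter` spreads retries out so that many jobs failing at the same moment do not all retry at the same moment.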

**[📖 Background Jobs Design](docs/background_job_foundation_design.md)**

### 📈 Evidence Records
- **Provenance Tracking**: Full audit trail for all intelligence data
- **Source Linking**: Evidence tied to agent runs, URLs, and API responses
- **Confidence Scoring**: Evidence quality metrics

### 🔄 Workflow Orchestration
- **Score Refresh**: Deterministic intelligence score recomputation
- **Intelligence Pipeline**: End-to-end enrichment (scrape → analyze → score → personalize)
- **Discovery Pipeline**: Company discovery orchestration (search → discover → deduplicate → create)

---

## Database Schema

```mermaid
erDiagram
organizations ||--o{ companies : owns
organizations ||--o{ contacts : owns
companies ||--o{ websites : has
companies ||--o{ technologies : uses
companies ||--o{ intent_signals : emits
companies ||--o{ intelligence_scores : receives
companies ||--o{ outreach_messages : targeted_by

agent_runs ||--o{ technologies : detects
agent_runs ||--o{ intent_signals : finds
agent_runs ||--o{ intelligence_scores : computes
agent_runs ||--o{ outreach_messages : generates
agent_runs ||--o{ evidence_records : produces

discovery_searches ||--o{ discovery_runs : executes
discovery_searches ||--o{ companies : discovers

jobs ||--o{ agent_runs : triggers
```

**19 Tables:**
- 8 core intelligence tables (companies, contacts, websites, technologies, intent_signals, intelligence_scores, outreach_messages, evidence_records)
- 2 system tables (agent_runs, jobs)
- 2 discovery tables (discovery_searches, discovery_runs)
- 3 auth tables (users, organizations, memberships)
- 4 token tables (refresh_tokens, email_verification_tokens, password_reset_tokens, failed_login_attempts)

**[📖 Database Design](docs/database.md)** | **[📖 Entity Relationships](docs/entity_relationships.md)**

---

## Project Structure

```
irtiqa-intelligence/
├── app/
│   ├── agents/                   # 6 production agents
│   │   ├── deep_scraper/         # Web scraping & content extraction
│   │   ├── technographic/        # Technology detection (40+ signatures)
│   │   ├── intent_signal/        # Buying signal detection (8 families)
│   │   ├── intelligence_scoring/ # Lead scoring
│   │   ├── personalization/      # Outreach generation
│   │   └── discovery/            # Company discovery (3 sources)
│   ├── api/                      # REST API endpoints (70+)
│   ├── core/                     # Configuration, logging, errors
│   ├── database/                 # Engine, session management
│   ├── jobs/                     # Background job system
│   ├── models/                   # SQLAlchemy ORM models (19 tables)
│   ├── repositories/             # Data access layer (15 repositories)
│   ├── schemas/                  # Pydantic request/response schemas
│   ├── services/                 # Business logic layer (15 services)
│   └── workflows/                # Multi-agent orchestration (3 workflows)
├── database/
│   └── migrations/               # Alembic migration scripts (8 revisions)
├── docs/                         # Architecture & design documentation
├── tests/
│   ├── integration/              # End-to-end tests
│   └── unit/                     # Component tests
└── README.md
```

---

## Tech Stack

| Layer | Technology | Purpose |
|-------|-----------|---------|
| **Framework** | FastAPI 0.115+ | Async web framework with OpenAPI |
| **ORM** | SQLAlchemy 2.0 | Database abstraction & query building |
| **Migrations** | Alembic 1.18+ | Schema versioning & evolution |
| **Validation** | Pydantic v2 | Request/response schemas |
| **Database (Dev)** | SQLite 3.x | Local development with WAL mode |
| **Database (Prod)** | PostgreSQL 18+ | Production-grade relational database |
| **HTTP Client** | httpx | Async HTTP for external API calls |
| **Parsing** | BeautifulSoup4, feedparser | HTML & RSS feed parsing |
| **Testing** | pytest, pytest-asyncio | Test framework with async support |
| **CI/CD** | GitHub Actions | Automated testing & validation |

---

## Testing

```mermaid
flowchart LR
Push[Git Push/PR] --> CI[GitHub Actions]

CI --> Validate[Validation]
CI --> Test[Testing]

Validate --> Ruff[Ruff Lint<br/>Advisory]
Validate --> Mypy[Mypy Types<br/>Advisory]
Validate --> Compile[compileall<br/>BLOCKING]

Test --> Migrate[Alembic Upgrade<br/>BLOCKING]
Test --> Drift[Schema Drift Check<br/>BLOCKING]
Test --> SQLiteTests[SQLite Tests<br/>606 tests<br/>BLOCKING]
Test --> PGTests[PostgreSQL Tests<br/>27 tests<br/>BLOCKING]

Compile --> Result{All Pass?}
Migrate --> Result
Drift --> Result
SQLiteTests --> Result
PGTests --> Result

Result -->|Yes| Success[✓ CI Pass]
Result -->|No| Failure[✗ CI Fail]

style Success fill:#e1ffe1
style Failure fill:#ffe1e1
```

**633 tests** (606 on SQLite, 27 for PostgreSQL compatibility), with a **100% pass rate** on the main branch.

**Test Coverage:**
- Unit tests for agents, services, schemas, workflows
- Integration tests for API endpoints, repositories, pipelines
- Database tests for migrations, constraints, transactions
- PostgreSQL compatibility verification

---

## Development

### Installation

```bash
# Clone repository
git clone https://github.com/Luffyz/irtiqa-intelligence.git
cd irtiqa-intelligence

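# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate   # Linux/macOS
# .venv\Scripts\activate    # Windows: run this instead of the line above

# Install dependencies (quoted so shells such as zsh do not expand the brackets)
pip install -e ".[dev]"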

# For PostgreSQL support
pip install "psycopg[binary]>=3.2.0"

# Configure environment
cp .env.example .env
# Edit .env with your settings
```

### Run Migrations

```bash
# Apply database schema
python -m alembic upgrade head

# Check for schema drift
python -m alembic check
```

### Run Development Server

```bash
# Start FastAPI server
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

**API Documentation:**
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc

### Run Tests

```bash
# Execute full test suite
python -m pytest

# Run with coverage
python -m pytest --cov=app --cov-report=html

# Run specific test categories
python -m pytest tests/unit/
python -m pytest tests/integration/
```

---

## Documentation Map

### Getting Started
- **[README](README.md)**: This file (overview, setup, quick start)
- **[Architecture Overview](docs/architecture_overview.md)**: System layers, request lifecycle, patterns

### Core Systems
- **[Database Design](docs/database.md)**: Schema, tables, constraints, migrations
- **[Entity Relationships](docs/entity_relationships.md)**: FK relationships, cascade rules
- **[Agent System](docs/agents.md)**: All 6 agents, responsibilities, lifecycles
- **[Workflow System](docs/workflows.md)**: Workflow orchestration, implementations
- **[Background Jobs](docs/background_job_foundation_design.md)**: Async execution, retry policies

### Features
- **[Discovery Engine](docs/lead_discovery_engine_final.md)**: ICP search, external sources, deduplication
- **[Authentication](docs/authentication_multitenancy_v2_design.md)**: JWT, multi-tenancy, RBAC
- **[Evidence System](docs/evidence_records_system_design.md)**: Provenance tracking

### Specialized Documentation
- **[Agent Interface Design](docs/agent_interface_design.md)**: BaseAgent pattern details
- **[Deep Scraper Design](docs/deep_scraper_design.md)**: Web scraping architecture
- **[Technographic Agent Design](docs/technographic_agent_design.md)**: Technology detection
- **[Intent Signal Agent Design](docs/intent_signal_agent_design.md)**: Buying signal rules
- **[Personalization Agent Design](docs/personalization_agent_design.md)**: Outreach generation

---

## Roadmap

### ✅ Current Status: Backend Complete

The backend is production-ready with all planned features implemented:
- ✅ Authentication & multi-tenancy
- ✅ Lead discovery engine
- ✅ Intelligence pipeline (6 agents)
- ✅ Workflow orchestration
- ✅ Background job system
- ✅ REST API (70+ endpoints)
- ✅ 633 automated tests

### 🎯 Next Milestones

**Phase 1: Frontend Development**
- React/Vue.js web application
- ICP search builder UI
- Discovery run monitoring dashboard
- Lead review & enrichment interface
- Intelligence score visualization

**Phase 2: Production Deployment**
- Docker containerization
- PostgreSQL database migration
- Kubernetes/cloud deployment manifests
- CI/CD pipeline for releases
- Monitoring & observability (Grafana, Prometheus)

**Phase 3: Advanced Features**
- Scheduled discovery runs (daily/weekly ICP searches)
- ML-based lead scoring models
- CRM integrations (Salesforce, HubSpot)
- Email automation & outreach tracking
- Advanced analytics & reporting

---

## Contributing

Contributions are welcome! Please follow these guidelines:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes with tests
4. Ensure all tests pass (`python -m pytest`)
5. Check for schema drift (`python -m alembic check`)
6. Commit with descriptive messages
7. Push to your fork and submit a pull request

---

## License

This project is proprietary software. All rights reserved.

---

## Acknowledgments

Built with:
- [FastAPI](https://fastapi.tiangolo.com/): Modern Python web framework
- [SQLAlchemy](https://www.sqlalchemy.org/): Python SQL toolkit
- [Alembic](https://alembic.sqlalchemy.org/): Database migrations
- [Pydantic](https://docs.pydantic.dev/): Data validation
- [pytest](https://pytest.org/): Testing framework

---

**Production-Ready Backend · 633 Tests · 19 Database Tables · 6 Intelligence Agents · 70+ API Endpoints**