{"id":50884179,"url":"https://github.com/8infinitecloud/aws-ug-databases-for-ia","last_synced_at":"2026-06-15T15:30:37.402Z","repository":{"id":360125951,"uuid":"1248357108","full_name":"8infinitecloud/aws-ug-databases-for-ia","owner":"8infinitecloud","description":null,"archived":false,"fork":false,"pushed_at":"2026-05-25T06:28:20.000Z","size":11886,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-25T06:41:54.463Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/8infinitecloud.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-24T14:37:41.000Z","updated_at":"2026-05-25T06:28:24.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/8infinitecloud/aws-ug-databases-for-ia","commit_stats":null,"previous_names":["8infinitecloud/aws-ug-databases-for-ia"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/8infinitecloud/aws-ug-databases-for-ia","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/8infinitecloud%2Faws-ug-databases-for-ia","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/8infinitecloud%2Faws-ug-databases-for-ia/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/8infinitecloud%2Faws-ug-databases-for-ia
/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/8infinitecloud%2Faws-ug-databases-for-ia/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/8infinitecloud","download_url":"https://codeload.github.com/8infinitecloud/aws-ug-databases-for-ia/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/8infinitecloud%2Faws-ug-databases-for-ia/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34369836,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-15T02:00:07.085Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-06-15T15:30:36.224Z","updated_at":"2026-06-15T15:30:37.391Z","avatar_url":"https://github.com/8infinitecloud.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Financial Compliance Assistant — RAG + Memory Lab\n\n\u003e **AWS Community Day Lab** | Embeddings, RAG, and Memory in LLM Systems  \n\u003e Stack: Amazon Bedrock · Titan Embeddings · Aurora pgvector · DynamoDB · ElastiCache Redis · LangChain · Streamlit\n\n## Case Study\n\nA financial services company needs an intelligent assistant that answers questions about its internal processes for **compliance**, **client onboarding**, and **internal policies**. 
The assistant must remember user preferences across sessions and learn from past conversations.\n\nThis lab visualizes the three memory layers and the full RAG pipeline in real time.\n\n---\n\n## Architecture\n\n```\n┌─────────────────────────────────────────────────────────────────────────┐\n│                           USER (Streamlit UI)                           │\n└──────────────────────────────────┬──────────────────────────────────────┘\n                                   │ question\n                    ┌──────────────▼──────────────┐\n                    │      LangChain RAG Chain     │\n                    │  (query/chain.py)            │\n                    └──┬──────────┬──────────┬────┘\n                       │          │          │\n          ┌────────────▼──┐  ┌────▼─────┐  ┌▼───────────────┐\n          │  LAYER 1:     │  │ LAYER 2: │  │  LAYER 3:      │\n          │  Session      │  │  User    │  │  Semantic      │\n          │  Memory       │  │  Memory  │  │  Memory        │\n          │               │  │          │  │                │\n          │ ElastiCache   │  │ DynamoDB │  │ Aurora         │\n          │ Redis         │  │          │  │ pgvector       │\n          │ (last N       │  │(prefs,   │  │(embeddings of  │\n          │  messages)    │  │ history  │  │ past           │\n          │               │  │ summary) │  │ conversations) │\n          └───────────────┘  └──────────┘  └────────────────┘\n                                   │\n                    ┌──────────────▼──────────────┐\n                    │     VECTOR STORE (RAG)       │\n                    │   Aurora PostgreSQL           │\n                    │   + pgvector extension       │\n                    │                              │\n                    │  Similarity Search →         │\n                    │  Top-K chunks + scores       │\n                    └──────────────┬───────────────┘\n                                   │\n           
         ┌──────────────▼──────────────┐\n                    │      AMAZON BEDROCK          │\n                    │                             │\n                    │  Titan Embeddings V2        │\n                    │  (embed query + docs)       │\n                    │                             │\n                    │  Claude 3.5 Sonnet          │\n                    │  (final generation)         │\n                    └─────────────────────────────┘\n```\n\n### Ingestion Flow (one time, pre-demo)\n\n```\nS3 (PDFs/Markdown)\n    │\n    ▼\nDocumentLoader (LangChain)\n    │\n    ▼\nRecursiveCharacterTextSplitter\n  chunk_size=1000, overlap=200\n    │\n    ▼\nAmazon Titan Embeddings V2\n  (1024 dimensions)\n    │\n    ▼\nAurora pgvector\n  (table: document_chunks)\n```\n\n---\n\n## AWS Services Used\n\n| Service | Role | Documentation |\n|----------|-----|---------------|\n| [Amazon Bedrock](https://docs.aws.amazon.com/bedrock/) | LLM (Claude) + Embeddings (Titan) | [Bedrock Docs](https://docs.aws.amazon.com/bedrock/latest/userguide/) |\n| [Aurora PostgreSQL Serverless v2](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-serverless-v2.html) | Vector store with pgvector | [pgvector extension](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/PostgreSQL_pg_vector.html) |\n| [Amazon ElastiCache for Redis](https://docs.aws.amazon.com/elasticache/) | Session memory (short TTL) | [ElastiCache Docs](https://docs.aws.amazon.com/elasticache/latest/red-ug/) |\n| [Amazon DynamoDB](https://docs.aws.amazon.com/dynamodb/) | Persistent user memory | [DynamoDB Docs](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/) |\n| [Amazon S3](https://docs.aws.amazon.com/s3/) | Source document storage | [S3 Docs](https://docs.aws.amazon.com/AmazonS3/latest/userguide/) |\n| [AWS IAM](https://docs.aws.amazon.com/iam/) | Roles and permissions | [IAM Docs](https://docs.aws.amazon.com/IAM/latest/UserGuide/) 
|\n\n---\n\n## Prerequisites\n\n- AWS account with access to `us-east-1` (or `us-west-2`)\n- [AWS CLI v2](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) configured (`aws configure`)\n- Python 3.11+\n- Model access enabled in Amazon Bedrock for:\n  - `amazon.titan-embed-text-v2:0`\n  - `anthropic.claude-3-5-sonnet-20241022-v2:0`\n- IAM permissions: `AmazonBedrockFullAccess`, `AmazonDynamoDBFullAccess`, `AmazonElastiCacheFullAccess`, `AmazonRDSFullAccess`, `AmazonS3FullAccess`\n\n### Enable models in Bedrock\n\n```bash\n# Check that the model is available in the region (us-east-1)\naws bedrock list-foundation-models \\\n  --by-provider amazon \\\n  --query \"modelSummaries[?modelId=='amazon.titan-embed-text-v2:0']\" \\\n  --region us-east-1\n```\n\nIf you don't have access, go to **Amazon Bedrock Console → Model access → Request access**.\n\n---\n\n## Deployment\n\nThere are two ways to deploy the lab: **local scripts** (recommended for the first time) or **GitHub Actions** (automated CI/CD).\n\n---\n\n### Option A: Local scripts\n\n#### Step 1: Clone the repository\n\n```bash\ngit clone https://github.com/aws-samples/financial-compliance-rag-lab\ncd financial-compliance-rag-lab\n```\n\n#### Step 2: Create the infrastructure stack (~20 min)\n\n```bash\ncd infrastructure\nchmod +x setup.sh\n./setup.sh\n```\n\nThe script creates: VPC, Aurora PostgreSQL Serverless v2 with `pgvector`, ElastiCache Redis, DynamoDB, S3, EC2 App Server, and Secrets Manager.\n\n\u003e **Note for the lab:** Run this step 20 minutes before the live demo.\n\n#### Step 3: Deploy the application (~5 min)\n\n```bash\nchmod +x deploy-app.sh\n./deploy-app.sh\n\n# To re-ingest documents from scratch:\n./deploy-app.sh --reingest\n```\n\nThe script packages the code, uploads it to S3, and runs it on the EC2 instance via SSM Run Command. 
When it finishes, it prints the Streamlit URL.\n\n#### Step 4 (optional): Local development\n\nTo iterate on the code locally without deploying to EC2:\n\n```bash\npython -m venv .venv\nsource .venv/bin/activate   # Windows: .venv\\Scripts\\activate\npip install -r requirements.txt\ncp .env.example .env        # fill in with the stack outputs\npython scripts/init_db.py\npython scripts/ingest_documents.py --source data/\nstreamlit run app/app.py    # http://localhost:8501\n```\n\n---\n\n### Option B: GitHub Actions (CI/CD)\n\nPrerequisite: follow the steps in [`docs/github-actions-setup.md`](docs/github-actions-setup.md) to create the OIDC Provider and the IAM Role in AWS. This is done only once per account.\n\n#### Available workflows\n\n| Workflow | Trigger | What it does |\n|----------|---------|----------|\n| **CI — Validate** | Push / PR to `main` | Python lint, shellcheck, cfn-lint, corpus check |\n| **Deploy Infrastructure** | Manual | CloudFormation deploy → triggers Deploy App on completion |\n| **Deploy App** | Push to `main` (changes in `app/`, `data/`, etc.) 
| Packages code → S3 → SSM Run Command on EC2 |\n| **Cleanup** | Manual (requires typing `ELIMINAR`) | Empties S3, deletes DynamoDB and the full stack |\n\n#### Full flow\n\n```\nPR opened\n    └─► CI — Validate (automatic)\n\nPush to main (changes in app/ or data/)\n    └─► Deploy App (automatic, ~5 min)\n\nFirst time or infra change\n    └─► Deploy Infrastructure (manual, ~20 min)\n            └─► Deploy App (automatic on completion)\n\nAfter finishing the lab\n    └─► Cleanup (manual, type \"ELIMINAR\" to confirm)\n```\n\n---\n\n## Repository Structure\n\n```\nfinancial-compliance-rag-lab/\n├── README.md                    # This file\n├── CONTRIBUTING.md              # Contribution guide\n├── DECISIONS.md                 # Architecture Decision Records (ADRs)\n├── COST_ESTIMATE.md             # Cost estimate\n├── LICENSE                      # Apache 2.0\n├── .gitignore\n├── requirements.txt             # Main dependencies\n├── .env.example                 # Environment variable template\n│\n├── .github/workflows/           # GitHub Actions CI/CD\n│   ├── ci.yml                  # Validation on PRs and pushes to main\n│   ├── deploy-infra.yml        # Infrastructure deploy (manual)\n│   ├── deploy-app.yml          # App deploy (automatic on push)\n│   └── cleanup.yml             # Delete AWS resources (manual)\n│\n├── data/                        # Sample documents (corpus)\n│   ├── compliance/\n│   │   ├── aml_policy.md        # Anti-Money Laundering policy\n│   │   └── kyc_procedures.md   # KYC procedures\n│   ├── onboarding/\n│   │   ├── client_onboarding.md\n│   │   └── employee_onboarding.md\n│   └── policies/\n│       ├── data_privacy.md\n│       ├── risk_management.md\n│       └── acceptable_use.md\n│\n├── ingestion/                   # Ingestion pipeline\n│   ├── config.py               # Centralized configuration\n│   ├── loader.py               # Document loading\n│   ├── chunker.py        
      # Chunking strategy\n│   ├── embedder.py             # Embeddings with Titan V2\n│   ├── store.py                # Writes to pgvector\n│   └── pipeline.py             # Full orchestration\n│\n├── query/                       # Query pipeline (RAG)\n│   ├── retriever.py            # Cosine similarity search\n│   ├── memory.py               # Management of the 3 memory layers\n│   ├── prompt_builder.py       # Final prompt construction\n│   └── chain.py                # Full RAG orchestrator\n│\n├── app/\n│   └── app.py                  # Streamlit interface (3 columns)\n│\n├── infrastructure/              # AWS infrastructure\n│   ├── README.md\n│   ├── cloudformation.yaml     # Full stack (VPC, Aurora, Redis, EC2...)\n│   ├── setup.sh                # Provisions infrastructure (~20 min)\n│   ├── deploy-app.sh           # Deploys only the app (~5 min)\n│   └── cleanup.sh              # Deletes all resources\n│\n├── scripts/                     # Utility scripts\n│   ├── init_db.py              # Initialize pgvector schema\n│   └── ingest_documents.py     # Ingest documents into the vector store\n│\n└── docs/\n    └── github-actions-setup.md # OIDC + IAM Role setup for CI/CD\n```\n\n---\n\n## Cleanup: Avoid Costs\n\n\u003e **IMPORTANT:** Always delete the resources after the lab. 
Aurora + ElastiCache cost ~$15-25 USD/day if left running.\n\n**Local script:**\n```bash\ncd infrastructure \u0026\u0026 ./cleanup.sh\n```\n\n**GitHub Actions:** go to `Actions → Cleanup → Run workflow` and type `ELIMINAR` in the confirmation field.\n\nBoth options delete: the CloudFormation stack (VPC, Aurora, ElastiCache, EC2), the S3 bucket, the DynamoDB table, and the secrets.\n\nSee the detailed cost estimate in [COST_ESTIMATE.md](COST_ESTIMATE.md).\n\n---\n\n## Sample Questions for the Demo\n\n```\n\"What are the steps of the KYC process for corporate clients?\"\n\"What documents do I need to onboard a high-risk client?\"\n\"What is the data retention policy for transaction records?\"\n\"When is filing a Suspicious Activity Report (SAR) mandatory?\"\n\"What compliance training must new employees complete?\"\n```\n\n---\n\n## Contributing\n\nSee [CONTRIBUTING.md](CONTRIBUTING.md).\n\n## Security\n\nSee [CONTRIBUTING.md#security-issue-notifications](CONTRIBUTING.md#security-issue-notifications).\n\n## License\n\nThis project is licensed under Apache-2.0. See [LICENSE](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F8infinitecloud%2Faws-ug-databases-for-ia","html_url":"https://awesome.ecosyste.ms/projects/github.com%2F8infinitecloud%2Faws-ug-databases-for-ia","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F8infinitecloud%2Faws-ug-databases-for-ia/lists"}