https://github.com/jonesrussell/north-cloud
A full-stack content intelligence pipeline that crawls, classifies, and routes news articles in real time for downstream consumers.
content crawler publisher
Last synced: 5 months ago
- Host: GitHub
- URL: https://github.com/jonesrussell/north-cloud
- Owner: jonesrussell
- Created: 2025-12-13T09:38:30.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2026-01-11T00:24:37.000Z (5 months ago)
- Last Synced: 2026-01-11T04:29:32.611Z (5 months ago)
- Topics: content, crawler, publisher
- Language: Go
- Homepage: https://northcloud.biz
- Size: 8.19 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# North Cloud
A microservices-based content pipeline for crawling, classifying, and distributing news articles.
## Pipeline
```
┌─────────────────────────────────────────────────────────────────────────────┐
│                            NORTH CLOUD PIPELINE                             │
└─────────────────────────────────────────────────────────────────────────────┘
  ┌──────────┐      ┌───────────────┐      ┌────────────┐      ┌───────────┐
  │  Source  │      │               │      │            │      │           │
  │ Manager  │─────▶│    Crawler    │─────▶│ Classifier │─────▶│ Publisher │
  │          │      │               │      │            │      │           │
  └──────────┘      └───────────────┘      └────────────┘      └─────┬─────┘
       │                    │                    │                   │
       │                    ▼                    ▼                   ▼
       │            ┌───────────────────────────────────┐      ┌───────────┐
       │            │           ELASTICSEARCH           │      │   REDIS   │
       │            │  ┌─────────────┐ ┌─────────────┐  │      │  Pub/Sub  │
       │            │  │ raw_content │ │ classified  │  │      └─────┬─────┘
       │            │  │   indexes   │ │   indexes   │  │            │
       │            └──┴─────────────┴─┴─────────────┴──┘            │
       │                                                             │
       ▼                                                             ▼
  ┌──────────┐                                           ┌────────────────────┐
  │PostgreSQL│                                           │ EXTERNAL CONSUMERS │
  │ (5 DBs)  │                                           │  Drupal, Laravel,  │
  └──────────┘                                           │  Node.js, Python   │
                                                         └────────────────────┘
```
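To make the data flow in the diagram more concrete, here is a minimal Go sketch of the classify and publish steps as two plain functions. Everything in it is an illustrative assumption rather than code from this repository: the `Article` type and its fields, the keyword and word-count heuristics, and the `articles:<topic>` channel naming are all hypothetical. The real services exchange data through Elasticsearch indexes and Redis, not in-process calls.

```go
package main

import (
	"fmt"
	"strings"
)

// Article is a hypothetical record handed from one stage to the next.
type Article struct {
	URL          string
	Body         string
	Topic        string // filled in by classify
	QualityScore int    // filled in by classify, 0-100
}

// classify stands in for the classifier service. The keyword match and the
// word-count score are toy heuristics chosen only for illustration.
func classify(a Article) Article {
	a.Topic = "general"
	if strings.Contains(strings.ToLower(a.Body), "police") {
		a.Topic = "crime"
	}
	a.QualityScore = len(strings.Fields(a.Body))
	if a.QualityScore > 100 {
		a.QualityScore = 100
	}
	return a
}

// route stands in for the publisher: it picks a pub/sub channel name for an
// article, or returns "" when the article falls below the quality threshold.
// The "articles:<topic>" naming scheme is an assumption.
func route(a Article, minQuality int) string {
	if a.QualityScore < minQuality {
		return ""
	}
	return "articles:" + a.Topic
}

func main() {
	raw := Article{
		URL:  "https://example.com/story",
		Body: "Police respond to downtown incident overnight",
	}
	classified := classify(raw)
	// Prints: crime 6 articles:crime
	fmt.Println(classified.Topic, classified.QualityScore, route(classified, 5))
}
```

Keeping classification and routing as separate steps mirrors the diagram: the classifier only annotates content, and the publisher alone decides which channel, if any, receives it.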
## Services
| Service | Port | Description |
|---------|------|-------------|
| **crawler** | 8060 | Web crawler with interval-based job scheduling |
| **source-manager** | 8050 | Manage content sources and crawl configurations |
| **classifier** | 8071 | Classify content with quality scores and topics |
| **publisher** | 8070 | Route articles to Redis pub/sub channels |
| **index-manager** | 8090 | Elasticsearch index management |
| **search** | 8092 | Full-text search across classified content |
| **auth** | 8040 | JWT authentication |
| **dashboard** | 3002 | Unified management UI |
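
For scripts or tests that need to reach these services, the table can be captured as a small lookup. This is a sketch only: the port numbers come from the table above, while the `serviceURL` helper and the assumption that every service is reachable at `http://localhost:<port>` in the development environment are hypothetical (only the dashboard URL is confirmed by the Quick Start).

```go
package main

import "fmt"

// servicePorts mirrors the Services table.
var servicePorts = map[string]int{
	"crawler":        8060,
	"source-manager": 8050,
	"classifier":     8071,
	"publisher":      8070,
	"index-manager":  8090,
	"search":         8092,
	"auth":           8040,
	"dashboard":      3002,
}

// serviceURL builds a base URL for a named service, assuming it is exposed
// on localhost. It returns an error for names not listed in the table.
func serviceURL(name string) (string, error) {
	port, ok := servicePorts[name]
	if !ok {
		return "", fmt.Errorf("unknown service %q", name)
	}
	return fmt.Sprintf("http://localhost:%d", port), nil
}

func main() {
	u, err := serviceURL("dashboard")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(u) // http://localhost:3002
}
```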
## Quick Start
```bash
# 1. Clone and configure
git clone https://github.com/jonesrussell/north-cloud.git
cd north-cloud
cp .env.example .env
# 2. Start development environment
docker compose -f docker-compose.base.yml -f docker-compose.dev.yml up -d
# 3. Access the dashboard (`open` is macOS-only; on Linux use `xdg-open`)
open http://localhost:3002
```
## Development
```bash
# Start dev environment
docker compose -f docker-compose.base.yml -f docker-compose.dev.yml up -d
# View logs
docker compose -f docker-compose.base.yml -f docker-compose.dev.yml logs -f
# Stop
docker compose -f docker-compose.base.yml -f docker-compose.dev.yml down
```
## Production
```bash
# Build and start
docker compose -f docker-compose.base.yml -f docker-compose.prod.yml up -d --build
# Stop
docker compose -f docker-compose.base.yml -f docker-compose.prod.yml down
```
## Environment Variables
Key variables (see `.env.example` for the full list):
```bash
# Authentication (required)
AUTH_USERNAME=admin
AUTH_PASSWORD=your-password
# Docker Compose does not expand $(...) in .env files: run the command and paste its output
AUTH_JWT_SECRET=$(openssl rand -hex 32)
# Debug mode
APP_DEBUG=true # false in production
```
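The `openssl rand -hex 32` command shown for `AUTH_JWT_SECRET` produces 32 random bytes encoded as 64 hexadecimal characters. Where `openssl` is not available, an equivalent value can be generated with Go's standard library. The sketch below is an illustration, not code from this repository: `newJWTSecret` and `checkAuthEnv` are hypothetical helpers, and treating all three `AUTH_*` variables as mandatory is an assumption based on the "required" comment above.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"os"
)

// newJWTSecret returns 32 cryptographically random bytes as a 64-character
// hex string, the same shape of value that `openssl rand -hex 32` prints.
func newJWTSecret() (string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return hex.EncodeToString(buf), nil
}

// checkAuthEnv returns the names of required auth variables that are unset
// or empty. The lookup function is injected so the check can be tested
// without touching the real environment; pass os.Getenv in normal use.
func checkAuthEnv(getenv func(string) string) []string {
	var missing []string
	for _, key := range []string{"AUTH_USERNAME", "AUTH_PASSWORD", "AUTH_JWT_SECRET"} {
		if getenv(key) == "" {
			missing = append(missing, key)
		}
	}
	return missing
}

func main() {
	secret, err := newJWTSecret()
	if err != nil {
		fmt.Fprintln(os.Stderr, "could not generate secret:", err)
		os.Exit(1)
	}
	fmt.Println("AUTH_JWT_SECRET=" + secret)

	if missing := checkAuthEnv(os.Getenv); len(missing) > 0 {
		fmt.Println("missing required variables:", missing)
	}
}
```

The printed `AUTH_JWT_SECRET=...` line can be pasted into `.env` as a literal value.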
## Documentation
- **CLAUDE.md** - Comprehensive architecture guide
- **DOCKER.md** - Docker reference
- **Service READMEs** - Individual service docs in each directory