https://github.com/diabahmed/sykell-crawler
A robust, scalable, and production-ready full-stack web crawler application built with Go and Next.js
docker golang jwt-auth mysql nextjs playwright rest-api websockets
- Host: GitHub
- URL: https://github.com/diabahmed/sykell-crawler
- Owner: diabahmed
- License: mit
- Created: 2025-07-16T01:19:40.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2025-07-17T18:29:16.000Z (12 months ago)
- Last Synced: 2025-07-17T22:08:29.306Z (12 months ago)
- Topics: docker, golang, jwt-auth, mysql, nextjs, playwright, rest-api, websockets
- Language: TypeScript
- Homepage: https://sykell.com
- Size: 265 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Sykell Web Crawler Platform
A full-stack web crawling platform that provides website analysis through a modern web interface. Built with a Go backend and a Next.js frontend, it offers real-time crawling, detailed analytics, and a responsive dashboard.
> **⚡ Rapid Development Achievement**: This entire full-stack application was built after learning Go in just one day! It showcases the power of modern development tools, clean architecture patterns, and the effectiveness of well-structured frameworks for building robust applications quickly.
## 🌟 Platform Overview
Sykell is a multi-tenant web crawling platform that combines:
- **Powerful Backend**: High-performance Go API with clean architecture
- **Modern Frontend**: React-based dashboard with real-time updates
- **Scalable Infrastructure**: Docker-containerized deployment ready for production
- **Real-time Features**: WebSocket integration for live status updates
- **Comprehensive Analytics**: Detailed website analysis and reporting
## 🏗️ Architecture
```
┌─────────────────────────────────────────────────────────────┐
│ Frontend (Next.js) │
│ ┌─────────────┐ ┌──────────────┐ ┌─────────────────────┐ │
│ │ Dashboard │ │ Real-time │ │ Authentication │ │
│ │ UI │ │ Updates │ │ UI │ │
│ └─────────────┘ └──────────────┘ └─────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
│
HTTP/WebSocket
│
┌─────────────────────────────────────────────────────────────┐
│ Backend API (Go) │
│ ┌─────────────┐ ┌──────────────┐ ┌─────────────────────┐ │
│ │ RESTful │ │ WebSocket │ │ Authentication │ │
│ │ API │ │ Hub │ │ & Authorization │ │
│ └─────────────┘ └──────────────┘ └─────────────────────┘ │
│ ┌─────────────┐ ┌──────────────┐ ┌─────────────────────┐ │
│ │ Crawler │ │ Business │ │ Data Access │ │
│ │ Engine │ │ Logic │ │ Layer │ │
│ └─────────────┘ └──────────────┘ └─────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
│
┌─────────────────────────────────────────────────────────────┐
│ Database (MySQL) │
│ ┌─────────────┐ ┌──────────────┐ ┌─────────────────────┐ │
│ │ Users │ │ Crawls │ │ Audit Logs │ │
│ │ Tables │ │ Results │ │ & Sessions │ │
│ └─────────────┘ └──────────────┘ └─────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
```
## 🚀 Features
### 🕷️ Web Crawling
- **Comprehensive Analysis**: HTML version detection, title extraction, heading structure analysis
- **Link Analysis**: Internal vs. external link classification with broken link detection
- **Form Detection**: Login form presence identification
- **Performance Metrics**: Processing time tracking and optimization insights
- **Real-time Processing**: Background job processing with live status updates
### 👨‍💻 User Experience
- **Multi-tenant System**: Complete user registration and authentication
- **Modern Dashboard**: Responsive design with dark mode support
- **Real-time Updates**: WebSocket integration for live crawl notifications
- **Data Visualization**: Interactive tables with advanced filtering and sorting
- **Bulk Operations**: Manage multiple crawls efficiently
### 🛠️ Technical Excellence
- **Clean Architecture**: Domain-driven design with clear separation of concerns
- **Type Safety**: Full TypeScript coverage across the frontend
- **Security**: JWT authentication with secure session management
- **Performance**: Optimized concurrent processing and caching systems
- **Scalability**: Docker containerization ready for production deployment
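To make "concurrent processing" concrete, here is a minimal sketch of a bounded worker pool in Go, the kind of pattern a crawler could use to check many links at once without spawning unlimited goroutines. The names `checkAll` and `result`, and the injected `check` function, are hypothetical and not taken from the repository.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// result pairs a URL with the outcome of checking it.
type result struct {
	URL string
	OK  bool
}

// checkAll runs check over urls using at most `workers` goroutines and
// returns one result per URL. The order of results is not guaranteed.
func checkAll(urls []string, workers int, check func(string) bool) []result {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	out := make(chan result)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range jobs {
				out <- result{URL: u, OK: check(u)}
			}
		}()
	}

	// Feed the jobs, then close the output once every worker has finished.
	go func() {
		for _, u := range urls {
			jobs <- u
		}
		close(jobs)
		wg.Wait()
		close(out)
	}()

	var results []result
	for r := range out {
		results = append(results, r)
	}
	return results
}

func main() {
	urls := []string{"https://example.com/a", "https://example.com/broken", "https://example.com/b"}
	// Stand-in check: a real crawler would issue an HTTP request here.
	fake := func(u string) bool { return !strings.HasSuffix(u, "/broken") }
	for _, r := range checkAll(urls, 2, fake) {
		fmt.Println(r.URL, r.OK)
	}
}
```

Passing the check as a function keeps the pool testable without network access; the worker count caps how many checks run at the same time.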
## 📦 Repository Structure
```
sykell-crawler/
├── 📁 client/ # Next.js Frontend Application
│ ├── 📁 app/ # Next.js App Router
│ ├── 📁 components/ # React Components
│ ├── 📁 store/ # State Management (Zustand)
│ ├── 📁 hooks/ # Custom React Hooks
│ ├── 📁 lib/ # Utility Libraries
│ ├── 📁 types/ # TypeScript Definitions
│ ├── 📁 tests/ # E2E Tests (Playwright)
│ ├── 📄 Dockerfile # Frontend Container Config
│ └── 📄 README.md # Frontend Documentation
├── 📁 server/ # Go Backend API
│ ├── 📁 cmd/api/ # Application Entry Point
│ ├── 📁 internal/ # Private Application Code
│ │ ├── 📁 application/ # Business Logic Services
│ │ ├── 📁 domain/ # Domain Entities & Interfaces
│ │ ├── 📁 infrastructure/ # External Integrations
│ │ └── 📁 presentation/ # HTTP/WebSocket Handlers
│ ├── 📁 tests/ # API Tests & Test Utilities
│ ├── 📄 Dockerfile # Backend Container Config
│ └── 📄 README.md # Backend Documentation
├── 📄 docker-compose.yml # Multi-service Orchestration
├── 📄 .env.example # Environment Configuration Template
├── 📄 LICENSE # MIT License
└── 📄 README.md # This File
```
## 🚀 Quick Start
### Prerequisites
- **Docker & Docker Compose** (Recommended)
- **Go 1.24.2+** (for local development)
- **Node.js 20+** (for local development)
- **MySQL 8.0** (if running locally)
### 🐳 Docker Deployment (Recommended)
1. **Clone the repository**
```bash
git clone https://github.com/diabahmed/sykell-crawler.git
cd sykell-crawler
```
2. **Configure environment**
```bash
cp .env.example .env
```
Edit `.env` with your configuration:
```env
# Database Configuration
DB_PASSWORD=your_secure_password
DB_NAME=web_crawler_db
DB_SOURCE="root:your_secure_password@tcp(db:3306)/web_crawler_db?charset=utf8mb4&parseTime=True&loc=Local"
# Frontend Configuration
NEXT_PUBLIC_API_BASE_URL=http://localhost:8088/api/v1
NEXT_PUBLIC_WS_BASE_URL=ws://localhost:8088/api/v1/ws
# JWT Configuration
TOKEN_SYMMETRIC_KEY="your_32_character_secret_key_here"
ACCESS_TOKEN_DURATION="24h"
```
3. **Launch the platform**
```bash
docker-compose up --build -d
```
4. **Access the application**
- **Frontend**: http://localhost:3000
- **Backend API**: http://localhost:8088
- **Database**: localhost:3306
### 🔧 Local Development
For detailed local development instructions, refer to the component-specific READMEs:
- **[Backend Development Guide](./server/README.md)** - Go API setup, testing, and development
- **[Frontend Development Guide](./client/README.md)** - Next.js setup, components, and testing
## 📖 Documentation
### Component Documentation
- **[📚 Backend API Documentation](./server/README.md)**
- Architecture overview
- API endpoints
- Database schema
- Configuration options
- Development guide
- **[📚 Frontend Documentation](./client/README.md)**
- Component architecture
- State management
- UI components
- Testing strategy
- Performance optimizations
### API Documentation
- **API Endpoints**: Detailed in [Backend README](./server/README.md#-api-endpoints)
## 🔐 Security
### Authentication & Authorization
- JWT-based authentication with HTTP-only cookies
- Multi-tenant user isolation
- Secure password hashing with bcrypt
- Session management and automatic logout
### API Security
- Input validation and sanitization
- CORS configuration
- Rate limiting capabilities
- SQL injection prevention via ORM
### Infrastructure Security
- Container security best practices
- Secure environment variable handling
- Network isolation with Docker
### Environment Configurations
- **Development**: Local development with hot reload
- **Staging**: Production-like environment for testing
- **Production**: Optimized for performance and security
## 📄 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
For detailed component documentation, please refer to:
- [🔧 Backend Documentation](./server/README.md)
- [🎨 Frontend Documentation](./client/README.md)