https://github.com/zichkoding/log_analyzer
https://github.com/zichkoding/log_analyzer
Last synced: 26 days ago
JSON representation
- Host: GitHub
- URL: https://github.com/zichkoding/log_analyzer
- Owner: ZichKoding
- Created: 2025-10-29T00:29:24.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-10-30T01:06:57.000Z (8 months ago)
- Last Synced: 2025-10-30T01:25:02.937Z (8 months ago)
- Language: Python
- Size: 22.5 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Challenge 1: Log Analyzer Web Dashboard (30 minutes, Beginner)
## Problem Statement
Build a simple HTTP server that reads server log files, analyzes them, and displays statistics through a web interface. This challenge introduces basic web server concepts, simple data structures, and file processing.
## Specific Requirements
Create a program that:
1. **Reads log files** from a `logs/` directory containing entries like:
```
2025-10-20 14:23:15 INFO User login: john@example.com
2025-10-20 14:24:03 ERROR Database connection failed
2025-10-20 14:25:10 INFO User logout: jane@example.com
```
2. **Analyzes logs** to extract:
- Total number of entries
- Count by log level (INFO, WARNING, ERROR)
- Most recent 5 log entries
- Count of unique users mentioned
3. **Serves a simple web interface** with:
- Homepage displaying statistics as HTML
- `/api/stats` endpoint returning JSON data
- Basic error handling for missing files
4. **Uses only standard library**: `http.server`, `pathlib`, `json`, `collections`, `datetime`
## Implementation Requirements
```python
from http.server import HTTPServer, BaseHTTPRequestHandler
from pathlib import Path
from collections import Counter
import json
class LogAnalyzerHandler(BaseHTTPRequestHandler):
def do_GET(self):
# Implement routing for / and /api/stats
pass
def analyze_logs(self) -> dict:
# Read and analyze log files
pass
```
## Success Criteria
- ✅ Server starts on port 8000 without errors
- ✅ Homepage displays log statistics in readable HTML
- ✅ `/api/stats` returns valid JSON with all required metrics
- ✅ Handles missing log directory gracefully (no crashes)
- ✅ Uses `pathlib` for all file operations
- ✅ Properly counts log levels using `Counter`
- ✅ Code includes type hints on main functions
## Example Use Cases
**Test Setup:**
```python
# Create logs directory with sample data
Path('logs').mkdir(exist_ok=True)
Path('logs/server.log').write_text("""
2025-10-20 14:23:15 INFO User login: john@example.com
2025-10-20 14:24:03 ERROR Database connection failed
2025-10-20 14:25:10 INFO User logout: jane@example.com
2025-10-20 14:26:22 WARNING High memory usage detected
2025-10-20 14:27:05 INFO User login: alice@example.com
""")
```
**Expected Output at `/api/stats`:**
```json
{
"total_entries": 5,
"by_level": {
"INFO": 3,
"ERROR": 1,
"WARNING": 1
},
"unique_users": 3,
"recent_entries": [
"2025-10-20 14:27:05 INFO User login: alice@example.com",
"2025-10-20 14:26:22 WARNING High memory usage detected",
"2025-10-20 14:25:10 INFO User logout: jane@example.com",
"2025-10-20 14:24:03 ERROR Database connection failed",
"2025-10-20 14:23:15 INFO User login: john@example.com"
]
}
```
## Hints and Guidance
1. **File Reading Pattern**:
```python
log_files = Path('logs').glob('*.log')
for log_file in log_files:
content = log_file.read_text()
lines = content.strip().splitlines()
```
2. **Parsing Log Lines**: Use `str.split()` with maxsplit to separate timestamp, level, and message:
```python
parts = line.split(maxsplit=3) # ['2025-10-20', '14:23:15', 'INFO', 'message...']
```
3. **Counting**: Use `collections.Counter` for efficient counting:
```python
from collections import Counter
levels = Counter()
levels[log_level] += 1
```
4. **HTML Response Template**:
```python
html = f"""
Log Analyzer
Server Log Statistics
Total Entries: {stats['total_entries']}
By Level:
- INFO: {stats['by_level'].get('INFO', 0)}
- WARNING: {stats['by_level'].get('WARNING', 0)}
- ERROR: {stats['by_level'].get('ERROR', 0)}
"""
```
5. **JSON Serialization**: Convert Counter to dict for JSON:
```python
json.dumps({'by_level': dict(counter)})
```
## Bonus Challenges
1. **Email Extraction** (⭐): Use regex to extract and count all email addresses mentioned in logs
2. **Time Range Filtering** (⭐): Add query parameters like `/api/stats?hours=24` to filter recent logs
3. **Error Rate Alert** (⭐⭐): Display a warning if ERROR count exceeds 10% of total entries
4. **Multiple Files** (⭐⭐): Handle multiple `.log` files in the directory and combine statistics
5. **Live Refresh** (⭐⭐⭐): Add auto-refresh to HTML using `` tag so page updates every 5 seconds
## Testing Your Solution
```python
# test_challenge1.py
import unittest
from pathlib import Path
import requests
import subprocess
import time
class TestLogAnalyzer(unittest.TestCase):
@classmethod
def setUpClass(cls):
# Start server in background
cls.server_process = subprocess.Popen(['python', 'challenge1.py'])
time.sleep(2) # Wait for server to start
@classmethod
def tearDownClass(cls):
cls.server_process.terminate()
def test_homepage(self):
response = requests.get('http://localhost:8000/')
self.assertEqual(response.status_code, 200)
self.assertIn('Log Statistics', response.text)
def test_api_stats(self):
response = requests.get('http://localhost:8000/api/stats')
self.assertEqual(response.status_code, 200)
data = response.json()
self.assertIn('total_entries', data)
self.assertIn('by_level', data)
```
## Learning Outcomes
- HTTP request/response cycle
- Basic routing concepts
- File I/O with pathlib
- Simple data aggregation
- JSON serialization