https://github.com/zichkoding/log_analyzer

Last synced: 26 days ago
JSON representation

Host: GitHub
URL: https://github.com/zichkoding/log_analyzer
Owner: ZichKoding
Created: 2025-10-29T00:29:24.000Z (8 months ago)
Default Branch: main
Last Pushed: 2025-10-30T01:06:57.000Z (8 months ago)
Last Synced: 2025-10-30T01:25:02.937Z (8 months ago)
Language: Python
Size: 22.5 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

          # Challenge 1: Log Analyzer Web Dashboard (30 minutes, Beginner)

## Problem Statement

Build a simple HTTP server that reads server log files, analyzes them, and displays statistics through a web interface. This challenge introduces basic web server concepts, simple data structures, and file processing.

## Specific Requirements

Create a program that:

1. **Reads log files** from a `logs/` directory containing entries like:

   ```

   2025-10-20 14:23:15 INFO User login: john@example.com

   2025-10-20 14:24:03 ERROR Database connection failed

   2025-10-20 14:25:10 INFO User logout: jane@example.com

   ```

2. **Analyzes logs** to extract:

   - Total number of entries

   - Count by log level (INFO, WARNING, ERROR)

   - Most recent 5 log entries

   - Count of unique users mentioned

3. **Serves a simple web interface** with:

   - Homepage displaying statistics as HTML

   - `/api/stats` endpoint returning JSON data

   - Basic error handling for missing files

4. **Uses only standard library**: `http.server`, `pathlib`, `json`, `collections`, `datetime`

## Implementation Requirements

```python

from http.server import HTTPServer, BaseHTTPRequestHandler

from pathlib import Path

from collections import Counter

import json

class LogAnalyzerHandler(BaseHTTPRequestHandler):

    def do_GET(self):

        # Implement routing for / and /api/stats

        pass

    def analyze_logs(self) -> dict:

        # Read and analyze log files

        pass

```

## Success Criteria

- ✅ Server starts on port 8000 without errors

- ✅ Homepage displays log statistics in readable HTML

- ✅ `/api/stats` returns valid JSON with all required metrics

- ✅ Handles missing log directory gracefully (no crashes)

- ✅ Uses `pathlib` for all file operations

- ✅ Properly counts log levels using `Counter`

- ✅ Code includes type hints on main functions

## Example Use Cases

**Test Setup:**

```python

# Create logs directory with sample data

Path('logs').mkdir(exist_ok=True)

Path('logs/server.log').write_text("""

2025-10-20 14:23:15 INFO User login: john@example.com

2025-10-20 14:24:03 ERROR Database connection failed

2025-10-20 14:25:10 INFO User logout: jane@example.com

2025-10-20 14:26:22 WARNING High memory usage detected

2025-10-20 14:27:05 INFO User login: alice@example.com

""")

```

**Expected Output at `/api/stats`:**

```json

{

  "total_entries": 5,

  "by_level": {

    "INFO": 3,

    "ERROR": 1,

    "WARNING": 1

  },

  "unique_users": 3,

  "recent_entries": [

    "2025-10-20 14:27:05 INFO User login: alice@example.com",

    "2025-10-20 14:26:22 WARNING High memory usage detected",

    "2025-10-20 14:25:10 INFO User logout: jane@example.com",

    "2025-10-20 14:24:03 ERROR Database connection failed",

    "2025-10-20 14:23:15 INFO User login: john@example.com"

  ]

}

```

## Hints and Guidance

1. **File Reading Pattern**:

   ```python

   log_files = Path('logs').glob('*.log')

   for log_file in log_files:

       content = log_file.read_text()

       lines = content.strip().splitlines()

   ```

2. **Parsing Log Lines**: Use `str.split()` with maxsplit to separate timestamp, level, and message:

   ```python

   parts = line.split(maxsplit=3)  # ['2025-10-20', '14:23:15', 'INFO', 'message...']

   ```

3. **Counting**: Use `collections.Counter` for efficient counting:

   ```python

   from collections import Counter

   levels = Counter()

   levels[log_level] += 1

   ```

4. **HTML Response Template**:

   ```python

   html = f"""

   

   

   Log Analyzer

   

       
Server Log Statistics

       Total Entries: {stats['total_entries']}

       By Level:

       

           INFO: {stats['by_level'].get('INFO', 0)}

           WARNING: {stats['by_level'].get('WARNING', 0)}

           ERROR: {stats['by_level'].get('ERROR', 0)}

       

   

   

   """

   ```

5. **JSON Serialization**: Convert Counter to dict for JSON:

   ```python

   json.dumps({'by_level': dict(counter)})

   ```

## Bonus Challenges

1. **Email Extraction** (⭐): Use regex to extract and count all email addresses mentioned in logs

2. **Time Range Filtering** (⭐): Add query parameters like `/api/stats?hours=24` to filter recent logs

3. **Error Rate Alert** (⭐⭐): Display a warning if ERROR count exceeds 10% of total entries

4. **Multiple Files** (⭐⭐): Handle multiple `.log` files in the directory and combine statistics

5. **Live Refresh** (⭐⭐⭐): Add auto-refresh to HTML using `` tag so page updates every 5 seconds

## Testing Your Solution

```python

# test_challenge1.py

import unittest

from pathlib import Path

import requests

import subprocess

import time

class TestLogAnalyzer(unittest.TestCase):

    @classmethod

    def setUpClass(cls):

        # Start server in background

        cls.server_process = subprocess.Popen(['python', 'challenge1.py'])

        time.sleep(2)  # Wait for server to start

    @classmethod

    def tearDownClass(cls):

        cls.server_process.terminate()

    def test_homepage(self):

        response = requests.get('http://localhost:8000/')

        self.assertEqual(response.status_code, 200)

        self.assertIn('Log Statistics', response.text)

    def test_api_stats(self):

        response = requests.get('http://localhost:8000/api/stats')

        self.assertEqual(response.status_code, 200)

        data = response.json()

        self.assertIn('total_entries', data)

        self.assertIn('by_level', data)

```

## Learning Outcomes

- HTTP request/response cycle

- Basic routing concepts

- File I/O with pathlib

- Simple data aggregation

- JSON serialization

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/zichkoding/log_analyzer

Awesome Lists containing this project

README

Server Log Statistics

By Level: