https://github.com/gofurry/web-profiler
A lightweight Go middleware for request profiling, exposing entropy, fingerprints, complexity, and charset insights through net/http context.
https://github.com/gofurry/web-profiler
go http middleware observability request-analysis security
Last synced: 2 months ago
JSON representation
A lightweight Go middleware for request profiling, exposing entropy, fingerprints, complexity, and charset insights through net/http context.
- Host: GitHub
- URL: https://github.com/gofurry/web-profiler
- Owner: GoFurry
- License: mit
- Created: 2026-03-27T17:18:51.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2026-04-07T16:32:57.000Z (2 months ago)
- Last Synced: 2026-04-07T17:25:37.697Z (2 months ago)
- Topics: go, http, middleware, observability, request-analysis, security
- Language: Go
- Homepage:
- Size: 107 KB
- Stars: 2
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# web-profiler
[](https://github.com/GoFurry/web-profiler/releases)
[](LICENSE)
[](go.mod)
**[δΈζζζ‘£](docs/README_zh.md) | English | [Benchmark Baseline](docs/benchmark_baseline.md)**
`web-profiler` is a lightweight request analysis middleware for `net/http`.
It inspects incoming requests with bounded overhead, restores the request body for downstream handlers, and exposes structured results through `context.Context`.
It is designed as request-analysis infrastructure, not as a security decision engine.
## π² Highlights
- Native `net/http` middleware API with easy integration into Gin, Chi, Echo, and other `net/http`-based frameworks
- One bounded body capture shared by all analyzers
- Structured request profile exposed through `FromContext`
- Per-request analysis duration with nanosecond precision
- Rich request metadata including observed bytes, header stats, and per-analyzer timings
- Multiple bounded sampling strategies: `head`, `tail`, and `head_tail`
- Optional compressed-body inspection, trusted-proxy CIDR checks, and alternate fingerprint hash algorithms
- Safe degradation with warnings instead of failing the request
- Built-in analyzers for entropy, fingerprint, complexity, and charset distribution
## Installation
```bash
go get github.com/GoFurry/web-profiler
```
## π Quick Start
```go
package main
import (
"log"
"net/http"
webprofiler "github.com/GoFurry/web-profiler"
)
func main() {
cfg := webprofiler.DefaultConfig()
handler := webprofiler.Middleware(cfg)(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
profile, ok := webprofiler.FromContext(r.Context())
if ok && profile != nil {
if profile.Entropy != nil {
log.Printf("entropy=%.4f", profile.Entropy.Value)
}
if profile.Fingerprint != nil {
log.Printf("fingerprint=%s", profile.Fingerprint.Hash)
}
}
// The request body is still readable here.
w.WriteHeader(http.StatusOK)
}))
log.Fatal(http.ListenAndServe(":8080", handler))
}
```
You can also use the convenience helper:
```go
handler := webprofiler.Wrap(mux, webprofiler.DefaultConfig())
```
A runnable native `net/http` example lives at [`example/main.go`](example/main.go).
The latest benchmark reference is recorded in [`docs/benchmark_baseline.md`](docs/benchmark_baseline.md).
## Using Profile Data In Handlers
`FromContext` gives you the collected `Profile`. You can inspect metadata, module outputs, and warnings inside any downstream handler:
```go
func inspectHandler(w http.ResponseWriter, r *http.Request) {
profile, ok := webprofiler.FromContext(r.Context())
if !ok || profile == nil {
http.Error(w, "profile not found", http.StatusInternalServerError)
return
}
body, err := io.ReadAll(r.Body)
if err != nil {
http.Error(w, err.Error(), http.StatusInternalServerError)
return
}
log.Printf("path=%s content_type=%s observed_bytes=%d", profile.Meta.Path, profile.Meta.ContentType, profile.Meta.ObservedBytes)
log.Printf("headers=%d header_bytes=%d", profile.Meta.HeaderCount, profile.Meta.HeaderBytes)
log.Printf("analysis_duration=%s", profile.Meta.AnalysisDuration)
log.Printf("body=%s", string(body))
if profile.Entropy != nil {
log.Printf("entropy=%.4f sample_bytes=%d", profile.Entropy.Value, profile.Entropy.SampledBytes)
}
if profile.Fingerprint != nil {
log.Printf("fingerprint=%s fields=%v", profile.Fingerprint.Hash, profile.Fingerprint.Fields)
}
if profile.Complexity != nil {
log.Printf("complexity_score=%d depth=%d fields=%d scalars=%d", profile.Complexity.Score, profile.Complexity.Depth, profile.Complexity.FieldCount, profile.Complexity.ScalarCount)
}
if profile.Charset != nil {
log.Printf("non_ascii_ratio=%.2f scripts=%v suspicious=%v", profile.Charset.NonASCIIRatio, profile.Charset.UnicodeScriptCounts, profile.Charset.SuspiciousFlags)
}
if len(profile.Warnings) > 0 {
log.Printf("warnings=%v", profile.Warnings)
}
w.WriteHeader(http.StatusOK)
}
```
## Gin Example
Gin uses `net/http` under the hood, so you can wrap its engine directly:
```go
package main
import (
"log"
"net/http"
"github.com/gin-gonic/gin"
webprofiler "github.com/GoFurry/web-profiler"
)
func main() {
cfg := webprofiler.DefaultConfig()
engine := gin.New()
engine.POST("/inspect", func(c *gin.Context) {
profile, ok := webprofiler.FromContext(c.Request.Context())
if !ok || profile == nil {
c.JSON(http.StatusInternalServerError, gin.H{"error": "profile not found"})
return
}
c.JSON(http.StatusOK, gin.H{
"path": profile.Meta.Path,
"content_type": profile.Meta.ContentType,
"observed": profile.Meta.ObservedBytes,
"headers": profile.Meta.HeaderCount,
"analysis_ns": profile.Meta.AnalysisDuration.Nanoseconds(),
"entropy": profile.Entropy,
"fingerprint": profile.Fingerprint,
"complexity": profile.Complexity,
"charset": profile.Charset,
"warnings": profile.Warnings,
})
})
handler := webprofiler.Middleware(cfg)(engine)
log.Fatal(http.ListenAndServe(":8080", handler))
}
```
## Public API
```go
func Middleware(cfg Config) func(http.Handler) http.Handler
func Wrap(next http.Handler, cfg Config) http.Handler
func DefaultConfig() Config
func FromContext(ctx context.Context) (*Profile, bool)
```
## What Gets Collected
### `Meta`
- Method, path, normalized content type, and request content length
- Observed body bytes, sample size, truncation state, and header count/bytes
- End-to-end analysis duration plus per-analyzer durations
- Explicit skip and downgrade warnings when body-based analysis does not run
### `Entropy`
- Shannon entropy over the sampled request body bytes
- Normalized entropy, unique-byte count, repetition ratio, and approximate compressibility
- Sample size and observed body size
- Sampling strategy metadata
### `Fingerprint`
- Normalized request headers
- Optional client IP and TLS metadata
- Source flags plus weak/strong hashes with versioning
- Optional hash-only mode when you do not want raw normalized fields in results
- Trusted-proxy CIDR policy and alternate hash algorithms such as `sha1`, `sha256`, `sha512`, and `fnv1a64`
### `Complexity`
- JSON depth, field counts, scalar/null/string stats, and key-length summaries
- Object/array counts, max array length, and max object width
- URL-encoded form statistics plus key/value length summaries
- Optional multipart file metadata such as file counts, extensions, and content types
- Interpretable score factors
### `Charset`
- ASCII, digit, whitespace, symbol, control, and non-ASCII ratios
- Emoji ratio and invisible-character density
- Unicode script distribution counts when enabled
- Optional confusable/homoglyph detection and format-specific metrics for JSON, XML, and form payloads
- Optional suspicious flags such as invalid UTF-8, zero-width characters, and mixed scripts
## π§ Configuration
`DefaultConfig()` returns a ready-to-use setup with bounded defaults:
- `BodyConfig` limits read size, sample size, methods, and content types
- `FingerprintConfig` controls headers, proxy trust, TLS metadata, and hash versioning
- `ComplexityConfig` controls JSON depth, field limits, and supported content types
- `CharsetConfig` controls text analysis size and suspicious-pattern detection
Typical customization:
```go
cfg := webprofiler.DefaultConfig()
cfg.Body.MaxReadBytes = 64 << 10
cfg.Body.SampleBytes = 8 << 10
cfg.Body.SampleStrategy = webprofiler.SampleStrategyHeadTail
cfg.Body.EnableCompressedAnalysis = true
cfg.Fingerprint.IncludeIP = true
cfg.Fingerprint.TrustProxy = true
cfg.Fingerprint.TrustedProxyCIDRs = []string{"10.0.0.0/8", "192.168.0.0/16"}
cfg.Fingerprint.HashAlgorithm = "sha512"
cfg.Fingerprint.ExposeFields = false
cfg.Complexity.MaxJSONDepth = 16
cfg.Complexity.EnableMultipartMeta = true
cfg.Charset.EnableConfusableDetection = true
cfg.Charset.EnableFormatSpecificMetrics = true
```
## Performance Notes
The current benchmark baseline is tracked in [`docs/benchmark_baseline.md`](docs/benchmark_baseline.md).
- `MetaInfo.AnalysisDuration` records middleware analysis time as `time.Duration`
- `MetaInfo` now also exposes per-analyzer timings so you can see where request profiling time is spent
- The example exposes both `analysis_duration` and `analysis_duration_ns` so you can read it directly or aggregate it precisely
- The SHA-256 fingerprint step hashes a very small normalized string built from a few headers and optional TLS/IP fields, so in most cases it is not the main cost
- Alternate fingerprint hash algorithms are available, but `sha256` remains the best default tradeoff for compatibility and stability
- Compressed-body inspection is optional because it adds decompression work; enable it when encoded request payloads are common in your traffic
- If you want the cheapest possible fingerprint output, set `Fingerprint.ExposeFields = false` to keep only hashes and source metadata
- In practice, request-body capture, JSON parsing, and charset scanning are usually more expensive than the final SHA-256 call
- If you run at very high QPS, benchmark with your own traffic and disable `EnableFingerprint`, `IncludeIP`, or `IncludeTLS` if you want an even cheaper profile
## Example Response Fields
The native example at [`example/main.go`](example/main.go) returns a JSON payload like the one you posted. The following table maps each field to its meaning:
| Field | Meaning |
| --- | --- |
| `path` | Request path seen by the middleware and handler. |
| `body` | Request body re-read inside the handler, proving the middleware restored `r.Body`. |
| `observed_bytes` | Number of body bytes actually observed before sampling. |
| `header_count` | Number of request header entries counted by the middleware, including `Host`. |
| `header_bytes` | Approximate size of header keys and values counted by the middleware. |
| `entropy.Value` | Shannon entropy of the sampled body bytes. Higher usually means more byte diversity. |
| `entropy.NormalizedValue` | Entropy normalized to an approximate `0..1` range by dividing by `8 bits/byte`. |
| `entropy.SampledBytes` | Number of bytes used for entropy calculation. |
| `entropy.TotalObservedBytes` | Number of body bytes observed by the middleware before sampling. |
| `entropy.UniqueByteCount` | Number of distinct byte values seen in the sampled body. |
| `entropy.RepetitionRatio` | Share of sampled bytes that repeat beyond their first occurrence. |
| `entropy.CompressionRatio` | Approximate gzip-compressed size divided by sampled size. Lower often means more repetitive content. |
| `entropy.ApproxCompressibility` | A convenience score derived from the compression ratio. Higher usually means easier-to-compress content. |
| `entropy.SampleStrategy` | Sampling mode currently used for body analysis. |
| `fingerprint.Fields` | Normalized fields used to build the request fingerprint. |
| `fingerprint.SourceFlags` | Which input sources contributed to the fingerprint, for example `headers`, `tls`, or `ip`. |
| `fingerprint.Hash` | Stable SHA-256 digest of the normalized fingerprint fields. |
| `fingerprint.WeakHash` | Fingerprint hash that excludes more volatile inputs such as client IP. |
| `fingerprint.StrongHash` | Fingerprint hash built from the full configured input set. |
| `fingerprint.HashAlgorithm` | Fingerprint hash algorithm currently used. |
| `fingerprint.HashVersion` | Fingerprint schema/version identifier. |
| `complexity.ContentType` | Content type used for complexity analysis. |
| `complexity.Depth` | Observed structural depth of the parsed body. Depth is counted on the recursive walk, so scalar leaf values increase the final depth level. |
| `complexity.FieldCount` | Total number of parsed fields/values. |
| `complexity.ObjectCount` | Number of JSON objects encountered. |
| `complexity.ArrayCount` | Number of arrays encountered. |
| `complexity.ScalarCount` | Number of non-container values such as strings, numbers, booleans, and `null`. |
| `complexity.NullCount` | Number of `null` values seen in JSON. |
| `complexity.StringCount` | Number of string values seen in the parsed payload. |
| `complexity.UniqueKeyCount` | Number of keys encountered across parsed objects or form keys. |
| `complexity.MaxArrayLength` | Longest array length seen in the body. |
| `complexity.MaxObjectFields` | Largest number of fields found in a single object or form key set. |
| `complexity.MaxKeyLength` | Longest key length seen during complexity analysis. |
| `complexity.MaxStringLength` | Longest JSON string value length seen during complexity analysis. |
| `complexity.MaxValueLength` | Longest form value length seen during complexity analysis. |
| `complexity.AverageKeyLength` | Average key length for form inputs. |
| `complexity.AverageValueLength` | Average value length for form inputs. |
| `complexity.MultipartFileCount` | Number of uploaded files seen in multipart metadata mode. |
| `complexity.MultipartFieldCount` | Number of non-file form fields seen in multipart metadata mode. |
| `complexity.MultipartFileExtensions` | Count of file extensions seen across multipart uploads. |
| `complexity.MultipartFileContentTypes` | Count of per-file content types seen in multipart uploads. |
| `complexity.MultipartMaxFileNameLength` | Longest multipart file name length seen in the request. |
| `complexity.Score` | Aggregate complexity score. |
| `complexity.ScoreFactors` | Breakdown of how the complexity score was calculated. |
| `charset.TotalChars` | Total characters scanned in the sampled text body. |
| `charset.ASCIIAlphaRatio` | Ratio of ASCII letters in the sampled body. |
| `charset.DigitRatio` | Ratio of digits in the sampled body. |
| `charset.WhitespaceRatio` | Ratio of whitespace characters in the sampled body. |
| `charset.SymbolRatio` | Ratio of punctuation and symbol characters in the sampled body. |
| `charset.ControlCharRatio` | Ratio of control characters or invalid byte sequences. |
| `charset.NonASCIIRatio` | Ratio of non-ASCII characters in the sampled body. |
| `charset.EmojiRatio` | Ratio of emoji-like code points in the sampled text. |
| `charset.InvisibleCharRatio` | Ratio of invisible characters such as zero-width marks or formatting controls. |
| `charset.ConfusableCount` | Number of characters that match a built-in homoglyph/confusable set. |
| `charset.UnicodeScriptCounts` | Count of characters per detected Unicode script when script detection is enabled. |
| `charset.FormatMetrics` | Optional format-aware metrics for JSON, XML, or form payloads, such as token counts and repeated keys. |
| `charset.SuspiciousFlags` | Optional markers such as invalid UTF-8, zero-width characters, or mixed scripts. |
| `content_type` | Normalized request `Content-Type`. |
| `content_length` | Request body length reported by the incoming request. |
| `sampled` | Whether the middleware analyzed only a subset of the observed body. |
| `sample_bytes` | Number of sampled bytes actually used by body analyzers. |
| `body_truncated` | Whether body observation stopped at `MaxReadBytes`. |
| `fingerprint_duration_ns` | Time spent building the request fingerprint. |
| `body_capture_duration_ns` | Time spent capturing and replaying the request body. |
| `entropy_duration_ns` | Time spent calculating body entropy. |
| `complexity_duration_ns` | Time spent calculating complexity metrics. |
| `charset_duration_ns` | Time spent calculating charset metrics. |
| `analysis_duration` | Human-readable middleware analysis duration, for example `187.4Β΅s`. |
| `analysis_duration_ns` | Exact middleware analysis duration in nanoseconds, useful for metrics and aggregation. |
## π Design Boundaries
This middleware:
- analyzes requests, not responses
- does not persist or export results
- does not block traffic on analyzer failures
- does not guarantee deep parsing for every content type
- does not produce business risk decisions
## Result Model
Each request produces a `Profile`:
```go
type Profile struct {
Meta MetaInfo
Entropy *EntropyResult
Fingerprint *FingerprintResult
Complexity *ComplexityResult
Charset *CharsetResult
Warnings []Warning
}
```
Analyzer results are optional pointers so handlers can distinguish between disabled, skipped, and populated modules.
Repository layout is intentionally simple:
- root package: stable middleware API for callers
- `example/`: runnable demo program
- `internal/policy`: config types and normalization
- `internal/model`: profile and result types
- `internal/core`: request capture, analyzers, and context plumbing
## Testing
```bash
go test ./...
```
The test suite covers middleware behavior, body replay, config normalization, analyzer outputs, and warning paths.
## πΊ License
This project is open-sourced under the [MIT License](LICENSE), which permits commercial use, modification, and distribution without requiring the original author's copyright notice to be retained.