{"id":48590075,"url":"https://github.com/gofurry/web-profiler","last_synced_at":"2026-04-08T19:02:00.222Z","repository":{"id":348029937,"uuid":"1193882153","full_name":"GoFurry/web-profiler","owner":"GoFurry","description":"A lightweight Go middleware for request profiling, exposing entropy, fingerprints, complexity, and charset insights through net/http context.","archived":false,"fork":false,"pushed_at":"2026-04-07T16:32:57.000Z","size":110,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-07T17:25:37.697Z","etag":null,"topics":["go","http","middleware","observability","request-analysis","security"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/GoFurry.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-27T17:18:51.000Z","updated_at":"2026-04-07T16:30:47.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/GoFurry/web-profiler","commit_stats":null,"previous_names":["gofurry/web-profiler"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/GoFurry/web-profiler","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GoFurry%2Fweb-profiler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GoFurry%2Fweb-profiler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositori
es/GoFurry%2Fweb-profiler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GoFurry%2Fweb-profiler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/GoFurry","download_url":"https://codeload.github.com/GoFurry/web-profiler/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GoFurry%2Fweb-profiler/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31569400,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-08T14:31:17.711Z","status":"ssl_error","status_checked_at":"2026-04-08T14:31:17.202Z","response_time":54,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["go","http","middleware","observability","request-analysis","security"],"created_at":"2026-04-08T19:01:55.541Z","updated_at":"2026-04-08T19:02:00.216Z","avatar_url":"https://github.com/GoFurry.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# web-profiler\n\n[![Last Version](https://img.shields.io/github/release/GoFurry/web-profiler/all.svg?logo=github\u0026color=brightgreen)](https://github.com/GoFurry/web-profiler/releases)\n[![License](https://img.shields.io/github/license/GoFurry/web-profiler)](LICENSE)\n[![Go 
Version](https://img.shields.io/badge/go-%3E%3D1.26-blue)](go.mod)\n\n**[中文文档](docs/README_zh.md) | English | [Benchmark Baseline](docs/benchmark_baseline.md)**\n\n`web-profiler` is a lightweight request analysis middleware for `net/http`.\nIt inspects incoming requests with bounded overhead, restores the request body for downstream handlers, and exposes structured results through `context.Context`.\n\nIt is designed as request-analysis infrastructure, not as a security decision engine.\n\n## 🐲 Highlights\n\n- Native `net/http` middleware API with easy integration into Gin, Chi, Echo, and other `net/http`-based frameworks\n- One bounded body capture shared by all analyzers\n- Structured request profile exposed through `FromContext`\n- Per-request analysis duration with nanosecond precision\n- Rich request metadata including observed bytes, header stats, and per-analyzer timings\n- Multiple bounded sampling strategies: `head`, `tail`, and `head_tail`\n- Optional compressed-body inspection, trusted-proxy CIDR checks, and alternate fingerprint hash algorithms\n- Safe degradation with warnings instead of failing the request\n- Built-in analyzers for entropy, fingerprint, complexity, and charset distribution\n\n## Installation\n\n```bash\ngo get github.com/GoFurry/web-profiler\n```\n\n## 🚀 Quick Start\n\n```go\npackage main\n\nimport (\n\t\"log\"\n\t\"net/http\"\n\n\twebprofiler \"github.com/GoFurry/web-profiler\"\n)\n\nfunc main() {\n\tcfg := webprofiler.DefaultConfig()\n\n\thandler := webprofiler.Middleware(cfg)(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {\n\t\tprofile, ok := webprofiler.FromContext(r.Context())\n\t\tif ok \u0026\u0026 profile != nil {\n\t\t\tif profile.Entropy != nil {\n\t\t\t\tlog.Printf(\"entropy=%.4f\", profile.Entropy.Value)\n\t\t\t}\n\t\t\tif profile.Fingerprint != nil {\n\t\t\t\tlog.Printf(\"fingerprint=%s\", profile.Fingerprint.Hash)\n\t\t\t}\n\t\t}\n\n\t\t// The request body is still readable 
here.\n\t\tw.WriteHeader(http.StatusOK)\n\t}))\n\n\tlog.Fatal(http.ListenAndServe(\":8080\", handler))\n}\n```\n\nYou can also use the convenience helper:\n\n```go\nhandler := webprofiler.Wrap(mux, webprofiler.DefaultConfig())\n```\n\nA runnable native `net/http` example lives at [`example/main.go`](example/main.go).\n\nThe latest benchmark reference is recorded in [`docs/benchmark_baseline.md`](docs/benchmark_baseline.md).\n\n## Using Profile Data In Handlers\n\n`FromContext` gives you the collected `Profile`. You can inspect metadata, module outputs, and warnings inside any downstream handler:\n\n```go\nfunc inspectHandler(w http.ResponseWriter, r *http.Request) {\n\tprofile, ok := webprofiler.FromContext(r.Context())\n\tif !ok || profile == nil {\n\t\thttp.Error(w, \"profile not found\", http.StatusInternalServerError)\n\t\treturn\n\t}\n\n\tbody, err := io.ReadAll(r.Body)\n\tif err != nil {\n\t\thttp.Error(w, err.Error(), http.StatusInternalServerError)\n\t\treturn\n\t}\n\n\tlog.Printf(\"path=%s content_type=%s observed_bytes=%d\", profile.Meta.Path, profile.Meta.ContentType, profile.Meta.ObservedBytes)\n\tlog.Printf(\"headers=%d header_bytes=%d\", profile.Meta.HeaderCount, profile.Meta.HeaderBytes)\n\tlog.Printf(\"analysis_duration=%s\", profile.Meta.AnalysisDuration)\n\tlog.Printf(\"body=%s\", string(body))\n\n\tif profile.Entropy != nil {\n\t\tlog.Printf(\"entropy=%.4f sample_bytes=%d\", profile.Entropy.Value, profile.Entropy.SampledBytes)\n\t}\n\tif profile.Fingerprint != nil {\n\t\tlog.Printf(\"fingerprint=%s fields=%v\", profile.Fingerprint.Hash, profile.Fingerprint.Fields)\n\t}\n\tif profile.Complexity != nil {\n\t\tlog.Printf(\"complexity_score=%d depth=%d fields=%d scalars=%d\", profile.Complexity.Score, profile.Complexity.Depth, profile.Complexity.FieldCount, profile.Complexity.ScalarCount)\n\t}\n\tif profile.Charset != nil {\n\t\tlog.Printf(\"non_ascii_ratio=%.2f scripts=%v suspicious=%v\", profile.Charset.NonASCIIRatio, 
profile.Charset.UnicodeScriptCounts, profile.Charset.SuspiciousFlags)\n\t}\n\tif len(profile.Warnings) \u003e 0 {\n\t\tlog.Printf(\"warnings=%v\", profile.Warnings)\n\t}\n\n\tw.WriteHeader(http.StatusOK)\n}\n```\n\n## Gin Example\n\nGin uses `net/http` under the hood, so you can wrap its engine directly:\n\n```go\npackage main\n\nimport (\n\t\"log\"\n\t\"net/http\"\n\n\t\"github.com/gin-gonic/gin\"\n\twebprofiler \"github.com/GoFurry/web-profiler\"\n)\n\nfunc main() {\n\tcfg := webprofiler.DefaultConfig()\n\tengine := gin.New()\n\n\tengine.POST(\"/inspect\", func(c *gin.Context) {\n\t\tprofile, ok := webprofiler.FromContext(c.Request.Context())\n\t\tif !ok || profile == nil {\n\t\t\tc.JSON(http.StatusInternalServerError, gin.H{\"error\": \"profile not found\"})\n\t\t\treturn\n\t\t}\n\n\t\tc.JSON(http.StatusOK, gin.H{\n\t\t\t\"path\":         profile.Meta.Path,\n\t\t\t\"content_type\": profile.Meta.ContentType,\n\t\t\t\"observed\":     profile.Meta.ObservedBytes,\n\t\t\t\"headers\":      profile.Meta.HeaderCount,\n\t\t\t\"analysis_ns\":  profile.Meta.AnalysisDuration.Nanoseconds(),\n\t\t\t\"entropy\":      profile.Entropy,\n\t\t\t\"fingerprint\":  profile.Fingerprint,\n\t\t\t\"complexity\":   profile.Complexity,\n\t\t\t\"charset\":      profile.Charset,\n\t\t\t\"warnings\":     profile.Warnings,\n\t\t})\n\t})\n\n\thandler := webprofiler.Middleware(cfg)(engine)\n\tlog.Fatal(http.ListenAndServe(\":8080\", handler))\n}\n```\n\n## Public API\n\n```go\nfunc Middleware(cfg Config) func(http.Handler) http.Handler\nfunc Wrap(next http.Handler, cfg Config) http.Handler\nfunc DefaultConfig() Config\nfunc FromContext(ctx context.Context) (*Profile, bool)\n```\n\n## What Gets Collected\n\n### `Meta`\n\n- Method, path, normalized content type, and request content length\n- Observed body bytes, sample size, truncation state, and header count/bytes\n- End-to-end analysis duration plus per-analyzer durations\n- Explicit skip and downgrade warnings when body-based analysis does not 
run\n\n### `Entropy`\n\n- Shannon entropy over the sampled request body bytes\n- Normalized entropy, unique-byte count, repetition ratio, and approximate compressibility\n- Sample size and observed body size\n- Sampling strategy metadata\n\n### `Fingerprint`\n\n- Normalized request headers\n- Optional client IP and TLS metadata\n- Source flags plus weak/strong hashes with versioning\n- Optional hash-only mode when you do not want raw normalized fields in results\n- Trusted-proxy CIDR policy and alternate hash algorithms such as `sha1`, `sha256`, `sha512`, and `fnv1a64`\n\n### `Complexity`\n\n- JSON depth, field counts, scalar/null/string stats, and key-length summaries\n- Object/array counts, max array length, and max object width\n- URL-encoded form statistics plus key/value length summaries\n- Optional multipart file metadata such as file counts, extensions, and content types\n- Interpretable score factors\n\n### `Charset`\n\n- ASCII, digit, whitespace, symbol, control, and non-ASCII ratios\n- Emoji ratio and invisible-character density\n- Unicode script distribution counts when enabled\n- Optional confusable/homoglyph detection and format-specific metrics for JSON, XML, and form payloads\n- Optional suspicious flags such as invalid UTF-8, zero-width characters, and mixed scripts\n\n## 🧭 Configuration\n\n`DefaultConfig()` returns a ready-to-use setup with bounded defaults:\n\n- `BodyConfig` limits read size, sample size, methods, and content types\n- `FingerprintConfig` controls headers, proxy trust, TLS metadata, and hash versioning\n- `ComplexityConfig` controls JSON depth, field limits, and supported content types\n- `CharsetConfig` controls text analysis size and suspicious-pattern detection\n\nTypical customization:\n\n```go\ncfg := webprofiler.DefaultConfig()\ncfg.Body.MaxReadBytes = 64 \u003c\u003c 10\ncfg.Body.SampleBytes = 8 \u003c\u003c 10\ncfg.Body.SampleStrategy = webprofiler.SampleStrategyHeadTail\ncfg.Body.EnableCompressedAnalysis = 
true\ncfg.Fingerprint.IncludeIP = true\ncfg.Fingerprint.TrustProxy = true\ncfg.Fingerprint.TrustedProxyCIDRs = []string{\"10.0.0.0/8\", \"192.168.0.0/16\"}\ncfg.Fingerprint.HashAlgorithm = \"sha512\"\ncfg.Fingerprint.ExposeFields = false\ncfg.Complexity.MaxJSONDepth = 16\ncfg.Complexity.EnableMultipartMeta = true\ncfg.Charset.EnableConfusableDetection = true\ncfg.Charset.EnableFormatSpecificMetrics = true\n```\n\n## Performance Notes\n\nThe current benchmark baseline is tracked in [`docs/benchmark_baseline.md`](docs/benchmark_baseline.md).\n\n- `MetaInfo.AnalysisDuration` records middleware analysis time as `time.Duration`\n- `MetaInfo` also exposes per-analyzer timings so you can see where request profiling time is spent\n- The example exposes both `analysis_duration` and `analysis_duration_ns` so you can read it directly or aggregate it precisely\n- The SHA-256 fingerprint step hashes a very small normalized string built from a few headers and optional TLS/IP fields, so in most cases it is not the main cost\n- Alternate fingerprint hash algorithms are available, but `sha256` remains the best default tradeoff for compatibility and stability\n- Compressed-body inspection is optional because it adds decompression work; enable it when encoded request payloads are common in your traffic\n- If you want the cheapest possible fingerprint output, set `Fingerprint.ExposeFields = false` to keep only hashes and source metadata\n- In practice, request-body capture, JSON parsing, and charset scanning are usually more expensive than the final SHA-256 call\n- If you run at very high QPS, benchmark with your own traffic and disable `EnableFingerprint`, `IncludeIP`, or `IncludeTLS` if you want an even cheaper profile\n\n## Example Response Fields\n\nThe native example at [`example/main.go`](example/main.go) returns a JSON payload. 
The following table maps each field to its meaning:\n\n| Field | Meaning |\n| --- | --- |\n| `path` | Request path seen by the middleware and handler. |\n| `body` | Request body re-read inside the handler, proving the middleware restored `r.Body`. |\n| `observed_bytes` | Number of body bytes actually observed before sampling. |\n| `header_count` | Number of request header entries counted by the middleware, including `Host`. |\n| `header_bytes` | Approximate size of header keys and values counted by the middleware. |\n| `entropy.Value` | Shannon entropy of the sampled body bytes. Higher usually means more byte diversity. |\n| `entropy.NormalizedValue` | Entropy normalized to an approximate `0..1` range by dividing by `8 bits/byte`. |\n| `entropy.SampledBytes` | Number of bytes used for entropy calculation. |\n| `entropy.TotalObservedBytes` | Number of body bytes observed by the middleware before sampling. |\n| `entropy.UniqueByteCount` | Number of distinct byte values seen in the sampled body. |\n| `entropy.RepetitionRatio` | Share of sampled bytes that repeat beyond their first occurrence. |\n| `entropy.CompressionRatio` | Approximate gzip-compressed size divided by sampled size. Lower often means more repetitive content. |\n| `entropy.ApproxCompressibility` | A convenience score derived from the compression ratio. Higher usually means easier-to-compress content. |\n| `entropy.SampleStrategy` | Sampling mode currently used for body analysis. |\n| `fingerprint.Fields` | Normalized fields used to build the request fingerprint. |\n| `fingerprint.SourceFlags` | Which input sources contributed to the fingerprint, for example `headers`, `tls`, or `ip`. |\n| `fingerprint.Hash` | Stable SHA-256 digest of the normalized fingerprint fields. |\n| `fingerprint.WeakHash` | Fingerprint hash that excludes more volatile inputs such as client IP. |\n| `fingerprint.StrongHash` | Fingerprint hash built from the full configured input set. 
|\n| `fingerprint.HashAlgorithm` | Fingerprint hash algorithm currently used. |\n| `fingerprint.HashVersion` | Fingerprint schema/version identifier. |\n| `complexity.ContentType` | Content type used for complexity analysis. |\n| `complexity.Depth` | Observed structural depth of the parsed body. Depth is counted on the recursive walk, so scalar leaf values increase the final depth level. |\n| `complexity.FieldCount` | Total number of parsed fields/values. |\n| `complexity.ObjectCount` | Number of JSON objects encountered. |\n| `complexity.ArrayCount` | Number of arrays encountered. |\n| `complexity.ScalarCount` | Number of non-container values such as strings, numbers, booleans, and `null`. |\n| `complexity.NullCount` | Number of `null` values seen in JSON. |\n| `complexity.StringCount` | Number of string values seen in the parsed payload. |\n| `complexity.UniqueKeyCount` | Number of keys encountered across parsed objects or form keys. |\n| `complexity.MaxArrayLength` | Longest array length seen in the body. |\n| `complexity.MaxObjectFields` | Largest number of fields found in a single object or form key set. |\n| `complexity.MaxKeyLength` | Longest key length seen during complexity analysis. |\n| `complexity.MaxStringLength` | Longest JSON string value length seen during complexity analysis. |\n| `complexity.MaxValueLength` | Longest form value length seen during complexity analysis. |\n| `complexity.AverageKeyLength` | Average key length for form inputs. |\n| `complexity.AverageValueLength` | Average value length for form inputs. |\n| `complexity.MultipartFileCount` | Number of uploaded files seen in multipart metadata mode. |\n| `complexity.MultipartFieldCount` | Number of non-file form fields seen in multipart metadata mode. |\n| `complexity.MultipartFileExtensions` | Count of file extensions seen across multipart uploads. |\n| `complexity.MultipartFileContentTypes` | Count of per-file content types seen in multipart uploads. 
|\n| `complexity.MultipartMaxFileNameLength` | Longest multipart file name length seen in the request. |\n| `complexity.Score` | Aggregate complexity score. |\n| `complexity.ScoreFactors` | Breakdown of how the complexity score was calculated. |\n| `charset.TotalChars` | Total characters scanned in the sampled text body. |\n| `charset.ASCIIAlphaRatio` | Ratio of ASCII letters in the sampled body. |\n| `charset.DigitRatio` | Ratio of digits in the sampled body. |\n| `charset.WhitespaceRatio` | Ratio of whitespace characters in the sampled body. |\n| `charset.SymbolRatio` | Ratio of punctuation and symbol characters in the sampled body. |\n| `charset.ControlCharRatio` | Ratio of control characters or invalid byte sequences. |\n| `charset.NonASCIIRatio` | Ratio of non-ASCII characters in the sampled body. |\n| `charset.EmojiRatio` | Ratio of emoji-like code points in the sampled text. |\n| `charset.InvisibleCharRatio` | Ratio of invisible characters such as zero-width marks or formatting controls. |\n| `charset.ConfusableCount` | Number of characters that match a built-in homoglyph/confusable set. |\n| `charset.UnicodeScriptCounts` | Count of characters per detected Unicode script when script detection is enabled. |\n| `charset.FormatMetrics` | Optional format-aware metrics for JSON, XML, or form payloads, such as token counts and repeated keys. |\n| `charset.SuspiciousFlags` | Optional markers such as invalid UTF-8, zero-width characters, or mixed scripts. |\n| `content_type` | Normalized request `Content-Type`. |\n| `content_length` | Request body length reported by the incoming request. |\n| `sampled` | Whether the middleware analyzed only a subset of the observed body. |\n| `sample_bytes` | Number of sampled bytes actually used by body analyzers. |\n| `body_truncated` | Whether body observation stopped at `MaxReadBytes`. |\n| `fingerprint_duration_ns` | Time spent building the request fingerprint. 
|\n| `body_capture_duration_ns` | Time spent capturing and replaying the request body. |\n| `entropy_duration_ns` | Time spent calculating body entropy. |\n| `complexity_duration_ns` | Time spent calculating complexity metrics. |\n| `charset_duration_ns` | Time spent calculating charset metrics. |\n| `analysis_duration` | Human-readable middleware analysis duration, for example `187.4µs`. |\n| `analysis_duration_ns` | Exact middleware analysis duration in nanoseconds, useful for metrics and aggregation. |\n\n## 🌟 Design Boundaries\n\nThis middleware:\n\n- analyzes requests, not responses\n- does not persist or export results\n- does not block traffic on analyzer failures\n- does not guarantee deep parsing for every content type\n- does not produce business risk decisions\n\n## Result Model\n\nEach request produces a `Profile`:\n\n```go\ntype Profile struct {\n\tMeta        MetaInfo\n\tEntropy     *EntropyResult\n\tFingerprint *FingerprintResult\n\tComplexity  *ComplexityResult\n\tCharset     *CharsetResult\n\tWarnings    []Warning\n}\n```\n\nAnalyzer results are optional pointers so handlers can distinguish between disabled, skipped, and populated modules.\n\nRepository layout is intentionally simple:\n\n- root package: stable middleware API for callers\n- `example/`: runnable demo program\n- `internal/policy`: config types and normalization\n- `internal/model`: profile and result types\n- `internal/core`: request capture, analyzers, and context plumbing\n\n## Testing\n\n```bash\ngo test ./...\n```\n\nThe test suite covers middleware behavior, body replay, config normalization, analyzer outputs, and warning paths.\n\n## 🐺 License\n\nThis project is open-sourced under the [MIT License](LICENSE), which permits commercial use, modification, and distribution provided that the original copyright notice and permission notice continue to be 
retained.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgofurry%2Fweb-profiler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgofurry%2Fweb-profiler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgofurry%2Fweb-profiler/lists"}