{"id":49781846,"url":"https://github.com/itaymendel/git-forensics","last_synced_at":"2026-05-11T21:41:41.969Z","repository":{"id":343056555,"uuid":"1125427754","full_name":"itaymendel/git-forensics","owner":"itaymendel","description":"A TypeScript library for providing insights from git commit history.","archived":false,"fork":false,"pushed_at":"2026-03-08T16:36:02.000Z","size":171,"stargazers_count":8,"open_issues_count":2,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-03-08T20:35:14.167Z","etag":null,"topics":["git","insights"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/itaymendel.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-12-30T17:53:59.000Z","updated_at":"2026-03-08T16:36:04.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/itaymendel/git-forensics","commit_stats":null,"previous_names":["itaymendel/git-forensics"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/itaymendel/git-forensics","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/itaymendel%2Fgit-forensics","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/itaymendel%2Fgit-forensics/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/itaymendel%2Fgit-forensics/releases","manifests_url":"https://repos.ecos
yste.ms/api/v1/hosts/GitHub/repositories/itaymendel%2Fgit-forensics/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/itaymendel","download_url":"https://codeload.github.com/itaymendel/git-forensics/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/itaymendel%2Fgit-forensics/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32914525,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-11T17:09:15.040Z","status":"ssl_error","status_checked_at":"2026-05-11T17:08:45.420Z","response_time":120,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["git","insights"],"created_at":"2026-05-11T21:41:41.322Z","updated_at":"2026-05-11T21:41:41.963Z","avatar_url":"https://github.com/itaymendel.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# git-forensics\n\nA TypeScript library for providing insights from git commit history.\n\n## Features\n\n- **Actionable insights**\n- **Fast - ~700ms for 100,000 commits (getting the git-log will be slow)**\n- **Follows file rename and removal**\n- **Optimized for CI**\n- **Percentile-based classification** — self-calibrating thresholds that work across any codebase size\n- **Composite risk scoring** — weighted multi-metric risk scores per file\n- 
**Integrated (VERY basic) [code complexity engine](https://github.com/itaymendel/indent-complexity)**\n- **Bring your own code complexity score**\n- **Add custom metrics using full temporal history**\n\n## Motivation\n\nExisting git analysis tools ([code-maat](https://github.com/adamtornhill/code-maat), [git-of-theseus](https://github.com/erikbern/git-of-theseus), [Hercules](https://github.com/src-d/hercules), etc.) are great for reports but feel heavy as a backend for developer tools. This library is designed to be lightweight, fast, and embeddable.\n\n\u003e **Tip:** Focus on recent history (6-9 months). While the library handles renames and long histories correctly, older data tends to add noise.\n\n## Installation\n\n```bash\nnpm install git-forensics\n```\n\n## Quick Start\n\n```typescript\nimport { simpleGit } from 'simple-git';\nimport { computeForensics } from 'git-forensics';\n\nconst git = simpleGit('/path/to/repo');\nconst forensics = await computeForensics(git);\n\nforensics.hotspots; // Files changed most often\nforensics.churn; // Code volatility (lines added/deleted)\nforensics.coupledPairs; // Hidden dependencies\nforensics.couplingRankings; // Architectural hubs\nforensics.codeAge; // Stale code detection\nforensics.ownership; // Knowledge silos\nforensics.communication; // Developer coordination needs\nforensics.topContributors; // Per-file contributor breakdown\n```\n\n## Example Output\n\nRunning `computeForensics` on a repository returns structured data across all metrics:\n\n```jsonc\n{\n  \"analyzedCommits\": 842,\n  \"dateRange\": { \"from\": \"2024-03-10\", \"to\": \"2025-01-15\" },\n  \"metadata\": { \"totalFilesAnalyzed\": 134, \"totalAuthors\": 12 },\n\n  \"hotspots\": [\n    { \"file\": \"src/api/routes.ts\", \"revisions\": 87, \"exists\": true },\n    { \"file\": \"src/core/engine.ts\", \"revisions\": 64, \"exists\": true },\n  ],\n\n  \"coupledPairs\": [\n    {\n      \"file1\": \"src/api/routes.ts\",\n      \"file2\": 
\"src/api/middleware.ts\",\n      \"couplingPercent\": 82,\n      \"coChanges\": 34,\n    },\n  ],\n\n  \"ownership\": [\n    {\n      \"file\": \"src/core/engine.ts\",\n      \"mainDev\": \"alice\",\n      \"ownershipPercent\": 34,\n      \"fractalValue\": 0.18,\n      \"authorCount\": 7,\n    },\n  ],\n\n  // ... plus churn, codeAge, couplingRankings, communication, topContributors\n}\n```\n\nPassing the result to `generateInsights` produces actionable alerts:\n\n```jsonc\n[\n  {\n    \"file\": \"src/core/engine.ts\",\n    \"type\": \"hotspot\",\n    \"severity\": \"critical\",\n    \"data\": {\n      \"type\": \"hotspot\",\n      \"revisions\": 64,\n      \"rank\": 2,\n      \"percentile\": 95,\n    },\n    \"fragments\": {\n      \"title\": \"Hotspot\",\n      \"finding\": \"64 revisions (P95), ranked #2 in repository\",\n      \"risk\": \"Top-ranked churn file — prioritize for refactoring or test hardening\",\n      \"suggestion\": \"Consider breaking into smaller modules or adding test coverage\",\n    },\n  },\n  {\n    \"file\": \"src/core/engine.ts\",\n    \"type\": \"ownership-risk\",\n    \"severity\": \"critical\",\n    \"data\": {\n      \"type\": \"ownership-risk\",\n      \"fractalValue\": 0.18,\n      \"authorCount\": 7,\n      \"mainDev\": \"alice\",\n      \"percentile\": 92,\n    },\n    \"fragments\": {\n      \"title\": \"Fragmented Ownership\",\n      \"finding\": \"7 contributors, fragmentation score 0.18 (P92)\",\n      \"risk\": \"Diffuse ownership slows review cycles and increases merge conflicts\",\n      \"suggestion\": \"Request review from alice (primary contributor)\",\n    },\n  },\n  // ... 
insights generated for each metric that exceeds thresholds\n]\n```\n\n## Actionable Insights\n\n`generateInsights` transforms metrics into alerts with severity (`warning`, `critical`) and human-readable fragments (`title`, `finding`, `risk`, `suggestion`).\n\nInsights use **percentile-based thresholds** — a file is flagged based on where it ranks relative to other files in the same repository. This makes thresholds self-calibrating across codebases of any size.\n\n### Insight thresholds\n\n| Question                            | Metric             | Insight triggers when                          |\n| ----------------------------------- | ------------------ | ---------------------------------------------- |\n| Where's the riskiest code?          | `hotspots`         | Revisions in P75+ (warning) or P90+ (critical) |\n| What keeps getting rewritten?       | `churn`            | Churn in P75+ or P90+                          |\n| What hidden dependencies exist?     | `coupledPairs`     | ≥70% co-change rate (absolute, not percentile) |\n| What has ripple effects?            | `couplingRankings` | Coupling score in P75+ or P90+                 |\n| What's been forgotten?              | `codeAge`          | Age in P75+ or P90+                            |\n| Who owns what? Any knowledge silos? 
| `ownership`        | ≥3 authors, fragmentation in P75+ or P90+      |\n\nAll thresholds are overridable — pass a partial `thresholds` object and only the values you specify will change:\n\n```typescript\nconst insights = generateInsights(forensics, {\n  thresholds: {\n    hotspot: { warning: 80, critical: 95 }, // percentile cutoffs\n    churn: { warning: 80 },\n    staleCode: { warning: 60, critical: 85 },\n    coupling: { minPercent: 80 }, // stays absolute — not percentile-based\n    ownershipRisk: { warning: 70, critical: 90, minAuthors: 4 },\n    couplingScore: { warning: 80, critical: 95 },\n  },\n});\n```\n\n### Analysis options\n\nThe analysis pipeline has its own configurable thresholds that control what data is collected:\n\n```typescript\nconst forensics = await computeForensics(git, {\n  maxFilesPerCommit: 50, // skip large commits from coupling analysis (default: 50)\n  minCoChanges: 3, // minimum co-changes to report a coupled pair (default: 3)\n  minCouplingPercent: 30, // minimum coupling % to report a pair (default: 30)\n  minSharedEntities: 2, // minimum shared files for communication pairs (default: 2)\n});\n```\n\nThese options are also available on `computeForensicsFromData()`.\n\n### Build your own insights\n\n`forensics.stats` contains the complete temporal history—every commit, by every author, for every file. Access `stats.fileStats[file].byAuthor`, `authorContributions`, `nameHistory`, etc. 
to build custom metrics like temporal histograms, expertise scores, or handoff detection.\n\n## Composite Risk Score\n\n`computeRiskScores` produces a single 0-100 risk score per file by combining percentile ranks across all metrics with configurable weights:\n\n```typescript\nimport { computeRiskScores } from 'git-forensics';\n\nconst scores = computeRiskScores(forensics);\n// [\n//   { file: 'src/core/engine.ts', riskScore: 87.5, breakdown: { revisions: 22.5, churn: 25, ownershipRisk: 18, age: 12, couplingScore: 10 } },\n//   { file: 'src/api/routes.ts', riskScore: 72.0, breakdown: { ... } },\n//   ...\n// ]\n```\n\nDefault weights:\n\n| Metric         | Weight |\n| -------------- | ------ |\n| Revisions      | 0.25   |\n| Churn          | 0.25   |\n| Ownership Risk | 0.20   |\n| Age            | 0.15   |\n| Coupling Score | 0.15   |\n\nOverride weights to match your priorities:\n\n```typescript\nconst scores = computeRiskScores(forensics, {\n  revisions: 0.4,\n  churn: 0.3,\n  ownershipRisk: 0.1,\n  age: 0.1,\n  couplingScore: 0.1,\n});\n```\n\n## File Metrics with Percentiles\n\n`extractFileMetrics` flattens forensics into per-file rows for storage. 
Pass `includePercentiles: true` to enrich each row with percentile ranks and a composite risk score:\n\n```typescript\nimport { extractFileMetrics } from 'git-forensics';\n\nconst metrics = extractFileMetrics(forensics, { includePercentiles: true });\n// Each entry includes:\n// {\n//   file, revisions, ageMonths, churn, fractalValue, ...\n//   percentiles: { revisions: 90, churn: 75, ownershipRisk: 85, ageMonths: 60, couplingScore: 40 },\n//   riskScore: 72.5,\n// }\n```\n\n## Percentile Utilities\n\nThe underlying percentile functions are exported so you can build custom scoring:\n\n```typescript\nimport {\n  percentileRank,\n  createPercentileRanker,\n  createInvertedPercentileRanker,\n} from 'git-forensics';\n\n// One-off calculation\npercentileRank(50, [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]); // 45\n\n// Reusable ranker for repeated lookups\nconst rank = createPercentileRanker([10, 20, 30, 40, 50]);\nrank(30); // 50\nrank(50); // 90\n\n// Inverted ranker (lower values = higher percentile)\nconst riskRank = createInvertedPercentileRanker([0.1, 0.3, 0.5, 0.7, 0.9]);\nriskRank(0.1); // 90 (lowest value = highest risk)\n```\n\n## Complexity Analysis\n\ngit-forensics separates commit analysis from static code analysis. It provides optional complexity helpers for convenience (using [`indent-complexity`](https://github.com/itaymendel/indent-complexity)).\nWe recommend using a language-aware complexity scorer and passing its results to `computeForensics`.\n\n## CI Usage\n\n### Building a report\n\nLoop over insights and build a PR comment or CI annotation:\n\n```typescript\nconst insights = generateInsights(forensics, { minSeverity: 'warning' });\n\nfor (const insight of insights) {\n  const prefix = insight.severity === 'critical' ? 
'[CRITICAL]' : '[WARNING]';\n  console.log(`${prefix} ${insight.file} - ${insight.fragments.title}`);\n  console.log(`  ${insight.fragments.finding}`);\n  console.log(`  ${insight.fragments.suggestion}\\n`);\n}\n```\n\n### Optimization: Store \u0026 Reuse (large codebases)\n\nFor very large repos, store the `computeForensics` result between runs and pass the stored result to `generateInsights`, so no full history scan is needed:\n\n```typescript\nimport { simpleGit } from 'simple-git';\nimport { generateInsights, getChangedFiles } from 'git-forensics';\n\n// Only needed to list the files changed in the PR\nconst git = simpleGit('/path/to/repo');\n\n// Fetch pre-computed forensics from your server/cache\nconst forensics = await fetch('https://your-server/api/forensics?repo=org/repo').then((r) =\u003e\n  r.json()\n);\n\n// Generate insights only for the files changed in the PR\nconst changedFiles = await getChangedFiles(git, 'origin/main');\nconst insights = generateInsights(forensics, { files: changedFiles, minSeverity: 'warning' });\n```\n\n## Data-Driven API\n\nFor environments without direct git access, use `computeForensicsFromData()` with pre-fetched git data:\n\n```typescript\nimport { computeForensicsFromData, gitLogDataSchema, validateGitLogData } from 'git-forensics';\n\n// Data must match the following format\nconst data = {\n  log: {\n    all: [\n      {\n        hash: 'abc123',\n        date: '2025-01-15T10:00:00Z',\n        author_name: 'Alice',\n        message: 'Add feature',\n        diff: {\n          files: [\n            { file: 'src/app.ts', insertions: 50, deletions: 10 },\n            { file: 'src/utils.ts', insertions: 20, deletions: 5 },\n          ],\n        },\n      },\n      // ... 
more commits\n    ],\n  },\n  trackedFiles: 'src/app.ts\\nsrc/utils.ts\\nsrc/index.ts', // from git ls-files\n};\n\n// Print JSON-schema if needed\nconsole.log(gitLogDataSchema); // JSON Schema object\n\n// Validate before processing\nvalidateGitLogData(data); // throws if invalid\n\nconst forensics = computeForensicsFromData(data);\n```\n\n## Migration from v1.x\n\nv2.0.0 replaces absolute thresholds with percentile-based classification. Key changes:\n\n- **`InsightThresholds`** values are now percentile cutoffs (0-100), not raw metric values\n- **`InsightData`** variants (except `coupling`) include a `percentile` field\n- **Stale-code severity** changed from `info`/`warning` to `warning`/`critical`\n- **Finding strings** now include `(Pxx)` percentile annotations\n- **Generator function signatures** added a `percentileRank` parameter (affects direct generator importers)\n- New exports: `computeRiskScores`, `DEFAULT_RISK_WEIGHTS`, `percentileRank`, `createPercentileRanker`, `createInvertedPercentileRanker`\n- New types: `PercentileThresholds`, `RiskWeights`, `FileRiskScore`, `ExtractFileMetricsOptions`\n\n## Attribution\n\nBased on concepts from Adam Tornhill's [Your Code as a Crime Scene](https://pragprog.com/titles/atcrime2/your-code-as-a-crime-scene-second-edition/) and [Software Design X-Rays](https://pragprog.com/titles/atevol/software-design-x-rays/).\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fitaymendel%2Fgit-forensics","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fitaymendel%2Fgit-forensics","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fitaymendel%2Fgit-forensics/lists"}