https://github.com/itaymendel/git-forensics

A TypeScript library for providing insights from git commit history.
Host: GitHub
URL: https://github.com/itaymendel/git-forensics
Owner: itaymendel
License: mit
Created: 2025-12-30T17:53:59.000Z (7 months ago)
Default Branch: main
Last Pushed: 2026-03-08T16:36:02.000Z (5 months ago)
Last Synced: 2026-03-08T20:35:14.167Z (5 months ago)
Topics: git, insights
Language: TypeScript
Homepage:
Size: 167 KB
Stars: 8
Watchers: 0
Forks: 0
Open Issues: 2
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE

# git-forensics

A TypeScript library for providing insights from git commit history.

## Features

- **Actionable insights**

- **Fast: ~700ms for 100,000 commits (not counting the git log fetch itself, which is slow)**

- **Follows file renames and removals**

- **Optimized for CI**

- **Percentile-based classification** — self-calibrating thresholds that work across any codebase size

- **Composite risk scoring** — weighted multi-metric risk scores per file

- **Integrated (very basic) [code complexity engine](https://github.com/itaymendel/indent-complexity)**

- **Bring your own code complexity score**

- **Add custom metrics using full temporal history**

## Motivation

Existing git analysis tools ([code-maat](https://github.com/adamtornhill/code-maat), [git-of-theseus](https://github.com/erikbern/git-of-theseus), [Hercules](https://github.com/src-d/hercules), etc.) are great for reports but feel heavy as a backend for dev-tools. This library is designed to be lightweight, fast, and embeddable.

> **Tip:** Focus on recent history (6-9 months). While the library handles renames and long histories correctly, older data tends to add noise.

## Installation

```bash
npm install git-forensics
```

## Quick Start

```typescript
import { simpleGit } from 'simple-git';
import { computeForensics } from 'git-forensics';

const git = simpleGit('/path/to/repo');
const forensics = await computeForensics(git);

forensics.hotspots; // Files changed most often
forensics.churn; // Code volatility (lines added/deleted)
forensics.coupledPairs; // Hidden dependencies
forensics.couplingRankings; // Architectural hubs
forensics.codeAge; // Stale code detection
forensics.ownership; // Knowledge silos
forensics.communication; // Developer coordination needs
forensics.topContributors; // Per-file contributor breakdown
```

## Example Output

Running `computeForensics` on a repository returns structured data across all metrics:

```jsonc
{
  "analyzedCommits": 842,
  "dateRange": { "from": "2024-03-10", "to": "2025-01-15" },
  "metadata": { "totalFilesAnalyzed": 134, "totalAuthors": 12 },
  "hotspots": [
    { "file": "src/api/routes.ts", "revisions": 87, "exists": true },
    { "file": "src/core/engine.ts", "revisions": 64, "exists": true },
  ],
  "coupledPairs": [
    {
      "file1": "src/api/routes.ts",
      "file2": "src/api/middleware.ts",
      "couplingPercent": 82,
      "coChanges": 34,
    },
  ],
  "ownership": [
    {
      "file": "src/core/engine.ts",
      "mainDev": "alice",
      "ownershipPercent": 34,
      "fractalValue": 0.18,
      "authorCount": 7,
    },
  ],
  // ... plus churn, codeAge, couplingRankings, communication, topContributors
}
```

Passing the result to `generateInsights` produces actionable alerts:

```jsonc
[
  {
    "file": "src/core/engine.ts",
    "type": "hotspot",
    "severity": "critical",
    "data": {
      "type": "hotspot",
      "revisions": 64,
      "rank": 2,
      "percentile": 95,
    },
    "fragments": {
      "title": "Hotspot",
      "finding": "64 revisions (P95), ranked #2 in repository",
      "risk": "Top-ranked churn file — prioritize for refactoring or test hardening",
      "suggestion": "Consider breaking into smaller modules or adding test coverage",
    },
  },
  {
    "file": "src/core/engine.ts",
    "type": "ownership-risk",
    "severity": "critical",
    "data": {
      "type": "ownership-risk",
      "fractalValue": 0.18,
      "authorCount": 7,
      "mainDev": "alice",
      "percentile": 92,
    },
    "fragments": {
      "title": "Fragmented Ownership",
      "finding": "7 contributors, fragmentation score 0.18 (P92)",
      "risk": "Diffuse ownership slows review cycles and increases merge conflicts",
      "suggestion": "Request review from alice (primary contributor)",
    },
  },
  // ... insights generated for each metric that exceeds thresholds
]
```

## Actionable Insights

`generateInsights` transforms metrics into alerts with severity (`warning`, `critical`) and human-readable fragments (`title`, `finding`, `risk`, `suggestion`).

Insights use **percentile-based thresholds** — a file is flagged based on where it ranks relative to other files in the same repository. This makes thresholds self-calibrating across codebases of any size.

### Insight thresholds

| Question                            | Metric             | Insight triggers when                          |
| ----------------------------------- | ------------------ | ---------------------------------------------- |
| Where's the riskiest code?          | `hotspots`         | Revisions in P75+ (warning) or P90+ (critical) |
| What keeps getting rewritten?       | `churn`            | Churn in P75+ or P90+                          |
| What hidden dependencies exist?     | `coupledPairs`     | ≥70% co-change rate (absolute, not percentile) |
| What has ripple effects?            | `couplingRankings` | Coupling score in P75+ or P90+                 |
| What's been forgotten?              | `codeAge`          | Age in P75+ or P90+                            |
| Who owns what? Any knowledge silos? | `ownership`        | ≥3 authors, fragmentation in P75+ or P90+      |
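
As a rough illustration of the table above, the sketch below maps a percentile rank to a severity using the default P75/P90 cutoffs. `classifySeverity` and `PercentileCutoffs` are hypothetical names, not exports of git-forensics, and treating "P75+" as "greater than or equal to 75" is an assumption.

```typescript
// Hypothetical helper, not part of git-forensics: maps a percentile rank
// to a severity using warning/critical cutoffs like those in the table.
type Severity = 'warning' | 'critical';

interface PercentileCutoffs {
  warning: number; // e.g. 75 for "P75+"
  critical: number; // e.g. 90 for "P90+"
}

function classifySeverity(
  percentile: number,
  cutoffs: PercentileCutoffs = { warning: 75, critical: 90 }
): Severity | null {
  // Assumption: cutoffs are inclusive lower bounds.
  if (percentile >= cutoffs.critical) return 'critical';
  if (percentile >= cutoffs.warning) return 'warning';
  return null; // below the warning cutoff: no insight is raised
}

console.log(classifySeverity(95)); // critical
console.log(classifySeverity(80)); // warning
console.log(classifySeverity(40)); // null
```

Because the input is a rank within the repository rather than a raw count, the same cutoffs apply whether the busiest file has 20 revisions or 2,000.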

All thresholds are overridable — pass a partial `thresholds` object and only the values you specify will change:

```typescript
const insights = generateInsights(forensics, {
  thresholds: {
    hotspot: { warning: 80, critical: 95 }, // percentile cutoffs
    churn: { warning: 80 },
    staleCode: { warning: 60, critical: 85 },
    coupling: { minPercent: 80 }, // stays absolute (not percentile-based)
    ownershipRisk: { warning: 70, critical: 90, minAuthors: 4 },
    couplingScore: { warning: 80, critical: 95 },
  },
});
```
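
The partial-override behaviour described above can be pictured as a per-metric shallow merge. This is only a sketch: `DEFAULTS`, `mergeThresholds`, and the two metric keys shown are illustrative stand-ins, and the library's real default object and merge logic may differ.

```typescript
// Hypothetical defaults and merge helper for illustration only.
interface Cutoffs {
  warning: number;
  critical: number;
}

type Thresholds = Record<string, Cutoffs>;
type ThresholdOverrides = Record<string, Partial<Cutoffs>>;

const DEFAULTS: Thresholds = {
  hotspot: { warning: 75, critical: 90 },
  churn: { warning: 75, critical: 90 },
};

function mergeThresholds(defaults: Thresholds, overrides: ThresholdOverrides): Thresholds {
  const merged: Thresholds = {};
  for (const [key, base] of Object.entries(defaults)) {
    // Spread order matters: overridden values win, everything else keeps its default.
    merged[key] = { ...base, ...overrides[key] };
  }
  return merged;
}

const merged = mergeThresholds(DEFAULTS, { churn: { warning: 80 } });
console.log(merged.churn); // warning is overridden, critical keeps its default
console.log(merged.hotspot); // untouched
```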

### Analysis options

The analysis pipeline has its own configurable thresholds that control what data is collected:

```typescript
const forensics = await computeForensics(git, {
  maxFilesPerCommit: 50, // skip large commits from coupling analysis (default: 50)
  minCoChanges: 3, // minimum co-changes to report a coupled pair (default: 3)
  minCouplingPercent: 30, // minimum coupling % to report a pair (default: 30)
  minSharedEntities: 2, // minimum shared files for communication pairs (default: 2)
});
```

These options are also available on `computeForensicsFromData()`.
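
To show how the three coupling options could interact, here is a minimal sketch of a change-coupling pass over per-commit file lists. The coupling formula (co-changes divided by the average revision count of the two files) follows the code-maat convention and is an assumption; the library's own formula is not stated here. `findCoupledPairs` is a hypothetical name.

```typescript
// Illustrative only: not git-forensics' implementation.
interface CouplingOptions {
  maxFilesPerCommit: number;
  minCoChanges: number;
  minCouplingPercent: number;
}

interface CoupledPair {
  file1: string;
  file2: string;
  coChanges: number;
  couplingPercent: number;
}

function findCoupledPairs(commits: string[][], opts: CouplingOptions): CoupledPair[] {
  const revisions = new Map<string, number>();
  const pairs = new Map<string, CoupledPair>();

  commits.forEach((files) => {
    const unique = Array.from(new Set(files)).sort();
    // Large commits (bulk renames, formatting runs) would pair everything with everything.
    // Assumption: skipped commits also do not count toward the revision totals used here.
    if (unique.length > opts.maxFilesPerCommit) return;
    unique.forEach((file1, i) => {
      revisions.set(file1, (revisions.get(file1) ?? 0) + 1);
      unique.slice(i + 1).forEach((file2) => {
        const key = `${file1}\n${file2}`;
        const pair = pairs.get(key) ?? { file1, file2, coChanges: 0, couplingPercent: 0 };
        pair.coChanges += 1;
        pairs.set(key, pair);
      });
    });
  });

  const result: CoupledPair[] = [];
  pairs.forEach((pair) => {
    const avgRevisions =
      ((revisions.get(pair.file1) ?? 0) + (revisions.get(pair.file2) ?? 0)) / 2;
    pair.couplingPercent = Math.round((pair.coChanges / avgRevisions) * 100);
    if (pair.coChanges >= opts.minCoChanges && pair.couplingPercent >= opts.minCouplingPercent) {
      result.push(pair);
    }
  });
  return result.sort((a, b) => b.couplingPercent - a.couplingPercent);
}
```

Raising `minCoChanges` filters out pairs that happened to change together once or twice, while `minCouplingPercent` filters out pairs where one file changes constantly and the other is only occasionally along for the ride.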

### Build your own insights

`forensics.stats` contains the complete temporal history—every commit, by every author, for every file. Access `stats.fileStats[file].byAuthor`, `authorContributions`, `nameHistory`, etc. to build custom metrics like temporal histograms, expertise scores, or handoff detection.
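
As one example of a custom metric, the sketch below estimates a per-file "bus factor": the smallest number of authors who together account for a given share of a file's revisions. The `byAuthor` shape used here (author name mapped to an object with a `revisions` count) is an assumption made for illustration; inspect the real `forensics.stats` object before relying on these field names. `busFactor` is a hypothetical helper.

```typescript
// Assumed shape, for illustration only; the real stats object may differ.
interface AssumedFileStats {
  byAuthor: Record<string, { revisions: number }>;
}

function busFactor(stats: AssumedFileStats, coverage = 0.5): number {
  const counts = Object.values(stats.byAuthor)
    .map((author) => author.revisions)
    .sort((a, b) => b - a); // most active authors first
  const total = counts.reduce((sum, n) => sum + n, 0);
  if (total === 0) return 0;

  let covered = 0;
  let authors = 0;
  for (const count of counts) {
    covered += count;
    authors += 1;
    if (covered >= coverage * total) break;
  }
  return authors;
}

const sample: AssumedFileStats = {
  byAuthor: { alice: { revisions: 6 }, bob: { revisions: 3 }, carol: { revisions: 1 } },
};
console.log(busFactor(sample)); // 1: alice alone covers at least half of the revisions
console.log(busFactor(sample, 0.8)); // 2: alice and bob together cover 9 of 10
```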

## Composite Risk Score

`computeRiskScores` produces a single 0-100 risk score per file by combining percentile ranks across all metrics with configurable weights:

```typescript
import { computeRiskScores } from 'git-forensics';

const scores = computeRiskScores(forensics);
// [
//   { file: 'src/core/engine.ts', riskScore: 87.5, breakdown: { revisions: 22.5, churn: 25, ownershipRisk: 18, age: 12, couplingScore: 10 } },
//   { file: 'src/api/routes.ts', riskScore: 72.0, breakdown: { ... } },
//   ...
// ]
```

Default weights:

| Metric         | Weight |
| -------------- | ------ |
| Revisions      | 0.25   |
| Churn          | 0.25   |
| Ownership Risk | 0.20   |
| Age            | 0.15   |
| Coupling Score | 0.15   |

Override weights to match your priorities:

```typescript
const scores = computeRiskScores(forensics, {
  revisions: 0.4,
  churn: 0.3,
  ownershipRisk: 0.1,
  age: 0.1,
  couplingScore: 0.1,
});
```
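
In the example output earlier in this section, the `breakdown` values sum to the `riskScore` (22.5 + 25 + 18 + 12 + 10 = 87.5), which is consistent with a plain weighted sum of percentile ranks. The sketch below works through that arithmetic under that assumption; `weightedRisk` and `WEIGHTS` are hypothetical local names, and the library's internals may differ.

```typescript
// Sketch assuming: riskScore = sum over metrics of (weight * percentile rank).
type Metric = 'revisions' | 'churn' | 'ownershipRisk' | 'age' | 'couplingScore';
type Weights = Record<Metric, number>;

// Local copy of the default weights from the table above (they sum to 1).
const WEIGHTS: Weights = {
  revisions: 0.25,
  churn: 0.25,
  ownershipRisk: 0.2,
  age: 0.15,
  couplingScore: 0.15,
};

function weightedRisk(percentiles: Record<Metric, number>, weights: Weights = WEIGHTS) {
  const breakdown = {} as Record<Metric, number>;
  let riskScore = 0;
  for (const metric of Object.keys(weights) as Metric[]) {
    breakdown[metric] = weights[metric] * percentiles[metric];
    riskScore += breakdown[metric];
  }
  return { riskScore, breakdown };
}

const example = weightedRisk({
  revisions: 90,
  churn: 80,
  ownershipRisk: 70,
  age: 60,
  couplingScore: 40,
});
// 22.5 + 20 + 14 + 9 + 6 = 71.5
console.log(example.riskScore.toFixed(1), example.breakdown);
```

With weights that sum to 1 and percentiles in the 0-100 range, the result stays within 0-100; custom weights that do not sum to 1 would scale the score accordingly in this sketch.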

## File Metrics with Percentiles

`extractFileMetrics` flattens forensics into per-file rows for storage. Pass `includePercentiles: true` to enrich each row with percentile ranks and a composite risk score:

```typescript
import { extractFileMetrics } from 'git-forensics';

const metrics = extractFileMetrics(forensics, { includePercentiles: true });
// Each entry includes:
// {
//   file, revisions, ageMonths, churn, fractalValue, ...
//   percentiles: { revisions: 90, churn: 75, ownershipRisk: 85, ageMonths: 60, couplingScore: 40 },
//   riskScore: 72.5,
// }
```

## Percentile Utilities

The underlying percentile functions are exported for building custom scoring:

```typescript
import {
  percentileRank,
  createPercentileRanker,
  createInvertedPercentileRanker,
} from 'git-forensics';

// One-off calculation
percentileRank(50, [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]); // 45

// Reusable ranker for repeated lookups
const rank = createPercentileRanker([10, 20, 30, 40, 50]);
rank(30); // 50
rank(50); // 90

// Inverted ranker (lower values = higher percentile)
const riskRank = createInvertedPercentileRanker([0.1, 0.3, 0.5, 0.7, 0.9]);
riskRank(0.1); // 90 (lowest value = highest risk)
```
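
The values in the comments above are consistent with a midpoint (mid-rank) definition of percentile rank: the share of values strictly below the target plus half the share equal to it. The sketch below reproduces those numbers under that assumption. `midpointPercentileRank` and `invertedMidpointRank` are hypothetical names, and whether the library computes its ranks exactly this way is not confirmed here.

```typescript
// Midpoint percentile rank: (count below + 0.5 * count equal) / n * 100.
function midpointPercentileRank(value: number, values: number[]): number {
  if (values.length === 0) return 0;
  let below = 0;
  let equal = 0;
  for (const v of values) {
    if (v < value) below += 1;
    else if (v === value) equal += 1;
  }
  // Multiply before dividing to keep the arithmetic exact for small inputs.
  return ((below + 0.5 * equal) * 100) / values.length;
}

// Inverted variant: lower raw values map to higher percentiles.
function invertedMidpointRank(value: number, values: number[]): number {
  return 100 - midpointPercentileRank(value, values);
}

console.log(midpointPercentileRank(50, [10, 20, 30, 40, 50, 60, 70, 80, 90, 100])); // 45
console.log(midpointPercentileRank(30, [10, 20, 30, 40, 50])); // 50
console.log(midpointPercentileRank(50, [10, 20, 30, 40, 50])); // 90
console.log(invertedMidpointRank(0.1, [0.1, 0.3, 0.5, 0.7, 0.9])); // 90
```

One consequence of the midpoint definition is that the top value in a list never reaches 100 and the bottom value never reaches 0, so ties and extremes are treated symmetrically.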

## Complexity Analysis

git-forensics separates commit analysis from static code analysis. It provides optional complexity helpers for convenience (using [`indent-complexity`](https://github.com/itaymendel/indent-complexity)).

For better results, use a language-aware complexity scorer and pass its results to `computeForensics`.
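
For intuition about what an indentation-based score measures, here is a deliberately simplified sketch: it treats each line's indentation depth as a proxy for nesting. This is not the `indent-complexity` package's API or algorithm; `indentationComplexity` is a hypothetical helper, and the two-spaces-per-level default is an assumption.

```typescript
// Simplified indentation proxy, for illustration only.
function indentationComplexity(source: string, spacesPerLevel = 2) {
  let lines = 0;
  let total = 0;
  let max = 0;
  source.split('\n').forEach((line) => {
    if (line.trim() === '') return; // blank lines carry no nesting information
    const leading = line.slice(0, line.length - line.trimStart().length);
    // Count a tab as one full indentation level.
    const width = leading.replace(/\t/g, ' '.repeat(spacesPerLevel)).length;
    const level = Math.floor(width / spacesPerLevel);
    lines += 1;
    total += level;
    max = Math.max(max, level);
  });
  return { lines, total, max, mean: lines === 0 ? 0 : total / lines };
}

const snippet = 'function f() {\n  if (x) {\n    return 1;\n  }\n}\n';
console.log(indentationComplexity(snippet)); // levels per line: 0, 1, 2, 1, 0
```

A proxy like this is language-agnostic but blind to syntax, which is why a language-aware scorer is the better choice when one is available.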

## CI Usage

### Building a report

Loop over insights and build a PR comment or CI annotation:

```typescript
const insights = generateInsights(forensics, { minSeverity: 'warning' });

for (const insight of insights) {
  const prefix = insight.severity === 'critical' ? '[CRITICAL]' : '[WARNING]';
  console.log(`${prefix} ${insight.file} - ${insight.fragments.title}`);
  console.log(`  ${insight.fragments.finding}`);
  console.log(`  ${insight.fragments.suggestion}\n`);
}
```
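
For a PR comment rather than console output, the same fields can be rendered as a Markdown table. The sketch below uses a local `InsightLike` type that mirrors the insight fields shown in the example output in this README; `renderMarkdownReport` is a hypothetical helper, not a library export.

```typescript
// Local type mirroring the documented insight fields; not imported from the library.
interface InsightLike {
  file: string;
  severity: 'warning' | 'critical';
  fragments: { title: string; finding: string; risk: string; suggestion: string };
}

function renderMarkdownReport(insights: InsightLike[]): string {
  if (insights.length === 0) return 'No forensics insights for the changed files.';
  const rank = (i: InsightLike): number => (i.severity === 'critical' ? 0 : 1);
  const rows = insights
    .slice() // avoid mutating the caller's array
    .sort((a, b) => rank(a) - rank(b)) // critical findings first
    .map(
      (i) => `| ${i.severity} | \`${i.file}\` | ${i.fragments.title} | ${i.fragments.finding} |`
    );
  return ['| Severity | File | Insight | Finding |', '| --- | --- | --- | --- |', ...rows].join('\n');
}
```

The returned string can be posted through whatever mechanism your CI provider offers for PR comments or job summaries.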

### Optimization: Store & Reuse (large codebases)

For very large repos, store the `computeForensics` result between runs and rehydrate with `generateInsights` — no git scan needed:

```typescript
import { simpleGit } from 'simple-git';
import { generateInsights, getChangedFiles } from 'git-forensics';

// Fetch pre-computed forensics from your server/cache
const forensics = await fetch('https://your-server/api/forensics?repo=org/repo').then((r) =>
  r.json()
);

// Generate insights only for PR changed files
const git = simpleGit('/path/to/repo'); // only used to list the changed files
const changedFiles = await getChangedFiles(git, 'origin/main');
const insights = generateInsights(forensics, { files: changedFiles, minSeverity: 'warning' });
```

## Data-Driven API

For environments without direct git access, use `computeForensicsFromData()` with pre-fetched git data:

```typescript
import { computeForensicsFromData, gitLogDataSchema, validateGitLogData } from 'git-forensics';

// Data must match the following format
const data = {
  log: {
    all: [
      {
        hash: 'abc123',
        date: '2025-01-15T10:00:00Z',
        author_name: 'Alice',
        message: 'Add feature',
        diff: {
          files: [
            { file: 'src/app.ts', insertions: 50, deletions: 10 },
            { file: 'src/utils.ts', insertions: 20, deletions: 5 },
          ],
        },
      },
      // ... more commits
    ],
  },
  trackedFiles: 'src/app.ts\nsrc/utils.ts\nsrc/index.ts', // from git ls-files
};

// Print JSON-schema if needed
console.log(gitLogDataSchema); // JSON Schema object

// Validate before processing
validateGitLogData(data); // throws if invalid

const forensics = computeForensicsFromData(data);
```
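
If the pre-fetched data comes from raw `git log` text, something has to turn that text into the commit objects shown above. The sketch below is one possible parser, assuming the log was produced with `git log --numstat --format=%x1e%H%x1f%aI%x1f%an%x1f%s` (record and unit separator bytes between fields). That command, `parseNumstatLog`, and the handling of binary files are choices made for this illustration; only the output field names are taken from the example above. Rename entries such as `old => new` are passed through unchanged.

```typescript
interface ParsedFileChange {
  file: string;
  insertions: number;
  deletions: number;
}

interface ParsedCommit {
  hash: string;
  date: string;
  author_name: string;
  message: string;
  diff: { files: ParsedFileChange[] };
}

function parseNumstatLog(raw: string): ParsedCommit[] {
  const commits: ParsedCommit[] = [];
  // \x1e starts each commit record; \x1f separates the header fields.
  raw.split('\x1e').forEach((record) => {
    if (record.trim() === '') return;
    const lines = record.split('\n');
    const [hash = '', date = '', author_name = '', message = ''] = (lines[0] ?? '').split('\x1f');
    const files: ParsedFileChange[] = [];
    lines.slice(1).forEach((line) => {
      // numstat lines look like "<insertions>\t<deletions>\t<path>"
      const match = /^(\d+|-)\t(\d+|-)\t(.+)$/.exec(line);
      if (!match) return;
      const [, ins = '-', del = '-', file = ''] = match;
      files.push({
        file,
        // Binary files report "-" for both counts; treated as 0 here.
        insertions: ins === '-' ? 0 : Number(ins),
        deletions: del === '-' ? 0 : Number(del),
      });
    });
    commits.push({ hash, date, author_name, message, diff: { files } });
  });
  return commits;
}
```

The parsed array would go under `log.all`, with the output of `git ls-files` supplied as `trackedFiles`; running `validateGitLogData` on the assembled object is a sensible check that the sketch's output matches what the library expects.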

## Migration from v1.x

v2.0.0 replaces absolute thresholds with percentile-based classification. Key changes:

- **`InsightThresholds`** values are now percentile cutoffs (0-100), not raw metric values

- **`InsightData`** variants (except `coupling`) include a `percentile` field

- **Stale-code severity** changed from `info`/`warning` to `warning`/`critical`

- **Finding strings** now include `(Pxx)` percentile annotations

- **Generator function signatures** added a `percentileRank` parameter (affects direct generator importers)

- New exports: `computeRiskScores`, `DEFAULT_RISK_WEIGHTS`, `percentileRank`, `createPercentileRanker`, `createInvertedPercentileRanker`

- New types: `PercentileThresholds`, `RiskWeights`, `FileRiskScore`, `ExtractFileMetricsOptions`

## Attribution

Based on concepts from Adam Tornhill's [Your Code as a Crime Scene](https://pragprog.com/titles/atcrime2/your-code-as-a-crime-scene-second-edition/) and [Software Design X-Rays](https://pragprog.com/titles/atevol/software-design-x-rays/).

## License

MIT