https://github.com/sayak-sarkar/contextlake

A local context layer for AI tools: mirror your repositories, index them into a knowledge graph, and serve it over MCP — so agents work from real source.
https://github.com/sayak-sarkar/contextlake

ai cli code-search context gitlab knowledge-graph mcp

Last synced: 2 days ago
JSON representation

A local context layer for AI tools: mirror your repositories, index them into a knowledge graph, and serve it over MCP — so agents work from real source.

Host: GitHub
URL: https://github.com/sayak-sarkar/contextlake
Owner: sayak-sarkar
License: mit
Created: 2026-06-21T07:26:00.000Z (9 days ago)
Default Branch: main
Last Pushed: 2026-06-22T00:09:05.000Z (8 days ago)
Last Synced: 2026-06-22T02:26:27.290Z (8 days ago)
Topics: ai, cli, code-search, context, gitlab, knowledge-graph, mcp
Language: Python
Size: 328 KB
Stars: 1
Watchers: 0
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Security: SECURITY.md
- Roadmap: ROADMAP.md

Awesome Lists containing this project

README

          


  



contextlake

All your real context, in one local lake.



  A local context layer for your AI tools: mirror your repositories, index them


  into a knowledge graph, and serve it over MCP, so agents answer from real source instead of guessing.





  

  

  

  

  



---

## Why contextlake

Your AI assistant is only as good as what it can actually see. Point it at one file and

it's sharp; ask it about *the system*, which service calls this API, who depends on that

package, where a symbol is really defined across dozens of repos, and it starts guessing.

**contextlake gives your tools the real source to read.** It mirrors your repositories to

your machine, indexes them into a queryable knowledge graph, and serves that graph to your

editor over [MCP](https://modelcontextprotocol.io). Everything runs locally and offline, 

no code leaves your machine, and it carries no credentials of its own.

## How it works

contextlake is three layers you adopt one at a time. The mirror is useful on its own, and

each layer above it is optional.



  



1. **Mirror**: clone every repo you can reach in a GitLab group into a faithful copy of its

   namespace tree, each on its most active branch, kept fresh with one command. *(The source

   is GitLab today; the design is source-agnostic.)*

2. **Knowledge layer** *(optional)*: parse the mirror into a code + dependency **graph**, add

   **semantic search**, a council-verified **wiki**, and **connectors** to Atlassian / Figma / GitLab.

3. **Serve**: expose it all over **MCP** and an offline interactive **graph visualizer**, so

   agents can answer *"where is `X` defined?"* or *"who calls `Y`?"* instead of grepping.

Each layer has its own guide: the mirror in **[Usage & config](docs/usage.md)**, the knowledge

layer and serving in **[Knowledge layer](docs/knowledge-layer.md)**, and the whole flow start to

finish in **[QUICKSTART](QUICKSTART.md)**.

## Install

```bash

pip install contextlake             # the mirroring CLI

pip install "contextlake[kb]"       # + the knowledge layer (graph, search, wiki, MCP server)

```

Prefer an isolated, zero-setup install? [`uv`](https://docs.astral.sh/uv/) fetches the right

Python and an isolated environment for you:

```bash

uv tool install "contextlake[kb]"            # install the CLI on your PATH

uvx --from "contextlake[kb]" contextlake --help   # …or run it once, without installing

# pipx install "contextlake[kb]"             # pipx works too

```

From source (for contributors)

```bash

git clone https://github.com/sayak-sarkar/contextlake && cd contextlake

pip install -e ".[kb]"

```

**Prerequisites:** `git`, and, only for GitLab mirroring, an authenticated

[`glab`](https://gitlab.com/gitlab-org/cli) (`glab auth login`). The knowledge layer needs

neither. Once installed, `contextlake`, `python -m contextlake`, and `python3 contextlake.py`

are equivalent.

## Quickstart: one repo, no setup

You don't need GitLab or any config to try contextlake on a repo you already have.

No install? Run it once with [`uvx`](https://docs.astral.sh/uv/): prefix any command

below with `uvx --from "contextlake[kb]"` (e.g. `uvx --from "contextlake[kb]" contextlake index --source .`).

```bash

contextlake index                     # parse the current repo into a local knowledge graph

contextlake graph --overview --open   # open the interactive graph in your browser

contextlake serve                     # …or serve it to your AI IDE over MCP

```

**Wire it into your editor in one line**, no config file needed (it uses the local

`~/.contextlake/kb` store you just built):

```bash

claude mcp add contextlake -- contextlake serve      # Claude Code

# zero-install variant: claude mcp add contextlake -- uvx --from "contextlake[kb]" contextlake serve

```



  



contextlake graph, a whole codebase as one offline, navigable graph.


Everything lands in a local store (`~/.contextlake/kb`), nothing leaves your machine. Index

any path with `--source PATH`, or every git repo under a directory with `--workspace DIR`.

> **Want the full path**, mirror a GitLab fleet → graph → wired editor in a few minutes?

> [**QUICKSTART.md**](QUICKSTART.md) walks the whole flow.

## Fleet mode: mirror a GitLab group

Where contextlake goes beyond single-repo tools is mirroring and cross-referencing a *whole

GitLab fleet*. Copy the example config and set your group + workspace:

```bash

cp .contextlake.ini.example ~/.contextlake.ini

```

```ini

[contextlake]

work_dir = ~/work

gitlab_group = your-gitlab-group

```

```bash

contextlake status      # see where you stand (read-only)

contextlake sync        # fetch → clone → update → branches → verify → audit

```

It carries no credentials of its own (auth rides on your existing `glab` login), so

`.contextlake.ini` holds only non-secret settings and is gitignored by default. It runs

across hundreds of repos **concurrently**, with an adaptive worker pool, retries with

backoff, and **never stomps on the feature branch you're in the middle of**.

> **Behind a slow / TLS-inspecting corporate proxy** (e.g. Zscaler) where `glab`'s API calls

> time out? Set `GITLAB_TOKEN` (a `read_api` token) and contextlake enumerates projects via

> its own HTTP client, which tolerates the slow DNS where `glab`'s short dial timeout fails.

## Commands at a glance

Run any command as `contextlake `. Full per-command docs: **[docs/usage.md](docs/usage.md)**.

| Command | What it does |

| --- | --- |

| `status` | Show the workspace sync state vs GitLab (read-only) |

| `sync` | The full pipeline: fetch → clone → update → branches → verify → audit |

| `fetch` · `clone` · `update` | The sync steps, individually |

| `branches` | Switch each repo to its most active branch |

| `verify` · `audit` | Check the mirror vs GitLab; report repo health, age & drift (JSON + CSV) |

| `bootstrap` | **Turnkey**: sync + index + connect + embed + wiki + steer |

| `index` | Build the code/dependency graph (`--workspace`, incremental, `--watch`) |

| `connect` | Link repos to Atlassian / Figma / GitLab items (`--watch` to keep refreshing) |

| `embed` | Build semantic-search vectors (zero-config built-in CPU model, Ollama, or an API; incremental, `--watch`) |

| `ingest` | Aggregate external docs into the graph + semantic store (built-in `files`/`web`/`api`/`mcp` sources, or plugins) |

| `wiki` | LLM-synthesized, council-verified wiki pages |

| `query` | Search the index (`--kind`, `--repo`, `--as-of `) |

| `owners` | Likely owners / SMEs for a repo (or `--path`), ranked from git history |

| `impact` | Change-impact / blast radius: what depends on a symbol (`--hops`) |

| `graph` | Visualize the graph, offline interactive HTML / DOT / Mermaid / JSON |

| `serve` | Expose the graph over MCP (`--transport stdio`/`http`) |

| `steer` | Write editor steering, `AGENTS.md`, `.mcp.json`, `.windsurfrules`, skills |

| `lint` · `doctor` · `eval` | Graph health · environment check · retrieval-quality scoring |

Global options apply to any command: `--dry-run` (preview without changing anything),

`-v`/`-q` (verbosity), `--log-file PATH`, `--config PATH`, `--version`. Output is colorized on

a TTY and plain when piped; set `NO_COLOR` to force-disable.

## Knowledge layer

Beyond mirroring, the optional `contextlake.kb` layer turns your repos into a **knowledge

graph** and serves it to AI tools over **MCP**. It can link repos to their Atlassian / Figma /

GitLab items, add **semantic search**, write a curated **wiki**, **visualize** the graph

(offline interactive HTML, fleet overview, a symbol's neighbourhood, or a single repo), and

generate per-tool **steering files** + a skills library. Most of it needs no model; the rest

works with a local Ollama or any OpenAI-compatible endpoint.

One command sets it all up:

```bash

contextlake bootstrap --kb-config ~/.contextlake/kb.toml

```

Full guide: **[docs/knowledge-layer.md](docs/knowledge-layer.md)**.

## Documentation

- **[QUICKSTART.md](QUICKSTART.md)**, install → bootstrap → wire your editor, in minutes

- **[docs/usage.md](docs/usage.md)**, every command, configuration, branch safety, scheduling

- **[docs/knowledge-layer.md](docs/knowledge-layer.md)**, the graph, connectors, search, wiki, steering

- **[docs/internals.md](docs/internals.md)**, architecture & internals

- **[docs/releasing.md](docs/releasing.md)**, maintainer runbook: versioning, tagging, publishing

- **[CHANGELOG.md](CHANGELOG.md)** · **[ROADMAP.md](ROADMAP.md)** · **[CONTRIBUTING.md](CONTRIBUTING.md)** · **[BRANDING.md](BRANDING.md)**

## License

MIT, see [LICENSE](LICENSE). Pebble the otter is the project mascot; *deep context, clear answers.*

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/sayak-sarkar/contextlake

Awesome Lists containing this project

README

contextlake