{"id":14155156,"url":"https://github.com/datopian/markdowndb","last_synced_at":"2025-05-15T12:06:55.840Z","repository":{"id":160644016,"uuid":"633827041","full_name":"datopian/markdowndb","owner":"datopian","description":"Turn markdown files into structured, queryable data with JS. Build markdown-powered docs, blogs, and sites quickly and reliably.","archived":false,"fork":false,"pushed_at":"2025-03-10T11:39:31.000Z","size":6964,"stargazers_count":357,"open_issues_count":19,"forks_count":17,"subscribers_count":8,"default_branch":"main","last_synced_at":"2025-05-11T11:06:49.088Z","etag":null,"topics":["awesomeness","catalog","contentlayer","contentlayer-nextjs","contentlayer-typescript","database","headless-cms","jamstack","markdown"],"latest_commit_sha":null,"homepage":"https://markdowndb.com","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/datopian.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-04-28T11:21:05.000Z","updated_at":"2025-05-09T16:34:21.000Z","dependencies_parsed_at":null,"dependency_job_id":"6baaa5f0-5bb5-4463-8fcb-a755fcc1caf0","html_url":"https://github.com/datopian/markdowndb","commit_stats":{"total_commits":92,"total_committers":7,"mean_commits":"13.142857142857142","dds":0.6847826086956521,"last_synced_commit":"86bc39ad75fe2d8ecaa4bf6f33c760c7a9cc26d8"},"previous_names":[],"tags_count":16,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datopian%2Fmarkdowndb","tags_url":"https://repos.ecosyste.ms/api/v1/
hosts/GitHub/repositories/datopian%2Fmarkdowndb/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datopian%2Fmarkdowndb/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datopian%2Fmarkdowndb/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/datopian","download_url":"https://codeload.github.com/datopian/markdowndb/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254337613,"owners_count":22054253,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["awesomeness","catalog","contentlayer","contentlayer-nextjs","contentlayer-typescript","database","headless-cms","jamstack","markdown"],"created_at":"2024-08-17T08:02:18.462Z","updated_at":"2025-05-15T12:06:50.817Z","avatar_url":"https://github.com/datopian.png","language":"TypeScript","readme":"# MarkdownDB\n\n[![](https://badgen.net/npm/v/mddb)](https://www.npmjs.com/package/mddb)\n[![](https://dcbadge.vercel.app/api/server/xfFDMPU9dC)](https://discord.gg/xfFDMPU9dC)\n\nMarkdownDB is a JavaScript library that turns markdown files into a structured, queryable database (SQL-based or simple JSON). It helps you build rich markdown-powered sites easily and reliably. 
Specifically it:\n\n- Parses your markdown files to extract structured data (frontmatter, tags, etc.) and builds a queryable index either in JSON files or a local SQLite database\n- Provides a lightweight JavaScript API for querying the index and using the data in your application\n\n## Features and Roadmap\n\n- [x] **Index a folder of files** - create a db index given a folder of markdown and other files\n  - [x] **Command line tool for indexing**: Create a markdowndb (index) on the command line **v0.1**\n  - [x] SQL(ite) index **v0.2**\n  - [x] JSON index **v0.6**\n  - [ ] BONUS Index multiple folders (with support for configuring e.g. prefixing in some way, e.g. I have all my blog files in this separate folder over here)\n  - [x] Configuration for Including/Excluding Files in the folder\n\nExtract structured data like:\n\n- [x] **Frontmatter metadata**: Extract markdown frontmatter and add it in a metadata field\n  - [ ] deal with casting types e.g. string, number so that we can query in useful ways e.g. find me all blog posts before date X\n- [x] **Tags**: Extracts tags in markdown pages\n  - [x] Extract tags in frontmatter **v0.1**\n  - [x] Extract tags in body like `#abc` **v0.5**\n- [x] **Links**: links between files like `[hello](abc.md)` or wikilink style `[[xyz]]` so we can compute backlinks or dead links etc. (see #4) **v0.2**\n- [x] **Tasks**: extract tasks like this `- [ ] this is a task` (see Obsidian Dataview) **v0.4**\n\nData enhancement and validation:\n\n- [x] **Computed fields**: add new metadata properties based on existing metadata e.g. a slug field computed from title field; or, adding a title based on the first h1 heading in a doc; or, a type field based on the folder of the file (e.g. these are blog posts). 
cf https://www.contentlayer.dev/docs/reference/source-files/define-document-type#computedfields.\n- [ ] 🚧 **Data validation and Document Types**: validate metadata against a schema/type so that I know the data in the database is \"valid\" #55\n  - [ ] BYOT (bring your own types): I want to create my own types ... so that when I get an object out it is cast to the right TypeScript type\n\n## Quick start\n\n### Have a folder of markdown content\n\nFor example, your blog posts. Each file can have a YAML frontmatter header with metadata like title, date, tags, etc.\n\n```md\n---\ntitle: My first blog post\ndate: 2021-01-01\ntags: [a, b, c]\nauthor: John Doe\n---\n\n# My first blog post\n\nThis is my first blog post.\nI'm using MarkdownDB to manage my blog posts.\n```\n\n### Index the files with MarkdownDB\n\nUse the npm `mddb` package to index Markdown files into an SQLite database. This will create a `markdown.db` file in the current directory. You can preview it with any SQLite viewer, e.g. https://sqlitebrowser.org/.\n\n```bash\n# npx mddb \u003cpath-to-folder-with-your-md-files\u003e\nnpx mddb ./blog\n```\n\n### Watching for Changes\n\nTo monitor files for changes and update the database accordingly, add the `--watch` flag to the command:\n\n```bash\nnpx mddb ./blog --watch\n```\n\nThis command continuously watches the specified folder (`./blog`) for modifications and rebuilds the database whenever a change is detected.\n\n### Query your files with SQL...\n\nE.g. 
get all the files with tag `a`.\n\n```sql\nSELECT files.*\nFROM files\nINNER JOIN file_tags ON files._id = file_tags.file\nWHERE file_tags.tag = 'a'\n```\n\n### ...or using the MarkdownDB Node.js API in a framework of your choice!\n\nUse our Node API to query your data for your blog, wiki, docs, digital garden, or anything you want!\n\nInstall the `mddb` package in your project:\n\n```bash\nnpm install mddb\n```\n\nOnce the data is in the database, add the following script to your project (e.g. in a `/lib` folder). It establishes a single connection to the database that you can reuse across your app.\n\n```js\n// @/lib/mddb.mjs\nimport { MarkdownDB } from \"mddb\";\n\nconst dbPath = \"markdown.db\";\n\nconst client = new MarkdownDB({\n  client: \"sqlite3\",\n  connection: {\n    filename: dbPath,\n  },\n});\n\nconst clientPromise = client.init();\n\nexport default clientPromise;\n```\n\nNow you can import it anywhere in your project to query the database, e.g.:\n\n```js\nimport clientPromise from \"@/lib/mddb\";\n\nconst mddb = await clientPromise;\nconst blogs = await mddb.getFiles({\n  folder: \"blog\",\n  extensions: [\"md\", \"mdx\"],\n});\n```\n\n## Computed Fields\n\nThis feature lets you define functions that compute additional fields to include in the index.\n\n### Step 1: Define the Computed Field Function\n\nFirst, define a function that computes the additional field you want to include. In this example, the function `addTitle` extracts the title from the first heading in the AST (Abstract Syntax Tree) of a Markdown file.\n\n```javascript\nconst addTitle = (fileInfo, ast) =\u003e {\n  // Find the first header node in the AST\n  const headerNode = ast.children.find((node) =\u003e node.type === \"heading\");\n\n  // Extract the text content from the header node\n  const title = headerNode\n    ? 
headerNode.children.map((child) =\u003e child.value).join(\"\")\n    : \"\";\n\n  // Add the title property to the fileInfo\n  fileInfo.title = title;\n};\n```\n\n### Step 2: Indexing the Folder with Computed Fields\n\nNow, use the `client.indexFolder` method to scan and index the folder containing your Markdown files. Pass `addTitle` in the `computedFields` array of the `customConfig` option to include the computed title in the database.\n\n```javascript\n// indexFolder takes a single options object; `client` is the MarkdownDB instance created above\nawait client.indexFolder({ folderPath: \"PATH_TO_FOLDER\", customConfig: { computedFields: [addTitle] } });\n```\n\n## Configuring `markdowndb.config.js`\n\nThe `markdowndb.config.js` file lets you:\n\n- Define computed fields that dynamically calculate values based on your own logic or dependencies.\n- Specify patterns for including or excluding files in MarkdownDB.\n\n### Example Configuration\n\nHere's an example `markdowndb.config.js` with custom configurations:\n\n```javascript\nexport default {\n  computedFields: [\n    (fileInfo, ast) =\u003e {\n      // Your custom logic here\n    },\n  ],\n  include: [\"docs/**/*.md\"], // Include only files matching this pattern\n  exclude: [\"drafts/**/*.md\"], // Exclude files matching this pattern\n};\n```\n\n### (Optional) Index your files in a `prebuild` script\n\n```json\n{\n  \"name\": \"my-mddb-app\",\n  \"scripts\": {\n    ...\n    \"mddb\": \"mddb \u003cpath-to-your-content-folder\u003e\",\n    \"prebuild\": \"npm run mddb\"\n  },\n  ...\n}\n\n```\n\n### With a Next.js project\n\nFor example, in your Next.js project's pages, you could do:\n\n```js\n// @/pages/blog/index.js\nimport React from \"react\";\nimport clientPromise from \"@/lib/mddb.mjs\";\n\nexport default function Blog({ blogs }) {\n  return (\n    \u003cdiv\u003e\n      \u003ch1\u003eBlog\u003c/h1\u003e\n      \u003cul\u003e\n        {blogs.map((blog) =\u003e (\n          \u003cli key={blog.url_path}\u003e\n            \u003ca href={blog.url_path}\u003e{blog.title}\u003c/a\u003e\n          \u003c/li\u003e\n        ))}\n      \u003c/ul\u003e\n    
\u003c/div\u003e\n  );\n}\n\nexport const getStaticProps = async () =\u003e {\n  const mddb = await clientPromise;\n  // get all files that are not marked as draft in the frontmatter\n  const blogFiles = await mddb.getFiles({\n    frontmatter: {\n      draft: false,\n    },\n  });\n\n  const blogsList = blogFiles.map(({ metadata, url_path }) =\u003e ({\n    ...metadata,\n    url_path,\n  }));\n\n  return {\n    props: {\n      blogs: blogsList,\n    },\n  };\n};\n```\n\n## API reference\n\n### Queries\n\n**Retrieve a file by URL path:**\n\n```ts\nmddb.getFileByUrl(\"urlPath\");\n```\n\nThe file path to URL resolver function currently in use:\n\n```ts\nconst defaultFilePathToUrl = (filePath: string) =\u003e {\n  let url = filePath\n    .replace(/\\.(mdx|md)/, \"\") // remove file extension\n    .replace(/\\\\/g, \"/\") // replace windows backslash with forward slash\n    .replace(/(\\/)?index$/, \"\"); // remove index at the end for index.md files\n  url = url.length \u003e 0 ? url : \"/\"; // for home page\n  return encodeURI(url);\n};\n```\n\n🚧 The resolver function will be configurable in the future.\n\n**Retrieve a file by its database ID:**\n\n```ts\nmddb.getFileById(\"fileID\");\n```\n\n**Get all indexed files:**\n\n```ts\nmddb.getFiles();\n```\n\n**By file types:**\n\nYou can specify the `type` of a document in its frontmatter. You can then get all the files of this type, e.g. all `blog` type documents.\n\n```ts\nmddb.getFiles({ filetypes: [\"blog\", \"article\"] }); // files of either blog or article type\n```\n\n**By tags:**\n\n```ts\nmddb.getFiles({ tags: [\"tag1\", \"tag2\"] }); // files tagged with either tag1 or tag2\n```\n\n**By file extensions:**\n\n```ts\nmddb.getFiles({ extensions: [\"mdx\", \"md\"] }); // all md and mdx files\n```\n\n**By frontmatter fields:**\n\nYou can query by multiple frontmatter fields at once.\n\nAt the moment, only exact matches are supported. However, `false` values do not need to be set explicitly. I.e. 
if you set `draft: true` on some blog posts and want to get all the posts that are **not drafts**, you don't have to explicitly set `draft: false` on them.\n\n```ts\nmddb.getFiles({\n  frontmatter: {\n    key1: \"value1\",\n    key2: true,\n    key3: 123,\n    key4: [\"a\", \"b\", \"c\"], // this will match exactly [\"a\", \"b\", \"c\"]\n  },\n});\n```\n\n**By folder:**\n\nGet all files in a subfolder (path relative to your content folder).\n\n```ts\nmddb.getFiles({ folder: \"path\" });\n```\n\n**Combined conditions:**\n\n```ts\nmddb.getFiles({ tags: [\"tag1\"], filetypes: [\"blog\"], extensions: [\"md\"] });\n```\n\n**Retrieve all tags:**\n\n```ts\nmddb.getTags();\n```\n\n**Get links (forward or backward) related to a file:**\n\n```ts\nmddb.getLinks({ fileId: \"ID\", direction: \"forward\" });\n```\n\n## Architecture\n\n```mermaid\ngraph TD\n\nmarkdown --remark-parse--\u003e st[syntax tree]\nst --extract features--\u003e jsobj1[TS Object eg. File plus Metadata plus Tags plus Links]\njsobj1 --computing--\u003e jsobj[TS Objects]\njsobj --convert to sql--\u003e sqlite[SQLite markdown.db]\njsobj --write to disk--\u003e json[JSON on disk in .markdowndb folder]\njsobj --tests--\u003e testoutput[Test results]\n```\n\n## Related Efforts\n\nSome related efforts:\n\n- https://github.com/sdorra/content-collections ⭐345 as of 2024-07-28\n- https://github.com/zce/velite ➕2024-06-27 ⭐327\n\n![image](https://github.com/datopian/markdowndb/assets/180658/5d94b5a2-163c-4b67-b9d7-78f1d246829d)\n\n![image](https://github.com/user-attachments/assets/88308e18-2426-4b8b-abe0-d7394de7e5a4)\n","funding_links":[],"categories":["TypeScript","markdown"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatopian%2Fmarkdowndb","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatopian%2Fmarkdowndb","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatopian%2Fmarkdowndb/lists"}