{"id":18708235,"url":"https://github.com/unytics/catalog_builder","last_synced_at":"2025-04-12T10:34:00.209Z","repository":{"id":226194348,"uuid":"767983636","full_name":"unytics/catalog_builder","owner":"unytics","description":"Data Catalogs Made Easy","archived":false,"fork":false,"pushed_at":"2024-05-03T09:25:53.000Z","size":2353,"stargazers_count":14,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-05-03T13:19:39.495Z","etag":null,"topics":["bigquery","data-catalog","data-discovery","databricks","dbt","redshift","snowflake"],"latest_commit_sha":null,"homepage":"https://unytics.io/catalog_builder/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/unytics.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-03-06T08:59:28.000Z","updated_at":"2024-06-11T15:16:22.633Z","dependencies_parsed_at":"2024-03-22T16:43:31.929Z","dependency_job_id":"db8bb180-f56a-4984-ac15-f0a2ac123725","html_url":"https://github.com/unytics/catalog_builder","commit_stats":null,"previous_names":["unytics/catalog_builder"],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/unytics%2Fcatalog_builder","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/unytics%2Fcatalog_builder/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/unytics%2Fcatalog_builder/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/unytics%2Fcatalo
g_builder/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/unytics","download_url":"https://codeload.github.com/unytics/catalog_builder/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223511421,"owners_count":17157518,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bigquery","data-catalog","data-discovery","databricks","dbt","redshift","snowflake"],"created_at":"2024-11-07T12:22:27.503Z","updated_at":"2024-11-07T12:22:28.216Z","avatar_url":"https://github.com/unytics.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"![logo](https://github.com/unytics/catalog_builder/assets/111615732/bdb75e70-c7cd-4c7b-aa28-f015011f1edb)\n\n\n\n\u003cp align=\"center\"\u003e\n    \u003cem\u003eBuild a custom data-catalog in minutes\u003c/em\u003e\n\u003c/p\u003e\n\n---\n\n\u003cbr\u003e\n\n## 🔍️ 1. What is CatalogBuilder?\n\n- CatalogBuilder is a simple tool to **generate \u0026 deploy a documentation website for your data assets**.\n- It enables anyone at your company to **quickly find the trusted data they are looking for**. \n\n\u003cbr\u003e\n\n## 💡  2. Why CatalogBuilder?\n\n\u003e There are **many open-source projects** (*amundsen, open-metadata, datahub, metacat, atlas*) to build such a catalog in-house. But as they offer a lot of advanced features, they are **hard to manage and deploy** if you're not a tech expert. They can be even **harder to customize**. 
\n\u003e \n\u003e **dbt docs** is great for generating a documentation website on top of your dbt assets but:\n\u003e \n\u003e - it focuses on dbt only (while you are interested in other sources + metadata)\n\u003e - it is very hard to customize (unless you're an Angular expert)\n\u003e - it can be slow.\n\n\u003cbr\u003e\n\n👉 CatalogBuilder aims to offer a **lightweight alternative** to generate a documentation website on top of your data assets. It focuses on **read-only data discovery** and:\n\n1. ✔️ can be easily customized and deployed by non-technical people\n2. ✔️ can then handle the very specific needs of your company\n3. ✔️ is fast and lightweight\n4. ✔️ is built on top of the very famous [mkdocs-material](https://github.com/squidfunk/mkdocs-material) python library which is used by millions of developers to deploy their documentation (*such as [fastapi](https://fastapi.tiangolo.com/)*).\n\n\n\u003cbr\u003e\n\n## 💥 3. Getting Started with `catalog` CLI\n\n\u003e `catalog` is the CLI (command-line-interface) of CatalogBuilder to generate, show \u0026 deploy the documentation.\n\n### 3.1 Install `catalog` CLI 🛠️\n\n``` sh\npip install catalog-builder\n```\n\n### 3.2 Create your first documentation configuration 👨‍💻\n\n``` sh\ncatalog download dbt_gitlab_data_team\n```\n\nTo get started, let's download a catalog configuration example from the GitHub repo and play with it. The above command will download the [`catalogs/dbt_gitlab_data_team`](https://github.com/unytics/catalog_builder/tree/main/catalogs/dbt_gitlab_data_team) folder to your laptop.\n\n\n\u003e You will find in the folder:\n\u003e \n\u003e - `assets file`: a file containing the list of the assets you want to put in your documentation. It can be a parquet file named `assets.parquet` or a [json lines file](https://medium.com/@sujathamudadla1213/difference-between-ordinary-json-and-json-lines-fc746f93d75e) named `assets.jsonl`. 
Each asset in the file must have the following fields:\n\u003e   - `asset_type`: for example: `table`.\n\u003e   - `documentation_path`: the path of the asset page in the generated documentation. For example `dataset_name/table_name`.\n\u003e   - `data`: a dict of attributes used to generate the documentation. For example `{\"name\": \"foo\"}`\n\u003e - `generate_assets_file.py`: the python script used to (re)generate the `assets file`.\n\u003e - `requirements.txt`: the python requirements needed by `generate_assets_file.py`.\n\u003e - `templates`: a folder which includes a jinja-template markdown-file for each `asset_type`. These templates are used to generate a markdown documentation file for each asset.\n\u003e - `source_docs`: a folder which includes files to include as-is in the documentation.\n\u003e - `mkdocs.yml`: the mkdocs configuration file used by mkdocs to build the documentation website from the generated markdown files.\n\n\n### 3.3 Build your catalog website 👾\n\n``` sh\ncatalog build dbt_gitlab_data_team\n```\n\n\u003e 1. For each asset of the `assets file`, the jinja template of `asset_type` will be rendered using the asset `data` to generate a markdown file which will be written into `catalogs/dbt_gitlab_data_team/docs/` at `documentation_path`.\n\u003e 2. All files in `catalogs/dbt_gitlab_data_team/source_docs/` are copied into `catalogs/dbt_gitlab_data_team/docs/`\n\u003e 3. Mkdocs will then build the documentation website from the markdown files into `catalogs/dbt_gitlab_data_team/site` (using `mkdocs.yml` configuration file).\n\n\n### 3.4 Run your catalog website locally ⚡\n\n``` sh\ncatalog serve dbt_gitlab_data_team\n```\n\n\u003e You can now see the generated documentation website at http://localhost:8000.\n\n\n### 3.5 Deploy the documentation website! 🚀\n\n**A. 
To deploy on GitHub Pages**:\n\n``` sh\ncatalog deploy github-pages dbt_gitlab_data_team\n```\n\n\u003e Mkdocs will [deploy the site on GitHub Pages](https://www.mkdocs.org/user-guide/deploying-your-docs/) (this only works if you are in a GitHub repository).\n\n\n**B. To deploy on Google Cloud Storage Bucket**:\n\n``` sh\ncatalog deploy gcs dbt_gitlab_data_team\n```\n\n\u003e Mkdocs will copy all the files in `catalogs/dbt_gitlab_data_team/site` to the bucket defined by the `site_url` value of `catalogs/dbt_gitlab_data_team/mkdocs.yml`. For instance, if the site URL is `http://catalogs.unytics.io/dbt_gitlab_data_team/`, it will copy all files under `catalogs/dbt_gitlab_data_team/site` to `gs://catalogs.unytics.io/dbt_gitlab_data_team/` \n\n\n**C. To deploy elsewhere**:\n\nYou can follow [these instructions](https://www.mkdocs.org/user-guide/deploying-your-docs/#other-providers) from mkdocs.\n\n\u003cbr\u003e\n\n\n## 💎 4. Generate your dbt documentation\n\nTo generate a documentation website for your own dbt project, do the following:\n\n1. Change directory to your dbt project directory\n2. Download `catalogs/dbt` documentation example by running `catalog download dbt`.\n3. Run `dbt docs generate` to compute `target/manifest.json` and `target/catalog.json`.\n4. Generate the assets file by running `python catalogs/dbt/generate_assets_file.py`. The script will parse `target/manifest.json` and `target/catalog.json` to generate the `assets file` in the expected format.\n5. 
Run `catalog serve dbt` to build the website and show it locally.\n\n\u003cbr\u003e\n\n\n## Keep in touch 🧑‍💻\n\n[Join our Slack](https://join.slack.com/t/unytics/shared_invite/zt-1gbv491mu-cs03EJbQ1fsHdQMcFN7E1Q) for any question, to get help getting started, to report a bug, to suggest improvements, or simply if you want to have a chat 🙂.\n\n\u003cbr\u003e\n\n## 👋 Contribute\n\nAny contribution is more than welcome 🤗!\n\n- Add a ⭐ on the repo to show your support\n- [Join our Slack](https://join.slack.com/t/unytics/shared_invite/zt-1gbv491mu-cs03EJbQ1fsHdQMcFN7E1Q) and talk with us\n- Open an issue to report a bug or suggest improvements\n- Open a PR! \n\n\n\u003cstyle\u003e\n.md-sidebar--primary {\ndisplay: none!important;\n}\n:root {\n--md-primary-fg-color:        #2acfa7ff!important;\n}  \n\u003c/style\u003e","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Funytics%2Fcatalog_builder","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Funytics%2Fcatalog_builder","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Funytics%2Fcatalog_builder/lists"}