{"id":21238781,"url":"https://github.com/splitgraph/splitgraph-yml-example","last_synced_at":"2026-02-02T16:07:55.905Z","repository":{"id":114333848,"uuid":"480387837","full_name":"splitgraph/splitgraph-yml-example","owner":"splitgraph","description":"End-to-end splitgraph.yml example","archived":false,"fork":false,"pushed_at":"2022-04-11T20:56:10.000Z","size":18,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-10-21T23:55:23.187Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/splitgraph.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-04-11T13:16:56.000Z","updated_at":"2022-04-11T20:51:44.000Z","dependencies_parsed_at":null,"dependency_job_id":"467ba679-ac27-4765-acc5-1ba16fbf89fd","html_url":"https://github.com/splitgraph/splitgraph-yml-example","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":"splitgraph/splitgraph-cloud-template","purl":"pkg:github/splitgraph/splitgraph-yml-example","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/splitgraph%2Fsplitgraph-yml-example","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/splitgraph%2Fsplitgraph-yml-example/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/splitgraph%2Fsplitgraph-yml-example/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/splitgraph%2Fsplitgraph-yml-example/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/splitgraph","download_url":"https://codeload.github.com/splitgraph/splitgraph-yml-example/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/splitgraph%2Fsplitgraph-yml-example/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29015145,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-02T14:58:54.169Z","status":"ssl_error","status_checked_at":"2026-02-02T14:58:51.285Z","response_time":58,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-21T00:38:26.198Z","updated_at":"2026-02-02T16:07:55.900Z","avatar_url":"https://github.com/splitgraph.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Sample Splitgraph Cloud project\n\nThis is a project that we use to showcase the `splitgraph.yml` functionality. It builds two repositories:\n\n  - [splitgraph-demo/for-hire-vehicles](https://www.splitgraph.com/splitgraph-demo/for-hire-vehicles): List of active for-hire vehicles in New York City ([source](https://data.cityofnewyork.us/Transportation/For-Hire-Vehicles-FHV-Active/8wbx-tsch))\n  - [splitgraph-demo/for-hire-vehicles-summary](https://www.splitgraph.com/splitgraph-demo/for-hire-vehicles-summary): dbt transformation that summarizes the data in the first repository\n\nFor another Splitgraph Cloud project, see an example [here](https://github.com/splitgraph/dbt-transform-example): this uses a more\ncomplex dbt model that joins across multiple Splitgraph datasets.\n\n# Default generated README\n\nWelcome to the sample Splitgraph Cloud project that we generated for your chosen data sources.\n\nThis project contains:\n\n  * [`splitgraph.yml`](./splitgraph.yml): defines live and ingested data sources as well as other\n    metadata for your data catalog.\n  * [`splitgraph.credentials.yml`](./splitgraph.credentials.yml): defines credentials to your \n    data sources\n  * [`build.moveme.yml`](./build.moveme.yml): GitHub Action that runs ingestion / metadata upload\n    for all sources:\n    * Adds data sources supporting \"live\" querying (PostgreSQL, MySQL, Elasticsearch, CSV-in-S3) to\n      Splitgraph without ingestion, letting you query them at source\n    * Runs a \"sync\" action for other data sources (SaaS etc) to load their data to Splitgraph  \n    * Optionally, also runs a dbt project at the end of ingestion to build models.\n    \nAll built repositories are going to be private to your account. You can manage access settings in\nthe UI by going to https://splitgraph.com/namespace/repository. \n\n## Required setup\n\nBefore you can run this project from GitHub Action, you need to perform a few extra setup steps.\n\n### Add credentials to `splitgraph.credentials.yml`\n\nEdit [`splitgraph.credentials.yml`](./splitgraph.credentials.yml) to add required credentials to\nyour data sources. **DO NOT COMMIT IT!** You'll add the contents of this file as a secret in the\nnext step.\n\n### Set up GitHub secrets\n\nGo to the [Secrets page](https://github.com/splitgraph/splitgraph-yml-example/settings/secrets/actions/new) for this\nrepository and create the following secrets:\n  \n  * `SPLITGRAPH_CREDENTIALS_YML`: contents of the `splitgraph.credentials.yml` with the data source\n    credentals that you've edited in the previous step. \n  * `SPLITGRAPH_API_KEY` / `SPLITGRAPH_API_SECRET`: API keys to Splitgraph Cloud (also known as\n    \"SQL credentials\"). You can get them at https://www.splitgraph.com/settings/sql-credentials (or\n    your deployment URL if you're on a private deployment).\n\n### Edit `splitgraph.yml`\n\nWe generated a [`splitgraph.yml`](./splitgraph.yml) file from your chosen plugins'\nparameters JSONSchema. You should review it and add suitable plugin settings:\n\n  - set `tables` to `tables: {}` to let the plugin automatically infer the schema and the\n    options of the data source (by default, it adds a sample table into the project file)\n  - change and customize the `metadata` block\n  - set up the plugin parameters in `external.params`. Where the comment says `CHOOSE ONE`\n    and offers a list of alternative subobjects, choose one entry from the list and delete\n    the list itself, leaving the object at the top level.\n\nExample:\n\n```yaml\n- namespace: my_namespace\n  repository: csv\n  # Catalog-specific metadata for the repository. Optional.\n  metadata:\n    readme:\n      text: Readme\n    description: Description of the repository\n    topics:\n    - sample_topic\n  # Data source settings for the repository. Optional.\n  external:\n    # Name of the credential that the plugin uses. This can also be a credential_id if the\n    # credential is already registered on Splitgraph.\n    credential: csv\n    plugin: csv\n    # Plugin-specific parameters matching the plugin's parameters schema\n    params:\n      connection:  # Choose one of:\n      - connection_type: http  # REQUIRED. Constant\n        url: '' # REQUIRED. HTTP URL to the CSV file\n      - connection_type: s3  # REQUIRED. Constant\n        s3_endpoint: '' # REQUIRED. S3 endpoint (including port if required)\n        s3_bucket: '' # REQUIRED. Bucket the object is in\n        s3_region: '' # Region of the S3 bucket\n        s3_secure: false # Whether to use HTTPS for S3 access\n        s3_object: '' # Limit the import to a single object\n        s3_object_prefix: '' # Prefix for object in S3 bucket\n      autodetect_header: true # Detect whether the CSV file has a header automatically\n      autodetect_dialect: true # Detect the CSV file's dialect (separator, quoting characters etc) automatically\n      autodetect_encoding: true # Detect the CSV file's encoding automatically\n      autodetect_sample_size: 65536 # Sample size, in bytes, for encoding/dialect/header detection\n      schema_inference_rows: 100000 # Number of rows to use for schema inference\n      encoding: utf-8 # Encoding of the CSV file\n      ignore_decode_errors: false # Ignore errors when decoding the file\n      header: true # First line of the CSV file is its header\n      delimiter: ',' # Character used to separate fields in the file\n      quotechar: '\"' # Character used to quote fields\n    tables:\n      sample_table:\n        # Plugin-specific table parameters matching the plugin's schema\n        options:\n          url: ''  # HTTP URL to the CSV file\n          s3_object: '' # S3 object of the CSV file\n          autodetect_header: true # Detect whether the CSV file has a header automatically\n          autodetect_dialect: true # Detect the CSV file's dialect (separator, quoting characters etc) automatically\n          autodetect_encoding: true # Detect the CSV file's encoding automatically\n          autodetect_sample_size: 65536 # Sample size, in bytes, for encoding/dialect/header detection\n          schema_inference_rows: 100000 # Number of rows to use for schema inference\n          encoding: utf-8 # Encoding of the CSV file\n          ignore_decode_errors: false # Ignore errors when decoding the file\n          header: true # First line of the CSV file is its header\n          delimiter: ',' # Character used to separate fields in the file\n          quotechar: '\"' # Character used to quote fields\n        # Schema of the table, a list of objects with `name` and `type`. If set to `[]`, will infer. \n        schema: []\n    # Whether live querying is enabled for the plugin (creates a \"live\" tag in the\n    # repository proxying to the data source). The plugin must support live querying.\n    is_live: true\n    # Ingestion schedule settings. Disable this if you're using GitHub Actions or other methods\n    # to trigger ingestion.\n    schedule:\n```  \n\nbecomes:\n\n```yaml\n- namespace: my_namespace\n  repository: csv\n  metadata:\n    readme:\n      text: Readme\n    description: Description of the repository\n    topics:\n    - sample_topic\n  external:\n    # No credential required since we're querying a CSV file over HTTP\n    plugin: csv\n    # Plugin-specific parameters matching the plugin's parameters schema\n    params:\n      connection:\n        connection_type: http  # REQUIRED. Constant\n        url: 'https://people.sc.fsu.edu/~jburkardt/data/csv/airtravel.csv' # REQUIRED. HTTP URL to the CSV file\n      autodetect_header: true # Detect whether the CSV file has a header automatically\n      autodetect_dialect: true # Detect the CSV file's dialect (separator, quoting characters etc) automatically\n      autodetect_encoding: true # Detect the CSV file's encoding automatically\n      autodetect_sample_size: 65536 # Sample size, in bytes, for encoding/dialect/header detection\n      schema_inference_rows: 100000 # Number of rows to use for schema inference\n      encoding: utf-8 # Encoding of the CSV file\n      ignore_decode_errors: false # Ignore errors when decoding the file\n      header: true # First line of the CSV file is its header\n      delimiter: ',' # Character used to separate fields in the file\n      quotechar: '\"' # Character used to quote fields\n    # Automatically infer table parameters\n    tables: {}\n    is_live: true\n```\n\n### Set up GitHub Actions\n\nBecause this repository was itself generated by a GitHub Actions job, we can't edit the workflow\nfiles for this repository from within the action. You will need to move the job definition file\n([`build.moveme.yml`](./build.moveme.yml)) to `.github/workflows/build.yml`.\n\nOptionally, also delete the `seed.yml` file that was used to generate this project.\n\n### Set up dbt and write the models\n\nIf you added dbt to this project, this repository also contains a sample dbt project that references\ndata from all the datasets you've added to it. See [`dbt_project.yml`](./dbt_project.yml) and the\n[`models/staging/sources.yml`](models/staging/sources.yml) file for more information.\n\nCurrently, we can't infer the columns and the tables that your data sources will produce at this\nproject generation time, so this dbt project is here as a rough starting point. To get it working,\nyou will need to: \n \n* Manually define tables in your sources (see \n  [`models/staging/sources.yml`](models/staging/sources.yml), \"tables\" sections). You might want\n  to run the ingestion GitHub Action once first without the dbt step in order to create the\n  repositories on Splitgraph and see their tables and columns.\n* Write the actual models that reference the sources using the `source(...)` macros (see \n  `models/staging/(source_name)/source_name.sql` for an example)\n\n## Run the action\n\nBy default, the generated action waits for a manual trigger to run. You can trigger the action by\ngoing to https://github.com/splitgraph/splitgraph-yml-example/actions/workflows/build.yml and clicking \"Run workflow\". \n\n## Next steps\n \n  * Edit the GitHub Action to, for example, add a run schedule\n  * Browse the ingested and built datasets at https://splitgraph.com/namespace/repository\n  * Connect to Splitgraph with an SQL client (see [the docs](https://www.splitgraph.com/docs/splitgraph-cloud/data-delivery-network)) \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsplitgraph%2Fsplitgraph-yml-example","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsplitgraph%2Fsplitgraph-yml-example","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsplitgraph%2Fsplitgraph-yml-example/lists"}