{"id":18365241,"url":"https://github.com/openlink/vos-docker-bulkload-example","last_synced_at":"2026-02-23T00:38:42.198Z","repository":{"id":44923638,"uuid":"513140781","full_name":"openlink/vos-docker-bulkload-example","owner":"openlink","description":"A tutorial on how to bulkload data into a virtuoso docker container","archived":false,"fork":false,"pushed_at":"2022-07-12T12:56:21.000Z","size":5,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-02-15T19:51:17.478Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/openlink.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-07-12T12:51:38.000Z","updated_at":"2024-08-31T01:52:55.000Z","dependencies_parsed_at":"2022-08-04T01:00:16.734Z","dependency_job_id":null,"html_url":"https://github.com/openlink/vos-docker-bulkload-example","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openlink%2Fvos-docker-bulkload-example","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openlink%2Fvos-docker-bulkload-example/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openlink%2Fvos-docker-bulkload-example/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openlink%2Fvos-docker-bulkload-example/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/openlink","download_url":"https://codeload.github.com/open
link/vos-docker-bulkload-example/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248225656,"owners_count":21068078,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-05T23:12:57.417Z","updated_at":"2025-10-26T21:07:22.948Z","avatar_url":"https://github.com/openlink.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Example: bulkloading data in a Virtuoso docker image\n\n_Copyright (C) 2022 OpenLink Software \u003csupport@openlinksw.com\u003e_\n\n\u003c!-- START doctoc generated TOC please keep comment here to allow auto update --\u003e\n\u003c!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE --\u003e\n**Table of Contents**\n\n- [Introduction](#introduction)\n- [Downloading and running the example](#downloading-and-running-the-example)\n- [Explanation](#explanation)\n  - [The docker-compose.yml script](#the-docker-composeyml-script)\n  - [The ./data directory](#the-data-directory)\n    - [Notes on the bulkloader](#notes-on-the-bulkloader)\n  - [The ./scripts directory](#the-scripts-directory)\n    - [The 10-bulkload.sql script](#the-10-bulkloadsql-script)\n\n\u003c!-- END doctoc generated TOC please keep comment here to allow auto update --\u003e\n\n## Introduction\n\nBoth the \n[OpenLink Virtuoso Commercial docker image](https://hub.docker.com/repository/docker/openlink/virtuoso-closedsource-8)\n(openlink/virtuoso-commercial-8) and the \n[OpenLink Virtuoso Open Source docker 
image](https://hub.docker.com/repository/docker/openlink/virtuoso-opensource-7)\n(openlink/virtuoso-opensource-7) allow users to run a combination of shell scripts and SQL\nscripts to initialize a new database.\n\nThis example shows how a Virtuoso Open Source Docker instance, started by docker-compose,\ncan bulkload initial data into the QUAD store when initializing a new database.\n\nIt has been tested on both Ubuntu 18.04 (x86_64) and macOS Big Sur 11.6 (x86_64 and Apple Silicon).\n\nMost modern Linux distributions provide Docker packages in their repositories.\n\nFor Apple macOS and Microsoft Windows, Docker installers can be downloaded from the \n[Docker website](https://docker.com/products).\n\n**Note**: Installing software like git, Docker and Docker Compose is left as an exercise for\nthe reader.\n\n\n## Downloading and running the example\n\nThe source code for this example can be cloned from its repository on GitHub using the following\ncommand:\n\n```shell\n$ git clone https://github.com/openlink/vos-docker-bulkload-example\n```\n\nThe example is started using the following commands:\n\n```shell\n$ cd vos-docker-bulkload-example\n$ docker-compose pull\n$ docker-compose up\n```\n\nOnce Virtuoso has started, you can use a browser to connect to the local SPARQL endpoint:\n\n```text\nhttp://localhost:8890/sparql\n```\n\nCopy and paste the following query and press the 'Execute' button.\n\n```text\nSELECT * from \u003curn:bulkload:test1\u003e WHERE { ?s ?p ?o }\n```\n\nwhich should give the following result:\n\n| s   | p   | o                                |\n| --- | --- | -------------------------------- |\n| s1  | p1  | This is example 1 (uncompressed) |\n| s3  | p3  | This is example 3 (gzip)         |\n| s4  | p4  | This is example 4 (xz)           |\n\n\nNext, copy and paste the following query and press the 'Execute' button.\n\n```text\nSELECT * from \u003curn:bulkload:test2\u003e WHERE { ?s ?p ?o }\n```\n\nwhich should give the following 
result:\n\n| s   | p   | o                         |\n| --- | --- | ------------------------- |\n| s2  | p2  | This is example 2 (bzip2) |\n\n\nTo stop the example, just press `CTRL-C` in the docker-compose window.\n\nFinally, to clean up the example, run the following command:\n```shell\n$ docker-compose rm\n```\n\n\n## Explanation\n\n\n### The docker-compose.yml script\n\nThis is the `docker-compose.yml` script we are using in this example:\n```yml\nversion: \"3.3\"\nservices:\n  virtuoso_db:\n    image: openlink/virtuoso-opensource-7\n    volumes:\n      - ./data:/database/data\n      - ./scripts:/opt/virtuoso-opensource/initdb.d\n    environment:\n      - DBA_PASSWORD=dba\n    ports:\n      - \"1111:1111\"\n      - \"8890:8890\"\n```\n\nThe docker compose program uses this information to run the following steps:\n\n  1. Downloads the `openlink/virtuoso-opensource-7` docker image from Docker Hub if it does not\nfind a version of the image in your local docker cache. As you may already have an older image in\nyour cache, you may want to first run `docker-compose pull` to make sure you have downloaded\nthe latest version of the image.\n  2. Creates an instance of this docker image.\n  3. Mounts the local `data` directory on the host OS as `/database/data` in the docker instance.\n  4. Mounts the local `scripts` directory on the host OS as `/opt/virtuoso-opensource/initdb.d` in the docker instance.\n  5. Sets the `dba` password to something trivial.\n  6. Exposes the standard network ports `1111` and `8890`.\n  7. Runs the registered startup script for this image.\n\n\n### The ./data directory\n\nThis directory contains the initial data that we want to load into the database. 
\n\n```\n./data\n├── README.md\n├── example1.nt\n├── example2.nt.bz2\n├── example2.nt.graph\n├── example3.nt.gz\n├── global.graph\n└── subdir\n    └── example4.nt.xz\n```\nThe `docker-compose.yml` script mounts this data directory below the `/database` directory as\n`/database/data`.\n\nSince the database directory is already listed in the `DirsAllowed` setting in the `[Parameters]` section,\nwe do not have to modify the `virtuoso.ini` configuration file.\n\n\n#### Notes on the bulkloader\nThe Virtuoso bulkloader can automatically handle compressed (.gz, .bz2 and .xz) files and will\nchoose the appropriate decompression function to read the content from the file.\n\nIt then chooses an appropriate parser for the data based on the suffix of the file:\n\n  * **N-Quads** when using the .nq or .n4 suffix\n  * **TriG** when using the .trig suffix\n  * **RDF/XML** when using the .xml, .owl, .rdf or .rdfs suffix\n  * **Turtle** when using the .ttl or .nt suffix\n\nWhile the function `ld_dir_all()` allows the operator to provide a graph name, it is much simpler\nfor the data directory to contain hints on the graph names to use, especially when there are\na number of files that need to be loaded either in the same graph or in different graphs.\n\nIn this example the file `example2.nt.bz2` needs to be loaded in a different graph than the other\ntwo data files, so we create an `example2.nt.graph` file which contains the graph name for this\ndata file. Note that although the data file has the `.bz2` extension, the graph file does not.\n\nThe other two data files in this directory, `example1.nt` and `example3.nt.gz`, do not have\ntheir own graph hint file. 
In this case the bulkloader sees there is a `global.graph` file in\nthe same directory and uses its contents for these two files.\n\nFinally, `example4.nt.xz` also uses the information from `global.graph`, as subdirectories\nautomatically inherit the graph name from their parent directory.\n\nIf the `global.graph` file is not present, the graph argument of the `ld_dir_all()`\nfunction is used.\n\n\n\n\n### The ./scripts directory\n\nThis directory can contain a mix of shell (.sh) scripts and Virtuoso PL (.sql) scripts that can\nperform functions such as:\n\n  * Installing additional Ubuntu packages.\n  * Loading data from remote locations such as Amazon S3 buckets, Google Drive, or other locations.\n  * Bulkloading data into the Virtuoso database.\n  * Installing additional `VAD` packages into the database.\n  * Adding new Virtuoso users.\n  * Granting permissions to Virtuoso users.\n  * Regenerating free-text indexes or other initial data.\n\nThe scripts are run only once during the initial database creation; subsequent restarts of the\ndocker image will not cause these scripts to be rerun.\n\nThe scripts are run in alphabetical order, so we suggest starting the script name with a sequence\nnumber, so the ordering is explicit.\n\nFor security purposes, Virtuoso will run the `.sql` scripts in a special mode and will not\nrespond to connections on its SQL (1111) and HTTP (8890) ports.\n\nAt the end of each `.sql` script, Virtuoso automatically performs a `checkpoint` to make sure\nthe changes are fully written back to the database. 
This is very important for scripts\nthat use the bulkloader function `rdf_loader_run()` or that, for any reason, manually change the ACID\nmode of the database.\n\nAfter all the initialization scripts have run to completion, Virtuoso will be started normally\nand start listening for requests on its SQL (1111) and HTTP (8890) ports.\n\n\n#### The 10-bulkload.sql script\n\n\n```SQL\n--\n--  Copyright (C) 2022 OpenLink Software\n--\n\n--\n--  Add all files that end in .nt\n--\nld_dir_all ('data', '*.nt', 'no-graph-1')\n;\n\n--\n--  Add all files that end in .bz2, .gz or .xz to show that the Virtuoso bulkloader \n--  can load compressed files without manual decompression\n--\nld_dir_all ('data', '*.bz2', 'no-graph-3')\n;\n\nld_dir_all ('data', '*.gz', 'no-graph-2')\n;\n\nld_dir_all ('data', '*.xz', 'no-graph-4')\n;\n\n--\n--  Now load all of the files found above into the database\n--\nrdf_loader_run()\n;\n\n--\n--  End of script\n--\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenlink%2Fvos-docker-bulkload-example","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopenlink%2Fvos-docker-bulkload-example","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenlink%2Fvos-docker-bulkload-example/lists"}