{"id":18811379,"url":"https://github.com/chop-dbhi/sqltojson","last_synced_at":"2025-07-07T06:02:53.338Z","repository":{"id":141917852,"uuid":"57595550","full_name":"chop-dbhi/sqltojson","owner":"chop-dbhi","description":"Tool to build/export JSON documents from nested SQL queries.","archived":false,"fork":false,"pushed_at":"2017-05-09T14:47:40.000Z","size":16,"stargazers_count":1,"open_issues_count":5,"forks_count":2,"subscribers_count":12,"default_branch":"master","last_synced_at":"2024-12-29T23:54:49.883Z","etag":null,"topics":["elasticsearch","json","nested-sql-queries","sql"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chop-dbhi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-05-01T11:17:40.000Z","updated_at":"2017-05-12T09:13:56.000Z","dependencies_parsed_at":null,"dependency_job_id":"24b6a9b6-878c-4662-889e-7ec413f72f6e","html_url":"https://github.com/chop-dbhi/sqltojson","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chop-dbhi%2Fsqltojson","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chop-dbhi%2Fsqltojson/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chop-dbhi%2Fsqltojson/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chop-dbhi%2Fsqltojson/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chop-dbhi","download_url":"h
ttps://codeload.github.com/chop-dbhi/sqltojson/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239748254,"owners_count":19690232,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["elasticsearch","json","nested-sql-queries","sql"],"created_at":"2024-11-07T23:26:04.799Z","updated_at":"2025-02-19T22:42:56.104Z","avatar_url":"https://github.com/chop-dbhi.png","language":"Go","readme":"# SQL to JSON\n\nTool to build/export JSON documents from nested SQL queries.\n\nCurrently the data output is in the [bulk format](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html) for loading into Elasticsearch. In addition, a base [mapping file](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html) is auto-generated by inferring the types from the data.\n\n## Install\n\nThe pre-packaged binary includes drivers for:\n\n- PostgreSQL\n- SQLite\n- MSSQL\n- MySQL/MariaDB\n- Oracle\n- (ODBC coming soon)\n\nDownload it from the [releases](https://github.com/chop-dbhi/sqltojson/releases) page.\n\nTo keep it simple, the `sqltojson` library does not bundle any drivers, but it is trivial to create a program with only the driver you want. See the [cmd/sqltojson main file](./cmd/sqltojson/main.go) for an example.\n\n## Config\n\nCreate a config file. See the included [config.example.yaml](./config.example.yaml) for a real-world example.\n\n- `workers` - Number of workers. Defaults to 10.\n- `connections` - Max number of connections to the database. 
Defaults to 10.\n- `files.data` - Name of the file to write the data to. Defaults to `data.json`. For more control, set the name to `-`, which will write the data to stdout. This makes it easy to pipe to `gzip` or perform other operations, especially for large exports.\n- `files.mapping` - Name of the file to write the mapping to.\n- `index` - Name of the ES index the data applies to.\n- `type` - Name of the ES document type the data applies to.\n- `schema` - The schema of the documents being created.\n\n## Usage\n\nCreate a config file and then run:\n\n```bash\nsqltojson -config config.yaml\n```\n\nIf `files.data` is set to `-`, the output will be written to stdout. This may be preferable for large exports so the data can be compressed.\n\n```bash\nsqltojson -config config-stdout.yaml | gzip -c \u003e data.json.gz\n```\n\n## Development/Library\n\nRequires:\n\n* [Glide](https://glide.sh)\n* Go \u003e=1.5 with `GO15VENDOREXPERIMENT=1`\n\n```bash\ngit clone https://github.com/chop-dbhi/sqltojson.git \u0026\u0026 cd sqltojson\nglide install\n```\n\n## Elasticsearch\n\nThe output files are named `data.json` and `mapping.json` by default.\n\n### Create Index\n\n```bash\ncurl -XPUT http://192.168.99.100:9200/my_index -d @mapping.json\n```\n\n### Load Data\n\n```bash\ncurl -XPOST http://192.168.99.100:9200/_bulk --data-binary @data.json\n```\n\n## Troubleshooting\n\n**\"Connection reset by peer\" during a data load**\n\nElasticsearch limits how large the request body of a single HTTP request can be. This is specified by the `http.max_content_length` setting, which is [100 MB by default](https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-http.html).\n\nOne solution is to increase the limit; however, the entire request body must be kept in memory, and the setting is capped at about 4 GB, so there is a hard limit on what is practical.\n\nA better solution is to chunk the output file into pieces smaller than that setting and load them sequentially. 
This can be done using the `split` command available on Linux and OS X.\n\nCreate a directory to split the files into.\n\n```bash\nmkdir data\n```\n\n`cd` into it and split the file so the new files are written in the directory. The files will be named like `xaa`, `xab`, etc. Choose an even number of lines such that the total file size is less than `http.max_content_length` (the bulk format uses two lines per document: an action line followed by a source line).\n\n```bash\ncd data\nsplit -l 10000 ../data.json\n```\n\nLoop over each file in order and bulk load it.\n\n```bash\nfor f in `ls . | sort`; do\n    curl -X POST http://192.168.99.100:9200/_bulk --data-binary \"@$f\"\ndone\n```\n\nOn a side note, the HTTP response is not terribly informative and is slated to be changed to a [413 Request Entity Too Large](https://github.com/elastic/elasticsearch/issues/2902).\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchop-dbhi%2Fsqltojson","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchop-dbhi%2Fsqltojson","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchop-dbhi%2Fsqltojson/lists"}