{"id":14066959,"url":"https://github.com/tomroh/bcputility","last_synced_at":"2025-04-19T18:09:01.015Z","repository":{"id":43737356,"uuid":"377306794","full_name":"tomroh/bcputility","owner":"tomroh","description":"R package for fast bulk imports/exports from/to SQL Server with the bcp command line utility","archived":false,"fork":false,"pushed_at":"2024-08-01T15:51:03.000Z","size":1362,"stargazers_count":14,"open_issues_count":0,"forks_count":2,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-29T11:22:53.048Z","etag":null,"topics":["database","r","spatial","sql","sqlserver"],"latest_commit_sha":null,"homepage":"https://bcputility.roh.engineering","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tomroh.png","metadata":{"files":{"readme":"README.md","changelog":"NEWS.md","contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":"tomroh"}},"created_at":"2021-06-15T22:19:47.000Z","updated_at":"2024-08-01T15:51:06.000Z","dependencies_parsed_at":"2024-04-01T02:31:40.769Z","dependency_job_id":"abaf8fe6-b725-4fee-9b55-3cf5c10611b6","html_url":"https://github.com/tomroh/bcputility","commit_stats":{"total_commits":44,"total_committers":1,"mean_commits":44.0,"dds":0.0,"last_synced_commit":"293b76244d7e8222f5b0c78ee095f4ab7e625096"},"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tomroh%2Fbcputility","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tomroh%2Fbcputility/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/repositories/tomroh%2Fbcputility/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tomroh%2Fbcputility/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tomroh","download_url":"https://codeload.github.com/tomroh/bcputility/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249077794,"owners_count":21209039,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["database","r","spatial","sql","sqlserver"],"created_at":"2024-08-13T07:05:21.197Z","updated_at":"2025-04-19T18:09:00.997Z","avatar_url":"https://github.com/tomroh.png","language":"R","readme":"# bcputility \u003ca href='https://bcputility.delveds.com'\u003e\u003cimg src='man/figures/logo.png' align=\"right\" height=\"104\" /\u003e\u003c/a\u003e\r\n\r\n\u003c!-- badges: start --\u003e\r\n[![CRAN status](https://www.r-pkg.org/badges/version/bcputility)](https://CRAN.R-project.org/package=bcputility)\r\n[![R-CMD-check](https://github.com/tomroh/bcputility/workflows/R-CMD-check/badge.svg)](https://github.com/tomroh/bcputility/actions)\r\n[![Lifecycle: stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)\r\n[![](https://cranlogs.r-pkg.org/badges/grand-total/bcputility?color=green)](https://cran.r-project.org/package=bcputility)\r\n\u003c!-- badges: end --\u003e\r\n\r\n\r\n\r\n**bcputility** is a wrapper for the command line utility program from SQL Server \r\nthat does bulk imports/exports. 
The package assumes that [bcp](https://docs.microsoft.com/en-us/sql/tools/bcp-utility)\r\nis already installed and is on the system search path. For large inserts to SQL \r\nServer over an ODBC connection (e.g. with the \r\n\"[DBI](https://dbi.r-dbi.org/)\" package), writes can take a very long time as \r\neach row generates an individual insert statement. The bcp Utility greatly \r\nimproves performance of large writes by using bulk inserts.\r\n\r\nAn export function is provided for convenience, but likely will not significantly\r\nimprove performance over other methods.\r\n\r\n## Prerequisites\r\n\r\nThe system dependencies can be downloaded and installed from \r\n[Microsoft](https://learn.microsoft.com/en-us/sql/tools/bcp-utility#download-the-latest-version-of-the-bcp-utility). \r\nIt is recommended to add `bcp` and `sqlcmd` to the system path. \r\n\r\n## Installation\r\n\r\nYou can install the released version of bcputility from \r\n[CRAN](https://CRAN.R-project.org) with:\r\n\r\n```r\r\ninstall.packages(\"bcputility\")\r\n```\r\n\r\nInstall the development version with:\r\n\r\n```r\r\ndevtools::install_github(\"tomroh/bcputility\")\r\n```\r\n\r\nTo check if the prerequisite binaries are on the path: \r\n\r\n```r\r\nbcpVersion()\r\nsqlcmdVersion()\r\n```\r\n\r\nIf `bcp` and `sqlcmd` are not on the system path, or you want to override the defaults, set the options with the full file paths:\r\n\r\n```r\r\noptions(bcputility.bcp.path = \"\u003cpath-to-bcp\u003e\")\r\noptions(bcputility.sqlcmd.path = \"\u003cpath-to-sqlcmd\u003e\")\r\n```\r\n\r\n## Usage\r\n\r\nTrusted Connection (default):\r\n\r\n```r\r\nx \u003c- read.csv(\"\u003cfile.csv\u003e\")\r\nconnectArgs \u003c- makeConnectArgs(server = \"\u003cserver\u003e\", database = \"\u003cdatabase\u003e\")\r\nbcpImport(x = x, connectargs = connectArgs, table = \"\u003ctable\u003e\")\r\n```\r\n\r\nSQL Authentication:\r\n\r\n```r\r\nconnectArgs \u003c- makeConnectArgs(server = \"\u003cserver\u003e\", database = 
\"\u003cdatabase\u003e\",\r\n  username = \"\u003cusername\u003e\", password = \"\u003cpassword\u003e\")\r\nbcpImport(x = x, connectargs = connectArgs, table = \"\u003ctable\u003e\")\r\n```\r\n\r\n## Benchmarks\r\n\r\nBenchmarks were performed with a local installation of SQL Server Express. \r\nWhen testing against a remote SQL Server, the performance advantage of *bcp* \r\nover *odbc* was even larger.\r\n\r\n![](man/figures/benchmarks.png)\r\n\r\n### Import\r\n\r\n```r\r\nlibrary(DBI)\r\nlibrary(data.table)\r\nlibrary(bcputility)\r\nserver \u003c- Sys.getenv('MSSQL_SERVER')\r\ndatabase \u003c- Sys.getenv('MSSQL_DB')\r\ndriver \u003c- 'ODBC Driver 17 for SQL Server'\r\nset.seed(11)\r\nn \u003c- 1000000\r\nimportTable \u003c- data.frame(\r\n  int = sample(x = seq(1L, 10000L, 1L), size = n, replace = TRUE),\r\n  numeric = sample(x = seq(0, 1, length.out = n/100), size = n,\r\n    replace = TRUE),\r\n  character = sample(x = state.abb, size = n, replace = TRUE),\r\n  factor = sample(x = factor(x = month.abb, levels = month.abb),\r\n    size = n, replace = TRUE),\r\n  logical = sample(x = c(TRUE, FALSE), size = n, replace = TRUE),\r\n  date = sample(x = seq(as.Date('2022-01-01'), as.Date('2022-12-31'),\r\n    by = 'days'), size = n, replace = TRUE),\r\n  datetime = sample(x = seq(as.POSIXct('2022-01-01 00:00:00'),\r\n    as.POSIXct('2022-12-31 23:59:59'), by = 'min'), size = n, replace = TRUE)\r\n)\r\nconnectArgs \u003c- makeConnectArgs(server = server, database = database)\r\ncon \u003c- DBI::dbConnect(odbc::odbc(),\r\n                      Driver = driver,\r\n                      Server = server,\r\n                      Database = database)\r\nimportResults \u003c- microbenchmark::microbenchmark(\r\n  bcpImport1000 = {\r\n    bcpImport(importTable,\r\n              connectargs = connectArgs,\r\n              table = 'importTable1',\r\n              bcpOptions = list(\"-b\", 1000, \"-a\", 4096, \"-e\", 10),\r\n              overwrite = TRUE,\r\n              stdout = FALSE)\r\n    
},\r\n  bcpImport10000 = {\r\n    bcpImport(importTable,\r\n              connectargs = connectArgs,\r\n              table = 'importTable2',\r\n              bcpOptions = list(\"-b\", 10000, \"-a\", 4096, \"-e\", 10),\r\n              overwrite = TRUE,\r\n              stdout = FALSE)\r\n  },\r\n  bcpImport50000 = {\r\n    bcpImport(importTable,\r\n              connectargs = connectArgs,\r\n              table = 'importTable3',\r\n              bcpOptions = list(\"-b\", 50000, \"-a\", 4096, \"-e\", 10),\r\n              overwrite = TRUE,\r\n              stdout = FALSE)\r\n  },\r\n  bcpImport100000 = {\r\n    bcpImport(importTable,\r\n      connectargs = connectArgs,\r\n      table = 'importTable4',\r\n      bcpOptions = list(\"-b\", 100000, \"-a\", 4096, \"-e\", 10),\r\n      overwrite = TRUE,\r\n      stdout = FALSE)\r\n  },\r\n  dbWriteTable = {\r\n    con \u003c- DBI::dbConnect(odbc::odbc(),\r\n      Driver = driver,\r\n      Server = server,\r\n      Database = database,\r\n      trusted_connection = 'yes')\r\n    DBI::dbWriteTable(con, name = 'importTable5', importTable, overwrite = TRUE)\r\n    },\r\n  times = 30L,\r\n  unit = 'seconds'\r\n)\r\nimportResults\r\n```\r\n\r\n|expr            |       min|        lq|      mean|    median|        uq|      max| neval|\r\n|:---------------|---------:|---------:|---------:|---------:|---------:|--------:|-----:|\r\n|bcpImport1000   | 15.017385| 16.610868| 17.405555| 17.656265| 18.100990| 19.44482|    30|\r\n|bcpImport10000  | 10.091266| 10.657926| 10.926738| 10.916577| 11.208184| 11.46027|    30|\r\n|bcpImport50000  |  8.982498|  9.337509|  9.677375|  9.571526|  9.896179| 10.77709|    30|\r\n|bcpImport100000 |  8.769598|  9.303473|  9.562921|  9.581927|  9.855355| 10.36949|    30|\r\n|dbWriteTable    | 13.570956| 13.820707| 15.154505| 14.159002| 16.378986| 27.28819|    30|\r\n\r\n*Time in seconds*\r\n\r\n### Export Table\r\n\r\n**Note:** *bcp* exports of data may not match the format of ```fwrite```. 
\r\n```dateTimeAs = 'write.csv'``` was used to make timings comparable, which \r\ndecreased the performance of \"[data.table](https://rdatatable.gitlab.io/data.table/)\". \r\nThe optimized datetime write formats of ```fwrite``` outperform *bcp* for \r\ndata that is small enough to be pulled into memory.\r\n\r\n```r\r\nexportResults \u003c- microbenchmark::microbenchmark(\r\n  bcpExportChar = {\r\n    bcpExport('inst/benchmarks/test1.csv',\r\n              connectargs = connectArgs,\r\n              table = 'importTableInit',\r\n              fieldterminator = ',',\r\n              stdout = FALSE)\r\n    },\r\n  bcpExportNchar = {\r\n    bcpExport('inst/benchmarks/test2.csv',\r\n              connectargs = connectArgs,\r\n              table = 'importTableInit',\r\n              fieldterminator = ',',\r\n              stdout = FALSE)\r\n  },\r\n  fwriteQuery = {\r\n    fwrite(DBI::dbReadTable(con, 'importTableInit'),\r\n           'inst/benchmarks/test3.csv', dateTimeAs = 'write.csv',\r\n           col.names = FALSE)\r\n  },\r\n  times = 30L,\r\n  unit = 'seconds'\r\n)\r\nexportResults\r\n```\r\n\r\n|expr           |      min|       lq|     mean|   median|       uq|      max| neval|\r\n|:--------------|--------:|--------:|--------:|--------:|--------:|--------:|-----:|\r\n|bcpExportChar  | 2.565654| 2.727477| 2.795670| 2.756685| 2.792291| 3.352325|    30|\r\n|bcpExportNchar | 2.589367| 2.704135| 2.765784| 2.734957| 2.797286| 3.479074|    30|\r\n|fwriteQuery    | 7.429731| 7.602853| 7.645852| 7.654730| 7.703634| 7.868419|    30|\r\n\r\n*Time in seconds*\r\n\r\n### Export Query\r\n\r\n```r\r\nquery \u003c- 'SELECT * FROM [dbo].[importTable1] WHERE int \u003c 1000'\r\nqueryResults \u003c- microbenchmark::microbenchmark(\r\n  bcpExportQueryChar = {\r\n    bcpExport('inst/benchmarks/test4.csv',\r\n              connectargs = connectArgs,\r\n              query = query,\r\n              fieldterminator = ',',\r\n              stdout = FALSE)\r\n  },\r\n  
bcpExportQueryNchar = {\r\n    bcpExport('inst/benchmarks/test5.csv',\r\n              connectargs = connectArgs,\r\n              query = query,\r\n              fieldterminator = ',',\r\n              stdout = FALSE)\r\n  },\r\n  fwriteQuery = {\r\n    fwrite(DBI::dbGetQuery(con, query),\r\n           'inst/benchmarks/test6.csv', dateTimeAs = 'write.csv',\r\n           col.names = FALSE)\r\n  },\r\n  times = 30L,\r\n  unit = 'seconds'\r\n)\r\nqueryResults\r\n```\r\n\r\n|expr                |       min|        lq|      mean|    median|        uq|       max| neval|\r\n|:-------------------|---------:|---------:|---------:|---------:|---------:|---------:|-----:|\r\n|bcpExportQueryChar  | 0.3444491| 0.4397317| 0.4557119| 0.4490924| 0.4615573| 0.7237182|    30|\r\n|bcpExportQueryNchar | 0.3305265| 0.4444705| 0.4412670| 0.4500690| 0.4605971| 0.4815894|    30|\r\n|fwriteQuery         | 0.6737879| 0.7141933| 0.7421377| 0.7311998| 0.7548233| 0.9143555|    30|\r\n\r\n*Time in seconds*\r\n\r\n### Import Geometry\r\n\r\nImporting spatial data from 'sf' objects is also supported. 
The SQL statements \r\nrun after each import produce equivalent tables in the database.\r\n\r\n```r\r\nlibrary(sf)\r\nnc \u003c- st_read(system.file(\"gpkg/nc.gpkg\", package = \"sf\"))\r\ndivN \u003c- 10\r\nshp1 \u003c- cbind(nc[sample.int(nrow(nc), n / divN, replace = TRUE),],\r\n  importTable[seq_len(n / divN), ],\r\n  id = seq_len(n / divN))\r\ngeometryResults \u003c- microbenchmark::microbenchmark(\r\n  bcpImportGeometry = {\r\n    bcpImport(shp1,\r\n      connectargs = connectArgs,\r\n      table = 'shp1',\r\n      overwrite = TRUE,\r\n      stdout = FALSE,\r\n      spatialtype = 'geometry',\r\n      bcpOptions = list(\"-b\", 50000, \"-a\", 4096, \"-m\", 0))\r\n  },\r\n  odbcImportGeometry = {\r\n    con \u003c- DBI::dbConnect(odbc::odbc(),\r\n      driver = driver,\r\n      server = server,\r\n      database = database,\r\n      trusted_connection = 'yes')\r\n    tableName \u003c- 'shp2'\r\n    spatialType \u003c- 'geometry'\r\n    geometryColumn \u003c- 'geom'\r\n    binaryColumn \u003c- 'geomWkb'\r\n    srid \u003c- sf::st_crs(nc)$epsg\r\n    shpBin2 \u003c- data.table(shp1)\r\n    data.table::set(x = shpBin2, j = binaryColumn,\r\n      value = blob::new_blob(lapply(sf::st_as_binary(shpBin2[[geometryColumn]]),\r\n        as.raw)))\r\n    data.table::set(x = shpBin2, j = geometryColumn, value = NULL)\r\n    dataTypes \u003c- DBI::dbDataType(con, shpBin2)\r\n    dataTypes[binaryColumn] \u003c- 'varbinary(max)'\r\n    DBI::dbWriteTable(conn = con, name = tableName, value = shpBin2,\r\n      overwrite = TRUE, field.types = dataTypes)\r\n    DBI::dbExecute(conn = con, sprintf('alter table %1$s add %2$s %3$s;',\r\n      tableName, geometryColumn, spatialType))\r\n    DBI::dbExecute(conn = con,\r\n      sprintf('UPDATE %1$s\r\n    SET geom = %3$s::STGeomFromWKB([%4$s], %2$d);\r\n    ALTER TABLE %1$s DROP COLUMN [%4$s];', tableName, srid, spatialType,\r\n        binaryColumn)\r\n    )\r\n  },\r\n  bcpImportGeography = {\r\n    bcpImport(shp1,\r\n      connectargs = 
connectArgs,\r\n      table = 'shp3',\r\n      overwrite = TRUE,\r\n      stdout = FALSE,\r\n      spatialtype = 'geography',\r\n      bcpOptions = list(\"-b\", 50000, \"-a\", 4096, \"-m\", 0))\r\n  },\r\n  odbcImportGeography = {\r\n    con \u003c- DBI::dbConnect(odbc::odbc(),\r\n      driver = driver,\r\n      server = server,\r\n      database = database,\r\n      trusted_connection = 'yes')\r\n    tableName \u003c- 'shp4'\r\n    spatialType \u003c- 'geography'\r\n    geometryColumn \u003c- 'geom'\r\n    binaryColumn \u003c- 'geomWkb'\r\n    srid \u003c- sf::st_crs(nc)$epsg\r\n    shpBin4 \u003c- data.table(shp1)\r\n    data.table::set(x = shpBin4, j = binaryColumn,\r\n      value = blob::new_blob(lapply(sf::st_as_binary(shpBin4[[geometryColumn]]),\r\n        as.raw)))\r\n    data.table::set(x = shpBin4, j = geometryColumn, value = NULL)\r\n    dataTypes \u003c- DBI::dbDataType(con, shpBin4)\r\n    dataTypes[binaryColumn] \u003c- 'varbinary(max)'\r\n    DBI::dbWriteTable(conn = con, name = tableName, value = shpBin4,\r\n      overwrite = TRUE, field.types = dataTypes)\r\n    DBI::dbExecute(conn = con, sprintf('alter table %1$s add %2$s %3$s;',\r\n      tableName, geometryColumn, spatialType))\r\n    DBI::dbExecute(conn = con,\r\n      sprintf('UPDATE %1$s\r\n    SET geom = %3$s::STGeomFromWKB([%4$s], %2$d);\r\n    ALTER TABLE %1$s DROP COLUMN [%4$s];', tableName, srid, spatialType,\r\n        binaryColumn)\r\n    )\r\n    DBI::dbExecute(conn = con,\r\n      sprintf(\r\n        'UPDATE %1$s SET [%2$s] = [%2$s].MakeValid().ReorientObject().MakeValid()\r\n   WHERE [%2$s].MakeValid().EnvelopeAngle() \u003e 90;',\r\n        tableName, geometryColumn))\r\n  },\r\n  times = 30L,\r\n  unit = 'seconds'\r\n)\r\ngeometryResults\r\n```\r\n\r\n|expr                |      min|       lq|     mean|   median|       uq|       max| neval|\r\n|:-------------------|--------:|--------:|--------:|--------:|--------:|---------:|-----:|\r\n|bcpImportGeometry   | 18.01451| 19.48747| 
20.68834| 20.45136| 21.74212|  26.87033|    30|\r\n|odbcImportGeometry  | 18.29721| 20.63363| 22.35044| 21.29087| 24.04490|  27.81112|    30|\r\n|bcpImportGeography  | 71.23260| 75.04588| 82.65286| 76.36985| 96.68469| 102.70909|    30|\r\n|odbcImportGeography | 73.29818| 76.12481| 84.58432| 77.93419| 97.36155| 107.00186|    30|\r\n\r\n*Time in seconds*\r\n","funding_links":["https://github.com/sponsors/tomroh"],"categories":["R"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftomroh%2Fbcputility","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftomroh%2Fbcputility","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftomroh%2Fbcputility/lists"}