{"id":20668692,"url":"https://github.com/azure-samples/smartbulkcopy","last_synced_at":"2025-04-19T18:08:55.923Z","repository":{"id":38617432,"uuid":"201573760","full_name":"Azure-Samples/smartbulkcopy","owner":"Azure-Samples","description":"High-Speed Bulk Copy tool to move data from one Azure SQL / SQL Server database to another. Smartly uses logical or physical partitions to maximize speed.","archived":false,"fork":false,"pushed_at":"2024-01-16T19:38:17.000Z","size":327,"stargazers_count":55,"open_issues_count":12,"forks_count":22,"subscribers_count":6,"default_branch":"master","last_synced_at":"2024-04-11T17:03:13.288Z","etag":null,"topics":["azure-sql-database","azure-sql-server","bulk-copy","sql-server"],"latest_commit_sha":null,"homepage":"","language":"C#","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Azure-Samples.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-08-10T03:25:58.000Z","updated_at":"2024-08-02T23:23:34.937Z","dependencies_parsed_at":"2024-01-13T11:13:38.856Z","dependency_job_id":"adfb8f4d-a46e-4778-a289-5a9adc164a3e","html_url":"https://github.com/Azure-Samples/smartbulkcopy","commit_stats":null,"previous_names":["yorek/smartbulkcopy"],"tags_count":27,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Azure-Samples%2Fsmartbulkcopy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Azure-Samples%2Fsmartbulkcopy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Azure-Samples%2Fsmartbulk
copy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Azure-Samples%2Fsmartbulkcopy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Azure-Samples","download_url":"https://codeload.github.com/Azure-Samples/smartbulkcopy/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249758599,"owners_count":21321557,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["azure-sql-database","azure-sql-server","bulk-copy","sql-server"],"created_at":"2024-11-16T20:10:54.564Z","updated_at":"2025-04-19T18:08:55.905Z","avatar_url":"https://github.com/Azure-Samples.png","language":"C#","readme":"---\npage_type: sample\nlanguages:\n- tsql\n- sql\n- csharp\nproducts:\n- azure-sql-database\n- sql-server\n- azure-sql-managed-instance\n- azure-sqlserver-vm\n- azure\n- dotnet\ndescription: \"Smart, High-Speed, Bulk Copy tool to move data from one Azure SQL or SQL Server database to another\"\nurlFragment: smart-bulk-copy\n---\n\n# Smart Bulk Copy\n\n\u003c!-- \nGuidelines on README format: https://review.docs.microsoft.com/help/onboard/admin/samples/concepts/readme-template?branch=master\n\nGuidance on onboarding samples to docs.microsoft.com/samples: https://review.docs.microsoft.com/help/onboard/admin/samples/process/onboarding?branch=master\n\nTaxonomies for products and languages: https://review.docs.microsoft.com/new-hope/information-architecture/metadata/taxonomies?branch=master\n--\u003e\n\n![License](https://img.shields.io/badge/license-MIT-green.svg) ![Run 
Tests](https://github.com/yorek/smartbulkcopy/workflows/Run%20Tests/badge.svg) \n\n*Latest Stable Version: 1.9.9*\n\nSmart, High-Speed, Bulk Copy tool to move data from one Azure SQL / SQL Server database to another. Smartly uses logical or physical partitions to maximize transfer speed using parallel copy tasks.\n\nIt can also be used to quickly and efficiently move data between two instances of SQL Server running in different cloud providers, or to move data from on-premises to the cloud.\n\nSmart Bulk Copy is also available as a [Docker Image](https://hub.docker.com/repository/docker/yorek/smartbulkcopy). To run Smart Bulk Copy via Docker, you have to map a volume where the desired .config file can be found. For example (on Windows):\n\n```\ndocker run -it -v c:\\work\\_git\\smart-bulk-copy\\client\\configs:/app/client/configs yorek/smartbulkcopy:latest /app/client/configs/smartbulkcopy.config.json\n```\n\nYou can also run Smart Bulk Copy using [Azure Container Instances](https://docs.microsoft.com/en-us/azure/container-instances/container-instances-overview). Use the provided `azure-deploy.sh` script to create an ACI and execute Smart Bulk Copy. Make sure you have created the `smartbulkcopy.config.json` file in the `client/configs` folder before running the `.sh` script.\n\n## How it works\n\nSmart Bulk Copy uses the [Bulk Copy API](https://docs.microsoft.com/en-us/dotnet/api/system.data.sqlclient.sqlbulkcopy) with parallel tasks. 
A source table is split into partitions, and each partition is copied in parallel with the others, up to a defined maximum, in order to use all the available network bandwidth and all the available cloud or server resources to minimize load times.\n\n### Partitioned Source Tables\n\nWhen a source table is partitioned, Smart Bulk Copy uses the physical partitions to execute several queries like the following:\n\n```sql\nSELECT * FROM \u003csourceTable\u003e WHERE $partition.\u003cpartitionFunction\u003e(\u003cpartitionColumn\u003e) = \u003cn\u003e\n```\n\nThe queries are executed in parallel, and the data they return is loaded into the destination table, also in parallel. The `TABLOCK` option is used on the table - when possible and needed - to allow fully parallelizable bulk inserts. The `ORDER` option is also used when possible to minimize sort operations on the destination table, for example when inserting into a table with an existing clustered rowstore index.\n\n### Non-Partitioned Source Tables\n\nIf a source table is not partitioned, then Smart Bulk Copy will use the `%%PhysLoc%%` virtual column to logically partition tables into non-overlapping partitions that can be safely read in parallel. `%%PhysLoc%%` is *not* documented, but more information is available here:\n\n[Where is a record really located?](https://techcommunity.microsoft.com/t5/Premier-Field-Engineering/Where-is-a-record-really-located/ba-p/370972)\n\nIf the configuration file specifies a value greater than 1 for `logical-partitions`, the following query will be used to read the logical partitions in parallel:\n\n```sql\nSELECT * FROM \u003csourceTable\u003e WHERE ABS(CAST(%%PhysLoc%% AS BIGINT)) % \u003clogical-partitions-count\u003e = \u003cn\u003e\n```\n\n*PLEASE NOTE* that the physical position of a row may change at any time if there is any activity on the database (updates, index reorgs, etc.), so it is recommended that this approach be used only in the following three cases:\n\n1. 
You're absolutely sure there is no activity of any kind on the source database, or\n2. You're using a database snapshot as the source database, or\n3. You're using a database set in READ_ONLY mode\n\n## Heaps, Clustered Rowstores, Clustered Columnstores\n\nFrom version 1.7, Smart Bulk Copy will smartly copy tables with no clustered index (heaps) and tables with a clustered index (rowstore or columnstore, it doesn't matter).\n\nA couple of notes for tables with a Clustered Columnstore index:\n- Smart Bulk Copy will always use a Batch Size of at least 102400 rows, no matter what is specified in the configuration, as per [best practices](https://docs.microsoft.com/en-us/sql/relational-databases/indexes/columnstore-indexes-data-loading-guidance?view=sql-server-ver15#plan-bulk-load-sizes-to-minimize-delta-rowgroups). If you have a columnstore index, it is generally recommended to increase the value to 1048576 in order to maximize compression and [reduce the number of rowgroups](https://docs.microsoft.com/en-us/sql/relational-databases/indexes/columnstore-indexes-data-loading-guidance?view=sql-server-ver15#plan-bulk-load-sizes-to-minimize-delta-rowgroups).\n- When copying a Columnstore table, you may see very low values (\u003c20 MB/sec) for the \"Log Flush Speed\". *This is correct and expected*, as Columnstore is extremely compressed and thus the log generation rate (which is what is measured by the Log Flush Speed) is much lower than with Rowstore tables.\n\n## How to use it\n\nDownload or clone the repository, make sure you have .NET Core 3.1 installed, and then build Smart Bulk Copy:\n\n```bash\ndotnet build\n```\n\nThen, in the `client` folder, create a `smartbulkcopy.config` file from the provided `client/config/smartbulkcopy.config.template`. If you want to start right away, just provide the source and destination connection strings and leave all the other options as they are. 
Make sure the source database is a database snapshot:\n\n[Create a Database Snapshot](https://docs.microsoft.com/en-us/sql/relational-databases/databases/create-a-database-snapshot-transact-sql?view=sql-server-2017)\n\nOr that the database is set to be in Read-Only mode:\n\n[Setting the database to READ_ONLY](https://docs.microsoft.com/en-us/sql/t-sql/statements/alter-database-transact-sql-set-options?view=sql-server-2017#b-setting-the-database-to-read_only)\n\nThen just run:\n\n```bash\ncd client\ndotnet run\n```\n\nand Smart Bulk Copy will start to copy data from the source database to the destination database. Please keep in mind that *all destination tables will be truncated by default*. This means that foreign key constraints must be dropped in the destination database before copying. Read more about `TRUNCATE TABLE` restrictions here: [TRUNCATE TABLE](https://docs.microsoft.com/en-us/sql/t-sql/statements/truncate-table-transact-sql?view=sql-server-2017#restrictions)\n\n## Configuration Notes\n\nHere's how you can change the Smart Bulk Copy configuration to better suit your needs. Everything is conveniently found in the `smartbulkcopy.config` file. Aside from the obvious source and destination connection strings, here are the configuration options you can use:\n\n### Tables to copy\n\n`tables`: an array of string values that contains the two-part names of the tables you want to copy. For example:\n\n```\n'tables': ['dbo.Table1', 'Sales.Customers']\n```\n\nAn asterisk `*` will be expanded to all tables available in the source database:\n\n```\n'tables': ['*']\n```\n\nYou can use a schema to limit the wildcard scope:\n\n```\n'tables': ['dbo.*']\n```\n\nFrom **version 1.7.1** you can also specify tables to be included and excluded:\n\n```\n\"tables\": {\n    \"include\": [\"dbo.*\"],\n    \"exclude\": [\"dbo.ORDERS\"]\n}\n```\n\n### Configuration Options\n\nSmart Bulk Copy is highly configurable. 
Read more in the dedicated document: [Smart Bulk Copy Configuration Options](./docs/CONFIG.md)\n\n## Notes on Azure SQL\n\nAzure SQL transaction log throughput is governed, as described in [Transaction Log Rate Governance](https://docs.microsoft.com/en-us/azure/sql-database/sql-database-resource-limits-database-server#transaction-log-rate-governance), and it can do up to 96 MB/sec of log flushing. Smart Bulk Copy will report the detected log flush speed every 5 seconds so that you can check whether you can actually increase the number of parallel tasks to go faster, or whether you're already at the limit. Please remember that 96 MB/sec is reached only with higher SKUs, so if you're already using 7 parallel tasks and you're not seeing something close to 96 MB/sec, please check that:\n\n1. You have enough network bandwidth (this should not be a problem if you're copying data from cloud to cloud)\n2. You're not using some very low SKU (like [P1 or lower](https://docs.microsoft.com/en-us/azure/azure-sql/database/resource-limits-dtu-single-databases) or just [2 vCPU](https://docs.microsoft.com/en-us/azure/azure-sql/database/resource-limits-vcore-single-databases)). In this case, move to a higher SKU for the duration of the bulk load.\n\nThere are a couple of exceptions to what has just been described:\n- [Azure SQL Hyperscale](https://docs.microsoft.com/en-us/azure/azure-sql/database/service-tier-hyperscale) always provides 100 MB/sec of maximum log throughput, no matter the number of vCores. Of course, if using a small number of cores on Hyperscale, other factors (for example, sorting when inserting into a table with indexes) could come into play and prevent you from reaching the mentioned 100 MB/sec.\n- [M-series](https://docs.microsoft.com/en-us/azure/azure-sql/database/service-tiers-vcore?tabs=azure-portal#m-series), which can do up to 256 MB/sec of log throughput.\n\n## Observed Performances\n\nTests have been run using the `LINEITEM` table of the TPC-H 10GB test database. The uncompressed table size is around 8.8 GB, with 59,986,052 rows. 
The source database was hosted on a SQL Server 2017 VM running on Azure, and the target was an Azure SQL Hyperscale Gen8 8 vCores database. Smart Bulk Copy was running on the same virtual machine that hosted the source database. Both the VM and the Azure SQL database were in the same region.\n\nConfiguration settings used:\n\n```json\n\"tasks\": 7,\n\"logical-partitions\": \"auto\",\n\"batch-size\": 100000\n```\n\nHere are the results of the tests:\n\n|Table|Copy Time (in sec)|\n|---|---|\n|HEAP|135|\n|HEAP, PARTITIONED|**111**|\n|CLUSTERED ROWSTORE|505|\n|CLUSTERED ROWSTORE, PARTITIONED|207|\n|CLUSTERED COLUMNSTORE|315|\n|CLUSTERED COLUMNSTORE, PARTITIONED|196|\n\n## Questions and Answers\n\nAs the document was getting longer and longer, it has been moved here: [Smart Bulk Copy FAQ](./docs/FAQ.md)\n\n## Tests\n\nThis tool has been successfully tested against the following sample databases:\n\n- TPC-H\n- TPC-E\n- AdventureWorks2012\n- AdventureWorks2014\n- AdventureWorksDW2012\n- AdventureWorksDW2014\n\nNote that foreign keys and views were dropped from the target tables before starting the bulk copy.\n\nSmart Bulk Copy has been tested with the following SQL Server engine versions:\n\n- Azure SQL Database\n- Azure SQL Managed Instance\n- SQL Server 2019\n- SQL Server 2017\n- SQL Server 2016\n- SQL Server 2014\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fazure-samples%2Fsmartbulkcopy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fazure-samples%2Fsmartbulkcopy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fazure-samples%2Fsmartbulkcopy/lists"}