{"id":15893000,"url":"https://github.com/roldanramon/kms_bigdata_loaddatato_sqlserver","last_synced_at":"2026-04-29T18:31:11.927Z","repository":{"id":257001416,"uuid":"856886187","full_name":"RoldanRamon/kms_BigData_LoadDataTo_SQLServer","owner":"RoldanRamon","description":null,"archived":false,"fork":false,"pushed_at":"2024-09-13T17:55:57.000Z","size":7,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-08T08:14:38.441Z","etag":null,"topics":["python","r","sql"],"latest_commit_sha":null,"homepage":"","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/RoldanRamon.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-09-13T11:52:01.000Z","updated_at":"2024-09-13T17:56:00.000Z","dependencies_parsed_at":"2024-09-14T08:59:23.066Z","dependency_job_id":"1500e912-6dbb-44fc-9ffa-a091d31aa212","html_url":"https://github.com/RoldanRamon/kms_BigData_LoadDataTo_SQLServer","commit_stats":null,"previous_names":["roldanramon/kms_bigdata_loaddatato_sqlserver"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RoldanRamon%2Fkms_BigData_LoadDataTo_SQLServer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RoldanRamon%2Fkms_BigData_LoadDataTo_SQLServer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RoldanRamon%2Fkms_BigData_LoadDataTo_SQLServer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositorie
s/RoldanRamon%2Fkms_BigData_LoadDataTo_SQLServer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/RoldanRamon","download_url":"https://codeload.github.com/RoldanRamon/kms_BigData_LoadDataTo_SQLServer/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246863556,"owners_count":20846273,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["python","r","sql"],"created_at":"2024-10-06T08:06:03.425Z","updated_at":"2026-04-29T18:31:11.894Z","avatar_url":"https://github.com/RoldanRamon.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🚀 **Efficient Ways to Load Large Datasets into SQL Server** 🗄️\n\nHandling large datasets can be challenging, especially when transferring them to a SQL Server hosted on-premises. This project demonstrates **three efficient ways** to upload datasets larger than **3 GB** from a local machine to an on-premises SQL Server. Whether you're dealing with performance bottlenecks or want to explore different approaches, these methods will help you move data quickly and effectively. 💾\n\n---\n\n## ⚡ Approach 2: sparklyr + dbplyr 🌟\n\nFor large datasets, this approach uses Apache Spark (through sparklyr) to parallelize data processing, which makes it well suited to very large files (3 GB+). With dbplyr, you write ordinary dplyr code that is translated into Spark SQL and executed inside Spark rather than in R's memory. Here's a quick \n\n## 🏎️ Approach 3: data.table + ODBC + DBI 💨\n\nWhen speed is crucial, the combination of data.table, DBI, and odbc delivers excellent performance. 
data.table (notably its fread() reader) is very fast for reading and transforming data in memory, and DBI, with the odbc package as its driver, provides a standard interface for connecting to SQL Server and writing tables to it.\n\n## 🔍 Why These Approaches?\n\n- **Scalability:** Handle datasets larger than 3 GB.\n- **Flexibility:** Choose the approach that best suits your data structure and workflow.\n- **Efficiency:** Leverage R's ecosystem for both in-memory and distributed data processing.\n\n## 📚 Getting Started\n\n1. Install the required packages:\n\n```r\ninstall.packages(c(\"sparklyr\", \"dplyr\", \"dbplyr\", \"DBI\", \"odbc\", \"data.table\", \"janitor\"))\n```\n\n2. Clone this repository (SSH):\n\n```bash\ngit clone git@github.com:RoldanRamon/kms_BigData_LoadDataTo_SQLServer.git\n```\n\n3. Ensure you have Java installed, since sparklyr needs it to run Spark:\n\n   - Install Java for Windows\n   - Install Java for Linux\n   - Install Java for Mac\n\n4. Run one of the provided scripts to see the methods in action!\n\n## 🏗️ Project Structure\n\n```text\n📁 root/\n├── 📄 README.md\n├── 📂 scripts/\n│   ├── odbc_dplyr.R           # Approach 1: ODBC + dplyr\n│   ├── sparklyr_dbplyr.R      # Approach 2: sparklyr + dbplyr\n│   ├── data_table_dbi.R       # Approach 3: data.table + DBI\n│   ├── initial_data_read.py   # Python script used to read the data initially\n│   └── create_tables.sql      # SQL script to create tables with correct data types\n└── 📂 data/\n    └── large-dataset.csv      # Example dataset (replace with your own)\n```\n\n### 🐍 Python Script (initial_data_read.py)\nThis Python script performs the initial read of the large dataset before it is passed to R for further processing and loading into SQL Server.\n\n### 🛢️ SQL Script (create_tables.sql)\nThis SQL script creates the required tables in SQL Server, with the appropriate column data types, before the dataset is loaded.\n\n## 📞 Support\nFeel free to open an issue if you run into any problems or have questions. You can also reach me on LinkedIn or by email.\n\n## 🎉 Contributions\nContributions are welcome! 
If you'd like to improve the scripts or add new methods, feel free to open a pull request.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Froldanramon%2Fkms_bigdata_loaddatato_sqlserver","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Froldanramon%2Fkms_bigdata_loaddatato_sqlserver","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Froldanramon%2Fkms_bigdata_loaddatato_sqlserver/lists"}