{"id":36772430,"url":"https://github.com/gsiems/bac-tract","last_synced_at":"2026-01-12T13:06:00.266Z","repository":{"id":57559067,"uuid":"151894296","full_name":"gsiems/bac-tract","owner":"gsiems","description":"Extract data from bacpac files","archived":false,"fork":false,"pushed_at":"2025-03-15T20:28:52.000Z","size":118,"stargazers_count":5,"open_issues_count":1,"forks_count":4,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-15T21:32:33.871Z","etag":null,"topics":["etl","mssql","oracle","postgresql"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gsiems.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-10-07T00:54:24.000Z","updated_at":"2025-03-15T20:28:48.000Z","dependencies_parsed_at":"2024-06-09T16:13:42.526Z","dependency_job_id":"f7df37aa-1861-4a87-9a18-1f45b5b7c1e5","html_url":"https://github.com/gsiems/bac-tract","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/gsiems/bac-tract","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gsiems%2Fbac-tract","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gsiems%2Fbac-tract/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gsiems%2Fbac-tract/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gsiems%2Fbac-tract/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gsiems","download_url":"https://co
deload.github.com/gsiems/bac-tract/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gsiems%2Fbac-tract/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28338992,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-12T12:22:26.515Z","status":"ssl_error","status_checked_at":"2026-01-12T12:22:10.856Z","response_time":98,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["etl","mssql","oracle","postgresql"],"created_at":"2026-01-12T13:06:00.210Z","updated_at":"2026-01-12T13:06:00.257Z","avatar_url":"https://github.com/gsiems.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# bac-tract\n\nExtract data from MS SQL-Server bacpac files (BACpac-exTRACT).\n\nA bacpac file is a means of getting data out of MS SQL-Server instances.\n\nNB a bacpac file is simply a zip archive of other files and that the\nfiles of interest are the model.xml file and the exported data files\nunder the Data directory.\n\nThe commands/tools available consist of:\n\n* bp2col: Extracts column metadata for one or more tables from an unzipped bacpac file\n\n* bp2csv: Extracts one or more tables from an unzipped bacpac file and writes the output to comma-separated file(s)\n\n* bp2ddl: Generates table creation DDL for one or more tables from 
an unzipped bacpac file\n\n* bp2ora: Extracts one or more tables from an unzipped bacpac file and writes the output to Oracle SQL*Loader control and data files\n\n* bp2pg: Extracts one or more tables from an unzipped bacpac file and writes the output to pg_dump file(s)\n\nNB that these tools all require that the bacpac file has already been unzipped.\n\n\nCommon flags used by the tools are:\n\n```\n\n    -b Base directory containing the unzipped bacpac file.\n\n    -c The number of rows of data to extract per table (bp2csv, bp2ora,\n        bp2pg). Defaults to extracting all rows of data.\n\n    -d The SQL dialect to output (bp2ddl). Valid dialects are\n        Ora (Oracle), Pg (PostgreSQL), and Std (Standard).\n\n    -e The column meta-data exceptions file to use (should there be a need).\n\n    -f The file to read that contains the names of the tables to\n        extract (the tables are listed one per line).\n\n    -t The name of the table to extract.\n\n    -w The number of parallel workers to use (bp2ora only) for\n        extracting the data.\n\n    -debug Write debugging information to STDOUT (bp2csv, bp2ora, bp2pg).\n\n```\n\n# Column meta-data exceptions\n\nThere are sometimes issues when extracting the data from the bacpac due\nto the data being stored slightly differently from what the data model\nindicates. When this happens the extraction will crash shortly after\nencountering one of the anomalies. Because it is not known how to predict\nwhere the anomalies will be found, there is support for overriding the\nmodel meta-data to help the data extraction do the right thing.\n\nExamples of the issues seen so far:\n\n 1. The first issue is of a not-null char column not parsing the same\n as all the other not-null char columns. When a char column is defined\n as not nullable then the typical behavior is to not insert the \"size\n bytes\" data as it is not needed. However, in one of the tables tested\n a not-null char column also has size bytes data. 
The code attempts\n to mitigate this behavior, and (so far) mostly succeeds, without\n needing to use any meta-data exceptions.\n\n 2. The second issue involves the datafile sometimes having six null\n (0x00) bytes inserted between the size bytes and data bytes of varchar\n columns. Since no string data should start with null bytes this issue\n appears to have a straightforward mitigation. This behavior has been\n observed in two to three of the 200+ tables tested. Fortunately, this\n issue is managed by the data extraction code without need for using\n meta-data exceptions.\n\n 3. The third issue appears to involve inserting a set of six 0xff\n bytes before not-null integer columns. This behavior has been observed\n in 3 of the 200+ tables tested. Programmatically identifying and\n mitigating this issue has, so far, proved more difficult than the\n first two issues.\n\n 4. The fourth issue found is similar to the first in that a not-null\n column has size bytes-- in this case it is the bit datatype.\n\nAn example exceptions file is included under the cmd directory (see\ncolExceptionsExample.json). The file consists of a JSON array of one or\nmore exceptions, one exception per problematic column. While most of\nthe elements in the file should be self-explanatory, the isAdulterated\nelement is not-- this element is currently only used for indicating\nthose integer columns that exhibit the behavior in issue three above.\n\nIt should be noted that the tested bacpac files apparently do load\ncorrectly into MS SQL-Server such that these issues aren't visible to\nMS SQL-Server environments. Whether this is due to buggy behavior in\nthe bacpac exporting code that the bacpac importing code is able to work\naround, or whether this is intentional (anti-competitive?) 
behavior I\ncannot say, although MS history implies that it could be either or both.\n\nRunning the command with the debug flag set can be used to assist in\ntroubleshooting these anomalies if/when they occur. The debugging\ninformation is written to STDOUT and is very verbose. As each column of\ndata is read the column meta-data, the steps taken in parsing it, and a\nsynopsis of the read data is output. The following example shows the\nresults of parsing two columns from one row of data. Each column is\nseparated by a blank line. The first line of output contains the column\nmeta-data (columnName, dataType, Length, Precision, Scale, and\nisNullable). The followup lines detail the function used to read the\ndata, reading of storage bytes (nullable columns only), reading of data\nbytes, a synopsis of the data (hex-dump and plain text) extracted, and\nwhether the column was null or not.\n\n```\n\"column_name\" int 0, 0, 0, true\nFunc readInteger\nreadStoredSize: Attempting to read 1 bytes\nBytes: 0x04\nreadInteger: Attempting to read 4 bytes\nBytes: 0x8a 0x25 0x00 0x00\nStr: 9610\nIsNull: false\n\n\"column_name\" varchar 1500, 0, 0, true\nFunc readString\nreadStoredSize: Attempting to read 2 bytes\nBytes: 0xc0 0x03\nreadString: Attempting to read 960 bytes\nBytes: 0x4c 0x00 0x69 0x00 0x63 0x00 0x65 0x00 0x6e 0x00 0x73 0x00 0x65 0x00 0x65 0x00 0x20 0x00 0x6d 0x00 0x6f 0x00 0x64 0x00  ... 0x65 0x00 0x2e 0x00\nStr: Licensee modifications t ... nce.\nIsNull: false\n\n```\n\nNB changing the code to allow enabling debug via command-line flag does\nimpose a ~4% penalty on performance even when not used (edit\nbactrac/main.go to completely disable this and get that performance\nback).\n\n# Supported datatypes\n\nMost of the supported datatypes are based on reverse engineering existing\nbacpac files where there is sufficient data to be able to properly\nextract/translate the data. 
Other supported datatypes rely on there being\nsufficient information found to reasonably attempt extracting/translating\nthe data.\n\nSupported datatypes consist of:\n\n * bigint\n * binary (parse only)\n * bit\n * char\n * date (have no suitable bacpac for testing)\n * datetime2\n * datetime\n * decimal\n * float\n * geography (parse, translate point types to WKT-ish form)\n * int\n * money\n * ntext\n * nvarchar\n * real\n * smalldatetime\n * smallint\n * smallmoney (have no suitable bacpac for testing)\n * text\n * time (have no suitable bacpac for testing)\n * tinyint\n * uniqueidentifier (have no suitable bacpac for testing)\n * varbinary (parse only)\n * varchar\n\nNB that the CollationLcid for the bacpac files examined is 1033 and\nthat it is unknown what impact other collations might have on the\nparsing and interpreting of bacpac file data.\n\nNB these tools require an already unzipped bacpac file. Writing the\ntools to work with the zipped bacpac file was considered out of scope and\nnot really necessary.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgsiems%2Fbac-tract","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgsiems%2Fbac-tract","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgsiems%2Fbac-tract/lists"}