{"id":15060691,"url":"https://github.com/bxparks/bigquery-schema-generator","last_synced_at":"2025-04-12T14:17:28.169Z","repository":{"id":38802442,"uuid":"115892691","full_name":"bxparks/bigquery-schema-generator","owner":"bxparks","description":"Generates the BigQuery schema from newline-delimited JSON or CSV data records.","archived":false,"fork":false,"pushed_at":"2024-01-13T00:20:34.000Z","size":5784,"stargazers_count":243,"open_issues_count":3,"forks_count":50,"subscribers_count":4,"default_branch":"develop","last_synced_at":"2025-04-12T14:17:26.445Z","etag":null,"topics":["bigquery","bigquery-schema","google-bigquery","python3"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bxparks.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-01-01T00:59:49.000Z","updated_at":"2025-04-01T06:40:16.000Z","dependencies_parsed_at":"2023-02-01T04:45:58.898Z","dependency_job_id":"8dd816ad-38ee-4faf-8fd0-8ef6b5230429","html_url":"https://github.com/bxparks/bigquery-schema-generator","commit_stats":{"total_commits":184,"total_committers":9,"mean_commits":"20.444444444444443","dds":"0.11413043478260865","last_synced_commit":"11bf0a2bd90a4f00385528c13caef7c0d3125b87"},"previous_names":[],"tags_count":24,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bxparks%2Fbigquery-schema-generator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bxparks%2Fbigquery-schema-generator/tags
","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bxparks%2Fbigquery-schema-generator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bxparks%2Fbigquery-schema-generator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bxparks","download_url":"https://codeload.github.com/bxparks/bigquery-schema-generator/tar.gz/refs/heads/develop","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248578876,"owners_count":21127714,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bigquery","bigquery-schema","google-bigquery","python3"],"created_at":"2024-09-24T23:03:10.478Z","updated_at":"2025-04-12T14:17:28.147Z","avatar_url":"https://github.com/bxparks.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# BigQuery Schema Generator\n\n[![BigQuery Schema Generator CI](https://github.com/bxparks/bigquery-schema-generator/actions/workflows/pythonpackage.yml/badge.svg)](https://github.com/bxparks/bigquery-schema-generator/actions/workflows/pythonpackage.yml)\n\nThis script generates the BigQuery schema from the newline-delimited data\nrecords on the STDIN. The records can be in JSON format or CSV format. The\nBigQuery data importer (`bq load`) uses only the\n[first 500 records](https://cloud.google.com/bigquery/docs/schema-detect)\nwhen the schema auto-detection feature is enabled. 
In contrast, this script uses\nall data records to generate the schema.\n\nUsage:\n```\n$ generate-schema \u003c file.data.json \u003e file.schema.json\n$ generate-schema --input_format csv \u003c file.data.csv \u003e file.schema.json\n```\n\n**Version**: 1.6.1 (2024-01-12)\n\n**Changelog**: [CHANGELOG.md](CHANGELOG.md)\n\n## Table of Contents\n\n* [Background](#Background)\n* [Installation](#Installation)\n    * [Ubuntu Linux](#UbuntuLinux)\n    * [MacOS](#MacOS)\n        * [MacOS 12 (Monterey)](#MacOS12)\n        * [MacOS 11 (Big Sur)](#MacOS11)\n        * [MacOS 10.14 (Mojave)](#MacOS1014)\n* [Usage](#Usage)\n    * [Command Line](#CommandLine)\n    * [Schema Output](#SchemaOutput)\n    * [Command Line Flag Options](#FlagOptions)\n        * [Help (`--help`)](#Help)\n        * [Input Format (`--input_format`)](#InputFormat)\n        * [Keep Nulls (`--keep_nulls`)](#KeepNulls)\n        * [Quoted Values Are Strings(`--quoted_values_are_strings`)](#QuotedValuesAreStrings)\n        * [Infer Mode (`--infer_mode`)](#InferMode)\n        * [Debugging Interval (`--debugging_interval`)](#DebuggingInterval)\n        * [Debugging Map (`--debugging_map`)](#DebuggingMap)\n        * [Sanitize Names (`--sanitize_names`)](#SanitizedNames)\n        * [Ignore Invalid Lines (`--ignore_invalid_lines`)](#IgnoreInvalidLines)\n        * [Existing Schema Path (`--existing_schema_path`)](#ExistingSchemaPath)\n        * [Preserve Input Sort Order\n          (`--preserve_input_sort_order`)](#PreserveInputSortOrder)\n    * [Using as a Library](#UsingAsLibrary)\n        * [`SchemaGenerator.run()`](#SchemaGeneratorRun)\n        * [`SchemaGenerator.deduce_schema()` from\n          File](#SchemaGeneratorDeduceSchemaFromFile)\n        * [`SchemaGenerator.deduce_schema()` from\n          Dict](#SchemaGeneratorDeduceSchemaFromDict)\n        * [`SchemaGenerator.deduce_schema()` from\n          DictReader](#SchemaGeneratorDeduceSchemaFromCsvDictReader)\n* [Schema Types](#SchemaTypes)\n    * [Supported 
Types](#SupportedTypes)\n    * [Type Inference](#TypeInference)\n* [Examples](#Examples)\n* [Benchmarks](#Benchmarks)\n* [System Requirements](#SystemRequirements)\n* [License](#License)\n* [Feedback and Support](#Feedback)\n* [Authors](#Authors)\n\n\u003ca name=\"Background\"\u003e\u003c/a\u003e\n## Background\n\nData can be imported into [BigQuery](https://cloud.google.com/bigquery/) using\nthe [bq](https://cloud.google.com/bigquery/bq-command-line-tool) command line\ntool. It accepts a number of data formats including CSV or newline-delimited\nJSON. The data can be loaded into an existing table or a new table can be\ncreated during the loading process. The structure of the table is defined by\nits [schema](https://cloud.google.com/bigquery/docs/schemas). The table's\nschema can be defined manually or the schema can be\n[auto-detected](https://cloud.google.com/bigquery/docs/schema-detect#auto-detect).\n\nWhen the auto-detect feature is used, the BigQuery data importer examines only\nthe [first 500 records](https://cloud.google.com/bigquery/docs/schema-detect)\nof the input data. In many cases, this is sufficient\nbecause the data records were dumped from another database and the exact schema\nof the source table was known. However, for data extracted from a service\n(e.g. using a REST API) the record fields could have been organically added\nat later dates. In this case, the first 500 records do not contain fields which\nare present in later records. The **bq load** auto-detection fails and the data\nfails to load.\n\nThe **bq load** tool does not support the ability to process the entire dataset\nto determine a more accurate schema. This script fills in that gap. It\nprocesses the entire dataset given in the STDIN and outputs the BigQuery schema\nin JSON format on the STDOUT. 
This schema file can be fed back into the **bq\nload** tool to create a table that is more compatible with the data fields in\nthe input dataset.\n\n\u003ca name=\"Installation\"\u003e\u003c/a\u003e\n## Installation\n\n**Prerequisite**: You need to have Python 3.6 or higher.\n\nInstall from the [PyPI](https://pypi.python.org/pypi) repository using `pip3`. There\nare too many ways to install packages in Python. The following are in order\nfrom highest to lowest recommendation:\n\n1) If you are using a virtual environment (such as\n[venv](https://docs.python.org/3/library/venv.html)), then use:\n```\n$ pip3 install bigquery_schema_generator\n```\n\n2) If you aren't using a virtual environment you can install into\nyour local Python directory:\n\n```\n$ pip3 install --user bigquery_schema_generator\n```\n\n3) If you want to install the package for your entire system globally, use\n```\n$ sudo -H pip3 install bigquery_schema_generator\n```\nbut realize that you will be running code from PyPI as `root` so this has\nsecurity implications.\n\nSometimes, your Python environment gets into a complete mess and the `pip3`\ncommand won't work. Try typing `python3 -m pip` instead.\n\nA successful install should print out something like the following (the version\nnumber may be different):\n```\nCollecting bigquery-schema-generator\nInstalling collected packages: bigquery-schema-generator\nSuccessfully installed bigquery-schema-generator-1.1\n```\n\nThe shell script `generate-schema` will be installed somewhere in your system,\ndepending on how your Python environment is configured. 
See below for\nsome notes for Ubuntu Linux and MacOS.\n\n\u003ca name=\"UbuntuLinux\"\u003e\u003c/a\u003e\n### Ubuntu Linux (18.04, 20.04, 22.04)\n\nAfter running `pip3 install bigquery_schema_generator`, the `generate-schema`\nscript may be installed in one of the following locations:\n\n* `/usr/bin/generate-schema`\n* `/usr/local/bin/generate-schema`\n* `$HOME/.local/bin/generate-schema`\n* `$HOME/.virtualenvs/{your_virtual_env}/bin/generate-schema`\n\n\u003ca name=\"MacOS\"\u003e\u003c/a\u003e\n### MacOS\n\nI don't have any Macs which are able to run the latest macOS, and I don't use\nthem much for software development these days, but here are some notes on older\nversions of macOS in case they help.\n\n\u003ca name=\"MacOS12\"\u003e\u003c/a\u003e\n#### MacOS 12 (Monterey)\n\nNeither Python 2 nor Python 3 is installed by default on Monterey. If you try to run\n`python3` on the command line, a dialog box asks you to install the\n[Xcode](https://developer.apple.com/support/xcode/) development package. It\napparently takes over an hour at 10 MB/s.\n\nYou can instead install Python 3 using\n[Homebrew](https://docs.brew.sh/Homebrew-and-Python), by installing `brew`, and\ntyping `$ brew install python`. Currently, it downloads Python 3.10 in about 1-2\nminutes and installs the `python3` and `pip3` binaries into\n`/usr/local/bin/python3` and `/usr/local/bin/pip3`. 
Using `brew` seems to be\nthe easiest option, so let's assume that Python 3 was installed through that.\n\nIf you run:\n```\n$ pip3 install bigquery_schema_generator\n```\nthe package will be installed at `/usr/local/lib/python3.10/site-packages/`, and\nthe `generate-schema` script will be installed at\n`/usr/local/bin/generate-schema`.\n\nIf you use the `--user` flag:\n```\n$ pip3 install --user bigquery_schema_generator\n```\nthe package will be installed at\n`$HOME/Library/Python/3.10/lib/python/site-packages/`, and the `generate-schema`\nscript will be installed at `$HOME/Library/Python/3.10/bin/generate-schema`.\n\nYou may need to add the `$HOME/Library/Python/3.10/bin` directory to your\n`$PATH` variable in your `$HOME/.bashrc` file.\n\n\u003ca name=\"MacOS11\"\u003e\u003c/a\u003e\n#### MacOS 11 (Big Sur)\n\nPython 2.7.16 is installed by default on Big Sur as `/usr/bin/python`. If you\ntry to run `python3` on the command line, a dialog box asks you to install\nthe [Xcode](https://developer.apple.com/support/xcode/) development package,\nwhich I think installs Python 3.8 as `/usr/bin/python3` (I can't\nremember; it was installed a long time ago).\n\nYou can instead install Python 3 using\n[Homebrew](https://docs.brew.sh/Homebrew-and-Python), by installing `brew`, and\ntyping `$ brew install python`. Currently, it downloads Python 3.10 in about 1-2\nminutes and installs the `python3` and `pip3` binaries into\n`/usr/local/bin/python3` and `/usr/local/bin/pip3`. 
Using `brew` seems to be\nthe easiest option, so let's assume that Python 3 was installed through that.\n\nIf you run:\n```\n$ pip3 install bigquery_schema_generator\n```\nthe package will be installed at `/usr/local/lib/python3.10/site-packages/`, and\nthe `generate-schema` script will be installed at\n`/usr/local/bin/generate-schema`.\n\nIf you use the `--user` flag:\n```\n$ pip3 install --user bigquery_schema_generator\n```\nthe package will be installed at\n`$HOME/Library/Python/3.10/lib/python/site-packages/`, and the `generate-schema`\nscript will be installed at `$HOME/Library/Python/3.10/bin/generate-schema`.\n\nYou may need to add the `$HOME/Library/Python/3.10/bin` directory to your\n`$PATH` variable in your `$HOME/.bashrc` file.\n\n\u003ca name=\"MacOS1014\"\u003e\u003c/a\u003e\n#### MacOS 10.14 (Mojave)\n\nThis MacOS version comes with Python 2.7 only. To install Python 3, you can\nuse one of the following:\n\n1)) Downloading the [macOS installer directly from\n   Python.org](https://www.python.org/downloads/macos/).\n\nThe `python3` binary will be located at `/usr/local/bin/python3`, and\n`/usr/local/bin/pip3` is a symlink to\n`/Library/Frameworks/Python.framework/Versions/3.6/bin/pip3`.\n\nSo running\n\n```\n$ pip3 install --user bigquery_schema_generator\n```\n\nwill install `generate-schema` at\n`/Library/Frameworks/Python.framework/Versions/3.6/bin/generate-schema`.\n\nThe Python installer updates `$HOME/.bash_profile` to add\n`/Library/Frameworks/Python.framework/Versions/3.6/bin` to the `$PATH`\nenvironment variable. 
So you should be able to run the `generate-schema`\ncommand without typing in the full path.\n\n2)) Using [Homebrew](https://docs.brew.sh/Homebrew-and-Python).\n\nIn this environment, the `generate-schema` script will probably be installed in\n`/usr/local/bin` but I'm not completely certain.\n\n\u003ca name=\"Usage\"\u003e\u003c/a\u003e\n## Usage\n\n\u003ca name=\"CommandLine\"\u003e\u003c/a\u003e\n### Command Line\n\nThe `generate_schema.py` script accepts a newline-delimited JSON or\nCSV data file on the STDIN. JSON input format has been tested extensively.\nCSV input format was added more recently (in v0.4) using the `--input_format\ncsv` flag. CSV support is not as robust as JSON support. For example, CSV format\nsupports only the comma separator, and does not support the pipe (`|`) or tab\n(`\\t`) character.\n\n**Side Note**: The `input_format` parameter supports (as of v1.6.0) the\n`csvdictreader` option which allows using the\n[csv.DictReader](https://docs.python.org/3/library/csv.html) class that can be\ncustomized to handle different delimiters such as tabs. But this requires\ncreating a custom Python script using `bigquery_schema_generator` as a library.\nSee the [SchemaGenerator.deduce_schema() from\ncsv.DictReader](#SchemaGeneratorDeduceSchemaFromCsvDictReader) section below. It\nis probably possible to enable this functionality through the command line\nscript, but it was not obvious how to expose the various options of\n`csv.DictReader` through the command line flags. I didn't spend any time on this\nproblem because this is not a feature that I use personally.\n\nUnlike `bq load`, the `generate_schema.py` script reads every record in the\ninput data file to deduce the table's schema. 
It prints the JSON formatted\nschema file on the STDOUT.\n\nThere are at least 3 ways to run this script:\n\n**1) Shell script**\n\nIf you installed using `pip3`, then it should have installed a small helper\nscript named `generate-schema` in your local `./bin` directory of your current\nenvironment (depending on whether you are using a virtual environment).\n\n```\n$ generate-schema \u003c file.data.json \u003e file.schema.json\n```\n\n**2) Python module**\n\nYou can invoke the module directly using:\n```\n$ python3 -m bigquery_schema_generator.generate_schema \u003c file.data.json \u003e file.schema.json\n```\nThis is essentially what the `generate-schema` command does.\n\n**3) Python script**\n\nIf you retrieved this code from its\n[GitHub repository](https://github.com/bxparks/bigquery-schema-generator),\nthen you can invoke the Python script directly:\n```\n$ ./generate_schema.py \u003c file.data.json \u003e file.schema.json\n```\n\n\u003ca name=\"SchemaOutput\"\u003e\u003c/a\u003e\n### Using the Schema Output\n\nThe resulting schema file can be given to the **bq load** command using the\n`--schema` flag:\n```\n\n$ bq load --source_format NEWLINE_DELIMITED_JSON \\\n    --ignore_unknown_values \\\n    --schema file.schema.json \\\n    mydataset.mytable \\\n    file.data.json\n```\nwhere `mydataset.mytable` is the target table in BigQuery.\n\nFor debugging purposes, here is the equivalent `bq load` command using schema\nautodetection:\n\n```\n$ bq load --source_format NEWLINE_DELIMITED_JSON \\\n    --autodetect \\\n    mydataset.mytable \\\n    file.data.json\n```\n\nIf the input file is in CSV format, the first line will be the header line which\nenumerates the names of the columns. But this header line must be skipped when\nimporting the file into the BigQuery table. 
We accomplish this using the\n`--skip_leading_rows` flag:\n```\n$ bq load --source_format CSV \\\n    --schema file.schema.json \\\n    --skip_leading_rows 1 \\\n    mydataset.mytable \\\n    file.data.csv\n```\n\nHere is the equivalent `bq load` command for CSV files using autodetection:\n```\n$ bq load --source_format CSV \\\n    --autodetect \\\n    mydataset.mytable \\\n    file.data.csv\n```\n\nA useful flag for `bq load`, particularly for JSON files, is\n`--ignore_unknown_values`, which causes `bq load` to ignore fields in the input\ndata which are not defined in the schema. When `generate_schema.py` detects an\ninconsistency in the definition of a particular field in the input data, it\nremoves the field from the schema definition. Without the\n`--ignore_unknown_values` flag, `bq load` fails when the inconsistent data record\nis read.\n\nAnother useful flag during development and debugging is `--replace` which\nreplaces any existing BigQuery table.\n\nAfter the BigQuery table is loaded, the schema can be retrieved using:\n\n```\n$ bq show --schema mydataset.mytable | python3 -m json.tool\n```\n\n(The `python3 -m json.tool` command pretty-prints the JSON-formatted schema\nfile. 
An alternative is the [jq command](https://stedolan.github.io/jq/).)\nThe resulting schema file should be identical to `file.schema.json`.\n\n\u003ca name=\"FlagOptions\"\u003e\u003c/a\u003e\n### Command Line Flag Options\n\nThe `generate_schema.py` script supports a handful of command line flags\nas shown by the `--help` flag below.\n\n\u003ca name=\"Help\"\u003e\u003c/a\u003e\n#### Help (`--help`)\n\nPrint the built-in help strings:\n\n```bash\n$ generate-schema --help\nusage: generate-schema [-h] [--input_format INPUT_FORMAT] [--keep_nulls]\n                       [--quoted_values_are_strings] [--infer_mode]\n                       [--debugging_interval DEBUGGING_INTERVAL]\n                       [--debugging_map] [--sanitize_names]\n                       [--ignore_invalid_lines]\n                       [--existing_schema_path EXISTING_SCHEMA_PATH]\n                       [--preserve_input_sort_order]\n\nGenerate BigQuery schema from JSON or CSV file.\n\noptional arguments:\n  -h, --help            show this help message and exit\n  --input_format INPUT_FORMAT\n                        Specify an alternative input format ('csv', 'json',\n                        'dict')\n  --keep_nulls          Print the schema for null values, empty arrays or\n                        empty records\n  --quoted_values_are_strings\n                        Quoted values should be interpreted as strings\n  --infer_mode          Determine if mode can be 'NULLABLE' or 'REQUIRED'\n  --debugging_interval DEBUGGING_INTERVAL\n                        Number of lines between heartbeat debugging messages\n  --debugging_map       Print the metadata schema_map instead of the schema\n  --sanitize_names      Forces schema name to comply with BigQuery naming\n                        standard\n  --ignore_invalid_lines\n                        Ignore lines that cannot be parsed instead of stopping\n  --existing_schema_path EXISTING_SCHEMA_PATH\n                        File that contains the existing 
BigQuery schema for a\n                        table. This can be fetched with: `bq show --schema\n                        \u003cproject_id\u003e:\u003cdataset\u003e:\u003ctable_name\u003e\n  --preserve_input_sort_order\n                        Preserve the original ordering of columns from input\n                        instead of sorting alphabetically. This only impacts\n                        `input_format` of json or dict\n\n```\n\n\u003ca name=\"InputFormat\"\u003e\u003c/a\u003e\n#### Input Format (`--input_format`)\n\nSpecifies the format of the input file as a string. It must be one of `json`\n(default), `csv`, or `dict`:\n\n* `json`\n    * a \"file-like\" object containing newline-delimited JSON\n* `csv`\n    * a \"file-like\" object containing newline-delimited CSV\n* `dict`\n    * a `list` of Python `dict` objects corresponding to list of\n      newline-delimited JSON, in other words `List[Dict[str, Any]]`\n    * applies only if `SchemaGenerator` is used as a library through the\n      `run()` or `deduce_schema()` method\n    * useful if the input data (usually JSON) has already been read into memory\n      and parsed from newline-delimited JSON into native Python dict objects.\n\nIf `csv` file is specified, the `--keep_nulls` flag is automatically activated.\nThis is required because CSV columns are defined positionally, so the schema\nfile must contain all the columns specified by the CSV file, in the same\norder, even if the column contains an empty value for every record.\n\nSee [Issue #26](https://github.com/bxparks/bigquery-schema-generator/issues/26)\nfor implementation details.\n\n\u003ca name=\"KeepNulls\"\u003e\u003c/a\u003e\n#### Keep Nulls (`--keep_nulls`)\n\nNormally when the input data file contains a field which has a null, empty\narray or empty record as its value, the field is suppressed in the schema file.\nThis flag enables this field to be included in the schema file.\n\nIn other words, using a data file containing just nulls and 
empty values:\n```bash\n$ generate-schema\n{ \"s\": null, \"a\": [], \"m\": {} }\n^D\nINFO:root:Processed 1 lines\n[]\n```\n\nWith the `--keep_nulls` flag, we get:\n```bash\n$ generate-schema --keep_nulls\n{ \"s\": null, \"a\": [], \"m\": {} }\n^D\nINFO:root:Processed 1 lines\n[\n  {\n    \"mode\": \"REPEATED\",\n    \"type\": \"STRING\",\n    \"name\": \"a\"\n  },\n  {\n    \"mode\": \"NULLABLE\",\n    \"fields\": [\n      {\n        \"mode\": \"NULLABLE\",\n        \"type\": \"STRING\",\n        \"name\": \"__unknown__\"\n      }\n    ],\n    \"type\": \"RECORD\",\n    \"name\": \"m\"\n  },\n  {\n    \"mode\": \"NULLABLE\",\n    \"type\": \"STRING\",\n    \"name\": \"s\"\n  }\n]\n```\n\n\u003ca name=\"QuotedValuesAreStrings\"\u003e\u003c/a\u003e\n#### Quoted Values Are Strings (`--quoted_values_are_strings`)\n\nBy default, quoted values are inspected to determine if they can be interpreted\nas `DATE`, `TIME`, `TIMESTAMP`, `BOOLEAN`, `INTEGER` or `FLOAT`. This is\nconsistent with the algorithm used by `bq load`. However, for the `BOOLEAN`,\n`INTEGER`, or `FLOAT` types, it is sometimes more useful to interpret those as\nnormal strings instead. This flag disables type inference for `BOOLEAN`,\n`INTEGER` and `FLOAT` types inside quoted strings.\n\n```bash\n$ generate-schema\n{ \"name\": \"1\" }\n^D\n[\n  {\n    \"mode\": \"NULLABLE\",\n    \"name\": \"name\",\n    \"type\": \"INTEGER\"\n  }\n]\n\n$ generate-schema --quoted_values_are_strings\n{ \"name\": \"1\" }\n^D\n[\n  {\n    \"mode\": \"NULLABLE\",\n    \"name\": \"name\",\n    \"type\": \"STRING\"\n  }\n]\n```\n\n\u003ca name=\"InferMode\"\u003e\u003c/a\u003e\n#### Infer Mode (`--infer_mode`)\n\nSet the schema `mode` of a field to `REQUIRED` instead of the default\n`NULLABLE` if the field contains a non-null or non-empty value for every data\nrecord in the input file. This option is available only for CSV\n(`--input_format csv`) files. 
It is theoretically possible to implement this\nfeature for JSON files, but too difficult to implement in practice because\nfields are often completely missing from a given JSON record (instead of\nexplicitly being defined to be `null`).\n\nIn addition to the above, this option, when used in conjunction with\n`--existing_schema_path`, will allow fields to be relaxed from `REQUIRED` to\n`NULLABLE` if they were `REQUIRED` in the existing schema and null values are found in\nthe new data we are inferring a schema from. In this case it can be used with\neither input format, CSV or JSON.\n\nSee [Issue #28](https://github.com/bxparks/bigquery-schema-generator/issues/28)\nfor implementation details.\n\n\u003ca name=\"DebuggingInterval\"\u003e\u003c/a\u003e\n#### Debugging Interval (`--debugging_interval`)\n\nBy default, the `generate_schema.py` script prints a short progress message\nevery 1000 lines of input data. This interval can be changed using the\n`--debugging_interval` flag.\n\n```bash\n$ generate-schema --debugging_interval 50 \u003c file.data.json \u003e file.schema.json\n```\n\n\u003ca name=\"DebuggingMap\"\u003e\u003c/a\u003e\n#### Debugging Map (`--debugging_map`)\n\nInstead of printing out the BigQuery schema, the `--debugging_map` flag prints out\nthe bookkeeping metadata map which is used internally to keep track of the\nvarious fields and their types that were inferred using the data file. 
This\nflag is intended to be used for debugging.\n\n```bash\n$ generate-schema --debugging_map \u003c file.data.json \u003e file.schema.json\n```\n\n\u003ca name=\"SanitizedNames\"\u003e\u003c/a\u003e\n#### Sanitize Names (`--sanitize_names`)\n\nBigQuery column names are [restricted to certain characters and\nlength](https://cloud.google.com/bigquery/docs/schemas#column_names):\n* it must contain only letters (a-z, A-Z), numbers (0-9), or underscores\n* it must start with a letter or underscore\n* the maximum length is 128 characters\n* column names are case-insensitive\n\nFor CSV files, the `bq load` command seems to automatically convert invalid\ncolumn names into valid column names. This flag attempts to perform some of the\nsame transformations, to avoid having to scan through the input data twice to\ngenerate the schema file. The transformations are:\n\n* any character outside of ASCII letters, numbers and underscore\n  (`[a-zA-Z0-9_]`) is converted to an underscore. For example `go\u00262#there!` is\n  converted to `go_2_there_`;\n* names longer than 128 characters are truncated to 128.\n\nMy recollection is that the `bq load` command does *not* normalize the JSON key\nnames. Instead it prints an error message. So the `--sanitize_names` flag is\nuseful mostly for CSV files. For JSON files, you'll have to do a second pass\nthrough the data files to clean up the column names anyway. See\n[Issue #14](https://github.com/bxparks/bigquery-schema-generator/issues/14) and\n[Issue #33](https://github.com/bxparks/bigquery-schema-generator/issues/33).\n\n\u003ca name=\"IgnoreInvalidLines\"\u003e\u003c/a\u003e\n#### Ignore Invalid Lines (`--ignore_invalid_lines`)\n\nBy default, if an error is encountered on a particular line, processing stops\nimmediately with an exception. This flag causes invalid lines to be ignored and\nprocessing to continue. 
A list of all errors and their line numbers will be\nprinted on the STDERR after processing the entire file.\n\nThis flag is currently most useful for JSON files, to ignore lines which do not\nparse correctly as a JSON object.\n\nThis flag is probably not useful for CSV files. CSV files are processed by the\n`DictReader` class which performs its own line processing internally, including\nextracting the column names from the first line of the file. If the `DictReader`\ndoes throw an exception on a given line, we would not be able to catch it and\ncontinue processing. Fortunately, CSV files are fairly robust, and the schema\ndeduction logic will handle any missing or extra columns gracefully.\n\nFixes\n[Issue #49](https://github.com/bxparks/bigquery-schema-generator/issues/49).\n\n\u003ca name=\"ExistingSchemaPath\"\u003e\u003c/a\u003e\n#### Existing Schema Path (`--existing_schema_path`)\n\nThere are cases where we would like to start from an existing BigQuery table\nschema rather than starting from scratch with a new batch of data we would like\nto load. In this case we can specify the path to a local file on disk that is\nour existing bigquery table schema. This can be generated via the following `bq\nshow --schema` command:\n```bash\nbq show --schema \u003cPROJECT_ID\u003e:\u003cDATASET_NAME\u003e.\u003cTABLE_NAME\u003e \u003e existing_table_schema.json\n```\n\nWe can then run generate-schema with the additional option\n```bash\n--existing_schema_path existing_table_schema.json\n```\n\nThere is some subtle interaction between the `--existing_schema_path` and fields\nwhich are marked with a `mode` of `REQUIRED` in the existing schema. If the new\ndata contains a `null` value (either in a CSV or JSON data file), it is not\nclear if the schema should be changed to `mode=NULLABLE` or whether the new data\nshould be ignored and the schema should remain `mode=REQUIRED`. 
The choice is\ndetermined by overloading the `--infer_mode` flag:\n\n* If `--infer_mode` is given, the new schema will be allowed to revert to\n  `NULLABLE`.\n* If `--infer_mode` is not given, the offending new record will be ignored\n  and the new schema will remain `REQUIRED`.\n\nSee the discussion in\n[PR #57](https://github.com/bxparks/bigquery-schema-generator/pull/57) for\nmore details.\n\n\u003ca name=\"PreserveInputSortOrder\"\u003e\u003c/a\u003e\n#### Preserve Input Sort Order (`--preserve_input_sort_order`)\n\nBy default, the columns in the BQ schema file are sorted\nlexicographically, which matched the original behavior of `bq load\n--autodetect`. If the `--preserve_input_sort_order` flag is given, the columns\nin the resulting schema file are not sorted, but preserve the order of\nappearance in the input JSON data. For example, the following JSON data with\nthe `--preserve_input_sort_order` flag will produce:\n\n```bash\n$ generate-schema --preserve_input_sort_order\n{ \"s\": \"string\", \"i\": 3, \"x\": 3.2, \"b\": true }\n^D\n[\n  {\n    \"mode\": \"NULLABLE\",\n    \"name\": \"s\",\n    \"type\": \"STRING\"\n  },\n  {\n    \"mode\": \"NULLABLE\",\n    \"name\": \"i\",\n    \"type\": \"INTEGER\"\n  },\n  {\n    \"mode\": \"NULLABLE\",\n    \"name\": \"x\",\n    \"type\": \"FLOAT\"\n  },\n  {\n    \"mode\": \"NULLABLE\",\n    \"name\": \"b\",\n    \"type\": \"BOOLEAN\"\n  }\n]\n```\n\nIt is possible that each JSON record line contains only a partial subset of the\ntotal possible columns in the data set. 
The order of the columns in the BQ\nschema will then be the order that each column was first *seen* by the\nscript:\n\n```bash\n$ generate-schema --preserve_input_sort_order\n{ \"s\": \"string\", \"i\": 3 }\n{ \"x\": 3.2, \"s\": \"string\", \"i\": 3 }\n{ \"b\": true, \"x\": 3.2, \"s\": \"string\", \"i\": 3 }\n^D\n[\n  {\n    \"mode\": \"NULLABLE\",\n    \"name\": \"s\",\n    \"type\": \"STRING\"\n  },\n  {\n    \"mode\": \"NULLABLE\",\n    \"name\": \"i\",\n    \"type\": \"INTEGER\"\n  },\n  {\n    \"mode\": \"NULLABLE\",\n    \"name\": \"x\",\n    \"type\": \"FLOAT\"\n  },\n  {\n    \"mode\": \"NULLABLE\",\n    \"name\": \"b\",\n    \"type\": \"BOOLEAN\"\n  }\n]\n```\n\n**Note**: In Python 3.6 (the earliest version of Python supported by this\nproject), the order of keys in a `dict` was the insertion-order, but this\nordering was an implementation detail, and not guaranteed. In Python 3.7, that\nordering was made permanent. So the `--preserve_input_sort_order` flag\n**should** work in Python 3.6 but is not guaranteed.\n\nSee discussion in\n[PR #75](https://github.com/bxparks/bigquery-schema-generator/pull/75) for\nmore details.\n\n\u003ca name=\"UsingAsLibrary\"\u003e\u003c/a\u003e\n### Using As a Library\n\nThe `SchemaGenerator` class can be used programmatically as a library from a\nlarger Python application.\n\n\u003ca name=\"SchemaGeneratorRun\"\u003e\u003c/a\u003e\n#### `SchemaGenerator.run()`\n\nThe `bigquery_schema_generator` module can be used as a library by an external\nPython client code by creating an instance of `SchemaGenerator` and calling the\n`run(input, output)` method:\n\n```python\nfrom bigquery_schema_generator.generate_schema import SchemaGenerator\n\ngenerator = SchemaGenerator(\n    input_format=input_format,\n    infer_mode=infer_mode,\n    keep_nulls=keep_nulls,\n    quoted_values_are_strings=quoted_values_are_strings,\n    debugging_interval=debugging_interval,\n    debugging_map=debugging_map,\n    sanitize_names=sanitize_names,\n    
ignore_invalid_lines=ignore_invalid_lines,\n    preserve_input_sort_order=preserve_input_sort_order,\n)\n\nFILENAME = \"...\"\n\nwith open(FILENAME) as input_file:\n    generator.run(input_file=input_file, output_file=output_file)\n```\n\nThe `input_format` is one of `json`, `csv`, and `dict` as described in the\n[Input Format](#InputFormat) section above. The `input_file` must match the\nformat given by this parameter.\n\nSee [generatorrun.py](examples/generatorrun.py) for an example.\n\n\u003ca name=\"SchemaGeneratorDeduceSchemaFromFile\"\u003e\u003c/a\u003e\n#### `SchemaGenerator.deduce_schema()` from File\n\nIf you need to process the generated schema programmatically, create an instance\nof `SchemaGenerator` using the appropriate `input_format` option, use the\n`deduce_schema()` method to read in the file, then postprocess the resulting\n`schema_map` and `error_log` data structures.\n\nThe following reads in a JSON file (see [jsoneader.py](examples/jsoneader.py)):\n\n```python\nimport json\nimport logging\nimport sys\nfrom bigquery_schema_generator.generate_schema import SchemaGenerator\n\nFILENAME = \"jsonfile.json\"\n\ngenerator = SchemaGenerator(\n    input_format='json',\n    quoted_values_are_strings=True,\n)\n\nwith open(FILENAME) as file:\n    schema_map, errors = generator.deduce_schema(file)\n\nfor error in errors:\n    logging.info(\"Problem on line %s: %s\", error['line_number'], error['msg'])\n\nschema = generator.flatten_schema(schema_map)\njson.dump(schema, sys.stdout, indent=2)\nprint()\n```\n\nThe following reads a CSV file (see [csvreader.py](examples/csvreader.py)):\n\n```python\n...(same as above)...\n\ngenerator = SchemaGenerator(\n    input_format='csv',\n    infer_mode=True,\n    quoted_values_are_strings=True,\n    sanitize_names=True,\n)\n\nwith open(FILENAME) as file:\n    schema_map, errors = generator.deduce_schema(file)\n\n...(same as above)...\n```\n\nThe `deduce_schema()` also supports starting from an existing 
`schema_map`\ninstead of starting from scratch. This is the internal version of the\n`--existing_schema_path` functionality.\n\n```python\nschema_map1, errors = generator.deduce_schema(input_data=data1)\nschema_map2, errors = generator.deduce_schema(\n    input_data=data2, schema_map=schema_map1\n)\n```\n\nThe `input_data` must match the `input_format` given in the constructor. The\nformat is described in the [Input Format](#InputFormat) section above.\n\n\u003ca name=\"SchemaGeneratorDeduceSchemaFromDict\"\u003e\u003c/a\u003e\n#### `SchemaGenerator.deduce_schema()` from Iterable of Dict\n\nIf the JSON data set has already been read into memory as an array or iterable\nof Python `dict` objects, the `SchemaGenerator` can process that too using the\n`input_format='dict'` option. Here is an example from\n[dictreader.py](examples/dictreader.py):\n\n\n```python\nimport json\nimport logging\nimport sys\nfrom bigquery_schema_generator.generate_schema import SchemaGenerator\n\ngenerator = SchemaGenerator(input_format='dict')\ninput_data = [\n    {\n        's': 'string',\n        'b': True,\n    },\n    {\n        'd': '2021-08-18',\n        'x': 3.1\n    },\n]\nschema_map, errors = generator.deduce_schema(input_data)\nschema = generator.flatten_schema(schema_map)\njson.dump(schema, sys.stdout, indent=2)\nprint()\n```\n\n**Note**: The `input_format='dict'` option supports any `input_data` object\nwhich acts like an iterable of `dict`. The data does not have to be loaded into\nmemory.\n\n\u003ca name=\"SchemaGeneratorDeduceSchemaFromCsvDictReader\"\u003e\u003c/a\u003e\n#### `SchemaGenerator.deduce_schema()` from csv.DictReader\n\nThe `input_format='csvdictreader'` option is similar to `input_format='dict'`\nbut behaves more like `input_format='csv'`. 
It supports any object that behaves\nlike an iterable of `dict`, but it is intended to be used with the\n[csv.DictReader](https://docs.python.org/3/library/csv.html) object.\n\nThe difference between `'dict'` and `'csvdictreader'` is the assumption made\nabout the shape of the data. The `'csvdictreader'` option assumes that the data\nis tabular like a CSV file, with every row usually containing an entry for every\ncolumn. The `'dict'` option does not make that assumption, and the data can be\nmore hierarchical with some rows containing partial sets of columns.\n\nThis semantic difference means that `'csvdictreader'` supports options which\napply to `'csv'` files. In particular, the `infer_mode=True` option can be used\nto set the `mode` of a column to `REQUIRED` instead of `NULLABLE` when the\nscript finds that the column is defined in every row.\n\nHere is an example from [tsvreader.py](examples/tsvreader.py) which reads a\ntab-separated file (TSV):\n\n```python\nimport csv\nimport json\nimport sys\nfrom bigquery_schema_generator.generate_schema import SchemaGenerator\n\nFILENAME = \"tsvfile.tsv\"\n\ngenerator = SchemaGenerator(input_format='csvdictreader')\nwith open(FILENAME) as file:\n    reader = csv.DictReader(file, delimiter='\\t')\n    schema_map, errors = generator.deduce_schema(reader)\n\nschema = generator.flatten_schema(schema_map)\njson.dump(schema, sys.stdout, indent=2)\nprint()\n```\n\n\u003ca name=\"SchemaTypes\"\u003e\u003c/a\u003e\n## Schema Types\n\n\u003ca name=\"SupportedTypes\"\u003e\u003c/a\u003e\n### Supported Types\n\nThe `bq show --schema` command produces a JSON schema file that uses the\nolder [Legacy SQL data types](https://cloud.google.com/bigquery/data-types).\nFor compatibility, the **generate-schema** script also generates a schema file\nusing the legacy data types.\n\nThe supported types are:\n\n* `BOOLEAN`\n* `INTEGER`\n* `FLOAT`\n* `STRING`\n* `TIMESTAMP`\n* `DATE`\n* `TIME`\n* `RECORD`\n\nThe `generate-schema` script supports both 
`NULLABLE` and `REPEATED` modes of\nall of the above types.\n\nThe supported format of `TIMESTAMP` is as close as practical to the\n[bq load format](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#timestamp-type):\n```\nYYYY-[M]M-[D]D[( |T)[H]H:[M]M:[S]S[.DDDDDD]][time zone]\n```\nwhich appears to be an extension of the\n[ISO 8601 format](https://en.wikipedia.org/wiki/ISO_8601).\nThe difference from `bq load` is that the `[time zone]` component can be only\n* `Z`\n* `UTC` (same as `Z`)\n* `(+|-)H[H][:M[M]]`\n\nNote that BigQuery supports up to 6 decimal places after the integer 'second'\ncomponent. `generate-schema` follows the same restriction for compatibility. If\nyour input file contains more than 6 decimal places, you need to write a data\ncleansing filter to fix this.\n\nThe suffix `UTC` is not standard ISO 8601 nor\n[documented by Google](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#time-zones)\nbut the `UTC` suffix is used by `bq extract` and the web interface. 
(See\n[Issue 19](https://github.com/bxparks/bigquery-schema-generator/issues/19).)\n\nTimezone names from the [tz database](http://www.iana.org/time-zones) (e.g.\n\"America/Los_Angeles\") are _not_ supported by `generate-schema`.\n\nThe following types are _not_ supported at all:\n\n* `BYTES`\n* `DATETIME` (unable to distinguish from `TIMESTAMP`)\n\n\u003ca name=\"TypeInference\"\u003e\u003c/a\u003e\n### Type Inference Rules\n\nThe `generate-schema` script attempts to emulate the various type conversion and\ncompatibility rules implemented by **bq load**:\n\n* `INTEGER` can upgrade to `FLOAT`\n    * if a field in an early record is an `INTEGER`, but a subsequent record\n      shows this field to have a `FLOAT` value, the type of the field will be\n      upgraded to a `FLOAT`\n    * the reverse does not happen: once a field is a `FLOAT`, it will remain a\n      `FLOAT`\n* conflicting `TIME`, `DATE`, `TIMESTAMP` types upgrade to `STRING`\n    * if a field is determined to have one type of \"time\" in one record, then\n      subsequently a different \"time\" type, then the field will be assigned a\n      `STRING` type\n* `NULLABLE RECORD` can upgrade to a `REPEATED RECORD`\n    * a field may be defined as `RECORD` (aka \"Struct\") type with `{ ... }`\n    * if the field is subsequently read as an array with a `[{ ... 
}]`, the\n      field is upgraded to a `REPEATED RECORD`\n* a primitive type (`FLOAT`, `INTEGER`, `STRING`) cannot upgrade to a `REPEATED`\n  primitive type\n    * there's no technical reason why this cannot be allowed, but **bq load**\n      does not support it, so we follow its behavior\n* a `DATETIME` field is always inferred to be a `TIMESTAMP`\n    * the format of these two fields is identical (in the absence of timezone)\n    * we follow the same logic as **bq load** and always infer these as\n      `TIMESTAMP`\n* `BOOLEAN`, `INTEGER`, and `FLOAT` can appear inside quoted strings\n    * In other words, `\"true\"` (or `\"True\"` or `\"false\"`, etc) is considered a\n      BOOLEAN type, `\"1\"` is considered an INTEGER type, and `\"2.1\"` is\n      considered a FLOAT type. Luigi Mori (jtschichold@) added additional logic\n      to replicate the type conversion logic used by `bq load` for these\n      strings.\n    * This type inference inside quoted strings can be disabled using the\n      `--quoted_values_are_strings` flag\n    * (See [Issue #22](https://github.com/bxparks/bigquery-schema-generator/issues/22) for more details.)\n* `INTEGER` values overflowing a 64-bit signed integer upgrade to `FLOAT`\n    * integers greater than `2^63-1` (9223372036854775807)\n    * integers less than `-2^63` (-9223372036854775808)\n    * (See [Issue #18](https://github.com/bxparks/bigquery-schema-generator/issues/18) for more details)\n\n\u003ca name=\"Examples\"\u003e\u003c/a\u003e\n## Examples\n\nHere is an example of a single JSON data record on the STDIN (the `^D` below\nmeans typing Control-D, which indicates \"end of file\" under Linux and MacOS):\n\n```bash\n$ generate-schema\n{ \"s\": \"string\", \"b\": true, \"i\": 1, \"x\": 3.1, \"t\": \"2017-05-22T17:10:00-07:00\" }\n^D\nINFO:root:Processed 1 lines\n[\n  {\n    \"mode\": \"NULLABLE\",\n    \"name\": \"b\",\n    \"type\": \"BOOLEAN\"\n  },\n  {\n    \"mode\": \"NULLABLE\",\n    \"name\": \"i\",\n    \"type\": 
\"INTEGER\"\n  },\n  {\n    \"mode\": \"NULLABLE\",\n    \"name\": \"s\",\n    \"type\": \"STRING\"\n  },\n  {\n    \"mode\": \"NULLABLE\",\n    \"name\": \"t\",\n    \"type\": \"TIMESTAMP\"\n  },\n  {\n    \"mode\": \"NULLABLE\",\n    \"name\": \"x\",\n    \"type\": \"FLOAT\"\n  }\n]\n```\n\nIn most cases, the data file will be stored in a file:\n```bash\n$ cat \u003e file.data.json\n{ \"a\": [1, 2] }\n{ \"i\": 3 }\n^D\n\n$ generate-schema \u003c file.data.json \u003e file.schema.json\nINFO:root:Processed 2 lines\n\n$ cat file.schema.json\n[\n  {\n    \"mode\": \"REPEATED\",\n    \"name\": \"a\",\n    \"type\": \"INTEGER\"\n  },\n  {\n    \"mode\": \"NULLABLE\",\n    \"name\": \"i\",\n    \"type\": \"INTEGER\"\n  }\n]\n```\n\nHere is the schema generated from a CSV input file. The first line is the header\ncontaining the names of the columns, and the schema lists the columns in the\nsame order as the header:\n```bash\n$ generate-schema --input_format csv\ne,b,c,d,a\n1,x,true,,2.0\n2,x,,,4\n3,,,,\n^D\nINFO:root:Processed 3 lines\n[\n  {\n    \"mode\": \"NULLABLE\",\n    \"name\": \"e\",\n    \"type\": \"INTEGER\"\n  },\n  {\n    \"mode\": \"NULLABLE\",\n    \"name\": \"b\",\n    \"type\": \"STRING\"\n  },\n  {\n    \"mode\": \"NULLABLE\",\n    \"name\": \"c\",\n    \"type\": \"BOOLEAN\"\n  },\n  {\n    \"mode\": \"NULLABLE\",\n    \"name\": \"d\",\n    \"type\": \"STRING\"\n  },\n  {\n    \"mode\": \"NULLABLE\",\n    \"name\": \"a\",\n    \"type\": \"FLOAT\"\n  }\n]\n```\n\nHere is an example of the schema generated with the `--infer_mode` flag:\n```bash\n$ generate-schema --input_format csv --infer_mode\nname,surname,age\nJohn\nMichael,,\nMaria,Smith,30\nJoanna,Anders,21\n^D\nINFO:root:Processed 4 lines\n[\n  {\n    \"mode\": \"REQUIRED\",\n    \"name\": \"name\",\n    \"type\": \"STRING\"\n  },\n  {\n    \"mode\": \"NULLABLE\",\n    \"name\": \"surname\",\n    \"type\": \"STRING\"\n  },\n  {\n    \"mode\": \"NULLABLE\",\n    \"name\": \"age\",\n    \"type\": 
\"INTEGER\"\n  }\n]\n```\n\n\u003ca name=\"Benchmarks\"\u003e\u003c/a\u003e\n## Benchmarks\n\nI wrote the `bigquery_schema_generator/anonymize.py` script to create an\nanonymized data file `tests/testdata/anon1.data.json.gz`:\n```bash\n$ ./bigquery_schema_generator/anonymize.py \u003c original.data.json \\\n    \u003e anon1.data.json\n$ gzip anon1.data.json\n```\nThis data file is 290MB (5.6MB compressed) with 103080 data records.\n\nGenerating the schema using\n```bash\n$ bigquery_schema_generator/generate_schema.py \u003c anon1.data.json \\\n    \u003e anon1.schema.json\n```\ntook 67s on a Dell Precision M4700 laptop with an Intel Core i7-3840QM CPU @\n2.80GHz, 32GB of RAM, Ubuntu Linux 18.04, Python 3.6.7.\n\n\u003ca name=\"SystemRequirements\"\u003e\u003c/a\u003e\n## System Requirements\n\nThis project was initially developed on Ubuntu 17.04 using Python 3.5.3, but it\nnow requires Python 3.6 or higher, I think mostly due to the use of f-strings.\n\nI have tested it on:\n\n* Ubuntu 22.04, Python 3.10.6\n* Ubuntu 20.04, Python 3.8.5\n* Ubuntu 18.04, Python 3.7.7\n* Ubuntu 18.04, Python 3.6.7\n* Ubuntu 17.10, Python 3.6.3\n* MacOS 12.6.2 (Monterey), Python 3.10.9\n* MacOS 11.7.2 (Big Sur), Python 3.10.9\n* MacOS 11.7.2 (Big Sur), Python 3.8.9\n* MacOS 10.14.2 (Mojave), Python 3.6.4\n* MacOS 10.13.2 (High Sierra), Python 3.6.4\n\nThe GitHub Actions continuous integration pipeline validates on Python 3.7,\n3.8, 3.9, and 3.10.\n\nThe unit tests are invoked with `$ make tests` target, and depends only on the\nbuilt-in Python `unittest` package.\n\nThe coding style check is invoked using `$ make flake8` and depends on the\n`flake8` package. 
It can be installed using `$ pip3 install --user flake8`.\n\n\u003ca name=\"License\"\u003e\u003c/a\u003e\n## License\n\nApache License 2.0\n\n\u003ca name=\"Feedback\"\u003e\u003c/a\u003e\n## Feedback and Support\n\nIf you have any questions, comments, or feature requests for this library,\nplease use the [GitHub\nDiscussions](https://github.com/bxparks/bigquery-schema-generator/discussions)\nfor this project. If you have bug reports, please file a ticket in [GitHub\nIssues](https://github.com/bxparks/bigquery-schema-generator/issues). Feature\nrequests should go into Discussions first because they often have alternative\nsolutions which are useful to remain visible, instead of disappearing from the\ndefault view of the Issue tracker after the ticket is closed.\n\nPlease refrain from emailing me directly unless the content is sensitive. The\nproblem with email is that I cannot reference the email conversation when other\npeople ask similar questions later.\n\n\u003ca name=\"Authors\"\u003e\u003c/a\u003e\n## Authors\n\n* Created by Brian T. Park (brian@xparks.net).\n* Type inference inside quoted strings by Luigi Mori (jtschichold@).\n* Flag to disable type inference inside quoted strings by Daniel Ecer\n  (de-code@).\n* Support for CSV files and detection of `REQUIRED` fields by Sandor Korotkevics\n  (korotkevics@).\n* Better support for using `bigquery_schema_generator` as a library from an\n  external Python code by StefanoG_ITA (StefanoGITA@).\n* Sanitizing of column names to valid BigQuery characters and length by Jon\n  Warghed (jonwarghed@).\n* Bug fix in `--sanitize_names` by Riccardo M. 
Cefala (riccardomc@).\n* Print full path of nested JSON elements in error messages, by Austin Brogle\n  (abroglesc@).\n* Allow an existing schema file to be specified using `--existing_schema_path`,\n  by Austin Brogle (abroglesc@) and Bozo Dragojevic (bozzzzo@).\n* Allow `SchemaGenerator.deduce_schema()` to accept a list of native Python\n  `dict` objects, by Zigfrid Zvezdin (ZiggerZZ@).\n* Make the column order in the BQ schema file match the order of appearance in\n  the JSON data file using the `--preserve_input_sort_order` flag. By Kevin\n  Deggelman (kdeggelman@).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbxparks%2Fbigquery-schema-generator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbxparks%2Fbigquery-schema-generator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbxparks%2Fbigquery-schema-generator/lists"}