BigQuery-Python
===============
<a href="https://travis-ci.org/tylertreat/BigQuery-Python"><img align="right" src="https://travis-ci.org/tylertreat/BigQuery-Python.svg"></a>
Simple Python client for interacting with Google BigQuery.

This client provides an API for retrieving and inserting BigQuery data by wrapping Google's low-level API client library.
It also provides facilities that make it convenient to access data that is tied to an App Engine appspot, such as request logs.

[Documentation](http://tylertreat.github.io/BigQuery-Python/)

# Installation

`pip install bigquery-python`

# Basic Usage

```python
from bigquery import get_client

# BigQuery project id as listed in the Google Developers Console.
project_id = 'project_id'

# Service account email address as listed in the Google Developers Console.
service_account = 'my_id_123@developer.gserviceaccount.com'

# PKCS12 or PEM key provided by Google.
key = 'key.pem'

client = get_client(project_id, service_account=service_account,
                    private_key_file=key, readonly=True)

# JSON key provided by Google.
json_key = 'key.json'

client = get_client(json_key_file=json_key, readonly=True)

# Submit an async query.
job_id, _results = client.query('SELECT * FROM dataset.my_table LIMIT 1000')

# Check if the query has finished running.
complete, row_count = client.check_job(job_id)

# Retrieve the results.
results = client.get_query_rows(job_id)
```

# Executing Queries

The BigQuery client allows you to execute raw queries against a dataset. The `query` method inserts a query job into BigQuery. By default, `query` runs asynchronously, with a `timeout` of `0`.
When a non-zero timeout value is specified, the call waits for the results and raises a `BigQueryTimeoutException` on timeout.

When you run an async query, you can use the returned `job_id` to poll for job status later with `check_job`.

```python
# Submit an async query.
job_id, _results = client.query('SELECT * FROM dataset.my_table LIMIT 1000')

# Do other work.

# Poll for query completion.
complete, row_count = client.check_job(job_id)

# Retrieve the results.
if complete:
    results = client.get_query_rows(job_id)
```

You can also specify a non-zero timeout value if you want your query to be synchronous.

```python
# Submit a synchronous query.
try:
    _job_id, results = client.query('SELECT * FROM dataset.my_table LIMIT 1000', timeout=10)
except BigQueryTimeoutException:
    print("Timeout")
```

## Query Builder

The `query_builder` module provides an API for generating query strings that can be run using the BigQuery client.

```python
from bigquery.query_builder import render_query

selects = {
    'start_time': {
        'alias': 'Timestamp',
        'format': 'INTEGER-FORMAT_UTC_USEC'
    }
}

conditions = [
    {
        'field': 'Timestamp',
        'type': 'INTEGER',
        'comparators': [
            {
                'condition': '>=',
                'negate': False,
                'value': 1399478981
            }
        ]
    }
]

grouping = ['Timestamp']

having = [
    {
        'field': 'Timestamp',
        'type': 'INTEGER',
        'comparators': [
            {
                'condition': '==',
                'negate': False,
                'value': 1399478981
            }
        ]
    }
]

order_by = {'fields': ['Timestamp'], 'direction': 'desc'}

query = render_query(
    'dataset',
    ['table'],
    select=selects,
    conditions=conditions,
    groupings=grouping,
    having=having,
    order_by=order_by,
    limit=47
)

job_id, _ = client.query(query)
```

# Managing Tables

The BigQuery client provides facilities to manage dataset tables, including creating, deleting, checking the existence of, and getting the metadata of tables.

```python
# Create a new table.
schema = [
    {'name': 'foo', 'type': 'STRING', 'mode': 'nullable'},
    {'name': 'bar', 'type': 'FLOAT', 'mode': 'nullable'}
]
created = client.create_table('dataset', 'my_table', schema)

# Delete an existing table.
deleted = client.delete_table('dataset', 'my_table')

# Check if a table exists.
exists = client.check_table('dataset', 'my_table')

# Get a table's full metadata, including numRows, numBytes, etc.
# See: https://cloud.google.com/bigquery/docs/reference/rest/v2/tables
metadata = client.get_table('dataset', 'my_table')
```

There is also functionality for retrieving tables that are associated with a Google App Engine appspot, assuming table names are in the form appid_YYYY_MM or YYYY_MM_appid. This allows tables within a date range to be selected and queried.

```python
# Get appspot tables falling within a start and end time.
from datetime import datetime, timedelta
range_end = datetime.utcnow()
range_start = range_end - timedelta(weeks=12)
tables = client.get_tables('dataset', 'appid', range_start, range_end)
```

# Inserting Data

The client provides an API for inserting data into a BigQuery table. The last parameter is an optional insert id key used to avoid duplicate entries.

```python
# Insert data into a table.
rows = [
    {'one': 'ein', 'two': 'zwei'},
    {'id': 'NzAzYmRiY', 'one': 'uno', 'two': 'dos'},
    {'id': 'NzAzYmRiY', 'one': 'ein', 'two': 'zwei'}  # duplicate entry
]

inserted = client.push_rows('dataset', 'table', rows, 'id')
```

# Write Query Results to Table

You can write query results directly to a table.
When either the dataset or table parameter is omitted, the query result is written to a temporary table.

```python
# Write to a permanent table.
job = client.write_to_table('SELECT * FROM dataset.original_table LIMIT 100',
                            'dataset',
                            'table')
try:
    job_resource = client.wait_for_job(job, timeout=60)
    print(job_resource)
except BigQueryTimeoutException:
    print("Timeout")

# Write to a permanent table with a UDF in the query string.
external_udf_uris = ["gs://bigquery-sandbox-udf/url_decode.js"]
query = """SELECT requests, title
            FROM
              urlDecode(
                SELECT
                  title, sum(requests) AS num_requests
                FROM
                  [fh-bigquery:wikipedia.pagecounts_201504]
                WHERE language = 'fr'
                GROUP EACH BY title
              )
            WHERE title LIKE '%ç%'
            ORDER BY requests DESC
            LIMIT 100
        """
job = client.write_to_table(
    query,
    'dataset',
    'table',
    external_udf_uris=external_udf_uris
)

try:
    job_resource = client.wait_for_job(job, timeout=60)
    print(job_resource)
except BigQueryTimeoutException:
    print("Timeout")

# Write to a temporary table.
job = client.write_to_table('SELECT * FROM dataset.original_table LIMIT 100')
try:
    job_resource = client.wait_for_job(job, timeout=60)
    print(job_resource)
except BigQueryTimeoutException:
    print("Timeout")
```

# Import data from Google Cloud Storage

```python
schema = [{"name": "username", "type": "string", "mode": "nullable"}]
job = client.import_data_from_uris(['gs://mybucket/mydata.json'],
                                   'dataset',
                                   'table',
                                   schema,
                                   source_format=JOB_SOURCE_FORMAT_JSON)

try:
    job_resource = client.wait_for_job(job, timeout=60)
    print(job_resource)
except BigQueryTimeoutException:
    print("Timeout")
```

# Export data to Google Cloud Storage

```python
job = client.export_data_to_uris(['gs://mybucket/mydata.json'],
                                 'dataset',
                                 'table')
try:
    job_resource = client.wait_for_job(job, timeout=60)
    print(job_resource)
except BigQueryTimeoutException:
    print("Timeout")
```

# Managing Datasets

The client provides an API for listing, creating, deleting, updating, and patching datasets.

```python
# List datasets.
datasets = client.get_datasets()

# Create a dataset.
dataset = client.create_dataset('mydataset', friendly_name="My Dataset", description="A dataset created by me")

# Get a dataset.
client.get_dataset('mydataset')

# Delete a dataset.
client.delete_dataset('mydataset')
client.delete_dataset('mydataset', delete_contents=True)  # delete even if it contains data

# Update a dataset (fields not passed, such as description, are deleted).
client.update_dataset('mydataset', friendly_name="mon Dataset")

# Patch a dataset (friendly_name changes; description is preserved).
client.patch_dataset('mydataset', friendly_name="mon Dataset")

# Check if a dataset exists.
exists = client.check_dataset('mydataset')
```

# Creating a schema from a sample record

```python
from bigquery import schema_from_record

schema_from_record({"id": 123, "posts": [{"id": 123, "text": "this is a post"}], "username": "bob"})
```

# Contributing

Requirements for contributions:

  - Branch off master, PR back to master.
  - Your code should pass [Flake8](http://flake8.readthedocs.org/en/latest/).
  - Unit test coverage is required.
  - Good docstrings are required.
  - Good [commit messages](http://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html) are required.
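# Appendix: a polling helper

The async submit/poll pattern used throughout this README (submit with `query`, poll with `check_job`, then fetch rows) can be factored into a small helper. The sketch below is not part of the library; it is a generic loop where `check` stands in for any callable with the same return shape as `client.check_job(job_id)`, a `(complete, row_count)` tuple, and the interval and timeout values are arbitrary.

```python
import time


def poll_until_complete(check, interval=1.0, timeout=30.0):
    """Call check() until it reports completion or `timeout` seconds elapse.

    `check` must return a (complete, row_count) tuple, matching the shape
    returned by client.check_job(job_id). Returns the row count on success
    and raises TimeoutError if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        complete, row_count = check()
        if complete:
            return row_count
        time.sleep(interval)  # back off between polls
    raise TimeoutError("job did not complete within %.1fs" % timeout)
```

With the client from the examples above, this could be used as `row_count = poll_until_complete(lambda: client.check_job(job_id))` before calling `client.get_query_rows(job_id)`.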