{"id":42482609,"url":"https://github.com/cloudspannerecosystem/spanner-analytics","last_synced_at":"2026-01-28T11:15:42.293Z","repository":{"id":188588383,"uuid":"679023147","full_name":"cloudspannerecosystem/spanner-analytics","owner":"cloudspannerecosystem","description":null,"archived":false,"fork":false,"pushed_at":"2024-06-07T01:53:37.000Z","size":33,"stargazers_count":2,"open_issues_count":2,"forks_count":1,"subscribers_count":3,"default_branch":"main","last_synced_at":"2026-01-17T14:46:39.842Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cloudspannerecosystem.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null}},"created_at":"2023-08-15T23:53:25.000Z","updated_at":"2024-08-18T15:19:15.000Z","dependencies_parsed_at":"2023-08-16T02:19:18.445Z","dependency_job_id":null,"html_url":"https://github.com/cloudspannerecosystem/spanner-analytics","commit_stats":null,"previous_names":["cloudspannerecosystem/spanner-analytics"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/cloudspannerecosystem/spanner-analytics","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cloudspannerecosystem%2Fspanner-analytics","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cloudspannerecosystem%2Fspanner-analytics/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cloudspannerecosystem%2Fspanner-analytics/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cloudspannerecosystem%2Fspanne
r-analytics/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cloudspannerecosystem","download_url":"https://codeload.github.com/cloudspannerecosystem/spanner-analytics/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cloudspannerecosystem%2Fspanner-analytics/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28844861,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-28T10:53:21.605Z","status":"ssl_error","status_checked_at":"2026-01-28T10:53:20.789Z","response_time":57,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-01-28T11:15:41.498Z","updated_at":"2026-01-28T11:15:42.288Z","avatar_url":"https://github.com/cloudspannerecosystem.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# spanner-analytics\n\nThis package aims to facilitate common data-analytic operations in Python\nusing data from Cloud Spanner.  This includes integrations with Jupyter\nNotebooks.\n\n## Using\n\n### Installation\n\nInstall from PyPI:\n\n```\npip install spanner-analytics\n```\n\n## Developing\n\nThis package can be used from Python code.  
For example:\n\n```python\nfrom spanner_analytics import Database\ndb = Database.connect('\u003cproject\u003e', '\u003cinstance\u003e', '\u003cdatabase\u003e')\ndataframe = db.execute_sql(\"SELECT * FROM my_table\")\n```\n\nThe package also offers a \"magic\" command that can be used within a Jupyter\nNotebook.  For example:\n\n```\n%load_ext spanner_analytics.magic\n```\n```\n%%spanner --project \u003cproject\u003e --instance \u003cinstance\u003e --database \u003cdatabase\u003e\n\nSELECT * FROM my_table\n```\n\nQueries are executed using Cloud Spanner\n[DataBoost](https://cloud.google.com/spanner/docs/databoost/databoost-overview).\nDataBoost allocates dedicated compute resources to execute your query.  So it\nwon't compete for resources with other workloads on your production database.\nBut you will be billed for compute resources consumed by a query.  See the\n[DataBoost Pricing](https://cloud.google.com/spanner/pricing#spanner-data-boost-pricing)\npage for more details.\n\n### Root-partitionable queries\n\nQueries currently must be _root-partitionable_.  This means that the query can\nbe logically decomposed into independent operations that operate on collocated\ndata, with no data shuffling and no final aggregation required.  The query plan\nmust specifically contain a DistributedUnion operator as its topmost operator,\nand otherwise follow the documentation on\n[reading data in parallel](https://cloud.google.com/spanner/docs/reads#read_data_in_parallel).\nThis enables Spanner's client to connect in parallel to multiple Spanner\nnodes to fetch data with maximum performance.\n\nFor example,\n\n```\nSELECT a + b FROM t\n```\n\nis root-partitionable because it operates on each row independently.  
Similarly,\n\n```\nSELECT a + b FROM t\nWHERE c \u003c 5\n```\n\nis root-partitionable because, while some nodes may not have any data that's\nrelevant to the query, that determination can be made independently.\n\n```\nSELECT sum(a) FROM t   -- Nope!\n```\n\nis NOT root-partitionable:  While each node can scan in parallel, the query\nrequires bringing data back to a single node to compute the final sum.  The same\nresult can be obtained by reading all data from `t` and performing the sum using\nPandas.\n\n```\nSELECT * FROM t1 JOIN t2 ON t1.x = t2.y\n```\n\nMAY be root-partitionable IF `t1` and `t2` are `INTERLEAVED` together.\nInterleaved tables are stored together, so joins between them can be\nperformed locally.  Non-interleaved tables require shuffling each affected\nrecord from one table over to the node that stores the corresponding record\nfrom the other table.  Because of this requirement to send data between nodes,\nnon-interleaved joins are generally not root-partitionable.\n\n\n## Building\n\nThis package uses the `setuptools` and `build` packages.  `cd` into the\nrepository's top-level directory and run:\n\n```\npython3 -m build\n```\n\nThis will produce a `.whl` file under `dist/`.  For more information about\nPython's build process, see Python's packaging\n[documentation](https://packaging.python.org/en/latest/tutorials/packaging-projects/).\nAlso see `package_test.py`.\n\n\n## Testing changes\n\nThis project uses [pytest](http://pytest.org) to test its code.  To execute\nall tests, `cd` to the repository's top-level directory and run:\n\n```\npytest .\n```\n\nThe end-to-end tests in this suite depend on Google's `gcloud` command-line\ntool, and will be skipped if it's not available.  The tool is used to launch\na local Spanner Emulator process, to test that this code can correctly connect\nto a Spanner database and handle results that it returns.  
`gcloud` can be\ninstalled following\n[these directions](https://cloud.google.com/sdk/docs/install).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcloudspannerecosystem%2Fspanner-analytics","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcloudspannerecosystem%2Fspanner-analytics","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcloudspannerecosystem%2Fspanner-analytics/lists"}