Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/betodealmeida/gsheets-db-api
A Python DB-API and SQLAlchemy dialect to Google Spreadsheets
- Host: GitHub
- URL: https://github.com/betodealmeida/gsheets-db-api
- Owner: betodealmeida
- License: mit
- Created: 2018-09-10T22:07:02.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2022-12-08T11:32:58.000Z (almost 2 years ago)
- Last Synced: 2024-10-30T06:58:17.535Z (14 days ago)
- Topics: api, db, google, python, spreadsheet, spreadsheets, sql
- Language: Python
- Homepage:
- Size: 121 KB
- Stars: 213
- Watchers: 7
- Forks: 16
- Open Issues: 16
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
Awesome Lists containing this project
- jimsghstars - betodealmeida/gsheets-db-api - A Python DB-API and SQLAlchemy dialect to Google Spreadsheets (Python)
README
[![Build Status](https://travis-ci.org/betodealmeida/gsheets-db-api.svg?branch=master)](https://travis-ci.org/betodealmeida/gsheets-db-api) [![codecov](https://codecov.io/gh/betodealmeida/gsheets-db-api/branch/master/graph/badge.svg)](https://codecov.io/gh/betodealmeida/gsheets-db-api)
**Note:** [shillelagh](https://github.com/betodealmeida/shillelagh/) is a drop-in replacement for `gsheets-db-api`, with many additional features. You should use it instead. If you're using SQLAlchemy, all you need to do is:
```bash
$ pip uninstall gsheetsdb
$ pip install shillelagh
```

If you're using the DB API, change the import:

```python
# from gsheetsdb import connect
from shillelagh.backends.apsw.db import connect
```

# A Python DB API 2.0 for Google Spreadsheets #
This module allows you to query Google Spreadsheets using SQL.
Using [this spreadsheet](https://docs.google.com/spreadsheets/d/1_rN3lm0R_bU3NemO0s9pbFkY5LQPcuy1pscv8ZXPtg8/) as an example:
| | A | B |
|-|--------|-----|
| 1 | country | cnt |
| 2 | BR | 1 |
| 3 | BR | 3 |
| 4 | IN | 5 |

Here's a simple query using the Python API:
```python
from gsheetsdb import connect

conn = connect()
result = conn.execute("""
    SELECT
        country
      , SUM(cnt)
    FROM
        "https://docs.google.com/spreadsheets/d/1_rN3lm0R_bU3NemO0s9pbFkY5LQPcuy1pscv8ZXPtg8/"
    GROUP BY
        country
""", headers=1)
for row in result:
    print(row)
```

This will print:
```
Row(country='BR', sum_cnt=4.0)
Row(country='IN', sum_cnt=5.0)
```

## How it works ##
### Transpiling ###
Google Spreadsheets can actually be queried with a [very limited SQL API](https://developers.google.com/chart/interactive/docs/querylanguage). This module transpiles the SQL query into a simpler query that the API understands. For example, the query above would be translated to:
```sql
SELECT A, SUM(B) GROUP BY A
```

### Processors ###
In addition to transpiling, this module also provides pre- and post-processors. The pre-processors add more columns to the query, and the post-processors build the actual result from those extra columns. For example, `COUNT(*)` is not supported, so the following query:
```sql
SELECT COUNT(*) FROM "https://docs.google.com/spreadsheets/d/1_rN3lm0R_bU3NemO0s9pbFkY5LQPcuy1pscv8ZXPtg8/"
```

Gets translated to:
```sql
SELECT COUNT(A), COUNT(B)
```

The maximum of these counts is then returned. This assumes that at least one column has no `NULL`s.
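
The `COUNT(*)` rewrite described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the module's actual code: the function names `preprocess_count_star` and `postprocess_count_star` are hypothetical, and the backend response is simulated with an in-memory copy of the example spreadsheet.

```python
from string import ascii_uppercase


def preprocess_count_star(num_columns):
    """Hypothetical pre-processor: replace COUNT(*) with one COUNT per column.

    Columns are addressed by letter (A, B, ...), so this sketch only
    handles up to 26 columns.
    """
    letters = ascii_uppercase[:num_columns]
    return "SELECT " + ", ".join(f"COUNT({letter})" for letter in letters)


def postprocess_count_star(counts):
    """Hypothetical post-processor: the row count is the largest column count."""
    return max(counts)


# Toy data standing in for the example spreadsheet (header row excluded).
rows = [("BR", 1), ("BR", 3), ("IN", 5)]

query = preprocess_count_star(num_columns=2)

# Simulate the backend's answer: the number of non-null cells in each column.
per_column_counts = [
    sum(cell is not None for cell in column) for column in zip(*rows)
]

total = postprocess_count_star(per_column_counts)
print(query)  # SELECT COUNT(A), COUNT(B)
print(total)  # 3
```

If every column contained at least one empty cell, every per-column count would fall short of the true row count, which is why the result is only correct when at least one column has no `NULL`s.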
### SQLite ###
When a query can't be expressed, the module will issue a `SELECT *`, load the data into an in-memory SQLite table, and execute the query in SQLite. This is inefficient, since all data has to be downloaded, but it ensures that all queries succeed.

## Installation ##
```bash
$ pip install gsheetsdb
$ pip install gsheetsdb[cli] # if you want to use the CLI
$ pip install gsheetsdb[sqlalchemy] # if you want to use it with SQLAlchemy
```

## CLI ##
The module will install an executable called `gsheetsdb`:
```bash
$ gsheetsdb --headers=1
> SELECT * FROM "https://docs.google.com/spreadsheets/d/1_rN3lm0R_bU3NemO0s9pbFkY5LQPcuy1pscv8ZXPtg8/"
country cnt
--------- -----
BR 1
BR 3
IN 5
> SELECT country, SUM(cnt) FROM "https://docs.google.com/spreadsheets/d/1_rN3lm0R_bU3NemO0s9pbFkY5LQPcuy1pscv8ZXPtg8/" GROUP BY country
country sum cnt
--------- ---------
BR 4
IN 5
>
```

## SQLAlchemy support ##
This module provides a SQLAlchemy dialect. You don't need to specify a URL, since the spreadsheet is extracted from the `FROM` clause:
```python
from sqlalchemy import *
from sqlalchemy.engine import create_engine
from sqlalchemy.schema import *

engine = create_engine('gsheets://')
inspector = inspect(engine)

table = Table(
    'https://docs.google.com/spreadsheets/d/1_rN3lm0R_bU3NemO0s9pbFkY5LQPcuy1pscv8ZXPtg8/edit#gid=0',
    MetaData(bind=engine),
    autoload=True)
query = select([func.count(table.columns.country)], from_obj=table)
print(query.scalar())  # prints 3.0
```

Alternatively, you can initialize the engine with a "catalog". The catalog is a Google spreadsheet where each row points to another Google spreadsheet, with URL, number of headers and schema as the columns. You can see an example [here](https://docs.google.com/spreadsheets/d/1AAqVVSpGeyRZyrr4n--fb_IxhLwwKtLbjfu4h6MyyYA/edit#gid=0):
| | A | B | C |
|-|-|-|-|
| 1 | https://docs.google.com/spreadsheets/d/1_rN3lm0R_bU3NemO0s9pbFkY5LQPcuy1pscv8ZXPtg8/edit#gid=0 | 1 | default |
| 2 | https://docs.google.com/spreadsheets/d/1_rN3lm0R_bU3NemO0s9pbFkY5LQPcuy1pscv8ZXPtg8/edit#gid=1077884006 | 2 | default |

This will make the two spreadsheets above available as "tables" in the `default` schema.
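
To make the catalog layout concrete, here is a minimal sketch of how its rows could be grouped into schemas. It is an assumption-laden illustration, not how the dialect reads the catalog: the rows are hard-coded instead of fetched, and `group_by_schema` is a hypothetical helper.

```python
from collections import defaultdict

# Hard-coded copy of the example catalog rows: (URL, number of headers, schema).
catalog_rows = [
    (
        "https://docs.google.com/spreadsheets/d/1_rN3lm0R_bU3NemO0s9pbFkY5LQPcuy1pscv8ZXPtg8/edit#gid=0",
        1,
        "default",
    ),
    (
        "https://docs.google.com/spreadsheets/d/1_rN3lm0R_bU3NemO0s9pbFkY5LQPcuy1pscv8ZXPtg8/edit#gid=1077884006",
        2,
        "default",
    ),
]


def group_by_schema(rows):
    """Hypothetical helper: map each schema to {table URL: number of header rows}."""
    schemas = defaultdict(dict)
    for url, header_rows, schema in rows:
        schemas[schema][url] = header_rows
    return dict(schemas)


schemas = group_by_schema(catalog_rows)
for schema, tables in schemas.items():
    for url, header_rows in tables.items():
        print(f"{schema}: {url} (headers={header_rows})")
```

Each URL plays the role of a table name, and the header count tells the reader how many leading rows hold column names rather than data.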
## Authentication ##
You can access spreadsheets that are shared only within an organization. In order to do this, first [create a service account](https://developers.google.com/api-client-library/python/auth/service-accounts#creatinganaccount). Make sure you select "Enable G Suite Domain-wide Delegation". Download the key as a JSON file.
Next, you need to manage API client access at https://admin.google.com/${DOMAIN}/AdminHome?chromeless=1#OGX:ManageOauthClients. Add the "Unique ID" from the previous step as the "Client Name", and add `https://spreadsheets.google.com/feeds` as the scope.
Now, when creating the connection from the DB API or from SQLAlchemy you can point to the JSON file and the user you want to impersonate:
```python
>>> from gsheetsdb import connect
>>> auth = {'service_account_file': '/path/to/certificate.json', 'subject': 'user@example.com'}
>>> conn = connect(auth)
```