https://github.com/treasure-data/td-client-python

Treasure Data API library for Python
Host: GitHub
URL: https://github.com/treasure-data/td-client-python
Owner: treasure-data
License: apache-2.0
Created: 2014-12-26T19:21:47.000Z (over 11 years ago)
Default Branch: master
Last Pushed: 2025-12-03T02:48:59.000Z (7 months ago)
Last Synced: 2025-12-21T07:24:53.529Z (6 months ago)
Language: Python
Homepage:
Size: 903 KB
Stars: 48
Watchers: 79
Forks: 23
Open Issues: 0
Metadata Files:
- Readme: README.rst
- Changelog: CHANGELOG.rst
- License: LICENSE
- Security: SECURITY.md
Treasure Data API library for Python
====================================

.. image:: https://github.com/treasure-data/td-client-python/workflows/Python%20testing/badge.svg
   :target: https://github.com/treasure-data/td-client-python/actions
   :alt: Build Status on GitHub Actions

.. image:: https://badge.fury.io/py/td-client.svg
   :target: http://badge.fury.io/py/td-client
   :alt: PyPI version

Treasure Data API library for Python

Requirements
------------

``td-client`` supports the following versions of Python.

* Python 3.10+
* PyPy

Install
-------

You can install the releases from PyPI.

.. code-block:: sh

   $ pip install td-client

Installing certifi as well is recommended, since it enables SSL certificate verification.

.. code-block:: sh

   $ pip install certifi

Examples
--------

Please also see the examples in the Treasure Data Documentation.

The td-client documentation, including the API documentation, is hosted at
https://tdclient.readthedocs.io/.

For information on the parameters that may be used when reading particular
types of data, see `File import parameters`_.

.. _`file import parameters`:
   https://tdclient.readthedocs.io/en/latest/file_import_parameters.html

Listing jobs
^^^^^^^^^^^^

The Treasure Data API key is read from the environment variable ``TD_API_KEY`` unless one is passed to ``tdclient.Client`` via the ``apikey=`` argument.

The Treasure Data API endpoint ``https://api.treasuredata.com`` is used by default. You can override it with the environment variable ``TD_API_SERVER``, which in turn can be overridden via the ``endpoint=`` argument passed to ``tdclient.Client``. A list of the available Treasure Data sites and their API endpoints can be found in the Treasure Data documentation.

.. code-block:: python

   import tdclient

   with tdclient.Client() as td:
       for job in td.jobs():
           print(job.job_id)
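The lookup order described above can be sketched in plain Python. This is only an illustration of the precedence, not the library's implementation; ``resolve_settings`` and ``DEFAULT_ENDPOINT`` are hypothetical names introduced for the example.

```python
import os

# Default endpoint quoted in the text above.
DEFAULT_ENDPOINT = "https://api.treasuredata.com"


def resolve_settings(apikey=None, endpoint=None, environ=None):
    """Hypothetical helper mirroring the precedence described above."""
    env = os.environ if environ is None else environ
    # API key: an explicit argument wins, otherwise fall back to TD_API_KEY.
    resolved_key = apikey if apikey is not None else env.get("TD_API_KEY")
    # Endpoint: explicit argument, then TD_API_SERVER, then the default.
    resolved_endpoint = endpoint or env.get("TD_API_SERVER") or DEFAULT_ENDPOINT
    return resolved_key, resolved_endpoint


# A fake environment keeps the example independent of the real one.
fake_env = {"TD_API_KEY": "key-from-env", "TD_API_SERVER": "https://example.invalid"}
print(resolve_settings(environ=fake_env))
print(resolve_settings(apikey="explicit-key", endpoint="https://other.invalid", environ=fake_env))
```

Passing the environment as a parameter is a choice made for this sketch so the precedence can be checked without touching real credentials.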

Running jobs
^^^^^^^^^^^^

Running jobs on Treasure Data.

.. code-block:: python

   import tdclient

   with tdclient.Client() as td:
       job = td.query("sample_datasets", "SELECT COUNT(1) FROM www_access", type="hive")
       job.wait()
       for row in job.result():
           print(repr(row))

Running jobs via DBAPI2
^^^^^^^^^^^^^^^^^^^^^^^

td-client-python implements the Python Database API v2.0 (PEP 249).

You can use td-client-python with external libraries that support the Database API, such as pandas.

.. code-block:: python

   import pandas
   import tdclient

   def on_waiting(cursor):
       print(cursor.job_status())

   with tdclient.connect(db="sample_datasets", type="presto", wait_callback=on_waiting) as td:
       data = pandas.read_sql("SELECT symbol, COUNT(1) AS c FROM nasdaq GROUP BY symbol", td)
       print(repr(data))
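Because the interface follows PEP 249, code written against the generic cursor protocol should not need to know which driver it is talking to. The sketch below uses the standard library's ``sqlite3`` as a stand-in connection so that it runs offline. ``fetch_rows`` is a hypothetical helper, and passing it a connection from ``tdclient.connect(...)`` instead is an assumption that has not been verified here.

```python
import sqlite3


def fetch_rows(connection, sql):
    """Run ``sql`` on a PEP 249 connection and return (column_names, rows)."""
    cursor = connection.cursor()
    try:
        cursor.execute(sql)
        # description has one 7-item sequence per column; item 0 is the name.
        columns = [col[0] for col in cursor.description]
        rows = cursor.fetchall()
    finally:
        cursor.close()
    return columns, rows


# Stand-in connection; conn.execute/executemany are sqlite3 shortcuts,
# used here only to prepare sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nasdaq (symbol TEXT)")
conn.executemany("INSERT INTO nasdaq VALUES (?)", [("AAPL",), ("AAPL",), ("MSFT",)])
columns, rows = fetch_rows(
    conn, "SELECT symbol, COUNT(1) AS c FROM nasdaq GROUP BY symbol ORDER BY symbol"
)
print(columns, rows)
conn.close()
```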

We offer another package for pandas, named pytd, with some advanced features.
You may prefer it if you need to do more complicated things, such as exporting result data to Treasure Data or
printing a job's progress during long execution.

Importing data
^^^^^^^^^^^^^^

Importing data into Treasure Data in a streaming manner, similar to what fluentd does.

.. code-block:: python

   import sys
   import tdclient

   with tdclient.Client() as td:
       # sys.argv[0] is the script name; the input files start at index 1.
       for file_name in sys.argv[1:]:
           td.import_file("mydb", "mytbl", "csv", file_name)

.. Warning::

   Data imported in a streaming manner takes some time to become available for querying, because the schema
   update is executed with a delay.

Bulk import
^^^^^^^^^^^

Importing data into Treasure Data in a batch manner.

.. code-block:: python

   import sys
   import tdclient
   import uuid
   import warnings

   if len(sys.argv) <= 1:
       sys.exit(0)

   with tdclient.Client() as td:
       session_name = "session-{}".format(uuid.uuid1())
       bulk_import = td.create_bulk_import(session_name, "mydb", "mytbl")
       try:
           # Upload each input file as one part, then freeze the session.
           for file_name in sys.argv[1:]:
               part_name = "part-{}".format(file_name)
               bulk_import.upload_file(part_name, "json", file_name)
           bulk_import.freeze()
       except:
           # Remove the session if uploading fails, then re-raise.
           bulk_import.delete()
           raise
       bulk_import.perform(wait=True)
       if 0 < bulk_import.error_records:
           warnings.warn("detected {} error records.".format(bulk_import.error_records))
       if 0 < bulk_import.valid_records:
           print("imported {} records.".format(bulk_import.valid_records))
       else:
           raise RuntimeError("no records have been imported: {}".format(bulk_import.name))
       bulk_import.commit(wait=True)
       bulk_import.delete()

If you want to import data in msgpack format, you can write the following:

.. code-block:: python

   import io
   import time
   import uuid
   import warnings
   import tdclient

   t1 = int(time.time())
   l1 = [{"a": 1, "b": 2, "time": t1}, {"a": 3, "b": 9, "time": t1}]

   with tdclient.Client() as td:
       session_name = "session-{}".format(uuid.uuid1())
       bulk_import = td.create_bulk_import(session_name, "mydb", "mytbl")
       try:
           _bytes = tdclient.util.create_msgpack(l1)
           bulk_import.upload_file("part", "msgpack", io.BytesIO(_bytes))
           bulk_import.freeze()
       except:
           bulk_import.delete()
           raise
       bulk_import.perform(wait=True)
       # same as the above example

Changing how CSV and TSV columns are read
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``td-client`` package will generally make sensible choices on how to read
the columns in CSV and TSV data, but sometimes the user needs to override the
default mechanism. This can be done using the optional `file import
parameters`_ ``dtypes`` and ``converters``.

For instance, consider CSV data that starts with the following records::

  time,col1,col2,col3
  1575454204,a,0001,a;b;c
  1575454204,b,0002,d;e;f

If that data is read using the defaults, it will produce values that look
like:

.. code:: python

  1575454204, "a", 1, "a;b;c"
  1575454204, "b", 2, "d;e;f"

that is, an integer, a string, an integer and another string.

If the user wants to keep the leading zeroes in ``col2``, then they can
specify the column datatype as string. For instance, using
``bulk_import.upload_file`` to read data from ``input_data``:

.. code:: python

    bulk_import.upload_file(
        "part", "csv", input_data,
        dtypes={"col2": "str"},
    )

which would produce:

.. code:: python

  1575454204, "a", "0001", "a;b;c"
  1575454204, "b", "0002", "d;e;f"

If they also wanted to treat ``col3`` as a sequence of strings, separated by
semicolons, then they could specify a function to process ``col3``:

.. code:: python

    bulk_import.upload_file(
        "part", "csv", input_data,
        dtypes={"col2": "str"},
        converters={"col3": lambda x: x.split(";")},
    )

which would produce:

.. code:: python

  1575454204, "a", "0001", ["a", "b", "c"]
  1575454204, "b", "0002", ["d", "e", "f"]
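To make the effect of the two parameters concrete, here is a small standalone simulation built on the standard ``csv`` module. It does not call ``td-client`` and is not how the library reads files. ``read_rows`` and ``guess`` are hypothetical names, and letting ``converters`` take priority over ``dtypes`` is an assumption made for the sketch.

```python
import csv
import io

SAMPLE = "time,col1,col2,col3\n1575454204,a,0001,a;b;c\n1575454204,b,0002,d;e;f\n"


def guess(value):
    """Default guess in this sketch: int if the text parses as one, else str."""
    try:
        return int(value)
    except ValueError:
        return value


def read_rows(text, dtypes=None, converters=None):
    """Illustrative reader; converters take priority over dtypes here."""
    casts = {"str": str, "int": int, "float": float}
    dtypes = dtypes or {}
    converters = converters or {}
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        row = {}
        for name, raw in record.items():
            if name in converters:
                row[name] = converters[name](raw)
            elif name in dtypes:
                row[name] = casts[dtypes[name]](raw)
            else:
                row[name] = guess(raw)
        rows.append(row)
    return rows


default_rows = read_rows(SAMPLE)
typed_rows = read_rows(
    SAMPLE,
    dtypes={"col2": "str"},
    converters={"col3": lambda x: x.split(";")},
)
print(default_rows[0])
print(typed_rows[0])
```

With the defaults, ``col2`` loses its leading zeroes. With the overrides, it stays a string and ``col3`` becomes a list, matching the outputs shown above.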

Type Hints
----------

td-client-python includes comprehensive type hints (PEP 484) for improved development experience with static type checkers like mypy and pyright. Type hints are available for all public APIs.

**Features:**

* Fully typed public API with precise type annotations
* ``py.typed`` marker file for PEP 561 compliance
* Type aliases in ``tdclient.types`` for common patterns
* Support for type checking with mypy, pyright, and other tools

**Example with type checking:**

.. code-block:: python

   import tdclient

   # Type checkers will understand the types
   with tdclient.Client(apikey="your_api_key") as client:
       # client is inferred as tdclient.Client
       job = client.query("sample_db", "SELECT COUNT(1) FROM table", type="presto")
       # job is inferred as tdclient.models.Job
       job.wait()
       for row in job.result():
           # row is inferred as dict[str, Any]
           print(row)

**Using type aliases:**

.. code-block:: python

   import tdclient
   from tdclient.types import QueryEngineType, Priority

   def run_query(engine: QueryEngineType, priority: Priority) -> None:
       with tdclient.Client() as client:
           job = client.query("mydb", "SELECT 1", type=engine, priority=priority)
           job.wait()
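For readers unfamiliar with this style of alias, the sketch below shows one common way such aliases are written, using ``typing.Literal``. ``EngineName`` and ``PriorityLevel`` are hypothetical stand-ins defined only for this example. The actual definitions in ``tdclient.types`` may differ, and the set of values listed here is an assumption.

```python
from typing import Literal, get_args

# Hypothetical stand-ins; not the definitions from tdclient.types.
EngineName = Literal["presto", "hive"]
PriorityLevel = Literal[-2, -1, 0, 1, 2]


def describe_query(engine: EngineName, priority: PriorityLevel = 0) -> str:
    # A type checker rejects values outside the Literal at analysis time;
    # this runtime check covers callers that are not type checked.
    if engine not in get_args(EngineName):
        raise ValueError(f"unsupported engine: {engine!r}")
    return f"engine={engine} priority={priority}"


print(describe_query("hive", 1))
```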

Development
-----------

Running tests
^^^^^^^^^^^^^

Install the project dependencies (runtime and test extras) with uv, then
run pytest via ``uv run``.

.. code-block:: sh

    $ uv sync --extra test
    $ uv run pytest tdclient/test

To run the coverage suite locally, use:

.. code-block:: sh

    $ uv run coverage run --source=tdclient -m pytest tdclient/test
    $ uv run coverage report

Linting and type checking
^^^^^^^^^^^^^^^^^^^^^^^^^

Install the development extras and invoke ``ruff`` and ``pyright`` using
``uv run``.

.. code-block:: sh

    $ uv sync --dev
    $ uv run ruff format tdclient --diff --exit-non-zero-on-fix
    $ uv run ruff check tdclient
    $ uv run pyright tdclient

Running tests (tox)
^^^^^^^^^^^^^^^^^^^

You can run tests against all supported Python versions with ``tox``.
Installing pyenv is recommended for managing the additional interpreters.

.. code-block:: sh

   $ pyenv shell system
   $ for version in $(cat .python-version); do [ -d "$(pyenv root)/versions/${version}" ] || pyenv install "${version}"; done
   $ pyenv shell --unset

Install the development extras (which include ``tox``) with ``uv``.

.. code-block:: sh

    $ uv sync --dev

Then, run ``tox`` via ``uv``.

.. code-block:: sh

    $ uv run tox

Release
^^^^^^^

1. Update the version ``x.x.x`` in ``pyproject.toml``.
2. Create a PR from a ``release-x.x.x`` branch. Request a review and merge the PR.
3. Create and push a tag ``x.x.x`` on the ``release-x.x.x`` merge commit.
4. Create a Release on GitHub, which publishes the new version to PyPI.

Manual release
~~~~~~~~~~~~~~

If you want to release manually, you can build the package and upload it with twine.

.. code-block:: sh

   $ python -m build
   $ twine upload dist/*

License
-------

Apache Software License, Version 2.0