

Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.






workflows made easy


**Astro Python SDK** is a Python SDK for rapid development of extract, transform, and load workflows in Apache Airflow. It allows you to express your workflows as a set of data dependencies without having to worry about ordering and tasks. The Astro Python SDK is maintained by Astronomer.

## Prerequisites

- Apache Airflow >= 2.1.0.

## Install

The Astro Python SDK is available on PyPI. Use the standard Python installation tools.

To install a cloud-agnostic version of the SDK, run:

```shell
pip install astro-sdk-python
```

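You can also install the optional dependencies for specific providers as pip extras. Quote the requirement so that shells such as zsh do not try to expand the square brackets:

```shell
pip install "astro-sdk-python[amazon,google,snowflake,postgres]"
```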
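The XCom notes in the Quickstart depend on both the Airflow and the SDK versions, so it can help to check what is actually installed. The following is a minimal standard-library sketch, assuming Python 3.8+ for `importlib.metadata`; `installed_version` is a hypothetical helper name, and the distribution names are the ones published on PyPI.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names as published on PyPI.
    for dist in ("apache-airflow", "astro-sdk-python"):
        print(f"{dist}: {installed_version(dist) or 'not installed'}")
```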

## Quickstart
1. Ensure that your Airflow environment is set up correctly by running the following commands:

```shell
export AIRFLOW_HOME=`pwd`
airflow db init
```

> **Note:**
> - As of astro-sdk-python 1.2, `AIRFLOW__CORE__ENABLE_XCOM_PICKLING` no longer needs to be enabled.
> - For Airflow < 2.5 and astro-sdk-python < 1.3, you can either use the custom XCom backend `AstroCustomXcomBackend` with XCom pickling disabled, or enable XCom pickling.
> - For Airflow >= 2.5 and astro-sdk-python >= 1.3.3, you can either use Airflow's XCom backend with XCom pickling disabled, or enable XCom pickling.

The data format used by pickle is Python-specific. This has the advantage that there are no restrictions imposed by external standards such as JSON or XDR (which can’t represent pointer sharing); however it means that non-Python programs may not be able to reconstruct pickled Python objects.

Read more: Airflow's `enable_xcom_pickling` option and Python's `pickle` module.
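The pointer-sharing limitation of JSON mentioned above can be seen with the standard library alone. In this minimal sketch, two dictionary keys refer to the same list object: a pickle round trip preserves that shared reference, while a JSON round trip produces two independent lists with equal contents.

```python
import json
import pickle

shared = [1, 2, 3]
payload = {"a": shared, "b": shared}  # both keys point at the same list object

# pickle memoizes objects within one dump, so the shared reference survives.
restored_pickle = pickle.loads(pickle.dumps(payload))
# JSON has no notion of references, so each key gets its own copy.
restored_json = json.loads(json.dumps(payload))

print(restored_pickle["a"] is restored_pickle["b"])  # True
print(restored_json["a"] is restored_json["b"])      # False
```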

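2. Locate the SQLite database that the example runs against. The following command stores the database file path of the `sqlite_default` connection in `SQL_TABLE_NAME`: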

```shell
# The sqlite_default connection has a different host (database file path) on macOS vs. Linux
export SQL_TABLE_NAME=`airflow connections get sqlite_default -o yaml | grep host | awk '{print $2}'`
```

3. Copy the following workflow into a file named `` and add it to the `dags` directory of your Airflow project:

Alternatively, you can download `` with:

```shell
curl -O
```

4. Run the example DAG:

```shell
airflow dags test calculate_popular_movies `date -Iseconds`
```

5. Check the result of your DAG by running:

```shell
sqlite3 "$SQL_TABLE_NAME" "select * from top_animation;" ".exit"
```

You should see the following output:

```console
$ sqlite3 "$SQL_TABLE_NAME" "select * from top_animation;" ".exit"
Toy Story 3 (2010)|8.3
Inside Out (2015)|8.2
How to Train Your Dragon (2010)|8.1
Zootopia (2016)|8.1
How to Train Your Dragon 2 (2014)|7.9
```
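If the `sqlite3` command-line tool is not installed, the same check can be done with Python's built-in `sqlite3` module. This is a sketch: `fetch_top_animation` is a hypothetical helper name, and it assumes `SQL_TABLE_NAME` holds the database path exported in step 2.

```python
import os
import sqlite3
from typing import List, Tuple


def fetch_top_animation(db_path: str) -> List[Tuple]:
    """Return every row of the top_animation table in the SQLite file at db_path."""
    # Open read-only via a URI so a mistyped path raises an error
    # instead of silently creating a new, empty database file.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        return conn.execute("SELECT * FROM top_animation").fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    db_path = os.environ.get("SQL_TABLE_NAME")
    if db_path:
        # Print rows in the same pipe-separated style as the sqlite3 CLI.
        for row in fetch_top_animation(db_path):
            print("|".join(str(value) for value in row))
```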

## Supported technologies

| FileLocation |
| :----------- |
| local |
| http |
| https |
| gs |
| gdrive |
| s3 |
| wasb |
| wasbs |
| azure |
| sftp |
| ftp |

| FileType |
| :------- |
| csv |
| json |
| ndjson |
| parquet |
| xls |
| xlsx |

| Database |
| :-------- |
| postgres |
| sqlite |
| delta |
| bigquery |
| snowflake |
| redshift |
| mssql |
| duckdb |
| mysql |
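Most FileLocation values above correspond to URI schemes (for example `s3://` or `gs://`), and FileType values correspond to file extensions. The sketch below shows one way a path could be checked against those two tables. `describe_path` is a hypothetical helper, not part of the SDK, and the SDK's own detection logic may differ; it treats a path without a scheme as a POSIX-style local path.

```python
from pathlib import PurePosixPath
from typing import Tuple
from urllib.parse import urlparse

# Values copied from the FileLocation and FileType tables above.
SUPPORTED_LOCATIONS = {
    "local", "http", "https", "gs", "gdrive", "s3",
    "wasb", "wasbs", "azure", "sftp", "ftp",
}
SUPPORTED_FILETYPES = {"csv", "json", "ndjson", "parquet", "xls", "xlsx"}


def describe_path(path: str) -> Tuple[str, str]:
    """Hypothetical helper: return (location, filetype) guessed from scheme and extension."""
    parsed = urlparse(path)
    # No scheme (e.g. /tmp/movies.csv) is treated as a local file.
    location = parsed.scheme or "local"
    # The extension of the path component, lowercased and without the dot.
    filetype = PurePosixPath(parsed.path).suffix.lstrip(".").lower()
    if location not in SUPPORTED_LOCATIONS:
        raise ValueError(f"unsupported file location: {location!r}")
    if filetype not in SUPPORTED_FILETYPES:
        raise ValueError(f"unsupported file type: {filetype!r}")
    return location, filetype


print(describe_path("s3://my-bucket/data/movies.parquet"))  # ('s3', 'parquet')
print(describe_path("/tmp/movies.csv"))                      # ('local', 'csv')
```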

## Available operations

The following are some key functions available in the SDK:

- `load_file`: Load a given file into a SQL table
- `transform`: Apply a SQL `SELECT` statement to a source table and save the result to a destination table
- `drop_table`: Drop a SQL table
- `run_raw_sql`: Run any SQL statement without handling its output
- `append`: Insert rows from the source SQL table into the destination SQL table, if there are no conflicts
- `merge`: Insert rows from the source SQL table into the destination SQL table, depending on conflicts:
  - `ignore`: Do not add rows that already exist
  - `update`: Replace existing rows with new ones
- `export_file`: Export SQL table rows into a destination file
- `dataframe`: Export a given SQL table into an in-memory pandas DataFrame
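The difference between the two `merge` conflict strategies can be illustrated in plain SQLite with an upsert. This sketch does not use the SDK and is not how `merge` is implemented; `merge_rows`, its `strategy` argument, and `demo` are hypothetical names, and the `ON CONFLICT` clause requires SQLite 3.24 or newer.

```python
import sqlite3
from typing import List, Tuple


def merge_rows(conn: sqlite3.Connection, rows: List[Tuple[int, str]], strategy: str) -> None:
    """Insert (id, title) rows into `target`, resolving primary-key conflicts per strategy."""
    if strategy == "ignore":
        # Keep the existing row when the id is already present.
        sql = "INSERT INTO target (id, title) VALUES (?, ?) ON CONFLICT(id) DO NOTHING"
    elif strategy == "update":
        # Overwrite the existing row with the incoming values.
        sql = (
            "INSERT INTO target (id, title) VALUES (?, ?) "
            "ON CONFLICT(id) DO UPDATE SET title = excluded.title"
        )
    else:
        raise ValueError(f"unknown conflict strategy: {strategy!r}")
    conn.executemany(sql, rows)
    conn.commit()


def demo(strategy: str) -> List[Tuple[int, str]]:
    """Merge two source rows (one conflicting on id 1) into a two-row target table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, title TEXT)")
        conn.executemany("INSERT INTO target VALUES (?, ?)", [(1, "old"), (2, "kept")])
        merge_rows(conn, [(1, "new"), (3, "added")], strategy)
        return conn.execute("SELECT id, title FROM target ORDER BY id").fetchall()
    finally:
        conn.close()


print(demo("ignore"))  # [(1, 'old'), (2, 'kept'), (3, 'added')]
print(demo("update"))  # [(1, 'new'), (2, 'kept'), (3, 'added')]
```

In both cases the non-conflicting row (id 3) is added; only the handling of the existing id 1 differs.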

For a full list of available operators, see the SDK reference documentation.

## Documentation

The documentation is a work in progress; we aim to follow the Diátaxis system:

- **Getting Started Tutorial**: A hands-on introduction to the Astro Python SDK
- **How-to guides**: Simple step-by-step user guides to accomplish specific tasks
- **Reference guide**: Commands, modules, classes and methods
- **Explanation**: Clarification and discussion of key decisions when designing the project

## Changelog

The Astro Python SDK follows semantic versioning for releases. Check the [changelog](python-sdk/docs/) for the latest changes.

## Release management

To learn more about our release philosophy and steps, see [Managing Releases](python-sdk/docs/development/).

## Contribution guidelines

All contributions, bug reports, bug fixes, documentation improvements, enhancements, and ideas are welcome.

Read the [Contribution Guidelines](python-sdk/docs/development/) for a detailed overview of how to contribute.

Contributors and maintainers should abide by the Contributor Code of Conduct.

## License

[Apache License 2.0](LICENSE)