https://github.com/malloydata/malloy-py

Python package for executing Malloy
https://github.com/malloydata/malloy-py

business-analytics business-intelligence data data-modeling python semantic-modeling sql

Last synced: 7 months ago
JSON representation

Python package for executing Malloy

Host: GitHub
URL: https://github.com/malloydata/malloy-py
Owner: malloydata
License: mit
Created: 2022-11-02T15:36:57.000Z (almost 3 years ago)
Default Branch: main
Last Pushed: 2025-02-11T19:07:32.000Z (8 months ago)
Last Synced: 2025-03-02T13:10:06.338Z (7 months ago)
Topics: business-analytics, business-intelligence, data, data-modeling, python, semantic-modeling, sql
Language: JavaScript
Homepage:
Size: 2.35 MB
Stars: 27
Watchers: 10
Forks: 8
Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

          ![Malloy Logo](https://raw.githubusercontent.com/malloydata/malloy-py/main/assets/malloy_square_centered.png)

## What is it?

Malloy is an experimental language for describing data relationships and transformations. It is both a semantic modeling language and a querying language that runs queries against a relational database. Malloy currently connects to BigQuery, and natively supports DuckDB. We've built a Visual Studio Code extension to facilitate building Malloy data models, querying and transforming data, and creating simple visualizations and dashboards.

_Note: These APIs are still in development and are subject to change._

## How do I get it?

Binary installers for the latest released version are available at the [Python Package Index](https://pypi.org/project/malloy/) (PyPI).

```sh

python3 -m pip install malloy

```

## Resources

- [Malloy Language GitHub](https://github.com/looker-open-source/malloy/) - Primary location for the malloy language source, documentation, and information

- [Malloy Language](https://looker-open-source.github.io/malloy/documentation/language/basic.html) - A quick introduction to the language

- [eCommerce Example Analysis](https://looker-open-source.github.io/malloy/documentation/examples/ecommerce.html) - A walkthrough of the basics on an ecommerce dataset (BigQuery public dataset)

- [Modeling Walkthrough](https://looker-open-source.github.io/malloy/documentation/examples/iowa/iowa.html) - An introduction to modeling via the Iowa liquor sales public data set (BigQuery public dataset)

- [Malloy on YouTube](https://www.youtube.com/channel/UCfN2td1dzf-fKmVtaDjacsg) - Watch demos / walkthroughs of Malloy

## Join The Community

- Join our [Malloy Slack Community!](https://malloydata.github.io/slack) Use this community to ask questions, meet other Malloy users, and share ideas with one another.

- Use [GitHub issues](https://github.com/looker-open-source/malloy/issues) to provide feedback, suggest improvements, report bugs, and start new discussions.

## Syntax Examples

### Run a named query from a Malloy file

```python

import asyncio

import malloy

from malloy.data.duckdb import DuckDbConnection

async def main():

  home_dir = "/path/to/samples/duckdb/imdb"

  with malloy.Runtime() as runtime:

    runtime.add_connection(DuckDbConnection(home_dir=home_dir))

    data = await runtime.load_file(home_dir + "/imdb.malloy").run(

        named_query="genre_movie_map")

    dataframe = data.to_dataframe()

    print(dataframe)

if __name__ == "__main__":

  asyncio.run(main())

```

### Get SQL from an in-line query, using a Malloy file as a source

```python

import asyncio

import malloy

from malloy.data.duckdb import DuckDbConnection

async def main():

  home_dir = "/path/to/samples/duckdb/faa"

  with malloy.Runtime() as runtime:

    runtime.add_connection(DuckDbConnection(home_dir=home_dir))

    [sql, connection

    ] = await runtime.load_file(home_dir + "/flights.malloy").get_sql(query="""

                  run: flights -> {

                    where: carrier ? 'WN' | 'DL', dep_time ? @2002-03-03

                    group_by:

                      flight_date is dep_time.day

                      carrier

                    aggregate:

                      daily_flight_count is flight_count

                      aircraft.aircraft_count

                    nest: per_plane_data is {

                      limit: 20

                      group_by: tail_num

                      aggregate: plane_flight_count is flight_count

                      nest: flight_legs is {

                        order_by: 2

                        group_by:

                          tail_num

                          dep_minute is dep_time.minute

                          origin_code

                          dest_code is destination_code

                          dep_delay

                          arr_delay

                      }

                    }

                }

            """)

    print(sql)

if __name__ == "__main__":

  asyncio.run(main())

```

### Write an in-line Malloy model, and run a query

```python

import asyncio

import malloy

from malloy.data.duckdb import DuckDbConnection

async def main():

  home_dir = "/path/to/samples/duckdb/imdb/data"

  with malloy.Runtime() as runtime:

    runtime.add_connection(DuckDbConnection(home_dir=home_dir))

    data = await runtime.load_source("""

        source:titles is duckdb.table('titles.parquet') extend {

          primary_key: tconst

          dimension:

            movie_url is concat('https://www.imdb.com/title/',tconst)

        }

        """).run(query="""

        run: titles -> {

          group_by: movie_url

          limit: 5

        }

        """)

    dataframe = data.to_dataframe()

    print(dataframe)

if __name__ == "__main__":

  asyncio.run(main())

  

```

### Querying BigQuery tables

BigQuery auth via OAuth using gcloud.

```

gcloud auth login --update-adc

gcloud config set project {my_project_id} --installation

```

Actual usage is similar to DuckDB.

```python

import asyncio

import malloy

from malloy.data.bigquery import BigQueryConnection

async def main():

  with malloy.Runtime() as runtime:

    runtime.add_connection(BigQueryConnection())

    data = await runtime.load_source("""

        source:ga_sessions is bigquery.table('bigquery-public-data.google_analytics_sample.ga_sessions_20170801') extend {

          measure:

            hits_count is hits.count()

        }

        """).run(query="""

        run: ga_sessions -> {

            where: trafficSource.`source` != '(direct)'

            group_by: trafficSource.`source`

            aggregate: hits_count

            limit: 10

          }

        """)

    dataframe = data.to_dataframe()

    print(dataframe)

if __name__ == "__main__":

  asyncio.run(main())

```

## Development

### Initial setup

```sh

git submodule init

git submodule update

python3 -m pip install -r requirements.dev.txt

scripts/gen-services.sh

```

### Regenerate Protobuf files

```sh

scripts/gen-protos.sh

```

### Tests

```sh

python3 -m pytest

```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/malloydata/malloy-py

Awesome Lists containing this project

README