Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/malloydata/malloy-py
Python package for executing Malloy
business-analytics business-intelligence data data-modeling python semantic-modeling sql
- Host: GitHub
- URL: https://github.com/malloydata/malloy-py
- Owner: malloydata
- License: mit
- Created: 2022-11-02T15:36:57.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-09-26T17:14:12.000Z (about 2 months ago)
- Last Synced: 2024-11-07T14:54:19.079Z (7 days ago)
- Topics: business-analytics, business-intelligence, data, data-modeling, python, semantic-modeling, sql
- Language: JavaScript
- Homepage:
- Size: 2.32 MB
- Stars: 26
- Watchers: 9
- Forks: 8
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
![Malloy Logo](https://raw.githubusercontent.com/malloydata/malloy-py/main/assets/malloy_square_centered.png)
## What is it?
Malloy is an experimental language for describing data relationships and transformations. It is both a semantic modeling language and a querying language that runs queries against a relational database. Malloy currently connects to BigQuery, and natively supports DuckDB. We've built a Visual Studio Code extension to facilitate building Malloy data models, querying and transforming data, and creating simple visualizations and dashboards.
_Note: These APIs are still in development and are subject to change._
## How do I get it?
Binary installers for the latest released version are available at the [Python Package Index](https://pypi.org/project/malloy/) (PyPI).
```sh
python3 -m pip install malloy
```

## Resources
- [Malloy Language GitHub](https://github.com/looker-open-source/malloy/) - Primary location for the malloy language source, documentation, and information
- [Malloy Language](https://looker-open-source.github.io/malloy/documentation/language/basic.html) - A quick introduction to the language
- [eCommerce Example Analysis](https://looker-open-source.github.io/malloy/documentation/examples/ecommerce.html) - A walkthrough of the basics on an ecommerce dataset (BigQuery public dataset)
- [Modeling Walkthrough](https://looker-open-source.github.io/malloy/documentation/examples/iowa/iowa.html) - An introduction to modeling via the Iowa liquor sales public data set (BigQuery public dataset)
- [Malloy on YouTube](https://www.youtube.com/channel/UCfN2td1dzf-fKmVtaDjacsg) - Watch demos / walkthroughs of Malloy

## Join The Community
- Join our [Malloy Slack Community!](https://malloydata.github.io/slack) Use this community to ask questions, meet other Malloy users, and share ideas with one another.
- Use [GitHub issues](https://github.com/looker-open-source/malloy/issues) to provide feedback, suggest improvements, report bugs, and start new discussions.

## Syntax Examples
### Run a named query from a Malloy file
```python
import asyncio

import malloy
from malloy.data.duckdb import DuckDbConnection


async def main():
    home_dir = "/path/to/samples/duckdb/imdb"
    with malloy.Runtime() as runtime:
        # Register a DuckDB connection rooted at the sample directory.
        runtime.add_connection(DuckDbConnection(home_dir=home_dir))

        # Load the model file and run one of its named queries.
        data = await runtime.load_file(home_dir + "/imdb.malloy").run(
            named_query="genre_movie_map")

        dataframe = data.to_dataframe()
        print(dataframe)


if __name__ == "__main__":
    asyncio.run(main())
```

### Get SQL from an in-line query, using a Malloy file as a source
```python
import asyncio

import malloy
from malloy.data.duckdb import DuckDbConnection


async def main():
    home_dir = "/path/to/samples/duckdb/faa"
    with malloy.Runtime() as runtime:
        runtime.add_connection(DuckDbConnection(home_dir=home_dir))

        # get_sql compiles the query without running it.
        sql, connection = await runtime.load_file(
            home_dir + "/flights.malloy").get_sql(query="""
            run: flights -> {
              where: carrier ? 'WN' | 'DL', dep_time ? @2002-03-03
              group_by:
                flight_date is dep_time.day
                carrier
              aggregate:
                daily_flight_count is flight_count
                aircraft.aircraft_count
              nest: per_plane_data is {
                limit: 20
                group_by: tail_num
                aggregate: plane_flight_count is flight_count
                nest: flight_legs is {
                  order_by: 2
                  group_by:
                    tail_num
                    dep_minute is dep_time.minute
                    origin_code
                    dest_code is destination_code
                    dep_delay
                    arr_delay
                }
              }
            }
            """)

        print(sql)


if __name__ == "__main__":
    asyncio.run(main())
```

### Write an in-line Malloy model, and run a query
```python
import asyncio

import malloy
from malloy.data.duckdb import DuckDbConnection


async def main():
    home_dir = "/path/to/samples/duckdb/imdb/data"
    with malloy.Runtime() as runtime:
        runtime.add_connection(DuckDbConnection(home_dir=home_dir))

        # Define the source in-line, then run a query against it.
        data = await runtime.load_source("""
            source:titles is duckdb.table('titles.parquet') extend {
              primary_key: tconst
              dimension:
                movie_url is concat('https://www.imdb.com/title/',tconst)
            }
            """).run(query="""
            run: titles -> {
              group_by: movie_url
              limit: 5
            }
            """)

        dataframe = data.to_dataframe()
        print(dataframe)


if __name__ == "__main__":
    asyncio.run(main())
```

### Querying BigQuery tables
Authenticate to BigQuery via OAuth using the gcloud CLI:
```sh
gcloud auth login --update-adc
gcloud config set project {my_project_id} --installation
```

Actual usage is similar to DuckDB.
```python
import asyncio

import malloy
from malloy.data.bigquery import BigQueryConnection


async def main():
    with malloy.Runtime() as runtime:
        # Uses the credentials set up with gcloud above.
        runtime.add_connection(BigQueryConnection())

        data = await runtime.load_source("""
            source:ga_sessions is bigquery.table('bigquery-public-data.google_analytics_sample.ga_sessions_20170801') extend {
              measure:
                hits_count is hits.count()
            }
            """).run(query="""
            run: ga_sessions -> {
              where: trafficSource.`source` != '(direct)'
              group_by: trafficSource.`source`
              aggregate: hits_count
              limit: 10
            }
            """)

        dataframe = data.to_dataframe()
        print(dataframe)


if __name__ == "__main__":
    asyncio.run(main())
```
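
The in-line queries in the examples above are plain strings, so the same query shape can be generated for different sources and dimensions. The following is a minimal sketch of that idea; `build_group_by_query` is a hypothetical helper written for illustration and is not part of the `malloy` package. It does no escaping or validation of the names it is given, so it should only be fed trusted input.

```python
def build_group_by_query(source: str, dimension: str, limit: int = 5) -> str:
    """Return Malloy query text that groups `source` by `dimension`.

    Hypothetical helper for illustration only; it simply formats a string
    in the same shape as the in-line queries shown in this README.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    return (
        f"run: {source} -> {{\n"
        f"  group_by: {dimension}\n"
        f"  limit: {limit}\n"
        f"}}\n"
    )


if __name__ == "__main__":
    # Prints query text equivalent to the in-line `titles` query above.
    print(build_group_by_query("titles", "movie_url", limit=5))
```

The returned string could then be passed as the `query=` argument to `.run(...)` or `.get_sql(...)`, in the same way as the literal query strings in the examples.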
## Development
### Initial setup
```sh
git submodule init
git submodule update
python3 -m pip install -r requirements.dev.txt
scripts/gen-services.sh
```

### Regenerate Protobuf files
```sh
scripts/gen-protos.sh
```

### Tests
```sh
python3 -m pytest
```