{"id":31845651,"url":"https://github.com/dbis-ilm/grizzly","last_synced_at":"2025-10-12T08:17:49.140Z","repository":{"id":35131195,"uuid":"203779715","full_name":"dbis-ilm/grizzly","owner":"dbis-ilm","description":"A Python-to-SQL transpiler as replacement for Python Pandas","archived":false,"fork":false,"pushed_at":"2022-12-19T15:02:45.000Z","size":24682,"stargazers_count":48,"open_issues_count":2,"forks_count":7,"subscribers_count":10,"default_branch":"master","last_synced_at":"2025-09-07T12:22:36.427Z","etag":null,"topics":["code-generation","database","pandas","python","sql","transpiler"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dbis-ilm.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-08-22T11:19:02.000Z","updated_at":"2025-01-07T14:36:51.000Z","dependencies_parsed_at":"2023-01-15T14:30:38.999Z","dependency_job_id":null,"html_url":"https://github.com/dbis-ilm/grizzly","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/dbis-ilm/grizzly","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dbis-ilm%2Fgrizzly","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dbis-ilm%2Fgrizzly/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dbis-ilm%2Fgrizzly/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dbis-ilm%2Fgrizzly/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dbis-ilm","download_url":"https://codeload.github.com/dbis-ilm/grizzly/tar.gz/refs/heads
/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dbis-ilm%2Fgrizzly/sbom","scorecard":{"id":328925,"data":{"date":"2025-08-11","repo":{"name":"github.com/dbis-ilm/grizzly","commit":"9149b11d85540d1858a72c2449e3a836e0740bcb"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":3,"checks":[{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Code-Review","score":0,"reason":"Found 0/30 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"SAST","score":0,"reason":"no SAST tool detected","details":["Warn: no pull requests merged into dev branch"],"documentation":{"short":"Determines if the project uses static code 
analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no 
fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: MIT License: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection 
settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}}]},"last_synced_at":"2025-08-18T03:07:49.897Z","repository_id":35131195,"created_at":"2025-08-18T03:07:49.898Z","updated_at":"2025-08-18T03:07:49.898Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279010799,"owners_count":26084807,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-12T02:00:06.719Z","response_time":53,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["code-generation","database","pandas","python","sql","transpiler"],"created_at":"2025-10-12T08:17:44.122Z","updated_at":"2025-10-12T08:17:49.129Z","avatar_url":"https://github.com/dbis-ilm.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Grizzly\n\n[![Testing](https://dbgit.prakinf.tu-ilmenau.de/code/grizzly/badges/master/pipeline.svg)](https://dbgit.prakinf.tu-ilmenau.de/code/grizzly/commits/master)\n[![coverage report](https://dbgit.prakinf.tu-ilmenau.de/code/grizzly/badges/master/coverage.svg)](https://dbgit.prakinf.tu-ilmenau.de/code/grizzly/commits/master)\n\nGrizzly is a transpiler from a Python-API to SQL to move computations from the client into a database system.\n\nGrizzly implements its own `DataFrame` structure that tracks operations, like projection, filter, joins, ...\nOnly when the result of the sequence of 
operations is needed, a SQL string is produced, representing all those operations, and sent to a DBMS.\nThis way, you don't have to worry about Out-of-Memory problems, un-optimized queries, and high CPU load.\n\n## Publications\nWe presented the idea as well as key concepts at several conferences:\n\n - Stefan Hagedorn: [**When sweet and cute isn't enough anymore: Solving scalability issues in Python Pandas with Grizzly.**](http://cidrdb.org/cidr2020/gongshow2020/gongshow/abstracts/cidr2020_abstract76.pdf), *CIDR 2020*\n - Stefan Hagedorn, Steffen Kläbe, Kai-Uwe Sattler: [**Putting Pandas in a Box**](http://cidrdb.org/cidr2021/papers/cidr2021_paper07.pdf), *CIDR 2021*\n   - [Presentation on YouTube](https://www.youtube.com/watch?v=8zUszpr0300)\n - Steffen Kläbe, Stefan Hagedorn: [**When Bears get Machine Support: Applying Machine Learning Models to Scalable DataFrames with Grizzly**](https://dl.gi.de/bitstream/handle/20.500.12116/35793/A2-4.pdf), *BTW 2021*\n - Stefan Hagedorn, Steffen Kläbe, Kai-Uwe Sattler: [**Conquering a Panda’s weaker self - Fighting laziness with laziness**](https://edbt2021proceedings.github.io/docs/p174.pdf), *EDBT 2021*, Demo Paper\n   - [Presentation on YouTube](https://www.youtube.com/embed/nBvUPlU_NOU)\n - Steffen Kläbe, Robert DeSantis, Stefan Hagedorn, Kai-Uwe Sattler: [**Accelerating Python UDFs in Vectorized Query Execution**](http://cidrdb.org/cidr2022/papers/p33-klaebe.pdf), *CIDR 2022*\n   - [Presentation on YouTube](https://www.youtube.com/watch?v=FLatSmSGkk8)\n\n\n## Installation\n\nGrizzly is available on PyPI: \u003chttps://pypi.org/project/grizzly-sql\u003e\n\n```sh\npip3 install --user grizzly-sql\n```\n\n## Dependencies\n\nGrizzly uses\n\n- Python 3\n- [SQLite3](https://docs.python.org/2/library/sqlite3.html) (currently for tests only)\n- [BeautifulTable](https://github.com/pri22296/beautifultable) for pretty output\n- [PyYAML](https://pypi.org/project/PyYAML/) for support of vendor-specific query templates\n- 
[antlr4-python3-runtime 4.9.3](https://pypi.org/project/antlr4-python3-runtime/4.9.3/) for compiling Python UDFs to procedural SQL\n\n## Getting started\n\n### Import\n\nAs with any Python module, just import it:\n\n```Python\nimport grizzly\n```\n\n### Connection\n\nConnect to your database using an appropriate connection string. To load the shipped test database containing events from the [GDELT](https://www.gdeltproject.org/) project, run:\n\n```python\nimport sqlite3\ncon = sqlite3.connect(\"grizzly.db\")\n```\nGrizzly uses different classes for code generation and for executing the produced query.\nCurrently, Grizzly includes a SQL code generator and execution wrapper for relational DBMS (more will follow).\nIn order to activate them, set:\n\n```python\nfrom grizzly.relationaldbexecutor import RelationalExecutor\nfrom grizzly.sqlgenerator import SQLGenerator\ngrizzly.use(RelationalExecutor(con, SQLGenerator(\"sqlite\")))\n```\n\nThe `RelationalExecutor` constructor has a parameter for the code generator to use. By default this is a `grizzly.sqlgenerator.SQLGenerator`, but it can be replaced with your own implementation.\n\nThe parameter to `SQLGenerator` defines the SQL dialect of the underlying database system. We store vendor-specific code in a configuration file `grizzly.yml`. The dialect is needed for the `limit` operation, which some SQL engines implement as `LIMIT` whereas others use `TOP`. UDFs (see below) also require system-specific code.\n\nNow, reference the table(s) you want to work with:\n\n```python\ndf = grizzly.read_table(\"events\")\n```\n\nHere, `df` is just a reference; it contains no data from your table.\nTo show its complete contents, use the `show` method:\n\n```python\ndf.show(pretty=True)\n```\n\nThis will print the table's content on the screen. 
Alternatively, you can convert the `DataFrame` into a string using `str(df)`.\n\nIn order to collect the result of a query/program into a local list, use `df.collect(includeHeader=True)`.\n\n### Filter \u0026 Projection\n\nOperations are similar to Pandas:\n\n```python\ndf[df[\"globaleventid\"] == 470747760] # filter\ndf = df[[\"actor1name\",\"actor2name\"]] # projection\n```\n\nA column can also be referenced using the dot notation, e.g. `df.actor1name`.\n\n\n### Joins\n\nA `DataFrame` can be joined with another `DataFrame`:\n\n```python\ndf1 = grizzly.read_table(\"t1\")\ndf2 = grizzly.read_table(\"t2\")\n\njoined = df1.join(df2, on=[\"actor1name\", \"actor2name\"], how=\"inner\", comp='=')\n```\n\nIn the `on` parameter, you specify the join columns. The first one is for the left input (`df1`), the second one for the right input (`df2`).\nThe `how` parameter is used to select the join type: `inner`, `left outer`, etc. This value is placed directly into the generated query, and thus depends on\nthe dialect of the underlying DBMS. An additional `comp` parameter lets you choose the comparison operator.\n\nYou sometimes want to join on multiple columns with different comparisons. 
For this, define the join expression in Grizzly as you would for a filter:\n\n```python\ndf1 = grizzly.read_table(\"t1\")\ndf2 = grizzly.read_table(\"t2\")\n\nj = df1.join(df2, on = (df1.actor1name == df2.actor2name) | (df1[\"actor1countrycode\"] \u003c= df2[\"actor2countrycode\"]), how=\"left outer\")\n```\n\nThis results in the following SQL code:\n\n```sql\nSELECT * \nFROM (SELECT * FROM t1 _t0) _t1  \n    left outer JOIN (SELECT * FROM t2 _t2) _t3 ON _t1.actor1name = _t3.actor2name or _t1.actor1countrycode \u003c= _t3.actor2countrycode\n```\n\n### Grouping \u0026 Aggregation\n\nYou can also group the data on multiple columns and compute an aggregate over the groups using `agg`:\n\n```python\nfrom grizzly.aggregates import AggregateType\ndf = grizzly.read_table(\"events\")\ng = df.groupby([\"year\",\"actor1name\"])\n\na = g.agg(col=\"actor2name\", aggType=AggregateType.COUNT)\n```\n\nHere, `a` represents a `DataFrame` with three columns: `year`, `actor1name`, and the `count` value. In the above example, `a.generateQuery()` will give\n\n```sql\nSELECT _t0.year, _t0.actor1name, count(_t0.actor2name)\nFROM events _t0 \nGROUP BY _t0.year, _t0.actor1name\n```\n\nIf no aggregation function or projection is used, only the grouping columns are selected upon query generation.\n\nYou can of course also apply aggregation functions to non-grouped `DataFrame`s. In this case the aggregates are computed over the whole content. For example, `g.count()` immediately runs the following query and returns the scalar value:\n```sql\nSELECT count(*) FROM (\n    SELECT _t1.year, _t1.actor1name\n    FROM (SELECT * FROM events _t0) _t1\n    GROUP BY _t1.year, _t1.actor1name\n    ) _t2\n```\n\nA `df.count()` (i.e. 
before the grouping) for the above piece of code will return the single scalar value with the number of records in `df` (22225).\nThe query executed for this is:\n\n```sql\nSELECT count(*)\nFROM events\n```\n\nGrizzly supports predefined aggregations, defined in the `AggregateType` enum: `MIN`, `MAX`, `MEAN`, `SUM`, `COUNT`. \nOther functions can be applied by passing the name of the function as a string instead of the enum value.\n\n### User Defined Functions \u0026 Computed Columns\nGrizzly allows you to apply almost any function defined in Python to your data. Currently, only scalar functions are supported.\n\n```Python\ndef myfunc(a: int) -\u003e str:\n      # convert the integer first so the concatenation matches the declared return type\n      return str(a) + \"_grizzly\"\n    \ndf = grizzly.read_table(\"events\")  # load table\ndf = df[df.globaleventid == 467268277] # filter it\n```\n\nApply the function as Python code inside the DBMS (supported by PostgreSQL, Actian Vector, and MonetDB):\n```Python\ndf[\"newid\"] = df[\"globaleventid\"].map(myfunc) # apply myfunc\n```\n\nApply the function translated to procedural SQL code (supported by Oracle and PostgreSQL):\n```Python\ndf[\"newid\"] = df[\"globaleventid\"].map(myfunc, lang='sql', fallback=True) # apply myfunc\n```\n\nThe `lang` parameter defines whether the function is executed as Python code or translated to a procedural language by the integrated `udfcompiler` module. If compilation errors occur, the `fallback` parameter allows the function to be applied as Python code or locally to a Pandas `DataFrame` instead.\n\nIn the example above, the function `myfunc` is applied to all entries in the `globaleventid` column and the result is stored in a new column `newid`. \n\nThis way new columns can be added to the result. The value of a computed column can be any expression.\n\n```Python\ndf[\"newcol\"] = df.theyear + df.monthyear\n```\n\n### Apply Machine Learning Models\nUsing the UDF mechanism described above, we enable users to easily apply their pre-trained models to their data inside the DB. 
\n\nFor ONNX models, users only need to specify the path to the model file (which must be available to the database engine) as well as two conversion functions: \n  - the first function converts the tuple into the format expected by the model\n  - the second function converts the output of the model into a format the DB (and user) can handle. \n\nThe [ONNX model zoo](https://github.com/onnx/models) provides a rich set of models with the corresponding conversion functions.\n\n```Python\ndef input_to_model(a: str):\n        ...\n\ndef model_to_output(a) -\u003e str:\n        ...\n\ndf = grizzly.read_table('tab') # load table\n# apply model to every value in column 'col'\n# using provided input and output conversion functions\n# store model output in computed column 'classification'\ndf['classification'] = df['col'].apply_model(\"/path/to/model\", input_to_model, model_to_output)\n# group by e.g. predicted classes\ndf = df.groupby(['classification']).count()\ndf.show()\n```\n\n### SQL\n\nYou can inspect the produced query string (in this case SQL) with `generateQuery()`:\n\n```Python\nprint(df.generateQuery())\n```\n\n\n## Supported operations\n\n- filter/selection\n- projection\n- join\n- group by\n- aggregation functions: min, max, mean (avg), count, sum\n- user defined functions\n- apply TensorFlow, PyTorch, ONNX models\n\n## Limitations\n\n - Our DataFrame implementation is not yet fully compatible with Pandas, but we are working on it.\n - Grizzly is under active development and things might change.\n - There are certainly some bugs, most likely with complex queries.\n\n\n# Vision\n\nGrizzly is a research project. We aim at bringing data-intensive operations back into the database system. Our plan is to extend Grizzly in the following ways - some of them are inspired by our other projects:\n\n  - Support for heterogeneous data sources:\n    - Combine data from different sources (relational DB, file, HDFS, NoSQL) in one program/query (i.e. 
Polystores, federated query processing)\n    - Automatically import external data when necessary\n  - Add spatial operations\n  - Stream processing operations\n  - Code generation\n    - Produce native code from the Python API\n \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdbis-ilm%2Fgrizzly","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdbis-ilm%2Fgrizzly","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdbis-ilm%2Fgrizzly/lists"}