{"id":17091280,"url":"https://github.com/kachayev/camille-sql","last_synced_at":"2025-04-12T22:34:53.328Z","repository":{"id":51485937,"uuid":"264406123","full_name":"kachayev/camille-sql","owner":"kachayev","description":"Run SQL over your Maven artifacts","archived":false,"fork":false,"pushed_at":"2024-10-03T19:27:09.000Z","size":92,"stargazers_count":11,"open_issues_count":1,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-26T16:39:00.557Z","etag":null,"topics":["calcite","netty","postgresql-protocol","sql"],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kachayev.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-05-16T09:50:16.000Z","updated_at":"2024-10-03T19:27:06.000Z","dependencies_parsed_at":"2023-01-18T12:46:01.380Z","dependency_job_id":null,"html_url":"https://github.com/kachayev/camille-sql","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kachayev%2Fcamille-sql","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kachayev%2Fcamille-sql/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kachayev%2Fcamille-sql/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kachayev%2Fcamille-sql/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kachayev","download_url":"https://codeload.github.com/kachayev/camille-sql/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.co
m","kind":"github","repositories_count":248642308,"owners_count":21138350,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["calcite","netty","postgresql-protocol","sql"],"created_at":"2024-10-14T13:58:08.503Z","updated_at":"2025-04-12T22:34:53.308Z","avatar_url":"https://github.com/kachayev.png","language":"Java","readme":"# camille-sql\n\nRun SQL over your Maven artifacts.\n\n[Slides](https://speakerdeck.com/kachayev/talking-sql-to-strangers) from the talk \"Talking SQL to Strangers\".\n\n## What?\n\n`camille-sql` allows you to explore Maven artifacts you have on your local hard drive.\n\nRun the server:\n\n```shell\n$ bin/camille-server\nArtifacts repository path: /Users/\u003cuser\u003e/.m2/repository/\nRunning server on localhost:26727\n...\n```\n\nThe server understands the PostgreSQL wire protocol, so you can connect to it using the standard `psql` client:\n\n```shell\n$ PGPASSWORD=nopass psql \"host=localhost port=26727 sslmode=disable\"\npsql (12.2, server 9.5.0)\nType \"help\" for help.\n\ncamille=\u003e\n```\n\nAs you can see, `psql` is absolutely sure it talks to PostgreSQL version 9.5.0.\n\nNow you have access to 2 tables: `artifacts` and `versions`. 
You can run any read-only SQL query: the server supports projections, filtering, grouping, joins, agg functions, sub-queries etc (pretty much all of SQL99).\n\nBasic queries:\n\n```sql\ncamille=\u003e select * from artifacts limit 6;\n    uid     |         group_id         |       artifact_id        |        name         |                   url\n------------+--------------------------+--------------------------+---------------------+------------------------------------------\n 3227713579 | alandipert               | desiderata               | desiderata          | https://github.com/alandipert/desiderata\n 3382955103 | aopalliance              | aopalliance              | AOP alliance        | http://aopalliance.sourceforge.net\n 1507835947 | asm                      | asm-parent               | ASM                 | http://asm.objectweb.org/\n 226341444  | backport-util-concurrent | backport-util-concurrent | Backport of JSR 166 | http://backport-jsr166.sourceforge.net/\n 1712481681 | biz.aQute                | bndlib                   | BND Library         | http://www.aQute.biz/Code/Bnd\n 2280883480 | biz.aQute.bnd            | biz.aQute.bndlib         | biz.aQute.bndlib    | https://bnd.bndtools.org/\n(6 rows)\n```\n\n```sql\ncamille=\u003e select * from versions where filesize \u003e 10000 limit 5;\n    uid     | version | filesize |      last_modified      |                   sha1\n------------+---------+----------+-------------------------+------------------------------------------\n 3345961009 | 1.3.2   | 337129   | 2019-07-04 23:36:26.464 | ff84d15cfeb0825935a170d7908fbfae00498050\n 1053708643 | 1.0.1   | 26514    | 2019-07-04 23:23:20.322 | 49c100caf72d658aca8e58bd74a4ba90fa2b0d70\n 2740841946 | 1.6.5   | 1034049  | 2019-07-05 05:37:10.953 | 7d18faf23df1a5c3a43613952e0e8a182664564b\n 925895164  | 0.4.4   | 42645    | 2020-02-01 06:45:59.599 | 2522f7f1b4bab169a2540406eb3eb71f7d6e3003\n 136773645  | 1.9     | 263965   | 2019-07-04 23:25:30.09  | 
9ce04e34240f674bc72680f8b843b1457383161a\n(5 rows)\n```\n\nSomething more complicated:\n\n```sql\ncamille=\u003e\nSELECT group_id, COUNT(*) AS n_files\nFROM artifacts\nLEFT JOIN versions ON artifacts.uid=versions.uid\nGROUP BY group_id\nORDER BY n_files DESC\nLIMIT 10;\n         group_id         | n_files\n--------------------------+---------\n org.apache.flink         | 391\n org.apache.maven         | 245\n org.codehaus.plexus      | 186\n org.apache.hadoop        | 121\n org.apache.maven.doxia   | 108\n org.apache.maven.plugins | 82\n io.netty                 | 67\n org.apache.maven.shared  | 65\n org.apache.lucene        | 64\n org.apache.commons       | 62\n(10 rows)\n```\n\n## Why?\n\nThe project is mainly done out of pure curiosity:\n- figure out what the low-level PostgreSQL transport protocol (`pgwire`) looks like\n- check in practice how simple or hard it would be to implement `pgwire` as a [Netty](https://netty.io/) codec\n- implement a simple but non-trivial example of defining a relational algebra system using [Apache Calcite](https://calcite.apache.org/)\n- it's just fun and looks cool\n\n## Implementation Details\n\n- [Netty](https://netty.io/) to run async I/O server\n- Custom \"codec\" to encode/decode `pgwire` messages (see `pgwire` package). The tricky part of the codec is that the very first message has a different structure compared to all following messages (from PostgreSQL documentation: because of purely historical reasons). The channel initializer creates a pipeline with `PgwireStartupMessageDecoder`, which removes itself after the first message is successfully processed.\n- The server handler cycles over incoming SQL queries, decoding queries from the byte protocol and serializing the result set into a proper sequence of messages (row descriptor -\u003e row data -\u003e command complete).\n- The \"database\" that actually executes the query is implemented in the `m2sql` package. 
It exposes a JDBC connection, so the server uses the standard `java.sql` interface when talking to it (see documentation for [Apache Avatica](https://calcite.apache.org/avatica/) library).\n- [Apache Calcite](https://calcite.apache.org/) is used for query parsing, query planning, and query optimization. Its high-level API is used to declare catalog structure, tables, schemas, relations and scanning logic.\n\nMore details in the [deck](https://speakerdeck.com/kachayev/talking-sql-to-strangers).\n\n## Optimizations\n\n\"Precision is the difference between a butcher and a surgeon\" (tm)\n\nImplemented optimizations:\n- prune unused fields: if \"filesize\" is not queried, we don't need to waste cpu/ram to calculate it\n- push-down filtering predicates: jump into a subfolder if prefix is known\n\nWork in progress:\n- optimized join of artifacts and versions: we can walk the file tree once to retrieve all the information we need\n\n## Contributions\n\nThis is a project made for fun. Feel free to implement whatever feature you want and just drop a PR here ;) See TODO list below if you need ideas on what could be helpful (or what is critically missing).\n\n## TODO\n\n- [ ] Network encoding logic baked into DTO object is such a bad idea... Instead of `toByteBuf` method for each message type, the logic should be implemented in a single encoder with dynamic type-based dispatch\n- [x] Propagate errors (like, wrong queries) to the client instead of re-opening the connection\n- [ ] Additional PostgreSQL client features, like `\\l`, `show databases`, `show tables` (need to register `pg_catalog` to make this happen)\n- [ ] `pgwire` protocol has way more message types than are currently implemented\n- [x] Reject non-read queries (`insert`, `update` etc)\n- [x] Push-down predicates for folder traversal (e.g. 
`group_id LIKE com.apache.%` predicate might be optimized by going directly to `com/apache/` subfolder)\n- [x] Better CLI for the server (logs, args parser help etc)\n- [ ] SSL, password authentication\n- [ ] Carry cancel flag around\n- [ ] `DELETE` versions\n\n## License\n\nCopyright © 2020 `camille-sql`\n\n`camille-sql` is licensed under the MIT license, available in the LICENSE file.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkachayev%2Fcamille-sql","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkachayev%2Fcamille-sql","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkachayev%2Fcamille-sql/lists"}