{"id":23262927,"url":"https://github.com/gagandeepb/frames-beam","last_synced_at":"2025-08-20T18:34:47.272Z","repository":{"id":59150715,"uuid":"139995161","full_name":"gagandeepb/Frames-beam","owner":"gagandeepb","description":"Accessing Postgres in a data frame in Haskell","archived":false,"fork":false,"pushed_at":"2023-12-12T22:15:35.000Z","size":47,"stargazers_count":24,"open_issues_count":3,"forks_count":3,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-12-17T19:38:53.894Z","etag":null,"topics":["data-science","database","postgres"],"latest_commit_sha":null,"homepage":"","language":"Haskell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gagandeepb.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2018-07-06T14:29:53.000Z","updated_at":"2023-12-12T22:15:40.000Z","dependencies_parsed_at":"2023-12-12T23:22:47.264Z","dependency_job_id":"51afbfe1-ccac-4f6b-8269-08561e7c087a","html_url":"https://github.com/gagandeepb/Frames-beam","commit_stats":{"total_commits":32,"total_committers":1,"mean_commits":32.0,"dds":0.0,"last_synced_commit":"f12d5ead298226f2bfb6574f87622c2b784fa4ae"},"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gagandeepb%2FFrames-beam","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gagandeepb%2FFrames-beam/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gagandeepb%2FFrames-beam/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositori
es/gagandeepb%2FFrames-beam/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gagandeepb","download_url":"https://codeload.github.com/gagandeepb/Frames-beam/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":230445924,"owners_count":18227060,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","database","postgres"],"created_at":"2024-12-19T14:13:17.798Z","updated_at":"2024-12-19T14:13:18.788Z","avatar_url":"https://github.com/gagandeepb.png","language":"Haskell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Frames-beam\n\n[![Build Status](https://travis-ci.org/gagandeepb/Frames-beam.png)](https://travis-ci.org/gagandeepb/Frames-beam)\n\n## Accessing Postgres in a data frame in Haskell\n\nA library for accessing Postgres tables as in-memory data structures.\n\nThis library provides helpers for generating  types (at compile time) corresponding to a database schema  and 'canned queries' to execute against a database instance. Additionally, it provides utilities to convert plain Haskell records (i.e. the format of query results) to `vinyl` records (upon which the Frames library is based). Can be used for interactive exploration by loading all data in-memory at once (and converting to a data frame), and also in a constant memory streaming mode. \n\n## Usage Example \nIn this example we assume there is a local Postgres instance with schema and rows given by the small DB-dump present in `data/users.sql`.\n\n### A. Interactive Workflow Steps\n1. 
**Bootstrap database schema:** In a new project, assume a file `Example.hs` is present in the `src` directory with the code below. You may of course change the string passed to `genBeamSchema` to match your database instance of interest.\n```haskell\n-- Example.hs \n{-# LANGUAGE DataKinds              #-}\n{-# LANGUAGE FlexibleContexts       #-}\n{-# LANGUAGE FlexibleInstances      #-}\n{-# LANGUAGE FunctionalDependencies #-}\n{-# LANGUAGE MultiParamTypeClasses  #-}\n{-# LANGUAGE OverloadedStrings      #-}\n{-# LANGUAGE TemplateHaskell        #-}\n{-# LANGUAGE TypeApplications       #-}\n{-# LANGUAGE TypeFamilies           #-}\n{-# LANGUAGE TypeFamilyDependencies #-}\n{-# LANGUAGE TypeOperators          #-}\n{-# LANGUAGE UndecidableInstances   #-}\nmodule Example where\n\nimport qualified Data.Conduit.List        as CL\nimport qualified Data.Vinyl.Functor       as VF\nimport qualified Frames                   as F\nimport           Frames.SQL.Beam.Postgres\n\n\n\n$(genBeamSchema \"host=localhost dbname=shoppingcart1\")\n```\n\n2. Next, execute `stack build` or `stack ghci`. This compilation step, if completed without any errors, will establish a connection to your database instance of interest, read its schema, generate corresponding Haskell types and put them in a module named `NewBeamSchema` in your `src` directory (the file creation step is also part of the compilation process).\n\n3. Assuming step 2 worked fine for you and you were using the test DB-dump from the `data` folder you should now have a module with code matching that in the `test/NewBeamSchema.hs` file of this repository. 
In case you used some other database instance of your own, your generated module would look different.\nImport this module into `Example`:\n\n```haskell\n-- Example.hs\n-- Extensions elided\nmodule Example where\n\nimport qualified Data.Conduit.List        as CL\nimport qualified Data.Vinyl.Functor       as VF\nimport qualified Frames                   as F\nimport           Frames.SQL.Beam.Postgres\n\nimport NewBeamSchema\n\n\n$(genBeamSchema \"host=localhost dbname=shoppingcart1\")\n```\n\n4. Let's assume the table of interest is `Cart_usersT`. We want to pull rows from this table into a data frame to explore it interactively from `ghci`. Note that `beam` query results are lists of plain Haskell records whereas `Frames` requires a list of `vinyl` records. To make this conversion, we add the following two invocations of code-generating (Template Haskell) functions to `Example`:\n\n```haskell\n-- Example.hs\n-- rest of the module elided\n\nimport NewBeamSchema\n\n\n$(genBeamSchema \"host=localhost dbname=shoppingcart1\")\n\nderiveGeneric ''Cart_usersT\nderiveVinyl ''Cart_usersT\n```\n...and build your project. This will add some additional code into the `Example` module. You can inspect this code by adding the appropriate compiler flags (for example, `-ddump-splices` under `ghc-options`) to your `.cabal` file.\n\n5. **Querying the DB:**\nIn this step we will execute a `SELECT * FROM tbl WHERE...` query and convert the results to a data frame. Note that the table declaration (`_cart_users`) and the database declaration (`db`) are exported by the `NewBeamSchema` module. 
More importantly, these declarations are autogenerated at compile time, so in case new tables are added, the corresponding declarations are automatically available for use.\n\n```haskell\n-- Example.hs\nconnString :: ByteString\nconnString = \"host=localhost dbname=shoppingcart1\"\n\n-- selects 'n' rows from the specified table in the db.\nloadRows1 :: Int -\u003e IO [(Cart_usersT Identity)]\nloadRows1 n =\n  withConnection connString $\n    bulkSelectAllRows _cart_users db n\n\nloadRows2 :: Int -\u003e IO [(Cart_usersT Identity)]\nloadRows2 n =\n  withConnection connString $\n    bulkSelectAllRowsWhere _cart_users db n (\\c -\u003e (_cart_usersFirst_name c) `like_` \"J%\")\n```\nNotice the lambda passed to `bulkSelectAllRowsWhere` in `loadRows2`. This is a 'filter lambda' that forms the `WHERE ...` part of the SQL query and is executed at the DB-level. We will see how to create our own 'filter lambdas' in another section below. For now, if we were to enter `ghci` by executing `stack ghci` after adding the above code:\n```ghci\nghci\u003eres1 \u003c- loadRows1 5\nghci\u003e:t res1\nres1 :: [Cart_usersT Identity]\nghci\u003e:t (map createRecId res1)\n(map createRecId res1)\n  :: [F.Rec\n        VF.Identity\n        '[\"_cart_usersEmail\" F.:-\u003e Text,\n          \"_cart_usersFirst_name\" F.:-\u003e Text,\n          \"_cart_usersLast_name\" F.:-\u003e Text,\n          \"_cart_usersIs_member\" F.:-\u003e Bool,\n          \"_cart_usersDays_in_queue\" F.:-\u003e Int]]\nghci\u003e:t (F.toFrame $ map createRecId res1)\n(F.toFrame $ map createRecId res1)\n  :: F.Frame\n       (F.Record\n          '[\"_cart_usersEmail\" F.:-\u003e Text,\n            \"_cart_usersFirst_name\" F.:-\u003e Text,\n            \"_cart_usersLast_name\" F.:-\u003e Text,\n            \"_cart_usersIs_member\" F.:-\u003e Bool,\n            \"_cart_usersDays_in_queue\" F.:-\u003e Int])\nghci\u003emyFrame = F.toFrame $ map createRecId res1\nghci\u003e:set -XTypeApplications\nghci\u003e:set 
-XTypeOperators\nghci\u003e:set -XDataKinds\nghci\u003eminiFrame = fmap (F.rcast @'[\"_cart_usersEmail\" F.:-\u003e Text, \"_cart_usersDays_in_queue\" F.:-\u003e Int]) myFrame\nghci\u003emapM_ print miniFrame\n{_cart_usersEmail :-\u003e \"james@example.com\", _cart_usersDays_in_queue :-\u003e 1}\n{_cart_usersEmail :-\u003e \"betty@example.com\", _cart_usersDays_in_queue :-\u003e 42}\n{_cart_usersEmail :-\u003e \"james@pallo.com\", _cart_usersDays_in_queue :-\u003e 1}\n{_cart_usersEmail :-\u003e \"betty@sims.com\", _cart_usersDays_in_queue :-\u003e 42}\n{_cart_usersEmail :-\u003e \"james@oreily.com\", _cart_usersDays_in_queue :-\u003e 1}\n```\nWe could have used `loadRows2` in place of `loadRows1` in order to have the `WHERE ...` clause executed at the DB-level.\nNote that in the above, once the query results are converted to a data frame, you're free to work with the frame in any way, just as you would with a data frame created from a CSV.\n\n### B. Streaming Workflow Steps\n\nOnce you're done working with a small subset of the data and would like to scale up your analysis to a larger subset, or to all of it, it's time to look at writing your own `conduit` to process incoming rows from the DB.\n\n1 - 4: Same as 'Interactive Workflow Steps'\n\n5. **Writing your own streaming pipeline:**\n\nConsider the following:\n```haskell\nstreamRows :: IO ()\nstreamRows = do\n  res \u003c-  withConnection connString $\n            streamingSelectAllPipeline' _cart_users db 1000 (\\c -\u003e (_cart_usersFirst_name c) `like_` \"J%\") $\n              (CL.map (\\record -\u003e F.rcast @'[\"_cart_usersEmail\" F.:-\u003e Text, \"_cart_usersIs_member\" F.:-\u003e Bool] record))\n  mapM_ print res\n```\nIn the above, we select all rows from the specified table that match a certain pattern (`\"J%\"`), then the function `streamingSelectAllPipeline'` converts the query results to `vinyl` records inside a `conduit` and sends them downstream, where we can operate on its output. 
Here, specifically, we select a subset of the columns using `rcast`: `CL.map` applies `rcast` to every incoming row and sends it downstream, where the result gets returned. We then print the list of `vinyl` records.\n\nIn order to write your own conduit, all you need to know is that internally the conduit flow is as follows:\n\n```haskell\n(\\c -\u003e runConduit $ c .| CL.map createRecId\n                      .| recordProcessorConduit\n                      .| CL.take nrows)\n```\nIn the above, you supply `recordProcessorConduit` to the `streamingSelectAllPipeline'` function; this conduit takes `vinyl` records as input and sends its output downstream to `CL.take`. Note that in all functions in the `Frames.SQL.Beam.Postgres.Streaming` module, you need to specify the number of rows you want to return (this is an upper bound; the actual number of rows returned depends on the amount of data present in your database).\n\n## A Note on 'Canned Queries' and 'Filter Lambdas'\n\nThere are three things needed to execute a canned query (`SELECT * FROM tbl WHERE ...`):\n* `PostgresTable a b`: auto-generated by the `BeamSchemaGen` module\n* `PostgresDB b`: auto-generated by the `BeamSchemaGen` module\n* `PostgresFilterLambda a s`: The `WHERE...` clause. All filter lambdas are of the form:\n```haskell\n(\\tbl -\u003e (_fieldName tbl) `op` constant)\n```\nor\n```haskell\n(\\tbl -\u003e (_fieldName1 tbl) `op` (_fieldName2 tbl))\n```\nIn the above, `op` can be one of: [`==.`, `/=.`, `\u003e.`, `\u003c.`, `\u003c=.`, `\u003e=.`, `between_`, `like_`, `in_`] (some of these are not applicable to the second case). You may use `(\u0026\u0026.)` and `(||.)` to combine expressions inside the lambda. 
To see some actual examples of 'filter lambdas', check out `test/LibSpec.hs` in this repository.\n\n## Background Reading:\n\n* About `deriveGeneric` and `deriveVinyl`: [Deriving Vinyl Representation From Plain Haskell Records][generic-vinyl]\n* [Frames tutorial][frames-tutorial]\n* [Beam tutorial and user-guide][beam-indepth]\n\n\n[generic-vinyl]: https://www.gagandeepbhatia.com/blog/deriving-vinyl-representation-from-plain-haskell-records/\n[frames-tutorial]: http://acowley.github.io/Frames/\n[beam-indepth]: https://tathougies.github.io/beam/","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgagandeepb%2Fframes-beam","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgagandeepb%2Fframes-beam","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgagandeepb%2Fframes-beam/lists"}