Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/nmattia/cio
cio: cached HTTP requests for a smooth Jupyter experience!
https://github.com/nmattia/cio
haskell http jupyter leveldb
Last synced: 20 days ago
JSON representation
cio: cached HTTP requests for a smooth Jupyter experience!
- Host: GitHub
- URL: https://github.com/nmattia/cio
- Owner: nmattia
- Created: 2018-08-19T17:16:21.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2018-08-20T21:59:56.000Z (over 6 years ago)
- Last Synced: 2024-11-14T00:03:49.852Z (2 months ago)
- Topics: haskell, http, jupyter, leveldb
- Language: Haskell
- Homepage:
- Size: 13.7 KB
- Stars: 4
- Watchers: 4
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.ipynb
Awesome Lists containing this project
README
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# cio: cached HTTP requests for a smooth Jupyter experience!\n",
"\n",
"This library provides a thin wrapper around the [wreq](http://serpentine.com/wreq) library (a simple HTTP client library). It is meant to be used with [Jupyter](http://jupyter.org/): all requests will be stored _on disk_ and served from the cache subsequently, even if your kernel gets restarted. The cache lookups are near-instantaneous thanks to the amazing [LevelDB](http://leveldb.org/) library. You can use `cio` just like you would `wreq` -- instead of importing `Network.Wreq`, import `CIO` (which stands for Cached IO):"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"{-# LANGUAGE OverloadedStrings #-}\n",
"\n",
"import CIO\n",
"import Data.Aeson.Lens\n",
"import Control.Lens"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then use the functions you are used to, like `get`:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\"Nicolas Mattia\""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"get \"https://api.github.com/users/nmattia\" <&>\n",
" (^.responseBody.key \"name\"._String)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Building cio\n",
"\n",
"The simplest way to build this library is to use Nix. To get started clone the cio repository ([nmattia/cio](https://github.com/nmattia/cio)), then run the following:\n",
"\n",
"``` shell\n",
"$ nix-shell\n",
"helpers:\n",
"> cio_build\n",
"> cio_ghci\n",
"> cio_notebook\n",
"> cio_readme_gen\n",
"```\n",
"\n",
"The helper functions will respectively build `cio`, start a `ghci` session for `cio`, start a Jupyter notebook with `cio` loaded and regenerate the README (this file is a Jupyter notebook!)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using cio\n",
"\n",
"Three functions are provided on top of `wreq`:\n",
"* `get :: String -> CIO Response` performs a (cached) request to the given URL.\n",
"* `getWith :: Options -> String -> CIO Response` performs a (cached) request to the given URL using the provided `wreq` [`Options`](http://hackage.haskell.org/package/wreq-0.5.2.1/docs/Network-Wreq.html#t:Options).\n",
"* `getAllWith :: Options -> String -> Producer CIO Response` performs several (cached) requests by lazily following the `Link` headers (see for instance [GitHub's pagination mechanism](https://developer.github.com/v3/guides/traversing-with-pagination/)).\n",
"\n",
"Let's see what happens when a request is performed twice. First let's write a function for timing the requests:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"import Control.Monad.IO.Class\n",
"import Data.Time\n",
"\n",
"timeIt :: CIO a -> CIO (NominalDiffTime, a)\n",
"timeIt act = do\n",
" start <- liftIO $ getCurrentTime\n",
" res <- act\n",
" stop <- liftIO $ getCurrentTime\n",
" pure (diffUTCTime stop start, res)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then we'll generate a unique string which we'll use as a dummy parameter in order to force `cio` to perform the request the first time, so that we can time it:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"import Data.UUID (toText)\n",
"import System.Random (randomIO)\n",
"\n",
"uuid <- toText <$> randomIO"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally we use `getWith` and set the `dummy` query parameter to the `UUID` we just generated and time the request:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(1.214306799s,\"Nicolas Mattia\")"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"timeIt $ getWith (param \"dummy\" .~ [uuid] $ defaults) \"https://api.github.com/users/nmattia\" <&>\n",
" (^.responseBody.key \"name\"._String)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That's a pretty long time! When playing around with data in a Jupyter notebook waiting around for requests to complete is a real productivity and creativity killer. Let's see what `cio` can do for us:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(0.000248564s,\"Nicolas Mattia\")"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"timeIt $ getWith (param \"dummy\" .~ [uuid] $ defaults) \"https://api.github.com/users/nmattia\" <&>\n",
" (^.responseBody.key \"name\"._String)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Pretty nice! You might have noticed that the `CIO` results were printed out, as `Show a => IO a` would be in GHCi. As mentioned before, `cio` is optimized for Jupyter workflows, and as such all `Show`-able results will be printed directly to the notebook's output. Lists of `Show`-ables will be pretty printed, which we'll demonstrate by playing with `cio`'s other cool feature: lazily following page links."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"import Data.Conduit\n",
"import Data.Conduit.Combinators as C"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to lazily fetch data `cio` uses the [`conduit` library](http://hackage.haskell.org/package/conduit). The `getAllWith` function is a `Producer` of `Response`s (sorry, a `ConduitT i Response CIO ()`) which are served from the cache when possible. Here we ask GitHub to give us only two results per page, and `cio` will iterate the pages until the five expected items have been fetched (if you do the math that's about 3 pages):"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\"jgm/pandoc\""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"\"koalaman/shellcheck\""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"\"PostgREST/postgrest\""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"\"purescript/purescript\""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"\"elm/compiler\""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sourceToList $ \n",
" getAllWith \n",
" (defaults \n",
" & param \"q\" .~ [\"language:haskell\"] \n",
" & param \"sort\" .~ [\"stars\"]\n",
" & param \"per_page\" .~ [\"2\"])\n",
" \"https://api.github.com/search/repositories\"\n",
" .| awaitForever (C.yieldMany . (\n",
" ^..responseBody\n",
" .key \"items\"\n",
" .values\n",
" .key \"full_name\"\n",
" ._String))\n",
" .| C.take 5"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## What if something goes wrong?\n",
"\n",
"What's the second hardest thing in computer science, besides naming and off-by-one errors? Cache invalidation, of course. For the cache's sake, all your requests should be idempotent, but unfortunately that's not always possible. Here `cio` doesn't assume anything but lets you deal with dirtying yourself by using either of these two functions:\n",
"\n",
"* `dirtyReq :: String -> CIO ()`, like `get` but instead of fetching the response dirties the entry in the cache.\n",
"* `dirtyReqWith :: Options -> String -> CIO ()`, like `getWith` but instead of fetching the response dirties the entry in the cache.\n",
"\n",
"If things went _really_ wrong, you can always wipe the cache entirely..."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## ... but where's the cache?\n",
"\n",
"The cache is set globally (reminder: this is a Jupyter-optimized workflow):"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\"requests.cache\""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"getCacheFile"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you need a different cache file you can either change the global cache file:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"/* Styles used for the Hoogle display in the pager */\n",
".hoogle-doc {\n",
"display: block;\n",
"padding-bottom: 1.3em;\n",
"padding-left: 0.4em;\n",
"}\n",
".hoogle-code {\n",
"display: block;\n",
"font-family: monospace;\n",
"white-space: pre;\n",
"}\n",
".hoogle-text {\n",
"display: block;\n",
"}\n",
".hoogle-name {\n",
"color: green;\n",
"font-weight: bold;\n",
"}\n",
".hoogle-head {\n",
"font-weight: bold;\n",
"}\n",
".hoogle-sub {\n",
"display: block;\n",
"margin-left: 0.4em;\n",
"}\n",
".hoogle-package {\n",
"font-weight: bold;\n",
"font-style: italic;\n",
"}\n",
".hoogle-module {\n",
"font-weight: bold;\n",
"}\n",
".hoogle-class {\n",
"font-weight: bold;\n",
"}\n",
".get-type {\n",
"color: green;\n",
"font-weight: bold;\n",
"font-family: monospace;\n",
"display: block;\n",
"white-space: pre-wrap;\n",
"}\n",
".show-type {\n",
"color: green;\n",
"font-weight: bold;\n",
"font-family: monospace;\n",
"margin-left: 1em;\n",
"}\n",
".mono {\n",
"font-family: monospace;\n",
"display: block;\n",
"}\n",
".err-msg {\n",
"color: red;\n",
"font-style: italic;\n",
"font-family: monospace;\n",
"white-space: pre;\n",
"display: block;\n",
"}\n",
"#unshowable {\n",
"color: red;\n",
"font-weight: bold;\n",
"}\n",
".err-msg.in.collapse {\n",
"padding-top: 0.7em;\n",
"}\n",
".highlight-code {\n",
"white-space: pre;\n",
"font-family: monospace;\n",
"}\n",
".suggestion-warning { \n",
"font-weight: bold;\n",
"color: rgb(200, 130, 0);\n",
"}\n",
".suggestion-error { \n",
"font-weight: bold;\n",
"color: red;\n",
"}\n",
".suggestion-name {\n",
"font-weight: bold;\n",
"}\n",
"setCacheFile :: FilePath -> IO ()"
],
"text/plain": [
"setCacheFile :: FilePath -> IO ()"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
":t setCacheFile"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"or run your `CIO` code manually:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"/* Styles used for the Hoogle display in the pager */\n",
".hoogle-doc {\n",
"display: block;\n",
"padding-bottom: 1.3em;\n",
"padding-left: 0.4em;\n",
"}\n",
".hoogle-code {\n",
"display: block;\n",
"font-family: monospace;\n",
"white-space: pre;\n",
"}\n",
".hoogle-text {\n",
"display: block;\n",
"}\n",
".hoogle-name {\n",
"color: green;\n",
"font-weight: bold;\n",
"}\n",
".hoogle-head {\n",
"font-weight: bold;\n",
"}\n",
".hoogle-sub {\n",
"display: block;\n",
"margin-left: 0.4em;\n",
"}\n",
".hoogle-package {\n",
"font-weight: bold;\n",
"font-style: italic;\n",
"}\n",
".hoogle-module {\n",
"font-weight: bold;\n",
"}\n",
".hoogle-class {\n",
"font-weight: bold;\n",
"}\n",
".get-type {\n",
"color: green;\n",
"font-weight: bold;\n",
"font-family: monospace;\n",
"display: block;\n",
"white-space: pre-wrap;\n",
"}\n",
".show-type {\n",
"color: green;\n",
"font-weight: bold;\n",
"font-family: monospace;\n",
"margin-left: 1em;\n",
"}\n",
".mono {\n",
"font-family: monospace;\n",
"display: block;\n",
"}\n",
".err-msg {\n",
"color: red;\n",
"font-style: italic;\n",
"font-family: monospace;\n",
"white-space: pre;\n",
"display: block;\n",
"}\n",
"#unshowable {\n",
"color: red;\n",
"font-weight: bold;\n",
"}\n",
".err-msg.in.collapse {\n",
"padding-top: 0.7em;\n",
"}\n",
".highlight-code {\n",
"white-space: pre;\n",
"font-family: monospace;\n",
"}\n",
".suggestion-warning { \n",
"font-weight: bold;\n",
"color: rgb(200, 130, 0);\n",
"}\n",
".suggestion-error { \n",
"font-weight: bold;\n",
"color: red;\n",
"}\n",
".suggestion-name {\n",
"font-weight: bold;\n",
"}\n",
"runCIOWith :: forall a. FilePath -> CIO a -> IO a"
],
"text/plain": [
"runCIOWith :: forall a. FilePath -> CIO a -> IO a"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
":t runCIOWith"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## one more thing...\n",
"\n",
".. nope, that's all! Enjoy!"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Haskell",
"language": "haskell",
"name": "haskell"
},
"language_info": {
"codemirror_mode": "ihaskell",
"file_extension": ".hs",
"name": "haskell",
"version": "8.2.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}