Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/nmattia/cio

cio: cached HTTP requests for a smooth Jupyter experience!
https://github.com/nmattia/cio

haskell http jupyter leveldb

Last synced: 20 days ago
JSON representation

cio: cached HTTP requests for a smooth Jupyter experience!

Awesome Lists containing this project

README

        

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# cio: cached HTTP requests for a smooth Jupyter experience!\n",
"\n",
"This library provides a thin wrapper around the [wreq](http://serpentine.com/wreq) library (a simple HTTP client library). It is meant to be used with [Jupyter](http://jupyter.org/): all requests will be stored _on disk_ and served from the cache subsequently, even if your kernel gets restarted. The cache lookups are near-instantaneous thanks to the amazing [LevelDB](http://leveldb.org/) library. You can use `cio` just like you would `wreq` -- instead of importing `Network.Wreq`, import `CIO` (which stands for Cached IO):"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"{-# LANGUAGE OverloadedStrings #-}\n",
"\n",
"import CIO\n",
"import Data.Aeson.Lens\n",
"import Control.Lens"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then use the functions you are used to, like `get`:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\"Nicolas Mattia\""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"get \"https://api.github.com/users/nmattia\" <&>\n",
" (^.responseBody.key \"name\"._String)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Building cio\n",
"\n",
"The simplest way to build this library is to use Nix. To get started clone the cio repository ([nmattia/cio](https://github.com/nmattia/cio)), then run the following:\n",
"\n",
"``` shell\n",
"$ nix-shell\n",
"helpers:\n",
"> cio_build\n",
"> cio_ghci\n",
"> cio_notebook\n",
"> cio_readme_gen\n",
"```\n",
"\n",
"The helper functions will respectively build `cio`, start a `ghci` session for `cio`, start a Jupyter notebook with `cio` loaded and regenerate the README (this file is a Jupyter notebook!)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using cio\n",
"\n",
"Three functions are provided on top of `wreq`:\n",
"* `get :: String -> CIO Response` performs a (cached) request to the given URL.\n",
"* `getWith :: Options -> String -> CIO Response` performs a (cached) request to the given URL using the provided `wreq` [`Options`](http://hackage.haskell.org/package/wreq-0.5.2.1/docs/Network-Wreq.html#t:Options).\n",
"* `getAllWith :: Options -> String -> Producer CIO Response` performs several (cached) requests by lazily following the `Link` headers (see for instance [GitHub's pagination mechanism](https://developer.github.com/v3/guides/traversing-with-pagination/)).\n",
"\n",
"Let's see what happens when a request is performed twice. First let's write a function for timing the requests:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"import Control.Monad.IO.Class\n",
"import Data.Time\n",
"\n",
"timeIt :: CIO a -> CIO (NominalDiffTime, a)\n",
"timeIt act = do\n",
" start <- liftIO $ getCurrentTime\n",
" res <- act\n",
" stop <- liftIO $ getCurrentTime\n",
" pure (diffUTCTime stop start, res)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then we'll generate a unique string which we'll use as a dummy parameter in order to force `cio` to perform the request the first time, so that we can time it:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"import Data.UUID (toText)\n",
"import System.Random (randomIO)\n",
"\n",
"uuid <- toText <$> randomIO"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally we use `getWith` and set the `dummy` query parameter to the `UUID` we just generated and time the request:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(1.214306799s,\"Nicolas Mattia\")"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"timeIt $ getWith (param \"dummy\" .~ [uuid] $ defaults) \"https://api.github.com/users/nmattia\" <&>\n",
" (^.responseBody.key \"name\"._String)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That's a pretty long time! When playing around with data in a Jupyter notebook waiting around for requests to complete is a real productivity and creativity killer. Let's see what `cio` can do for us:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(0.000248564s,\"Nicolas Mattia\")"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"timeIt $ getWith (param \"dummy\" .~ [uuid] $ defaults) \"https://api.github.com/users/nmattia\" <&>\n",
" (^.responseBody.key \"name\"._String)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Pretty nice! You might have noticed that the `CIO` results were printed out, as `Show a => IO a` would be in GHCi. As mentioned before, `cio` is optimized for Jupyter workflows, and as such all `Show`-able results will be printed directly to the notebook's output. Lists of `Show`-ables will be pretty printed, which we'll demonstrate by playing with `cio`'s other cool feature: lazily following page links."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"import Data.Conduit\n",
"import Data.Conduit.Combinators as C"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to lazily fetch data `cio` uses the [`conduit` library](http://hackage.haskell.org/package/conduit). The `getAllWith` function is a `Producer` of `Response`s (sorry, a `ConduitT i Response CIO ()`) which are served from the cache when possible. Here we ask GitHub to give us only two results per page, and `cio` will iterate the pages until the five expected items have been fetched (if you do the math that's about 3 pages):"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\"jgm/pandoc\""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"\"koalaman/shellcheck\""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"\"PostgREST/postgrest\""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"\"purescript/purescript\""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"\"elm/compiler\""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sourceToList $ \n",
" getAllWith \n",
" (defaults \n",
" & param \"q\" .~ [\"language:haskell\"] \n",
" & param \"sort\" .~ [\"stars\"]\n",
" & param \"per_page\" .~ [\"2\"])\n",
" \"https://api.github.com/search/repositories\"\n",
" .| awaitForever (C.yieldMany . (\n",
" ^..responseBody\n",
" .key \"items\"\n",
" .values\n",
" .key \"full_name\"\n",
" ._String))\n",
" .| C.take 5"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## What if something goes wrong?\n",
"\n",
"What's the second hardest thing in computer science, besides naming and off-by-one errors? Cache invalidation, of course. For the cache's sake, all your requests should be idempotent, but unfortunately that's not always possible. Here `cio` doesn't assume anything but lets you deal with dirtying yourself by using either of these two functions:\n",
"\n",
"* `dirtyReq :: String -> CIO ()`, like `get` but instead of fetching the response dirties the entry in the cache.\n",
"* `dirtyReqWith :: Options -> String -> CIO ()`, like `getWith` but instead of fetching the response dirties the entry in the cache.\n",
"\n",
"If things went _really_ wrong, you can always wipe the cache entirely..."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## ... but where's the cache?\n",
"\n",
"The cache is set globally (reminder: this is a Jupyter-optimized workflow):"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\"requests.cache\""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"getCacheFile"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you need a different cache file you can either change the global cache file:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"/* Styles used for the Hoogle display in the pager */\n",
".hoogle-doc {\n",
"display: block;\n",
"padding-bottom: 1.3em;\n",
"padding-left: 0.4em;\n",
"}\n",
".hoogle-code {\n",
"display: block;\n",
"font-family: monospace;\n",
"white-space: pre;\n",
"}\n",
".hoogle-text {\n",
"display: block;\n",
"}\n",
".hoogle-name {\n",
"color: green;\n",
"font-weight: bold;\n",
"}\n",
".hoogle-head {\n",
"font-weight: bold;\n",
"}\n",
".hoogle-sub {\n",
"display: block;\n",
"margin-left: 0.4em;\n",
"}\n",
".hoogle-package {\n",
"font-weight: bold;\n",
"font-style: italic;\n",
"}\n",
".hoogle-module {\n",
"font-weight: bold;\n",
"}\n",
".hoogle-class {\n",
"font-weight: bold;\n",
"}\n",
".get-type {\n",
"color: green;\n",
"font-weight: bold;\n",
"font-family: monospace;\n",
"display: block;\n",
"white-space: pre-wrap;\n",
"}\n",
".show-type {\n",
"color: green;\n",
"font-weight: bold;\n",
"font-family: monospace;\n",
"margin-left: 1em;\n",
"}\n",
".mono {\n",
"font-family: monospace;\n",
"display: block;\n",
"}\n",
".err-msg {\n",
"color: red;\n",
"font-style: italic;\n",
"font-family: monospace;\n",
"white-space: pre;\n",
"display: block;\n",
"}\n",
"#unshowable {\n",
"color: red;\n",
"font-weight: bold;\n",
"}\n",
".err-msg.in.collapse {\n",
"padding-top: 0.7em;\n",
"}\n",
".highlight-code {\n",
"white-space: pre;\n",
"font-family: monospace;\n",
"}\n",
".suggestion-warning { \n",
"font-weight: bold;\n",
"color: rgb(200, 130, 0);\n",
"}\n",
".suggestion-error { \n",
"font-weight: bold;\n",
"color: red;\n",
"}\n",
".suggestion-name {\n",
"font-weight: bold;\n",
"}\n",
"setCacheFile :: FilePath -> IO ()"
],
"text/plain": [
"setCacheFile :: FilePath -> IO ()"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
":t setCacheFile"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"or run your `CIO` code manually:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"/* Styles used for the Hoogle display in the pager */\n",
".hoogle-doc {\n",
"display: block;\n",
"padding-bottom: 1.3em;\n",
"padding-left: 0.4em;\n",
"}\n",
".hoogle-code {\n",
"display: block;\n",
"font-family: monospace;\n",
"white-space: pre;\n",
"}\n",
".hoogle-text {\n",
"display: block;\n",
"}\n",
".hoogle-name {\n",
"color: green;\n",
"font-weight: bold;\n",
"}\n",
".hoogle-head {\n",
"font-weight: bold;\n",
"}\n",
".hoogle-sub {\n",
"display: block;\n",
"margin-left: 0.4em;\n",
"}\n",
".hoogle-package {\n",
"font-weight: bold;\n",
"font-style: italic;\n",
"}\n",
".hoogle-module {\n",
"font-weight: bold;\n",
"}\n",
".hoogle-class {\n",
"font-weight: bold;\n",
"}\n",
".get-type {\n",
"color: green;\n",
"font-weight: bold;\n",
"font-family: monospace;\n",
"display: block;\n",
"white-space: pre-wrap;\n",
"}\n",
".show-type {\n",
"color: green;\n",
"font-weight: bold;\n",
"font-family: monospace;\n",
"margin-left: 1em;\n",
"}\n",
".mono {\n",
"font-family: monospace;\n",
"display: block;\n",
"}\n",
".err-msg {\n",
"color: red;\n",
"font-style: italic;\n",
"font-family: monospace;\n",
"white-space: pre;\n",
"display: block;\n",
"}\n",
"#unshowable {\n",
"color: red;\n",
"font-weight: bold;\n",
"}\n",
".err-msg.in.collapse {\n",
"padding-top: 0.7em;\n",
"}\n",
".highlight-code {\n",
"white-space: pre;\n",
"font-family: monospace;\n",
"}\n",
".suggestion-warning { \n",
"font-weight: bold;\n",
"color: rgb(200, 130, 0);\n",
"}\n",
".suggestion-error { \n",
"font-weight: bold;\n",
"color: red;\n",
"}\n",
".suggestion-name {\n",
"font-weight: bold;\n",
"}\n",
"runCIOWith :: forall a. FilePath -> CIO a -> IO a"
],
"text/plain": [
"runCIOWith :: forall a. FilePath -> CIO a -> IO a"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
":t runCIOWith"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## one more thing...\n",
"\n",
".. nope, that's all! Enjoy!"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Haskell",
"language": "haskell",
"name": "haskell"
},
"language_info": {
"codemirror_mode": "ihaskell",
"file_extension": ".hs",
"name": "haskell",
"version": "8.2.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}