https://github.com/climateimpactlab/impactlab_api_mockup_2

Host: GitHub
URL: https://github.com/climateimpactlab/impactlab_api_mockup_2
Owner: ClimateImpactLab
Created: 2017-03-08T23:24:15.000Z (about 9 years ago)
Default Branch: master
Last Pushed: 2017-03-09T01:15:09.000Z (about 9 years ago)
Last Synced: 2025-09-10T03:51:39.462Z (9 months ago)
Language: Python
Size: 25.4 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.ipynb

README

          {

 "cells": [

  {

   "cell_type": "code",

   "execution_count": 1,

   "metadata": {

    "collapsed": false

   },

   "outputs": [],

   "source": [

    "from __future__ import absolute_import\n",

    "from impactlab import impactlab\n",

    "from impactlab.mockfs import DataAPI, Variable, SuperIndex\n",

    "\n",

    "import pandas as pd"

   ]

  },

  {

   "cell_type": "code",

   "execution_count": 2,

   "metadata": {

    "collapsed": true

   },

   "outputs": [],

   "source": [

    "# initialize the mockup of a DataFS api with variable support\n",

    "api = DataAPI()"

   ]

  },

  {

   "cell_type": "markdown",

   "metadata": {},

   "source": [

    "## Specifying a simple pipeline job"

   ]

  },

  {

   "cell_type": "markdown",

   "metadata": {},

   "source": [

    "The goal is to have the ability to write atomic pipeline components and then map them across our data sets."

   ]

  },

  {

   "cell_type": "markdown",

   "metadata": {},

   "source": [

    "#### Example: multiply `popop` by `tas`"

   ]

  },

  {

   "cell_type": "code",

   "execution_count": 3,

   "metadata": {

    "collapsed": true

   },

   "outputs": [],

   "source": [

    "def compute_mortality(popop, tas):\n",

    "    '''\n",

    "    Demonstrates a simple atomic computation\n",

    "    '''\n",

    "\n",

    "    return popop.value * tas.value\n",

    "\n"

   ]

  },

  {

   "cell_type": "markdown",

   "metadata": {},

   "source": [

    "Parameterize it with the `impactlab.uses` decorator, loop over the index combinations with `impactlab.iters`, and store each result with `impactlab.updates`:"

   ]

  },

  {

   "cell_type": "code",

   "execution_count": 4,

   "metadata": {

    "collapsed": true

   },

   "outputs": [],

   "source": [

    "@impactlab.uses(popop=api.get_variable('/GCP/socioeconomics/popop'), tas=api.get_variable('/GCP/climate/tas'))\n",

    "@impactlab.iters()\n",

    "@impactlab.updates(api.get_variable('/GCP/impacts/mortality'))\n",

    "def mortality(popop, tas):\n",

    "    '''\n",

    "    Demonstrates a simple computation job\n",

    "\n",

    "    The `impactlab.uses` decorator accepts keyword arguments of the form\n",

    "    {name: obj}, where name is the name of the argument to pass and obj is a mockfs\n",

    "    Variable object.\n",

    "\n",

    "    The `impactlab.iters` decorator drives a for-loop over all combinations of\n",

    "    indices for the given variables.\n",

    "\n",

    "    The value returned by the decorated function is used to update the value of\n",

    "    the variable specified by `impactlab.updates` for the given indices.\n",

    "    '''\n",

    "\n",

    "    return compute_mortality(popop, tas)"

   ]

  },

  {

   "cell_type": "markdown",

   "metadata": {},

   "source": [

    "To see this in action, simply call `mortality`:"

   ]

  },

  {

   "cell_type": "code",

   "execution_count": 5,

   "metadata": {

    "collapsed": false

   },

   "outputs": [

    {

     "name": "stdout",

     "output_type": "stream",

     "text": [

      " bumped 0.0.1 --> 0.0.2\n",

      " bumped 0.0.1 --> 0.0.2\n",

      " bumped 0.0.1 --> 0.0.2\n",

      " bumped 0.0.1 --> 0.0.2\n",

      " bumped 0.0.1 --> 0.0.2\n",

      " bumped 0.0.1 --> 0.0.2\n",

      " bumped 0.0.1 --> 0.0.2\n",

      " bumped 0.0.1 --> 0.0.2\n",

      " bumped 0.0.1 --> 0.0.2\n",

      " bumped 0.0.1 --> 0.0.2\n",

      " bumped 0.0.1 --> 0.0.2\n",

      " bumped 0.0.1 --> 0.0.2\n",

      " bumped 0.0.1 --> 0.0.2\n",

      " bumped 0.0.1 --> 0.0.2\n",

      " bumped 0.0.1 --> 0.0.2\n",

      " bumped 0.0.1 --> 0.0.2\n",

      " bumped 0.0.1 --> 0.0.2\n",

      " bumped 0.0.1 --> 0.0.2\n",

      " bumped 0.0.1 --> 0.0.2\n",

      " bumped 0.0.1 --> 0.0.2\n"

     ]

    }

   ],

   "source": [

    "mortality()"

   ]

  },

  {

   "cell_type": "markdown",

   "metadata": {},

   "source": [

    "To run a subset of the jobs, slice the variables in the `@impactlab.uses` calls:"

   ]

  },

  {

   "cell_type": "code",

   "execution_count": 6,

   "metadata": {

    "collapsed": true

   },

   "outputs": [],

   "source": [

    "@impactlab.uses(popop=api.get_variable('/GCP/socioeconomics/popop'))\n",

    "@impactlab.uses(tas=api.get_variable('/GCP/climate/tas')[{'rcp': 'rcp85'}])\n",

    "@impactlab.iters()\n",

    "@impactlab.updates(api.get_variable('/GCP/impacts/mortality'))\n",

    "def mortality_rcp85(popop, tas):\n",

    "    '''\n",

    "    `impactlab.uses` may be applied as many times as desired (but must be above\n",

    "    `impactlab.updates` or other functional decorators).\n",

    "\n",

    "    Slicing a variable is done with a dictionary naming the index to slice and\n",

    "    the value to keep, e.g. {'rcp': 'rcp85'}.\n",

    "    '''\n",

    "\n",

    "    return compute_mortality(popop, tas)"

   ]

  },

  {

   "cell_type": "markdown",

   "metadata": {},

   "source": [

    "Note that this function only iterates over `rcp85`:"

   ]

  },

  {

   "cell_type": "code",

   "execution_count": 7,

   "metadata": {

    "collapsed": false

   },

   "outputs": [

    {

     "name": "stdout",

     "output_type": "stream",

     "text": [

      " bumped 0.0.2 --> 0.0.3\n",

      " bumped 0.0.2 --> 0.0.3\n",

      " bumped 0.0.2 --> 0.0.3\n",

      " bumped 0.0.2 --> 0.0.3\n",

      " bumped 0.0.2 --> 0.0.3\n"

     ]

    }

   ],

   "source": [

    "mortality_rcp85()"

   ]

  },

  {

   "cell_type": "markdown",

   "metadata": {},

   "source": [

    "## Running a complex ETL Job"

   ]

  },

  {

   "cell_type": "markdown",

   "metadata": {},

   "source": [

    "Let's say you want to perform a job that loads one class of variables into memory, then iterates over another set while keeping the first set alive."

   ]

  },

  {

   "cell_type": "markdown",

   "metadata": {},

   "source": [

    "This can be done with a two-stage job that nests one `iters` loop inside another:"

   ]

  },

  {

   "cell_type": "code",

   "execution_count": 8,

   "metadata": {

    "collapsed": true

   },

   "outputs": [],

   "source": [

    "@impactlab.uses(tas=api.get_variable('/GCP/climate/tas'))\n",

    "@impactlab.iters()\n",

    "def tas2_ir(tas):\n",

    "    '''\n",

    "    Demonstrates a two-stage ETL process\n",

    "    '''\n",

    "\n",

    "    with tas.open('r') as f:\n",

    "        tas_data = pd.read_csv(f)\n",

    "\n",

    "    @impactlab.uses(tas=tas)\n",

    "    @impactlab.uses(popop=api.get_variable('/GCP/socioeconomics/popop'))\n",

    "    @impactlab.iters()\n",

    "    @impactlab.updates(api.get_variable('/GCP/climate/tas2_ir'))\n",

    "    def inner(popop, tas):\n",

    "        '''\n",

    "        The inner loop's uses() decorator is given tas as an argument. The\n",

    "        `iters` decorator sees that tas is an archive rather than a variable\n",

    "        and simply passes the value through rather than attempting to loop over\n",

    "        it.\n",

    "        '''\n",

    "\n",

    "        with popop.open('r') as f:\n",

    "            popop_data = pd.read_csv(f)\n",

    "\n",

    "        return (tas_data**2) * popop_data\n",

    "\n",

    "    inner()"

   ]

  },

  {

   "cell_type": "markdown",

   "metadata": {},

   "source": [

    "When this is run, note that the climate data is loaded only once per iteration of the outer loop:"

   ]

  },

  {

   "cell_type": "code",

   "execution_count": 9,

   "metadata": {

    "collapsed": false

   },

   "outputs": [

    {

     "name": "stdout",

     "output_type": "stream",

     "text": [

      "loading \n",

      "loading \n",

      " bumped 0.0.1 --> 0.0.2\n",

      "loading \n",

      " bumped 0.0.1 --> 0.0.2\n",

      "loading \n",

      " bumped 0.0.1 --> 0.0.2\n",

      "loading \n",

      " bumped 0.0.1 --> 0.0.2\n",

      "loading \n",

      " bumped 0.0.1 --> 0.0.2\n",

      "loading \n",

      "loading \n",

      " bumped 0.0.1 --> 0.0.2\n",

      "loading \n",

      " bumped 0.0.1 --> 0.0.2\n",

      "loading \n",

      " bumped 0.0.1 --> 0.0.2\n",

      "loading \n",

      " bumped 0.0.1 --> 0.0.2\n",

      "loading \n",

      " bumped 0.0.1 --> 0.0.2\n",

      "loading \n",

      "loading \n",

      " bumped 0.0.1 --> 0.0.2\n",

      "loading \n",

      " bumped 0.0.1 --> 0.0.2\n",

      "loading \n",

      " bumped 0.0.1 --> 0.0.2\n",

      "loading \n",

      " bumped 0.0.1 --> 0.0.2\n",

      "loading \n",

      " bumped 0.0.1 --> 0.0.2\n",

      "loading \n",

      "loading \n",

      " bumped 0.0.1 --> 0.0.2\n",

      "loading \n",

      " bumped 0.0.1 --> 0.0.2\n",

      "loading \n",

      " bumped 0.0.1 --> 0.0.2\n",

      "loading \n",

      " bumped 0.0.1 --> 0.0.2\n",

      "loading \n",

      " bumped 0.0.1 --> 0.0.2\n"

     ]

    }

   ],

   "source": [

    "tas2_ir()"

   ]

  },

  {

   "cell_type": "markdown",

   "metadata": {},

   "source": [

    "nifty!"

   ]

  }

 ],

 "metadata": {

  "anaconda-cloud": {},

  "kernelspec": {

   "display_name": "Python [conda env:datafs]",

   "language": "python",

   "name": "conda-env-datafs-py"

  },

  "language_info": {

   "codemirror_mode": {

    "name": "ipython",

    "version": 2

   },

   "file_extension": ".py",

   "mimetype": "text/x-python",

   "name": "python",

   "nbconvert_exporter": "python",

   "pygments_lexer": "ipython2",

   "version": "2.7.12"

  }

 },

 "nbformat": 4,

 "nbformat_minor": 1

}
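
The notebook above describes `uses`, `iters`, and `updates` only through their docstrings, so a rough sketch of how such a decorator stack could behave may help. This is not the `impactlab` implementation: `ToyVariable`, `Bound`, the index names, and the sample values are all hypothetical, and the sketch assumes that variables do not share index names.

```python
import functools
import itertools


class ToyVariable(object):
    """Hypothetical stand-in for a mockfs Variable (not the real class)."""

    def __init__(self, name, indices, data=None):
        self.name = name
        self.indices = indices  # e.g. {'rcp': ['rcp45', 'rcp85']}
        self.data = data if data is not None else {}

    def __getitem__(self, selection):
        # Slice with a dict such as {'rcp': 'rcp85'}; the data is shared.
        narrowed = dict(self.indices)
        for dim, value in selection.items():
            narrowed[dim] = [value]
        return ToyVariable(self.name, narrowed, self.data)

    def value_at(self, combo):
        # Look up the value using only this variable's own indices.
        key = tuple((dim, combo[dim]) for dim in sorted(self.indices))
        return self.data[key]


class Bound(object):
    """A variable pinned to one index combination; exposes `.value`."""

    def __init__(self, variable, combo):
        self.value = variable.value_at(combo)


def uses(**variables):
    # Attach (or extend) the mapping of argument names to variables.
    def decorator(func):
        merged = dict(getattr(func, 'variables', {}))
        merged.update(variables)
        func.variables = merged
        return func
    return decorator


def updates(target):
    # Record which variable should receive the returned values.
    def decorator(func):
        func.target = target
        return func
    return decorator


def iters():
    # Call the function once per combination of index values.
    def decorator(func):
        @functools.wraps(func)
        def wrapper():
            # Read at call time: `uses` is applied after `iters`.
            variables = getattr(wrapper, 'variables', {})
            target = getattr(func, 'target', None)
            dims = {}
            for var in variables.values():
                dims.update(var.indices)  # assumes disjoint index names
            names = sorted(dims)
            for values in itertools.product(*(dims[n] for n in names)):
                combo = dict(zip(names, values))
                kwargs = dict((arg, Bound(var, combo))
                              for arg, var in variables.items())
                result = func(**kwargs)
                if target is not None:
                    target.data[tuple(sorted(combo.items()))] = result
        return wrapper
    return decorator


# Made-up sample data for illustration only.
popop = ToyVariable('popop', {'ssp': ['ssp1', 'ssp2']},
                    {(('ssp', 'ssp1'),): 10.0, (('ssp', 'ssp2'),): 20.0})
tas = ToyVariable('tas', {'rcp': ['rcp45', 'rcp85']},
                  {(('rcp', 'rcp45'),): 1.5, (('rcp', 'rcp85'),): 3.0})
mortality_var = ToyVariable('mortality', {})


@uses(popop=popop)
@uses(tas=tas[{'rcp': 'rcp85'}])
@iters()
@updates(mortality_var)
def mortality(popop, tas):
    return popop.value * tas.value


mortality()
```

Because the `tas` variable is sliced to `rcp85`, the loop visits two combinations instead of four, and `mortality_var.data` ends up with one entry per visited combination. Deferring the lookup of `variables` until call time is what lets `uses` sit above `iters` in the stack, matching the ordering used in the notebook.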