{"id":19795586,"url":"https://github.com/dkogan/numpysane","last_synced_at":"2025-05-01T03:30:28.632Z","repository":{"id":57447334,"uuid":"58525493","full_name":"dkogan/numpysane","owner":"dkogan","description":"more-reasonable core functionality for numpy","archived":false,"fork":false,"pushed_at":"2023-12-23T23:50:11.000Z","size":644,"stargazers_count":28,"open_issues_count":1,"forks_count":1,"subscribers_count":5,"default_branch":"master","last_synced_at":"2024-04-25T12:21:57.062Z","etag":null,"topics":["broadcasting","linear-algebra","numpy","python-wrapper-api"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dkogan.png","metadata":{"files":{"readme":"README-pywrap.org","changelog":"Changes","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-05-11T08:00:48.000Z","updated_at":"2024-02-01T15:24:05.000Z","dependencies_parsed_at":"2024-09-09T20:53:28.543Z","dependency_job_id":"f9d739dc-8650-4b33-9583-c98ea12096ae","html_url":"https://github.com/dkogan/numpysane","commit_stats":{"total_commits":423,"total_committers":2,"mean_commits":211.5,"dds":"0.0023640661938534313","last_synced_commit":"1a56727c2ceba7918c287b169c92fbf7ba954202"},"previous_names":[],"tags_count":41,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dkogan%2Fnumpysane","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dkogan%2Fnumpysane/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dkogan%2Fnumpysane/releases","manifests_url":"https://repos.
ecosyste.ms/api/v1/hosts/GitHub/repositories/dkogan%2Fnumpysane/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dkogan","download_url":"https://codeload.github.com/dkogan/numpysane/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251817809,"owners_count":21648811,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["broadcasting","linear-algebra","numpy","python-wrapper-api"],"created_at":"2024-11-12T07:16:46.481Z","updated_at":"2025-05-01T03:30:28.370Z","avatar_url":"https://github.com/dkogan.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"* TALK\nI just gave a talk about this at [[https://www.socallinuxexpo.org/scale/18x][SCaLE 18x]]. Here are the [[https://www.youtube.com/watch?v=YOOapXNtUWw][video of the talk]] and\nthe [[https://github.com/dkogan/talk-numpysane-gnuplotlib/raw/master/numpysane-gnuplotlib.pdf][\"slides\"]].\n\n* NAME\nnumpysane_pywrap: Python-wrap C code with broadcasting awareness\n\n* SYNOPSIS\n\nLet's implement a broadcastable and type-checked inner product that is\n\n- Written in C (i.e. it is fast)\n- Callable from python using numpy arrays (i.e. it is convenient)\n\nWe write a bit of python to generate the wrapping code. 
\"genpywrap.py\":\n\n#+BEGIN_EXAMPLE\nimport numpy     as np\nimport numpysane as nps\nimport numpysane_pywrap as npsp\n\nm = npsp.module( name      = \"innerlib\",\n                 docstring = \"An inner product module in C\")\nm.function( \"inner\",\n            \"Inner product pywrapped with npsp\",\n\n            args_input       = ('a', 'b'),\n            prototype_input  = (('n',), ('n',)),\n            prototype_output = (),\n\n            Ccode_slice_eval = \\\n                {np.float64:\n                 r\"\"\"\n                   double* out = (double*)data_slice__output;\n                   const int N = dims_slice__a[0];\n\n                   *out = 0.0;\n\n                   for(int i=0; i\u003cN; i++)\n                     *out += *(const double*)(data_slice__a +\n                                              i*strides_slice__a[0]) *\n                             *(const double*)(data_slice__b +\n                                              i*strides_slice__b[0]);\n                   return true;\"\"\" })\nm.write()\n#+END_EXAMPLE\n\nWe run this, and save the output to \"inner_pywrap.c\":\n\n#+BEGIN_EXAMPLE\npython3 genpywrap.py \u003e inner_pywrap.c\n#+END_EXAMPLE\n\nWe build this into a python module:\n\n#+BEGIN_EXAMPLE\nCOMPILE=(`python3 -c \"\nimport sysconfig\nconf = sysconfig.get_config_vars()\nprint('{} {} {} -I{}'.format(*[conf[x] for x in ('CC',\n                                                 'CFLAGS',\n                                                 'CCSHARED',\n                                                 'INCLUDEPY')]))\"`)\nLINK=(`python3 -c \"\nimport sysconfig\nconf = sysconfig.get_config_vars()\nprint('{} {} {}'.format(*[conf[x] for x in ('BLDSHARED',\n                                            'BLDLIBRARY',\n                                            'LDFLAGS')]))\"`)\nEXT_SUFFIX=`python3 -c \"\nimport sysconfig\nprint(sysconfig.get_config_vars('EXT_SUFFIX')[0])\"`\n\n${COMPILE[@]} -c -o inner_pywrap.o 
inner_pywrap.c\n${LINK[@]} -o innerlib$EXT_SUFFIX inner_pywrap.o\n#+END_EXAMPLE\n\nHere we used the build commands directly. This could be done with\nsetuptools/distutils instead; it's a normal extension module. And now we can\ncompute broadcasted inner products from a python script \"tst.py\":\n\n#+BEGIN_EXAMPLE\nimport numpy as np\nimport innerlib\nprint(innerlib.inner( np.arange(4, dtype=float),\n                      np.arange(8, dtype=float).reshape( 2,4)))\n#+END_EXAMPLE\n\nRunning it to compute inner([0,1,2,3],[0,1,2,3]) and inner([0,1,2,3],[4,5,6,7]):\n\n#+BEGIN_EXAMPLE\n$ python3 tst.py\n[14. 38.]\n#+END_EXAMPLE\n\n* DESCRIPTION\nThis module provides routines to python-wrap existing C code by generating C\nsources that define the wrapper python extension module.\n\nTo create the wrappers we\n\n1. Instantiate a new numpysane_pywrap.module class\n2. Call module.function() for each wrapper function we want to add to this\n   module\n3. Call module.write() to write the C sources defining this module to standard\n   output\n\nThe sources can then be built and executed normally, as any other python\nextension module. The resulting functions are called as one would expect:\n\n#+BEGIN_EXAMPLE\noutput                  = f_one_output      (input0, input1, ...)\n(output0, output1, ...) = f_multiple_outputs(input0, input1, ...)\n#+END_EXAMPLE\n\ndepending on whether we declared a single output, or multiple outputs (see\nbelow). It is also possible to pre-allocate the output array(s), and call the\nfunctions like this (see below):\n\n#+BEGIN_EXAMPLE\noutput = np.zeros(...)\nf_one_output      (input0, input1, ..., out = output)\n\noutput0 = np.zeros(...)\noutput1 = np.zeros(...)\nf_multiple_outputs(input0, input1, ..., out = (output0, output1))\n#+END_EXAMPLE\n\nEach wrapped function is broadcasting-aware. 
The normal numpy broadcasting rules\n(as described in 'broadcast_define' and on the numpy website:\nhttp://docs.scipy.org/doc/numpy/user/basics.broadcasting.html) apply. In\nsummary:\n\n- Dimensions are aligned at the end of the shape list, and must match the\n  prototype\n\n- Extra dimensions left over at the front must be consistent for all the\n  input arguments, meaning:\n\n  - All dimensions of length != 1 must match\n  - Dimensions of length 1 match corresponding dimensions of any length in\n    other arrays\n  - Missing leading dimensions are implicitly set to length 1\n\n- The output(s) have a shape where\n  - The trailing dimensions match the prototype\n  - The leading dimensions come from the extra dimensions in the inputs\n\nWhen we create a wrapper function, we only define how to compute a single\nbroadcasted slice. If the generated function is called with higher-dimensional\ninputs, this slice code will be called multiple times. This broadcast loop is\nproduced by the numpysane_pywrap generator automatically. The generated code\nalso\n\n- parses the python arguments\n- generates python return values\n- validates the inputs (and any pre-allocated outputs) to make sure the given\n  shapes and types all match the declared shapes and types. For instance,\n  computing an inner product of a 5-vector and a 3-vector is illegal\n- creates the output arrays as necessary\n\nThis code-generator module does NOT produce any code to implicitly make copies\nof the input. If the inputs fail validation (unknown types given, contiguity\nchecks failed, etc) then an exception is raised. Copying the input is\npotentially slow, so we require the user to do that, if necessary.\n\n** Explicated example\n\nIn the synopsis we declared the wrapper module like this:\n\n#+BEGIN_EXAMPLE\nm = npsp.module( name      = \"innerlib\",\n                 docstring = \"An inner product module in C\")\n#+END_EXAMPLE\n\nThis produces a module named \"innerlib\". 
Note that the python importer will look\nfor this module in a file called \"innerlib$EXT_SUFFIX\" where EXT_SUFFIX comes\nfrom the python configuration. This is normal behavior for python extension\nmodules.\n\nA module can contain many wrapper functions. Each one is added by calling\n'm.function()'. We did this:\n\n#+BEGIN_EXAMPLE\nm.function( \"inner\",\n            \"Inner product pywrapped with numpysane_pywrap\",\n\n            args_input       = ('a', 'b'),\n            prototype_input  = (('n',), ('n',)),\n            prototype_output = (),\n\n            Ccode_slice_eval = \\\n                {np.float64:\n                 r\"\"\"\n                   double* out = (double*)data_slice__output;\n                   const int N = dims_slice__a[0];\n\n                   *out = 0.0;\n\n                   for(int i=0; i\u003cN; i++)\n                     *out += *(const double*)(data_slice__a +\n                                              i*strides_slice__a[0]) *\n                             *(const double*)(data_slice__b +\n                                              i*strides_slice__b[0]);\n                   return true;\"\"\" })\n#+END_EXAMPLE\n\nWe declared:\n\n- A function \"inner\" with the given docstring\n- two inputs to this function: named 'a' and 'b'. Each is a 1-dimensional array\n  of length 'n', same 'n' for both arrays\n- one output: a scalar\n- how to compute a single inner product where all inputs and outputs are 64-bit\n  floating-point values: this snippet of C is included in the generated sources\n  verbatim\n\nIt is possible to support multiple sets of types by passing more key/value\ncombinations in 'Ccode_slice_eval'. Each set of types requires a different C\nsnippet. If the input doesn't match any known type set, an exception will be\nthrown. More on the type matching below.\n\nThe length of the inner product is defined by the length of the input, in this\ncase 'dims_slice__a[0]'. 
I could have looked at 'dims_slice__b[0]' instead, but\nI know it's identical: the 'prototype_input' says that both 'a' and 'b' have\nlength 'n', and if we're running the slice code snippet, we know that the inputs\nhave already been checked, and have compatible dimensionality. More on this\nbelow.\n\nI did not assume the data is contiguous, so I use 'strides_slice__a' and\n'strides_slice__b' to index the input arrays. We could add a validation function\nthat accepts only contiguous input; if we did that, the slice code snippet could\nassume contiguous data and ignore the strides. More on that below.\n\nOnce all the functions have been added, we write out the generated code to\nstandard output by invoking\n\n#+BEGIN_EXAMPLE\nm.write()\n#+END_EXAMPLE\n\n** Dimension specification\nThe shapes of the inputs and outputs are given in the 'prototype_input' and\n'prototype_output' arguments respectively. This is similar to how this is done\nin 'numpysane.broadcast_define()': each prototype is a tuple of shapes, one for\neach argument. Each shape is given as a tuple of sizes for each expected\ndimension. Each size can be either\n\n- a positive integer if we know the expected dimension size beforehand, and only\n  those sizes are accepted\n\n- a string that names the dimension. 
Any size could be accepted for a named\n  dimension, but for any given named dimension, the sizes must match across all\n  inputs and outputs\n\nUnlike 'numpysane.broadcast_define()', the shapes of both inputs and outputs\nmust be defined here: the output shape may not be omitted.\n\nThe common special case of a single output is supported: this one output is\nspecified in 'prototype_output' as a single shape, instead of a tuple of shapes.\nThis also affects whether the resulting python function returns the one output\nor a tuple of outputs.\n\nExamples:\n\nA function taking in some 2D vectors and the same number of 3D vectors:\n\n#+BEGIN_EXAMPLE\nprototype_input  = (('n',2), ('n',3))\n#+END_EXAMPLE\n\nA function producing a single 2D vector:\n\n#+BEGIN_EXAMPLE\nprototype_output = (2,)\n#+END_EXAMPLE\n\nA function producing 3 outputs: some number of 2D vectors, a single 3D vector\nand a scalar:\n\n#+BEGIN_EXAMPLE\nprototype_output = (('n',2), (3,), ())\n#+END_EXAMPLE\n\nNote that when creating new output arrays, all the dimensions must be known from\nthe inputs. For instance, given this, we cannot create the output:\n\n#+BEGIN_EXAMPLE\nprototype_input  = ((2,), ('n',))\nprototype_output = (('m',), ('m', 'm'))\n#+END_EXAMPLE\n\nI have the inputs, so I know 'n', but I don't know 'm'. When calling a function\nlike this, it is required to pass in pre-allocated output arrays instead of\nasking the wrapper code to create new ones. See below.\n\n** In-place outputs\nAs with 'numpysane.broadcast_define()', the caller of the generated python\nfunction may pre-allocate the output and pass it in the 'out' kwarg to be\nfilled-in. 
Sometimes this is required if we want to avoid extra copying of data.\nThis is also required if the output prototypes have any named dimensions not\npresent in the input prototypes: in this case we don't know how large the output\narrays should be, so we can't create them.\n\nIf a wrapped function is called this way, we check that the dimensions and types\nin the outputs match the prototype. Otherwise, we create a new output array with\nthe correct type and shape.\n\nIf we have multiple outputs, the in-place arrays are given as a tuple of arrays\nin the 'out' kwarg. If any outputs are pre-allocated, all of them must be.\n\nExample. Let's use the inner-product we defined earlier. We compute two sets of\ninner products. We make two calls to inner(), each one broadcasted to produce\ntwo inner products into a non-contiguous slice of an output array:\n\n#+BEGIN_EXAMPLE\nimport numpy as np\nimport innerlib\n\nout=np.zeros((2,2), dtype=float)\ninnerlib.inner( np.arange(4, dtype=float),\n                np.arange(8, dtype=float).reshape( 2,4),\n                out=out[:,0] )\ninnerlib.inner( 1+np.arange(4, dtype=float),\n                np.arange(8, dtype=float).reshape( 2,4),\n                out=out[:,1] )\nprint(out)\n#+END_EXAMPLE\n\nThe first two inner products end up in the first column of the output, and the\nnext two inner products in the second column:\n\n#+BEGIN_EXAMPLE\n$ python3 tst.py\n\n[[14. 20.]\n [38. 60.]]\n#+END_EXAMPLE\n\nIf we have a function \"f\" that produces two outputs, we'd do this:\n\n#+BEGIN_EXAMPLE\noutput0 = np.zeros(...)\noutput1 = np.zeros(...)\nf( ..., out = (output0, output1) )\n#+END_EXAMPLE\n\n** Type checking\nSince C code is involved, we must be very explicit about the types of our\narrays. These types are specified in the keys of the 'Ccode_slice_eval'\nargument to 'function()'. For each type specification in a key, the\ncorresponding value is a C code snippet to use for that type spec. 
The type\nspecs can be either\n\n- A type known by python and acceptable to numpy as a valid dtype. In this usage\n  ALL inputs and ALL outputs must have this type\n- A tuple of types. The elements of this tuple correspond to each input, in\n  order, followed by each output, in order. This allows different arguments to\n  have different types\n\nIt is up to the user to make sure that the C snippet they provide matches the\ntypes that they declared.\n\nExample. Let's extend the inner product to know about 32-bit floats and also\nabout producing a rounded integer inner product from 64-bit floats:\n\n#+BEGIN_EXAMPLE\nm = npsp.module( name      = \"innerlib\",\n                 docstring = \"An inner product module in C\",\n                 header    = \"#include \u003cstdint.h\u003e\")\nm.function( \"inner\",\n            \"Inner product pywrapped with numpysane_pywrap\",\n\n            args_input       = ('a', 'b'),\n            prototype_input  = (('n',), ('n',)),\n            prototype_output = (),\n\n            Ccode_slice_eval = \\\n                {np.float64:\n                 r\"\"\"\n                   double* out = (double*)data_slice__output;\n                   const int N = dims_slice__a[0];\n\n                   *out = 0.0;\n\n                   for(int i=0; i\u003cN; i++)\n                     *out += *(const double*)(data_slice__a +\n                                              i*strides_slice__a[0]) *\n                             *(const double*)(data_slice__b +\n                                              i*strides_slice__b[0]);\n                   return true;\"\"\",\n                 np.float32:\n                 r\"\"\"\n                   float* out = (float*)data_slice__output;\n                   const int N = dims_slice__a[0];\n\n                   *out = 0.0;\n\n                   for(int i=0; i\u003cN; i++)\n                     *out += *(const float*)(data_slice__a +\n                                             i*strides_slice__a[0]) 
*\n                             *(const float*)(data_slice__b +\n                                             i*strides_slice__b[0]);\n                   return true;\"\"\",\n                 (np.float64, np.float64, np.int32):\n                 r\"\"\"\n                   double out = 0.0;\n                   const int N = dims_slice__a[0];\n\n                   for(int i=0; i\u003cN; i++)\n                     out += *(const double*)(data_slice__a +\n                                             i*strides_slice__a[0]) *\n                            *(const double*)(data_slice__b +\n                                             i*strides_slice__b[0]);\n                   *(int32_t*)data_slice__output = (int32_t)round(out);\n                   return true;\"\"\" })\n#+END_EXAMPLE\n\n** Argument validation\nAfter the wrapping code confirms that all the shapes and types match the\nprototype, it calls a user-provided validation routine once to flag any extra\nconditions that are required. A common use case: we're wrapping some C code that\nassumes the input data is stored contiguously in memory, so the validation\nroutine checks that this is true.\n\nThis code snippet is provided in the 'Ccode_validate' argument to 'function()'.\nThe result is returned as a boolean: if the checks pass, we return true. If the\nchecks fail, we return false, which will result in an exception being thrown. 
If\nyou want to throw your own, more informative exception, you can do that as usual\n(by calling something like PyErr_Format()) before returning false.\n\nIf the 'Ccode_validate' argument is omitted, no additional checks are performed,\nand we accept all calls that satisfied the broadcasting and type requirements.\n\n** Contiguity checking\nSince checking for memory contiguity is a very common use case for argument\nvalidation, there are convenience macros provided:\n\n#+BEGIN_EXAMPLE\nCHECK_CONTIGUOUS__NAME()\nCHECK_CONTIGUOUS_AND_SETERROR__NAME()\n\nCHECK_CONTIGUOUS_ALL()\nCHECK_CONTIGUOUS_AND_SETERROR_ALL()\n#+END_EXAMPLE\n\nThe strictest, and most common usage will accept only those calls where ALL\ninputs and ALL outputs are stored in contiguous memory. This can be accomplished\nby defining the function like\n\n#+BEGIN_EXAMPLE\nm.function( ...,\n           Ccode_validate = 'return CHECK_CONTIGUOUS_AND_SETERROR_ALL();' )\n#+END_EXAMPLE\n\nAs before, \"NAME\" refers to each individual input or output, and \"ALL\" checks\nall of them. These all evaluate to true if the argument in question IS\ncontiguous. The ..._AND_SETERROR_... flavor does that, but ALSO raises an\ninformative exception.\n\nGenerally you want to do this in the validation routine only, since it runs only\nonce. But there's nothing stopping you from checking this in the computation\nfunction too.\n\nNote that each broadcasted slice is processed separately, so the C code being\nwrapped usually only cares about each SLICE being contiguous. If the dimensions\nabove each slice (those being broadcasted) are not contiguous, this doesn't\nbreak the underlying assumptions. Thus the CHECK_CONTIGUOUS_... macros only\ncheck and report the in-slice contiguity. If for some reason you need more than\nthis, you should write the check yourself, using the strides_full__... and\ndims_full__... 
arrays.\n\n** Slice computation\nThe code to evaluate each broadcasted slice is provided in the required\n'Ccode_slice_eval' argument to 'function()'. This argument is a dict, specifying\ndifferent flavors of the available computation, with each code snippet present\nin the values of this dict. Each code snippet is wrapped into a function which\nreturns a boolean: true on success, false on failure. If false is ever returned,\nall subsequent slices are abandoned, and an exception is thrown. As with the\nvalidation code, you can just return false, and a generic Exception will be\nthrown. Or you can throw a more informative exception yourself prior to\nreturning false.\n\n** Values available to the code snippets\nEach of the user-supplied code blocks is placed into a separate function in the\ngenerated code, with identical arguments in both cases. These arguments describe\nthe inputs and outputs, and are meant to be used by the user code. We have\ndimensionality information:\n\n#+BEGIN_EXAMPLE\nconst int       Ndims_full__NAME\nconst npy_intp* dims_full__NAME\nconst int       Ndims_slice__NAME\nconst npy_intp* dims_slice__NAME\n#+END_EXAMPLE\n\nwhere \"NAME\" is the name of the input or output. The input names are given in\nthe 'args_input' argument to 'function()'. If we have a single output, the\noutput name is \"output\". If we have multiple outputs, their names are \"output0\",\n\"output1\", ... The ...full... arguments describe the full array, that describes\nALL the broadcasted slices. The ...slice... arguments describe each broadcasted\nslice separately. Under most usages, you want the ...slice... information\nbecause the C code we're wrapping only sees one slice at a time. Ndims...\ndescribes how many dimensions we have in the corresponding dims... 
arrays.\nnpy_intp is a long integer used internally by numpy for dimension information.\n\nWe have memory layout information:\n\n#+BEGIN_EXAMPLE\nconst npy_intp* strides_full__NAME\nconst npy_intp* strides_slice__NAME\nnpy_intp        sizeof_element__NAME\n#+END_EXAMPLE\n\nNAME and full/slice and npy_intp have the same meanings as before. The\nstrides... arrays each have length described by the corresponding dims... The\nstrides contain the step size in bytes, of each dimension. sizeof_element...\ndescribes the size in bytes, of a single data element.\n\nFinally, I have a pointer to the data itself. The validation code gets a pointer\nto the start of the whole data array:\n\n#+BEGIN_EXAMPLE\nvoid*           data__NAME\n#+END_EXAMPLE\n\nbut the computation code gets a pointer to the start of the slice we're\ncurrently looking at:\n\n#+BEGIN_EXAMPLE\nvoid*           data_slice__NAME\n#+END_EXAMPLE\n\nIf the data in the arrays is representable as a basic C type (most integers,\nfloats and complex numbers), then convenience macros are available to index\nelements in the sliced arrays and to conveniently access the C type of the data.\nThese macros take into account the data type and the strides.\n\n#+BEGIN_EXAMPLE\n#define         ctype__NAME     ...\n#define         item__NAME(...) ...\n#+END_EXAMPLE\n\nFor instance, if we have a 2D array 'x' containing 64-bit floats, we'll have\nthis:\n\n#+BEGIN_EXAMPLE\n#define         ctype__x     npy_float64 /* \"double\" on most platforms */\n#define         item__x(i,j) (*(ctype__x*)(data_slice__x + ...))\n#+END_EXAMPLE\n\nFor more complex types (objects, vectors, strings) you'll need to deal with the\nstrides and the pointers yourself.\n\nExample: I'm computing a broadcasted slice. An input array 'x' is a\n2-dimensional slice of dimension (3,4) of 64-bit floating-point values. 
I thus\nhave Ndims_slice__x == 2 and dims_slice__x[] = {3,4} and sizeof_element__x == 8.\nAn element of this array at i,j can be accessed with either\n\n#+BEGIN_EXAMPLE\n*((double*)(data_slice__a + i*strides_slice__a[0] + j*strides_slice__a[1]))\n\nitem__a(i,j)\n#+END_EXAMPLE\n\nBoth are identical. If I defined a validation function that makes sure that 'a'\nis stored in contiguous memory, the computation code doesn't need to look at the\nstrides at all, and element at i,j can be found more simply:\n\n#+BEGIN_EXAMPLE\n((double*)data_slice__a)[ i*dims_slice__a[1] + j ]\n\nitem__a(i,j)\n#+END_EXAMPLE\n\nAs you can see, the item__...() macros are much simpler, less error-prone and\nare thus the preferred form.\n\n** Specifying extra, non-broadcasted arguments\n\nSometimes it is desired to pass extra arguments to the C code; ones that aren't\nbroadcasted in any way, but are just passed verbatim by the wrapping code down\nto the inner C code. We can do that with the 'extra_args' argument to\n'function()'. This argument is an tuple of tuples, where each inner tuple\nrepresents an extra argument:\n\n#+BEGIN_EXAMPLE\n(c_type, arg_name, default_value, parse_arg)\n#+END_EXAMPLE\n\nEach element is a string.\n\n- the \"c_type\" is the C type of the argument; something like \"int\" or \"double\",\n  or \"const char*\"\n\n- the \"arg_name\" is the name of the argument, used in both the Python and the C\n  levels\n\n- the \"default_value\" is the value the C wrapping code will use if this argument\n  is omitted in the Python call. Note that this is a string used in generating\n  the C code, so if we have an integer with a default value of 0, we use a\n  string \"0\" and not the integer 0\n\n- the \"parse_arg\" is the code used in the PyArg_ParseTupleAndKeywords() call.\n  See the documentation for that function.\n\nThese extra arguments are expected to be read-only, and are passed as a const*\nto the validation routines and the slice computation routines. 
If the C type is\nalready a pointer (most notably if it is a string), then we do NOT dereference\nit a second time.\n\nThe generated code for parsing of Python arguments sets all of these extra\narguments as being optional, using the default_value if an argument is omitted.\nIf one of these arguments is actually required, the corresponding logic goes\ninto the validation function.\n\nWhen calling the resulting Python function, the extra arguments MUST be\npassed-in as kwargs. These will NOT work as positional arguments.\n\nThis is most clearly explained with an example. Let's update our inner product\nexample to accept a \"scale\" numerical argument and a \"scale_string\" string\nargument, where the scale_string is required:\n\n#+BEGIN_EXAMPLE\nm.function( \"inner\",\n            \"Inner product pywrapped with numpysane_pywrap\",\n\n            args_input       = ('a', 'b'),\n            prototype_input  = (('n',), ('n',)),\n            prototype_output = (),\n            extra_args = ((\"double\",      \"scale\",          \"1\",    \"d\"),\n                          (\"const char*\", \"scale_string\",   \"NULL\", \"s\")),\n            Ccode_validate = r\"\"\"\n                if(scale_string == NULL)\n                {\n                    PyErr_Format(PyExc_RuntimeError,\n                        \"The 'scale_string' argument is required\" );\n                    return false;\n                }\n                return true; \"\"\",\n            Ccode_slice_eval = \\\n                {np.float64:\n                 r\"\"\"\n                   double* out = (double*)data_slice__output;\n                   const int N = dims_slice__a[0];\n\n                   *out = 0.0;\n\n                   for(int i=0; i\u003cN; i++)\n                     *out += *(const double*)(data_slice__a +\n                                              i*strides_slice__a[0]) *\n                             *(const double*)(data_slice__b +\n                                              
i*strides_slice__b[0]);\n                   *out *= *scale * atof(scale_string);\n\n                   return true;\"\"\" }\n)\n#+END_EXAMPLE\n\nNow I can optionally scale the result:\n\n#+BEGIN_EXAMPLE\n\u003e\u003e\u003e print(innerlib.inner( np.arange(4, dtype=float),\n                          np.arange(8, dtype=float).reshape( 2,4),\n                          scale_string = \"1.0\"))\n[14. 38.]\n\n\u003e\u003e\u003e print(innerlib.inner( np.arange(4, dtype=float),\n                          np.arange(8, dtype=float).reshape( 2,4),\n                          scale        = 2.0,\n                          scale_string = \"10.0\"))\n[280. 760.]\n#+END_EXAMPLE\n\n** Precomputing a cookie outside the slice computation\nSometimes it is useful to generate some resource once, before any of the\nbroadcasted slices are evaluated. The slice evaluation code could then make use\nof this resource. Example: allocating memory, opening files. This is supported\nusing a 'cookie'. We define a structure that contains data that will be\navailable to all the generated functions. This structure is initialized at the\nbeginning, used by the slice computation functions, and then cleaned up at the\nend. This is most easily described with an example. The scaled inner product\ndemonstrated immediately above has an inefficiency: we compute\n'atof(scale_string)' once for every slice, even though the string does not\nchange. We should compute the atof() ONCE, and use the resulting value each\ntime. 
And we can:\n\n#+BEGIN_EXAMPLE\nm.function( \"inner\",\n            \"Inner product pywrapped with numpysane_pywrap\",\n\n            args_input       = ('a', 'b'),\n            prototype_input  = (('n',), ('n',)),\n            prototype_output = (),\n            extra_args = ((\"double\",      \"scale\",          \"1\",    \"d\"),\n                          (\"const char*\", \"scale_string\",   \"NULL\", \"s\")),\n            Ccode_cookie_struct = r\"\"\"\n              double scale; /* from BOTH scale arguments: \"scale\", \"scale_string\" */\n            \"\"\",\n            Ccode_validate = r\"\"\"\n                if(scale_string == NULL)\n                {\n                    PyErr_Format(PyExc_RuntimeError,\n                        \"The 'scale_string' argument is required\" );\n                    return false;\n                }\n                cookie-\u003escale = *scale * (scale_string ? atof(scale_string) : 1.0);\n                return true; \"\"\",\n            Ccode_slice_eval = \\\n                {np.float64:\n                 r\"\"\"\n                   double* out = (double*)data_slice__output;\n                   const int N = dims_slice__a[0];\n\n                   *out = 0.0;\n\n                   for(int i=0; i\u003cN; i++)\n                     *out += *(const double*)(data_slice__a +\n                                              i*strides_slice__a[0]) *\n                             *(const double*)(data_slice__b +\n                                              i*strides_slice__b[0]);\n                   *out *= cookie-\u003escale;\n\n                   return true;\"\"\" },\n\n            # Cleanup, such as free() or close() goes here\n            Ccode_cookie_cleanup = ''\n)\n#+END_EXAMPLE\n\nWe defined a cookie structure that contains one element: 'double scale'. We\ncompute the scale factor (from BOTH of the extra arguments) before any of the\nslices are evaluated: in the validation function. 
Then we apply the\nalready-computed scale with each slice. Both the validation and slice\ncomputation functions have the whole cookie structure available in '*cookie'. It\nis expected that the validation function will write something to the cookie, and\nthe slice functions will read it, but this is not enforced: this structure is\nnot const, and both functions can do whatever they like.\n\nIf the cookie initialization did something that must be cleaned up (like a\nmalloc() for instance), the cleanup code can be specified in the\n'Ccode_cookie_cleanup' argument to function(). Note: this cleanup code is ALWAYS\nexecuted, even if there were errors that raise an exception, EVEN if we haven't\ninitialized the cookie yet. When the cookie object is first initialized, it is\nfilled with 0, so the cleanup code can detect whether the cookie has been\ninitialized or not:\n\n#+BEGIN_EXAMPLE\nm.function( ...\n            Ccode_cookie_struct = r\"\"\"\n              ...\n              bool initialized;\n            \"\"\",\n            Ccode_validate = r\"\"\"\n              ...\n              cookie-\u003einitialized = true;\n              return true;\n            \"\"\",\n            Ccode_cookie_cleanup = r\"\"\"\n              if(cookie-\u003einitialized) cleanup();\n            \"\"\" )\n#+END_EXAMPLE\n\n** Examples\nFor some sample usage, see the wrapper-generator used in the test suite:\nhttps://github.com/dkogan/numpysane/blob/master/test/genpywrap.py\n\n** Planned functionality\nCurrently, each broadcasted slice is computed sequentially. But since the slices\nare inherently independent, this is a natural place to add parallelism. And\nimplementing this with something like OpenMP should be straightforward. I'll get\naround to doing this eventually, but in the meantime, patches are welcome.\n\n* COMPATIBILITY\n\nPython 2 and Python 3 should both be supported. 
Please report a bug if either\none doesn't work.\n\n* REPOSITORY\n\nhttps://github.com/dkogan/numpysane\n\n* AUTHOR\n\nDima Kogan \u003cdima@secretsauce.net\u003e\n\n* LICENSE AND COPYRIGHT\n\nCopyright 2016-2020 Dima Kogan.\n\nThis program is free software; you can redistribute it and/or modify it under\nthe terms of the GNU Lesser General Public License (any version) as published by\nthe Free Software Foundation\n\nSee https://www.gnu.org/licenses/lgpl.html\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdkogan%2Fnumpysane","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdkogan%2Fnumpysane","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdkogan%2Fnumpysane/lists"}