{"id":25045910,"url":"https://github.com/bataeves/isparkcache","last_synced_at":"2026-04-20T05:33:37.753Z","repository":{"id":62571835,"uuid":"95787706","full_name":"bataeves/isparkcache","owner":"bataeves","description":"Jupyter module for caching Spark DataFrames produced by executing a cell","archived":false,"fork":false,"pushed_at":"2017-08-09T08:29:56.000Z","size":29,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-08T17:49:13.769Z","etag":null,"topics":["cache","ipython","jupyter","pyspark","spark"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bataeves.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-06-29T14:44:12.000Z","updated_at":"2018-05-28T15:53:01.000Z","dependencies_parsed_at":"2022-11-03T19:53:39.019Z","dependency_job_id":null,"html_url":"https://github.com/bataeves/isparkcache","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bataeves%2Fisparkcache","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bataeves%2Fisparkcache/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bataeves%2Fisparkcache/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bataeves%2Fisparkcache/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bataeves","download_url":"https://codeload.github.com/bataeves/isparkcache/tar.gz/refs/he
ads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246399788,"owners_count":20770907,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cache","ipython","jupyter","pyspark","spark"],"created_at":"2025-02-06T06:26:56.325Z","updated_at":"2026-04-20T05:33:32.732Z","avatar_url":"https://github.com/bataeves.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"Defines a **%%sparkcache** cell magic in the IPython notebook to cache DataFrames\nand the outputs of long-running computations in persistent Parquet files in Hadoop.\nUseful when some computations in a notebook take a long time and you want to\neasily save the results to a file.\n\nBased on the [ipycache](https://github.com/rossant/ipycache) module.\n\nInstallation\n------------\n\n-  ``pip install isparkcache``\n\nUsage\n-----\n\n-  In IPython/Jupyter:\n\n\n       %load_ext isparkcache\n\n-  Then, create a cell with:\n\n\n       %%sparkcache df1 df2\n       df = ...\n       df1 = sql.createDataFrame(df)\n       df2 = sql.createDataFrame(df)\n\n-  When you execute this cell for the first time, the code is executed, and\n   the dataframes ``df1`` and ``df2`` are saved in\n   ``/user/$USER/sparkcache/mysparkapplication/df1`` and\n   ``/user/$USER/sparkcache/mysparkapplication/df2``.
\n   When you execute this cell again, the code is skipped, the dataframes are\n   loaded from the Parquet files and injected into the namespace, and the outputs\n   are restored in the notebook.\n\n\n-  Use the ``--force`` or ``-f`` option to force the cell's execution\n   and overwrite the cached files.\n\n-  Use the ``--read`` or ``-r`` option to prevent the cell's execution\n   and always load the variables from the cache. An exception is raised\n   if the file does not exist.\n\n-  Use the ``--cachedir`` or ``-d`` option to specify the cache\n   directory. Default directory: ``/user/$USER/sparkcache``.\n   You can specify a default directory in the IPython\n   configuration file in your profile (typically in\n   ``~\\.ipython\\profile_default\\ipython_config.py``) by adding the\n   following line:\n\n\n       c.SparkCacheMagics.cachedir = \"/path/to/mycache\"\n\n   If both a default cache directory and the ``--cachedir`` option are\n   given, the latter is used.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbataeves%2Fisparkcache","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbataeves%2Fisparkcache","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbataeves%2Fisparkcache/lists"}