{"id":14977365,"url":"https://github.com/danielfrg/s3contents","last_synced_at":"2025-05-16T10:06:08.633Z","repository":{"id":12345819,"uuid":"71605775","full_name":"danielfrg/s3contents","owner":"danielfrg","description":"Jupyter Notebooks in S3 - Jupyter Contents Manager implementation","archived":false,"fork":false,"pushed_at":"2025-04-03T03:52:49.000Z","size":556,"stargazers_count":251,"open_issues_count":17,"forks_count":87,"subscribers_count":11,"default_branch":"main","last_synced_at":"2025-05-16T10:05:33.465Z","etag":null,"topics":["aws","aws-s3","jupyter-notebook","python","s3"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/danielfrg.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2016-10-22T00:03:09.000Z","updated_at":"2025-05-03T15:10:33.000Z","dependencies_parsed_at":"2023-11-15T01:27:08.691Z","dependency_job_id":"fb5b02b0-7d92-46c3-a300-d470d59ddbef","html_url":"https://github.com/danielfrg/s3contents","commit_stats":{"total_commits":189,"total_committers":38,"mean_commits":4.973684210526316,"dds":0.6243386243386244,"last_synced_commit":"f4105fc749c7394c82a4080b172f9a0f1c0916f9"},"previous_names":[],"tags_count":32,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danielfrg%2Fs3contents","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danielfrg%2Fs3contents/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/Git
Hub/repositories/danielfrg%2Fs3contents/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danielfrg%2Fs3contents/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/danielfrg","download_url":"https://codeload.github.com/danielfrg/s3contents/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254509477,"owners_count":22082891,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws","aws-s3","jupyter-notebook","python","s3"],"created_at":"2024-09-24T13:55:31.631Z","updated_at":"2025-05-16T10:06:08.610Z","avatar_url":"https://github.com/danielfrg.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n    \u003cimg src=\"https://raw.githubusercontent.com/danielfrg/s3contents/main/docs/logo.png\" width=\"450px\"\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n    \u003ca href=\"https://pypi.org/project/s3contents/\"\u003e\n        \u003cimg src=\"https://img.shields.io/pypi/v/mkdocs-jupyter.svg\"\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://github.com/danielfrg/s3contents/actions/workflows/test.yml\"\u003e\n        \u003cimg src=\"https://github.com/danielfrg/s3contents/workflows/test/badge.svg\"\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://codecov.io/gh/danielfrg/s3contents?branch=main\"\u003e\n        \u003cimg src=\"https://codecov.io/gh/danielfrg/s3contents/branch/main/graph/badge.svg\"\u003e\n    \u003c/a\u003e\n    \u003ca 
href=\"http://github.com/danielfrg/s3contents/blob/main/LICENSE.txt\"\u003e\n        \u003cimg src=\"https://img.shields.io/:license-Apache%202-blue.svg\"\u003e\n    \u003c/a\u003e\n\u003c/p\u003e\n\n# S3Contents - Jupyter Notebooks in S3\n\nA transparent, drop-in replacement for Jupyter's standard filesystem-backed storage system.\nWith this implementation of a\n[Jupyter Contents Manager](https://jupyter-server.readthedocs.io/en/latest/developers/contents.html)\nyou can save all your notebooks, files and directory structure directly to an\nS3/GCS bucket on AWS/GCP or a self-hosted S3 API-compatible service like [MinIO](http://minio.io).\n\n## Installation\n\n```shell\npip install s3contents\n```\n\nInstall with GCS dependencies:\n\n```shell\npip install s3contents[gcs]\n```\n\n## s3contents vs X\n\nWhile there are some implementations of an S3 Jupyter Content Manager such as\n[s3nb](https://github.com/monetate/s3nb) or [s3drive](https://github.com/stitchfix/s3drive),\ns3contents is the only one tested against new versions of Jupyter.\nIt also supports more authentication methods and Google Cloud Storage.\n\nThis aims to be a fully tested implementation and it's based on [PGContents](https://github.com/quantopian/pgcontents).\n\n## Configuration\n\nCreate a `jupyter_notebook_config.py` file in one of the\n[Jupyter config directories](https://jupyter.readthedocs.io/en/latest/use/jupyter-directories.html#id1),\nfor example: `~/.jupyter/jupyter_notebook_config.py`.\n\n**Jupyter Notebook Classic**: If you plan to use the Classic Jupyter Notebook\ninterface, you need to change `ServerApp` to `NotebookApp` in all the examples on this page.\n\n## AWS S3\n\n```python\nfrom s3contents import S3ContentsManager\n\nc = get_config()\n\n# Tell Jupyter to use S3ContentsManager\nc.ServerApp.contents_manager_class = S3ContentsManager\nc.S3ContentsManager.bucket = \"\u003cS3 bucket name\u003e\"\n\n# Fix JupyterLab dialog issues\nc.ServerApp.root_dir = \"\"\n```\n\n### 
Authentication\n\nAdditionally, you can configure multiple authentication methods:\n\nAccess and secret keys:\n\n```python\nc.S3ContentsManager.access_key_id = \"\u003cAWS Access Key ID / IAM Access Key ID\u003e\"\nc.S3ContentsManager.secret_access_key = \"\u003cAWS Secret Access Key / IAM Secret Access Key\u003e\"\n```\n\nSession token:\n\n```python\nc.S3ContentsManager.session_token = \"\u003cAWS Session Token / IAM Session Token\u003e\"\n```\n\n### AWS EC2 role auth setup\n\nIt is also possible to use IAM Role-based access to the S3 bucket from an Amazon EC2 instance or AWS resource.\n\nTo do that, leave the authentication options (`access_key_id`, `secret_access_key`) at their default of `None`\nand ensure that the EC2 instance has an IAM role which provides sufficient permissions (read and write) for the bucket.\n\n### Optional settings\n\n```python\n# A prefix in the S3 bucket to use as the root of the Jupyter file system\nc.S3ContentsManager.prefix = \"this/is/a/prefix/on/the/s3/bucket\"\n\n# Server-Side Encryption\nc.S3ContentsManager.sse = \"AES256\"\n\n# Authentication signature version\nc.S3ContentsManager.signature_version = \"s3v4\"\n\n# See AWS key refresh\nc.S3ContentsManager.init_s3_hook = init_function\n```\n\n### AWS key refresh\n\nThe optional `init_s3_hook` configuration can be used to enable AWS key rotation (described [here](https://dev.to/li_chastina/auto-refresh-aws-tokens-using-iam-role-and-boto3-2cjf) and [here](https://www.owenrumney.co.uk/2019/01/15/implementing-refreshingawscredentials-python/)) as follows:\n\n```python\nfrom aiobotocore.credentials import AioRefreshableCredentials\nfrom aiobotocore.session import get_session\nfrom configparser import ConfigParser\n\nfrom s3contents import S3ContentsManager\n\ndef refresh_external_credentials():\n    config = ConfigParser()\n    config.read('/home/jovyan/.aws/credentials')\n    return {\n        \"access_key\": config['default']['aws_access_key_id'],\n        \"secret_key\": 
config['default']['aws_secret_access_key'],\n        \"token\": config['default']['aws_session_token'],\n        \"expiry_time\": config['default']['aws_expiration']\n    }\n\nasync def async_refresh_credentials():\n    return refresh_external_credentials()\n\ndef make_key_refresh_boto3(this_s3contents_instance):\n    session_credentials = AioRefreshableCredentials.create_from_metadata(\n        metadata=refresh_external_credentials(),\n        refresh_using=async_refresh_credentials,\n        method='custom-refreshing-key-file-reader'\n    )\n    refresh_session = get_session()  # from aiobotocore.session\n    refresh_session._credentials = session_credentials\n    this_s3contents_instance.boto3_session = refresh_session\n\n# Tell Jupyter to use S3ContentsManager\nc.ServerApp.contents_manager_class = S3ContentsManager\n\nc.S3ContentsManager.init_s3_hook = make_key_refresh_boto3\n```\n\n### MinIO playground example\n\nYou can test this using the [`play.minio.io:9000`](https://play.minio.io:9000) playground:\n\nJust be sure to create the bucket first.\n\n```python\nfrom s3contents import S3ContentsManager\n\nc = get_config()\n\n# Tell Jupyter to use S3ContentsManager\nc.ServerApp.contents_manager_class = S3ContentsManager\nc.S3ContentsManager.access_key_id = \"Q3AM3UQ867SPQQA43P2F\"\nc.S3ContentsManager.secret_access_key = \"zuf+tfteSlswRu7BJ86wekitnifILbZam1KYY3TG\"\nc.S3ContentsManager.endpoint_url = \"https://play.minio.io:9000\"\nc.S3ContentsManager.bucket = \"s3contents-demo\"\nc.S3ContentsManager.prefix = \"notebooks/test\"\n```\n\n## Access local files\n\nTo access local files as well as remote files in S3, you can use [hybridcontents](https://github.com/viaduct-ai/hybridcontents).\n\nInstall it:\n\n```shell\npip install hybridcontents\n```\n\nUse a configuration similar to this:\n\n```python\nfrom s3contents import S3ContentsManager\nfrom hybridcontents import HybridContentsManager\nfrom notebook.services.contents.largefilemanager import 
LargeFileManager\n\nc = get_config()\n\nc.ServerApp.contents_manager_class = HybridContentsManager\n\nc.HybridContentsManager.manager_classes = {\n    # Associate the root directory with an S3ContentsManager.\n    # This manager will receive all requests that don't fall under any of the\n    # other managers.\n    \"\": S3ContentsManager,\n    # Associate /local_directory with a LargeFileManager.\n    \"local_directory\": LargeFileManager,\n}\n\nc.HybridContentsManager.manager_kwargs = {\n    # Args for root S3ContentsManager.\n    \"\": {\n        \"access_key_id\": \"\u003cAWS Access Key ID / IAM Access Key ID\u003e\",\n        \"secret_access_key\": \"\u003cAWS Secret Access Key / IAM Secret Access Key\u003e\",\n        \"bucket\": \"\u003cS3 bucket name\u003e\",\n    },\n    # Args for the LargeFileManager mapped to /local_directory\n    \"local_directory\": {\n        \"root_dir\": \"/Users/danielfrg/Downloads\",\n    },\n}\n```\n\n## GCP - Google Cloud Storage\n\nInstall the extra dependencies with:\n\n```shell\npip install s3contents[gcs]\n```\n\n```python\nfrom s3contents.gcs import GCSContentsManager\n\nc = get_config()\n\nc.ServerApp.contents_manager_class = GCSContentsManager\nc.GCSContentsManager.project = \"\u003cyour-project\u003e\"\nc.GCSContentsManager.token = \"~/.config/gcloud/application_default_credentials.json\"\nc.GCSContentsManager.bucket = \"\u003cGCP bucket name\u003e\"\n```\n\nNote that the path `~/.config/gcloud/application_default_credentials.json` assumes\na POSIX system on which you ran `gcloud init`.\n\n## Other configuration\n\n### File Save Hooks\n\nIf you want to use pre/post file save hooks, here are some examples.\n\nA `pre_save_hook` is written in the exact same way as normal, operating on the\nfile in local storage before committing it to the object store.\n\n```python\ndef scrub_output_pre_save(model, **kwargs):\n    \"\"\"\n    Scrub output before saving notebooks\n    \"\"\"\n\n    # only run on notebooks\n    if model[\"type\"] 
!= \"notebook\":\n        return\n\n    # only run on nbformat v4\n    if model[\"content\"][\"nbformat\"] != 4:\n        return\n\n    for cell in model[\"content\"][\"cells\"]:\n        if cell[\"cell_type\"] != \"code\":\n            continue\n        cell[\"outputs\"] = []\n        cell[\"execution_count\"] = None\n\nc.S3ContentsManager.pre_save_hook = scrub_output_pre_save\n```\n\nA `post_save_hook` instead operates on the file in object storage.\nBecause of this, it is useful to use the file methods on the `contents_manager`\nfor data manipulation.\nIn addition, one must use the following function signature (unique to `s3contents`):\n\n```python\ndef make_html_post_save(model, s3_path, contents_manager, **kwargs):\n    \"\"\"\n    Convert notebooks to HTML after saving via nbconvert\n    \"\"\"\n    import os\n\n    import nbformat\n    from nbconvert import HTMLExporter\n\n    if model[\"type\"] != \"notebook\":\n        return\n\n    content, _format = contents_manager.fs.read(s3_path, format=\"text\")\n    my_notebook = nbformat.reads(content, as_version=4)\n\n    html_exporter = HTMLExporter()\n    html_exporter.template_name = \"classic\"\n\n    (body, resources) = html_exporter.from_notebook_node(my_notebook)\n\n    base, ext = os.path.splitext(s3_path)\n    contents_manager.fs.write(path=(base + \".html\"), content=body, format=_format)\n\nc.S3ContentsManager.post_save_hook = make_html_post_save\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdanielfrg%2Fs3contents","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdanielfrg%2Fs3contents","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdanielfrg%2Fs3contents/lists"}