Skip to content

User Guide

The Jupyter ecosystem is huge, and one can find plenty of resources across the internet to learn Jupyter or solve a particular problem. Thus, this guide focuses on topics that are not covered well in the existing documentation.

Access S3 resources (e.g., from Coscine)

Analysing data is a typical use-case for the Jupyter service. However, this data might not be available locally for upload nor easily downloadable from the internet. To access, for example, object storage (S3), Jupyter-fs has been installed, currently in the experimental images datascience-ai-notebook and engineering-toolbox.

Setup Jupyter-fs for S3

  1. Open a terminal and create a file conf.json with the structure below. Save the file under .config/fsspec/.

    Image title Image title

    You have to change all the variables marked with <> with the specific values that match the s3 resource you want to access. For example, Coscine commonly uses the endpoint_url of https://global.datastorage.nrw:443/. You can also reference the Coscine documentation for details.

    conf.json
    {
      "s3": {
        "endpoint_url": "<your-s3-endpoint-url>",
        "key": "<your-credentials-key>",
        "secret": "<your-credentials-secret>"
      }
    }
    
  2. Go to Settings Settings Editor and choose jupyter-fs from the left sidebar.

    Image title

    Image title Click on Add under Resources and fill out the fields:

    • name: Can be any name for the remote storage.
    • url: The left part (before ://) specifies the protocol to access the remote storage. In this case, it is s3. Besides, there are various other protocols available, e.g., local storage, FTP, SFTP and ZIP. For a full overview have a look at the fsspec documentation and the links therein. The right part (after ://) contains the bucket name, optionally followed by any further path to shrink the results.
    • type Has to be fsspec. It is the only backend currently installed.
    • preferredDir and auth: Can be ignored for s3.
    • defaultWritable: If set, ensure to configure key and secret for write access in step 1.
    • kwargs: The configuration made in step 1 holds for every s3 connection. If this is not applicable, e.g., due to several resources to different s3 storage system, you can give the parameters individually for each resource in the kwargs JSON. Note, due to auto saving and the small input field, it is recommended to copy and paste the kwargs JSON.

    After configuring, a new filesystem should appear in the left sidebar. Click on it to explore the remote storage resource. It can take some time to load, depending on the connection speed. Files can be opened via double click or right-click Open With from the context menu.

    Image title

Info

Configuring even local filesystems via Jupyter-fs / fsspec is especially advantageous over the built-in Jupyter filebrowser for huge amounts of files (several hundred). However, the approach focuses on browsing and getting an overview of the files, not on copying or complete downloading. For the latter a differnt appoarch based on rclone will be available in the future.

Interacting with Jupyter-fs in a notebook

To interact with the configured remote storage resource inside a notebook, Jupyter-fs provides snippets to get started.

Image title

Right-click on a file, select Snippets and then the action to perform. After clicking, code has been copied to the clipboard that can be pasted in a code cell, e.g., by middle-click or Crtl + V. Then you can run the cell, modify parameters etc. and continue to analyse your data.

Image title

For more information what can be done, see the fsspec-documentation.