Advanced Research Computing: Using Jupyter Notebook and JupyterLab


Setting up a complex Jupyter configuration

The ARC Open OnDemand instances offer a fixed set of Python versions to choose from. The Anaconda distribution(s) cover most common scenarios, with the exception of some machine learning packages, e.g., Torch and TensorFlow.

This page gives a step-by-step guide to installing a particular set of Python packages in a virtual environment and then setting that virtual environment up so it can be used with the Jupyter Notebook and JupyterLab applications.

These instructions assume that you are starting with a 'clean slate', by which we mean there are no active Conda environments and nothing is installed in ~/.local. They may work otherwise, but the clean slate is the only configuration that was tested.

This assumes that you want to create the virtual environment in your home directory (~); if you want it elsewhere, change the ~ in the initial cd to the desired directory name. All of the commands that follow are preceded by a $ prompt. If you copy and paste, do not copy the prompt. If there are lines without an initial $, those are output, typically of the command that precedes them.

$ cd ~

To be extra safe, we deactivate any active Conda environment. It's fine if running this tells you that The command: conda could not be found. The step only matters on systems where conda is found.

$ conda deactivate
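
If you would like to confirm the 'clean slate' before going further, the following is a minimal sketch of such a check. The function name check_clean_slate is hypothetical (it is not an ARC-provided command), and the sketch assumes that conda sets the CONDA_DEFAULT_ENV variable while an environment is active and that user-installed packages live under ~/.local/lib/pythonX.Y/site-packages.

```shell
# Sketch only: report anything that breaks the 'clean slate' assumption.
# check_clean_slate is a hypothetical helper, not an ARC command.
check_clean_slate() {
    local home_dir="${1:-$HOME}"
    local problems=0
    local d

    # Assumption: conda sets CONDA_DEFAULT_ENV while an environment is active.
    if [ -n "${CONDA_DEFAULT_ENV:-}" ]; then
        echo "active Conda environment: ${CONDA_DEFAULT_ENV}"
        problems=1
    fi

    # Assumption: 'pip install --user' places packages in these directories.
    for d in "$home_dir"/.local/lib/python*/site-packages; do
        if [ -d "$d" ] && [ -n "$(ls -A "$d")" ]; then
            echo "user-installed packages found in: $d"
            problems=1
        fi
    done

    if [ "$problems" -eq 0 ]; then
        echo "clean slate"
    fi
    return "$problems"
}

check_clean_slate || echo "Review the items above before continuing."
```

Anything the check reports is not necessarily a problem, but it is outside the configuration that was tested.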

The first thing we want to do is ensure that we have all, and only, the modules needed for our project, which is climate modeling. To do that we first remove all loaded modules, then load the ones we need.

# Clear modules
$ module purge

# Load the needed modules
$ module load python3.9-anaconda/2021.11 gcc/10.3.0 \
    proj/9.0.0 geos/3.10.2
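
To double-check that all four modules are actually loaded, you can run module list. As a scripted alternative, here is a small sketch that prints any required module that is missing. The function name missing_modules is hypothetical, and the sketch assumes that the module system exports the loaded modules as a colon-separated list in the LOADEDMODULES variable, which Lmod and Environment Modules normally do.

```shell
# Sketch only: print each required module that is not currently loaded.
# Assumption: LOADEDMODULES holds a colon-separated list of loaded modules.
# missing_modules is a hypothetical helper, not part of the module system.
missing_modules() {
    local m
    for m in "$@"; do
        case ":${LOADEDMODULES:-}:" in
            *":$m:"*) ;;        # already loaded, print nothing
            *) echo "$m" ;;     # not loaded
        esac
    done
}

missing_modules python3.9-anaconda/2021.11 gcc/10.3.0 proj/9.0.0 geos/3.10.2
```

If this prints nothing, every module named on the command line is loaded.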

The next two steps install an up-to-date version of pip and of virtualenv into your ~/.local directory. The commands themselves go into ~/.local/bin, which should be in your default PATH.

$ pip install --upgrade --user pip
$ pip install --user virtualenv

We verify this by checking where the system will find the virtualenv and pip commands.

$ which pip
~/.local/bin/pip

$ which virtualenv
~/.local/bin/virtualenv
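
If either command is found somewhere else, the first thing to check is whether ~/.local/bin is in your PATH at all. The following is a minimal sketch of that check; path_has_dir is a hypothetical helper name.

```shell
# Sketch only: succeed if the given directory is an entry in PATH.
# path_has_dir is a hypothetical helper.
path_has_dir() {
    case ":${PATH}:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

if path_has_dir "$HOME/.local/bin"; then
    echo "$HOME/.local/bin is in PATH"
else
    echo "$HOME/.local/bin is NOT in PATH"
fi
```

Note that this only tests for an exact entry; it does not tell you whether that entry comes before or after the directories added by the loaded modules.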

The next command creates the virtual environment in a directory named climatepy inside the current directory. We use the -p option to request that the Python from the python3.9-anaconda module we loaded be the one used in the virtual environment. Do NOT forget that!

Once the virtual environment has been created, it needs to be activated, which is done with the second command.

$ virtualenv -p $(which python) climatepy
$ source climatepy/bin/activate
(climatepy) $

An activated virtual environment changes your prompt so that the name of the environment appears in parentheses before the rest of the prompt. Your prompt will likely look more like this sample (the hostname may be different).

(climatepy) [yourname@gl-login1 ~]$
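
The prompt is a convenient visual cue, but in a script you may prefer to test for the active environment directly. The sketch below relies on the VIRTUAL_ENV variable, which the activate script exports; in_virtualenv is a hypothetical helper name.

```shell
# Sketch only: check whether a virtual environment is active.
# The activate script exports VIRTUAL_ENV; in_virtualenv is a hypothetical helper.
in_virtualenv() {
    [ -n "${VIRTUAL_ENV:-}" ]
}

if in_virtualenv; then
    echo "active environment: $VIRTUAL_ENV"
else
    echo "no virtual environment is active; run: source climatepy/bin/activate"
fi
```

Running this before any pip install helps avoid installing packages somewhere other than the virtual environment.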

We have a large list of packages that we are going to install, but when we installed these on our own computer, we noticed that quite a few are dependencies of others and will be installed automatically. Here is the commented list of pip commands to install our desired environment.

# Install the needed packages

# metpy installs numpy, scipy, pandas, matplotlib, proj, pyproj, xarray
$ pip install metpy
$ pip install cmocean
$ pip install netCDF4
$ pip install glob2
$ pip install geos
# cartopy installs shapely, pyshp
$ pip install cartopy
$ pip install proj

# pyyaml is needed by proj but not installed as a dependency; Bad people! Bad!
$ pip install pyyaml

Note that two of these packages need the proj and geos libraries to be available both to install properly and to run, which is why we loaded the proj and geos modules earlier.

We are now done installing the Python packages we need for our science and data analysis.
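
Before moving on, it can be worth confirming that the main packages import cleanly inside the activated environment. The following is a minimal sketch, not part of the tested procedure: check_imports is a hypothetical helper, and the import names are assumptions (for example, the pyyaml package is imported as yaml). The geos and proj pip packages are left out because their import names are not given here.

```shell
# Sketch only: try importing each module with the python found first in PATH.
# check_imports is a hypothetical helper; the module names passed to it are
# assumed import names, which can differ from the pip package names.
check_imports() {
    local mod
    for mod in "$@"; do
        if python -c "import $mod" >/dev/null 2>&1; then
            echo "OK      $mod"
        else
            echo "FAILED  $mod"
        fi
    done
}

check_imports metpy cmocean netCDF4 glob2 cartopy yaml
```

A FAILED line for cartopy is a hint to check that the proj and geos modules are still loaded in the current shell.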

Registering the virtual environment with Jupyter

Now that we have a fully installed virtual environment, we need to tell Jupyter where it is and how to use it. Note that we already have Jupyter because it is installed with the Anaconda Python distribution we are using.

We first have to install a Python package that does the registration, then we use it. Note that the kernel name given with --name is independent of the name of the virtual environment, but it will lessen future confusion if you use the same name in both places.

$ pip install ipykernel
$ python -m ipykernel install --user --name=climatepy

You can now double-check that it is installed as a Jupyter kernel with

$ jupyter kernelspec list
Available kernels:
  climatepy    /home/<yourname>/.local/share/jupyter/kernels/climatepy
  python3      /home/<yourname>/climatepy/share/jupyter/kernels/python3
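
Each kernel directory in that listing contains a kernel.json file whose "argv" list begins with the Python interpreter that Jupyter will launch. As a rough sketch, the helper below (kernel_python, a hypothetical name) prints that first entry so you can confirm it points into the climatepy virtual environment. It uses simple text matching rather than a real JSON parser, so treat it as a quick check only.

```shell
# Sketch only: print the first "argv" entry of a kernel.json file, which is
# the interpreter Jupyter starts for that kernel.
# kernel_python is a hypothetical helper; it does plain text matching and
# is not a general JSON parser.
kernel_python() {
    tr -d '\n' < "$1" |
        sed -n 's/.*"argv"[[:space:]]*:[[:space:]]*\[[[:space:]]*"\([^"]*\)".*/\1/p'
}

spec="$HOME/.local/share/jupyter/kernels/climatepy/kernel.json"
if [ -f "$spec" ]; then
    kernel_python "$spec"
else
    echo "kernel spec not found: $spec"
fi
```

If the path printed is not the python inside your climatepy directory, the kernel was most likely registered while the virtual environment was not active.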

Using the new kernel

For this example, we will use the Great Lakes cluster, but the steps are the same for Armis or Lighthouse.

For this set of Python packages, we needed additional software modules loaded along with the python3.9-anaconda module. You enter these into the 'Module commands' box.

Module commands box showing which modules to load.

You need these modules whether you are using Jupyter Notebook or JupyterLab.

Using with Jupyter Notebook

Go to the Great Lakes cluster and bring up the Jupyter Notebook form.

Once your Jupyter Notebook session has started, you should see climatepy listed as an available kernel under the 'New' pull-down menu.

New menu showing climatepy.

Using with JupyterLab

Go to the Great Lakes cluster and bring up the JupyterLab form.

Once your JupyterLab has started, you should see a Python icon for climatepy in both the Notebook and Console portions of the JupyterLab Launcher pane.

JupyterLab Launcher with climatepy icons.