Image Analysis 1 (Jupyter): Slicing, Filters and Sliders

Starting to explore image analysis by learning how to slice a stack, apply filters, and create sliders. This is very similar to the other tutorial Marx made on April 7, but more optimized for use with Jupyter notebooks.

Image Analysis 1: Slicing, Filters and Sliders

Introduction

So far we've had two sessions on image analysis in Python. In the first, we installed all the necessary packages and ensured they were all playing nicely with each other (no small task!). In the second, we got our toes wet with opening CZI files and examining their properties and contents via the console as well as with some simple plotting functions. Hopefully in this session, I'll refresh your memory on all these points and build on them by introducing filters as well as some additional ways of viewing image data with matplotlib.

Getting Organized

First download the following files into whatever working folder you have for this notebook. You can of course put them anywhere, but I find having the files you're using in your working folder makes coding 10x easier, especially when using interactive interfaces like Jupyter notebooks. (The fewer times I have to stretch my pinky to the backslash key, the better!)

The first file is a confocal movie of a developing embryo from my first project in the Eisen lab. I chose it since, as a movie of z-stacks with multiple channels, it has a lot of dimensions for us to play around with while getting comfortable with slicing high-dimensional arrays. (Technically, in numpy jargon each dimension in an array, for example the y-coordinate of an image, is called an axis. The unfortunate overlap with matplotlib jargon is just a burden we have to bear.)

The second ("stax.py") is a module of "helper functions." (A module is just Python jargon for a file of Python code, but it usually has the implication that whatever code inside it is general purpose enough to be re-used.) Since many image processing functions in scipy and scikit-image don't directly operate on multidimensional images, I wrote some functions to streamline my code and avoid the need for writing loops all over the place. We won't actually use it today, but I figured I would introduce it.

The third ("juvu.py") is also a module of "helper functions," but instead focuses on visualizing multidimensional arrays. Some functions are definitely specific to my own analyses, but there are also some very useful functions that output movies or interactive windows for scrolling through different slices of an array.

Starting your session

First open a terminal window, navigate to your working folder, and start a Python session. Feel free to substitute whatever Jupyter commands necessary.

cd ur_werking_dir
python

I should note that at least for vanilla Python installations, and I'm sure it's true of Jupyter as well, it's very important in this case to start your session of Python in whatever folder contains the modules of helper functions. Python imports modules much like your shell finds commands, that is, it has a path variable (separate from but related to your system PATH!) that it searches for matching modules. Typically, Python adds whatever folder it was started from to this path. If all of this went over your head, don't worry about it. Just start Python from the folder that has stax.py and juvu.py.
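
If you're curious, you can inspect (or extend) this search path yourself with the built-in sys module. Here's a minimal sketch; the folder path is just a placeholder for wherever you keep the helper modules.

import sys
print(sys.path)  # the list of folders Python searches when importing modules
sys.path.append('/path/to/your/helper/modules')  # placeholder path; add a folder manually if needed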

Importing modules and loading the CZI

Let's first import all the modules we'll be using today and load the CZI file.

In [6]:
import czifile
import matplotlib.pyplot as plt
import numpy as np
import skimage.filters as filters
import juvu

tchzstack = czifile.imread('180817_dev1.czi')

I call ndarrays (n-dimensional arrays) of images "stacks" generically. In this case, I'm adding "tchz" to remind myself of the identity of each axis. But let's say you don't know that ahead of time. How can you find out? I usually find the most useful first step for parsing a new CZI file is looking at the sizes of the axes. These are accessed with the stack's "shape" attribute.

In [7]:
tchzstack.shape
Out[7]:
(1, 1, 57, 2, 15, 256, 512, 1)

We can see this stack contains eight axes. Starting from the third position, the axes are time, channel, z, y, and x, respectively. I know this because I'm familiar with my imaging parameters, so hopefully you can do the same when you have CZI files of your own.

This only accounts for five of the eight axes, however. What are the other three? The truth is I'm not entirely sure, but I suspect they correspond to other aspects of the CZI specification which are not used in this file, for example different regions in a tiled image. Maybe someone will collect some crazy confocal data one day and figure it out.

Anyway, those extra axes are kind of annoying since they don't store any additional information. (It's kind of like if you were graphing points on an xy plane, and for every ordered pair I told you, "Oh, by the way, z = 0.") We can use the squeeze method of an ndarray to get rid of those spare axes.

In [8]:
tchzstack = tchzstack.squeeze()  # We'll re-assign the squeezed stack to its original name to keep things simple
tchzstack.shape
Out[8]:
(57, 2, 15, 256, 512)

Much better.

Slicing stacks

So what do the images in this stack actually look like? We can use a function from matplotlib to show us, but first we need to extract a single image from the stack using numpy's handy slicing notation. To grab the 4th image in the stack from the 1st channel at the 21st time point, we use:

In [9]:
image = tchzstack[20, 0, 3]  # Remember that Python indexing starts from 0

Using this syntax, we haven't specified the ranges for the last two axes, so numpy gives us everything. This is the entire image, which is what we want. However, there are a few different ways of slicing an ndarray, depending on how much we want to type. Before getting into the weeds of numpy slicing, though, let's take a look at what this image actually is.

First though, let's specify what backend we'll tell Jupyter to use. A backend is just the magic under the hood that turns code into lines and colors on the screen. Our plotting library, matplotlib, provides a few backends for Jupyter notebooks that have different strengths. The first we'll use is the "inline" backend, which renders our plots as static images in the output cells. My impression is this is usually the default, but we'll be explicit here by using the command %matplotlib inline. This is a bit of special Jupyter syntax, so Python will throw an error if you try to use it outside of a notebook. Much like variables, Jupyter remembers the backend between cells, so we'll only have to specify it once.

In [10]:
%matplotlib inline
plt.imshow(image);  # We use a semi-colon to suppress the notebook from printing some extraneous output

It's an embryo, as promised. The first channel corresponds to Zelda-sfGFP, which is why the nuclei are so bright. Feel free to play around with the slicing indices to display different images.

As mentioned, there are a few ways of slicing ndarrays. For example, the following are equivalent to the previous slice.

In [11]:
image = tchzstack[20, 0, 3, ...]  # More explicit; ellipses indicate take every entry in the remaining axes
image = tchzstack[20, 0, 3, :, :]  # Most explicit; colons indicate take every entry in that axis

We can also choose to specify ranges within axes using normal Python slicing notation. Below I'm selecting rows 101 through 150 along the y-axis and columns 201 through 300 along the x-axis. (Remember that the slice 100:150 grabs indices 100 through 149, which are the 101st through 150th entries.)

In [12]:
image = tchzstack[20, 0, 3, 100:150, 200:300]
plt.imshow(image);

We can combine these notations in all kinds of crazy ways. For example, numpy is clever and will interpret the ellipsis as applying to the remaining axes, even when it's between explicit ranges. The following extracts the region between 101<=y<=150 and 201<=x<=300 for each channel and each z-slice at the 21st time point.

In [13]:
image = tchzstack[20, ..., 100:150, 200:300]
image.shape
Out[13]:
(2, 15, 50, 100)

As expected, we have both channels and all z-slices. Numpy removed the time axis for us, though, since we indexed it with a single integer rather than a slice. (Using the slice 20:21 instead would keep a time axis with a length of 1.)
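
To see the difference for yourself, compare an integer index against a length-1 slice:

tchzstack[20, ..., 100:150, 200:300].shape    # (2, 15, 50, 100); the integer index drops the time axis
tchzstack[20:21, ..., 100:150, 200:300].shape  # (1, 2, 15, 50, 100); the length-1 slice keeps it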

We can even extract a single pixel.

In [14]:
image = tchzstack[20, 0, 3, 150, 200]
image
Out[14]:
23

In this case our "image" is just a number, 23. That's all these image stacks really are, just organized collections of numbers. More specifically, in their raw form from the microscope they're either 8- or 16-bit unsigned integers, which in English is a whole number between 0 and 255 or 0 and 65,535, respectively. The details aren't all that important for what we're doing today, but the key takeaway is that most image analysis is just fancy arithmetic.

Viewing stacks efficiently

Viewing single images at a time is nice, but what if I told you it was possible to slice a stack interactively without Fiji? Such is the magic of Jupyter's widgets.

...

They're actually not that magical and kind of finicky to set up for something like this, so one of the functions in the juvu module does the hard work for us. It's creatively called viewer. Our current backend doesn't render interactive figures, though, so we're also going to have to change it to "notebook." Unfortunately, switching backends can sometimes make Jupyter upset. I've dug into this a little, and it seems to originate with some issues that matplotlib has with switching its backend on the fly. If this happens, first try to re-run the cell a few more times. If that doesn't help, re-start your kernel. Worst comes to worst, we can use the "notebook" backend to render the entire notebook from the start.

In [17]:
%matplotlib notebook
juvu.viewer(tchzstack, ('t', 'ch', 'z'));  # Accepts labels for each axis; otherwise numbered

These sliders are a little slow, likely because either the default Jupyter refresh rate is low or because Jupyter sits on top of so many layers that it takes some time for the changes to propagate to the backend. I have a version of this code that renders sliders outside of a notebook using matplotlib's widgets, but they're also a little finicky. I've toyed with other GUI packages, like Qt, but I haven't had the time to fully explore them. If anyone's bored, a session on GUIs would be a great addition to our coding club series!

The viewer function is flexible and can accept stacks of arbitrary dimensions. Let's see how it handles a normal z-stack.

In [20]:
zstack = tchzstack[20, 0, ...]
juvu.viewer(zstack, 'z');

juvu has a different function for viewing a stack as a movie. It only accepts "1stacks," however, that is, stacks with only one axis in addition to x and y. There's also this bizarre quirk with the matplotlib animation library where it only renders a movie if you assign the output to a variable. For example, the movie won't play if the second line reads only juvu.movie_1stack(tstack, refresh=10);. This only happens sometimes, though. When I was using a different backend in previous versions, it worked fine. Don't ask me why.

In [21]:
tstack = tchzstack[:, 1, 3, ...]  # 2nd channel, 4th z-slice, all time points
movie = juvu.movie_1stack(tstack, refresh=10);

There is also a function for viewing a "2stack" as a movie. I intended this as a way to simultaneously view every slice of a z-stack over time, but due to inefficient use of space by the matplotlib layout engine, it falls somewhat short of that goal. Take a look. (By the way, this function requires a "shape" parameter to set the number of rows and columns. In this case, it's 3x5 since there are 15 z-slices.)

In [22]:
tzstack = tchzstack[:, 0, ...]
movie = juvu.movie_2stack(tzstack, (3, 5));

I'm sure you're all wondering at this point how we can save these movies to use in presentations. That, unfortunately, is its own journey that I'll save for a future session.

Filters

So now that we've seen what's in our stack, how do we actually start processing the images? That depends a lot on what you're doing, but here I'll try to introduce some basic operations that you can use to build more complex pipelines.

One common operation for reducing the complexity of a stack (at least for visualization purposes) is a max projection. We can do this with a single command!

In [23]:
maxz = tchzstack.max(axis=2)
juvu.viewer(maxz, ('t', 'ch'));

We specify axis=2 to tell numpy to take the maximum only over the z-axis. Note we're not using the built-in max function, but rather a method of the ndarray. If we try max(tchzstack), we'll get an error since the built-in Python function lives in a one-dimensional world and tries to compare slices, which makes numpy upset.
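
For reference, numpy's own function is equivalent to the method, and other reductions follow the same pattern:

maxz = np.max(tchzstack, axis=2)   # numpy's max, not the built-in; identical to tchzstack.max(axis=2)
meanz = tchzstack.mean(axis=2)     # mean, min, sum, etc. all reduce over an axis the same way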

A max projection is a little boring, so let's do something more interesting and apply a Gaussian filter. Gaussian filters are often the first step of image processing pipelines since, by blurring the image, they remove noisy pixels which may trip up later steps. As a side note, manipulating an image as part of a segmentation pipeline is entirely acceptable (unless of course you're specifically trying to bias your segmentation to achieve some downstream result). However, once you've extracted the regions of interest from the image, you must use the raw data to calculate any statistics. There are almost certainly exceptions to this rule, but it stands as a general principle.

Before we use a Gaussian filter, let's talk about what a filter actually does. Most broadly, a filter in image analysis is essentially another word for a function we apply to an image, for example adding 1 to every pixel. However, in practice we often use it to refer more specifically to a kind of local averaging, meaning that for every pixel in our image, we look at some neighborhood around that pixel and compute a new value for it based on some kind of weighted average of the surrounding pixels. In the simplest cases, the neighborhood is a square, and all the pixels in that square are weighted equally. Of course, there's no limit to the complexity of the window or the weighting we use. For example, in a Gaussian filter, the window approximates a circle and the weights are based on a Gaussian (normal) distribution. Formally, the window together with its weighting function is called the "kernel," and the process of applying this kernel to an image is called "convolution."

If all this went over your head, don't worry. It's most intuitive to see this visually. If you Google "kernel convolution," you'll find many diagrams. I also encourage you to check out the slides from this tutorial (https://github.com/WhoIsJack/python-bioimage-analysis-tutorial), which have a step-by-step demonstration of a convolution on a 1D array.
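
If you'd rather poke at it in code, here's a minimal sketch of convolving a tiny array with an evenly weighted 3x3 kernel, using scipy.ndimage (which isn't among today's imports):

import numpy as np
from scipy.ndimage import convolve

toy = np.arange(25, dtype=float).reshape(5, 5)    # a tiny stand-in "image"
kernel = np.ones((3, 3)) / 9                      # 3x3 box kernel; weights sum to 1 so brightness is preserved
smoothed = convolve(toy, kernel, mode='nearest')  # 'nearest' repeats border pixels at the edges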

Anyway, fortunately we don't have to worry about the details of this process, since other people have already coded it for us. Though scipy has an image-processing module, it's a little sparse, so we'll instead be using scikit-image, which has all the same basic functions as scipy in addition to many others. We'll first test it on the single slice we were using earlier.

In [24]:
%matplotlib inline
image = tchzstack[20, 0, 3]
blur = filters.gaussian(image, sigma=3)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15,9))
ax1.imshow(image);
ax2.imshow(blur);

On the left is the original image, and on the right is the filtered one. You can see it's been aggressively blurred, which was intentionally done to demonstrate its effect. We can always dial it back later by decreasing sigma, which is effectively the radius of our kernel.

You might notice the blurred image appears more intense, which is actually an artifact of how matplotlib displays images. It automatically scales the color map between the minimum and maximum values of each image. This is usually convenient, but it can trick our eyes if we're not aware of it.
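
One way around this is to pin the color limits yourself with imshow's vmin and vmax arguments. The limits below are just illustrative for an 8-bit input:

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 9))
ax1.imshow(image, vmin=0, vmax=255);  # the raw 8-bit image
ax2.imshow(blur, vmin=0, vmax=1.0);   # filters.gaussian rescales uint8 input to floats in [0, 1]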

How do we actually decide where to set sigma? From my experience, this is more art than science, unfortunately. I've spent an inordinate amount of time stuck in a loop of applying a filter and displaying the image in an attempt to find the best parameters. I eventually wrote a modified version of the viewer function that streamlined this process. In yet another burst of creativity, I called it viewer_filter. Let's use it on our image and the Gaussian filter. It has a somewhat involved set of parameters, so I'll first write the entire function call and unpack it afterwards.

In [25]:
%matplotlib notebook
import matplotlib.pyplot as plt
juvu.viewer_filter(tchzstack, filters.gaussian, {'sigma': (0, 10, 5, 0.5)}, labels=('t', 'ch', 'z'), vmin=0, vmax=0.25)

The first two arguments are fairly straightforward. Notice, however, filters.gaussian does not have parentheses. This means we're referencing the function by name rather than actually executing it. The third argument dictates the slider parameters and is the most complex. It's a dictionary where each parameter in the filter function is keyed by its name to a tuple of (min slider value, max slider value, initial slider value, slider increment). The labels argument names the sliders that slice the stack.

I've also used the vmin and vmax arguments to manually set the minimum and maximum value of the color scale. If you don't set them, the viewer will automatically scale the color between the min and max of the image currently displayed, which can make comparing slices difficult. Unfortunately, a few nuances make setting a global min and max in advance a little tricky. The first is that the skimage filters automatically rescale the input image to floating-point (decimal) numbers. So for an 8-bit image, it divides all the values by 255. The second is that for some filters, especially ones where you take differences of images, the original scale becomes totally meaningless anyway. The best approach is to normalize the color scale to the min and max values of the entire filtered stack, but since there's essentially no way to know in advance what these will be, we're left with using trial-and-error.
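
If you want to see this rescaling for yourself, skimage exposes the conversion as img_as_float, which, as far as I can tell, is what the filters apply under the hood. A quick sketch:

from skimage import img_as_float
image = tchzstack[20, 0, 3]
image.max()                 # something up to 255 for a uint8 image
img_as_float(image).max()   # the same values divided by 255, so at most 1.0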

As a quick note, some of scikit-image's filters, including its Gaussian filter, generalize to ndarrays. This means filtering a single image will yield a different result than filtering a z-stack, since the latter is averaging over the additional z dimension. If you feed the filter an entire tchzstack, it will average over time and channels too, which is probably not what you want, so be mindful of your inputs.
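
To make the distinction concrete, here's a quick sketch (same filter, different input dimensionality):

blur2d = filters.gaussian(tchzstack[20, 0, 3], sigma=3)  # blurs within a single slice
blur3d = filters.gaussian(tchzstack[20, 0], sigma=3)     # also blurs across neighboring z-slices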

juvu has a second filter viewer function called viewer2_filter, which presents a before and after view.

In [26]:
juvu.viewer2_filter(tchzstack, filters.gaussian, {'sigma': (0, 10, 5, 0.5)}, labels=('t', 'ch', 'z'))

We can combine filters in increasingly complex ways as well! Check out the following two, which implement a nearly complete thresholding pipeline.

In [28]:
def DoG(im, s1, s2):  # difference of gaussians (contrast enhancer)
    return filters.gaussian(im, s1) - filters.gaussian(im, s2)

juvu.viewer_filter(tchzstack, DoG, {'s1': (0, 5, 0.1, 0.1), 's2': (0, 10, 0.5, 0.5)}, ('t', 'ch', 'z'))
juvu.viewer_filter(tchzstack, lambda im, s1, s2, thresh: DoG(im, s1, s2) > thresh, {'s1': (0, 5, 0.1, 0.1), 's2': (0, 10, 0.5, 0.5), 'thresh': (0, 0.25, 0, 0.005)}, ('t', 'ch', 'z'))
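
Once you've settled on parameters with the sliders, you can of course apply the same pipeline to a single image without the interactivity. The values below are placeholders; substitute whatever you landed on:

image = tchzstack[20, 0, 3]
mask = DoG(image, 0.5, 3) > 0.01  # placeholder parameters from slider exploration
plt.imshow(mask);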

Conclusion

Anyway, I'll bring it to a close here. We've really only scratched the surface of image analysis so far. In the future, hopefully we can continue to dive into more advanced filters, including morphological filters, applying filters and thresholds in bulk, and converting raw microscopy data to RGB images.