Image Analysis 1: Slicing, Filters and Sliders

Starting to explore image analysis by learning how to slice a stack, apply filters, and create sliders. This tutorial is written for matplotlib's pop-out window; see the April 14th version of this tutorial to work better with Jupyter Notebook's inline viewer.

Introduction

So far we've had two sessions on image analysis in Python. In the first, we installed all the necessary packages and ensured they were all playing nicely with each other (no small task!). In the second, we got our toes wet with opening CZI files and examining their properties and contents via the console as well as with some simple plotting functions. Hopefully in this session, I'll refresh your memory on all these points and build on them by introducing filters as well as some additional ways of viewing image data with matplotlib.

Getting Organized

First download the following files into whatever working folder you have for this notebook. You can of course put them anywhere, but I find having the files you're using in your working folder makes coding 10x easier, especially when using interactive interfaces like Jupyter notebooks. (The fewer times I have to stretch my pinky to the backslash key, the better!)

The first file is a confocal movie of a developing embryo from my first project in the Eisen lab. I chose it since, as a movie of z-stacks with multiple channels, it has a lot of dimensions for us to play around with while getting comfortable with slicing high-dimensional arrays. (Technically, in numpy jargon each dimension in an array, for example the y-coordinate of an image, is called an axis. The unfortunate overlap with matplotlib jargon is just a burden we have to bear.)

The second ("stax.py") is a module of "helper functions." (A module is just Python jargon for a file of Python code, but it usually has the implication that whatever code inside it is general purpose enough to be re-used.) Since many image processing functions in scipy and sklearn don't directly operate on multidimensional images, I wrote some functions to streamline my code and avoid the need for writing loops all over the place. We won't actually use it today, but I figured I would introduce it.

The third ("vu.py") is also a module of "helper functions," but instead focuses on visualizing multidimensional arrays. Some functions are definitely specific to my own analyses, but there are also some very useful functions that output movies or interactive windows for scrolling through different slices of an array.

Starting your session

First open a terminal window, navigate to your working folder, and start a Python session. Feel free to substitute whatever Jupyter commands are necessary.

cd ur_werking_dir
python

I should note that at least for vanilla Python installations, and I suspect the same is true of Jupyter, it's very important in this case to start your session of Python in whatever folder contains the modules of helper functions. Python imports modules much like your shell finds commands, that is, it has a path variable (separate from but related to your system PATH!) that it searches for matching modules. Typically, Python adds whatever folder it was started from to this path. If all of this went over your head, don't worry about it. Just start Python from the folder that has stax.py and vu.py.
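If you're curious, you can peek at Python's module search path yourself. Here's a quick sketch (the appended folder is just a placeholder, swap in your own):

import sys
print(sys.path)  # the list of folders Python searches when importing modules
sys.path.append('path/to/folder/with/modules')  # placeholder; manually add a folder if you started elsewhere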

Importing modules and loading the CZI

Let's first import all the modules we'll be using today and load the CZI file.

In [1]:
import czifile
import matplotlib.pyplot as plt
import numpy as np
import skimage.filters as filters
import stax
import vu

tchzstack = czifile.imread('180817_dev1.czi')

I call ndarrays (n-dimensional arrays) of images "stacks" generically. In this case, I'm adding "tchz" to remind myself of the identity of each axis. But let's say you don't know that ahead of time. How can you find out? I usually find the most useful first step for parsing a new CZI file is looking at the sizes of the axes. These are accessed with the stack's "shape" attribute.

In [2]:
tchzstack.shape
Out[2]:
(1, 1, 57, 2, 15, 256, 512, 1)

We can see this stack contains eight axes. Starting from the third position, the axes are time, channel, z, y, and x, respectively. I know this because I'm familiar with my imaging parameters, so hopefully you can do the same when you have CZI files of your own.

This only accounts for five of the eight axes, however. What are the other three? The truth is I'm not entirely sure, but I suspect they correspond to other aspects of the CZI specification which are not used in this file, for example different regions in a tiled image. Maybe someone will collect some crazy confocal data one day and figure it out.

Anyway, those extra axes are kind of annoying since they don't store any additional information. (It's kind of like if you were graphing points on an xy plane, and for every ordered pair I told you, "Oh, by the way, z = 0.") We can use the squeeze method of an ndarray to get rid of those spare axes.

In [3]:
tchzstack = tchzstack.squeeze()  # We'll re-assign the squeezed stack to its original name to keep things simple
tchzstack.shape
Out[3]:
(57, 2, 15, 256, 512)

Much better.

Slicing stacks

So what do the images in this stack actually look like? We can use a function from matplotlib to show us, but first we need to extract a single image from the stack using numpy's handy slicing notation. To grab the 4th image in the stack from the 1st channel at the 21st time point, we use:

In [4]:
image = tchzstack[20, 0, 3]  # Remember that Python indexing starts from 0

Using this syntax, we haven't specified the ranges for the last two axes, so numpy gives us everything. This is the entire image, which is what we want. However, there are a few different ways of slicing an ndarray, depending on how much we want to type. Before getting into the weeds of numpy slicing, though, let's take a look at what this image actually is.

In [5]:
%matplotlib inline
plt.imshow(image);  # The semi-colon keeps the notebook from printing some extraneous output

It's an embryo, as promised. The first channel corresponds to histone-RFP, which is why the nuclei are so bright. Feel free to play around with the slicing indices to display different images.

As mentioned, there are a few ways of slicing ndarrays. For example the following are equivalent to the previous slice.

In [6]:
image = tchzstack[20, 0, 3, ...]  # More specific; the ellipsis means take every entry in the remaining axes
image = tchzstack[20, 0, 3, :, :]  # Most specific; a colon means take every entry along that axis

We can also choose to specify ranges within axes using normal Python slicing notation. Below I'm selecting the region from 101 to 150 along the y-axis and from 201 to 300 along the x-axis.

In [7]:
image = tchzstack[20, 0, 3, 100:150, 200:300]
plt.imshow(image);

We can combine these notations in all kinds of crazy ways. For example, numpy is clever and will interpret the ellipsis as applying to the remaining axes, even when it's between explicit ranges. The following extracts the region between 101<=y<=150 and 201<=x<=300 for each channel and each z-slice at the 21st time point.

In [8]:
image = tchzstack[20, ..., 100:150, 200:300]
image.shape
Out[8]:
(2, 15, 50, 100)

As expected, we have both channels and all z-slices. Numpy dropped the time axis for us, though, since we indexed it with a single integer rather than a range.
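If you'd rather keep an axis of length one, slice it with a range of length one instead of a bare integer. A quick comparison:

tchzstack[20, ..., 100:150, 200:300].shape  # bare integer drops the time axis: (2, 15, 50, 100)
tchzstack[20:21, ..., 100:150, 200:300].shape  # length-one slice keeps it: (1, 2, 15, 50, 100)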

We can even extract a single pixel.

In [9]:
image = tchzstack[20, 0, 3, 150, 200]
image
Out[9]:
23

Our "image" is now just a single number, in this case 23. That's all these image stacks really are, just organized collections of numbers. More specifically, in their raw form from the microscope they're either 8- or 16-bit unsigned integers, which in English is a whole number between 0 and 255 or between 0 and 65,535, respectively. The details aren't all that important for what we're doing today, but the key takeaway is that most image analysis is just fancy arithmetic.
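You can check this for yourself with the stack's dtype attribute and some quick arithmetic (the exact values will of course depend on your own file):

tchzstack.dtype  # e.g. dtype('uint8') for 8-bit unsigned integers
tchzstack.min(), tchzstack.max()  # the actual range of pixel values in your stack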

Viewing stacks efficiently

Viewing single images at a time is nice, but what if I told you it was possible to slice a stack interactively without Fiji? Such is the magic of matplotlib's widgets.

...

They're actually not that magical and kind of finicky to set up, so one of the functions in the vu module does the hard work for us. It's creatively called viewer. Unfortunately, Jupyter notebooks don't render interactive figures, so we're also going to have to change the display settings.

In [10]:
%matplotlib qt
vu.viewer(tchzstack, ('t', 'ch', 'z'));  # Accepts labels for each axis; otherwise numbered

These sliders tend to freeze easily, so I recommend you don't actually "slide" your mouse across them, but instead click where you want to set them. Honestly, Jupyter has better built-in widgets, but the nice thing about these is you can use them outside of a notebook. (The real reason is I don't know how to use Jupyter widgets. I've toyed around with other GUI packages, like Qt, but I haven't had the time to fully explore them. If anyone's bored, a session on GUIs would be a great addition to our coding club series!)

The viewer function is flexible and can accept stacks of arbitrary dimensions. Let's see how it handles a normal z-stack.

In [11]:
zstack = tchzstack[20, 0, ...]
vu.viewer(zstack, 'z');

vu has a different function for viewing a stack as a movie. It only accepts "1stacks," however, that is, stacks with only one axis in addition to x and y.

In [12]:
tstack = tchzstack[:, 1, 3, ...]  # 2nd channel, 4th z-slice, all time points
vu.movie_1stack(tstack, refresh=10);

There is also a function for viewing a "2stack" as a movie. I intended this as a way to simultaneously view every slice of a z-stack over time, but due to inefficient use of space by the matplotlib layout engine, it falls somewhat short of that goal. Take a look.

(By the way, this function requires a "shape" parameter to set the number of rows and columns. In this case, it's 3x5 since there are 15 z-slices.)

In [13]:
tzstack = tchzstack[:, 0, ...]
vu.movie_2stack(tzstack, (3, 5));

I'm sure you're all wondering at this point how we can save these movies to use in presentations. That, unfortunately, is its own journey that I'll save for a future session.

Filters

So now that we've seen what's in our stack, how do we actually start processing the images? That depends a lot on what you're doing, but here I'll try to introduce some basic operations that you can use to build more complex pipelines.

One common operation for reducing the complexity of a stack (at least for visualization purposes) is a max projection. We can do this with a single command!

In [14]:
maxz = tchzstack.max(axis=2)
vu.viewer(maxz, ('t', 'ch'))

We specify axis=2 to tell numpy to take the maximum only over the z-axis. Note we're not using the built-in max function, but rather a method of the ndarray. If we try max(tchzstack), we'll get an error since the built-in Python function lives in one-dimensional world and tries to compare slices, which makes numpy upset.
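If the method syntax feels odd, numpy also provides the same operation as a plain function, and the built-in's complaint looks roughly like this:

maxz = np.max(tchzstack, axis=2)  # equivalent to tchzstack.max(axis=2)
# max(tchzstack)  # raises ValueError: the truth value of an array with more than one element is ambiguous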

Max projection is a little boring, so let's do something more interesting and apply a Gaussian filter. Gaussian filters are often the first step of image processing pipelines since, by blurring the image, they remove noisy pixels which may trip up later steps. As a side note, manipulating an image as part of a segmentation pipeline is entirely acceptable (unless of course you're specifically trying to bias your segmentation to achieve some downstream result). However, once you've extracted the regions of interest from the image, you must use the raw data to calculate any statistics. There are almost certainly exceptions to this rule, but it stands as a general principle.

Before we use a Gaussian filter, let's talk about what a filter actually does. Filtering in image analysis is essentially a kind of local averaging. For every pixel in our image, we look at some neighborhood around that pixel and compute a new value for it based on some kind of weighted average of all the surrounding pixels. In the simplest cases, the neighborhood is a square, and all the pixels in that square are weighted equally. Of course, there's no limit to the complexity of the window or the weighting we use. For example, in a Gaussian filter, the window approximates a circle and the weights are based on a Gaussian (normal) distribution. Formally, the window together with its weighting function is called the "kernel," and the process of applying this kernel to an image is called "convolution."

If all this went over your head, don't worry. It's most intuitive to see this visually. If you Google "kernel convolution" you'll find many diagrams. I also encourage you to check out the slides from this tutorial (https://github.com/WhoIsJack/python-bioimage-analysis-tutorial), which have a step-by-step demonstration of a convolution on a 1D array.
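For a taste without leaving your Python session, here's a minimal 1D sketch using numpy's built-in convolve and a three-pixel box kernel (all neighbors weighted equally):

signal = np.array([0, 0, 10, 0, 0, 10, 10, 0], dtype=float)
kernel = np.ones(3) / 3  # box kernel: average each entry with its two neighbors
np.convolve(signal, kernel, mode='same')  # 'same' keeps the output the same length as the input
# array([0.  , 3.33, 3.33, 3.33, 3.33, 6.67, 6.67, 3.33])  (rounded)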

Anyway, fortunately we don't have to worry about the details of this process, since other people have already coded it for us. Though scipy has an image-processing module, it's a little sparse, so we'll instead be using scikit-image, which has all the same basic functions as scipy, in addition to many others. We'll first test it on the single slice we were using earlier.

In [15]:
%matplotlib inline
image = tchzstack[20, 0, 3]
blur = filters.gaussian(image, sigma=3)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15,9))
ax1.imshow(image);
ax2.imshow(blur);

On the left is the original image, and on the right is the filtered one. You can see it's been aggressively blurred, which was intentionally done to demonstrate its effect. We can always dial it back later by decreasing sigma, which is effectively the radius of our kernel.

You might notice the blurred image appears more intense, which is actually an artifact of how matplotlib displays images. It automatically scales the color bar between the maximum and minimum value. This is usually convenient, but it can sometimes trick our eyes. We can manually set the maximum and minimum value of the color bar with the vmin and vmax parameters of imshow.
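For example, here's a sketch of putting both images on the same color scale. One wrinkle: by default, scikit-image's gaussian converts unsigned-integer images to floats between 0 and 1, so we rescale the raw image the same way before comparing.

from skimage import img_as_float

raw = img_as_float(image)  # rescales unsigned integers to floats between 0 and 1
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 9))
ax1.imshow(raw, vmin=0, vmax=1);
ax2.imshow(blur, vmin=0, vmax=1);  # both color bars now span the same range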

How do we actually decide where to set sigma? From my experience, this is more art than science, unfortunately. I've spent an inordinate amount of time stuck in a loop of applying a filter and displaying the image in an attempt to find the best parameters. I eventually wrote a modified version of the viewer function that streamlined this process. In yet another burst of creativity, I called it viewer_filter. Let's use it on our image and the Gaussian filter. It has a somewhat involved set of parameters, so I'll first write the entire function call and unpack it afterwards.

In [16]:
%matplotlib qt
vu.viewer_filter(tchzstack, filters.gaussian, {'sigma': (0, 10, 5, 0.5)}, labels=('t', 'ch', 'z'))

The first two arguments are fairly straightforward. Notice, however, filters.gaussian does not have parentheses. This means we're referencing the function by name rather than actually executing it. The third argument dictates the slider parameters and is the most complex. It's a dictionary where each parameter of the filter function is keyed by its name to a tuple of (min slider value, max slider value, initial slider value, slider increment). The last argument supplies labels for the sliders that slice the stack.
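Back to that missing pair of parentheses: Python functions are values like any other, so you can pass them around by name. A quick illustration:

f = filters.gaussian  # no parentheses: f is now just another name for the function
f(image, sigma=3)  # parentheses actually call it; identical to filters.gaussian(image, sigma=3)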

There are also vmin and vmax arguments that were my attempt to rectify the normalization issues I mentioned previously, but I still have to work out some bugs before they're usable.

As a quick note, scikit-image's filters generalize to ndarrays. This means filtering a single image will yield a different result than filtering a z-stack, since the latter is averaging over the additional z dimension. If you feed the filter an entire tchzstack, it will average over time and channels too, which is probably not what you want, so be mindful of your inputs.
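If you do want to blur only within each xy plane of a z-stack, one option (a sketch; double-check against your scikit-image version) is a per-axis sigma with 0 along z, which skips smoothing across slices:

zstack = tchzstack[20, 0]  # a single z-stack: (z, y, x)
blur3d = filters.gaussian(zstack, sigma=3)  # blurs across z, y, and x together
blur2d = filters.gaussian(zstack, sigma=(0, 3, 3))  # sigma of 0 along z: each slice is blurred independently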

vu has a second viewer_filter-style function called viewer2_filter, which presents a before-and-after view.

In [17]:
vu.viewer2_filter(tchzstack, filters.gaussian, {'sigma': (0, 10, 5, 0.5)}, labels=('t', 'ch', 'z'))

We can combine filters in increasingly complex ways as well! Check out the following two, which implement a nearly complete thresholding pipeline.

In [18]:
def DoG(im, s1, s2):  # difference of gaussians (contrast enhancer)
    return filters.gaussian(im, s1) - filters.gaussian(im, s2)

vu.viewer_filter(tchzstack, DoG, {'s1': (0, 5, 0.1, 0.1), 's2': (0, 10, 0.5, 0.5)}, ('t', 'ch', 'z'))
vu.viewer_filter(tchzstack, lambda im, s1, s2, thresh: DoG(im, s1, s2) > thresh, {'s1': (0, 5, 0.1, 0.1), 's2': (0, 10, 0.5, 0.5), 'thresh': (0, 0.25, 0, 0.005)}, ('t', 'ch', 'z'))

Conclusion

Anyway, I'll bring it to a close here. We've really only scratched the surface of image analysis so far. In the future, hopefully we can continue to dive into more advanced filters, including morphological filters, applying filters and thresholds in bulk, and converting raw microscopy data to RGB images.