About Me

I'm a PhD student at the University of Toronto. I'm part of the Computer Science Department, and I research computational biology under Alan Moses.

I focus on unsupervised machine learning techniques for the analysis of microscopy images. I believe that microscopy images contain rich information about biology, but they're underused because analysis of these images has traditionally been subjective and time-consuming, requiring biologists to look at each image manually. This approach is incompatible with current technologies, where robots can take hundreds of thousands of images in a single experiment. I develop ways for computers to "look" at these images, automatically discovering interesting biology for us. In some cases, the computer can identify patterns that are too complex for the human eye to pick out, or organize its findings systematically so that we can draw novel biological insights from them. This allows us to discover new biology from microscopy images in an objective and systematic way.

Any questions or comments? Feel free to get in touch with me by email:

My Research Projects

Learning unsupervised feature representations for single cell microscopy images with paired cell inpainting (2018 - Preprint)

Citation: Lu AX, Kraus OZ, Cooper S, Moses AM. Learning unsupervised feature representations for single cell microscopy images with paired cell inpainting. BioRxiv 2018 Aug 20
Link: https://www.biorxiv.org/content/early/2018/08/20/395954

One of the ways biologists extract information from microscopy images is by using features, or measurements taken on the image. For example, if you're interested in studying protein subcellular localization, or what part of a cell a protein is located in (e.g. the nucleus or the cytoplasm), you might measure the distance of a protein (the green blobs in the pictures below) from the edge of the cell (in red). Or you might come up with other measures, like the size of these blobs, or how bright they are, or how many there are. Most of the time, humans have to manually design these measurements. However, not only is the process of designing features time-consuming, but hand-designed features aren't guaranteed to measure the interesting biology in the images very well, because it's hard for humans to define metrics that are complex, robust, and informative.
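To make the idea of a hand-designed feature concrete, here is a minimal sketch in plain Python. It is only an illustration, not code from the paper: the function name `simple_features`, the intensity `threshold`, and the tiny 5x5 "image" are all assumptions made up for this example. It computes three of the measurements mentioned above for a single cell: how big the bright protein region is, how bright it is, and how far it sits from the edge of the cell.

```python
import math


def simple_features(protein, cell_mask, threshold=0.5):
    """Compute three hand-designed features for one (toy) cell image.

    protein   -- 2D list of protein-channel intensities (the "green" signal)
    cell_mask -- 2D list of booleans, True where a pixel is inside the cell
    threshold -- hypothetical cutoff above which a pixel counts as "bright"
    """
    rows, cols = len(protein), len(protein[0])

    def on_edge(r, c):
        # A cell pixel is on the edge if any of its 4 neighbours
        # falls outside the image or outside the cell.
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            outside_image = not (0 <= rr < rows and 0 <= cc < cols)
            if outside_image or not cell_mask[rr][cc]:
                return True
        return False

    cell = [(r, c) for r in range(rows) for c in range(cols) if cell_mask[r][c]]
    edge = [p for p in cell if on_edge(*p)]
    blob = [(r, c) for r, c in cell if protein[r][c] > threshold]
    if not blob:
        return {"blob_area": 0, "mean_brightness": 0.0, "mean_edge_distance": 0.0}

    # Average intensity over the bright pixels.
    brightness = sum(protein[r][c] for r, c in blob) / len(blob)
    # Average distance from each bright pixel to its nearest edge pixel.
    edge_distance = sum(
        min(math.hypot(r - er, c - ec) for er, ec in edge) for r, c in blob
    ) / len(blob)
    return {
        "blob_area": len(blob),
        "mean_brightness": brightness,
        "mean_edge_distance": edge_distance,
    }


# Toy example: a square 5x5 cell with a single bright pixel in the middle.
mask = [[True] * 5 for _ in range(5)]
image = [[0.0] * 5 for _ in range(5)]
image[2][2] = 1.0
print(simple_features(image, mask))
```

Even in this toy version, a human had to decide what counts as "bright" and which measurements are worth taking, which is exactly the manual design step described above.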

In this study, we asked, what if a computer could automatically learn how to design features for extracting insights about protein subcellular localization from images of cells? To "teach" a computer how to do this, we asked it to learn how to solve a simple problem: we gave the computer an image of a cell, and another image of the shape of a different cell, and asked the computer to predict what the protein in the first cell would look like if it were expressed in the second cell. By learning to solve this problem, the computer learns features that are much better at categorizing biology in images than human-designed features.

Our method isn't the only method that can teach computers how to design features. Another way is to have a human show the computer images, and tell the computer what each image is (e.g. if it's of a cat, or a dog, or an airplane). Compared to this approach, ours learns features that aren't as good. But these methods require the human to manually label every image the computer is given in its learning process - usually, a computer will need anywhere from tens of thousands to millions of images to learn good features. In contrast, our approach is fully automated, so the computer teaches itself without any human guidance. This represents a massive time savings for humans - imagine having to look at a million images and tell the computer what each of them is!

An Unsupervised kNN Method to Systematically Detect Changes in Protein Localization in High-Throughput Microscopy Images (2016)

Citation: Lu AX, Moses AM. An Unsupervised kNN Method to Systematically Detect Changes in Protein Localization in High-Throughput Microscopy Images. PLoS One 2016 Jul 21;11(7):e0158712.
Link: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0158712
Code: https://github.com/alexxijielu/protein_change_profiles

The localization of a protein within a cell is often very important to its function. For example, transcription factors are active when they're in the nucleus of the cell, but not when they're in the cytoplasm. We can study protein localization changes in response to drug treatments, environmental stresses, or genetic mutations, using high-throughput microscopy screens. In these screens, every single protein has been individually attached to a fluorescent marker, and imaged under a microscope. Then, given a screen where cells have been grown in "normal" conditions, and another screen where cells have been treated with a drug, we can look at the images to see if the protein has changed localization or not. Doing this lets us understand how a cell reconfigures its entire set of proteins in response to different conditions, giving us insight into how cells adapt to different environments.

Because there are so many proteins (~6,000 in the yeast proteome), looking at all of these images individually is time-consuming. We want to use computers to do this for us. However, sometimes different screens will have different effects on cells - for example, drugs might cause the shape of cells to change - that will confuse the computer and make it report that it's found changes when there aren't really any. We developed a machine learning algorithm to correct for these systematic effects, so the changes found by the computer are mostly real localization changes. Importantly, our method is unsupervised. This means that the user does not need to tell the computer what the systematic differences between screens are - the computer automatically learns these from the data itself. Compared to other methods, this makes ours easier for biologists to apply to their own data without a lot of additional work.
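Here is a rough sketch of how nearest neighbours can be used to cancel out a screen-wide effect. This is a simplified illustration under my own assumptions, not the exact algorithm from the paper (see the code link above for the real thing): the function name `local_change_scores`, the choice of `k`, and the made-up feature vectors are all hypothetical. The idea is that proteins that looked similar in the untreated screen should be affected by a systematic effect in a similar way, so a protein's change is only interesting if it differs from the change seen in its neighbours.

```python
import math


def local_change_scores(reference, treated, k=2):
    """Score each protein by how much its change differs from its neighbours'.

    reference, treated -- dicts mapping protein name -> feature vector (list of
                          floats) in the untreated and treated screens
    k                  -- number of nearest neighbours (hypothetical default)
    """
    def euclid(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    names = sorted(reference)
    # Raw change of every protein between the two screens.
    raw = {n: [t - r for r, t in zip(reference[n], treated[n])] for n in names}

    scores = {}
    for n in names:
        # The k proteins that looked most similar to n in the reference screen.
        neighbours = sorted(
            (m for m in names if m != n),
            key=lambda m: euclid(reference[n], reference[m]),
        )[:k]
        dims = len(raw[n])
        # The change we would expect from systematic effects alone.
        expected = [sum(raw[m][d] for m in neighbours) / k for d in range(dims)]
        # What is left over after removing the expected change.
        scores[n] = euclid(raw[n], expected)
    return scores


# Toy data: every protein shifts by (10, 10) because of a screen-wide effect,
# but protein "D" also shifts by an extra (30, 0).
reference = {"A": [0.0, 0.0], "B": [1.0, 0.0], "C": [0.0, 1.0], "D": [1.0, 1.0]}
treated = {"A": [10.0, 10.0], "B": [11.0, 10.0], "C": [10.0, 11.0], "D": [41.0, 11.0]}
scores = local_change_scores(reference, treated)
print(max(scores, key=scores.get))
```

In this toy example the shared (10, 10) shift cancels out and "D" gets the highest score. With so few proteins, the neighbours of "D" also pick up a smaller score, because "D" is included in their expected change.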

Integrating images from multiple microscopy screens reveals diverse patterns of change in the subcellular localization of proteins (2018)

Citation: Lu AX, Chong YT, Hsu IS, Strome B, Handfield LF, Kraus O, Andrews BJ, Moses AM. Integrating images from multiple microscopy screens reveals diverse patterns of change in the subcellular localization of proteins. eLife 2018;7:e31872.
Link: https://elifesciences.org/articles/31872
Protocol and Data: https://bio-protocol.org/e3022

In the project above, I showed how we could automatically discover protein localization changes between different screens. The next question was then to ask - what would happen if we had a whole bunch of different screens, reflecting different stresses and drug treatments? We reasoned that we might discover interesting things about proteins that would not be obvious if we looked at any single screen alone. For example, if a protein reacted to many different stresses in the same way, it might be responsible for general responses to stress. In contrast, if a protein reacted to only one stress, it might be a specific kind of protein that only responds in one type of environment. Finding these distinctions deepens our knowledge about how proteins work, and also provides important clues for developing drug treatments. For example, if you wanted to come up with an anti-cancer drug, you'd want it to target a protein that only responds in cancer cells specifically, but not in the normal, diverse conditions that a cell might encounter.
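The distinction between general and specific responders can be sketched as a simple count over screens. This is a toy illustration, not the analysis from the paper: the function name `classify_response` and the `general_fraction` cutoff are assumptions, and it only checks whether a protein changed in each screen, not whether it changed in the same way.

```python
def classify_response(changed, general_fraction=0.5):
    """Label a protein by how many screens it changed localization in.

    changed          -- list of booleans, one per screen
    general_fraction -- hypothetical cutoff for calling a response "general"
    """
    n_changed = sum(changed)
    if n_changed == 0:
        return "no change"
    if n_changed == 1:
        return "specific"
    if n_changed / len(changed) >= general_fraction:
        return "general"
    return "intermediate"


# A protein that changed in only one of 24 screens.
print(classify_response([True] + [False] * 23))
```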

We analyzed the entire yeast proteome in 24 different screens - integrating over 600,000 different images in this study. We found a lot of cool, unexpected patterns in the ways different proteins responded - for example, in addition to finding specific and general responses to stress, we also found proteins that behaved in really different ways in different stresses, and unexpected changes in proteins that were thought to do one thing, but whose changes implied that they function in other ways too. These analyses help us come up with new biological hypotheses, highlighting biology that biologists wouldn't have noticed without these types of big, systematic analyses.