Approaching Large Computational Projects

How to Approach a Large Computational Research Project

Below are some general strategies and tips for approaching the design of a large computational data analysis project. These strategies are by no means standards for what you should do; they are suggestions. Every project, person, and team approaches computational work differently, so feel free to judge for yourself whether these work for you. What follows is the culmination of my own experiences and the suggestions that came up during an Eisen Lab Coding Club discussion on May 26, 2020.

How I Think of Designing a Workflow

Below is a figure from a paper I wrote with Sara Stoudt and Valeri Vasquez at the Berkeley Institute for Data Science (releasing soon!). This figure illustrates how I see data analysis actually occurring in academic data science. You iterate through phases that are defined by who the audience for your analysis code is. At any phase you can develop research products. The whole workflow, and even the design of the workflow, constitutes research.

Workflow vs Pipeline Terminology:

General Computational Project Workflow Tips

Image Analysis Workflow Tips

Overly simplified Image Analysis steps:

  1. Input image and explore
  2. Reduce noise and segment a feature (back and forth)
  3. Analyze
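
The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a recommended pipeline: the synthetic image, the 3x3 median filter, the fixed 0.5 threshold, and every function name (`make_test_image`, `explore`, `median_filter`, `segment`, `analyze`) are hypothetical choices made for the example. It uses only the standard library so it runs anywhere; on real data you would more likely reach for a dedicated image analysis library.

```python
from statistics import mean, median


def make_test_image(size=32):
    """Step 1 (input): a synthetic grayscale image standing in for a real file."""
    image = [[0.1] * size for _ in range(size)]
    for r in range(10, 20):  # one bright 10x10 square is the "feature"
        for c in range(10, 20):
            image[r][c] = 0.9
    # A few isolated noisy pixels at fixed positions, so the run is reproducible.
    for r, c in [(2, 3), (5, 25), (28, 7), (25, 25), (15, 15)]:
        image[r][c] = 0.1 if image[r][c] > 0.5 else 0.9
    return image


def explore(image):
    """Step 1 (explore): summary statistics to get a feel for the intensities."""
    pixels = [p for row in image for p in row]
    return {"min": min(pixels), "max": max(pixels), "mean": mean(pixels)}


def median_filter(image, radius=1):
    """Step 2 (reduce noise): replace each pixel by the median of its neighborhood."""
    n_rows, n_cols = len(image), len(image[0])
    out = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            # The window is clipped at the image border.
            window = [
                image[rr][cc]
                for rr in range(max(0, r - radius), min(n_rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(n_cols, c + radius + 1))
            ]
            row.append(median(window))
        out.append(row)
    return out


def segment(image, threshold=0.5):
    """Step 2 (segment): boolean mask of pixels brighter than the threshold."""
    return [[p > threshold for p in row] for row in image]


def analyze(mask):
    """Step 3 (analyze): area and centroid of the segmented pixels."""
    coords = [(r, c) for r, row in enumerate(mask) for c, on in enumerate(row) if on]
    area = len(coords)
    if area == 0:
        return {"area": 0, "centroid": None}
    centroid = (sum(r for r, _ in coords) / area, sum(c for _, c in coords) / area)
    return {"area": area, "centroid": centroid}


image = make_test_image()
print("explore: ", explore(image))
print("raw:     ", analyze(segment(image)))
print("denoised:", analyze(segment(median_filter(image))))
```

Comparing the raw and denoised measurements shows why step 2 goes back and forth: the median filter removes the isolated noisy pixels, but it also rounds off the corners of the square, so the choice of filter changes what the segmentation reports. Checking the mask after each adjustment is part of the loop.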

Tips