As two researchers at Carnegie Mellon said so eloquently in a recent paper, "One of the major pursuits of science is to explain complex phenomenon in simple terms."

Over the past month I've seen this "simple" theme play out in a number of ways. Even in entirely different situations, the end goal is the same: take a highly complicated slew of information and mold it into something usable. A prime example arises with high-dimensional data sets, but I've also heard it discussed at length in the context of clinical applicability.

John P. Cunningham and Byron M. Yu's paper gives a great explanation of when dimensionality reduction can cut through an enormous cloud of data and pull out the meaningful features. Their research is in neuroscience, so the need to condense input without losing the connections between the neurons is key. In their paper, *Dimensionality reduction for large-scale neural recordings* (Nature Neuroscience, August 2014), the authors discuss why and how dimensionality reduction can pull out underlying explanatory variables, often referred to as latent variables. The process allows for sanity-checking visualizations as well as data-driven hypothesis development.

Interestingly, I also saw a very similar rationale exercised by the Agency for Healthcare Research and Quality (AHRQ) in developing a condensed socioeconomic index score. Given a huge number of potential demographic variables from which to choose, the AHRQ turned to Principal Components Analysis (PCA) to reduce the dimensionality of the selection and then weight the resulting variables. Very different application, but very similar desired outcome.

I mentioned clinical applicability because it has very strong parallels to dimensionality reduction: take a very dynamic picture and distill it down to a simple result. For example, given the dose-response relationship and pharmacokinetics of drug X, how much should be administered? Some situations call for categorical or dichotomized dependent variables, while others involve more sophisticated modeling that comes down to p-values. It's an interesting dialogue that doesn't have one right answer. Nevertheless, it's an important theme to keep in mind when thinking about how we ask questions.
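To make the PCA idea concrete, here is a minimal sketch of condensing several correlated variables into a single index score via the first principal component. This is not the AHRQ's actual procedure or data; the toy data and every variable name below are made up purely for illustration, using plain NumPy.

```python
import numpy as np

# Toy stand-in for a demographic table: rows are regions, columns are
# four correlated socioeconomic measures driven by one shared factor.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))                         # shared latent factor
noise = rng.normal(scale=0.3, size=(200, 4))             # per-column noise
X = np.hstack([base + noise[:, [i]] for i in range(4)])  # 4 correlated columns

# PCA by hand: center the data, then take the SVD.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Fraction of total variance captured by each component.
explained = s**2 / np.sum(s**2)

# A one-dimensional "index score": project each row onto the first
# principal component, whose loadings (Vt[0]) act as the weights.
index_score = Xc @ Vt[0]

print(f"variance explained by PC1: {explained[0]:.2f}")
```

Because the four columns share one underlying factor, the first component captures most of the variance, and its loadings play the role of the data-driven weights mentioned above: a large cloud of correlated measures collapses into one usable number per row.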