*[This letter is part of the Little Letter Republic, a project whose purpose is to build community in St. Louis and beyond.]*

Dear Sebastian,

I really enjoyed our discussion. In particular I’ve been thinking a lot about what you called statistics by logic, or perhaps verbal statistics. When we spoke I noted how computation has expanded the pedagogical frontier by allowing “learning by doing” for math & stats via scripting. More specifically, I think discretization is an incredibly promising path for teaching a much wider population of students the core concepts of statistical thinking (and calculus and differentiation, with statistics as the “practical application”). I’m not the only one thinking this way – see e.g. Think Bayes – and of course this is prompted by the realization that a tremendous amount of real-world computation is done via discretization and simulation.

However, our conversation pushed me to think more broadly – not just about what could be taught effectively if a student learns a little scripting, but also about what a student can learn simply through language and imagery.

I’m really coming around to the idea that the core basic ideas of statistical reasoning can be taught through discrete and empirical concepts such as histograms and empirical cumulative distribution functions. Not just univariate statistics, but multivariate statistics, which are critical for thinking about “real world” statistical modeling. Much of statistical modeling is about capturing the joint and conditional distributions of data, and what I call “F(X), g(F(X)), and F(g(F(X)))” – that is, distributions, functions of distributions, and distributions of functions of distributions (much of inference is about how functions of distributions are themselves distributed).
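To make that concrete: a histogram is just counting, and an empirical CDF is just a running fraction – nothing beyond division is needed. A tiny Python sketch, with made-up numbers:

```python
from collections import Counter

# A small sample of discrete outcomes (hypothetical data).
sample = [2, 3, 3, 4, 4, 4, 5, 5, 6, 6]

# Histogram: just count how often each value appears.
hist = Counter(sample)

# Empirical CDF: for each value, the fraction of observations <= it.
n = len(sample)
ecdf = {v: sum(c for u, c in hist.items() if u <= v) / n
        for v in sorted(hist)}

print(hist)   # counts per value
print(ecdf)   # cumulative fractions, ending at 1.0
```

The same two structures generalize directly to the 2-D (joint) case by counting pairs instead of single values.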

The more I think about it, the more I think that the foundational concepts for practical real-world statistical reasoning can be effectively taught to students with even a fairly low level of mathematical background, say addition/subtraction and multiplication/division.

To be clear, I have in mind *laying conceptual foundations*, not teaching students how to prove results. The key to thinking about the world in statistical terms is to think about the world as full of potential *counterfactuals*, realizing that any such counterfactuals are inherently noisy, and understanding that ‘random’ doesn’t mean ‘hopelessly unknowable’ – rather, we can (and do!) know quite a lot about randomness. The world shifts into probabilities, and decision-making shifts into “how much probability is needed before a decision can be made?”

The goal of this “conceptual foundations” instruction would be to ‘start with the why’ – “you’re going to be learning probability at some point (perhaps integration and differentiation as well) – why?” The math of probability can be dry; the goal of this conceptual foundation would be to provide the motivation and inspiration to learn it.

I have in mind that ‘conceptual foundations’ could be something like a sequence, perhaps following the “F, g(F), F(g(F))” path – first basic distributional qualities (what a RV *is*, shapes of distributions, percentiles, and concepts such as mean, median, variance, correlation + multivariate distributions), then functions of distributions (revisit mean, var, cov – ‘think back and notice – these are all functions!’ – plus methods for capturing joint and conditional distributions {OLS, ML methods}), and then distributions of functions of distributions (all via resampling – revisit mean, var, cov, OLS, ML – “look, these are random themselves! What should we do about that? Well, go back to the beginning…”). This also naturally introduces the idea of counterfactual reasoning, though that could be introduced earlier and pointed back to here.

I really do think that a *lot* of these things can be taught at a conceptual level with discrete distributions and pictures. Depending on the amount of time available, I think some of these things can be directly illustrated by exercises – for example, by playing games or having contests (even ‘against nature’). Practical, interesting projects where groups of students determine, e.g., which of two random processes has the greater mean could be run (e.g. build little Popsicle-stick catapults, then measure their ranges repeatedly, then determine whether those ranges can be told apart statistically – *then* have a game where you use the little catapults; nothing focuses the mind like a little competition! …or see the Appendix below for even simpler variations on this theme).
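For what it’s worth, “can these ranges be told apart?” can itself be answered with nothing more than shuffling and averaging – a permutation test. A Python sketch with invented catapult measurements:

```python
import random

random.seed(0)

# Hypothetical catapult range measurements (cm) for two teams.
team_a = [112, 118, 121, 109, 125, 117, 120, 115]
team_b = [104, 110, 108, 113, 106, 111, 107, 109]

observed = sum(team_a) / len(team_a) - sum(team_b) / len(team_b)

# Permutation test: if the two catapults were really the same, shuffling
# the team labels shouldn't change the gap between the averages much.
pooled = team_a + team_b
count = 0
trials = 10_000
for _ in range(trials):
    random.shuffle(pooled)
    a, b = pooled[:8], pooled[8:]
    if abs(sum(a) / 8 - sum(b) / 8) >= abs(observed):
        count += 1

p_value = count / trials
print(observed, p_value)  # small p_value: the gap is hard to get by chance
```

Everything here is counting, adding, and dividing – no distribution theory required.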

There are other practical experiences that could be incorporated – ‘eyeballing’ data and seeing if something looks strange; making forecasts, both informally (e.g. as part of expected-value reasoning in real life) and formally (assessing forecast errors, for example). A number of these things can be illustrated visually and broken down into discrete steps, making them accessible much earlier than usual.

Many of these ideas are taught at the college or graduate level (or beyond) – I think these key concepts can be taught at the elementary level.

To be clear, the goal is to teach the *reasoning* from elementary concepts, to give students a vision of what is possible, give them a ‘why’ for all the perhaps drier math and stats that they will learn in more detail in later courses. I personally find it much easier to learn a hard thing when I know *why* I’m learning it, when I’ve had some of the inspiration of seeing the power of a tool put to use – I’ve been able to plough through much more difficult material when I know *why* I’m doing it.

Beyond that however, I’ve found statistical and counterfactual reasoning to be incredibly powerful tools in making my own life decisions and learning about what is happening in the world. Statistics and data are increasingly the languages of knowledge, and this will only increase as we move towards the future. We also need leaders who can think in statistical terms, and the earlier we can teach the concepts the better.

Finally, I think these foundations can be extended – to return to my earlier idea that scripting can greatly extend one’s ability to learn, if what I have described above is “foundational statistical concepts”, a follow-up sequence could be something like “foundational statistical computation”, which uses textbooks such as Think Bayes, Cosma Shalizi’s Advanced Data Analysis from an Elementary Point of View, and Efron and Hastie’s Computer Age Statistical Inference to implement many of the concepts explored in the “foundational concepts” sequence (and might require only spreadsheets as a minimum). But that is a conversation for another time…

Best, Nate

Appendix A – Some Ideas for Basic Concepts

I wanted to jot down some ideas about teaching basic concepts quickly.

I really do think that statistics can be “taught backwards” in a sense – start from higher-level concepts and big pictures and trace out the major ideas first via illustration. For example, I think the concept of a distribution can be illustrated quite naturally with histograms (discrete distributions), and this can almost immediately be used to illustrate variance and correlations (for example a 2-D distribution of, e.g., age and height). Or even start simpler, with extremely simple games played with dice before progressing to real-life data. Dice alone could illustrate the ideas of randomness (one die), histograms (2 dice), mean, variance, covariance, and broad shapes of distributions (skewed vs. non-skewed).

Those concepts could be taught first with things like dice and framed as binary choices: “compare playing a game with a die that is normal, 1,2,3,4,5,6, vs an opponent who gets to use 2,3,4,5,6,7 – which would you choose?” Varying the mean and varying the variance could each be made very explicit, e.g. comparing a game played with 1,2,3,4,5,6 vs 1,1,1,1,1,6 (a lower mean, with most rolls a 1) vs 1,1,1,1,1,16 (the same average as a regular die, but far higher variance: “if all you care about is the average, which die would you choose, or are you indifferent?” “If you liked a game with higher variance, which would you choose?” …etc.). Single and multiple dice would provide natural ways to talk about shapes of distributions (the PMF of 2 regular dice is a symmetric distribution, while e.g. 1 regular die plus a 1,1,1,1,1,6 die would produce a skewed distribution).
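Those dice comparisons can be checked with nothing but the four basic operations – here is a quick Python sketch (the 1,1,1,1,1,16 die really does match the regular die’s average):

```python
# Exact mean and variance of each die, using only add/subtract/multiply/divide.
def mean(faces):
    return sum(faces) / len(faces)

def variance(faces):
    m = mean(faces)
    return sum((f - m) * (f - m) for f in faces) / len(faces)

regular = [1, 2, 3, 4, 5, 6]
shifted = [2, 3, 4, 5, 6, 7]    # same shape, higher mean
lumpy   = [1, 1, 1, 1, 1, 6]    # lower mean, most rolls are a 1
wild    = [1, 1, 1, 1, 1, 16]   # same mean as a regular die, far higher variance

for name, faces in [("regular", regular), ("shifted", shifted),
                    ("lumpy", lumpy), ("wild", wild)]:
    print(name, mean(faces), variance(faces))
```

A student who can divide can verify each claim by hand; the code just speeds up the arithmetic.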

Depending on the students’ ages and the time set aside for learning, students could actually play common games, but beforehand choose which die or combination of dice they’d like to use – getting a visceral experience of a “same mean but high variance” die vs a “regular” die, or of skewed vs symmetric distributions.

This could lead naturally to discussions about more real-world distributions – e.g. population data, with marginal and joint distributions of height, weight, and age for example. I’ve already been using histograms and demographic examples to casually teach the basics of mean, variance, correlation, etc. to friends and family, and it works quite well.

On more sophisticated modeling of joint distributions:

At the end of the day a lot of statistical modeling is about capturing joint and conditional distributions of data. When statistics was invented, the profession had to use incredibly clever analytical tricks to do this; thus OLS regression is the BLUE (best linear unbiased estimator) of the conditional mean of a joint distribution. But with modern computation we can capture many of these things directly. Part of the success of machine learning has come from exploiting the various ways computation can directly capture these joint distributions, *and* the construction of these estimators is often simpler to learn than the analytics and assumptions needed to understand OLS. (For example, K-nearest neighbors *directly* captures the idea of the conditional mean by simply taking a local average, rather than working through the tooling of OLS. Of course this requires big data and big computation, but we’re getting more and more of both every day.)
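A minimal sketch of that “local average” idea, with invented age/height numbers – no OLS machinery, just sorting by distance and averaging:

```python
# k-nearest-neighbors as a "local average": to estimate E[Y | X = x],
# average the y-values of the k observations whose x is closest to x.
def knn_mean(xs, ys, x, k=3):
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x))[:k]
    return sum(ys[i] for i in nearest) / k

# Hypothetical (age, height-in-cm) pairs.
ages    = [5,   6,   7,   8,   9,   10,  11,  12]
heights = [110, 115, 121, 128, 133, 138, 143, 149]

# Estimated average height around age 8: the mean of the 3 closest ages.
print(knn_mean(ages, heights, 8))
```

The whole estimator is “find the nearby points, take their average” – a conditional mean a student can see being computed.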

Again, I’m not the only one thinking this way – Cosma Shalizi’s Advanced Data Analysis from an Elementary Point of View points this out explicitly in the first chapter, and Efron and Hastie’s Computer Age Statistical Inference can be seen as the simulation-driven version of this idea for inference.

These kinds of “looks to the future” are important for illustrating an idea to an aspiring student – the “concepts” course wouldn’t endeavor to teach *these* things, but rather, when the question arises, “wait, can we take the average in a way that incorporates many things at once?” – one can answer, “yes, that’s what regression does, and that’s what ML methods do …quick illustration… and you’ll learn about that in a lot more detail in the future if you’re interested! Good question!”

But again, I think the main ideas of nearly all commonly used statistics and inference can be taught with addition/subtraction and multiplication/division. The only other concept I’d add is “resampling”, which I think is also very straightforward to describe.
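Resampling itself needs only drawing-with-replacement plus averages. A small Python sketch, with made-up data, of how the mean of resampled data is itself distributed:

```python
import random

random.seed(1)

# Resampling: draw from the data *with replacement*, recompute the mean,
# and watch how the mean itself varies -- "functions of distributions
# are themselves distributed."
data = [2, 3, 3, 5, 6, 8, 9, 12]  # hypothetical measurements
n = len(data)

boot_means = []
for _ in range(5000):
    resample = [random.choice(data) for _ in range(n)]
    boot_means.append(sum(resample) / n)

boot_means.sort()
# A simple 90% interval: the 5th and 95th percentiles of the resampled means.
lo, hi = boot_means[int(0.05 * 5000)], boot_means[int(0.95 * 5000)]
print(sum(data) / n, lo, hi)
```

Every step is something already taught earlier in the sequence – sampling, averaging, and reading off percentiles.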

Finally, one critical skill is being able to formulate an idea as a prediction that can be measured, put down probabilities, and then measure the outcome and check your ‘forecast error’ – a major role of forecasting is simply framing your ideas in a way you can measure and learn from, an important skill to learn at any stage.
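One standard way to check such probability forecasts is the Brier score – the average squared gap between the stated probability and what actually happened. A sketch with invented forecasts:

```python
# Brier score: mean squared difference between forecast probability
# and the 0/1 outcome. Lower is better.
def brier(forecasts, outcomes):
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical forecasts (probability the event happens) and outcomes (1/0).
probs    = [0.9, 0.7, 0.2, 0.6, 0.1]
happened = [1,   1,   0,   1,   0]

print(brier(probs, happened))      # this forecaster's score
print(brier([0.5] * 5, happened))  # vs. someone who always says 50/50
```

Comparing yourself against the “always 50/50” baseline makes forecast error tangible even for young students: again, only subtraction, multiplication, and division.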