My 2nd year of grad school I took a fantastic course called Computational Methods in Population Biology (ECL 233), taught by one of my co-advisors, Sebastian Schreiber, and one of my QE committee members, Marissa Baskett. The course was split pretty evenly between ecologists and applied math students, and focused on applied mathematical modeling concepts. As a quantitative ecologist with relatively sparse formal mathematical training but pretty solid computational/R skills, I found this course incredible. Being able to implement models in R meant I could poke and prod them, changing parameters or investigating intermediate values. This helped me translate the more formal mathematical models into data I could work with like any other. Richard McElreath notes this same pedagogical benefit of sampling from the posterior distribution in Bayesian statistics in his fantastic book, Statistical Rethinking. (I'm not usually one to fawn over academic stuff, but Statistical Rethinking absolutely changed the way I think. Not only did it give me a deep appreciation and intuition for Bayesian data analysis, it's an absolute pedagogical marvel. If you haven't read it yet, run, don't walk, to grab a copy.)
Back when I took ECL 233, I was a relatively confident R user, but nowhere near as competent as I am now. In particular, I've come to embrace a tidier approach to working with data, trying to keep things in dataframes/tibbles as much as possible. This has been remarkably slick when simulating data to test out statistical models, or for more formal simulation-based calibration (perhaps the topic of a later blog post). You set up parameters in their own columns, so each row has a full set of parameters necessary to generate data, then you store the resulting data in a list-column.
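To make that concrete, here's a minimal sketch of the parameters-in-columns, data-in-a-list-column pattern. It assumes the tidyverse packages (tibble, dplyr, tidyr, purrr) are installed; the parameter names and the normal-draws "model" are just placeholders for illustration.

```r
library(tibble)
library(dplyr)
library(tidyr)
library(purrr)

# One row per parameter combination; each row carries everything
# needed to generate a dataset (mu, sigma, n are illustrative names)
params <- expand_grid(mu = c(0, 1), sigma = c(0.5, 1), n = 100)

# Simulate a dataset for each row and tuck it into a list-column,
# so the data stays neatly attached to the parameters that made it
sims <- params |>
  mutate(data = pmap(list(mu, sigma, n),
                     \(mu, sigma, n) rnorm(n, mean = mu, sd = sigma)))
```

From here, `sims` is a single tibble you can filter, plot, or unnest, rather than a pile of loose vectors.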
In discrete-time population models, you can't vectorize everything, since each population size is calculated from the population size at the previous time step, so you've gotta use a for loop at some point. What this often means is that population simulations are stored in matrices or arrays; for example, you might have a matrix where each column corresponds to an r value for your model and each row corresponds to a time step. You then use a for loop to generate the time series for each r value's column. R is pretty nice for working with matrices and arrays, and they'll often be the fastest/most efficient way to implement big ole models. But in some cases, it would be really nice to use a tidier approach, for the sake of plotting and organization, with all your parameters neatly associated with all your simulated outcomes in a single tibble. I hate having a bunch of disparate vectors and matrices floating around for a single model, so this approach really appeals to me.
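For reference, here's a sketch of the matrix-based approach described above, using the Ricker model N[t+1] = N[t] * exp(r * (1 - N[t]/K)) with K = 1. The specific r values, time horizon, and initial condition are illustrative assumptions, not values from the original assignment.

```r
r_vals  <- c(1.5, 2.3, 2.7)  # one column per r value
n_steps <- 50                # one row per time step

# Rows are time steps, columns are r values
N <- matrix(NA_real_, nrow = n_steps, ncol = length(r_vals))
N[1, ] <- 0.1  # initial population size

# The for loop over time is unavoidable (each step depends on the last),
# but each iteration is vectorized across all r values at once
for (t in 1:(n_steps - 1)) {
  N[t + 1, ] <- N[t, ] * exp(r_vals * (1 - N[t, ]))
}
```

This runs fast, but notice that the r values live in one object and the trajectories in another, which is exactly the disparate-objects clutter a tidy approach avoids.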
To demonstrate this approach, I reformatted one of my ECL 233 homework assignments looking at the Ricker model (see Ricker's original paper) and its ability to generate C H A O S. I'll show how a tidy approach makes it easy to explore different parameter values, plot our results, and calculate various properties of our model.