A tidy framework and infrastructure to systematically assemble spatio-temporal indexes from multivariate data
Indexes are useful for summarising multivariate information into a single metric for monitoring, communicating, and decision-making. While most work has focused on defining new indexes for specific purposes, most indexes are not designed and implemented in a way that makes it easy to understand how they behave under different data conditions, or to determine how their structure affects their values and the variation in those values. I developed a modular data pipeline for assembling indexes that allows index behavior to be investigated as part of the development procedure. One can compute indexes with different parameter choices, adjust steps in the index definition by adding, removing, and swapping them to experiment with various index designs, calculate uncertainty measures, and assess the robustness of indexes. Figure 1 shows the Global Gender Gap Index, composed of four dimensions (economy, education, health, and politics) in a linear combination with equal weights of 0.25. The tour animation shows how the index values and country rankings change as the weight assigned to the politics dimension changes. This work has been published in the Journal of Computational and Graphical Statistics.
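As a rough illustration of the weighting idea described above, the following base R sketch computes a linear-combination index and re-ranks countries as the politics weight varies. The country labels, dimension scores, and the helper names compute_index() and politics_weights() are hypothetical, chosen only for this example; they are not real Global Gender Gap Index data and not the interface of the published pipeline.

```r
# Hypothetical dimension scores for three made-up countries
# (illustrative only, not real Global Gender Gap Index data)
scores <- data.frame(
  country   = c("A", "B", "C"),
  economy   = c(0.70, 0.60, 0.80),
  education = c(0.90, 0.95, 0.85),
  health    = c(0.97, 0.96, 0.98),
  politics  = c(0.50, 0.20, 0.10)
)

# Linear-combination index: weighted sum of the named dimensions
compute_index <- function(data, weights) {
  stopifnot(abs(sum(weights) - 1) < 1e-8)      # weights must sum to one
  dims <- as.matrix(data[, names(weights)])    # columns ordered like weights
  as.numeric(dims %*% weights)
}

# Give weight w to politics and split the remainder equally
# among the other three dimensions (an assumption of this sketch)
politics_weights <- function(w) {
  rest <- (1 - w) / 3
  c(economy = rest, education = rest, health = rest, politics = w)
}

# Index under equal weights of 0.25
equal <- compute_index(scores, politics_weights(0.25))

# Print the country ordering (best first) for several politics weights
for (w in c(0, 0.25, 0.5)) {
  idx <- compute_index(scores, politics_weights(w))
  cat("politics weight", w, ":", scores$country[order(-idx)], "\n")
}
```

Because the index is rebuilt from a weight vector on each call, the same function can be reused to explore any other weighting scheme, which is the kind of experimentation the pipeline is meant to support.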
cubble: An R Package for Organizing and Wrangling Multivariate Spatio-temporal Data
For many analyses, the spatial and temporal components can be studied separately: for example, to explore the temporal trend of one variable at a single spatial location, or to model the spatial distribution of one variable at a given time. For others, however, it is important to analyze different aspects of the spatio-temporal data simultaneously, for instance, the temporal trends of multiple variables across locations. To facilitate the study of different portions or combinations of spatio-temporal data, we introduce a new class, cubble, with a suite of functions for slicing and dicing the different spatio-temporal components. Figure 2 analyses daily maximum temperature data from the Global Historical Climatology Network (GHCN) across stations in two Australian states using the cubble data structure. The glyph maps are created with the geom_glyph() function, also implemented in the cubble package, as follows:
tmax |>
  ggplot(aes(x_major = long, x_minor = month, y_major = lat, y_minor = tmax, ...)) +
  geom_sf(..., inherit.aes = FALSE) +
  geom_glyph_box(...) +
  geom_glyph(...) +
  ...
This work has been accepted by the Journal of Statistical Software and won the ASA John M. Chambers Statistical Software Award.
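To make the roles of the major and minor aesthetics in the glyph map more concrete, here is a minimal base R sketch of the general glyph-map idea: each minor (month, tmax) pair is rescaled and offset so that it sits inside a small box centred on the station's major (long, lat) position. The station data, the box sizes, and the helper rescale11() are assumptions made for this illustration; this is not the cubble package's internal implementation.

```r
# Rescale a numeric vector linearly onto [-1, 1]
rescale11 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  2 * (x - rng[1]) / (rng[2] - rng[1]) - 1
}

# Hypothetical long-form data: two stations, four monthly tmax values each
tmax_long <- data.frame(
  id    = rep(c("s1", "s2"), each = 4),
  long  = rep(c(145.0, 150.0), each = 4),
  lat   = rep(c(-37.8, -33.9), each = 4),
  month = rep(1:4, times = 2),
  tmax  = c(26, 25, 23, 20, 27, 26, 25, 22)
)

# Assumed glyph box size, in degrees of longitude and latitude
glyph_width  <- 1.0
glyph_height <- 0.6

# Minor variables are rescaled over all stations (a shared scale),
# then shifted to each station's location
tmax_long$gx <- tmax_long$long + rescale11(tmax_long$month) * glyph_width / 2
tmax_long$gy <- tmax_long$lat  + rescale11(tmax_long$tmax)  * glyph_height / 2

print(tmax_long)
```

Drawing a line through (gx, gy) within each station id would then give one small time series per location. Rescaling over all stations keeps the glyphs comparable; rescaling within each station would instead emphasise the shape of each individual series.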
Visual Diagnostics for Constrained Optimisation with Application to Guided Tours
Projection pursuit is a technique for finding interesting low-dimensional linear projections of high-dimensional data by optimising an index function over projection matrices. The index function can be non-linear, its gradient can be computationally expensive to calculate, and it may have local optima, which are themselves of interest to projection pursuit. This work designs four diagnostic plots to visualise the optimisation routine in projection pursuit. Figure 3 shows one of them, plotting two optimisation paths in the space of a 5D unit sphere. This work has been published in the R Journal.
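The constrained optimisation described above can be sketched in base R as a simple random search over 1D projections of 5D data, with every candidate projection re-normalised onto the unit sphere and the accepted steps recorded as a path that could later be plotted. The simulated data, the kurtosis-based index index_fn(), the step size, and the accept-if-better rule are all assumptions chosen for illustration; they are not necessarily the optimisers or index functions studied in the paper.

```r
set.seed(1)

# Hypothetical 5D data: two clusters along the first variable, noise elsewhere
n <- 200
X <- matrix(rnorm(n * 5), ncol = 5)
X[, 1] <- c(rnorm(n / 2, mean = -3), rnorm(n / 2, mean = 3))
X <- scale(X)

# Project a vector onto the unit sphere (the optimisation constraint)
normalise <- function(v) v / sqrt(sum(v^2))

# Illustrative index: absolute excess kurtosis of the projected data,
# used here as a simple measure of departure from normality
index_fn <- function(a) {
  z <- as.numeric(X %*% a)
  z <- (z - mean(z)) / sd(z)
  abs(mean(z^4) - 3)
}

# Random search: perturb the current projection, re-normalise,
# and accept the candidate only if the index value improves
current <- normalise(rnorm(5))
path <- data.frame(iter = 0, index = index_fn(current))
for (i in 1:500) {
  candidate <- normalise(current + 0.3 * rnorm(5))
  if (index_fn(candidate) > index_fn(current)) {
    current <- candidate
    path <- rbind(path, data.frame(iter = i, index = index_fn(current)))
  }
}

# The recorded path is the raw material for a diagnostic plot
print(tail(path))
```

Only accepted steps are stored, so the index values in path increase monotonically; keeping rejected candidates as well would give a fuller picture of where the search looked.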