I thought the moment was perfect to give y’all an update on the latest approaches I’ve been using to share data across {shiny} modules, along with some thoughts and comments on the "stratégie du petit r".
But I’m trying to keep things easy to maintain. Given the current size of the codebase, adding more layers or going deeper would make the code far more complex and harder to maintain with no real benefit. So yes, some of these modules are not perfect, and they might not be doing “just one thing”.
You know what they say: “perfect is the enemy of good.”
I’ve come to terms with this idea for two reasons:
Data frames are lists, and I don’t see any good reason to forbid passing a data.frame as an argument to a function.
JavaScript is full of functions that take scalar values and a list of parameters, and it works well.
For example, making an HTTP request in JS looks like this:
fetch(
  "/api/users",
  {
    method: "GET",
    headers: {
      "Content-Type": "application/json",
      "Accept": "application/json",
      "Authorization": "Bearer YOUR_TOKEN",
    }
  }
)
Modules usually live in two scopes:
They do things within themselves
They do things that need to be passed to other modules
Doing things within themselves is pretty standard and doesn’t require a lot of thought (as long as you don’t forget the ns() 😅), but sharing things from one module to another in a reactive context can be more challenging.
One thing I’ve learned over the years is that what works for example apps can be a nightmare in a production context. The official Shiny docs recommend the following pattern: return one or more reactive() objects that can be passed to other modules.
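A minimal sketch of that pattern, to make the discussion concrete. The module names (mod_select_server, mod_table_server) and the input n are hypothetical and only here for illustration; they are not taken from the Shiny docs.

```r
library(shiny)

# Hypothetical module: exposes one of its inputs as a reactive()
mod_select_server <- function(id) {
  moduleServer(id, function(input, output, session) {
    # The last expression is the module's return value
    reactive(input$n)
  })
}

# Hypothetical module: receives a reactive() as an argument
mod_table_server <- function(id, n) {
  moduleServer(id, function(input, output, session) {
    # n is a reactive, so it has to be called with n()
    reactive(head(mtcars, n()))
  })
}

server <- function(input, output, session) {
  # The top level does the wiring between the two modules
  n <- mod_select_server("select")
  mod_table_server("table", n = n)
}
```

With two modules at the same depth, the wiring stays readable. Every extra value or extra level of nesting adds one more return value to catch and one more argument to pass along.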
If you feel like it’s a mess and complex to reason about, that’s because it is. And we’re in a simple case where data travels at the same depth in the stack.
As a side note, I think reactive() objects are conceptually neat, but I don’t think they should be your go-to building block.
It’s built in React, and it works just like {shiny} does (well, from a conceptual point of view): you have stateful objects, and when these objects change, they trigger another part of the app to be recomputed. In our case, whenever you interact with the first tab, the second tab (with the visualization) is updated.
To sum up, some objects are created at the top level and used to share data and trigger reactivity from one “module” to the other.
Note: my colleague Arthur pointed out that Vue.js has something called a store in Pinia. I’m not exactly sure how it works, but apparently it’s more or less the same as reactiveValues(). And Claude confirmed it 😄
THE “STRATÉGIE DU PETIT R”
One strategy we recommended is what we called the “stratégie du petit r”. Looking back, I can admit that it was a poor choice of name, but you know, sh*t happens.
The principle is quite simple: instead of returning and passing reactive() objects as arguments, you create one or more reactiveValues() at an upper level, which you then pass downstream to lower-level modules. reactiveValues() behave a lot like environments, meaning that values set down the stack are available everywhere.
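Here is a minimal sketch of the idea, assuming two hypothetical modules (mod_input_server and mod_plot_server) and a hypothetical input n. One module writes to r, the other reads from it, and neither returns anything to the top level.

```r
library(shiny)

# Hypothetical module: writes into the shared reactiveValues()
mod_input_server <- function(id, r) {
  moduleServer(id, function(input, output, session) {
    observeEvent(input$n, {
      r$n <- input$n
    })
  })
}

# Hypothetical module: reads from the shared reactiveValues()
mod_plot_server <- function(id, r) {
  moduleServer(id, function(input, output, session) {
    # Invalidated whenever r$n changes, wherever it was set
    reactive(head(mtcars, r$n))
  })
}

server <- function(input, output, session) {
  # Created once at the upper level, then passed downstream
  r <- reactiveValues()
  mod_input_server("input", r = r)
  mod_plot_server("plot", r = r)
}
```

Because r has reference semantics, the assignment made inside mod_input_server is visible in mod_plot_server without any value being returned.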
I still think this is a valid way to share data, but only if you avoid applying it too literally and focus on how to work with it in practice.
The main criticism I’ve read about this approach is that you’ll end up with a huge r object with 300 entries in it, creating a monster that’s impossible to debug.
So yes, these monsters exist. But I don’t think the idea itself is the problem. It’s always easier to blame the tool than to acknowledge the lack of understanding behind its misuse. Or, as Beckett wrote, “Voilà l’homme tout entier, s’en prenant à sa chaussure alors que c’est son pied le coupable.” (“There’s man all over for you, blaming on his boots the faults of his feet.”)
Here are some random thoughts:
The corollary of the last point is simple: you need several reactiveValues(), operating at different scopes in your application.
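One possible way to sketch this scoping, with hypothetical names throughout (r_global, r_local, mod_parent_server, mod_child_server): an app-wide reactiveValues() is created in the server function, while a second one is created inside a parent module and only shared with its own children.

```r
library(shiny)

# Hypothetical child module: only writes to the local store
mod_child_server <- function(id, r_global, r_local) {
  moduleServer(id, function(input, output, session) {
    observeEvent(input$filter, {
      r_local$filter <- input$filter
    })
  })
}

# Hypothetical parent module: owns r_local, which never leaves this branch
mod_parent_server <- function(id, r_global) {
  moduleServer(id, function(input, output, session) {
    r_local <- reactiveValues()
    mod_child_server("child", r_global = r_global, r_local = r_local)
    reactive(list(user = r_global$user, filter = r_local$filter))
  })
}

server <- function(input, output, session) {
  # App-wide values only: things every module may legitimately need
  r_global <- reactiveValues(user = "anonymous")
  mod_parent_server("parent", r_global = r_global)
}
```

Each store stays small, and a change to r_local can only invalidate code inside the branch that owns it.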
STORAGE USING AN R6 OBJECT
One downside I can think of when using the reactiveValues() strategy I just described is that, well, it’s reactive, meaning it can lead to uncontrolled reactivity if things aren’t scoped correctly.
One pattern I’ve used in an app is combining an R6 object, used to store and process data, with the trigger mechanism from {gargoyle}. Basically, the idea behind {gargoyle} is simple: instead of relying on the reactive graph to invalidate itself, you init flags that are triggered in the code, and when a flag is triggered, the context where the flag is watched is invalidated.
It’s a bit longer to implement, but you get better control over what is happening.
Combined with this, you can use an R6 object that is passed along the modules, and that gets transformed to store, process, and serve the data.
You can read more about this in “15.1.3 Building triggers and watchers” and “15.1.4 Using R6 as data storage” in Chapter 15 of the Engineering Shiny book.
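A minimal sketch of how the two pieces can fit together, assuming {R6} and {gargoyle} are installed. The DataStore class, the module names, the inputs and the "data_updated" flag are all hypothetical; only gargoyle::init(), gargoyle::trigger() and gargoyle::watch() come from the package.

```r
library(shiny)

# Hypothetical R6 class: stores and serves the data, not reactive by itself
DataStore <- R6::R6Class(
  "DataStore",
  public = list(
    data = NULL,
    set_data = function(value) {
      self$data <- value
      invisible(self)
    },
    n_rows = function() {
      NROW(self$data)
    }
  )
)

# Hypothetical writer module: updates the store, then raises the flag
mod_write_server <- function(id, store) {
  moduleServer(id, function(input, output, session) {
    observeEvent(input$go, {
      store$set_data(head(mtcars, input$n))
      gargoyle::trigger("data_updated")
    })
  })
}

# Hypothetical reader module: invalidated only when the flag is raised
mod_read_server <- function(id, store) {
  moduleServer(id, function(input, output, session) {
    reactive({
      gargoyle::watch("data_updated")
      store$n_rows()
    })
  })
}

server <- function(input, output, session) {
  store <- DataStore$new()
  # Flags are initialised once, at the top level
  gargoyle::init("data_updated")
  mod_write_server("write", store = store)
  mod_read_server("read", store = store)
}
```

Changing store$data alone invalidates nothing. The reader only recomputes when the code explicitly calls gargoyle::trigger().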
SESSION$USERDATA
This one should be used with a lot of caution, but it can be very effective if you know what you’re doing (and if you don’t have too many things to share).
The session object is an environment available everywhere in your Shiny app. It represents the current interaction between each user and the R session (i.e., each user has their own). This environment has a special slot called userData that can be populated with data, and it is scoped to the session.
The way I’ve used it in the past is via wrappers, which would look like:
set_this <- function(value, session = shiny::getDefaultReactiveDomain()) {
  session$userData$this <- compute_this(value)
}
get_this <- function(session = shiny::getDefaultReactiveDomain()) {
  session$userData$this
}
So anywhere I need it, I’ll use the wrapper function instead of session$userData$this. I would generally use it to define things at the top level that need to be accessible everywhere downstream, but I feel it might be a bit complex to manage if you need to pass data from mod_3_a to mod_3_g.
The documentation says it can be used “to store whatever session-specific data (we) want”, but my gut feeling is that it’s best not to shove too much into it. I don’t have any rational reason for that, though, and I’d be happy to be proven wrong.
AN ENVIRONMENT IN THE SCOPE OF THE PACKAGE/TOP LEVEL OF THE APP
This is something a lot of R developers do: define an environment inside the package namespace so that, when the package is loaded, you can CRUD into it. For example, there are some (well, several) in {shiny}:
> shiny:::.globals
The function shinyOptions() writes to it, and getShinyOption() reads from it.
This pattern can be used as global storage, but be careful: it’s not session-scoped, so whatever is in this environment is shared across sessions.
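A minimal sketch of what this can look like in your own package, in base R only. The environment name (.app_store) and the three wrappers are hypothetical; this is not how {shiny} implements its own globals.

```r
# Hypothetical package-level storage, created once when the package is loaded
.app_store <- new.env(parent = emptyenv())

# Create or update an entry
set_store <- function(name, value) {
  assign(name, value, envir = .app_store)
  invisible(value)
}

# Read an entry, falling back to `default` when it does not exist
get_store <- function(name, default = NULL) {
  if (exists(name, envir = .app_store, inherits = FALSE)) {
    get(name, envir = .app_store, inherits = FALSE)
  } else {
    default
  }
}

# Delete an entry if it is there
delete_store <- function(name) {
  if (exists(name, envir = .app_store, inherits = FALSE)) {
    rm(list = name, envir = .app_store)
  }
  invisible(NULL)
}
```

Nothing here is reactive or session-aware: every session served by the same R process reads and writes the same .app_store.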
AN EXTERNAL DATABASE OR STORAGE SYSTEM
Another solution is to store values in an external database, and query that DB inside modules.
If you try to implement this solution, two things to keep in mind are:
Make the data session-scoped, i.e., use session$token to identify the current session, and remove the data when the session ends.
You’ll need to handle reactivity manually, for example with {gargoyle}.
For example, with {storr}:
# Mimicking a session
session <- shiny::MockShinySession$new()
# In module 1
st <- storr::storr_rds(here::here())
st$set("dataset", mtcars, namespace = session$token)
# In module 2
st <- storr::storr_rds(here::here())
st$get("dataset", namespace = session$token)
Of course, this is a short piece of code and you’ll need more engineering, but you get the idea.
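As one possible next step, here is a sketch of session-scoped helpers around a {storr} object, including the cleanup. The helper names are hypothetical; st is assumed to be a storr object and token the value of session$token.

```r
# Hypothetical helper: write a value under the current session's namespace
set_session_data <- function(st, key, value, token) {
  st$set(key, value, namespace = token)
  invisible(value)
}

# Hypothetical helper: read a value, or NULL if this session never set it
get_session_data <- function(st, key, token) {
  if (st$exists(key, namespace = token)) {
    st$get(key, namespace = token)
  } else {
    NULL
  }
}

# Hypothetical helper: drop everything stored for one session
clear_session_data <- function(st, token) {
  st$clear(namespace = token)
  invisible(NULL)
}

# In the server function, the cleanup could be registered like this:
# session$onSessionEnded(function() {
#   clear_session_data(st, session$token)
# })
```

Reactivity is still manual here: a module that reads with get_session_data() will not know the value changed unless something like a {gargoyle} flag tells it.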