Skip to content

add fun_avg to ppc_avg functions #349

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: master
Choose a base branch
from
Open

add fun_avg to ppc_avg functions #349

wants to merge 8 commits into from

Conversation

tjmahr
Copy link
Collaborator

@tjmahr tjmahr commented May 13, 2025

Following #348, allow user to set the averaging function so that, e.g., user can choose median for heavy-tailed distributions.

l <- jsonlite::read_json(
  "https://gist.githubusercontent.com/tjmahr/3eba4f51e4b122b19417718b56423a6c/raw/31711cacef59eff69ed95eb51e2ae844eaad01b7/draws.json", 
  simplifyVector = TRUE
)

# Defaults to mean
bayesplot::ppc_error_scatter_avg_vs_x(l$y, l$yrep, l$x) +
  ggplot2::expand_limits(x = c(-10, 10))

bayesplot::ppc_error_scatter_avg_vs_x(l$y, l$yrep, l$x, fun_avg = median) + 
  ggplot2::expand_limits(x = c(-10, 10))

Created on 2025-05-13 with reprex v2.1.1

@codecov-commenter
Copy link

codecov-commenter commented May 13, 2025

Codecov Report

Attention: Patch coverage is 97.26027% with 2 lines in your changes missing coverage. Please review.

Project coverage is 98.58%. Comparing base (23e00b5) to head (4293eb8).

Files with missing lines Patch % Lines
R/ppc-errors.R 97.14% 1 Missing ⚠️
R/ppc-scatterplots.R 93.75% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #349      +/-   ##
==========================================
- Coverage   98.60%   98.58%   -0.03%     
==========================================
  Files          35       35              
  Lines        5539     5589      +50     
==========================================
+ Hits         5462     5510      +48     
- Misses         77       79       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Collaborator

@TeemuSailynoja TeemuSailynoja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the fast reaction. This looks like a clean implementation to address the listed issue. My only comment is about the documentation, which could mention the expected format of fun_avg.

R/ppc-errors.R Outdated
Comment on lines 13 to 14
#' @param fun_avg Function to apply to compute the posterior average.
#' Defaults to `"mean"`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the ppc_stat functions, we have a similar argument, by the name stat, which does a very similar job:

#' @param stat A single function or a string naming a function, except for the
#' 2D plot which requires a vector of exactly two names or functions. In all
#' cases the function(s) should take a vector input and return a scalar
#' statistic. If specified as a string (or strings) then the legend will
#' display the function name(s). If specified as a function (or functions)
#' then generic naming is used in the legend.

We could align this doc to read, for example:

#' @param fun_avg A function or a string naming a function for computing the
#' posterior average. In both cases, the function should take a vector input and
#' return a scalar statistic. If specified as a string, then the legend will
#' display the function name. If specified as a function
#' then generic naming is used in the legend.
#' Defaults to "mean".

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Average y - y_rep axis label is not affected. I didn't want to make yrep_avg_label() and error_avg_label() depend on fun_avg or change the default "Average y - y_rep" labels.

It does affect the $rep_label in ppc_scatter_avg_data(y, yrep) when fun_avg is a string.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait, i'm just noticing Aki's comment. I'll switch to stat.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tjmahr. The code looks good. And I agree with changing to stat

Comment on lines 14 to 15
#' @param fun_avg Function to apply to compute the posterior average.
#' Defaults to `"mean"`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above.

@avehtari
Copy link
Member

I was also thinking stat instead of fun_avg.

It would be good to change the y-axis label to show the stat which was used, and also add parentheses. For example:
"median(y - y_rep)" or "y - median(y_rep)"

@kruschke
Copy link

kruschke commented May 14, 2025

Following up on @avehtari , also would be good to change the x-axis label to show the x variable name instead of just generic "$x$".

@tjmahr
Copy link
Collaborator Author

tjmahr commented May 14, 2025

  1. I have changed fun_avg to have the name stat.
  2. The stat name is shown on the axis label.
  3. I have made the stat argument handle functions given by strings, function objects, anonymous function objects, and quoted functions names.

Because the function name is used for both labeling and invoking a function name, I added as_tagged_function() that keeps track of the expression provided by the user.


library(bayesplot)
#> This is bayesplot version 1.12.0.9000
#> - Online documentation and vignettes at mc-stan.org/bayesplot
#> - bayesplot theme set to bayesplot::theme_default()
#>    * Does _not_ affect other ggplot2 plots
#>    * See ?bayesplot_theme_set for details on theme setting

l <- jsonlite::read_json(
  "https://gist.githubusercontent.com/tjmahr/3eba4f51e4b122b19417718b56423a6c/raw/31711cacef59eff69ed95eb51e2ae844eaad01b7/draws.json", 
  simplifyVector = TRUE
)
y <- l$y 
yrep <- l$yrep 
x <- l$x

stat is inlined into the axis label

ppc_scatter_avg(y, yrep, stat = "mean")

stat can be a function

ppc_scatter_avg(y, yrep, stat = median)

or a symbol

ppc_scatter_avg(y, yrep, stat = quote(median))

stat can be a primitive function

ppc_error_scatter_avg_vs_x(y, yrep, x, stat = min)

stat can be an anonymous function, but it gets a generic label

ppc_scatter_avg(y, yrep, stat = function(x) quantile(x, .1))

Created on 2025-05-14 with reprex v2.1.1

@TeemuSailynoja
Copy link
Collaborator

Following up on @avehtari , also would be good to change the x-axis label to show the x variable name instead of just generic "$x$".

If I'm not wrong, this is more challenging, as when passing x = data_frame$col_name, the information of the name of the column is lost, and only a numeric vector is passed to x.

@tjmahr
Copy link
Collaborator Author

tjmahr commented May 15, 2025

If I'm not wrong, this is more challenging, as when passing x = data_frame$col_name, the information of the name of the column is lost, and only a numeric vector is passed to x.

No, you can capture that. (plot() does that.) It's really easy in a single function setup.

library(rlang)
library(ggplot2)
f <- function(x, y) {
  qx <- enquo(x)
  qy <- enquo(y)
  data <- data.frame(x = x, y = y)
  ggplot(data) + 
    aes(x, y) + 
    geom_point() + 
    labs(
      x = as_label(quo_get_expr(qx)),
      y = as_label(quo_get_expr(qy))
    )
}

data <- data.frame(
  y = bayesplot::example_y_data(),
  x = bayesplot::example_x_data()
)
f(data$x, data$y)

Created on 2025-05-15 with reprex v2.1.1

The hard part for us is that when you call f(x = my_x, y = my_y) which internally calls g(x, y), then g() loses access to the user expression unless you take extra care along the way. (Or use quosures. I'm not sure yet.) We do that when we modify data and call some internal pp_() function to render the plot.

My update yesterday kept track of the expression for stat along the way but it was pretty tricky.

Edit: I figured out how to handle the inner function problem by using quosures/tunnelling.

@tjmahr tjmahr marked this pull request as draft May 15, 2025 19:23
@tjmahr
Copy link
Collaborator Author

tjmahr commented May 15, 2025

  1. Label anonymous functions with "stat"
  2. Support ~ style anonymous functions
  3. Simplify as_tagged_function() by using quosures

library(bayesplot)
#> This is bayesplot version 1.12.0.9000
#> - Online documentation and vignettes at mc-stan.org/bayesplot
#> - bayesplot theme set to bayesplot::theme_default()
#>    * Does _not_ affect other ggplot2 plots
#>    * See ?bayesplot_theme_set for details on theme setting

l <- jsonlite::read_json(
  "https://gist.githubusercontent.com/tjmahr/3eba4f51e4b122b19417718b56423a6c/raw/31711cacef59eff69ed95eb51e2ae844eaad01b7/draws.json", 
  simplifyVector = TRUE
)
y <- l$y 
yrep <- l$yrep 
x <- l$x

Function name

ppc_scatter_avg(y, yrep, stat = "mean")

ppc_error_scatter_avg(y, yrep, stat = "mean")

Function object

ppc_scatter_avg(y, yrep, stat = median)

ppc_error_scatter_avg(y, yrep, stat = median)

Primitive function

ppc_scatter_avg(y, yrep, x, stat = min)

ppc_error_scatter_avg_vs_x(y, yrep, x, stat = min)

Anonymous function

ppc_scatter_avg(y, yrep, x, stat = function(x) quantile(x, .1))

ppc_error_scatter_avg_vs_x(y, yrep, x, stat = function(x) quantile(x, .1))

Anonymous function (formulas)

ppc_scatter_avg(y, yrep, x, stat = ~ quantile(.x, .1))

ppc_error_scatter_avg_vs_x(y, yrep, x, stat = ~ quantile(.x, .1))

Created on 2025-05-15 with reprex v2.1.1

@tjmahr
Copy link
Collaborator Author

tjmahr commented May 16, 2025

This patch should be ready for review.

The expression for x now appears in the axis label.

library(bayesplot)
#> This is bayesplot version 1.12.0.9000
#> - Online documentation and vignettes at mc-stan.org/bayesplot
#> - bayesplot theme set to bayesplot::theme_default()
#>    * Does _not_ affect other ggplot2 plots
#>    * See ?bayesplot_theme_set for details on theme setting
l <- jsonlite::read_json(
  "https://gist.githubusercontent.com/tjmahr/3eba4f51e4b122b19417718b56423a6c/raw/31711cacef59eff69ed95eb51e2ae844eaad01b7/draws.json", 
  simplifyVector = TRUE
)

ppc_error_scatter_avg_vs_x(l$y, l$yrep, l$x, stat = "sd")

Created on 2025-05-16 with reprex v2.1.1

@tjmahr
Copy link
Collaborator Author

tjmahr commented May 19, 2025

@kruschke:

But the related thread regarding residuals #349 seems to suggest that residuals are computed as s t a t ( y − y r e p ) , with s t a t ( y r e p ) not separately, explicitly computed. Hmmm...?

Sorry if I'm late to the party, but shouldn't the residual label be $y - stat(y_{rep})$, not $stat(y - y_{rep})$? Isn't $y - stat(y_{rep})$ being computed internally, not $stat(y - y_{rep})$?

stat(y - y_rep) is what is computed internally.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants