
Make median and quantile consistent to base R#254

Open
hcirellu wants to merge 12 commits into main from median


@hcirellu
Collaborator

median and quantile give unexpected results compared with base R when integers are converted to integer64.

main:

median(as.integer64(c(1, 3)))
# integer64
# [1] 3

PR:

median(as.integer64(c(1, 3)))
# integer64
# [1] 2
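For comparison, base R's median() handles an even-length vector by taking the mean() of the two middle values, so for integer input it returns a double that need not be a whole number. A minimal base-R sketch of the reference values (no bit64 needed):

```r
# Even-length integer input: median() averages the two middle values.
m_13 <- median(c(1L, 3L))   # mean of 1 and 3
m_12 <- median(c(1L, 2L))   # mean of 1 and 2, not a whole number

print(m_13)                 # 2
print(typeof(m_13))         # "double", although the input was integer
print(m_12)                 # 1.5
```

A value like 1.5 has no exact integer64 counterpart, so an integer64 median of even length has to settle on some rounding rule.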

I had to fix two existing tests.

Closes #247

@hcirellu
Collaborator Author

I also had to change optimizer64 so that the quantile used for comparison is called with type=7, which is the default in R. In addition, I needed to add a custom round function that does not round half to even when converting from double to integer64.
This way it matches the integer64 logic for computing in-between values in sortqtl.integer64 and orderqtl.integer64, where the result is obtained by interpolating between the neighboring values, i.e.:

neighboring_values[1L,] + (neighboring_values[2L,] - neighboring_values[1L,])*(sel%%1)

This boils down to:

as.integer64(1L) + as.integer64(2L - 1L)*0.5
# integer64
# [1] 2

which differs from coercing the double result to integer64 for comparison:

as.integer64(1L + (2L - 1L)*0.5)
# integer64
# [1] 1
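The two results above can be mimicked with plain doubles in base R. This sketch assumes, based only on the outputs shown, that converting a double to integer64 truncates toward zero and that the custom round function rounds ties away from zero; round_half_away is a hypothetical name, not the helper added in this PR.

```r
# Hypothetical helper: round to nearest, ties away from zero.
# (Base round() rounds ties to even instead.)
round_half_away <- function(x) sign(x) * floor(abs(x) + 0.5)

lo <- 1; hi <- 2; frac <- 0.5   # neighbouring values and sel %% 1

# Path 1: interpolate on doubles, then truncate (like as.integer64(1.5))
path_trunc <- trunc(lo + (hi - lo) * frac)

# Path 2: interpolate on doubles, then round ties away from zero
path_round <- round_half_away(lo + (hi - lo) * frac)

print(c(path_trunc, path_round))       # 1 2
print(round(c(0.5, 2.5)))              # 0 2 (base R: ties to even)
print(round_half_away(c(0.5, 2.5)))    # 1 3
```

Under these assumptions only path 2 reproduces the integer64 value 2, and base round() alone would not behave like it on every tie.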

Is this OK? Or should the calculation in sortqtl.integer64 and orderqtl.integer64 be changed so that median(as.integer64(1:2)) equals as.integer64(median(1:2)), which seems inconsistent with interpolating neighboring values using +, -, and * for integer64?
WDYT?
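For reference, interpolating between neighboring values as above is the same scheme as base R's quantile type 7: the position h = (n - 1) * p + 1 selects the two neighbouring order statistics, and its fractional part (the analogue of sel %% 1) weights them. A minimal base-R sketch on doubles, assuming non-empty input without NAs; qtl7 is a hypothetical name and this is not the bit64 code:

```r
# Hypothetical sketch of type-7 quantiles via neighbour interpolation.
qtl7 <- function(x, probs) {
  xs <- sort(x)
  n <- length(xs)
  h <- (n - 1) * probs + 1          # 1-based fractional position
  lo <- floor(h)
  hi <- ceiling(h)
  # lower neighbour plus the weighted gap to the upper neighbour
  xs[lo] + (xs[hi] - xs[lo]) * (h %% 1)
}

x <- 100 * (1:100)
p <- c(0, 0.25, 0.5, 0.75, 1)
print(qtl7(x, p))
print(quantile(x, p, type = 7, names = FALSE))
```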

hcirellu marked this pull request as ready for review on January 27, 2026 16:02
MichaelChirico added this to the 4.7.0 milestone on Mar 6, 2026
@@ -281,7 +281,7 @@ test_that("sorting methods work", {
expect_identical(rank(x, method="orderrnk"), x_rank)

x = as.integer64(1:100)
Collaborator


WDYT about:

x32 = 100 * (1:100)
x64 = as.integer64(x32)

quantile(x32/100, names=FALSE)
# [1]   1.00  25.75  50.50  75.25 100.00
quantile(x32, names=FALSE)
# [1]   100  2575  5050  7525 10000
quantile(x64, names=FALSE)
# integer64
# [1] 100   2600  5000  7500  10000
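The integer64 output on main is consistent with picking a single order statistic at position round((n - 1) * p + 1), using R's round() (ties to even), instead of interpolating. This is only a reconstruction that reproduces the numbers above, not bit64's actual implementation; nearest_rank_even is a hypothetical name:

```r
# Hypothetical reconstruction of the main-branch integer64 quantiles.
nearest_rank_even <- function(x, probs) {
  xs <- sort(x)
  h <- (length(xs) - 1) * probs + 1  # same position as type 7
  xs[round(h)]                       # R's round(): 50.5 -> 50 (ties to even)
}

x32 <- 100 * (1:100)
print(nearest_rank_even(x32, c(0, 0.25, 0.5, 0.75, 1)))
```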

Collaborator Author


That's the result on main. With this PR, the result would be

quantile(x64, names=FALSE)
# integer64
# [1] 100   2575  5050  7525  10000

which is what I expect, because it is "identical" to R's default behaviour. But that is a matter of perspective.
When comparing the different types (type=7 is the default)

setNames(
  `row.names<-`(
    Reduce(rbind,
           lapply(1:9, \(t) quantile(x32, type=t, names=FALSE)),
           init = data.frame(rep(list(numeric()), 5))),
    paste0("type=", 1:9)),
  names(quantile(x32)))
#         0%      25%  50%      75%  100%
# type=1 100 2500.000 5000 7500.000 10000
# type=2 100 2550.000 5050 7550.000 10000
# type=3 100 2500.000 5000 7500.000 10000
# type=4 100 2500.000 5000 7500.000 10000
# type=5 100 2550.000 5050 7550.000 10000
# type=6 100 2525.000 5050 7575.000 10000
# type=7 100 2575.000 5050 7525.000 10000
# type=8 100 2541.667 5050 7558.333 10000
# type=9 100 2543.750 5050 7556.250 10000

it shows that the results differ a lot.
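A somewhat shorter way to build the same comparison, assuming a numeric matrix is acceptable instead of a data.frame (the printed layout may differ slightly from the table above):

```r
x32 <- 100 * (1:100)
types <- 1:9

# One column per type from vapply(), transposed so that each row is a type;
# the column names ("0%", "25%", ...) come from the first quantile() call.
qtab <- t(vapply(types, function(type) quantile(x32, type = type),
                 numeric(5)))
rownames(qtab) <- paste0("type=", types)
print(qtab)
```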
Do we want to support types 1-9 as well? (Right now we only support a non-existent type=0.)
For a median based on qtile to be consistent with base R, it seems that we "need" a qtile with type=7. Alternatively, we implement median independently of quantile.
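The type=7 requirement can be checked directly: at p = 0.5 the type-7 position is (n - 1)/2 + 1, which for even n lies exactly halfway between the two middle order statistics, so the interpolation reduces to their mean, which is what median() returns. Other types need not agree. A quick base-R check on doubles, with arbitrary random test data:

```r
set.seed(1)
# Compare median() with the type-7 quantile at p = 0.5 for n = 1..50.
agree <- vapply(1:50, function(n) {
  x <- sample(1000, n)  # n distinct integers from 1..1000
  isTRUE(all.equal(median(x), quantile(x, 0.5, type = 7, names = FALSE)))
}, logical(1))
print(all(agree))

# Type 1 picks an order statistic instead of averaging.
print(median(1:4))                                   # 2.5
print(quantile(1:4, 0.5, type = 1, names = FALSE))   # 2
```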

Or did I misunderstand your question? Is it about changing the test from 1:100 to your proposal?



Development

Successfully merging this pull request may close these issues.

median diverges of base when length of vector is even

2 participants