sql/expression/function/aggregation: Change aggregation functions to work better with group by expressions. #540
…y from aggregation functions.
…work better with group by expressions.
I like this method a lot. Really clever, imo.
I think this is fine, but I'm wondering if there could be a better way using the analyzer.
The analyzer has a rule called `flatten_aggregation_expressions` that does something very similar, i.e. it turns `(SUM(a) / SUM(b))` into a projection of `(suma / sumb)` onto a new `GroupBy` node that expands the single select expression `(SUM(a) / SUM(b))` into two, `SUM(a) as suma, SUM(b) as sumb`.
If I'm squinting at this, it seems like something similar could work here, where the distinct expression is itself treated (appropriately) as an aggregate expression and flattened. Then it automatically gets a new buffer for every grouping key, no need to duplicate expression trees. I haven't thought about it that deeply though, might not be workable. Worth a little investigation.
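A toy sketch of the flattening described above, for illustration only: the `Expr`, `Col`, `Agg`, `Div` and `Alias` types and the `flattenAggregates` function are hypothetical and are neither go-mysql-server's `sql.Expression` API nor the real analyzer rule. Each aggregate in the select expression is replaced by a column reference, and the aggregate itself moves into the select list of the new GroupBy.

```go
package main

import "fmt"

// Expr is a hypothetical, minimal expression tree (not sql.Expression).
type Expr interface{ String() string }

// Col references a column by name.
type Col struct{ Name string }

// Agg is an aggregate function applied to a child expression, e.g. SUM(a).
type Agg struct {
	Fn    string
	Child Expr
}

// Div divides two expressions.
type Div struct{ Left, Right Expr }

func (c Col) String() string { return c.Name }
func (a Agg) String() string { return a.Fn + "(" + a.Child.String() + ")" }
func (d Div) String() string { return "(" + d.Left.String() + " / " + d.Right.String() + ")" }

// Alias names an expression, like "SUM(a) AS agg0".
type Alias struct {
	Name string
	Expr Expr
}

// flattenAggregates replaces each aggregate in e with a column reference and
// appends the aggregate (under a generated name) to aggs, which becomes the
// select list of the new GroupBy node.
func flattenAggregates(e Expr, aggs *[]Alias) Expr {
	switch n := e.(type) {
	case Agg:
		name := fmt.Sprintf("agg%d", len(*aggs))
		*aggs = append(*aggs, Alias{Name: name, Expr: n})
		return Col{Name: name}
	case Div:
		return Div{Left: flattenAggregates(n.Left, aggs), Right: flattenAggregates(n.Right, aggs)}
	default:
		return e
	}
}

func main() {
	sel := Div{Left: Agg{"SUM", Col{"a"}}, Right: Agg{"SUM", Col{"b"}}}
	var aggs []Alias
	proj := flattenAggregates(sel, &aggs)
	fmt.Println("Project:", proj) // Project: (agg0 / agg1)
	for _, a := range aggs {
		fmt.Printf("GroupBy: %s AS %s\n", a.Expr, a.Name)
	}
}
```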
```go
	"github.com/dolthub/go-mysql-server/sql/expression"
)

func duplicateExpression(ctx *sql.Context, expr sql.Expression) (sql.Expression, error) {
```
Probably should be a public `Clone` method in the `expression` package, next to `TransformUp`.
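A rough sketch of what a public `Clone` could look like, assuming an expression interface that exposes its children and can be rebuilt from new ones. `Node`, `Lit`, `Distinct`, `transformUp` and `Clone` here are hypothetical stand-ins, not go-mysql-server's actual types or the real signature of `TransformUp`. Rebuilding every node bottom-up gives the copy its own state, so a stateful child such as a DISTINCT wrapper is never shared between the original and the clone.

```go
package main

import "fmt"

// Node is a hypothetical expression interface that can report and replace its children.
type Node interface {
	Children() []Node
	WithChildren(children ...Node) Node
	String() string
}

// Lit is a leaf holding a constant.
type Lit struct{ V int }

func (l *Lit) Children() []Node { return nil }
func (l *Lit) WithChildren(_ ...Node) Node { return &Lit{V: l.V} }
func (l *Lit) String() string { return fmt.Sprint(l.V) }

// Distinct is a stateful wrapper; seen stands in for the state a real Eval would fill.
type Distinct struct {
	Child Node
	seen  map[string]bool
}

func (d *Distinct) Children() []Node { return []Node{d.Child} }

// WithChildren returns a new Distinct with empty state, so copies never share seen.
func (d *Distinct) WithChildren(c ...Node) Node {
	return &Distinct{Child: c[0], seen: map[string]bool{}}
}
func (d *Distinct) String() string { return "DISTINCT " + d.Child.String() }

// transformUp rebuilds the tree bottom-up and applies f to each rebuilt node.
func transformUp(n Node, f func(Node) Node) Node {
	kids := n.Children()
	newKids := make([]Node, len(kids))
	for i, k := range kids {
		newKids[i] = transformUp(k, f)
	}
	return f(n.WithChildren(newKids...))
}

// Clone returns a deep copy of n: every node, leaves included, is freshly allocated.
func Clone(n Node) Node {
	return transformUp(n, func(x Node) Node { return x })
}

func main() {
	orig := &Distinct{Child: &Lit{V: 1}, seen: map[string]bool{}}
	cp := Clone(orig).(*Distinct)
	cp.seen["1"] = true
	fmt.Println(cp, len(orig.seen), len(cp.seen)) // DISTINCT 1 0 1
}
```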
…ies where aggregations do not get handled correctly by the analyzer.
…which can receive aggregated rows and produce a value.
…nd group by nodes.
006eab8 to 180bba5
… Allow Window nodes to have aggregation expressions as well.
This got a little bit gnarly. Broadly:
This is ready for review and I think this is the approach we should broadly go with for now.
Looks great, no real comments
```diff
@@ -143,7 +143,7 @@ func (i *windowIter) Next() (sql.Row, error) {
 			return nil, err
 		}
 	case sql.Aggregation:
-		row[j], err = expr.Eval(i.ctx, i.buffers[j])
+		row[j], err = i.buffers[j][0].(sql.AggregationBuffer).Eval(i.ctx)
```
This function could use a TODO: it needs to be given the same treatment as the non-window aggregates.
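The diff moves evaluation off the shared aggregate expression and onto the buffer stored for that column. A minimal sketch of the idea, using hypothetical `Row`, `AggregationBuffer` and `sumBuffer` types; the real `sql.AggregationBuffer` interface may differ (the diff shows its `Eval` takes the context). Each output column owns a buffer, every row is fed to each buffer through `Update`, and the final value comes from the buffer's own `Eval`.

```go
package main

import "fmt"

// Row is a hypothetical row type: a slice of column values.
type Row []interface{}

// AggregationBuffer is a hypothetical per-group (or per-window) state object.
// Each buffer owns its state, so evaluating it never touches shared data.
type AggregationBuffer interface {
	Update(r Row) error
	Eval() (interface{}, error)
}

// sumBuffer sums the int64 value at column idx, skipping NULLs (nil).
type sumBuffer struct {
	idx  int
	sum  int64
	seen bool // whether any non-NULL value was seen
}

func (b *sumBuffer) Update(r Row) error {
	v := r[b.idx]
	if v == nil {
		return nil // SUM ignores NULL inputs
	}
	n, ok := v.(int64)
	if !ok {
		return fmt.Errorf("sumBuffer: unexpected type %T", v)
	}
	b.sum += n
	b.seen = true
	return nil
}

func (b *sumBuffer) Eval() (interface{}, error) {
	if !b.seen {
		return nil, nil // SUM over only NULLs is NULL
	}
	return b.sum, nil
}

func main() {
	// One fresh buffer per output column, analogous to i.buffers[j] in the diff.
	buffers := []AggregationBuffer{&sumBuffer{idx: 0}, &sumBuffer{idx: 1}}
	rows := []Row{{int64(1), int64(10)}, {int64(2), nil}, {int64(3), int64(30)}}
	for _, r := range rows {
		for _, b := range buffers {
			if err := b.Update(r); err != nil {
				panic(err)
			}
		}
	}
	out := make(Row, len(buffers))
	for j, b := range buffers {
		v, err := b.Eval()
		if err != nil {
			panic(err)
		}
		out[j] = v
	}
	fmt.Println(out) // [6 40]
}
```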
`sql/analyzer/validation_rules.go` (outdated)
```go
// GroupBy is the only node that can support evaluating an Aggregation.
//
// See https://github.com/dolthub/go-mysql-server/issues/542 for some queries
// that should be supported but that currently trigger this validation.
```
Seems like there is a typo here
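A sketch of the kind of check that validation comment describes: walk the plan tree and reject aggregations in nodes that cannot evaluate them. `PlanNode`, `Expr`, `validateAggregations` and `errAggregationUnsupported` are hypothetical. The allowed set includes `Window` as well as `GroupBy` on the assumption that the commit allowing Window nodes to have aggregation expressions extends this rule. Only top-level expressions are checked here; a real rule would also need to look inside nested expressions.

```go
package main

import (
	"errors"
	"fmt"
)

// Expr is a hypothetical expression: a display name and whether it is an aggregation.
type Expr struct {
	Name  string
	IsAgg bool
}

// PlanNode is a hypothetical query-plan node.
type PlanNode struct {
	Kind     string
	Exprs    []Expr
	Children []*PlanNode
}

var errAggregationUnsupported = errors.New("aggregation not supported in this node")

// allowed lists the node kinds assumed to be able to evaluate aggregations.
var allowed = map[string]bool{"GroupBy": true, "Window": true}

// validateAggregations rejects top-level aggregations outside allowed nodes, recursively.
func validateAggregations(n *PlanNode) error {
	if !allowed[n.Kind] {
		for _, e := range n.Exprs {
			if e.IsAgg {
				return fmt.Errorf("%w: %s in %s", errAggregationUnsupported, e.Name, n.Kind)
			}
		}
	}
	for _, c := range n.Children {
		if err := validateAggregations(c); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	ok := &PlanNode{Kind: "GroupBy", Exprs: []Expr{{Name: "SUM(a)", IsAgg: true}},
		Children: []*PlanNode{{Kind: "Table"}}}
	bad := &PlanNode{Kind: "Project", Exprs: []Expr{{Name: "SUM(a)", IsAgg: true}},
		Children: []*PlanNode{{Kind: "Table"}}}
	fmt.Println(validateAggregations(ok))  // <nil>
	fmt.Println(validateAggregations(bad)) // aggregation not supported in this node: SUM(a) in Project
}
```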
The existing code seems to expect rows to arrive in order of the group by expression, but the analyzer does not arrange for that to happen.
Instead, this PR changes the approach so that each aggregation function duplicates its `Child` expression(s) as part of its per-aggregation state. That way it can `Eval` against its per-group-by `Child` and get the correct results out of `Distinct`, for example.

This PR updates `AVG`, `FIRST`, `LAST`, `MAX`, `MIN` and `SUM` to do this. `COUNT(DISTINCT ...)` is handled by a special expression node instead, and nothing has been changed in `Count` or `CountDistinct`. `group_concat` also seems to handle `DISTINCT` itself, so I have not changed anything there. JSON aggregation did not look immediately amenable to combining with `DISTINCT`, because the `Update` functions seemed to error when the child expression returned `nil`, so I have not changed them either.
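To illustrate why per-group duplication matters, here is a toy `SUM(DISTINCT x) ... GROUP BY g` over rows whose groups arrive interleaved rather than sorted. The `distinct`, `sumBuffer`, `row` and `sumDistinctByGroup` names are hypothetical, not the PR's code. With one child shared by every group, a value seen in one group is wrongly skipped in another; giving each group's buffer its own fresh copy of the child, which is the approach described above, yields the correct per-group result.

```go
package main

import "fmt"

// distinct is a hypothetical stateful child expression: it passes a value
// through the first time it sees it and reports a repeat afterwards.
type distinct struct{ seen map[int64]bool }

func newDistinct() *distinct { return &distinct{seen: map[int64]bool{}} }

// Eval returns (v, true) for a first occurrence and (0, false) for a repeat.
func (d *distinct) Eval(v int64) (int64, bool) {
	if d.seen[v] {
		return 0, false
	}
	d.seen[v] = true
	return v, true
}

// sumBuffer is per-group state for SUM(DISTINCT x); it holds its own child.
type sumBuffer struct {
	child *distinct
	sum   int64
}

func (b *sumBuffer) Update(v int64) {
	if x, ok := b.child.Eval(v); ok {
		b.sum += x
	}
}

type row struct{ group, x int64 }

// sumDistinctByGroup computes SUM(DISTINCT x) GROUP BY group. With perGroup
// true, each group's buffer gets a fresh child; with perGroup false, every
// group shares one child, so DISTINCT state leaks across groups.
func sumDistinctByGroup(rows []row, perGroup bool) map[int64]int64 {
	shared := newDistinct()
	buffers := map[int64]*sumBuffer{}
	for _, r := range rows {
		b, ok := buffers[r.group]
		if !ok {
			child := shared
			if perGroup {
				child = newDistinct() // duplicate the child for this group
			}
			b = &sumBuffer{child: child}
			buffers[r.group] = b
		}
		b.Update(r.x)
	}
	out := map[int64]int64{}
	for g, b := range buffers {
		out[g] = b.sum
	}
	return out
}

func main() {
	// Rows for groups 1 and 2 arrive interleaved, not sorted by group.
	rows := []row{{1, 5}, {2, 5}, {1, 5}, {2, 7}}
	fmt.Println(sumDistinctByGroup(rows, false)) // map[1:5 2:7] (wrong for group 2)
	fmt.Println(sumDistinctByGroup(rows, true))  // map[1:5 2:12]
}
```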