Skip to content

Suggest to create an extended statistics entry about a functional dependency between columns #7

@sfkeller

Description

@sfkeller

After some discussion over here ankane/dexter#32 I'v following suggestion:

Optimization assumes that columns are independent, so a query plan can go wrong if there are functional dependencies (FD) between columns which he doesn't know. Real world datasets often have FDs between columns because of denormalization or just because the data is like this.

Actually, finding FDs for all columns could become a rather hard problem to solve. But hopefully this challenge becomes feasible if applied only to columns as part of given slow queries.

Expected behaviour:

PostgreSQL offers CREATE STATISTICS in order to inform the planner about FDs. So, looking e.g. at the freely available database "ATP tennis tour", and given a slow query involving tourney_id, tourney_name, pg_plan_advsr could make a specific suggestion like this:

create statistics atp_matches_2019_tourney_id_tourney_name (dependencies) 
	on tourney_id, tourney_name 
	from atp_matches_2019;

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions