
Migrate UnivariateFinite (for categorical distributions) out to new package #504

@ablaom

In line with #416, I propose we move UnivariateFinite out to a new package called CategoricalDistributions.jl.

If this is okay with the current host of MLJBase.jl (I need to check this, @vollmersj), it might make sense for this package to live at JuliaData (host of CategoricalArrays.jl) or JuliaStats (host of Distributions.jl). What do the curators of those organisations think of the idea?

@nalimilan @bkamins @andreasnoack @devmotion @matbesancon

Recall that UnivariateFinite consists of the following:

  • A composite type UnivariateFinite{S,V,R,P<:Real} for encoding the probability distribution associated with a finite labelled set of points, as opposed to the distribution Categorical from Distributions.jl, whose sample space is always a range of integers 1:k. The sample space of a UnivariateFinite instance is a CategoricalPool object from CategoricalArrays.jl.

  • Implementation of relevant parts of the Distributions.jl API, including rand, pdf, logpdf, support, params, mode, and fit (which fits a distribution to a CategoricalVector).

  • A wrapper UnivariateFiniteArray for arrays of such objects (sharing a common sample space / pool). This type implements the AbstractArray API and is optimised for fast indexing and for broadcasting pdf and logpdf, which turned out to be essential in our machine learning applications.

  • A fairly elaborate constructor for UnivariateFiniteArray objects from matrices of probabilities. See this docstring.
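To make the structure above concrete, here is a minimal, language-neutral sketch written in Python. It is not the MLJBase implementation or its API. The names Pool, UnivariateFinite and UnivariateFiniteArray below are hypothetical stand-ins that only mirror the ideas in the list. A labelled sample space is shared through a pool (Pool plays the role of a CategoricalPool). Each distribution provides pdf, logpdf, mode, rand and fit. The array type stores one probability matrix, so evaluating pdf across many distributions is a single column lookup.

```python
import math
import random
from collections import Counter


class Pool:
    """Hypothetical stand-in for a CategoricalPool: an ordered set of labels."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.index = {label: i for i, label in enumerate(self.labels)}


class UnivariateFinite:
    """Sketch of a finite distribution over labels (not integers 1:k)."""

    def __init__(self, pool, probs):
        if len(probs) != len(pool.labels):
            raise ValueError("one probability per label is required")
        if abs(sum(probs) - 1.0) > 1e-9:
            raise ValueError("probabilities must sum to 1")
        self.pool = pool
        self.probs = list(probs)

    def pdf(self, label):
        # Look up the label's position in the shared pool.
        return self.probs[self.pool.index[label]]

    def logpdf(self, label):
        return math.log(self.pdf(label))

    def mode(self):
        # Label with the largest probability (first one wins on ties).
        i = max(range(len(self.probs)), key=self.probs.__getitem__)
        return self.pool.labels[i]

    def rand(self, rng=random):
        return rng.choices(self.pool.labels, weights=self.probs)[0]

    @classmethod
    def fit(cls, pool, samples):
        # Maximum-likelihood fit: relative frequency of each pool label.
        counts = Counter(samples)
        n = len(samples)
        return cls(pool, [counts[label] / n for label in pool.labels])


class UnivariateFiniteArray:
    """Sketch of a vector of distributions sharing one pool.

    Probabilities are stored as rows of a matrix (one row per element),
    so no per-element distribution object is kept in memory.
    """

    def __init__(self, pool, prob_rows):
        self.pool = pool
        self.rows = [list(row) for row in prob_rows]

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, i):
        # Indexing materialises a single distribution on demand.
        return UnivariateFinite(self.pool, self.rows[i])

    def pdf(self, label):
        # "Broadcast" pdf: read one column instead of looping over objects.
        j = self.pool.index[label]
        return [row[j] for row in self.rows]

    def logpdf(self, label):
        return [math.log(p) for p in self.pdf(label)]
```

For example, with `pool = Pool(["no", "yes"])`, `UnivariateFinite(pool, [0.2, 0.8]).mode()` gives `"yes"`, and `UnivariateFiniteArray(pool, [[0.2, 0.8], [0.6, 0.4]]).pdf("no")` gives `[0.2, 0.6]`. The design point the sketch tries to capture is that sharing a single pool across all elements is what makes vectorised pdf and logpdf cheap.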

Technical note. I expect this migration to be fairly painless, but there is one issue to be aware of: currently the UnivariateFinite constructor stub lives in MLJModelInterface, but the type and all real functionality live in MLJBase (which depends on MLJModelInterface). The reason for this was to keep MLJModelInterface (the sole dependency of third-party packages implementing MLJ's model API) super lightweight. So this needs sorting out.