Skip to content

multiple regression functions with table argument #1763

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Oct 23, 2024

Conversation

ds26gte
Copy link
Contributor

@ds26gte ds26gte commented Sep 26, 2024

statistics.arr: Added three functions that perform multiple regression given a table, the input cols, and the output col.

  • multiple-regression-coeffs: returns list of coefficients
  • multiple-regression-table: returns coeffs as 2-col table
  • multiple-regression-fun: returns predictor function

given a table, the input cols, and the output col.
- multiple-regression-coeffs: returns list of coefficients
- multiple-regression-table: returns coeffs as 2-col table
- multiple-regression-fun: returns predictor function
fun multiple-regression-coeffs(t :: Table, input-cols :: List<String>, output-col :: String):
doc: "returns a list of regression coefficients, given a table, a list of independent columns, and the corresponding output column"
# spy: t, input-cols, output-col end
x_s_s = zip-num-lists(input-cols.map(lam(col-name :: String): t.get-column(col-name) end))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

# spy: intercepts end
mx_X = intercepts.augment(MX.lists-to-matrix(x_s_s))
# spy: mx_X end
mx_Y = MX.col-matrix.make(raw-array-from-list(y_s))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could also be done as table-to-matrix(t.get-columns([list: output-col]), for symmetry's sake; it's not any more or less efficient, per se.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

MX.mtx-to-list(MX.mtx-least-squares-solve(mx_X, mx_Y))
end

fun multiple-regression-table(t :: Table, input-cols :: List<String>, output-col :: String):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General comment: Didn't @schanzer implement these functions in his starter files somewhere? (I'm pretty certain I gave him nearly this code, so it's worth double-checking...)

[table-from-columns: {"coefficient-name"; coeff-names}, {"coefficient-value"; B}]
end

fun multiple-regression-fun(t :: Table, input-cols :: List<String>, output-col :: String) -> (List<Number> -> Number):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function can't possibly be needed as written; don't we have multiple-regression already? This looks like copy-pasted code, and fiddly code at that, so we should deduplicate it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The difference is that the current trio of functions take a table, and column names as arguments. The Bootstrap group decided that all three outputs (viz., list of coefficients, table of coefficients, and predictor function) were needed, and it wasn't clear which one (or perhaps two) should be treated as basic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not my point. We already have https://github.com/brownplt/pyret-lang/blob/horizon/src/arr/trove/statistics.arr#L236-L244 that defines B_pred_fun; don't duplicate that code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a duplicate. multiple-regression-fun calls multiple-regression-coeffs to get a list of coefficients, which it calls B. The B_pred_fn then uses fold2 to do a dot-product its argument list with B. It avoids having to convert the argument list to a vector.

In the original multiple-regression, we get a vector of coefficients, which it calls B. Its B_pred_fn converts its argument list into a vector and then performs a vector dot product with the vector B.

The only commonality between the two B_pred_fns is that they check the length of their incoming list the same way to decide if an exception needs to be thrown. Once past that check, the body is different.

One could write an external function that is called in both places (with B as argument) to create B_pred_fn, but even so, one would have to do an extra list/vector casting in one of the two functions, so that the Bs are the same type.

@ds26gte ds26gte merged commit 8d9fba0 into brownplt:horizon Oct 23, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants