-
Notifications
You must be signed in to change notification settings - Fork 116
multiple regression functions with table argument #1763
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
given a table, the input cols, and the output col. - multiple-regression-coeffs: returns list of coefficients - multiple-regression-table: returns coeffs as 2-col table - multiple-regression-fun: returns predictor function
src/arr/trove/statistics.arr
Outdated
fun multiple-regression-coeffs(t :: Table, input-cols :: List<String>, output-col :: String): | ||
doc: "returns a list of regression coefficients, given a table, a list of independent columns, and the corresponding output column" | ||
# spy: t, input-cols, output-col end | ||
x_s_s = zip-num-lists(input-cols.map(lam(col-name :: String): t.get-column(col-name) end)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be better done as x_s_s = table-to-matrix(t.select-columns(input-cols))
using https://github.com/brownplt/pyret-lang/blob/horizon/src/arr/trove/matrices.arr#L1047 and https://github.com/brownplt/pyret-lang/blob/horizon/src/js/trove/table.js#L712
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks!
src/arr/trove/statistics.arr
Outdated
# spy: intercepts end | ||
mx_X = intercepts.augment(MX.lists-to-matrix(x_s_s)) | ||
# spy: mx_X end | ||
mx_Y = MX.col-matrix.make(raw-array-from-list(y_s)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could also be done as table-to-matrix(t.get-columns([list: output-col])
, for symmetry's sake; it's not any more or less efficient, per se.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks!
src/arr/trove/statistics.arr
Outdated
MX.mtx-to-list(MX.mtx-least-squares-solve(mx_X, mx_Y)) | ||
end | ||
|
||
fun multiple-regression-table(t :: Table, input-cols :: List<String>, output-col :: String): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General comment: Didn't @schanzer implement these functions in his starter files somewhere? (I'm pretty certain I gave him nearly this code, so it's worth double-checking...)
[table-from-columns: {"coefficient-name"; coeff-names}, {"coefficient-value"; B}] | ||
end | ||
|
||
fun multiple-regression-fun(t :: Table, input-cols :: List<String>, output-col :: String) -> (List<Number> -> Number): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function can't possibly be needed as written; don't we have multiple-regression
already? This looks like copy-pasted code, and fiddly code at that, so we should deduplicate it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The difference is that the current trio of functions take a table, and column names as arguments. The Bootstrap group decided that all three outputs (viz., list of coefficients, table of coefficients, and predictor function) were needed, and it wasn't clear which one (or perhaps two) should be treated as basic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not my point. We already have https://github.com/brownplt/pyret-lang/blob/horizon/src/arr/trove/statistics.arr#L236-L244 that defines B_pred_fun
; don't duplicate that code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a duplicate. multiple-regression-fun
calls multiple-regression-coeffs
to get a list of coefficients, which it calls B
. The B_pred_fn
then uses fold2
to do a dot-product its argument list with B
. It avoids having to convert the argument list to a vector.
In the original multiple-regression
, we get a vector of coefficients, which it calls B
. Its B_pred_fn
converts its argument list into a vector and then performs a vector dot product with the vector B
.
The only commonality between the two B_pred_fn
s is that they check the length of their incoming list the same way to decide if an exception needs to be thrown. Once past that check, the body is different.
One could write an external function that is called in both places (with B
as argument) to create B_pred_fn
, but even so, one would have to do an extra list/vector casting in one of the two functions, so that the B
s are the same type.
statistics.arr: Added three functions that perform multiple regression given a table, the input cols, and the output col.