133 code parser #139

m7pr · 2023-08-23T08:58:51Z

Closes #133

This PR introduces a feature with broad potential use. Currently it only extends the qenv class, but the bigger picture is that it will let us change the way we provide data in teal::init.

Current behavior

Currently teal::init takes teal.data::teal_data() as input, which in turn takes actual R objects accompanied by a code specification. In all cases this is the code used to create the R object being passed. This results in code duplication: we first write the code that creates the object, and then we copy-paste that code into the object specification in teal.data::teal_data().

new_iris <- transform(iris, id = seq_len(nrow(iris)))
new_mtcars <- transform(mtcars, id = seq_len(nrow(mtcars)))

app <- init(
  data = teal_data(
    dataset("new_iris", new_iris, code = "new_iris <- transform(iris, id = seq_len(nrow(iris)))"),
    dataset("new_mtcars", new_mtcars, code = "new_mtcars <- transform(mtcars, id = seq_len(nrow(mtcars)))")
  )
)

Proposed alternative

The alternative proposed in this PR is a functionality called code parser. It understands which parts of the code (passed as a character) are needed to create a specific object, together with all its dependent objects and dependent side effects. Thanks to that, we don't need to pass object specifications and code separately. We can just pass the code, which is evaluated and parsed so that, under the hood, objects are created and their respective code is assigned to them automatically.
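To make the idea concrete, here is a minimal base-R sketch of this kind of dependency lookup. It is not the implementation in this PR: `target_of`, `uses_of` and `code_for` are hypothetical helpers, and the sketch assumes only top-level assignments to plain names, with no side effects.

```r
# Toy dependency lookup (hypothetical helpers, not the teal.code implementation).
code <- c("a <- 1", "b <- a + 1", "c <- 10", "d = b * 2")
exprs <- parse(text = code, keep.source = FALSE)

# TRUE for top-level `<-` and `=` calls; `1 -> x` is parsed as `x <- 1`.
is_assign <- function(e) {
  is.call(e) && (identical(e[[1]], as.name("<-")) || identical(e[[1]], as.name("=")))
}

# Name of the object a call assigns to, or NA if it is not a plain assignment.
target_of <- function(e) {
  if (is_assign(e) && is.name(e[[2]])) as.character(e[[2]]) else NA_character_
}

# Symbols used on the right-hand side of an assignment (or anywhere otherwise).
uses_of <- function(e) {
  if (is_assign(e)) all.vars(e[[3]]) else all.vars(e)
}

# Walk the calls backwards, keeping those that create a wanted object
# and adding their inputs to the set of wanted objects.
code_for <- function(name, exprs) {
  targets <- vapply(exprs, target_of, character(1))
  keep <- logical(length(exprs))
  wanted <- name
  for (i in rev(seq_along(exprs))) {
    if (!is.na(targets[i]) && targets[i] %in% wanted) {
      keep[i] <- TRUE
      wanted <- union(wanted, uses_of(exprs[[i]]))
    }
  }
  vapply(exprs[keep], function(e) paste(deparse(e), collapse = "\n"), character(1))
}

result <- code_for("d", exprs)
print(result)
```

Asking for `d` keeps the lines that create `a`, `b` and `d` and drops the unrelated `c <- 10`.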

This PR only introduces changes to the qenv object. Further changes to how teal.data::teal_data() or the teal::init data parameter work will be needed. qenv received a new field called code_dependency, a list needed to restore an object and its side effects. Below are a few examples of extracting the code for individual objects such as ADLB and ADSL.

code objects

library(dplyr)
code = '
arm_mapping <- list(
  "A: Drug X" = "150mg QD",
  "B: Placebo" = "Placebo",
  "C: Combination" = "Combination"
)
color_manual <- c("150mg QD" = "#000000", "Placebo" = "#3498DB", "Combination" = "#E74C3C")
# assign LOQ flag symbols: circles for "N" and triangles for "Y", squares for "NA"
shape_manual <- c("N" = 1, "Y" = 2, "NA" = 0)
ADSL <- goshawk::rADSL
goshawk::rADLB-> ADLB
iris2 <- iris # @effect ADLB ADSL
var_labels <- lapply(ADLB, function(x) attributes(x)$label)
iris3 <- iris'
code2 = '
ADLB <- ADLB %>%
  dplyr::mutate(AVISITCD = dplyr::case_when(
    AVISIT == "SCREENING" ~ "SCR",
    AVISIT == "BASELINE" ~ "BL",
    grepl("WEEK", AVISIT) ~
      paste(
        "W",
        trimws(
          substr(
            AVISIT,
            start = 6,
            stop = stringr::str_locate(AVISIT, "DAY") - 1
          )
        )
      ),
    TRUE ~ NA_character_
  )) %>%
  dplyr::mutate(AVISITCDN = dplyr::case_when(
    AVISITCD == "SCR" ~ -2,
    AVISITCD == "BL" ~ 0,
    grepl("W", AVISITCD) ~ as.numeric(gsub("[^0-9]*", "", AVISITCD)),
    TRUE ~ NA_real_
  )) %>%
  # use ARMCD values to order treatment in visualization legend
  dplyr::mutate(TRTORD = ifelse(grepl("C", ARMCD), 1,
                                ifelse(grepl("B", ARMCD), 2,
                                       ifelse(grepl("A", ARMCD), 3, NA)
                                )
  )) %>%
  dplyr::mutate(ARM = as.character(arm_mapping[match(ARM, names(arm_mapping))])) %>%
  dplyr::mutate(ARM = factor(ARM) %>%
                  reorder(TRTORD)) %>%
  dplyr::mutate(
    ANRHI = dplyr::case_when(
      PARAMCD == "ALT" ~ 60,
      PARAMCD == "CRP" ~ 70,
      PARAMCD == "IGA" ~ 80,
      TRUE ~ NA_real_
    ),
    ANRLO = dplyr::case_when(
      PARAMCD == "ALT" ~ 20,
      PARAMCD == "CRP" ~ 30,
      PARAMCD == "IGA" ~ 40,
      TRUE ~ NA_real_
    )
  ) %>%
  dplyr::rowwise() %>%
  dplyr::group_by(PARAMCD) %>%
  dplyr::mutate(LBSTRESC = ifelse(
    USUBJID %in% sample(USUBJID, 1, replace = TRUE),
    paste("<", round(runif(1, min = 25, max = 30))), LBSTRESC
  )) %>%
  dplyr::mutate(LBSTRESC = ifelse(
    USUBJID %in% sample(USUBJID, 1, replace = TRUE),
    paste(">", round(runif(1, min = 70, max = 75))), LBSTRESC
  )) %>%
  ungroup()'

code3 = '
attr(ADLB[["ARM"]], "label") <- var_labels[["ARM"]]
attr(ADLB[["ANRHI"]], "label") <- "Analysis Normal Range Upper Limit"
attr(ADLB[["ANRLO"]], "label") <- "Analysis Normal Range Lower Limit"
mtcars # @effect ADLB
options(prompt = ">") # @effect ADLB

# add LLOQ and ULOQ variables
ADLB_LOQS<-goshawk:::h_identify_loq_values(ADLB)
goshawk:::h_identify_loq_values(ADLB)->ADLB_LOQS
ADLB = dplyr::left_join(ADLB, ADLB_LOQS, by = "PARAM")
iris6 <- list(ADLB, ADLB_LOQS, ADSL)
iris5 <- iris'

q1 <- teal.code:::new_qenv()
q2 <- teal.code::eval_code(q1, code = code)
q3 <- teal.code::eval_code(q2, code = code2)
q4 <- teal.code::eval_code(q3, code = code3)

get_code(q2, deparse = FALSE, names = "ADLB")
get_code(q3, deparse = FALSE, names = "ADLB")
get_code(q4, deparse = FALSE, names = "ADLB")
get_code(q4, deparse = FALSE, names = "var_labels")
get_code(q4, deparse = FALSE, names = "ADSL")
get_code(q4, deparse = FALSE, names = c("ADSL", "ADS", "C"))
get_code(q4, deparse = FALSE, names = c("var_labels", "ADSL"))
get_code(q4)

Side effects

The functionality might be a bit complicated, mainly because of how side effects are handled. Code often contains side effects that cannot be directly connected to specific objects. If objects are connected through assignment operators (<-, =, ->), it is easy to understand the dependency structure between objects and code lines. However, a side effect such as the creation of a database connection, which influences all other operations in the code, cannot be inferred from static code analysis alone. Hence we introduce the possibility of adding a # @effect object_name tag at the end of a line to specify which objects that line affects. The drawback of this solution is that we operate on parsed code, which loses information about comments. The comments are kept only in its srcref attribute, which is read by utils::getParseData(), so we need to store some extra meta-information if we also want to restore the lines that are side effects.
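As a rough illustration of the comment problem, the base-R sketch below recovers `# @effect` tags through `utils::getParseData()`. The variable names and the way the tag is split are assumptions made for this example, not the code in this PR.

```r
# Toy extraction of `# @effect` tags (illustrative only, not the PR's code).
code <- c(
  "con <- 1",
  "options(digits = 3) # @effect x",
  "x <- 2"
)

# keep.source = TRUE is required, otherwise no srcref (and no comments) is kept.
parsed <- parse(text = code, keep.source = TRUE)

# The parsed call itself no longer carries the comment.
stripped <- deparse(parsed[[2]])

# The parse data still lists comments as COMMENT tokens with their line number.
pd <- utils::getParseData(parsed)
is_tag <- pd$token == "COMMENT" & grepl("@effect", pd$text, fixed = TRUE)
comments <- pd[is_tag, c("line1", "text")]

# Split everything after the tag into object names, keyed by source line.
effects <- lapply(
  strsplit(sub("^.*@effect", "", comments$text), "[[:space:]]+"),
  function(x) x[nzchar(x)]
)
names(effects) <- comments$line1

print(effects)
```

The line number is the extra meta-information that lets a tagged line be matched back to the call it belongs to.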

Notes

The relation between objects is assumed to be passed by the <-, = or -> assignment operators. No other object creation methods (like assign, <<-, or any non-standard-evaluation method) are supported. Such cases are handled with the # @effect tag.
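A small base-R check shows why these cases differ. `is_assignment` is a hypothetical helper that classifies a top-level call by its head symbol. It only mirrors the note above and is not taken from the package.

```r
# Hypothetical classifier: is a top-level call one of the supported assignments?
is_assignment <- function(e) {
  is.call(e) && (identical(e[[1]], as.name("<-")) || identical(e[[1]], as.name("=")))
}

lines <- c("a <- 1", "b = 2", "3 -> c", "assign(\"d\", 4)", "e <<- 5")

# `3 -> c` is parsed as `c <- 3`, so it is recognised like `<-`.
detected <- vapply(parse(text = lines), is_assignment, logical(1))
names(detected) <- lines

print(detected)
```

The `assign()` call and `<<-` are not recognised, which is where an `# @effect` tag is needed.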


R/utils-code-dependency.R

R/qenv-eval_code.R

chlebowa · 2023-09-20T13:12:21Z

R/qenv-get_code.R

@@ -3,6 +3,7 @@
 #' @name get_code
 #' @param object (`qenv`)
 #' @param deparse (`logical(1)`) if the returned code should be converted to character.
+#' @param names (`character(n)`) if provided, returns the code only for objects specified in `names`.


Food for thought: is it a better API to have this argument or to have a separate function, say get_object_code?

We could also have get_code() return a list for all the objects, so you could call get_code()['object_name']. I'm unsure what the best way is here yet.

R/qenv-concat.R

R/qenv-eval_code.R


m7pr · 2023-09-26T10:00:51Z

Hey @chlebowa, for this:

> Also,
>
> q <- eval_code(new_qenv(), "a <- 1")
> get_code(q, deparse = FALSE, names = "a")
>
> returns character when names is not NULL.

Yeah, it returns character for deparse = TRUE

testthat::test_that(
  "get_code returns the same class when names is specified and when not",
  {
    q <- eval_code(new_qenv(), "a <- 1")
    testthat::expect_identical(
      get_code(q, deparse = FALSE, names = "a"),
      get_code(q, deparse = TRUE)
    )
  }
)


R/qenv-eval_code.R

R/qenv-class.R

R/utils-code-dependency.R

chlebowa · 2023-09-27T12:48:04Z

R/utils-code-dependency.R

+      "Objects not found in 'qenv' environment: ",
+      paste(names[!(names %in% ls(qenv@env))], collapse = ", ")
+    )
+  }


Suggested change

}

return(character(0L))

}

I let the function continue, because if someone asks for 3 objects and 1 of them does not exist, they at least get the code for the other two. Maybe it's better to throw an error here.
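The "warn and continue" option can be sketched in base R as follows. `check_names` is a hypothetical helper working on a plain environment and is not the function under review.

```r
# Hypothetical helper: warn about unknown names, keep going with the known ones.
check_names <- function(names, env) {
  absent <- setdiff(names, ls(env))
  if (length(absent) > 0L) {
    warning(
      "Objects not found in 'qenv' environment: ",
      paste(absent, collapse = ", ")
    )
  }
  intersect(names, ls(env))
}

env <- new.env()
assign("ADSL", 1, envir = env)

# Capture the warning text so the example runs quietly.
msg <- tryCatch(
  check_names(c("ADSL", "ADS", "C"), env),
  warning = function(w) conditionMessage(w)
)
found <- suppressWarnings(check_names(c("ADSL", "ADS", "C"), env))

print(msg)
print(found)
```

Switching `warning()` to `stop()` gives the stricter behaviour, at the cost of returning nothing when a single name is mistyped.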

R/utils-code-dependency.R

averissimo

The implementation of code parser is smart and complex 💯

I've been testing and thinking about this, and I can't shake the feeling that there are a bunch of exceptions outside the control of the insightsengineering team.

However, I'm not finding a lot of them, and the ones I find are a bit specific 😁

Mostly when accessing data from packages or when the initial assignment is made via assign('yada') # @effect yada

Minor edge cases

It's been hard to find situations where it fails, which is nice in something as complex as this!!

I guess both the examples below come from the initial object not being detected as "assigned"

Data from packages

I believe this case might be plausible

testthat::test_that("code_parser load data from package & effect hint", {

  code <- 'data(iris) # @effect iris'
  
  q1 <- teal.code::eval_code(teal.code:::new_qenv(), code = code)
  
  # Makes sure the object is on qenv
  q2 <- testthat::expect_output(
    teal.code::eval_code(q1, code = "print(NROW(iris))"),
    "150"
  )
  
  get_code(q1, deparse = FALSE, names = "iris") |> 
    length() |> 
    expect_gt(0)

  parsed_code <- get_code(q2, deparse = FALSE, names = "iris")

  expect_gt(length(parsed_code), 0)
  expect_false(is.na(parsed_code))
})

`assign` as first call

Related to the one above. I guess it's not as plausible, but it exists nonetheless.

testthat::test_that("code_parser with assign & effect hint", {

  code <- 'assign("ADSL", iris) # @effect ADSL'
  
  q1 <- teal.code::eval_code(teal.code:::new_qenv(), code = code)
  
  # Makes sure the object is on qenv
  q2 <- testthat::expect_output(
    teal.code::eval_code(q1, code = "print(NROW(ADSL))"),
    "150"
  )
  
  get_code(q1, deparse = FALSE, names = "ADSL") |> 
    length() |> 
    expect_gt(0)
  
  parsed_code <- get_code(q2, deparse = FALSE, names = "ADSL")
  
  expect_gt(length(parsed_code), 0)
  expect_false(is.na(parsed_code))
  
})

m7pr · 2023-09-28T06:59:46Z

Thanks @averissimo for the kind words. This is a joint team effort, so multiple people were involved in coming up with great ideas and suggestions. For the cases that you found with assign and data, I think we have a statement that this will not work yet:

teal.code/R/utils-code-dependency.R

Lines 4 to 5 in 76d1edd

    
#' @details The relation between objects is assumed to be passed by `<-`, `=` or `->` assignment operators. No other
#' object creation methods (like `assign`, or `<<-` or any non-standard-evaluation method) are supported. To specify

as we are aware of our limitations.

averissimo · 2023-09-28T08:24:58Z

code <- 'assign("ADSL", iris) # @effect ADSL'
...
code <- 'data(iris) # @effect iris'

@m7pr I'm aware of that documentation 🙂 and the examples above have the hint, however, it's not catching it when getting the code.

m7pr · 2023-09-28T09:01:24Z

Ah, got you! Alrighty then, thanks for pointing this out. I think I can take another look at this.

m7pr · 2023-09-28T09:04:13Z

We had a call today with @gogonzo and @chlebowa where we decided to simplify the approach.
The main change will be to the default behavior of eval_code: it will convert expression and language inputs to character, and the main functionality will live in the eval_code method for the character signature. We will store object@code as a character vector (not an expression, as it is now) and extend object@code on every eval_code call. The whole parsing machinery will move to get_code and run on the whole object@code.
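A toy base-R version of that plan could look like the sketch below. It uses a plain list instead of the qenv S4 class, `new_store` and `eval_step` are hypothetical names, and the environment is shared between steps rather than cloned, so this only illustrates the data flow.

```r
# Toy store: code is kept as a character vector and parsed only when needed.
new_store <- function() list(code = character(0), env = new.env())

eval_step <- function(store, code) {
  # Language and expression input is converted to character first.
  if (!is.character(code)) code <- paste(deparse(code), collapse = "\n")
  eval(parse(text = code), envir = store$env)
  # The stored code is extended on every call.
  store$code <- c(store$code, code)
  store
}

s <- new_store()
s <- eval_step(s, "a <- 1")
s <- eval_step(s, quote(b <- a + 1))

# Parsing happens late, on the whole accumulated code.
all_exprs <- parse(text = s$code)

print(s$code)
```

Because parsing runs over the whole accumulated code, a dependency lookup at retrieval time can see assignments made in earlier steps.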

m7pr · 2023-09-29T13:41:02Z

Hey, I'm working on a new approach on a separate branch so it's easier to track the changes of the final approach against main
#146

Incorporated some of the feedback provided by @chlebowa and @averissimo but not all yet. Work in progress

m7pr · 2023-10-02T13:35:20Z

Hey @averissimo I incorporated your 2 examples in tests in other PR #146

m7pr · 2023-10-06T12:35:35Z

closing in favour of #146

Fixes #133
Alternative to #139

---------

Signed-off-by: Marcin <[email protected]>
Co-authored-by: go_gonzo <[email protected]>
Co-authored-by: Aleksander Chlebowski <[email protected]>
Co-authored-by: Dawid Kałędkowski <[email protected]>
Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Aleksander Chlebowski <[email protected]>

m7pr and others added 15 commits August 15, 2023 16:25

#133 attempt on eval_code for character input for code-parser

2ab4a1c

#133 improve code-parser

2dde5b3

#133 code to extract needed side-effects

0e8c7d3

#133 deal with side effects

54d3b5c

#133 move code-parser to prototypes in dev/

e14e508

#133 code execution for code-parser

48f5ca9

add TODO

0474ae5

#133 extend assumptions and todo

f57dde6

#133 implement R/utils-doce-dependency.R

f7ccc0e

#133 fix missing occurence and effect for eval_code on an empty new_qenv

b04b7da

#133 improve get_code_dependencies

c048346

#133 update man pages

ba3850b

#133 trim down the occurence table in recursive search

ce86f83

#133 fix effects lenght in the output

b781910

Merge b781910 into 760b73f

e6259de

m7pr added the core label Aug 23, 2023

github-actions bot and others added 4 commits August 23, 2023 09:01

[skip actions] Restyle files

48ff379

typo

b68d6a4

Merge branch '133_code_parser@main' of https://github.com/insightseng…

78caf26

…ineering/teal.code into 133_code_parser@main

Empty-Commit

aca73b9

m7pr requested a review from chlebowa August 23, 2023 09:54

dependabot-preview bot and others added 5 commits August 23, 2023 09:56

[skip actions] Roxygen Man Pages Auto Update

f9be775

#133 spelling check

ce01d42

Merge branch '133_code_parser@main' of https://github.com/insightseng…

7542e93

…ineering/teal.code into 133_code_parser@main

Empty-Commit

9f4b278

#133 lintr changes

e1aabed

gogonzo self-assigned this Sep 6, 2023

gogonzo reviewed Sep 6, 2023

R/utils-code-dependency.R (outdated, resolved)

gogonzo reviewed Sep 6, 2023

R/utils-code-dependency.R (outdated, resolved)

gogonzo reviewed Sep 6, 2023

R/qenv-eval_code.R (outdated, resolved)

chlebowa reviewed Sep 20, 2023

R/qenv-concat.R (resolved)

gogonzo reviewed Sep 25, 2023

R/qenv-eval_code.R (resolved)

m7pr and others added 3 commits September 26, 2023 11:55

#133 @effect returns this line for affected binding even if object is…

c420fa3

… not pecificed/created in the same eval_code

Merge c420fa3 into 41f660b

5af03b1

[skip actions] Restyle files

81a2656

m7pr and others added 9 commits September 26, 2023 13:15

#133 allow to return side_effects for side_effected objects

0817128

Merge branch '133_code_parser@main' of https://github.com/insightseng…

638e4bd

…ineering/teal.code into 133_code_parser@main

Merge 638e4bd into 41f660b

8a2b7b5

[skip actions] Restyle files

9d129c8

one more test

4e764d2

Merge branch '133_code_parser@main' of https://github.com/insightseng…

dc8faa9

…ineering/teal.code into 133_code_parser@main

Merge dc8faa9 into 41f660b

35f47ab

[skip actions] Restyle files

79d3c00

[skip actions] Roxygen Man Pages Auto Update

76d1edd

chlebowa reviewed Sep 27, 2023

averissimo reviewed Sep 27, 2023

m7pr mentioned this pull request Sep 29, 2023

#133 code parser alternative #146

Merged

m7pr closed this Oct 6, 2023

m7pr deleted the 133_code_parser@main branch November 7, 2023 09:13

m7pr mentioned this pull request Oct 30, 2024

211 [.qenv S3 method + replacement of @id, @warnings, and @messages fields #216

Merged

8 tasks
