class: center, middle, inverse, title-slide # Debugging ## ⚔
in R ### Julia Piaskowski ### 2022-03-08 --- # When you encounter an R error: 1. Check package documentation 1. Ask around: RStudio community, R4DS slack, Stack Overflow, your local oracle 1. Check source code (maybe) #### You will probably need to work with debugging functions eventually --- class: center, middle, inverse ## debugging = removing code errors <br> # How do you debug? --- # Retrace your steps 1. Go through a process step-by-step and see where things went wrong 1. Start with the action that occurred that first and proceed to the second action, the third, etc... --- # Retrace Example [Agricultural Data from Tidy Tuesday](https://github.com/rfordatascience/tidytuesday/tree/master/data/2020/2020-09-01) ```r library(dplyr); library(tidyr); library(purrr) agdata <- readRDS(here::here("static", "slides", "data", "agdata.rds")) yields <- agdata[[4]] colnames(yields) <- gsub("\\(tonnes per hectare\\)", "", names(yields)) wheat <- yields %>% tidyr::pivot_wider(id_cols = Year, names_from = Entity, values_from = Wheat) %>% purrr::discard(~all(is.na(.))) %>% mutate(ethiopia = case_when( is.na(Ethiopia) ~ `Ethiopia PDR`, TRUE ~ Ethiopia)) %>% select(-Ethiopia, -`Ethiopia PDR`) ``` ``` ## Error in `stop_subscript()`: ## ! Can't subset columns that don't exist. ## x Column `Wheat` doesn't exist. ``` <mark>What went wrong? </mark> --- # Fixed: ```r yields <- agdata[[4]] colnames(yields) <- gsub(" \\(tonnes per hectare\\)", "", names(yields)) wheat <- yields %>% tidyr::pivot_wider(id_cols = Year, names_from = Entity, values_from = Wheat) %>% purrr::discard(~all(is.na(.))) %>% mutate(ethiopia = case_when( is.na(Ethiopia) ~ `Ethiopia PDR`, TRUE ~ Ethiopia)) %>% select(-Ethiopia, -`Ethiopia PDR`) ``` --- # Another retrace example **Working with nested functions makes debugging hard** #### Hard: ```r result <- function4(function3(function2(function1(my_object, extra_args = 1),more_args = TRUE), even_more_args = "show_me_the_money"), final_arg = FALSE) ``` -- #### Easier ```r pre_result1 <- function1(my_object, extra_args = 1) pre_result2 <- function2(pre_result1, more_args = TRUE) pre_result3 <- function3(pre_result2, even_more_args = "show_me_the_money") result <- function4(pre_result3, final_arg = FALSE) ``` *(Lesson: avoid writing nested functions)* --- # Consider writing a function if.... * You need to do something repeatedly * That thing that needs to be repeated has several steps * If you have to *cut-paste-replace* more than twice, consider a function to automate that process --- # Why write functions? * To save yourself work * To avoid mistakes <br><br> 👍 DRY = **D**on't **R**epeat **Y**ourself <br> 👎 WET = **W**rite **E**verything **T**wice / **We** **E**njoy **T**yping --- # Crash course in functions R functions follow a general structure: ```r my_function_name <- function(arg1, arg2) { final_output <- action(arg1, arg2) return(final_output) } ``` Their usage: ```r my_result <- my_function_name(arg1 = ..., arg2 = ...) ``` --- # Crash course in functions * Let's define a function (plotting) * And describe the arguments it takes ```r my_function_name <- function(arg1, arg2) { # arg1 & arg2 = numeric vectors of identical lengths plot(arg1, arg2) } ``` --- # Crash course in functions * Call the function just written: ```r my_function_name(1:10, 1:10 + rnorm(10)) ``` <img src="debugging_files/figure-html/unnamed-chunk-10-1.png" height="60%" /> *(functions are usually more complicated than this example)* .footnote[[longer function writing tutorial](https://agstats.io/post/writing-r-functions/) on writing functions] --- class: center, middle, inverse # [exercise](exercises/exercise-4.html) --- ## What to do when function errors happen * `traceback` ```r f <- function(x) x + 1 g <- function(x) f(x)^3 g("a") ``` ``` ## Error in x + 1: non-numeric argument to binary operator ``` ```r #traceback() ``` --- class: center, middle, inverse # [mini-exercise](exercises/exercise-5.html) --- # Trace * sometimes you only want R to alert you when something specific happens - this is where [trace](https://stat.ethz.ch/R-manual/R-devel/library/base/htmlgi/trace.html) can be helpful * this examines `print()` when it receives numeric input that is greater than 3. ```r trace(print, quote(if (is.numeric(x) && x > 3) cat("We have a problem: \n")), print = FALSE) ``` ``` ## Tracing function "print" in package "base" ``` ``` ## [1] "print" ``` ```r print(4) ``` ``` ## We have a problem: ## [1] 4 ``` Remember to turn `trace` off: ```r untrace(print) ``` ``` ## Untracing function "print" in package "base" ``` --- ## What to do when function errors happen * sprinkle `print()` and/or `str()` statements in your function (not a formal debug method, just something that works) ```r log_hist <- function(var, color = "darkcyan", do_log=TRUE) { if(do_log == TRUE) { print("doing the log var") logvar = log(var) str(logvar) hist(logvar, col = color); abline(v = mean(logvar), lwd = 2, col = "red3") } else hist(var, col = color); abline(v = mean(var), lwd = 2, col = "red3") } ``` -- ```r log_hist(rnorm(100), do_log=TRUE) ``` ``` ## [1] "doing the log var" ``` ``` ## Warning in log(var): NaNs produced ``` ``` ## num [1:100] NaN -2.206 NaN -2.823 -0.953 ... ``` ![](debugging_files/figure-html/unnamed-chunk-16-1.png)<!-- --> --- ## `browse()` A more powerful way to examine objects in a function: ```r fertil <- agdata[[2]] colnames(fertil) <- c("country", "code", "year", "cereal_yield", "N_appl") ``` ```r my_lm_fun <- function(df, y, cty) { m1 = lm(y ~ year, data = df, subset = df[,df$country == cty]) a1 = anova(m1) r2 = summary(m1)$r2 return(list(coefficients = coef(m1), anova = a1, rsq = r2)) } ``` ```r my_lm_fun(df = fertil, country = "Canada") ``` ``` ## Error in my_lm_fun(df = fertil, country = "Canada"): unused argument (country = "Canada") ``` --- class: center, middle, inverse # [exercise](exercises/exercise-6.html) --- ## How do you examine an error for a function you didn't write? * `debug()` ```r debug(stats::lm) ``` ```r test <- lm(cereal_yield ~ N_appl, data = fertil) ``` Turn if off: ```r undebug(stats::lm) ``` --- # You can put this in your .Rprofile! e.g. ```r options(error = traceback) ``` or ```r options(error = browser) ``` ....and more --- class: center, middle, inverse <iframe src="https://giphy.com/embed/ReImZejkBnqYU" width="530" height="300" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/angry-computer-frustrated-ReImZejkBnqYU">image credit: GIPHY</a></p> ## Keep Calm and Debug On --- # Miscellaneous R tip: running R scripts in a terminal ```r Rscript -e "Sys.Date()" ``` ```r Rscript path/to/my/script.R ``` **Why do this?** You might be running a long process that does not require your supervision or an interactive RStudio session. R can use *a lot* of memory just running processes, but RStudio, the graphical user interface, is also a memory hog.