Scalable self-paced e-learning of statistical programming with fine-grained feedback and assessment

Cynthia A. Huang*

Department of Econometrics and Business Statistics, Monash Business School

Mitchell O’Hara-Wild

Department of Econometrics and Business Statistics, Monash Business School

TeachR (in a nutshell)

  • instructor-focused tool for automated assessment of R code and authoring interactive resources
  • allows for examination of the R session environment, inputs and outputs
  • scales access for students and sandboxes execution for markers via WebAssembly with WebR
  • offers an accessible literate programming interface via Quarto
  • wraps meta-programming techniques into helper functions such as search_ast() and exists_in() (see the sketch below)
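A quick taste of those helpers (usage mirrors the Roll-A-Dice demo later in these slides; .code and .printed are objects teachr exposes to the checking code):

# Did the submitted code call sample() anywhere in its AST?
search_ast(.code, .fn = sample)

# Did any printed output satisfy is.integer()?
exists_in(.printed, is.integer)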

About the Team

TeachR and its predecessor, Monash Learn R, include contributions from:

  • Emi Tanaka
  • Mitchell O’Hara-Wild
  • Danyang Dai
  • Jessica Leung
  • Rob J Hyndman
  • Monash EBS department funding
  • Cynthia A. Huang (me!)
  • Krisanat Anukarnsakulchularp
  • Tashya Sathyajit
  • Paulo Rodrigues
  • Janith Wanniarachchi

Background & Motivation

Support, Feedback and Assessment for Statistical Programming

Automated Assessment in CS

Paiva, Leal, and Figueira (2022) provide a state-of-the-art review of automated assessment in computer science:

Testing techniques

  • output comparison vs. unit testing (see the sketch below)
  • static vs. dynamic (evaluated) analysis
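To make these contrasts concrete in R (a hypothetical add_two() exercise, not an example from the review):

# Hypothetical student submission
add_two <- function(x) x + 2

# Output comparison: run the code and compare its printed output to a reference
identical(capture.output(add_two(3)), capture.output(5))

# Unit testing: assert properties of the submission directly
stopifnot(add_two(0) == 2, add_two(-2) == 0)

# Static analysis: inspect the code without evaluating it
any(grepl("\\+", deparse(body(add_two))))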

Feedback generation

Challenges and Future Directions

  • tools vs. research prototypes
  • scale & security considerations

CS vs. Statistical Computing

  • Paiva, Leal, and Figueira (2022) mention:
    • special CS domains (e.g. databases, computer graphics)
    • assessing final projects (e.g. web or mobile apps) in addition to computational thinking
    • but no mention of data science or data-driven reports
  • In DS/Stats Computing we care about:
    • fluency in skills like wrangling, modelling, and visualising
    • statistical validity and theoretical soundness
    • storytelling and communication effectiveness

Current solutions and challenges

Interactive Tutorials and Feedback:

  • Monash Learn R, FreeCodeCamp
  • Codecademy, DataCamp, other commercial offerings
  • … other institutional offerings
  • … various LLM powered chatbots / systems

In-Browser Code Execution:

  • juniper
  • WebR
  • … proprietary solutions

Assessment Authoring Tools:

They are:

  • instant-ish*
  • scalable-ish*
  • lacking usability

Our journey toward scalable self-paced e-learning ⛰️

  • R used in many classes; students' skills vary; limited classroom time for supplemental catch-up support
  • Created learnr.numbat.space, with code run remotely on mybinder.org; sometimes slow to start, with server load issues
  • quarto-webr-teachr: code runs locally on the student's computer with WebR, via a literate programming approach; faster, cheaper, more scalable

TeachR

teachr = quarto + webr

TeachR

teachr = quarto + webr + (✨stats & compsci pedagogy)

Code Testing: Available Objects

Outputs (results, plots, tables, etc.)

  • .printed: printed objects
  • .errored: execution errors
  • .warned: execution warnings
  • .messaged: execution messages

Inputs (code submitted for execution)

  • .src: raw source code
  • .code: parsed code (i.e. the abstract syntax tree)

Environment (objects, packages, state, etc.)

  • saved objects
  • .packages(): loaded packages
  • file system contents
  • random seed state

When assessing data analysis code, it can be useful to examine multiple available objects.
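For instance, one check can draw on all three groups at once (a minimal sketch in the style of the Roll-A-Dice demo below; the dplyr requirement is a made-up example, and .packages() is base R's vector of loaded package names):

c(
  "Your code should use the sample() function." = !search_ast(.code, .fn = sample),
  "You should print an integer result." = !exists_in(.printed, is.integer),
  "You should load the dplyr package." = !"dplyr" %in% .packages()
)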

Demo: Roll-A-Dice

[Screenshots: the Roll-A-Dice exercise running interactively in the browser]

Anatomy of the webr-teachr chunk

From learnr-academy/quarto-webr-teachr, a webr-teachr chunk has two parts:

  • the question's starter code, where blanks for the student to fill in are marked with <<solution>>
  • the answer-checking code, where each named hint is shown to the student if its condition is TRUE
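Put together, a chunk might look like the following in a Quarto document (a minimal sketch; the {webr-teachr} chunk header is an assumption based on the extension name, and the ??? line separates starter code from checking code, as in the demo below):

```{webr-teachr}
# Starter code shown to the student; <<...>> marks the blank to fill in
x <- <<c(1, 2, 3)>>
mean(x)
???
# Checking code: the named hint is shown if its condition is TRUE
c("You should call mean() on your vector." = !search_ast(.code, .fn = mean))
```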

Demo Code: Roll-A-Dice

# Write some code to roll a dice
roll_a_dice <- <<function(){sample(1:6, size = 1L)}>>
# Then, roll the dice!
roll_a_dice()

???

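# Checking code: each named condition that evaluates to TRUE
# is reported to the student as a hint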
old_seed <- .Random.seed
rolls <- sapply(seq_len(1000), function(x) roll_a_dice())
if(search_ast(.code, .expr = sample(size = 1))) {
  cat("💡 It's better to use integers (size = 1L)
       instead of numeric/double (size = 1)\n")
}
c(
  "Your code should use the sample() function as shown in class." =
    !search_ast(.code, .fn = sample),
  "Your function should randomly select dice values." =
    identical(old_seed, .Random.seed),
  "Your function should return a single dice value." =
    length(roll_a_dice()) != 1L,
  "Your function should return an integer." = !is.integer(rolls),
  "Your function returns dice numbers less than 1." = min(rolls) < 1L,
  "Your function returns dice numbers more than 6." = max(rolls) > 6L,
  "You should try your function!" =
    !exists_in(.printed, is.integer) && !search_ast(.code, .fn = roll_a_dice)
)

Remaining Challenges and Future Work?

  • What types of learning and assessment could we automate well enough?
    • Formative assessment outside of the classroom?
    • Expanding exposure to errors/code variants/debugging strategies?
  • What new vocabulary and frameworks are needed for adoption in StatsEd?
    • content reuse and remixing?
    • integration with LLM-powered feedback generation?
  • Other ideas? Talk to me: cynthia.huang@monash.edu!

References

Keuning, Hieke, Johan Jeuring, and Bastiaan Heeren. 2019. “A Systematic Literature Review of Automated Feedback Generation for Programming Exercises.” ACM Transactions on Computing Education 19 (1): 1–43. https://doi.org/10.1145/3231711.
Paiva, José Carlos, José Paulo Leal, and Álvaro Figueira. 2022. “Automated Assessment in Computer Science Education: A State-of-the-Art Review.” ACM Transactions on Computing Education 22 (3): 1–40. https://doi.org/10.1145/3513140.

Appendix

The ingredients for evaluating code (automatically!) 🧁

Outputs (results, plots, tables): the results of executed code, useful for checking whether the correct answer has been achieved. A section of data analysis code often produces multiple outputs, so extra care is needed to check all outputs for the right answer.

  • .printed: the printed objects
  • .errored: the execution errors
  • .warned: the warnings
  • .messaged: the messages

Inputs (code): the code entered by the student, useful for checking method and style.

  • .src: the raw source code
  • .code: the parsed code (AST)

Environment: used for checking things such as saved objects, loaded packages, file system contents, and the random seed state.

Extended Demo: Roll-A-Dice

Assessment using Outputs

Use exists_in() to see if anything in .printed was an integer (a dice roll). This also works for .errored, .warned, and .messaged.
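The corresponding check from the Roll-A-Dice demo:

"You should try your function!" =
  !exists_in(.printed, is.integer) && !search_ast(.code, .fn = roll_a_dice)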


Assessment using the Environment

Using sample() (or any randomness) will change .Random.seed. Check whether the seed didn't change (i.e. nothing random happened).
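The corresponding check from the demo:

old_seed <- .Random.seed
rolls <- sapply(seq_len(1000), function(x) roll_a_dice())
"Your function should randomly select dice values." =
  identical(old_seed, .Random.seed)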


Assessment using Inputs

Search the code for whether the function sample() was used.
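The corresponding check from the demo:

"Your code should use the sample() function as shown in class." =
  !search_ast(.code, .fn = sample)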


Assessment of Code Style

Give a tip without reporting the answer as incorrect.
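In the demo, this is a style tip printed with cat() rather than returned as a failing check:

if(search_ast(.code, .expr = sample(size = 1))) {
  cat("💡 It's better to use integers (size = 1L)
       instead of numeric/double (size = 1)\n")
}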


Journey

Our journey toward scalable self-paced e-learning ⛰️

  • Many classes use R for stats; students' R skills vary a lot; valuable class time is spent re-teaching the basics of R
  • Created learnr.numbat.space: interactive in-browser code exercises with code run remotely on mybinder.org, adapted from the spacy course template by ines 🤩
  • quarto-webr-teachr: interactive in-browser code exercises run locally with WebR; faster, cheaper, more scalable; more options for checking code; an extension for Quarto docs

Live Demos

  • https://workshop.nectric.com.au/user2024/exercises.html