Tidymodels Computing Supplement

Author

Max Kuhn

Published

2024-09-23

Preface

This is a computing supplement to the main website, Applied Machine Learning for Tabular Data, and it uses the tidymodels framework for modeling. Its structure mirrors the website, but the content here shows how to use this software (and sometimes other packages) for each topic.

We also want these materials to be reusable and open. The sources are in the GitHub repository and are released under a Creative Commons license (see below).

To cite this work, we suggest:

@online{aml4td.tidymodels,
  author = {Kuhn, M and Johnson, K},
  title = {{Tidymodels Computing Supplement to Applied Machine Learning for Tabular Data}},
  year = {2023},
  url = {https://tidymodels.aml4td.org},
  urldate = {2024-09-23}
}

License

This work is licensed under CC BY-SA 4.0.

Intended Audience

Readers should have used R before but do not have to be experts. If you are new to R, we suggest taking a look at R for Data Science.

You do not have to be a modeling expert either. We hope that you have used linear or logistic regression before and understand basic statistical concepts such as correlation, variability, and probability.

How can I ask questions?

If you have questions about the content, it is probably best to ask on a public forum, like Cross Validated or Posit Community. You will most likely get a faster answer there if you take the time to ask your question clearly and completely.

If you want a direct answer from us, please follow what I call Yihui’s Rule: open an issue on GitHub (labeled “Discussion”) first. It may take some time for us to get back to you.

If you think there is a bug, please file an issue.

Can I contribute?

A contributing page explains how to set up your environment to compile the materials (there are many software dependencies) and suggests ways to help.

If you just want to fix a typo, you can make a pull request to alter the appropriate .qmd file.

Please feel free to improve the quality of this content by submitting pull requests. Once your PR is merged, you will appear in the contributor list.

Computing Notes

Quarto was used to compile and render the materials. The output of quarto check for that installation is:

Quarto 1.6.3
[✓] Checking versions of quarto binary dependencies...
      Pandoc version 3.2.0: OK
      Dart Sass version 1.70.0: OK
      Deno version 1.41.0: OK
      Typst version 0.11.0: OK
[✓] Checking versions of quarto dependencies......OK
[✓] Checking Quarto installation......OK
      Version: 1.6.3
[✓] Checking tools....................OK
      TinyTeX: (external install)
      Chromium: (not installed)
[✓] Checking LaTeX....................OK
      Using: TinyTex
      Version: 2023
[✓] Checking basic markdown render....OK
[✓] Checking Python 3 installation....OK
      Version: 3.9.18
      Jupyter: (None)
      Jupyter is not available in this Python installation.
[✓] Checking R installation...........OK
      Version: 4.4.1
      LibPaths:
      knitr: 1.48
      rmarkdown: 2.28
[✓] Checking Knitr engine render......OK

R version 4.4.1 (2024-06-14) was used for the majority of the computations. Version 2.0.1 of the underlying torch library was also used. The versions of the primary R modeling and visualization packages used here are:

aorsf (0.1.5) bestNormalize (1.9.1) brulee (0.3.0)
caret (6.0-94) Cubist (0.4.4) dials (1.3.0)
dimRed (0.2.6) downlit (0.4.4.9000) dplyr (1.1.4)
e1071 (1.7-16) embed (1.1.4) fastICA (1.2-5.1)
finetune (1.2.0) ggplot2 (3.5.1) gt (0.11.0)
hardhat (1.4.0) hstats (1.2.1) igraph (2.0.3)
lme4 (1.1-35.5) Matrix (1.7-0) mixOmics (6.25.1)
modeldata (1.4.0) modeldatatoo (0.3.0) naniar (1.1.0)
parsnip (1.2.1) patchwork (1.3.0) probably (1.0.3)
purrr (1.0.2) ragg (1.3.3) RANN (2.6.2)
recipes (1.1.0) rmarkdown (2.28) rsample (1.2.1)
RSpectra (0.16-2) rstudioapi (0.16.0) rules (1.0.2)
sf (1.0-17) sfd (0.1.0) spatialsample (0.5.1)
splines2 (0.5.3) stopwords (2.3) textrecipes (1.0.6)
tidymodels (1.2.0) tidyr (1.3.1) tidysdm (0.9.5)
torch (0.13.0) tune (1.2.1) usethis (3.0.0)
viridis (0.6.5) workflows (1.1.4) workflowsets (1.1.0)
xml2 (1.3.6) yardstick (1.3.1)