Lab 01 - Hello R!

Learning goals

Terminology

We’ve already thrown around a few new terms, so let’s define them before we proceed.

I like to think of R as the engine of the car, and RStudio is the dashboard.

Starting slow

As the labs progress, you are encouraged to explore beyond what the labs dictate; a willingness to experiment will make you a much better programmer. Before we get to that stage, however, you need to build some basic fluency in R. Today we begin with the fundamental building blocks of R and RStudio: the interface, reading in data, and basic commands.

And to make versioning simpler, this is a solo lab. Additionally, we want to make sure everyone gets a significant amount of time at the steering wheel.

Getting started

Download R

If you don’t have R installed.

Go to the CRAN and download R, make sure you get the version that matches your operating system.

If you have R installed

If you have R installed run the following code

R.version
               _                           
platform       x86_64-apple-darwin17.0     
arch           x86_64                      
os             darwin17.0                  
system         x86_64, darwin17.0          
status                                     
major          4                           
minor          0.5                         
year           2021                        
month          03                          
day            31                          
svn rev        80133                       
language       R                           
version.string R version 4.0.5 (2021-03-31)
nickname       Shake and Throw             

This should tell you what version of R you are currently using. If your R version is lower then 3.6.0 I would strongly recommend updating. In general it is a good idea to update your R version, unless you have a project right now that depend on a specific version of R.

Download RStudio

We recommend using RStudio as your IDE if you don’t already have it installed. You can go to the RStudio website to download and install the software.

Launch RStudio

You can also open the RStudio application first and then create a project by going file -> new project...

Create a new Rmarkdown file

file -> new file -> R markdown...

Hello RStudio!

RStudio is comprised of four panes.

Packages

R is an open-source language, and developers contribute functionality to R via packages. In this lab we will work with three packages: palmerpenguins which contains the dataset, and tidyverse which is a collection of packages for doing data analysis in a “tidy” way.

Load these packages by running the following in the Console.

library(tidyverse)
library(tidymodels)
library(palmerpenguins)

#install.packages("devtools")
devtools::install_github("tidymodels/parsnip")

If you haven’t installed these packages yet and R complains, then you can install these packages by running the following command. (Note that R package names are case-sensitive)

install.packages(c("tidyverse", "palmerpenguins"))

Note that the packages are also loaded with the same commands in your R Markdown document.

Warm up

Before we introduce the data, let’s warm up with some simple exercises.

The top portion of your R Markdown file (between the three dashed lines) is called YAML. It stands for “YAML Ain’t Markup Language”. It is a human friendly data serialization standard for all programming languages. All you need to know is that this area is called the YAML (we will refer to it as such) and that it contains meta information about your document.

YAML

Open the R Markdown (Rmd) file in your project, change the author name to your name, and knit the document.

Data

The data frame we will be working with today is called penguins and it’s in the palmerpenguins package.

count the number of species and islands with dplyr::count()

Visualize the distribution of body_mass_g with ggplot

ggplot(penguins, aes(body_mass_g)) +
  geom_histogram()

Look at the correlation between body_mass_g and some of the other variables

ggplot(penguins, aes(body_mass_g, ___)) +
  geom_point()

Modeling

Fit a linear model using parsnip to model body_mass_g

lm_spec <- linear_reg() %>%
  set_engine("lm")

lm_fit <- lm_spec %>%
  fit(___ ~ species + island + bill_length_mm + bill_depth_mm + flipper_length_mm, 
      data = penguins)

lm_fit

Get parameter estimates:

tidy(lm_fit)

Figures

You’re done with the data analysis exercises, but we’d like you to do two more things:

Click on the gear icon in on top of the R Markdown document, and select “Output Options…” in the dropdown menu. In the pop up dialogue box go to the Figures tab and change the height and width of the figures, and hit OK when done. Then, knit your document and see how you like the new sizes. Change and knit again and again until you’re happy with the figure sizes. Note that these values get saved in the YAML.

You can also use different figure sizes for different figures. To do so click on the gear icon within the chunk where you want to make a change. Changing the figure sizes added new options to these chunks: fig.width and fig.height. You can change them by defining different values directly in your R Markdown document as well.

Once again click on the gear icon in on top of the R Markdown document, and select “Output Options…” in the dropdown menu. In the General tab of the pop up dialogue box try out different syntax highlighting and theme options. Hit OK and knit your document to see how it looks. Play around with these until you’re happy with the look.

Optional

If you have time you can explore the different ways you can add styling to your rmarkdown document.

Here is a cheatsheet

and a markdown cheatsheet


This set of lab exersixes have been adopted from Mine Çetinkaya-Rundel’s class Introduction to Data Science.