We’ve already thrown around a few new terms, so let’s define them before we proceed.
I like to think of R as the engine of the car and RStudio as the dashboard.
As the labs progress, you are encouraged to explore beyond what the labs dictate; a willingness to experiment will make you a much better programmer. Before we get to that stage, however, you need to build some basic fluency in R. Today we begin with the fundamental building blocks of R and RStudio: the interface, reading in data, and basic commands.
And to make versioning simpler, this is a solo lab. Additionally, we want to make sure everyone gets a significant amount of time at the steering wheel.
Go to CRAN and download R, making sure you get the version that matches your operating system.
If you already have R installed, run the following code:
R.version
_
platform x86_64-apple-darwin17.0
arch x86_64
os darwin17.0
system x86_64, darwin17.0
status
major 4
minor 0.5
year 2021
month 03
day 31
svn rev 80133
language R
version.string R version 4.0.5 (2021-03-31)
nickname Shake and Throw
This should tell you what version of R you are currently using. If your R version is lower than 3.6.0, I would strongly recommend updating. In general it is a good idea to keep your R version up to date, unless you have a project right now that depends on a specific version of R.
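If you would rather get a direct yes/no answer than read the table, the comparison can be done in code. This is a minimal sketch using base R's getRversion(); the variable names are only illustrative, and 3.6.0 is the threshold mentioned in this lab.

```r
# getRversion() returns the running R version as a comparable version object
current_version <- getRversion()

# TRUE if the installed R is at least 3.6.0, the minimum suggested in this lab
is_recent_enough <- current_version >= "3.6.0"

if (is_recent_enough) {
  message("R ", format(current_version), " is recent enough for this lab.")
} else {
  message("R ", format(current_version), " is older than 3.6.0; consider updating.")
}
```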
We recommend using RStudio as your IDE. If you don’t already have it installed, you can go to the RStudio website to download and install the software.
You can also open the RStudio application first and then create a project by going to File -> New Project...
To create a new R Markdown file, go to File -> New File -> R Markdown...
RStudio is made up of four panes.
On the bottom left is the Console. This is where you can write code that will be evaluated. Try typing 2 + 2 here and hit Enter. What do you get?
On the bottom right is the Files pane, along with other panes that will come in handy as we start our analysis.
If you click on a file, it will open in the editor, in the top left pane.
Finally, the top right pane shows your Environment. If you define a variable, it will show up there. Try typing x <- 2 in the Console and hit Enter. What do you get in the Environment pane?
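The two Console experiments above can be run together as a short session. This is only a sketch of what to type; exists() is a base R function used here to confirm from code what the Environment pane shows visually.

```r
# Arithmetic typed at the prompt is evaluated and printed immediately
2 + 2

# Assignment stores a value under a name; nothing is printed
x <- 2

# Typing the name on its own prints the stored value
x

# exists() reports whether a variable called "x" is currently defined
exists("x")
```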
R is an open-source language, and developers contribute functionality to R via packages. In this lab we will work with three packages: palmerpenguins, which contains the dataset; tidyverse, which is a collection of packages for doing data analysis in a “tidy” way; and tidymodels, which is a collection of packages for modeling (it includes parsnip, used later in this lab).
Load these packages by running the following in the Console.
library(tidyverse)
library(tidymodels)
library(palmerpenguins)
#install.packages("devtools")
devtools::install_github("tidymodels/parsnip")
If you haven’t installed these packages yet and R complains, you can install them by running the following command. (Note that R package names are case-sensitive.)
install.packages(c("tidyverse", "tidymodels", "palmerpenguins"))
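If you are not sure which packages are already on your machine, you can check before installing. The sketch below uses base R's requireNamespace() and only prints a suggestion rather than installing anything; the names needed and missing_pkgs are illustrative.

```r
# Packages used in this lab
needed <- c("tidyverse", "tidymodels", "palmerpenguins")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)

# Keep only the names of packages that could not be found
missing_pkgs <- needed[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Not installed yet: ", paste(missing_pkgs, collapse = ", "))
  message("Install them with install.packages().")
} else {
  message("All packages for this lab are installed.")
}
```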
Note that the packages are also loaded with the same commands in your R Markdown document.
Before we introduce the data, let’s warm up with some simple exercises.
The top portion of your R Markdown file (between the two lines of three dashes) is called YAML. It stands for “YAML Ain’t Markup Language”. It is a human-friendly data serialization standard for all programming languages. All you need to know is that this area is called the YAML (we will refer to it as such) and that it contains meta information about your document.
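To see that the YAML really is just metadata, you can read it back from R. The sketch below writes a tiny R Markdown file to a temporary location and parses its header with rmarkdown::yaml_front_matter(); the title and author values are made up for illustration and are not the ones in your lab file.

```r
library(rmarkdown)

# Write a minimal R Markdown file to a temporary path (hypothetical content)
rmd_path <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"Lab 01\"",
  "author: \"Your Name\"",
  "output: html_document",
  "---",
  "",
  "Some text."
), rmd_path)

# Parse the YAML header into a named list
meta <- yaml_front_matter(rmd_path)

# Each YAML field becomes an element of the list
meta$author
meta$title
```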
Open the R Markdown (Rmd) file in your project, change the author name to your name, and knit the document.
The data frame we will be working with today is called penguins, and it’s in the palmerpenguins package.
library(tidyverse)
library(tidymodels)
library(palmerpenguins)
penguins
Count the number of species and islands with dplyr::count().
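One possible way to approach this exercise is sketched below. It assumes you want one table per variable plus a combined table; the object names are illustrative.

```r
library(dplyr)
library(palmerpenguins)

# One row per species, with the number of penguins in column n
species_counts <- penguins %>%
  count(species)

# One row per island
island_counts <- penguins %>%
  count(island)

# One row per species/island combination that occurs in the data
species_by_island <- penguins %>%
  count(species, island)

species_counts
island_counts
species_by_island
```

In each result, the n column adds up to the number of rows in penguins.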
Visualize the distribution of body_mass_g with ggplot:
ggplot(penguins, aes(body_mass_g)) +
  geom_histogram()
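When you run the histogram code, ggplot2 will likely print a message suggesting you pick a bin width, and a warning about rows it removed because of missing values. The variation below sets both explicitly. The bin width of 200 grams is an arbitrary choice for illustration, so try other values.

```r
library(ggplot2)
library(palmerpenguins)

# binwidth is in the units of the variable (grams);
# na.rm = TRUE drops missing values without a warning
mass_hist <- ggplot(penguins, aes(body_mass_g)) +
  geom_histogram(binwidth = 200, na.rm = TRUE)

mass_hist
```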
Look at the correlation between body_mass_g and some of the other variables:
ggplot(penguins, aes(body_mass_g, ___)) +
  geom_point()
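A scatterplot shows the relationship visually; if you also want a number, base R's cor() computes the correlation coefficient. The sketch below uses flipper_length_mm as one example of "another variable"; that choice is ours, and you should try others. Because the data contain missing values, use = "complete.obs" is needed, otherwise cor() returns NA.

```r
library(palmerpenguins)

# Pearson correlation between body mass and flipper length,
# using only rows where both values are present
r_flipper <- cor(
  penguins$body_mass_g,
  penguins$flipper_length_mm,
  use = "complete.obs"
)

r_flipper
```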
Fit a linear model using parsnip to model body_mass_g:
lm_spec <- linear_reg() %>%
  set_engine("lm")

lm_fit <- lm_spec %>%
  fit(___ ~ species + island + bill_length_mm + bill_depth_mm + flipper_length_mm,
      data = penguins)

lm_fit
Get parameter estimates:
tidy(lm_fit)
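To see the whole workflow from specification to estimates in one place, here is a smaller sketch that follows the same steps. It deliberately uses a single predictor, flipper_length_mm, rather than the full formula from the exercise, and the object names simple_fit and estimates are illustrative.

```r
library(tidymodels)
library(palmerpenguins)

# 1. Specify the model type and the engine that will fit it
lm_spec <- linear_reg() %>%
  set_engine("lm")

# 2. Fit the specification to data with a formula (outcome ~ predictor)
simple_fit <- lm_spec %>%
  fit(body_mass_g ~ flipper_length_mm, data = penguins)

# 3. Extract the parameter estimates as a tibble, one row per term
estimates <- tidy(simple_fit)

estimates
```

The result has one row for the intercept and one for the slope, with columns such as term, estimate, std.error, statistic and p.value.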
You’re done with the data analysis exercises, but we’d like you to do two more things:
Click on the gear icon at the top of the R Markdown document, and select “Output Options…” in the dropdown menu. In the pop-up dialog box go to the Figures tab and change the height and width of the figures, and hit OK when done. Then, knit your document and see how you like the new sizes. Change and knit again and again until you’re happy with the figure sizes. Note that these values get saved in the YAML.
You can also use different figure sizes for different figures. To do so, click on the gear icon within the chunk where you want to make a change. Changing the figure sizes adds new options to these chunks: fig.width and fig.height. You can change them by defining different values directly in your R Markdown document as well.
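Besides the YAML and per-chunk settings, knitr also lets you set default figure sizes from R code, typically in a chunk near the top of the document. This is a sketch of that alternative, not something the lab requires; the sizes (in inches) are arbitrary examples.

```r
library(knitr)

# Set default figure dimensions (in inches) for all later chunks
opts_chunk$set(fig.width = 6, fig.height = 4)

# Check the values that are now in effect
opts_chunk$get("fig.width")
opts_chunk$get("fig.height")
```

Options set on an individual chunk still take precedence over these defaults.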
Once again, click on the gear icon at the top of the R Markdown document, and select “Output Options…” in the dropdown menu. In the General tab of the pop-up dialog box, try out different syntax highlighting and theme options. Hit OK and knit your document to see how it looks. Play around with these until you’re happy with the look.
If you have time, you can explore the different ways you can add styling to your R Markdown document.
Here is a cheatsheet and a markdown cheatsheet.
This set of lab exercises has been adapted from Mine Çetinkaya-Rundel’s class Introduction to Data Science.