Wednesday

Time: 09:00 - 09:04

Welcome to rstudio::conf 2020

Speaker: Hadley Wickham

Location: Grand Ballroom A

Category: NA

Time: 09:04 - 10:00

Open Source Software for Data Science

Speaker: J.J. Allaire

Location: Grand Ballroom A

Category: Keynote

Open-source software is fundamentally necessary to ensure that the tools of data science are broadly accessible, and to provide a reliable and
trustworthy foundation for reproducible research. This talk will delve into why open source software is so important and discuss the role of corporations as stewards of open source software. I'll also talk about how RStudio is structured and organized to pursue its mission of creating open source software for data science.

Time: 10:00 - 11:00

Data, visualization, and designing with AI

Speaker: Fernanda Viegas, Martin Wattenberg

Location: Grand Ballroom A

Category: Keynote

Recent progress in machine learning has raised a series of urgent questions: How can we train and debug deep learning models? How can we understand
what is going on inside a neural network? And, perhaps most important, how can we design systems that serve people best? We'll show a series of examples from the People+AI Research (PAIR) initiative at Google--ranging from data visualizations for researchers, to tools for medical practitioners, to guidelines for designers--that illustrate how thinking carefully about data can lead to better tools, more effective design, and help humans and AI work together.

Time: 11:30 - 11:51

Case Studies in Customer Success

Speaker: Katie Masiello

Location: Imperial Ballroom

Category: Case Study

The path to becoming a world-class, data-driven organization is daunting. The challenges you will likely face along the way can be thorny, and in
some cases, seem outright impossible to overcome. How do you get teams that traditionally butt heads, such as IT and data science, to complement each other and work in unison? How can you efficiently scale the scope and reach of your data products as requirements change? Your time should be spent doing truly valuable work instead of updating charts and reports. How do you prevent the support structure behind your platform from toppling like a house of cards? Despite these challenges, we think that the end result is worth it: an organization that is equipped to make important decisions, with confidence, using data analysis that comes from a sustainable environment. We see this outcome every day.

Meet You Where You R

Speaker: Lauren Chadwick

Location: Grand Ballroom A

Category: Education

At RStudio, we wake up and go to bed thinking about the positive impact that open source work and data science have had and can have on the world. To maximize this impact, we find three areas of investment absolutely critical to ensure our open source community keeps up with the world’s changes and outlives us all: 1. Find ways to make R more approachable. 2. Enable teams of all types & sizes (educational, professional, etc.) to leverage the work they’re doing in R, and effortlessly communicate that work to others. 3. Extend the language so our open-source community can continue to be at the forefront of innovation, no matter their preference of tool or language. Underpinning these investments is also the core belief that every data scientist, regardless of skill level, use-case, or professional experience, is an asset to our community. Whether you’re a student currently learning R, a Python fan looking to become multilingual, or the Head of Data Science at NASA, we want you to become a part of our journey. In return, we’ll do our best to ensure that journey is a fulfilling endeavor. This presentation will take a deeper dive into the ways in which you can utilize RStudio's educational offerings and enterprise toolchain in personal, educational, and corporate settings.

Deploying End-To-End Data Science with Shiny, Plumber, and Pins

Speaker: Alex Gold

Location: Grand Ballroom B

Category: Production

It’s easier than ever to craft a complete R-centric data science pipeline thanks to packages like Shiny, Plumber, and Pins. In this talk, you’ll learn
how to use R to bring your modeling and visualization work into production. You’ll walk away with recipes, tips, and tricks to deploy data, models, and apps to ensure your work is as impactful as possible.

Simplified Data Quality Monitoring of Dynamic Longitudinal Data: A Functional Programming Approach

Speaker: Jacqueline Gutman

Location: Plaza Room

Category: Programming

Ensuring the quality of data we deliver to customers or provide as inputs to models is often one of the most under-appreciated and yet time-consuming responsibilities of a modern data scientist. This task is challenging enough when working with static data, but when we have access to dynamic, longitudinal, continuously updating data, that complexity can become an asset. We will demonstrate how to simplify data quality monitoring of dynamic data with a functional programming approach that enables early and actionable detection of data quality concerns. Using purrr as well as tidyr and nested tibbles, we will illustrate the five key pillars of enjoyable, user-friendly data quality monitoring with relevant R code: Readability, Reproducibility, Efficiency, Robustness, and Compositionality. Readability: FP empowers us to abstract away from the mechanics and implementation of comparing two or more related datasets and move towards declaring the intent of features and metrics we want to compare. Reproducibility: By avoiding side-effects and dependencies on external states and inputs, and using functional units which can be easily tested over a variety of inputs, FP reduces the burden to create reproducible code. Perhaps more importantly, FP supports not just reproducibility of results, but reproducibility of workflows that can be continually applied to dynamic datasets. Efficiency: FP enables more efficient code through lazy evaluation, caching, and simplifying implementation over parallel backends. Robustness: FP allows greater testability of our code through modularization and elegant error-handling, with customized fail-safes for data that differs in expected ways over time. Compositionality: FP encourages higher-level reasoning with functions, which in turn drives both readability--through higher-level, more abstract code--and robustness, through modifying function behavior in case errors are encountered.

Time: 11:53 - 12:15

How Vibrant Emotional Health Connected Siloed Data Sources and Streamlined Reporting Using R

Speaker: Sean Murphy

Location: Imperial Ballroom

Category: Case Study

Vibrant Emotional Health is the mental health not-for-profit behind the US National Suicide Prevention Lifeline, New York City's NYC Well program, and various other emotional health contact center programs and direct services. We engage in emotionally charged conversations with people experiencing a wide variety of mental health and emotional concerns. Our programs vary in scope and resources, and span several technologies. In addition, our data collection and reporting requirements change dynamically in response to emerging clinical needs and reporting requirements from our sponsors. In short, the data we collect is complex, often unstructured, and stored in a variety of sources. In this context, R Markdown documents have allowed us to interface directly with multiple databases, Google Sheets, APIs, CSVs, and JSON stores to generate integrated reports. Organizing these reports into R packages with accompanying functions that standardize the calculation of KPIs and apply consistent themes across analyses has allowed us to improve the clarity and aesthetics of our reporting while reducing manual work that was previously needed to produce these reports. Building on this framework, we have developed functions to standardize data connections, create reusable data visualizations, and generate reproducible analyses in response to ad hoc analytic requests. These same functions also facilitate the creation of Shiny dashboards where core visualizations that were previously only available in static reports can be manipulated directly by end users to explore clinical and operational trends. These dashboards also facilitate self-service reporting by end users. We present here the framework we have developed for our organization-wide and program-specific packages, the types of functions and artifacts they include, and our plans for future development.

Data Science Education in 2022

Speaker: Carl Howe, Greg Wilson

Location: Grand Ballroom A

Category: Education

More people are learning data science every day, and there are more ways for them to learn than ever before. To understand where we are and where we
might be going, this talk looks at what data science education could look like two years from now: far enough away that we can dream, but close enough that we can only dream a little. We explore the balance between automated and collaborative learning, different ways to deliver different kinds of lessons to different kinds of people, and ways in which our tools and practices could improve.

We’re hitting R a million times a day so we made a talk about it

Speaker: Heather Nolis, Jacqueline Nolis

Location: Grand Ballroom B

Category: Production

Often reserved for Elite Engineers, production can be a perilous place for R users - but never fear! For the past year, we at T-Mobile have been
sludging through production outages, nation-wide product launches, and all of the muck that floods from R models being hit over a million times every day. From “we’re strictly a java shop” to a devops team that proudly states “we support Java, node, and R,” this talk will cover the technical hiccups, interdisciplinary communication struggles, and an open-source R package {loadtest} that’s changed the way our team views performance testing. You too can dazzle your enterprise with the power of R.

vctrs: Creating custom vector classes with the vctrs package

Speaker: Jesse Sadler

Location: Plaza Room

Category: Programming

The base R types of vectors enable the representation of an amazingly wide array of data types. There is so much you can do with R. However, there
may be times when your data does not fit into one of the base types and/or you want to add metadata to vectors. vctrs is a developer-focused package that provides a clear path for creating your own S3-vector class, while ensuring that the classes you build integrate into user expectations for how vectors work in R. This presentation will discuss the why and how of using vctrs through the example of debkeepr, a package for integrating historical non-decimal currencies such as pounds, shillings, and pence into R. The presentation will provide a step-by-step process for developing various types of vectors and thinking through the design process of how vectors of different classes should work together.

Time: 12:16 - 12:38

Building a new data science pipeline for the FT with RStudio Connect

Speaker: George Kastrinakis

Location: Imperial Ballroom

Category: Case Study

We have recently implemented a new Data Science workflow and pipeline, using RStudio Connect and Google Cloud Services. This has vastly decreased
our pipeline complexity, allowing us to bring our models and products into scheduled production more quickly. In addition, our workflow, working closely together as a team on all projects on a regular two-week sprint cycle, has increased the range of projects we have been able to take on and complete. To detail some of the key lessons we’ve learned (and some of the difficulties!), we’ll walk you through one of our recent sprints, where we productionalised the generation of a suite of behavioural and demographic features so that they can be more easily plugged in to a range of models and used across the business by the FT’s platform and product teams.

Data science education as an economic and public health intervention in East Baltimore

Speaker: Jeff Leek

Location: Grand Ballroom A

Category: Education

Growth Hacking with R - Product Analytics at Scale using R and RStudio

Speaker: Andrew Mangano

Location: Grand Ballroom B

Category: Production

Salesforce is not only a cloud software solution out of the box, but also a highly customizable platform that can be modified for a wide range of use cases. In addition to this complexity, customer trust is our #1 company value, and customer data privacy is abstracted from everyone outside of the customer. Product and Growth Analytics is an emerging field, separate from business analytics and data science, that focuses on building software products that improve user retention and engagement. Companies like Facebook and AirBnB have robust data science teams focused on product analytics. At Salesforce, however, given the scale, customization, and privacy values, product data science is not so straightforward. Utilizing R and RStudio tools for collaboration and reproducible analytics, the Data Intelligence team is able to solve complex problems at enterprise scale. This talk will preview anonymized predictive and growth analytics work while also highlighting how we work and collaborate across platforms and languages (Python via reticulate).

Asynchronous programming in R

Speaker: Winston Chang

Location: Plaza Room

Category: Programming

Writing regular R code is straightforward: you tell R to do something, it does it, and then it returns control back to you. This is called synchronous
programming. However, if you use R to coordinate threads, processes, or network communication, the regular model may be unable to do what you want, or it may only be able to do it with a significant performance penalty. In this talk I'll explain how asynchronous programming with the later package can handle these kinds of programming problems. I'll also show how to provide a synchronous interface for asynchronous code, so that users will have a simple, familiar way to use your code.

Time: 12:39 - 12:59

How to win an AI Hackathon, without using AI

Speaker: Colin Gillespie

Location: Imperial Ballroom

Category: Case Study

Anyone reading a newspaper or listening to the news is led to believe that AI is the solution to all problems. From self-driving cars to detecting disease to catching fraud, there doesn’t seem to be a situation that AI can’t tackle. Once “big data” is thrown into the mix, the AI solution is all but certain. But is AI always needed? Over the last eighteen months, Jumping Rivers has entered (and won) four Hackathons. All Hackathons were characterised by “big data” and the need to improve prediction. All Hackathons were won without using AI (or any sort of machine learning). This talk will focus on one particular competition around reducing leakage at Northumbrian Water. Using a combination of R, Shiny, and tidyverse (and a few other tricks), we were able to demonstrate within the short Hackathon time frame that clear presentation of data to the front line engineers was more likely to reduce leakage than simply providing vague estimates of a potential future leak.

Of Teacups, Giraffes, & R Markdown

Speaker: Desiree De Leon

Location: Grand Ballroom A

Category: Education

How do you make your R Markdown lessons feel friendly for learners you’ll never meet? How do you make it engaging so they sit and stay a while? How do you make it memorable so they come back to visit again? In this talk, I’ll share lessons learned from my experience of making a series of online statistics modules (co-authored by Hasse Walum) that feel accessible and fun--housed entirely in an R Markdown site, complete with a whimsical, illustrated narrative about teacup giraffes. I’ll show how adding good characters with your audience in mind, good design, and good play helped me make the most of HTML output. To help you get started, I’ll share resources that Alison Hill and I have developed--including a series of cookbooks and out-of-the-box templates--so that you will have a leg up on applying these ideas to R Markdown collections of your own.

Practical Plumber Patterns

Speaker: James Blair

Location: Grand Ballroom B

Category: Production

Plumber is a package that allows R users to create APIs out of R functions. This flexible approach allows R processes to be accessed by toolchains and
frameworks outside of R. In this talk, we'll look at useful patterns for developing and working with robust APIs built in R using Plumber.

Azure Pipelines and GitHub Actions

Speaker: Jim Hester

Location: Plaza Room

Category: Programming

Open source R packages on GitHub often take advantage of continuous integration services to automatically check their packages for errors. This is very useful for catching problems quickly, as well as for increasing confidence in proposed changes, since pull requests can be checked before they are merged. Travis-CI and Appveyor are currently the most popular services. However, newer services, Azure Pipelines and GitHub Actions, show promise for being more powerful and simpler to configure and debug. I will discuss these services and demonstrate some of their capabilities and how to configure them for your own use in packages and reports.

Time: 14:15 - 14:37

If you build it, they will come...but then what? Facilitating communities of practice in R

Speaker: Kate Hertweck

Location: Grand Ballroom A

Category: Community

Why did you learn R? Chances are good that if you're an attendee of rstudio::conf, you've found a community of R coders who are willing to share
their knowledge and learn with you. While it's possible to develop expert R coding skills in isolation, most software development and data analysis projects benefit from groups of people working collaboratively, and R communities are unparalleled in their inclusivity and commitment to learning collectively. Such communities, whether they support R coders at a single institution, geographic region, or online, require deliberate planning and effort to develop and sustain. How do you create a group culture that encompasses R users of various skill levels who may be working on diverse problems? How do you assess what members of a community need or prefer? How do you encourage investment and cohesion so the group will sustain itself? This talk will describe potential pitfalls and impediments to creating and facilitating cooperative learning communities for R coding, and will allow you to identify potential strategies for overcoming these challenges so you can continue giving back to the R communities that supported you along the way.

15 Years of R in Quantitative Finance

Speaker: Brandon Farr

Location: Plaza Room

Category: Finance

Use of R in the investment industry is established and growing. This talk will discuss changes seen in 15 years of practice within asset management
firms. I hope discussion of lessons learned and recommendations will benefit those currently in finance and those interested in hearing how the flexibility of R manifests in the financial world.

Accelerating Analytics with Apache Arrow

Speaker: Neal Richardson

Location: Imperial Ballroom

Category: Interface

The Apache Arrow project is a cross-language development platform for in-memory data designed to improve system performance, memory use, and
interoperability. This talk presents recent developments in the 'arrow' package, which provides an R interface to the Arrow C++ library. We'll cover the goals of the broader Arrow project, how to get started with the 'arrow' package in R, some general concepts for working with data efficiently in Arrow, and a brief overview of upcoming features.

Production-grade Shiny Apps with golem

Speaker: Colin Fay

Location: Grand Ballroom B

Category: Shiny

Shiny is an amazing tool when it comes to creating web applications with R. Almost anybody can build a small Shiny App in a matter of minutes, provided they have a basic knowledge of R. As of today, we can safely say that it has become the de-facto tool for web applications in the R world. Building a proof-of-concept application is easy, but things change when the application becomes larger and more complex, and especially when it comes to sending that app to production. Until recently, there hasn't been any real framework for building and deploying production-grade Shiny Apps. This is where 'golem' comes into play: offering Shiny developers an opinionated framework for creating production-ready Shiny Applications. With 'golem', Shiny developers now have a toolkit for making a stable, easy-to-maintain, and production-robust web application with R. 'golem' has been developed to abstract away the most common engineering tasks (for example, module creation, addition of external CSS or JavaScript files, ...), so you can focus on what matters: building the application. And once your application is ready to be deployed, 'golem' guides you through testing, and brings you tools for deploying to common platforms. In this talk, Colin and Vincent will present the 'golem' package, first talking about the "why 'golem'?", then presenting the general philosophy behind this framework, and finally helping you get started building your first Shiny App with 'golem'.

Time: 14:38 - 15:00

Embracing R in the Geospatial Community

Speaker: Tina Cormier

Location: Grand Ballroom A

Category: Community

Geospatial analysts work in a wide range of positions within almost every industry. They work in government, non-profit, academic, and private
institutions using geospatial data and technology to answer questions about the environment, agriculture, climate, urban planning and design, marketing, public health, transportation, and myriad other topics. A typical day may include data prep/cleaning, field work, cartography, image analysis, vector analysis, feature engineering, modeling, or database management. This diverse group necessarily uses a diverse set of tools. In this talk, we will explore how R fits into the spatial analyst’s toolkit. What does the geo community think of R? Who uses it? What groups avoid it? What geo-packages are used most? How can we, as a community, make R more appealing for geospatial scientists?

Deep Learning Extraction for Counterparty Risk Signals from a Corpus of Millions of Documents

Speaker: Moody Hadi

Location: Plaza Room

Category: Finance

China has been experiencing rapid growth over the last decade due to economically friendly reforms and a growing skilled and young population. With this increasing growth, China’s interconnectedness with the global economy has increased significantly. In parallel to this economic evolution, technology has experienced rapid acceleration, which has enabled firms and governments to track and record vast amounts of data. The side effect of this unstructured big data growth is that datasets may be polluted, meaning information can be conflicting, missing, and/or unreliable. This creates a gap in the ability to provide transparency to the exposed firms importing from China: both timely early warning signals and wide coverage of small- and medium-sized enterprises (SMEs). We have been able to address this problem for our end-users by using deep learning to extract information value and opinion from a public corpus to create the needed transparency. Our data science & machine learning stack uses connect, shiny, reticulate, tensorflow and scikit-learn to build the interactive solution for our clients and deploy it using spark and airflow.

Updates on Spark, MLflow, and the broader ML ecosystem

Speaker: Javier Luraschi

Location: Imperial Ballroom

Category: Interface

Making the Shiny Contest

Speaker: Mine Çetinkaya-Rundel

Location: Grand Ballroom B

Category: Shiny

In January 2019 RStudio launched the first-ever Shiny contest to recognize outstanding Shiny applications and to share them with the community.
We received 136 submissions for the contest and reviewing them was incredibly inspiring and humbling. In this talk, we shine a spotlight on the backstage: the inspiration behind the contest, the process of evaluation, what we learned about Shiny developers and how we can better support them, and what we learned about running contests and how we hope to improve the Shiny Contest experience. We also highlight some of the winning apps as well as the newly revamped Shiny Gallery, which features many noteworthy contest submissions. Finally, we introduce the new process for submitting your apps to the Shiny Gallery and, of course, to Shiny Contest 2020!

Time: 15:01 - 15:23

The development of "datos" package for the R4DS Spanish translation

Speaker: Riva Quiroga

Location: Grand Ballroom A

Category: Community

Rpanda trading simulation - from an idea to a multi-user shiny app

Speaker: Nima Safaian

Location: Plaza Room

Category: Finance

The idea of rpanda commodities trading simulation was many years in the making. As energy trading professionals working in the industry, we had developed insights around how to make risk/reward market calls, and what skills make someone an exceptional commodities trader. Traders are one of the most expensive seats in terms of monetizing value from the assets. We developed rpanda as a simulated environment which replicates closely how real-life physical commodities trading works in order to assist talent development and selection, both in academics and enterprise. My co-founder and I did not know how to design production-ready software, but we had always used R/Shiny for market analysis in our corporate jobs. Rather than hiring expensive app developers, we decided to do it ourselves. We used the RStudio development stack, such as RStudio Connect, and open source tools like plumber to turn our idea into a production-ready app that is used by University of Alberta classes. In this presentation, we share our journey, technical challenges, and how we overcame them.

What's new in TensorFlow for R

Speaker: Daniel Falbel

Location: Imperial Ballroom

Category: Interface

TensorFlow is the most popular open-source platform for machine learning and its ecosystem is evolving incredibly fast. In this talk we will explore
what's new in TensorFlow 2.0 as well as how to build data pre-processing pipelines using the tfdatasets package and how to use pre-trained models with tfhub.

Styling Shiny apps with Sass and Bootstrap 4

Speaker: Joe Cheng

Location: Grand Ballroom B

Category: Shiny

Customizing the style--fonts, colors, margins, spacing--of Shiny apps has always been possible, but never as easy as we’d like it to be. Canned
themes like those in the shinythemes package can easily make apps look slightly less generic, but that’s small consolation if your goal is to match the visual style of your university, corporation, or client. In theory, one can "just" use CSS to customize the appearance of your Shiny app, the same as any other web application. But in practice, the use of large CSS frameworks like Bootstrap means significant CSS expertise is required to comprehensively change the look of an app. Relief is on the way. As part of a round of upgrades to Shiny’s UI, we’ve made fundamental changes to the way R users can interact with CSS, using new R packages we’ve created around Sass and Bootstrap 4. In this talk, we’ll show some of the features of these packages and tell you how you can take advantage of them in your apps.

Time: 15:24 - 15:44

R: Then and Now

Speaker: Jared Lander

Location: Grand Ballroom A

Category: Community

R has changed a lot since the meetup was founded 10 years ago. Back then we were using base graphics (or lattice) and the apply family of functions and we didn't have pipes. At the time there were an impressive 1800 packages on CRAN; now there are over 15,000, extending R's reach far beyond its traditional domain of statistics and machine learning into publishing, website building and video generation. The community has grown and changed dramatically during that time, with the New York meetup alone going from 25 to over 10,000 members. During this talk we go through a then-and-now of R code and community to palpably see how everything has changed.

The good, the bad and the ugly: What I learned while consulting across the business as a data scient

Speaker: Ben Barnard

Location: Plaza Room

Category: Finance

A collection of data science stories about current problems that data scientists might face while working in academia, industry, and government. Some
lessons learned, some situations avoided, what I learned, and how I survived my journey. First, I discuss the struggle of advocating for R when senior leaders decide Python is the only appropriate product. Then, I describe why donut charts are superior to pie charts, and why we should all be using them. Finally, the case of the uncatchable “drive-by” stakeholder and where to find them. The fight is real, and the path is long for the evangelical data scientist.

Deep Learning with R

Speaker: Paige Bailey

Location: Imperial Ballroom

Category: Interface

Reproducible Shiny apps with shinymeta

Speaker: Carson Sievert

Location: Grand Ballroom B

Category: Shiny

Shiny makes it easy to take domain logic from an existing R script and wrap some reactive logic around it to produce an interactive webpage where
others can quickly explore different variables, parameter values, models/algorithms, etc. Although the interactivity is great for many reasons, once an interesting result is found, it’s more difficult to prove the correctness of the result since: (1) the result can only be (easily) reproduced via the Shiny app and (2) the relevant domain logic which produced the result is obscured by Shiny’s reactive logic. The R package shinymeta provides tools for capturing and exporting domain logic for execution outside of a Shiny runtime (so that others can reproduce Shiny-based result(s) from a new R session).

Time: 16:00 - 16:23

Journalism with RStudio, R, and the tidyverse

Speaker: Larry Fenn

Location: Imperial Ballroom

Category: Case Study

The Associated Press data team primarily uses R and the tidyverse for data processing and analysis. In this talk, some of the technology behind the published stories will be showcased:

- Using dbplyr to work off a hosted database containing 380 million opioid records to identify "pill mills".
- Using open-sourced AP style templates for R Markdown and ggplot to quickly produce graphics and reports off breaking news.
- Using R Markdown and htmlwidgets to give reporters and editors interactive reports to identify reporting leads.

Flipbooks

Speaker: Evangeline Reynolds

Location: Grand Ballroom B

Category: Learning and Using R

Good examples facilitate accomplishing new or unpracticed tasks in a programmatic workflow. Tools for communicating examples have improved in recent years. Especially embraced are tools that show code and its resultant output immediately thereafter --- the case of `Jupyter` notebooks and `R Markdown` documents. But creators using these tools often must choose between big-picture or narrow-focus demonstration; creators tend to either demo a complete code pipeline that accomplishes a realistic task, or instead demonstrate a minimal example which makes clear the behavior of a particular function but leaves unclear how it might be used in a larger project. Flipbooks help address this problem, allowing the creator to present a full demonstration which accomplishes a real task, and giving the viewer the opportunity to focus on unfamiliar steps. A set of flipbook building functions parse code in a data manipulation or visualization pipeline and then build it back up incrementally. Aligned superimposition of new code and output atop previous code and output makes it easy to identify how each code change triggers changes in output. The presentation will guide attendees in creating their own Flipbooks (with Xaringan slides) or mini Flipbooks (gif output).

Approaches to Assay Processing Package Validation

Speaker: Ellis Hughes

Location: Plaza Room

Category: Pharma

In this talk I will discuss the steps that have been created for validating internally generated R packages at SCHARP (Statistical Center for HIV/AIDS Research and Prevention) and the lessons learned while creating packages as a team. Housed within Fred Hutch, SCHARP is an instrumental partner in the research and clinical trials surrounding HIV prevention and vaccine development. Part of SCHARP’s work involves analyzing experimental biomarkers and endpoints which change as the experimental question, analysis methods, antigens measured, and assays evolve. Maintaining a validated code base that is rigid in its output format, but flexible enough to cater to a variety of inputs with minimal custom coding, has proven to be important for reproducibility and scalability. SCHARP has developed several key steps in the creation, validation, and documentation of R packages that take advantage of R’s packaging functionality. First, the programming team works with leadership to define specifications and lay out a roadmap of the package at the functional level. Next, statistical programmers work together to develop the package, taking advantage of the rich R ecosystem of packages for development such as roxygen2, devtools, usethis, and testthat. Once the code has been developed, the package is validated to ensure it passes all specifications using a combination of testthat and rmarkdown. Finally, the package is made available for use across the team on live data. These procedures set up a framework for validating assay processing packages that furthers the ability of Fred Hutch to provide world-class support for our clinical trials.

Getting things logged

Speaker: Gergely Daroczi

Location: Grand Ballroom A

Category: Programming

One of the greatest strengths of R is the ease and speed of developing a prototype (be it a report or dashboard, a statistical model, or rule-based automation to solve a business problem), but deploying to production is not a broadly discussed topic despite its importance. This hands-on talk focuses on best practices and actual R packages that help transform the prototypes developed by business analysts and data scientists into production jobs running in a secured and monitored environment that is easy to maintain -- discussing the importance of logging, securing credentials, effective helper functions to connect to databases, open-source and SaaS job schedulers, dockerizing the run environment, and scaling infrastructure.

Time: 16:23 - 16:45

Putting the Fun in Functional Data: A tidy pipeline to identify routes in NFL tracking data

Speaker: Dani Chu

Location: Imperial Ballroom

Category: Case Study

Currently in football many hours are spent watching game film to manually label the routes run on passing plays. Using tracking data, each route can be described as a sequence of spatial-temporal measurements that varies in length depending on the duration of the play. This data can be conveniently analyzed using nested columns in tidyr and purrr. We demonstrate how model-based curve clustering using Bernstein polynomial basis functions (i.e. Bézier curves) fit using the Expectation Maximization algorithm can cluster route trajectories. Each cluster can then be labelled to obtain route names for each route and create route trees for all receivers. The clusters and routes can be visualized nicely using ggplot and seen developing over time using gganimate.
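
As a rough illustration of the Bernstein basis mentioned in this abstract, the sketch below evaluates a Bézier curve as a Bernstein-weighted sum of control points. It is not the speaker's clustering model and involves no EM fitting; it is written in Python purely for illustration (the talk itself uses R), and the function names and the three control points are hypothetical.

```python
from math import comb


def bernstein(i, n, t):
    """Bernstein basis polynomial B_{i,n}(t) for t in [0, 1]."""
    return comb(n, i) * t**i * (1 - t) ** (n - i)


def bezier_point(control_points, t):
    """Evaluate a Bezier curve at t as a Bernstein-weighted sum of control points."""
    n = len(control_points) - 1
    x = sum(bernstein(i, n, t) * px for i, (px, _) in enumerate(control_points))
    y = sum(bernstein(i, n, t) * py for i, (_, py) in enumerate(control_points))
    return (x, y)


# Hypothetical control points for a route that goes upfield, then cuts sideways.
route = [(0.0, 0.0), (0.0, 10.0), (5.0, 10.0)]

# Sample the curve at 11 evenly spaced parameter values from t = 0 to t = 1.
path = [bezier_point(route, k / 10) for k in range(11)]
```

Because every curve of a given degree is described by the same small number of control points, routes of different durations can be compared in a common coordinate space, which is what makes a basis like this convenient for clustering.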

Learning R with humorous side projects

Speaker: Ryan Timpe

Location: Grand Ballroom B

Category: Learning and Using R

What should you name a new dinosaur discovery, according to neural networks? Which season of The Golden Girls should you watch when playing a drinking game? How can you build a LEGO set for the lowest price? R is constantly evolving, so as users, we’re constantly learning. Over the past few years, I’ve found that working on side projects is great for hands-on learning - and for me, the more absurd the project, the better. Side projects provide a safe, low-stakes environment to learn new packages and methodologies before using them in work or in production. Sharing those projects can help publicize the package and increase its accessibility, benefiting both the original author and future users. In this talk, I’ll share my experiences with side projects for learning state-of-the-art data science tools and growing as an R user, including how one project helped me land my dream job.

Building a native iPad dashboard using plumber and RStudio Connect in Pharma

Speaker: Aymen Waqar

Location: Plaza Room

Category: Pharma

As companies are becoming aware of the need to embrace data-driven solutions, R has gained huge momentum over recent years. Getting insights to users has become a very important part of a data scientist's work. As our world has advanced, there is a need to build not only web applications, but also mobile applications that are available offline. We would like to share with you how within months we have gone from nothing to a production-ready application that handles 500 concurrent users in healthcare. There are plenty of challenges to solve, including restricted environments, internal processes, and user availability. We will show you how to overcome them and iterate fast, navigating through complex infrastructure and integrating with proxy architecture to serve applications to end users in a compliant manner. With RStudio Connect and Plumber you can deploy a scalable REST API that can feed insights to your users. This allows you to go one step further and implement native applications for tablets and smartphones. With the right tools, mindset, and priorities you can achieve personal success by introducing a digital transformation within your organization, starting with something as small as converting a business-critical Excel file that is slow and difficult to edit and maintain into a robust application. Step by step your organization will evolve and become empowered by your insights, uncovering even more untapped potential.

Technical debt is a social problem

Speaker: Gordon Shotwell

Location: Grand Ballroom A

Category: Programming

Technical debt is a big problem for the R community. Even though R has excellent support for testing, documentation, and packaging code, it has the reputation that it is not suitable for production applications because data scientists don’t pay enough attention to technical debt within their codebases. Most people think of technical debt as an engineering problem: we choose to make our current work cheaper at the expense of needing to do more work down the road. But when you look closely at the root causes of technical debt, they are almost always about interpersonal relationships. Developers have trouble empathizing with other users of their code and so don’t spend the time to make that code easy for future developers to use and understand. In this talk I argue that we should think about technical debt as a social problem because it gives us insight into why it’s so hard to pay back. I then provide a practical roadmap of how to introduce best practices into your data science team.

Time: 16:46 - 17:08

R + Tidyverse in Sports

Speaker: Namita Nandakumar

Location: Imperial Ballroom

Category: Case Study

This talk will use a case study, most likely in hockey, to showcase the many ways in which R and the tidyverse can be used to analyze sports data as well as the unique priorities and considerations that are involved in applying statistical tools to sports problems.

Toward a grammar of psychological experiments

Speaker: Danielle Navarro

Location: Grand Ballroom B

Category: Learning and Using R

Why does a psychological scientist learn a programming language? While motivations are many and varied, the two most prominent are data analysis and data collection. The R programming language is well placed to address the first need, but there are fewer options for programming behavioural experiments within the R ecosystem. The simplest experimental designs can be recast as surveys, for which there are many options, but studies in cognitive psychology, psychophysics, or developmental psychology typically require more flexibility. In this talk I outline the design principles behind xprmntr, an R package that provides wrappers to a JavaScript library (jsPsych) for constructing web-based psychology experiments and uses the plumber package to call server-side R code as needed. In doing so, I discuss limitations of the current implementation and what a "grammar of experiments" might look like.

FlatironKitchen: How we overhauled a Frankensteinian SQL workflow with the tidyverse

Speaker: Nathaniel Phillips

Location: Plaza Room

Category: Pharma

FlatironKitchen: How we overhauled a Frankensteinian SQL workflow with the tidyverse to enable fast, reproducible, elegant analyses of electronic health records. The increasing availability of real-world electronic health record (EHR) data is revolutionising how pharma companies are developing Personalized Healthcare (PHC) solutions. However, the scale and complexity of EHR data pose major challenges in deriving fit-for-purpose insights systematically and efficiently. The conventional approach, where siloed programmers write (or copy and paste) thousands of lines of undocumented, untested, unconnected SAS and SQL code for every research project, is bad for business and ultimately for patients. Our team threw out the conventional approach and turned to R and the tidyverse. The result is FlatironKitchen, a modern R package enabling end-to-end EHR analyses in a cohesive, user-centric platform. FlatironKitchen allows users to “pipe their way” from database connections, to calculating derived variables, to running statistical analyses, to creating stunning visualisations. All of the technical details are both fully documented and seamlessly automated, allowing users to focus only on meaningful functions that are fit for purpose for EHR analyses. The result: FlatironKitchen code is so simple it actually tells a step-by-step, human-readable story about what the data scientist is doing, a far cry from the Frankensteinian SQL/SAS code of the past. FlatironKitchen represents the best of both worlds in pharmaceutical data science. It gives expert data scientists a library of unit-tested, customisable functions for implementing existing procedures and designing new ones. Simultaneously, it enables those who are ‘coding insecure’ to -- finally -- work directly with data by reducing barriers. FlatironKitchen’s simple, easy-to-use syntax, combined with its training library of tutorials, vignettes, and lessons made possible through RMarkdown, has shown itself to be truly empowering. In addition to showcasing FlatironKitchen, we share lessons learned and give a call to action for other pharma companies to embrace R.

Parallel computing with R using foreach, future, and other packages

Speaker: Bryan Lewis

Location: Grand Ballroom A

Category: Programming

Steve Weston's foreach package defines a simple but powerful framework for map/reduce and list-comprehension-style parallel computation in R. One of its great innovations is the ability to support many interchangeable back-end computing systems so that *the same R code* can run sequentially, in parallel on your laptop, or across a supercomputer. Newer packages like future define elegant new programming approaches that can use the foreach framework to run across a wide variety of parallel computing systems. This talk introduces the basics of the foreach and future packages with examples using a variety of back-end systems including MPI, Redis, and R's default parallel package clusters.
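
The idea of interchangeable back ends can be sketched in a few lines. The example below is a loose Python analogy, not the foreach or future API: the function `sum_of_squares` and its `map_backend` parameter are hypothetical names, and the point is only that the calling code stays the same while the mapper that executes it is swapped.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(values, map_backend=map):
    """Map-then-reduce; `map_backend` is any callable with the signature of map()."""
    return sum(map_backend(lambda v: v * v, values))


data = range(1, 6)

# Sequential back end: the built-in map.
sequential = sum_of_squares(data)

# Thread-pool back end: same function, only the mapper differs.
with ThreadPoolExecutor(max_workers=2) as pool:
    threaded = sum_of_squares(data, map_backend=pool.map)
```

Keeping the back end as a parameter means the analysis code never needs to know where the work runs, which is the property the abstract highlights.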

Time: 17:09 - 17:29

Making better spaghetti (plots): Exploring the individuals in longitudinal data with the brolgar pac

Speaker: Nicholas Tierney

Location: Imperial Ballroom

Category: Case Study

There are two main challenges of working with longitudinal (panel) data: 1) Visualising the data, and 2) Understanding the model. Visualising longitudinal data is challenging as you often get a "spaghetti plot", where a line is drawn for each individual. When overlaid in one plot, the lines can have the appearance of a bowl of spaghetti. With even a small number of subjects, these plots are too overloaded to be read easily. For similar reasons, it is difficult to relate the model predictions back to the individual and keep the context of what the model means for the individual. For both visualisation and modelling, it is challenging to capture interesting or unusual individuals, who are often lost in the noise. Better tools, and a more diverse set of grammar and verbs, are needed to visualise and understand longitudinal data and models, and to capture the individual experiences. In this talk, I introduce the R package **brolgar** (BRowse over Longitudinal data Graphically and Analytically in R), which provides new tools, verbs, and grammar to identify and summarise interesting individual patterns in longitudinal data. This package extends ggplot2 with custom facets, and builds on the new tidyverts time series packages to efficiently explore longitudinal data.

R for Graphical Clinical Trial Reporting

Speaker: Frank Harrell

Location: Grand Ballroom B

Category: Learning and Using R

For clinical trials a good deal of effort goes into producing both final trial reports and interim reports for data monitoring committees, and experience has shown that reviewers much prefer graphical to tabular reports. Interactive graphical reports go a step further and allow the most important information to be presented by default, while inviting the reviewer to drill down to see other details. The drill-down capability, implemented by hover text using the R plotly package, allows one to almost entirely dispense with tables because the hover text can contain the part of a table that pertains to the reviewer's current focal point in the graphical display, among other things. Also, there are major efficiency gains from having a high-level language for producing common elements of reports related to accrual, exclusions, descriptive statistics, adverse events, time to event, and longitudinal data. This talk will give an overview of the hreport package, which relies on R, RMarkdown, knitr, plotly, Hmisc, and HTML5. RStudio is an ideal report development environment for using these tools.

Using R to Create Reproducible Engineering Test Reports

Speaker: Ana Alyeska Santos, Braulio Cuandon

Location: Plaza Room

Category: Pharma

Engineers at Biosense Webster, a Johnson and Johnson medical device company that specializes in diagnosing and treating cardiac arrhythmias, write multiple test reports to comply with FDA regulatory standards. These intricate reports require 36 hours of an engineer’s time on average, keeping the engineers from completing investigations and studies in a timely manner. Writing scripts in R that create reproducible reports can significantly reduce the time an engineer spends creating these reports, allowing them to do a much more thorough investigation with a larger scope. Through Shiny, engineers can conveniently have their parameters and recorded data processed and stored in a database by accessing a web link and filling out the required information within a user-friendly interface. When a report is generated, R Markdown and knitr produce accurate, properly formatted test reports that comply with both company and FDA regulatory standards, knitting all the outputs, together with data analysis tools such as normality plots and process capability measurements, into a Word document that follows company-required headers, footers, and headings. The reproducible report creation shown here can be extended to other types of test reports and protocols. The pilot phase has shown that complete report production time decreased from 36 hours to one hour.

Future: Simple Async, Parallel & Distributed Processing in R - What's Next?

Speaker: Henrik Bengtsson

Location: Grand Ballroom A

Category: Programming

Future is a minimal and unifying framework for asynchronous, parallel, and distributed computing in R. It is designed for robustness, consistency, scalability, extendability, and adoptability - all in the spirit of "developer writes code once, user runs it anywhere". It is being used in production for high-performance computing and asynchronous UX, among other things. In this talk, I will discuss common feature requests, recent progress we have made, and what is in the pipeline.

Time: 18:30 - 22:30

Wednesday evening event additional guest

Speaker: NA

Location: NA

Category: Evening Event

Bring a guest to our Wednesday evening event at the California Academy of Sciences.

Thursday

Time: 08:00 - 09:00

Spanish Speakers Breakfast

Speaker: NA

Location: NA

Category: NA

Time: 09:00 - 10:00

Object of type ‘closure’ is not subsettable

Speaker: Jenny Bryan

Location: Grand Ballroom A

Category: Keynote

Your first “object of type ‘closure’ is not subsettable” error message is a big milestone for an R user. Congratulations, if there was any lingering doubt, you now know that you are officially programming! Programming involves considerably more troubleshooting and debugging than many of us expected (or signed up for). The ability to solve your own problems is an incredibly powerful stealth skill that is worth cultivating with intention. This talk will help you nurture your inner problem solver, covering both general debugging methods and specific ways to implement them in the R ecosystem.

Time: 10:30 - 10:52

Branding and Packaging Reports with R Markdown

Speaker: Jake Thompson

Location: Grand Ballroom A

Category: Communication

The creation of research reports and manuscripts is a critical aspect of the work conducted by organizations and individual researchers. Most often, this process involves copying and pasting output from many different analyses into a separate document. Especially in organizations that produce annual reports for repeated analyses, this process can also involve applying incremental updates to annual reports. It is important to ensure that all relevant tables, figures, and numbers within the text are updated appropriately. Done manually, these processes are often error prone and inefficient. R Markdown is ideally suited to support these tasks. With R Markdown, users are able to conduct analyses directly in the document or read in output from a separate analysis pipeline. Tables, figures, and in-line results can then be dynamically populated and automatically numbered to ensure that everything is correctly updated when new data is provided. Additionally, the appearance of documents rendered with R Markdown can be customized to meet specific branding and formatting requirements of organizations and journals. In this presentation, we will present one implementation of customized R Markdown reports used for Accessible Teaching, Learning, and Assessment Systems (ATLAS) at the University of Kansas. A publicly available R package, ratlas, provides both Microsoft Word and LaTeX templates for different types of projects at ATLAS with their own unique formatting requirements. We will discuss how to create brand-specific templates, as well as how to incorporate the templates into an R package that can be used to unify report creation across an organization. We will also describe other components of branding reports beyond R Markdown templates, including customized ggplot2 themes, which can also be wrapped into the R package. Finally, we will share lessons learned from incorporating the R package workflow into an existing reporting pipeline.

Building a Medical Device with R

Speaker: Ron Keizer

Location: Plaza Room

Category: Medicine

The InsightRX precision dosing platform tailors in-patient drug doses to individual patients' characteristics and biomarkers, leveraging pharmacological models of drug metabolism and drug effects. These models are implemented in R, exposed through APIs, and called from a cloud-based web application. The core of our pharmacokinetic/pharmacodynamic simulation functionality is available open source at `github.com/InsightRX/PKPDsim` and `github.com/InsightRX/clinPK`. As a regulated device in Europe (and soon to be in the US) used in over 100 hospitals, the platform is necessarily developed under "design control", meaning that strict product planning and engineering practices are required. This has implications for how the application and APIs are developed and deployed, such as strict version control workflows and implementation of rigorous testing procedures. To meet the requirements for high availability and horizontal scaling, we use a combination of Plumber and OpenCPU, hosted on RStudio Connect and AWS Fargate/ECS, which cater to the various needs of the development and production environments.

The Glamour of Graphics

Speaker: William Chase

Location: Grand Ballroom B

Category: Visualization

I see a lot of ugly charts. This is to be expected as I work with a lot of academics and data scientists, neither of whom have been trained in how to design attractive charts. I myself produced many ugly charts during my years as a research scientist, when the design process basically came down to random tweaking until things "looked good". If only I could go back and tell young inexperienced me that there was a better way. In this talk, I will present that better way--a series of design principles that can take any chart from drab to fab. Rather than applying these techniques willy-nilly, I will show how they form a layered "Glamour of Graphics" that is structured and can be easily applied to any chart. This Glamour of Graphics has some simple implementations in ggplot, where we will replace geoms, aesthetics, and scales with typography, color, and layout. Finally, I will discuss why looks matter when it comes to charts, and how by following the Glamour of Graphics you can design charts that are more persuasive and more accurately perceived.

RMarkdown Driven Development

Speaker: Emily Riederer

Location: Imperial Ballroom

Category: Workflow

RMarkdown enables analysts to engage with code interactively, embrace literate programming, and rapidly produce a wide variety of high-quality data products such as documents, emails, dashboards, and websites. However, RMarkdown is less commonly explored and celebrated for the important role it can play in helping R users grow into developers. In this talk, I will provide an overview of RMarkdown Driven Development: a workflow for converting one-off analysis into a well-engineered and well-designed R package with deep empathy for user needs. We will explore how the methodical incorporation of good coding practices such as modularization and testing naturally evolves a single-file RMarkdown into an R project or package. Along the way, we will discuss big-picture questions like “optimal stopping” (why some data products are better left as single files or projects) and concrete details such as the {here} and {testthat} packages which can provide step-change improvements to project sustainability.

Time: 10:53 - 11:15

Don’t repeat yourself, talk to yourself! Repeated reporting in the R universe.

Speaker: Sharla Gelfand

Location: Grand Ballroom A

Category: Communication

If you’re responsible for analyses that need updating or repeating on a semi-regular basis, you might find yourself doing the same work over and over again. The principle of "don’t repeat yourself" from software engineering motivates us to use functions and packages, the core of repetition in the R universe. For analyses, it can be difficult to know how to use this principle and move beyond "copying and pasting scripts and changing the data file and the object names and updating the dates and results in RMarkdown", especially when there’s some element of human intervention required, whether it be for validating assumptions or cleaning artisanal data. This talk will focus on those next steps, showcasing opportunities to stop repeating yourself and instead anticipate the needs of and communicate effectively with your future self (or the next person with your job!) using project-oriented workflows, clever interactivity, templated analyses, functions, and yes, your own packages.

Development of a web-based clinical decision support application for platelet transfusion management

Speaker: Justin Juskewitch

Location: Plaza Room

Category: Medicine

Development of a web-based clinical decision support application for platelet transfusion management using R and the Tidyverse. Blood product transfusion is a high-risk and costly medical procedure. Platelets (blood cells that initiate clotting) are a rare and expensive blood product with a short shelf life. Proper management of platelet transfusions is essential to clinical care, particularly for patients who have developed antibodies against specific platelet types due to pregnancy or past transfusions. By providing platelets that avoid a patient’s known antibodies, improved patient outcomes and better inventory management of a rare blood product are achieved. To address this need, we used R, Tidyverse, and several key packages (Shiny, shinydashboard, dplyr, purrr, httr, officer, flextable, future) to develop a web-based application (PLTVXM) to help guide platelet inventory selection. PLTVXM queries information on available/pending platelet inventory (and eligible donors) from reports that run in our institutional reporting tool Tableau® via a Tableau Server REST API. Patient antibody and blood type information is securely retrieved from a clinical data lake via an in-house R package (“dart”) and a custom institutional API. The retrieved data is processed by a published algorithm implemented in R, which incorporates user input to present sortable tables of patient-specific compatible platelet inventory (and donors) for consideration. The requisite documentation for platelet product reservation or donor recruitment is then autogenerated using institutional form templates. PLTVXM is deployed on an RStudio Connect server which allows seamless integration with our institution’s Active Directory identity management infrastructure. The pilot version of PLTVXM was created by physicians without formal computer programming training in two weeks. After successful demonstration, PLTVXM was approved for clinical validation and future use in our practice. Our experience highlights how R can facilitate creation of dynamic web-based applications for a wide range of business (or clinical) needs.

3D ggplots with rayshader

Speaker: Tyler Morgan-Wall

Location: Grand Ballroom B

Category: Visualization

Learn how a single line of code can transform your data visualizations into stunning 3D using the rayshader package. In this talk, I will show how you can use rayshader to create beautiful 3D figures and animations to help promote your research and analyses to the public. Find out how to use principles of cinematography to take users on a 3D tour of your data, scripted entirely within R. Leaving the 3D pie charts in the pantry at home, I will discuss how to build interpretable, engaging, and informative plots using all three dimensions.

renv: Project Environments for R

Speaker: Kevin Ushey

Location: Imperial Ballroom

Category: Workflow

The renv package helps you create reproducible environments for your R projects. With renv, you can make your R projects more:

- Isolated: Installing a new or updated package for one project won’t break your other projects, and vice versa.
- Portable: Easily transport your projects from one computer to another, even across different platforms. renv makes it easy to install the packages your project depends on.
- Reproducible: renv records the exact package versions you depend on, and ensures those exact versions are the ones that get installed wherever you go.

In this presentation, I'll introduce renv and some of its main workflows.

Time: 11:16 - 11:38

How Rmarkdown changed my life

Speaker: Rob Hyndman

Location: Grand Ballroom A

Category: Communication

Over the last few years, Rmarkdown seems to have taken over my life, or at least my written communication. These days I use Rmarkdown to maintain my website, write my blog, write textbooks, write academic papers, prepare slides for talks, keep my CV up-to-date, help my students write theses, prepare university policy documents, write letters, prepare exams, write reports for clients, and more. I haven't quite got to the point of using it for shopping lists, but perhaps that's my next Rmarkdown template. I will reflect on the journey in getting to this point, what I've lost and what I've gained. I will also speculate on what might be next in the Rmarkdownification of my life.

Forecasting Platelet Blood Bag Demand to Reduce Inventory Wastage at the Stanford Blood Center

Speaker: Qian Zhao

Location: Plaza Room

Category: Medicine

The Stanford Blood Center collects and distributes blood products to Stanford Hospital. One of these is platelets, a vital clot-forming blood component with a limited shelf life of a few days. Previous work (Guan et al., 2017) formulated an optimization problem using features aggregated from the available data to solve the problem of reducing waste. An R package was created for a three-day ordering strategy but has not been put into production due to lack of human trust in modelling accuracy. In summer 2019, the Stanford Data Science for Social Good team decided to make use of additional patient-level data and models to predict platelet consumption rather than relying solely on aggregated data. Modeling the transfusion recipients as different subpopulations allows for finer-grained predictions on a patient level. We make extensive use of R packages, such as the tidyverse and R Shiny, to conduct exploratory data analysis, build models, and create a user-intuitive dashboard. The Shiny dashboard is designed to display consumption predictions aggregated across all models, consumption predictions for each subpopulation, and historical performance of the model, thereby serving as a valuable tool in building the trust necessary for adopting the algorithmic ordering strategies.

Reference: Guan, L., Tian, X., et al. (2017). “Big data modeling to predict platelet usage and minimize wastage in a tertiary care system.” PNAS 114 (43): 11368-11373. Retrieved from: www.pnas.org/cgi/doi/10.1073/pnas.1714097114

Designing Effective Visualizations

Speaker: Miriah Meyer

Location: Grand Ballroom A

Category: Visualization

RStudio 1.3 Sneak Preview

Speaker: Jonathan McPherson

Location: Imperial Ballroom

Category: Workflow

RStudio 1.3, currently available as a preview release, includes a number of new capabilities that will help you be more productive in R. It's also more configurable, accessible, and flexible. In this talk, you'll learn to take advantage of these new tools.

Time: 11:39 - 11:59

One R Markdown Document, Fourteen Demos

Speaker: Yihui Xie

Location: Grand Ballroom A

Category: Communication

R Markdown is a document format based on the R language and Markdown to intermingle computing with narratives in the same document. With this simple format, you can actually do a lot of things. For example, you can generate reports dynamically (no need to cut-and-paste any results because all results can be dynamically generated from R), write papers and books, create websites, and make presentations. In this talk, I'll use a single R Markdown document to give demos of the R packages rmarkdown, bookdown for authoring books (https://bookdown.org), blogdown for creating websites (https://github.com/rstudio/blogdown), rticles for writing journal papers (https://github.com/rstudio/rticles), xaringan for making slides (https://github.com/yihui/xaringan), flexdashboard for generating dashboards (https://github.com/rstudio/flexdashboard), learnr for tutorials (https://github.com/rstudio/learnr), rolldown for storytelling (https://github.com/yihui/rolldown), and the integration between Shiny and R Markdown. To make the best use of your time during the presentation, I recommend taking a look at the rmarkdown website in advance: https://rmarkdown.rstudio.com.

Shiny New Things: Using R to Bridge the Gap in EMR Reporting

Speaker: Brendan Graham

Location: Plaza Room

Category: Medicine

Electronic Medical Records (EMRs) are a treasure trove of information, but tend to fall disappointingly short when it comes to visualizing and

reporting data in a user-friendly and intuitive manner. Building reports in an EMR can be a frustrating experience; the developer is at the mercy of how the data is stored within the EMR, and the available EMR reporting tools can be bland and uninspiring. But reporting on data in the EMR doesn't have to be this way! Combining the data-rich EMR with R's robust reporting capabilities benefits both developers and consumers of data. This talk will describe how a cross-departmental project team uses an internal R package, R Markdown reports scheduled via RStudio Connect, and an interactive flexdashboard app to quickly implement solutions to gaps in the reporting capabilities of the EMR. The flexibility of R relative to EMR reporting tools facilitates a design thinking approach to reporting, allowing for more user input, customization and quick iteration. Furthermore, the web-based app we developed can be embedded within the EMR itself, allowing for a more streamlined workflow.

Tidyverse 2019-2020

Speaker: Hadley Wickham

Location: Grand Ballroom B

Category: Visualization

Using Jupyter with RStudio Server Pro

Speaker: Karl Feinauer

Location: Imperial Ballroom

Category: Workflow

This talk is for R admins who want to learn how to set up Jupyter notebooks on RStudio Server Pro. We'll cover prerequisites, basic configuration,

best practices for management, JupyterLab, and more.

Time: 12:00 - 13:00

BoF Lunch: Finance

Speaker: NA

Location: NA

Category: NA

BoF Lunch: Insurance

Speaker: NA

Location: NA

Category: NA

Time: 13:00 - 13:21

Best practices for programming with ggplot2

Speaker: Dewey Dunnington

Location: Grand Ballroom A

Category: ggplot2

The ggplot2 package is widely acknowledged as a powerful, dynamic, and easy-to-learn graphics framework when used in an interactive environment. Using

ggplot2 in a package or Shiny app environment adds several constraints which are sometimes circumvented using ggplot2 behaviour that may change in the future. Some best practices include (1) using the `.data` pronoun to refer to the layer data within `aes()` and `vars()` instead of the original variable name, (2) ensuring that `plot()` methods that use ggplot2 explicitly `print()` one or more ggplot objects, (3) defining extension themes that modify a complete theme within ggplot2 (like `theme_gray()`), and (4) testing graphical output using the vdiffr package. Collectively, these practices result in better error messages with unexpected user input and ensure compatibility with most versions of ggplot2, including those to come in the future.
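
Practices (1) and (2) can be sketched roughly as follows, assuming ggplot2 is installed. The helper `plot_xy()` and the S3 class `scatter_summary` are hypothetical names made up for this illustration; they are not part of ggplot2 or of the talk.

```r
library(ggplot2)

# (1) Refer to columns through the `.data` pronoun, so the function can take
# column names as strings.
plot_xy <- function(df, x_col, y_col) {
  ggplot(df, aes(x = .data[[x_col]], y = .data[[y_col]])) +
    geom_point()
}

# (2) A plot() method for a hypothetical S3 class that explicitly print()s
# the ggplot object, so the plot is drawn even when plot() is called inside
# a function or a loop.
plot.scatter_summary <- function(x, ...) {
  p <- plot_xy(x$data, x$x_col, x$y_col)
  print(p)
  invisible(x)
}

obj <- structure(
  list(data = mtcars, x_col = "wt", y_col = "mpg"),
  class = "scatter_summary"
)

p <- plot_xy(mtcars, "wt", "mpg")  # returns a ggplot object
plot(obj)                          # draws the plot, returns obj invisibly
```

Because the column names are passed as strings, the function body contains no bare variable names for the aesthetics.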

MLOps for R with Azure Machine Learning

Speaker: David Smith

Location: Grand Ballroom B

Category: Modeling

Azure Machine Learning service (Azure ML) is Microsoft’s cloud-based machine learning platform that enables data scientists and their teams to carry

out end-to-end machine learning workflows at scale. With Azure ML's new open-source R SDK and R capabilities, you can take advantage of the platform’s enterprise-grade features to train, tune, manage and deploy R-based machine learning models and applications. In this talk, the attendees will learn how to:
* Carry out ML workflows using the authoring experience of their choice, from no-code to code-first options that include Azure ML’s drag-and-drop visual interface for defining workflows and RStudio Server on the Data Science Instance, a hosted VM workstation, for using the Azure ML R SDK from the RStudio browser-based interface.
* Use the Azure ML R SDK to manage cloud resources and train, hyperparameter tune, and log and visualize metrics for their models at scale on Azure compute.
* Build ML Pipelines in R for defining and orchestrating reusable and reproducible ML workflows.
* Deploy, manage, and monitor their R ML models and applications as web services on Azure Container Instance and Azure Kubernetes Service, with an emphasis on robust DevOps and CI/CD for orchestrating and streamlining their end-to-end data science development lifecycle.

Small Team, Big Value: Using R to Design Visualizations

Speaker: Ian Lyttle

Location: Imperial Ballroom

Category: Organizational Thinking

Many R users can feel isolated due to the prevalence of Python or Tableau at their institutions. This talk will focus on how we use R to develop

reference implementations of visualizations (using ggplot2), and to develop corporate-themed color maps (using the colorspace package) to bring value to the entire institution. Color maps can be translated into a variety of formats, for Tableau, Qlik Sense, d3, etc., and deployed independently from R. For visualizations, our goal is to translate ggplot2 objects to Vega-Lite specifications, using a package we are developing: ggvega. Vega-Lite visualizations are web-native, and are rendered independently from R. Specifications can be designed to be extensible to new data, allowing them to serve as templates, to be deployed and updated for use outside of R. Of course, despite isolation within an institution, our work with the larger R open-source communities provides a foundation on which to build; in fact, we have a lot of company and are having a lot of fun.

Auto-magic package development: Building an R API for building Vega-Lite Specs

Speaker: Alicia Schep

Location: Plaza Room

Category: Programming

Vega-Lite is a high-level grammar of interactive graphics implemented in JavaScript; it renders interactive visualizations in the browser based on

a JSON specification. In Python and JavaScript, the Altair and vega-lite-api packages have demonstrated how the development of APIs to build Vega-Lite graphics can be partially automated based on the Vega-Lite JSON schema, which describes the required format for a Vega-Lite JSON specification. This talk will describe the development of the ‘vlbuildr’ package for building Vega-Lite specifications in R and the ‘vlmetabuildr’ package for building the ‘vlbuildr’ package. The ‘vlbuildr’ package seeks to provide a pipe-friendly, “R-like” functional interface for building up simple to complex specifications for Vega-Lite graphics, which can in turn be rendered as an htmlwidget by the ‘vegawidget’ R package. Building such an API in a fully automated way from the Vega-Lite schema presents considerable challenges, so the approach taken here was to rely on partial automation. Human judgement dictates the basic contours of the API, such as what groups of functions to include and how various types of building blocks will go together. The part that is automated is filling in many details such as the different variants of a group of functions, the exact parameters needed for each function, and the documentation of those parameters -- the parts that would be extremely tedious to port over!

Time: 13:23 - 13:45

Spruce up your ggplot2 visualizations with formatted text

Speaker: Claus Wilke

Location: Grand Ballroom A

Category: ggplot2

The ggtext package provides various functions to add formatted text to ggplot2 figures, both in the form of plot or axis labels and in the form

of text labels or text boxes inside the plot panel. Text formatting can be achieved through a small subset of markdown, HTML, and CSS directives. Features currently supported include italics, bold, super- and sub-script, as well as changing font size, font family, and color. Basic support for adding images to formatted text is also available.

Totally Tidy Tuning Techniques

Speaker: Max Kuhn

Location: Grand Ballroom B

Category: Modeling

Many models have structural parameters that cannot be directly estimated from the data. These tuning parameters can have a significant effect on

model performance and require some mechanism for finding reasonable values. The tune and workflows packages enable tidymodels users to optimize these parameters using a variety of efficient grid search methods as well as iterative search techniques (such as Bayesian optimization).
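
To illustrate what a tuning parameter is, here is a hand-rolled grid search in base R. It deliberately does not use the tune API. The simulated data, the polynomial degree as the tuning parameter, and the single validation split are all assumptions chosen to keep the sketch short.

```r
set.seed(2020)

# Simulated data with a quadratic signal (an assumption for this sketch)
n <- 200
x <- runif(n, -2, 2)
y <- 1 + 2 * x - 3 * x^2 + rnorm(n, sd = 0.1)
dat <- data.frame(x = x, y = y)

# Hold out a validation set. The degree cannot be chosen from the training
# fit alone, because a higher degree never fits the training data worse.
train_idx <- sample(n, size = 150)
train <- dat[train_idx, ]
valid <- dat[-train_idx, ]

# Grid of candidate values for the tuning parameter
grid <- data.frame(degree = 1:5)

# Validation RMSE for each candidate value
grid$rmse <- vapply(grid$degree, function(d) {
  fit <- lm(y ~ poly(x, degree = d), data = train)
  pred <- predict(fit, newdata = valid)
  sqrt(mean((valid$y - pred)^2))
}, numeric(1))

# Candidate with the lowest validation error
best <- grid[which.min(grid$rmse), ]
```

An iterative search would instead propose each new candidate based on the results seen so far, rather than evaluating a fixed grid.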

UnicoRns are real

Speaker: Travis Gerke

Location: Imperial Ballroom

Category: Organizational Thinking

Common advice from experienced data scientists to job-seekers is to avoid job postings that describe a "data science unicorn": someone who has

experience performing an unrealistically large array of technical and business-related job duties. Seeking a unicorn is viewed as a potential indicator that the company fails to understand its data science needs, and that new hires will not be poised for success due to a lack of support and resources [Robinson & Nolis, 2019]. The R language, particularly when used with RStudio products, has evolved to enable production-level activities in the areas of data wrangling, reporting/dashboarding, database/software engineering, machine learning, and web application development. It is increasingly plausible that a data scientist will be able to efficiently perform a wide variety of job functions with experience only in a single language (R). Indeed, even entry level R users may tread into "unicorn" territory. Current standards for data scientist job descriptions and salaries do not accommodate this nuance, leaving both job-seekers and hiring managers unable to distinguish job requirements which should be read as warning signs from listings which are ideal matches for the modern R unicorn. In this talk, we present data aggregated from several large compensation analytics companies which summarize current benchmarks for data science job descriptions and corresponding salary ranges. We then suggest job description language to target modern R users, considering both job duty compatibility and job post findability. These descriptions are presented with likely salary range pairings. Attention is given to deviations from traditional degree requirements, years of experience, and demands for literacy in multiple programming languages which may lack relevance for the R unicorn. Our overarching goal is to provide job description templates which encourage optimal matchmaking between R job seekers and organizations in need of their talents.

Bridging the gap between SQL and R: Introducing queryparser and tidyquery

Speaker: Ian Cook

Location: Plaza Room

Category: Programming

Like it or not, SQL is the closest thing we have to a universal language for working with structured data. Celebrating its 50th birthday in 2020, SQL

today integrates with thousands of applications and has millions of users worldwide. Data analysts using SQL represent a large audience of potential R users motivated to expand their data science skills. But learning R can be frustrating for SQL users. One major frustration is the inability to directly query R data frames with SQL SELECT statements. Eager to use R for tasks that are not possible with SQL (like data visualization and machine learning), these users are dismayed to find that they must first learn an unfamiliar syntax for data manipulation. The popularity of the sqldf package (which automatically exports an R data frame into an embedded database, then runs a SQL query on it) demonstrates this frustration. But now there is a way to directly query an R data frame without moving the data out of R. In this talk, I introduce tidyquery, a new R package that runs SQL queries directly on R data frames. tidyquery is powered by dplyr and by queryparser, a new pure-R, no-dependency SQL query parser.

Time: 13:46 - 14:08

The little package that could: taking visualizations to the next level with the scales package

Speaker: Dana Seidel

Location: Grand Ballroom A

Category: ggplot2

Precise axes, proper data transformation, and informative visual data mappings are critical components to any polished visualization. The scales

package, the unsung hero behind ggplot2’s scale_* infrastructure, includes functions to help any R user manipulate and polish their visualizations. In this presentation, we will explore the functionality of this small but mighty package, demonstrating its functions for polishing guides (e.g. breaks and labels), managing data transformations, and mapping aesthetic palettes to data.
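
A few of the helpers for labels, breaks and rescaling can be tried on their own, outside of any plot. This is a small sketch assuming scales version 1.1.0 or later is installed; the input values are arbitrary.

```r
library(scales)

# Label helpers return functions that turn numeric breaks into text
comma_labels   <- label_comma()(c(1234567, 1000))            # "1,234,567" "1,000"
percent_labels <- label_percent(accuracy = 1)(c(0.256, 0.5)) # "26%" "50%"

# Break helpers return functions that compute breaks from a data range
pretty_breaks <- breaks_pretty(n = 5)(c(0, 97))

# rescale() maps values linearly onto [0, 1], a common step before
# mapping data to a palette
rescaled <- rescale(c(10, 15, 20))                           # 0.0 0.5 1.0
```

In a plot, the same helpers are typically passed to a scale, for example `scale_y_continuous(labels = label_comma())`.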

Neural Networks for Longitudinal Data Analysis

Speaker: Sydeaka Watson

Location: Grand Ballroom B

Category: Modeling

Longitudinal data (or panel data) arise when observations are recorded on the same individuals at multiple points in time. For example, a longitudinal

baseball study might track individual player characteristics (team affiliation, age, height, weight, etc.) and outcomes (batting average, stolen bases, runs, strikeouts, etc.) over multiple seasons, where the number of seasons could vary across players. Neural network frameworks such as Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRUs) can flexibly accommodate this data structure while preserving and exploiting temporal relationships. In this presentation, we highlight the use of neural networks for longitudinal data analysis with tensorflow and keras in R.

Data Science in Meatspace

Speaker: Ben-Joaquin Gouverneur

Location: Imperial Ballroom

Category: Organizational Thinking

The Data Science community is dominated by folks doing amazing work with data that starts in and never leaves **cyberspace**. This talk is about

best practices and playbooks for doing data science that involves **meatspace** (the opposite of cyberspace) and why R is such a great language for working with data that originated in the physical world. While the concrete examples in this talk will mostly come from the **manufacturing** space, where I have the most experience, I believe the themes are relevant to many meatspace workflows. We'll talk through effective playbooks that can help you navigate common tasks throughout the life-cycle of a project. We’ll also weave in how R’s glorious package ecosystem, including `tidyverse`, can be combined with other languages like `python`, and with enterprise products like **RStudio Connect** to great effect. Specifically, we'll discuss practices in these areas:
* best practices for **data collection** in meatspace
* the importance of quantifying **measurement system error**
* collecting the correct data for training **computer vision** models
* the rarely discussed cost of **maintaining models** in production

List-columns in data.table: Reducing the cognitive and computational burden when working with comple

Speaker: Tyson Barrett

Location: Plaza Room

Category: Programming

The use of list-columns in data frames and tibbles is well documented (e.g. Bryan, 2018), providing a cognitively efficient way to organize results

of complex data (e.g. several statistical models, groupings of text, data summaries, or even graphics) with corresponding data. For example, one can store student information within classrooms, player information within teams, or analyses within groups. This allows the data to be of variable sizes without overly complicating or adding redundancies to the structure of the data. In turn, this can make analysis of the data more reliable. Because of its efficiency and speed, being able to use data.table to work with list-columns would be beneficial in many data contexts (e.g. to reduce memory usage in large data sets). Herein, I demonstrate how one can create list-columns in a data table using the by argument in data.table and purrr::map(). I compare the behavior of the data.table approaches to the dplyr::group_nest() function and tidyr::unnest(), two of the several powerful tidyverse nesting and unnesting functions. Results using bench::mark() show the speed and efficiency of using data.table to work with list-columns.
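
The nesting pattern described above might look roughly like this, assuming data.table is installed. The `mtcars` data, the column names, and the per-group linear model are illustrative choices, not taken from the talk. `lapply()` stands in for `purrr::map()` to avoid an extra dependency.

```r
library(data.table)

dt <- as.data.table(mtcars)

# Nest: one row per group, with that group's rows stored in a list-column.
# `.SD` is the subset of the data for the current `by` group.
nested <- dt[, .(data = list(.SD)), by = cyl]

# Add a second list-column holding one fitted model per group
nested[, model := lapply(data, function(d) lm(mpg ~ wt, data = d))]

# Pull a scalar summary out of each model into an ordinary column
nested[, slope := vapply(model, function(m) coef(m)[["wt"]], numeric(1))]

# Unnest: expand the list-column back to one row per observation
unnested <- nested[, rbindlist(data), by = cyl]
```

Each row of `nested` keeps a group's data, its model, and its summary side by side, which is the organizational benefit the abstract describes.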

Time: 14:09 - 14:29

Extending your ability to extend ggplot2

Speaker: Thomas Lin Pedersen

Location: Grand Ballroom A

Category: ggplot2

The ggplot2 package continues to be one of the most used frameworks for producing graphics in R. While being extremely flexible, the package itself

can be constrained by the different types of graphic elements and statistical transformations available. Instead of continuing to add new features, the development in recent years has focused on making ggplot2 extensible by other packages, thus distributing development and maintenance. Despite the best of intentions, ggplot2 can feel daunting to extend, due to unusual idiosyncrasies, a foreign object system, and a partly obscured rendering model. This talk intends to remove the mystery of extending ggplot2, by describing the basic ways that it can be extended and showcasing a couple of simple extensions that can be built with very little code. Lastly, it will include discussions of some best practices and gotchas that may come in handy when you start out.

Stochastic Block Models with R: Statistically rigorous clustering with rigorous code

Speaker: Nick Strayer

Location: Grand Ballroom B

Category: Modeling

Often a machine learning research project starts with brainstorming, continues to one-off scripts while an idea forms, and finally, a package is

written to disseminate the product. In this talk, I will share my experience rethinking this process by spreading the package writing across the whole process. While there are cognitive overheads involved with setting up a package framework, I will argue that these overheads can serve as a scaffolding for not only good code but robust research practices. The result of this experiment is the SBMR package: a native R package written to fit and investigate the results of Bipartite Stochastic Block Models that forms the backbone of my PhD dissertation. By going over the ups and downs of this process, I hope to leave the audience with inspiration for moving the package writing process closer to the start of their projects and melding research and code more closely to improve both.

Value in Data Science Beyond Models in Production

Speaker: Eduardo Ariño de la Rubia

Location: Imperial Ballroom

Category: Organizational Thinking

ML in production is one of the most obvious ways that data science organizations create value in business. However, these models are at the very end

of a long story of how quantitative research changes and enhances organizations. In this talk I will discuss how I have found DS organizations to be truly transformative outside of ML in the loop. Bio: Eduardo Ariño de la Rubia is a DS manager and educator. He loves R and RStudio. He has a Masters in Negotiation, Conflict Resolution and Peacebuilding, which is probably the most useful training he could have received.

Advances in tidyeval

Speaker: Lionel Henry

Location: Plaza Room

Category: Programming

In tidyverse grammars such as dplyr you can refer to the columns in your data frames as if they were objects in the workspace. This syntax is

optimised for interactivity and is a great fit for data analysis, but it makes it harder to write functions and reuse code. In this talk we present some advances in the tidy eval framework that make it easier to program around tidyverse pipelines without having to learn a lot of theory.

Time: 14:45 - 14:50

`livecode`: broadcast your live coding sessions from and to RStudio

Speaker: Colin Rundel

Location: Grand Ballroom A

Category: Lightning Talks

Category: NA

Time: 09:04 - 10:00

Open Source Software for Data Science

Speaker: J.J. Allaire

Location: Grand Ballroom A

Category: Keynote

Time: 10:00 - 11:00

Data, visualization, and designing with AI

Speaker: Fernanda Viegas, Martin Wattenberg

Location: Grand Ballroom A

Category: Keynote

Time: 11:30 - 11:51

Case Studies in Customer Success

Speaker: Katie Masiello

Location: Imperial Ballroom

Category: Case Study

Meet You Where You R

Speaker: Lauren Chadwick

Location: Grand Ballroom A

Category: Education

Deploying End-To-End Data Science with Shiny, Plumber, and Pins

Speaker: Alex Gold

Location: Grand Ballroom B

Category: Production

Simplified Data Quality Monitoring of Dynamic Longitudinal Data: A Functional Programming Approach

Speaker: Jacqueline Gutman

Location: Plaza Room

Category: Programming

Time: 11:53 - 12:15

How Vibrant Emotional Health Connected Siloed Data Sources and Streamlined Reporting Using R

Speaker: Sean Murphy

Location: Imperial Ballroom

Category: Case Study

Data Science Education in 2022

Speaker: Carl Howe, Greg Wilson

Location: Grand Ballroom A

Category: Education

We’re hitting R a million times a day so we made a talk about it

Speaker: Heather Nolis, Jacqueline Nolis

Location: Grand Ballroom B

Category: Production

vctrs: Creating custom vector classes with the vctrs package

Speaker: Jesse Sadler

Location: Plaza Room

Category: Programming

Time: 12:16 - 12:38

Building a new data science pipeline for the FT with RStudio Connect

Speaker: George Kastrinakis

Location: Imperial Ballroom

Category: Case Study

Data science education as an economic and public health intervention in East Baltimore

Speaker: Jeff Leek

Location: Grand Ballroom A

Category: Education

Growth Hacking with R - Product Analytics at Scale using R and RStudio

Speaker: Andrew Mangano

Location: Grand Ballroom B

Category: Production

Asynchronous programming in R

Speaker: Winston Chang

Location: Plaza Room

Category: Programming

Time: 12:39 - 12:59

How to win an AI Hackathon, without using AI

Speaker: Colin Gillespie

Location: Imperial Ballroom

Category: Case Study

Of Teacups, Giraffes, & R Markdown

Speaker: Desiree De Leon

Location: Grand Ballroom A

Category: Education

Practical Plumber Patterns

Speaker: James Blair

Location: Grand Ballroom B