The goal of percentify is to create virtual groups on top of a tibble or grouped_df so that calculations can be done within percentile ranges of a variable across the whole dataset. You can then efficiently perform various dplyr operations on the resulting percentiled_df, such as summarise(), do() and group_map().

Installation

You can install the development version of percentify from GitHub with:

Example

Imagine we want to compute summary statistics for different percentile ranges of price in diamonds. We start by using percentify_cut to create a percentiled_df on price with splits at 20%, 60%, 80%, 90% and 95%.

library(ggplot2)
library(dplyr)
library(percentify)

We can then use this grouped data.frame with summarise to calculate statistics within each range.
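As a rough illustration of what that calculation amounts to, the sketch below reproduces the idea using only dplyr and base R rather than percentify itself. The names splits, breaks, price_range and price_summary are made up for this illustration, and the way cut() treats values on a boundary is an assumption that may not match how percentify assigns rows at the edges of a range.

```r
library(ggplot2)  # only needed here for the diamonds dataset
library(dplyr)

# Split points from the example above, padded with 0 and 1 so that
# the first and last ranges are closed (hypothetical helper names).
splits <- c(0, 0.2, 0.6, 0.8, 0.9, 0.95, 1)

# Price values sitting at each split point.
breaks <- quantile(diamonds$price, probs = splits, names = FALSE)

# Label every row with its price range, then summarise within each range.
price_summary <- diamonds %>%
  mutate(price_range = cut(price, breaks = breaks, include.lowest = TRUE)) %>%
  group_by(price_range) %>%
  summarise(
    n = n(),
    mean_price = mean(price),
    mean_carat = mean(carat)
  )

price_summary
```

Unlike this hand-rolled version, which adds a real column to the data, percentify keeps the groups virtual until they are materialized.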

Using collect from dplyr will materialize the groups so they can be used for plotting or other calculations.

diamonds_price %>%
  collect() %>%
  ggplot(aes(x, fill = .percentile_price)) +
  geom_histogram(bins = 100)

Plotting function

The resulting grouped data.frames have ggplot2::autoplot() methods to visualize the percentile ranges.

percentify_random(diamonds, price, 0.2, 25) %>%
  autoplot()

Inspiration

The underlying code for this package is inspired by the work done by Davis Vaughan in strapgod.

Code of Conduct

Please note that the ‘percentify’ project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.