This is a lesson on tidying/untidying data, remixed from Jenny Bryan’s similar lesson using “Lord of the Rings” data. Most of the text and code is Jenny’s; we basically plopped a new dataset in there 😉


Enough about tidy data. How do I make it messy?

Regardless of which gathering adventure you embarked upon, we’ll all use the Bachelor/Bachelorette data from 538 to practice spreading.

Explore

What are the possible values in the new eliminated column? (hint: distinct possible values)

#> # A tibble: 8 x 1
#>   eliminated
#>   <chr>     
#> 1 R1        
#> 2 E         
#> 3 EQ        
#> 4 EF        
#> 5 EU        
#> 6 R         
#> 7 ED        
#> 8 W

What do those mean? Here is a key:

  • “E” connotes a standard elimination, typically at a rose ceremony.
  • “EQ” means the contestant quits.
  • “EF” means the contestant was fired by production.
  • “ED” connotes a date elimination.
  • “EU” connotes an unscheduled elimination, one that takes place at a time outside of a date or rose ceremony.
  • “R” means the contestant received a rose.
  • “R1” means the contestant got a first impression rose.
  • “W” means the contestant won.
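
If you are stuck on the distinct-values question, here is a minimal sketch of one way to do it with dplyr's distinct(). The tiny bach_tidy tibble is a made-up stand-in (its name, the week column, and every row are assumptions); only the column names CONTESTANT, SHOW, and eliminated come from the output shown in this lesson.

```r
library(dplyr)

# Hypothetical stand-in for the tidy data: one row per contestant per week.
# Contestant names and rows are invented for illustration.
bach_tidy <- tibble(
  CONTESTANT = c("A", "A", "B", "B", "C"),
  SHOW       = c("Bachelor", "Bachelor", "Bachelor", "Bachelor", "Bachelorette"),
  week       = c(1, 2, 1, 2, 1),
  eliminated = c("R1", "W", "R", "E", "E")
)

# distinct() keeps one row per unique value of the named column
codes <- bach_tidy %>%
  distinct(eliminated)

codes
```

On the real data, the same call on your own tidy data frame should give a one-column tibble like the one printed under Explore.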

Count

Using that tidy data, count the values in your new eliminated column by contestant name and show.

#> # A tibble: 1,128 x 4
#>    CONTESTANT     SHOW         eliminated     n
#>    <chr>          <chr>        <chr>      <int>
#>  1 01_ALEXA_X     Bachelor     E              1
#>  2 01_AMANDA_M    Bachelor     W              1
#>  3 01_AMBER_X     Bachelor     E              1
#>  4 01_AMY_X       Bachelor     E              1
#>  5 01_ANGELA_X    Bachelor     E              1
#>  6 01_ANGELIQUE_X Bachelor     E              1
#>  7 01_BILLY_X     Bachelorette E              1
#>  8 01_BOB_G       Bachelorette E              1
#>  9 01_BRIAN_C     Bachelorette E              1
#> 10 01_BRIAN_H     Bachelorette EQ             1
#> # ... with 1,118 more rows
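
A table shaped like the one above can come from dplyr's count(). This is a sketch on a small invented stand-in, not the real 538 data: bach_tidy, the week column, and all rows are assumptions made for illustration.

```r
library(dplyr)

# Hypothetical stand-in for the tidy data (invented rows)
bach_tidy <- tibble(
  CONTESTANT = c("A", "A", "A", "B", "B", "B", "C"),
  SHOW       = c(rep("Bachelor", 6), "Bachelorette"),
  week       = c(1, 2, 3, 1, 2, 3, 1),
  eliminated = c("R1", "R", "W", "R", "R", "E", "E")
)

# count() tallies rows for each combination of the listed columns
# and stores the tally in a new column called n
bach_counts <- bach_tidy %>%
  count(CONTESTANT, SHOW, eliminated)

bach_counts
```

Each row of the result is one contestant/show/code combination, with n giving how many weeks that code appeared.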

Again, we can squint hard at the 1,128-row tibble above, but if we want to see these counts side by side, with one column per elimination code, we need to reshape this data (again).

Spread

Let’s spread that counted data, so that we get a column for each possible value in the eliminated column, and those columns hold the values in the n column. Set fill to 0.
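
Here is a minimal sketch of that spreading step with tidyr's spread(). The bach_counts tibble is a hand-typed, made-up stand-in for the counted data (names and rows are assumptions), so swap in your own counted data frame.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the counted data (invented rows)
bach_counts <- tibble(
  CONTESTANT = c("A", "A", "A", "B", "B", "C"),
  SHOW       = c("Bachelor", "Bachelor", "Bachelor", "Bachelor", "Bachelor", "Bachelorette"),
  eliminated = c("R", "R1", "W", "E", "R", "E"),
  n          = c(1L, 1L, 1L, 1L, 2L, 1L)
)

# key   = column whose values become the new column names
# value = column whose values fill those new columns
# fill  = what to put where a contestant never had that code
bach_wide <- bach_counts %>%
  spread(key = eliminated, value = n, fill = 0)

bach_wide
```

The result has one row per contestant and show, plus one column per code. Without fill = 0, the code/contestant combinations that never occurred would show up as NA. Newer versions of tidyr also offer pivot_wider() for this kind of reshaping.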

Plot

Make a faceted bar plot with this data to show how many winning and losing contestants in each show got first impression roses (hint: geom_col() might be a good choice here).
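
One possible reading of this exercise, sketched with ggplot2. Everything named here is an assumption: bach_wide is an invented stand-in for the spread data, and the outcome and first_impression columns are helpers introduced only for this illustration. It treats a contestant as a winner when W is above 0 and as having a first impression rose when R1 is above 0.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for the spread data (invented rows)
bach_wide <- tibble(
  CONTESTANT = c("A", "B", "C", "D"),
  SHOW       = c("Bachelor", "Bachelor", "Bachelorette", "Bachelorette"),
  E          = c(0, 1, 1, 0),
  R1         = c(1, 0, 1, 0),
  W          = c(1, 0, 0, 1)
)

# Label each contestant as a winner or not, then count how many
# contestants in each show/outcome group got a first impression rose
fir_summary <- bach_wide %>%
  mutate(outcome = if_else(W > 0, "Won", "Lost")) %>%
  group_by(SHOW, outcome) %>%
  summarise(first_impression = sum(R1 > 0), .groups = "drop")

# geom_col() draws bar heights straight from the y values;
# facet_wrap() makes one panel per show
p <- ggplot(fir_summary, aes(x = outcome, y = first_impression)) +
  geom_col() +
  facet_wrap(~ SHOW) +
  labs(x = NULL, y = "Contestants with a first impression rose")

p
```

geom_col() fits here because the counts are already computed; geom_bar() would instead count rows for you.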

Creative Commons License