
Introduction

Survey data often suffer from dropout, where participants leave sections of the survey incomplete. Managing dropout well is key to preserving data quality and to understanding participants' response patterns. The dropout package addresses this by providing tools to analyze and interpret participant behavior throughout the survey.

Use cases of the dropout package

  • Identifying the specific survey points where participants tend to stop completing the survey.
  • Detecting sections that are frequently skipped by respondents.
  • Quantifying the extent and locations of dropouts within the survey.
  • Estimating the proportion of missing values attributed to dropouts in each column.
  • Profiling respondents who discontinued the survey and pinpointing their dropout points.

library(dropout)
#> dropout package (v2.2.1) includes significant updates to the codebase, aimed at reducing unexpected behavior and minimizing dependencies.
#> If these changes cause issues with your existing code, you can access a previous version of the package from the archive.
#> For more information, visit:
#> https://github.com/hendr1km/dropout

Quantifying Dropout with drop_summary

The drop_summary function provides an overview of where and to what extent participants tend to stop answering questions. It highlights patterns of missing values, such as whether participants are skipping specific questions or entire sections of the survey.

drop_summary(flying)
#>                         column drop sec_na sec_length single_na  na complete
#> 1                respondent_id    0      0          0         0   0     1.00
#> 2             travel_frequency    0      0          0         0   0     1.00
#> 3                 seat_recline   18    164         20         0 182     0.82
#> 4                       height    0    164          0        12 194     0.81
#> 5            children_under_18    1    164          0         6 189     0.82
#> 6                 two_armrests    1    164          0         0 184     0.82
#> 7               middle_armrest    0    164          0         0 184     0.82
#> 8                 window_shade    0    164          0         0 184     0.82
#> 9        moving_to_unsold_seat    1    164          0         0 185     0.82
#> 10         talking_to_seatmate    0    164          0         0 185     0.82
#> 11 getting_up_on_6_hour_flight    0    164          0         0 185     0.82
#> 12 obligation_to_reclined_seat    1    164          0         0 186     0.82
#> 13       recline_seat_rudeness    0    164          0         0 186     0.82
#> 14   eliminate_reclining_seats    0    164          0         0 186     0.82
#> 15          switch_for_friends    4    164          0         0 190     0.82
#> 16           switch_for_family    0    164          0         0 190     0.82
#> 17     wake_passenger_bathroom    0    164          0         0 190     0.82
#> 18         wake_passenger_walk    0    164          0         0 190     0.82
#> 19               baby_on_plane    1    164          0         0 191     0.82
#> 20             unruly_children    0    164          0         0 191     0.82
#> 21       electronics_violation    0    164          0         0 191     0.82
#> 22           smoking_violation    0    164          0         0 191     0.82
#> 23                      gender    6      0          0         0  33     0.97
#> 24                         age    0      0          0         0  33     0.97
#> 25            household_income    0      4          2       177 214     0.79
#> 26                   education    0      4          0         2  39     0.96
#> 27      location_census_region    9      0          0         0  42     0.96
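
The summary is easier to scan when reduced to the columns where dropout actually occurs. The following is a minimal sketch in base R, assuming drop_summary returns an ordinary data frame with the column names shown in the output above (column, drop, complete); the object names smry and drop_points are illustrative only.

```r
library(dropout)

# Summarise dropout for the bundled `flying` data set
smry <- drop_summary(flying)

# Keep only the columns where at least one participant dropped out
# (assumes `drop` is a numeric count, as the printed output suggests)
drop_points <- smry[smry$drop > 0, c("column", "drop", "complete")]

# Order by the number of dropouts, largest first
drop_points <- drop_points[order(-drop_points$drop), ]

drop_points
```

Sorting by drop puts the most frequent dropout points at the top, which may help when deciding which questions to review first.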

Detecting Specific Dropouts with drop_detect

For a more detailed analysis, the drop_detect function identifies individual participants who dropped out of the survey. For each participant, it reports whether a dropout occurred (drop) and, if so, the index (drop_index) and name (column) of the column where it occurred, helping you focus on the critical dropout points.

drop_detect(flying) |>
  head()
#>    drop drop_index       column
#> 1  TRUE          3 seat_recline
#> 2 FALSE         NA         <NA>
#> 3 FALSE         NA         <NA>
#> 4 FALSE         NA         <NA>
#> 5 FALSE         NA         <NA>
#> 6 FALSE         NA         <NA>
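
The per-participant result can also be aggregated. The sketch below counts how many participants dropped out in total and at which column, using only base R. It assumes the drop column is logical and the column column holds the column name (NA for participants without dropout), as the printed output above suggests; detected, n_dropouts and per_column are illustrative names.

```r
library(dropout)

detected <- drop_detect(flying)

# Total number of participants flagged as dropouts
n_dropouts <- sum(detected$drop)

# Number of dropouts per column; table() ignores NA entries by default,
# so participants without dropout are not counted
per_column <- sort(table(detected$column), decreasing = TRUE)

n_dropouts
per_column
```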

Data Cleaning Based on Dropout Information

With the output from drop_detect, you can refine your data by filtering participants. For instance, you may choose to retain only those who completed most of the survey or analyze patterns of early dropout for further insights.
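
One way to do this is sketched below in base R. It assumes that row i of the drop_detect output describes row i of the input data and that drop_index is the position of the dropout column; the midpoint threshold and the object names (mostly_complete, early_dropouts) are arbitrary choices for illustration, not part of the package.

```r
library(dropout)

detected <- drop_detect(flying)

# Assumption: one output row per participant, in the original row order
stopifnot(nrow(detected) == nrow(flying))

# Illustrative threshold: the middle of the questionnaire
midpoint <- ncol(flying) / 2

# Participants who dropped out at or before the midpoint.
# For non-dropouts drop_index is NA, but FALSE & NA evaluates to FALSE.
early <- detected$drop & detected$drop_index <= midpoint

# Retain participants who finished or dropped out only after the midpoint
mostly_complete <- flying[!early, ]

# Keep the early dropouts separately for further analysis
early_dropouts <- flying[early, ]

nrow(mostly_complete)
nrow(early_dropouts)
```

Splitting the data rather than discarding rows keeps the early dropouts available, for example to compare their answers on the first questions with those of participants who continued.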