12.0 Household Survey Data Expansion and Analysis
Note: Significant components of this chapter come from Stopher et al.’s (2008) NCHRP Report Chapter 2 and Tierney et al.’s FHWA Manual Introduction (Cambridge Systematics 1994) Chapter 6. Other key contributor is Steve Ruegg.
The section summary page identifies three editing and cleaning tasks that can be completed for any survey effort, including a household travel/ activity survey. Once coded data are entered into the preliminary survey database, the survey team will need to spend a great deal of effort verifying the validity of the entered data, and editing the data, as necessary. Richardson, Ampt, and Meyburg summarize the data editing phase of the household travel/activity survey process, as follows:
“The editing phase of the survey process is perhaps the most boring, but it is also one of the most important tasks. Most survey designers would admit that more time and effort goes into the editing task than almost any of the other tasks; and such effort is worthwhile. It is useless to proceed straight into analysis hoping that the data are free from error; there will always be errors in the data as initially coded” (Richardson, Ampt, and Meyburg, 1995).
12.1 Data Cleaning Tasks
Three simple types of errors are likely to be present in the raw survey dataset:
- Incomplete records;
- Invalid field entries; and
- Inconsistent field entries.
Simple visual inspection of the data file will reveal if records are of variable lengths. If a record is found to have the wrong length, then it has either been miscoded or has been entered incorrectly. The survey team should locate the original data collection instrument, and use it to correct the record.
Next, the survey team should compare all data field entries with the range of acceptable codes for the field, taken from the final code-book. The data collection instruments for records with any fields that are out-of-range should be located, and the records should be adjusted.
Finally, the raw data should be examined by comparing related fields to ensure that they consistent with each other. Richardson, Ampt, and Meyburg recommend the following consistency checks (Richardson, Ampt, and Meyburg, 1995):
- If the last diary trip of the day is not to home, check the data collection instrument;
- If a trip’s origin and destination are the same place, check the data collection instrument;
- Compare place types at origin-and-destination to origin-and-destination activities;
- Calculate implied trip speeds, and check records with very high or very low speeds;
- Check records with unusually long trip times;
- Check records with unusually long walk trips;
- Compare age and possession of a driver’s license;
- All drivers should have licenses;
- Compare age and employment status, school status, retired status;
- Check vehicle, make, model, and body type consistency;
- Check to make sure that the number of diaries matches the number of household members; and
- Check to make sure that vehicle information has been obtained for all vehicles.
If the travel survey team has used a CATI system that has been programmed well, the CATI survey data should not have any of these data cleaning problems listed above. However, if the system failed to prevent an illogical response, the record may need to be deleted, or the respondent household may have to be re-contacted for clarification.
On the other hand, mail survey and telephone PAPI instruments are liable to have a number of data records that need to be verified and edited. Fortunately, the original data collection instruments are likely to be available for review.
12.2 Validation of Survey Responses
Survey teams should seek ways to test the validity of the collected survey data. For interviews, the most simple validation technique is to re-contact a small number of respondents to verify that they completed the survey, and that they provided certain answers in the database. In addition to allowing the survey team to check-up on interviewers, this approach is used to test the reliability of questions (whether they receive the same answers from the same respondent over time). This type of validation can also be used to a limited extent with mail surveys, provided that respondents have listed telephone numbers, or that the mail survey asks respondents for their phone numbers.
Both mail and telephone surveys can be validated with aggregate third-party data from other survey efforts, the Census, or the agency’s databases. For instance, demographic distributions should be compared to Census distributions. In addition, preliminary trip-rate information could be compared with existing model estimates.
12.3 Corrections for Non-Response and Response Errors
12.3.1 Strategies for Dealing with Item Non-Response
In every household travel/activity survey, some respondents will be unwilling or unable to answer all the questions they are asked. In particular, respondents often refuse to answer income questions. If the survey team intends to use income data in later analyses, they need to do one of the following:
In the first strategy, the survey team develops travel models and other analyses with the data records that are complete. These records with missing data are not used in the analyses. This strategy is the most straightforward approach for getting into the analysis, and for obtaining results, but the resulting analyses are hampered by higher levels of imprecision (due to smaller numbers of input data records) and potential non-response bias. If the incomplete records are treated as non-responses, and the survey team has high-quality socioeconomic demographic data, (such as Census data) the bias may be reduced somewhat through the application of correction weights. This correction is described in the next section.
A second approach for dealing with missing data fields is to develop models and analyses using one or more parameters that indicate whether a record is missing certain data items. These parameters are used to explain the non-random nature of item non-response and limit the effect of the bias, but they can be problematic when the estimated models are applied.
Suppose a multiple regression trip generation model were developed from a household travel/activity survey, and that a dummy variable for missing income information (equals one if income data are missing; zero if income data are present) is included in the model estimation. If the coefficient on the summary variable is positive, and significantly different from zero, the model indicates that income non-respondents travel more than income respondents (all other things being equal). The model parameter identifies and quantifies the relationship between income non-response and travel levels. The problem is that when the regression equation is applied to the population, it is necessary to predict how many and which members of the population would have answered the income question if they had been asked it.
Two methods can be used to apply such models. First, the parameter can simply be ignored during application, and be used only in estimation. Alternatively, a separate regression, discrete choice, or simulation model could be constructed to determine whether a household is likely to respond to the particular data item.
Note that the first two strategies for dealing with item non-response do not require the survey team to change the survey database in any way, except for adding a few simple dummy variables. In the third approach, missing data are actually estimated, and input into the database.
To limit the effects of item non-response, European and Australian travel survey teams commonly develop statistical relationships between different survey variables, which can then be used to predict the value of a particular missing item. These methods are also beginning to gain acceptance to some degree in the U.S. (Bhat, 1994)
In its most simple form, data imputation involves using completed survey records to develop mathematical models that try to predict the missing components of the other survey records. For instance, if a household travel survey consists of 1,200 records with valid responses for questions about household income and all other household variables, and 100 records with valid responses for all the variables except income, the survey team could use the 1,200 records to estimate a regression, cross-classification, or discrete choice model that relates income to the other variables, and then apply the estimated model to the 100 other records to predict their income levels. The survey records for these 100 records could then be edited to include the imputed estimates.
The problem with this simple imputation approach is that it fails to recognize that the factors that lead respondents to fail to report data items may be the same factors that are being used as explanatory variables in the imputation model. To address this, Bhat suggests estimating two simultaneous models – one model seeks to explain the reporting/non-responding phenomenon and the other develops the relationship between the variable of interest and other variables – using maximum likelihood estimation techniques.
Some travel surveyors have raised concerns that imputing data can increase bias in some cases. To some analysts, imputing survey data is the same as simply making up data. However, the recent non-response workshop at the TRB Conference On Household Travel Surveys: New Concepts and Research Needs, endorsed the use of imputation techniques, provided that they were done with care and were well-documented. All changes and corrections to the original survey data need to be noted, and where possible, the effects of these changes should be evaluated.
12.3.2 Strategies for Addressing Response Errors
In general, it is quite difficult for survey teams to identify response errors. If a respondent answers a question incorrectly for some reason, the survey team will almost never be able to determine it. However, for com-plicated multi-part questions, such as travel and activity diaries, response errors (primarily in the form of incomplete information) are sometimes discernible and correctable.
Travel and activity diaries, like those shown in Figures 12.4 through 12.11, often request a great deal of information on single activities or trips. It is easy (and quite common) for respondents to miss certain questions under these conditions. Fortunately, it is often possible for survey editors to determine what the respondent should have filled-in. For instance, probably the biggest single response error on travel diaries is for respondents to forget to mark whether a given time was a.m. or p.m.
Clever editing staff can often determine trip purpose, activities and modes (among other things) in incomplete diary entries by examining other trips or activities of the respondent or the respondent’s fellow household members. Although this detective work is generally slow, and tedious, the rewards, in terms of cleaner data for analyses, are significant. It is recommended that survey teams that field diary surveys budget significant resources for such efforts.
If a diary cleaning effort is undertaken, it is imperative that editors:
A special type of diary response error is the unreported trip (or activity). It is very important that the survey team seek to minimize these errors, because one of the most common survey analyses is the calculation of average household trip rates, and because there is evidence that unreported trips are not representative of the population of total trips.
Without other information about respondents’ trips, or activities it is usually very difficult to impute an entire trip, along with its details.
However, there are ways to detect missing trips, including:
- Are the trip ends or activity locations listed in the diaries linked properly? Does trip number N start where trip number N-1 ended?
- Do other household members report being with the respondent at a time or place not reported by the respondent?
- Do all trips back to a respondent’s home seem to be reported? Return home trips are the most commonly omitted trips in diaries.
- Does the respondent make it back home by the end of the period? Not all respondents finish the diary period at their homes, but the vast majority do.
- Does the respondent report the expected amount of time and number of instances of particular activities, such as meals, sleep, etc.?
If the data retrieval is completed with CATI techniques, these or other tests can prompt interviewers to probe for further details. When mail surveys or PAPI retrieval calls are used, the survey team should consider validation/verification contacts, and perhaps adjustments to trip rate calculation procedures.
12.4 Programming and Compiling Data
The key issues related to programming and compiling data for household travel/activity surveys are briefly discussed in this section. The programming and data compiling activities are directly related both to the objectives of the survey analysis and to the design of the survey which is also guided in part by the ultimate survey analysis objectives.
12.4.1 Determining the Adequacy of Responses
Once individual records have been cleaned and edited to the maximum extent possible, they should be evaluated for completeness and usability for analysis. The survey team should examine the database, and drop any data records that:
Have been developed from out-of-scope individuals or households that may have been surveyed by mistake;
- Exhibit serious basic flaws in consistency and logic; or
- Are too incomplete to be useful in the anticipated analysis.
Sometimes, despite efforts to the contrary, survey responses are obtained from people who live outside the study area or for people who should have been screened out of the survey process. Before conducting any analysis, the records for these individuals should be dropped from the database. These records should not be considered in the calculation of response rates, nor should they be included in any tabulations.
The second type of data record that should be removed from the database are those that contain numerous inconsistencies and illogical information. Despite efforts to edit and clean the data, some records will still contain basic flaws that will make them unusable. If attempts to verify or correct response problems are unsuccessful, and the problems are felt to cover several data fields or particularly important pieces of information, the corresponding data record should be marked unusable and deleted from the analysis dataset. In this case, the record would be counted in the same way as a refusal in the calculation of the final survey response rate.
In household travel/activity surveys in which all household members provide information, it is common to be unable to get the requested information from one or a few individuals. Usually, repeated efforts are made to track down these elusive respondents with follow-up calls and mailings, but there will almost always be a few people who cannot be contacted or who refuse to participate. In these cases, it is important that the survey team have a clear description of what will be considered a “completed household”. For some transportation planning analyses, it is essential that data be collected for all household members. For other analyses, planners can work around having some incomplete data by weighting or by estimating key parameters for the missing individuals.
As discussed in Section 12.3, the survey team should determine the definition of a completed record prior to any field work, because the definition will affect the survey follow-up strategy. It is quite easy to spend hundreds of dollars trying to complete surveys on households with elusive members. The survey team should consider at what point it is more cost-effective to give up on a particular household, and find a different one, so the definition of an acceptable response is important.
If a household data record is unacceptably incomplete, according to whatever standard definition the survey team sets, then the record should be marked unusable and dropped from the dataset. The record would be counted the same way as a refusal in the calculation of the final survey response rate.
12.4.2 Database Structure
Household travel/activity surveys can differ considerably with respect to the level of detailed travel information that is collected in the survey, the focus of the survey on the travel behavior of the household as a whole versus its individual members, or the broader analysis framework that may require an activity-based rather than the more traditional trip-specific approach.
The database for a household travel survey could therefore be developed to accommodate a hierarchical structure which can include up to four layers nested within each other. For the most detailed household travel/activity survey, the corresponding units of analysis at each layer would thus be:
- The household treated as a whole;
- Each surveyed member of the household treated individually;
- Each trip made by the household members that were surveyed; and
- Each activity carried out as part of a specific trip.
The original raw database for the household travel/activity survey would include all the household-related information and would thus consist of as many records as there are usable responses. However, such a database could also be used as a basis to create three additional databases that could be used to support the analysis of individual travelers’ trip making, the analysis of specific trips, or the analysis of activities undertaken by the respondents. As a result, each of the four datasets that would be created would have all the information necessary for transportation analysis and modeling.
The contents of each dataset can be distinguished between information that is unique to each layer of analysis and information that is common to two or more hierarchical layers. Information that uniquely characterizes each household such as income and household size, location of residence, automobile ownership, and household trips rates would be linked to or included in each person, trip or activity data record. Similarly, information that uniquely characterizes individual members of the household such as age, sex, occupation, education, location of workplace, and auto availability would be linked to or be repeated in each trip and activity record generated by the respondent. Finally, if activity-based design is being used, trip-specific information on mode choice, total cost, travel time, and distance traveled would be linked to or be repeated in the record of each activity that was part of the same trip.
Although such a database structure results in a set of four internally-consistent datasets from the household survey, it does not always represent the most efficient way of storing information. This is particularly true in the case of extensive surveys with large sample sizes. In such cases, a relational database structure could be used instead of the four-file structure to reduce the amount of overlap and the repetitiveness of information across the different layers.
12.4.3 Expansion of Survey Results
The objective of data expansion is to make it possible to reach valid conclusions about the entire study population based on the survey results. Data expansion for simple random samples is straightforward. Suppose a sample of 100 households is drawn from a population of 65,000 households, and that 12 households of the 100 are found not to have any automobiles available to them. We can expand the survey results to say that there are 65,000 x 12/100=7,800 households in the study area that do not have an available auto. Unfortunately, the expansion of household travel and activity survey data is complicated by two factors:
- Household survey sampling is generally performed using stratified random sampling procedures, rather than simple random sample procedures; and
- Invariably, because of random sampling error and various survey biases, such as nonresponse, interviewer error, etc., the actual survey sample will not be totally representative of the survey population in terms of the variables that explain travel behavior.
Expanding the household survey data set, given these limitations, is discussed below.
12.4.3.1 Using the U.S. Census Data for Survey Expansion
Fortunately, in most cases, the survey team has an excellent source of information on study area respondents from which the household survey data may be expanded – The U.S. Census data. The Census summary tape files provide the survey team with detailed socioeconomic and demographic summary information for small geographic areas. In addition, detailed cross-tabulations of key travel-related data are available from the Census Transportation Planning Package.
These and other Census products are described in Appendix B of this manual.
If the household travel/activity survey is performed more than a few years after the Census, the travel survey team may need to consider ways to update the Census data to more accurately expand the household survey. Some potential strategies for using the Census data for expansion purposes in non-census years include:
Accept the last Census as the most accurate information source and use the data without adjustment;
- Make ad-hoc adjustments to the Census data based on available Census, state, and local estimates and, perhaps, on econometric models developed from Census PUMS data; and
- Use one of the first two options temporarily, and then re-expand the data with the next Census data or with interpolated estimates based on the two Censuses.
12.4.3.2 Household Characteristics Used in Survey Expansion
The survey team should expand the household survey data so that the demographic characteristics that best describe variations in key travel behaviors are made to match those of the study area population. The survey team should select the characteristics based on the expected uses of the survey results and on the availability of data on the survey population.
For most household travel and activity surveys, a key analysis will be the determination of household trip rates and trip generation. Key expansion variables that are commonly used in this situation include:
Geographic location (such as super district);
Household size;
Number of vehicles available per household;
Number of workers per household;
Type of housing unit; and
Household income.
The first three variables are probably the most common in the U.S., but for highly specific analyses, the survey team should consider other appropriate variables. Expansion variables are typically categorized into a small number of categories for expansion. The variables used for expansion can be limited to those used for stratifying the sample (assuring a stratified sample) or include other variables, as well.
12.4.3.3 Data Expansion with a Single Control Variable
When a survey team is seeking to use a single variable for data expansion, the process is not complicated. The expansion factors are just the actual number of people in each data category for the population, divided by the number derived from the survey. Suppose a household travel survey for a region obtains the geographic distribution of households shown in Table 12.1.
Table 12.1 Geographic Distribution of Household Survey Responses for an Example Survey
Subregion | CBD | NE | E | SE | S | SW | W | NW | N | Total |
Responses | 115 | 114 | 111 | 97 | 108 | 113 | 106 | 120 | 116 | 1,000 |
The relatively equal distribution across subregions is consistent with a survey effort that was stratified on the basis of geography. Table 12.2 shows the actual number of households in each subregion based on Census data.
Table 12.2 Actual Distribution of Households for the Example
Subregion | CBD | NE | E | SE | S | SW | W | NW | N | Total |
Responses | 1,200 | 3,800 | 4,400 | 9,000 | 8,500 | 5,200 | 11,400 | 10,300 | 10,400 | 64,200 |
The expansion factors for the household survey are shown in Table 12.3.
Table 12.3 Calculated Expansion Factors for the Example
Subregion | CBD | NE | E | SE | S | SW | W | NW | N |
Responses | 10.4 | 33.3 | 39.6 | 92.8 | 78.7 | 46.0 | 107.5 | 85.8 | 89.7 |
This means that each of the 115 survey records for households in the CBD is equivalent to 10.4 actual households (1,200/115=10.4). Each of the 116 survey records from the North Subregion represents 89.7 households (10,400/116).
This simple expansion process may be applied on two or more variables, if the expansion data provide cross-tabulations. Suppose the survey team wants to expand the survey data based on both household size and number of vehicles available per household. As long as this cross-tabulation is available for the population, from the Census or another source, the process is the same as shown above. Tables 12.4, 12.5 and 12.6 illustrate an example of this expansion.
Note that the calculations are the same as for the previous example, except that the two categories representing one-person households with more than one vehicle are combined. It is generally a good idea to avoid expanding variable categories with either a very small number of responses or with a small actual population because very high or very low expansion factors may result in skewed analyses.
12.4.4 Data Expansion with Multiple Control Variables
Often the survey team faces a situation where it would be advantageous to expand the survey data using two or more variables for which there are no cross-tabulations for the population. This may occur when the Census Bureau does not publish such a cross-tabulation or when the Census cross-tabulation is not yet published. Census Summary Tape File data have become available two to three years before the Census Transportation Planning Package cross-tabulations.
In this situation, survey teams generally rely on the marginal variable totals for controls, and use an iterative method to develop the specific expansion factors. As an example, suppose the survey team wants to develop control totals based on both single variable controls, shown above, but for some reason cross-tabulations are not available. Table 12.7 shows the household survey cross-tabulation of the variables of interest.
Table 12.8 shows the information available for use as control totals. In this situation, the survey team develops the expansion factors iteratively by first developing expansion factors based on the row control totals, then by adjusting these preliminary factors to match column control totals, and then repeating the process until the expansion factors produce a reasonably accurate representation of both the row and column control totals. This iterative procedure, commonly referred to as iterative proportional fitting or the Furness Method, is the same process that is used in the Fratar trip distribution model.
Table 12.4 Household Size and Vehicles Available Responses for the Example Survey

Table 12.5 Actual Distribution of Household Size and Vehicles Available for the Example

Table 12.6 Calculated Expansion Factors

Table 12.7 Crosstabulation of Subregion and Household Size/Vehicles for an Example Survey

Table 12.8 Actual Distribution of Households by Subregion and Household Size/Vehicles for the Example

12.4.5 Data Expansion Procedures for Non-reported Trips and Trips by Nonrespondents
Because the calculation of trip generation rates is often an important use of household survey data, many survey teams perform data expansion exercises to incorporate into the survey database the likely number of trips that were missed in diaries that were not returned. To include trips by household members that failed to complete the diary, survey teams have used completed diary data to develop person-based trip generation models that relate socioeconomic and household relationship variables to the number of trips being made. These trip generation models are then applied to the individuals who failed to complete the diary so that the household’s total trip generation may be approximated.
Another potential survey data expansion option for some household travel and activity surveys is to use survey follow-up data to estimate household trip rates for non-respondent households.
Limited research has shown that the travel and socioeconomic characteristics of respondents that reply to each wave of follow-ups are successively closer to the characteristics of those who never respond (Wermuth, 1985, Richardson; Ampt & Meyburg, 1995; and Brög & Meyburg, 1982). By tracking the characteristics of respondents at each follow-up stage, and extrapolating, the survey team can estimate the characteristics of non-respondents. Figure 12.1 illustrates the extrapolation process that Wermuth, Richardson, Ampt, and Meyburg advocate based on empirical survey data from Germany and Australia. The estimates for non-respondents can be used to adjust the survey estimates or to simply determine whether non-response bias is likely to be present for a survey effort.

Figure 12.1 An Example of Using Survey Follow-Up to Estimate the Characteristics of Non-respondents
12.4.6 Data Expansion with Choice-Based Samples
It is becoming a common practice to supplement the household survey’s random sample or stratified random sample with a smaller choice-based sample to increase survey representation of certain types of households, people, or trips. The most common form of choice-based sampling in household travel and activity surveys is to recruit a certain number of households who are known to have transit riders. This is usually done to improve the mode choice model estimation process.
Survey teams need to remember that samples of this type need to be expanded separately and differently from the random sample or stratified random sample. Although the trip or activity records from these households may be used in conjunction with the trip or activity records of the other households for disaggregate analyses, such as multi-nominal logit mode choice models, they usually cannot be expanded to the general survey population because: 1) their sampling frame includes an undefined portion of the study area population, and in some cases, 2) the samples are not probability-based. The choice-based records can only be used for population expansion if:
The choice-based or targeted sample has been drawn from a sampling frame that describes a defined population using a probability based sampling method;
The defined population is able to act as a separate sample stratum for the larger survey effort, and the members of this stratum can be identified in the larger sampling frame; and
Information is available for defining the total size of the defined population and for setting control totals.
These conditions are rarely met in practice, but it is important to note that choice-based data are often collected for disaggregate mode choice model development. Some of the most common analytical procedures to develop these models, such as multinomial logit modeling, are better performed on data that are unexpanded.
A number of excellent discussions of the practical aspects of household travel and activity, survey data expansion can be found in the transportation literature (Harrington & Wang, 1995; Stopher & Stecher, 1993; and Kim, Rodman, Sen, Soot, & Christopher, 1993).
12.4.7 Summarizing Survey Results
Although this manual does not discuss the presentation or analysis of survey results, three procedures for summarizing household survey results provide important diagnostic information about the survey effort that can be used in later analyses. It is recommended that all household travel and activity survey teams:
- Perform a detailed set of survey data tabulations; and
- Calculate actual precision levels for key survey variables; and
- Formally calculate and report the survey response rate.
These procedures are discussed below.
12.4.7.1 Survey Data Tabulations
Tabulation of the survey data is a very important, although sometimes overlooked, aspect of the survey analysis. Its value is even more pronounced given the small amount of effort that is required to specify the analyses and produce the summary statistics for a preliminary analysis. A careful review and interpretation of the preliminary one-way and cross-tabulations could be instrumental in:
Finding and correcting errors in respondents’ answers, as well as errors due to coding and programming;
- Making reasonableness checks for variables included in the survey;
- Identifying survey responses with extreme values and determining whether the response to a particular question should be treated as an outlier or whether the data are unreliable, in which case the data record should be dropped from the analysis;
- Obtaining a fairly accurate picture of the distributions for different variables of interest and identifying differences due to geographic, socioeconomic, or choice behavior factors; and
- Uncovering response patterns that may be useful for subsequent more detailed analyses.
To accomplish these objectives, a set of preliminary survey data tabulations need to be specified. These would include both frequency analyses of the survey variables (also referred to as one-way tables) and cross-tabulations (also referred to as two-way tables) that relate the frequency of a particular variable to other variables of interest. For example, a frequency analysis would provide us with the total number of households with no, one, or two or more vehicles, while a cross-tabulation would further provide the same information on automobile ownership in the study area broken down by county.
For each continuous variable, the distribution of variable values in the sample is obtained by measures of the mean, median, and standard deviation. For categorical (discrete) variables and for continuous variables that can be easily grouped into different categories (e.g., 10-minute categories of travel time), the distribution in the sample can be assessed by examining the frequency of values in each variable category both in absolute and percentage terms. For example, the mean household trip rate in the study area, the distribution of automobile ownership, and the share of transit could provide some useful preliminary information on existing travel patterns in the area.
A preliminary assessment of relationships among variables of interest can be obtained by cross-tabulating each variable of interest to a variety of geographical, socioeconomic, or choice-based variables. These tables can offer a means of checking the reasonableness of existing differences by market segment and can also provide the rationale for further more detailed types of analysis. For example, differences in the mean household trip rate, automobile ownership, and transit share by county or by household income can help validate the existing data and can provide insights into the factors affecting trip making and mode choice behavior.
12.4.7.2 Calculation of Precision Levels for Key Survey Variables
Prior to conducting the survey, the survey team will have made an estimate of the required sample size to achieve some pre-determined levels of precision and confidence for important survey variables. Once the data collection is complete, the survey team can use the relationships presented in Chapter 5.0 to determine the actual precision of the survey estimates.
The degree of precision for any variable is dependent on its variance and mean a pre-determined desired statistical confidence level, and the sample size and population size. If the actual values of these parameters are different than the pre-survey estimates, the precision level for the variable will also be different than expected. The survey team should report the degree of precision on all key survey variables and survey-derived parameters.
12.4.7.3 Response Rate Report
As discussed, a household survey’s response rate is a basic measure of the quality of the data collection process. An unusually low response rate is an indication to users of the data that any analyses they conduct could be biased to a greater extent than they are accustomed. A strong response rate is an indication that the input data to their analyses are more likely to be accurate. However, before any conclusions can be drawn from response rate information, it is important to understand exactly how the rate has been calculated. Unfortunately, despite the fact that the term “response rate”, has a very specific technical definition, it is frequently used carelessly and incorrectly. “Response rate” has come to mean many things to many people.
This problem is not restricted to household surveys or travel surveys. Misuse of the term has been, and continues to be, common in many fields. In the early 1980s, the Council of American Survey Research Organizations (CASRO) commissioned a blue-ribbon committee to establish standardized definitions of survey response rate and completion rates.
The basic definition of response rate is (CASRO, 1982):
Response Rate = | Number of Completed Interviews with Reporting Units |
| Number of Eligible Reporting Units in the Sample |
There is an interpretation of this basic definition for each type of household survey effort. However, it is likely that the number of eligible units may not be known exactly, and that the response rate can only be approximated. For example, in a telephone survey the eligibility of phone numbers that are never connected (always busy, unanswered, or routed to an answering machine) cannot be determined. The best strategy is to make repeated attempts until contact is made.
One useful way to provide response information is to develop a process chart, similar to those shown in Chapter 3.0 (Figures 3.8, 3.10, and 3.11) of the Travel Survey Manual, which is reproduced in the Appendix of this report. For the specific survey effort record the disposition of all survey field work on the chart. With this information, analysts can calculate each completion rate of the survey effort. This type of reporting will be especially useful for future household survey teams trying to plan fieldwork resources.
Table 12.9 presents an example of response rate information from a recent household travel/activity survey in New Hampshire. This table provides information on the number of contacts, the contact percentage, eligibility percentage, reasons for ineligibility, participation rate, and overall response rate. The response rate that corresponds to the CASRO definition is 19%; however, some analysts also consider the percentage of those recruited (53% in this case). It is recommended that complete summaries such as Table 12.29 be prepared for all household surveys to provide the variety of information needed by different analysts.
In the New Hampshire survey, the eligibility of 28,125, or 97%, of the 29,036 different phone numbers was eventually determined. The response rate of 19% was computed assuming that none of the phone numbers not reached was eligible. This is obviously an oversimplification; the true percentage of eligible respondents among the 911 numbers that were never connected to cannot be known. Given that survey recruitment was done in the evening, it is likely that the eligibility percentage of the non-connects was lower than for those that were reached (since many businesses are not open at night). However, even in the unlikely event that all of those numbers were eligible, the response rate would not have been much different (18%). The true rate, of course, is somewhere in between. The fact that the range can be computed so narrowly illustrates the advantage of continuing to attempt to contact potential respondents who are not reached on the first attempt.
Table 12.9 Example of Response Rates from a New Hampshire Activity Survey

REFERENCES
Bhat, C. Estimation of Travel Demand Models with Grouped and Missing Income Data, Transportation Research Record 1443, 1994, Transportation Research Board, pp. 45 53.
Brög, W. and Meyburg, A.H. Influence of Survey Methods on the Results of Representative Travel Surveys. Presented at 61st Transportation Research Board, Meeting, January 1982.
Council of American Survey Research Organization (CASRO). On the Definition of Response Rates, a special report of the CASRO Task Force on Completion Rates, Lester Frankel, Chairman, June 1982.
Harrington, I. and Wang, C.Y. Adjusting Household Survey Expansion Factors, presented at the 5th Conference on Transportation Planning Applications, Seattle, April 1995.
Kim, H., Rodman, S., Sen, A., Soot, S., and Christopher, E. Factoring Household Travel Surveys, Transportation Research Record, 1412, Transportation Research Board, 1993, pp. 17 22.
Richardson, A. J., Ampt, E.S. and Meyburg, A.H. Survey Methods for Transport Planning, 1995, Eucalyptus Press, p. 299.
Richardson, A. J., Ampt, E.S. and Meyburg, A.H. Survey Methods for Transport Planning, 1995, Eucalyptus Press, pp. 321 335.
Stopher, P. and Stecher, C. Blow Up: Expanding a Complex Random Sample Travel Survey, Transportation Research Record, 1412, Transportation Research Board, 1993, pp. 10 16.
Wermuth, M. Non-Sampling Errors Due to Non-response in Written Household Travel Surveys in Ampt, E.S., Richardson, A.J., and Brög, W., 1985, New Survey Methods in Transport, VNU Science Press: Utrecht, The Netherlands, pp. 349 365.