OeNB Report 2023/1: Eurosystem Household Finance and Consumption Survey 2021: Methodological notes for Austria
The HFCS in Austria (fourth wave): key facts
Methodological framework at a glance
Nicolas Albacete, Peter Lindner, Karin Wagner
Questionnaire
The HFCS in Austria is based on an internationally harmonized questionnaire that covers the key stock and flow components of the household balance sheet and integrates them with socioeconomic characteristics. Data were collected from households.
Reference period
The data on stock positions and socioeconomic characteristics refer to the survey date (the fieldwork was carried out between late May 2021 and February 2022). Income-related data refer to the 2020 calendar year. The data on household consumption refer to a typical month.
Geographical scope
Austria
Sampling
Target population
All households in Austria (irrespective of nationality and citizenship)
Sampling frame
Postal addresses of all households in Austria
Sampling design
Stratified two-stage cluster sample design
- Stratification: NUTS-3 regions divided into 8 classes by municipality size
- Primary sampling unit (PSU): enumeration districts
- Secondary sampling unit (SSU): postal addresses
The gross sample comprised a total of 598 PSUs and 6,300 SSUs in 188 strata.
Survey company
Institut für empirische Sozialforschung GmbH (IFES)
Fieldwork
General information
Fieldwork period: May 2021 to February 2022
Number of interviewers: 47
Method of data collection: Computer-Assisted Personal Interview (CAPI)
Interviewer training
Number of virtual HFCS interviewer training sessions: refresher training sessions: 4; complete training sessions: 7
Duration of virtual HFCS interviewer training: refresher training sessions: 1 day; complete training sessions: 2 days
Pilot survey
Number of pilot interviews: 50
Contact strategy
All households received a personalized letter from the governor of the Oesterreichische Nationalbank (OeNB) and an information leaflet distributed by IFES before they were contacted by the interviewers.
The interviewers had instructions to make up to five contact attempts per household over a period of at least three weeks. At least two of these attempts were to be made in person, at least one attempt was to be made on a weekend and another outside regular working hours (9:00 a.m. to 5:00 p.m.).
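The contact rules above can be expressed as a simple check on the contact history of a household that was classified as not reached. The following is a minimal sketch only: the class and function names are hypothetical, and it assumes that the "attempt outside regular working hours" falls on a weekday, so that it is distinct from the weekend attempt.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ContactAttempt:
    when: datetime    # date and time of the attempt
    in_person: bool   # True for a visit, False for any other contact mode


def follows_contact_rules(attempts):
    """Check the history of a non-contacted household against the stated rules."""
    if len(attempts) != 5:
        return False  # a non-contact requires all five attempts
    times = sorted(a.when for a in attempts)
    long_enough = times[-1] - times[0] >= timedelta(weeks=3)
    in_person = sum(a.in_person for a in attempts) >= 2
    weekend = any(a.when.weekday() >= 5 for a in attempts)
    # Assumption: off-hours means a weekday attempt before 9:00 or from 17:00 on.
    off_hours = any(
        a.when.weekday() < 5 and not (9 <= a.when.hour < 17) for a in attempts
    )
    return long_enough and in_person and weekend and off_hours
```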
Incentives for participation
Participation in the HFCS was voluntary.
Each household that successfully completed an interview received a voucher with a face value of EUR 10.
Documents used during interviews
Showcards, interviewer manual, glossary
Interviewer monitoring
The survey company monitored the conduct of the interviews by randomly checking (by telephone) around 15% of the interviews.
The (anonymized) data from the completed household interviews were forwarded to the OeNB in 19 batches during the field phase, to enable prompt assessment of each interview and interviewer.
Follow-up queries by telephone
Follow-up queries were made by telephone to address outliers and inconsistent responses from some 220 households.
Editing and consistency checks
Number and type of edits
Number of observations: around 1.2 million
Thereof edited observations: around 59,200 (about one third on the basis of verbatim records)
Percentage of edited observations: 4.9%
Consistency checks during the interviews
Number of consistency checks programmed into the questionnaire: around 250
Postinterview consistency checks
Expert analysis of the data from each interview, follow-up phone calls to clarify uncertainties, investigation of outliers and consistency checks of the information collected, technical review of filtering.
Documentation
Flag variables are used to document all edits and imputations.
Imputations
Method: multiple imputation with chained equations (broad conditioning approach)
Number of multiple imputation samples: 5
Number of iterations per imputation sample: 10
Median of the variables with missing values per household: 8.0
Mean of the variables with missing values per household: 17.3
Sample size and response rate
Number of households in the sample (gross sample): 6,300
Number of successfully interviewed households (net sample): 2,293
Number of households that could not be reached (despite five contact attempts): 60
Number of households that refused to participate: 3,342
Number of households that did not participate for other reasons: 130
Number of addresses whose eligibility was unknown: 16
Number of ineligible addresses: 64
Incomplete interviews and interviews discarded after fieldwork: 44
Response rate: 39%
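The figures above can be reconciled with a short calculation. This sketch assumes, without confirmation from the table itself, that the response rate is the number of completed interviews divided by all fielded addresses except those known to be ineligible. The outcome categories sum to 5,949 rather than 6,300; the difference of 351 is consistent with the roughly 350 Viennese addresses that box 1 in chapter 2 reports as not contacted.

```python
gross_sample = 6300

# Outcome counts as listed in the key facts table.
outcomes = {
    "interviewed": 2293,
    "not_reached": 60,
    "refused": 3342,
    "other_nonparticipation": 130,
    "unknown_eligibility": 16,
    "ineligible": 64,
    "incomplete_or_discarded": 44,
}

fielded = sum(outcomes.values())        # addresses with a recorded outcome
not_fielded = gross_sample - fielded    # addresses never worked

# Assumed definition: interviews over fielded addresses not known to be ineligible.
denominator = fielded - outcomes["ineligible"]
response_rate = outcomes["interviewed"] / denominator

print(f"fielded={fielded}, not fielded={not_fielded}, "
      f"response rate={response_rate:.1%}")
```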
Weighting
Final weights computed with nonresponse and poststratification adjustments to design weights – methods:
- Model-based adjustment combined with weighting-class adjustment, based on optimal number of classes (nonresponse adjustment)
- Cell adjustment (poststratification adjustment)
Smallest final weight: 318
Median of final weights: 1,558
Mean of final weights: 1,774
Highest final weight: 10,475
Sum of final weights (target population): 4,066,627
Unequal weighting effect: 1.329
No trimming and no normalization of weights
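The unequal weighting effect is commonly computed with Kish's formula, n·Σw² / (Σw)², which equals one plus the squared coefficient of variation of the weights. Whether exactly this formula underlies the value of 1.329 is an assumption here; the function name is illustrative and the weights in the usage note are toy values, not HFCS data.

```python
import math


def unequal_weighting_effect(weights):
    """Kish's approximation: n * sum(w^2) / (sum(w))^2, i.e. 1 + CV^2."""
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / total ** 2


# Under this definition, the reported value implies a coefficient
# of variation of the final weights of sqrt(1.329 - 1).
implied_cv = math.sqrt(1.329 - 1)
```

With equal weights the function returns 1; for the toy weights `[1, 3]` it returns 1.25.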
Variance estimation
Method: Rescaling bootstrap procedure
Number of replicates: 1,000
Number of pseudo-strata: 134
Computation of replicate weights: adjustments made to design weights to obtain replicate weights are identical to adjustments made to obtain the final weights.
Finite population corrections were applied to all replicate weights.
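To illustrate the idea of a rescaling bootstrap, the sketch below draws one set of replicate factors in the style of the Rao-Wu rescaled bootstrap: within each (pseudo-)stratum with n PSUs, n - 1 PSUs are drawn with replacement, and each PSU's design weight is multiplied by n / (n - 1) times the number of times it was drawn. This is a simplified illustration, not the procedure used for the HFCS in Austria: it omits the finite population corrections and the subsequent nonresponse and poststratification adjustments mentioned above, and all names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def rescaled_bootstrap_factors(psu_strata, rng):
    """Return one replicate's weight factor per PSU.

    psu_strata maps a PSU identifier to its (pseudo-)stratum.
    """
    by_stratum = defaultdict(list)
    for psu, stratum in psu_strata.items():
        by_stratum[stratum].append(psu)

    factors = {}
    for psus in by_stratum.values():
        n_h = len(psus)
        if n_h < 2:
            raise ValueError("each stratum needs at least two PSUs")
        # Draw n_h - 1 PSUs with replacement and count the hits per PSU.
        draws = Counter(rng.choices(psus, k=n_h - 1))
        for psu in psus:
            factors[psu] = n_h / (n_h - 1) * draws[psu]
    return factors


# One replicate for a toy design with two strata.
toy_design = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2}
one_replicate = rescaled_bootstrap_factors(toy_design, random.Random(0))
```

Repeating the draw 1,000 times and multiplying the design weights by the factors yields replicate design weights; the spread of an estimate across replicates then serves as its variance estimate.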
1 Introduction
The Eurosystem Household Finance and Consumption Survey (HFCS) is the most comprehensive compilation of data capturing real assets, financial assets, debt and expenditures of households in one survey, allowing for in-depth scientific analyses of household balance sheets in line with international standards. HFCS data are comparable across all participating countries thanks to the ex ante harmonization of the survey and of the survey methods applied. The geographical scope of the HFCS has been broadened with every wave, to include more than 20 countries in the fourth wave. In Austria, the euro area HFCS was first carried out in 2010/11, a second time in 2014/15, a third time in 2016/17 and most recently in 2021/22. All four waves were conducted by the Oesterreichische Nationalbank (OeNB) in cooperation with the survey company IFES (Institut für empirische Sozialforschung GmbH). The ECB is expected to make data for all participating countries from the fourth HFCS wave available for research purposes around summer 2023.
This publication provides an in-depth view of the data collection process and the methods applied. Based on the methodological documentation from the first three HFCS waves in Austria (Albacete et al., 2012, Albacete et al., 2019), it aims at making the process of data collection as transparent as possible and serves as the basis for correct evaluation of HFCS data. Since the first wave, specific methodological aspects have been discussed in a number of publications. For instance, the information gathered from respondents’ verbatim answers (Lindner and Schürz, 2017) and from the survey of interviewers has been examined in depth and cross-checked with the HFCS data (Albacete and Schürz, 2013b and 2015). Other papers have discussed the relevance of paradata and ways to improve them (Albacete and Schürz, 2014a and 2014b) as well as comparability with other surveys conducted in Austria (Albacete and Schürz, 2013a) and across HFCS countries (Andreasch et al., 2013). Moreover, different approaches to compiling the components of the household balance sheet have been compared (Lindner and Schürz, 2015) and methodological enhancements between the first and the second wave of the HFCS in Austria have been discussed (Lindner et al., 2014).
The chapters are self-contained, each dealing with specific aspects of the HFCS, and can therefore be read independently of each other. Cross-references help the reader recognize links to other chapters or material aspects discussed within them. The sequence of chapters reflects the logical flow of the survey. Closely related topics (e.g. constructing survey weights and estimating the correct variance using HFCS data) are arranged in a way to ensure comprehensibility. To avoid redundancies, only essential details were repeated. The following seven chapters provide a detailed explanation of each step in the survey process, with a further chapter designed as a user guide.
Chapter 2 on the Questionnaire of the HFCS in Austria explains the content of the survey, discussing the individual parts and special features of the questionnaire, the sequence of questions as well as the unit of data collection.
Chapter 3 looks into the role of the interviewers who conducted the face-to-face interviews. Great importance was placed on the qualifications of the interviewers as well as on their professional demeanor and expertise, all of which contribute significantly to the quality of the data obtained. The chapter also covers details on the contact strategy and incentives for households to participate in the survey. Moreover, it outlines the information material and documents that were made available to the households in the HFCS sample.
All raw data collected by the interviewers were reviewed during the field phase, leading to the collection of further data or data edits where necessary. This process is described in detail in chapter 4 on Consistency checks and editing, which lists all changes to the raw data as well as the flags included in the dataset to document such changes.
Chapter 5 on the Multiple imputations applied in the HFCS deals with item nonresponse. For cases in which respondents were unwilling or unable to answer one or several questions, we performed multiple imputations to obtain the missing information. This approach made it possible to correct distortions due to item nonresponse at least to some extent and also to account for the uncertainty attached to imputations, which, like all edits, have been flagged. Users of the HFCS data may apply our imputations or deal with item nonresponse in a different way.
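With multiply imputed data, an estimate is usually computed separately on each imputation sample and the results are then combined using Rubin's rules, so that the total variance reflects both sampling uncertainty and imputation uncertainty. The sketch below shows this combination step under that assumption; the function name and the numbers in the usage note are illustrative and not taken from the HFCS.

```python
from statistics import mean, variance


def combine_implicates(estimates, sampling_variances):
    """Combine per-implicate results with Rubin's rules.

    estimates: point estimate from each imputation sample
    sampling_variances: variance of that estimate within each sample
    """
    m = len(estimates)
    point = mean(estimates)             # combined point estimate
    within = mean(sampling_variances)   # average within-imputation variance
    between = variance(estimates)       # between-imputation variance (m - 1)
    total = within + (1 + 1 / m) * between
    return point, total
```

For five implicates with estimates 10, 12, 11, 13 and 14 and a sampling variance of 1 in each, the combined estimate is 12 and the total variance is 1 + 1.2 × 2.5 = 4.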
Chapter 6 on Sampling provides a detailed description of the survey sample design. The complex survey sample design used for the first wave was developed further for the second wave to ensure a sufficiently representative sample of Austrian households that fits the purpose of the Eurosystem and the OeNB. The third and fourth waves retained the sample design methodology of the second wave.
The final household weights were calculated in several steps on the basis of the sampling design. Chapter 7 outlines the procedure for the Construction of survey weights. The sampling design yields design weights for each household already in the sampling process. It takes several steps to process these weights to account for information obtained during the field phase (such as nonparticipation of households and external information regarding the distribution of certain household characteristics).
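The general sequence of weighting steps can be sketched as follows. This is a generic illustration of design weights, a weighting-class nonresponse adjustment and a cell-based poststratification, not the specific models described in chapter 7; all function names and values are hypothetical.

```python
def design_weight(inclusion_probability):
    """Design weight: inverse of the household's inclusion probability."""
    return 1.0 / inclusion_probability


def nonresponse_adjust(weight, class_response_rate):
    """Inflate a respondent's weight by the response rate of its class."""
    return weight / class_response_rate


def poststratify(weights, cells, cell_totals):
    """Scale weights so that each cell sums to its external population total."""
    sums = {}
    for w, c in zip(weights, cells):
        sums[c] = sums.get(c, 0.0) + w
    return [w * cell_totals[c] / sums[c] for w, c in zip(weights, cells)]


# Toy example: three responding households in two poststratification cells.
adjusted = poststratify([100.0, 300.0, 200.0], ["x", "x", "y"],
                        {"x": 800.0, "y": 300.0})
```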
Another step is required to obtain the correct variance estimation, which is presented in chapter 8 on the Construction of replicate weights for variance estimation.
The User guide in chapter 9 provides basic guidance on the correct use of HFCS data in Stata.
Chapter 10 describes the issues related to the COVID-19 pandemic that were encountered in the fourth wave of the HFCS in Austria. The planned start of the field period coincided with the first lockdown due to the pandemic, which made several changes necessary. All the details are laid out in this chapter. Additionally, boxes in specific chapters clarify special arrangements for the topic covered.
Finally, an Online appendix (www.hfcs.at/en) serves to provide all the essential documentation and background material used in the HFCS (available in German only). The HFCS website also provides information about the publication of HFCS data from all participating countries by the ECB (expected in summer 2023) and any other HFCS news.
2 Questionnaire
2.1 Introduction
Almost twenty years ago, in 2006, the Governing Council of the European Central Bank (ECB) decided to establish the Household Finance and Consumption Network (HFCN) to collect data on wealth, income and consumption from a representative sample of households. The ultimate goal of the network is to ensure ex ante harmonization of the information collected. Harmonization was also broadly achieved with regard to methodology. Nevertheless, some cross-country differences in the technical conditions for the HFCS surveys remain, for example in the respective sampling designs (see chapter 6) or multiple imputation procedures (see chapter 5).
The HFCS is intended to be conducted every three years. In Austria, the survey has now been conducted four times, including the fourth wave documented here (HFCS 2010, HFCS 2014, HFCS 2017 and HFCS 2021). The questionnaire for the fourth HFCS wave was based on the questionnaires used and the experience gained in the first three waves. This chapter presents the Austrian questionnaire, which was designed on the basis of the HFCN blueprint questionnaire (drafted in English) 1 but expanded to include country-specific features (e.g. loan specifics or the national variant of housing association apartments).
This chapter is structured as follows: First we outline the objectives of the HFCS survey in Austria (section 2.2), define the data collection unit (section 2.3), and the reference period (section 2.4). Then we describe the sequence of the questionnaire and highlight some core questions and variables (section 2.5). We subsequently discuss special features of the questionnaire (section 2.6) and list interviewer documents (section 2.7) and participating countries (section 2.8). Section 2.9 refers to further information provided in an online appendix.
2.2 Objectives of the survey
The main goal of the HFCS is to collect microdata on the structure of the assets and liabilities of households in Eurosystem countries as well as some EU countries that have not yet adopted the euro, 2 as these data allow for the analysis of households’ investment and consumption decisions. These survey data can be used, for example, to:
- gain insights into various aspects of the monetary transmission mechanism and of financial stability,
- gain insights into individual household behavior,
- analyze the impact of economic policy measures and macroeconomic shocks, and to
- make cross-country comparisons.
Household-level finance and expenditure data are indispensable for a central bank, as they contribute significantly to improving the analysis of monetary policy and financial stability. Furthermore, the economic developments of the past decade and more have clearly shown that it is not the level of household debt – which can be calculated from macrodata – that matters most in the assessment of stability risks, but the specific burden on different income, occupational and age groups. Hence, decision-making on monetary policy and financial stability issues is also informed by analyses based on the HFCS, which is the most comprehensive household-level survey conducted on this subject in the euro area. Publications attesting to the variety of insights gained from analyses relying on the microdata from the first three HFCS waves include, among others, Albacete and Lindner (2013, 2015, 2017a and 2017b), Albacete et al. (2014), Albacete et al. (2016 and 2018b), Albacete et al. (2018a), Beer and Wagner (2017), Fessler and Schürz (2013, 2017, 2022), Fessler et al. (2014, 2015 and 2017), Lindner and Redak (2017), Wagner (2014), Lindner et al. (2022), Drescher et al. (2020), Bekthiar et al. (2019), Albacete et al. (2022), Kennickell et al. (2021), Lindner and Schürz (2019, 2021), Lindner (2021), Albacete et al. (2020, 2021) and Lindner and Probst (2020).
2.3 Unit of collection
To begin with, a survey requires a clear definition of the target population (see also chapter 6) and the data analysis unit. In the case of the HFCS, households represent the main unit of collection, although some data are also collected from individuals. The household definition is contained in the HFCS common template.
2.3.1 Definition of “household”
For the purpose of the HFCS, a household is defined as an individual living alone or a group of people living together in the same private dwelling who share household expenses and jointly take expenditure decisions. 3 More specifically, household members include:
- people who live in the same household and are related to each other,
- people who share household expenses and live in the same household but are not related,
- people who usually live in the same household (the reference period being the six months before the interview) but are temporarily absent e.g. because of holiday travel, job assignments away from home, hospital stays or boarding school stays, and
- children who are educated away from home but do not constitute a separate household, i.e. remain financially dependent on their family.
The household definition also includes people who have been members of the household for less than six months at the time of the interview (e.g. a new partner or a new child), provided that they share household expenses with the other members or are fully financially dependent on the household (children).
Employees of residents (such as au pairs or nursing staff), short-term visitors and subtenants are considered separate households. In shared apartments, all residents are treated as separate households unless they also share household expenses. This means that a particular address may be used by more than one household as defined for HFCS purposes (e.g. people sharing a residence). In such cases, we selected the household whose member received the letter of invitation to participate in the survey.
The definition also includes households with non-German speaking household members, households living at a place registered as their second home in the centralized residence registry and households not officially registered at a particular residence but living there.
2.3.2 Financially knowledgeable person
All questions pertaining to the household were put to the person that the members of the household deemed to be most familiar with the household’s finances, i.e. liabilities, assets, income and expenditure. This person, referred to as the financially knowledgeable person (FKP), answered all questions relating to the household as a whole (green sections in chart 1) as well as individual questions on behalf of absent household members. As a rule, questions relating to individuals were meant to be answered individually by all household members aged 16 and over.
Where the FKP was a member of the household, as was typically the case, he or she was also the reference person of the given household. In line with the approach taken in the previous waves, the FKP could also be a third person who was not a household member. This could be a family member (e.g. a son or daughter) who oversaw the household’s finances but was no longer a member of the household, or the household’s tax consultant or financial advisor. Only four households in the sample from the fourth wave had an FKP who was not a household member. In these cases, a member of the household was selected as the reference person for the household.
2.4 Data collection period and reference period
In general, all the questions asked in the HFCS, especially those related to stock data, referred to the status quo at the time of the interview, which was conducted during the field phase from late May 2021 to February 2022. In contrast, questions about income (aside from the question on the average monthly net household income) 4 referred to the calendar year 2020 (reference period). More than 80% of the interviews contained in the dataset were completed in 2021; the remainder (just under 20%) were completed at the end of the field phase in 2022.
The reference period for annual income remained 2020 for households interviewed in 2022 as well.
The field phase included lockdown periods due to the COVID-19 pandemic, during which no interviews could be conducted (see box 1 for more information).
Box 1: The COVID-19 pandemic and the HFCS questionnaire
The HFCS questionnaire used in Austria is traditionally based on the internationally agreed core questionnaire and has repeatedly been adapted on the basis of experience gained in previous waves. We finished work on the questionnaire, its translation into German and associated programming before the COVID-19 pandemic hit Austria. Changes made necessary by the extraordinary circumstances created by the pandemic, such as the reference year for income or an additional set of COVID-19-specific questions, were implemented later. This additional set of questions, which covered the consequences of COVID-19 for households’ work, income, finances and consumption, was integrated into the consumption section. In contrast to some other countries taking part in the HFCS, we kept the computer-assisted personal interview (CAPI) technique for this survey wave because, given the complexity of the survey, it is the best available method.
The field period was initially planned to start in March 2020, but the pandemic made it necessary to postpone the start by more than one year. The Austrian field period of wave four eventually started in May 2021 and lasted until February 2022. Despite this extended period, about 350 randomly selected Viennese addresses of the initial gross sample were not contacted – due to the low number of interviewers – and eventually left out of the sample (see also COVID-19-related information in the calculation of weights in the HFCS in chapter 7). Furthermore, because of (regional) restrictions and warnings, the field period had to be interrupted (from November 22, 2021, to December 12, 2021) and restarted. No interviews were conducted during lockdowns.
Health risks made interviewing particularly difficult during this survey wave. However, once a household had been persuaded to participate, and provided that safety measures (such as wearing FFP2 masks) were complied with, interviewing was comparable to previous waves (for a comparison of item nonresponse with previous waves, see also the COVID-19-related information in chapter 10).
2.5 Interview structure and content
2.5.1 Questionnaire structure
The questionnaire consists of three parts: the preinterview, the main interview (divided into household and individual questions) and the postinterview. This structure was chosen to make the survey as user-friendly as possible and to keep the duration of the interviews short. Chart 1 shows the sequence of interview questions in the HFCS.
Preinterview
Before the actual interview, households were informed about the content and structure of the survey. 5 If a household was willing to participate, the interviewer first recorded the household matrix and identified the FKP. Recording the household matrix data involved determining the size of the household as defined for HFCS purposes, listing the individual household members and identifying the main respondent, i.e. the FKP or, if the FKP was not a household member, a reference person. It was also at this stage that key identifying data were collected for all household members, namely gender, age and relation to the main respondent. Finally, all these basic household data were listed in a table to facilitate verification and, if necessary, revision. If the FKP was present, the interview proceeded immediately – or, if the household so wished, at a later date – with the questions about general household characteristics.
General characteristics
In this section of the questionnaire, more detailed sociodemographic characteristics were collected for all household members: country of birth as well as length of stay in Austria for persons not born in Austria. For household members aged 16 and over, information about the level of education (including that of their parents) and marital status was recorded as well. The following sections (on consumption, real assets and their financing, other liabilities and credit constraints, investments in self-employment businesses and financial assets, inheritances and gifts) were used to collect information about the household as a whole.
Consumption
In this section, respondents were asked about the household’s consumption and saving behavior and some components of household income. The questions on consumption aimed at collecting information about households’ typical average monthly expenditure on food, utility costs, total consumption spending and transfers to people outside the household. Respondents were asked about the monthly net household income to verify whether it was high enough to finance the household’s expenditure; if not, further information was collected on how the household financed any expenses in excess of this income.
Expenses for holidays and travel were recorded for the last calendar year. The questions on income and consumption led on to questions about saving behavior and attitudes toward saving.
Additionally, this section contained a set of questions concerning the impact of the COVID-19 pandemic on the household’s financial situation (see box 1 and chapter 10 for more details).
Real assets and their financing
This section contained questions about the household’s housing situation and most other real assets (excluding investments in self-employment businesses, which are addressed later) as well as their financing. The first set of questions established the location and size of a household’s main residence and then focused on the tenure status of the household’s main residence (variable (A)HB0300), grouping households into (partial) owners, tenants or free users of their homes.
Households owning their main residence were asked to indicate when and how they had acquired it as well as the value of the property at the time of the interview and at the time they first acquired ownership. Further, homeowners were asked whether the property was being used as collateral for a loan. In the case of a collateralized loan, the following data were collected separately for a maximum of three loans: purpose of the loan, initial amount, duration of the loan, outstanding principal, interest rate and type of interest, repayment rates and other characteristics. If a household had taken out more than three mortgages, the FKP was asked to provide summary information on the outstanding amounts and repayment rates of these additional (more than three) mortgages.
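The loop structure described above (detailed records for up to three loans plus a summary figure for any further loans) can be represented as in the following sketch. The class and field names are hypothetical and do not correspond to HFCS variable names; the example only shows how the total outstanding amount would be assembled from the two kinds of answers.

```python
from dataclasses import dataclass, field
from typing import List

MAX_LOOP_ITEMS = 3  # detailed questions are asked for at most three loans


@dataclass
class Loan:
    outstanding_principal: float


@dataclass
class MortgageSection:
    detailed_loans: List[Loan] = field(default_factory=list)
    # Summary answer covering all loans beyond the third one.
    remaining_outstanding: float = 0.0

    def __post_init__(self):
        if len(self.detailed_loans) > MAX_LOOP_ITEMS:
            raise ValueError("only three loans are recorded in detail")
        if self.remaining_outstanding and len(self.detailed_loans) < MAX_LOOP_ITEMS:
            raise ValueError("summary applies only beyond three loans")

    def total_outstanding(self):
        looped = sum(loan.outstanding_principal for loan in self.detailed_loans)
        return looped + self.remaining_outstanding
```

The same pattern applies to the other loop questions in the questionnaire, with a maximum of three or five items depending on the section.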
Households living in a rented home were asked about their rental cost (including and excluding running costs). Tenants of housing association apartments were asked to indicate the deposit made to the housing association. Any debt tenant households had incurred to finance such deposits was recorded under unsecured loans (see “Other liabilities and credit constraints”).
Households that enjoyed free use of their main residence did not have to answer any further questions in the first half of this section.
The next set of questions for all three groups addressed additional real estate holdings, including holdings abroad. Specifically, loop questions, i.e. sets of questions with multiple iterations, 6 were used to identify basic information (i.e. type, size, value at the time of ownership transfer and at the time of the interview, and use) for up to three other real estate properties, as well as a summary question serving to establish the total value for any further properties. As was done with mortgages on the main residence, mortgages taken out on these properties were queried using loop questions for a maximum of three loans, and summary information was collected for further mortgages taken out on additional properties. In contrast to relatively recent developments with regard to the ECB blueprint questionnaire, the implementation of this part on other real estate remained unchanged in Austria from waves two and three to wave four.
Finally, households were asked to indicate the value of any cars or other vehicles as well as of any valuables (e.g. works of art, antiques) they owned. To conclude this section, respondents were asked whether they had bought a car or any other vehicles in the previous 12 months and if so, at what price.
Other liabilities and credit constraints
This category covered all other credit liabilities: leasing contracts, outstanding balances on current accounts and credit cards, private loans and noncollateralized loans.
For the first three types of liabilities, interviewers asked whether a household held such liabilities and how much was outstanding (for leasing contracts: the amount of lease payments per month). For outstanding balances on credit cards, the question put to households was: after paying the most recent monthly bill, was there any balance outstanding on your credit cards?
Respondents were asked separately whether they had private and noncollateralized loans. Private loans refer to loans from friends or family, and noncollateralized loans to any further loans, such as consumer loans or employer loans. Where a household held such loans, the same information as for collateralized loans was gathered in the form of loop questions for the first three loans of each of the two types. If a household had more than three outstanding loans of either type, the outstanding principal and the total monthly repayment rates for the additional loans were asked for in a summary question at the end.
This section of the questionnaire also addressed any loan requests that a household had made more recently and potential repayment difficulties of loans and other bills. Finally, respondents were asked about their attitudes to loans, their planning preferences and their level of risk aversion.
Private businesses and financial assets
In the sections outlined in the following, the HFCS survey documented all wealth components of the household balance sheet beyond the real assets already covered. Interviewers started out by asking whether a household partly or entirely owned any nontraded self-employment businesses in which at least one household member played an active role, and, if so, recorded separate information for up to three such businesses (respective industry, legal form, number of employees and current value). If a household was invested and involved in any additional private businesses (other than the three business operations already recorded), the value of those holdings was asked at the end in a summary question. Another two questions related to ownership of any nontraded shares of businesses in which the household played no active role and to the value of these holdings.
The next questions focused on assets held in sight accounts, savings accounts, savings plans with building and loan associations, life insurance funds, mutual funds, bonds, listed stock, as well as assets in private foundations and managed accounts. For each of these items, FKPs were asked to indicate whether their household held such assets (yes/no question) and, if the answer was yes, what the total value of these assets was. For life insurance contracts, more detailed questions were asked. These included the date of conclusion, the type and duration of contract (benefits to be provided at the death of the policy holder or at a given date, or a combination thereof) and the frequency and amount of payments into the life insurance plan. This information allows for projections of the amounts held in such funds. In addition, respondents were asked about money owed to the household as well as other financial assets. Interviewers also asked for a household’s estimated total net wealth as well as for the distribution of this wealth across household members. This estimate was used to assess the plausibility 7 of the information provided, i.e. the sum of itemized figures was cross-checked with the total.
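The cross-check between the itemized figures and the household's own estimate of total net wealth can be sketched as follows. The item names and the 25% tolerance are purely illustrative assumptions; the source does not state what deviation was considered implausible.

```python
def net_wealth_gap(assets, liabilities, self_estimate):
    """Return itemized net wealth and its deviation from the self-estimate."""
    itemized = sum(assets.values()) - sum(liabilities.values())
    return itemized, itemized - self_estimate


def needs_review(assets, liabilities, self_estimate, tolerance=0.25):
    """Flag a household if the relative deviation exceeds the tolerance."""
    itemized, gap = net_wealth_gap(assets, liabilities, self_estimate)
    # Scale by the larger of the two figures; the floor of 1.0 avoids
    # division by zero when both are zero.
    scale = max(abs(itemized), abs(self_estimate), 1.0)
    return abs(gap) / scale > tolerance


# Hypothetical household: itemized net wealth of 200,000.
assets = {"main_residence": 300000.0, "deposits": 20000.0}
liabilities = {"mortgage": 120000.0}
```

In this example, a self-estimate of 210,000 would pass the check, whereas a self-estimate of 50,000 would be flagged for review.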
Inheritances and gifts
The next section of the questionnaire focused on the transfer of asset ownership in the form of inheritances and gifts. In a loop, interviewers asked for up to five inheritances or gifts, 8 recording information about the value at the time of transfer, the type and source of inheritance or gift, and when the transfer was made. The inheritances or gifts were listed in descending order, starting with the most important one for a household’s current wealth. If there were more than five inheritances and gifts, their total value was queried in the questionnaire. However, in this survey for Austria only one household indicated having received more than five.
Up to this point, almost all questions (of course, excluding those about the sociodemographic makeup of the household) focused on the household as a whole. In contrast, the questions in the three subsequent sections of the HFCS questionnaire, with a few exceptions, related to individual household members aged 16 years and above rather than to the household as a whole.
Employment
Household members aged 16 and above were asked to provide information on their employment status. The first set of questions focused on people active in the labor market. Pensioners, homemakers, schoolchildren, students and unemployed people were only asked about their expected retirement age and number of years spent in employment so far, before moving on directly to the second set of questions. Employed individuals answered questions about their occupation (ISCO code 9 ), the number of working hours (with seasonal fluctuations), the company’s main economic activity (ÖNACE code 10 ), the amount of time they had worked both for their current employer and overall in their working life, and about their expected retirement age. In the second part, all individuals aged 16 and above were asked to answer questions about their job history and their personal background. 11 The information provided in this section of the questionnaire is especially relevant in combination with that of the next section, which deals with income.
Income
Information on income was recorded by types of income. Respondents first indicated whether they received a certain type of income, and if so, what the annual amount was (information on annual income usually being readily available, e.g. on the income tax declaration). All types of income were recorded for the calendar year 2020 regardless of whether the interviews were completed in 2021 or 2022. In theory, respondents interviewed during 2021 might not yet have known their income for 2020, but for the interviews contained in this dataset this did not present a real problem. The option to add verbatim comments to responses made it possible to ensure during the editing phase that the information provided was indeed the information that had been queried (see chapter 4).
The types of income covered were employee and self-employment income, income from the state pension system and from private and company pension plans as well as income from unemployment benefits. For the first four types of income, respondents could provide either gross or net figures. (In the editing stage (see chapter 4) all net figures were converted to gross income figures using the Austrian Finance Ministry’s gross-to-net calculator.)
Pensioners who indicated that they had not received a pension in 2020 were asked about potential pension income in 2021. This information could be used to estimate 2020 income.
In addition to these individual questions, the household-level part of the interview with FKPs also established whether a household received income from regular (public or private) social transfers, from the rental or leasing of real estate or from financial assets or private businesses. If respondents were unable to provide the gross income from financial investments, the relevant net income was also accepted. Finally, interviewers asked about other sources of income and about expected income growth.
Pensions
The main pension variables collected in the HFCS questionnaire included respondents’ eligibility for future income from the state pension system and the number of state pensions to which they are entitled. Likewise, respondents were asked to indicate the number of contribution years and the account balances of, or their total contributions to, company or private pension plans.
Assessments 12
In this section, FKPs were invited to provide their assessment of their households’ position in the national wealth distribution as well as their view on the functions of wealth.
Postinterview
After the interview, respondents were encouraged to comment on questions they had found particularly hard to answer, items not covered by the questionnaire they would have deemed relevant, etc. Any comments were recorded as verbatim text. In addition, a so-called paradata section collected background information from interviewers about the interviews (see section 2.6.4.2).
2.5.2 Field phase
2.5.2.1 CAPI implementation (questionnaire programming)
The questionnaire was programmed using Warp-It developed by Solve-x software on the basis of a Word template, the German original of which is available in the online appendix. In addition to filtering, the questionnaire also prompts internal consistency checks (see chapter 4) that flag possible data entry errors during the interview. The use of the Computer-Assisted Personal Interviewing (CAPI) technique combines the advantages of a personal interview with those of real-time digital recording and data cross-checking. In addition, it allows for the implementation of complex filtering techniques, producing tailor-made questionnaires for each household.
2.5.2.2 CAPI test
After the first programming phase, the questionnaire was tested by members of the OeNB’s HFCS team and in a pilot survey of 50 households. This test took place before the COVID-19 pandemic and could be conducted in person.
2.5.2.3 CAPI problems
This wave of the HFCS benefited from the experiences of the three prior waves. As a result, errors could be largely eliminated. Below is a list of programming issues that led to reprogramming during the field phase of the fourth wave or editing thereafter:
- Intra-household asset distribution (ahd1940x, ahd1950x): At the beginning of the field period, households with a single adult and children below the age of 16 were not asked these questions. This filtering error was corrected after very few households had been affected. The missing observations for these early households were handled in the imputation model.
- Summary question for uncollateralized loans ((a)hc1200): This question should have been asked of households with more than three uncollateralized loans of this type. The mistake was corrected after one household had been affected. This single missing observation was multiply imputed.
- Interval recording of the value of business participation (hd0801i): After this interval recording, the currency was also recorded, although predefined intervals are only collected as euro values. Only one household was affected before the correction, and the duplicate information was deleted from the dataset.
- Value of bonds (hd1420): If all responses on the type of bond (hd1410x) were “Don’t know”, the value of the bonds was erroneously not collected. This mistake was corrected after one household had taken this path through the questionnaire. The missing information was collected in a follow-up enquiry by phone.
- Estimates of gains/losses in income/consumption during the time of the COVID-19 pandemic (ahv0210, ahv0610): For these two questions, the wrong list of predefined ranges for euro questions was used. For the affected households, the broader interval was used in the multiple imputation model.
- Employment status (ape0200x): At the beginning of the field period, persons who should have reached this question (based on apa0100b) did not reach it. The filtering error was corrected after only two households had been affected. The information for these households was collected in a follow-up enquiry by phone.
- NACE classification for retired or other inactive persons (pe0470): Due to a filtering error this variable was not collected and is left empty.
2.6 Special features
2.6.1 Loops
Various aspects of a household that are especially relevant for the HFCS were asked using loops, i.e. sets of identical questions to collect information on individual items applicable to a household (e.g. loans). Chart 2 shows the schematic cycle of these loops.
Loops were used for the following items:
- mortgage loans using the household’s main residence as collateral (max. number of iterations: 3),
- real estate holdings other than the household’s main residence (max. number of iterations: 3),
- mortgage loans that used these other properties as collateral (max. number of iterations: 5),
- unsecured loans from family and friends (max. number of iterations: 3),
- other unsecured loans (max. number of iterations: 3),
- investments in self-employment businesses (other than holdings of listed shares) (max. number of iterations: 3),
- life insurance contracts (no max. number of iterations),
- ownership transfers through inheritances or gifts (max. number of iterations: 5).
First, respondents indicated whether a specific item applied to their household, and if so, the number of such items held by the household. The details were then recorded in a loop for each of these items to the maximum number of iterations. For instance, if a household held two other unsecured loans, the respondent was asked to first provide details of the loan with the higher outstanding balance and then details of the second loan. If a household had received multiple inheritances and gifts, the procedure was repeated up to five times. When there were more items than the maximum number of iterations permitted in the loop, interviewers recorded a summary of the information. 13 For life insurance contracts, the questionnaire did not set a maximum number of iterations in order to be able to record all contracts individually.
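The loop logic described above can be sketched as follows. The dictionary keys and the function name are hypothetical labels, not HFCS variable names. Ordering by amount mirrors the unsecured-loan example in the text (for inheritances and gifts, the questionnaire ordered items by their importance for current wealth instead), and representing the summary as a single total is an assumption.

```python
# Maximum number of loop iterations per item type, as listed above.
# None means that no maximum was set (life insurance contracts).
MAX_ITERATIONS = {
    "main_residence_mortgages": 3,
    "other_properties": 3,
    "other_property_mortgages": 5,
    "loans_family_friends": 3,
    "other_unsecured_loans": 3,
    "self_employment_businesses": 3,
    "life_insurance_contracts": None,
    "inheritances_gifts": 5,
}


def run_loop(item_type, amounts):
    """Split items into individually recorded ones and a summary remainder.

    Returns (detailed, summary_total); summary_total is None when every
    item fits into the loop.
    """
    limit = MAX_ITERATIONS[item_type]
    ordered = sorted(amounts, reverse=True)  # largest item first
    if limit is None or len(ordered) <= limit:
        return ordered, None
    return ordered[:limit], sum(ordered[limit:])


# A hypothetical household with four other unsecured loans: three are
# recorded in detail, the fourth only enters the summary question.
detailed, summary_total = run_loop("other_unsecured_loans", [1000, 5000, 250, 3000])
```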
To make the survey as user-friendly as possible, it was made possible for interviewers to exit a loop at any time. In this case, information was collected in summary questions. Interviewers were instructed not to overuse the option of exiting loops; this feature was only meant to prevent respondents from breaking off the interview early if they were unwilling to answer a given question for more than one item.
The loops for real estate holdings other than the household’s main residence and mortgage loans collateralized with these other properties are structured differently in the ECB template questionnaire. Whereas the Austrian questionnaire used two separate loops to establish this information (as in wave one and wave two), the questions on loans collateralized with further property were nested in the loop about further property in the ECB blueprint questionnaire. Additionally, the HFCS in Austria was designed to record in this section which property was being used as collateral for a particular loan. Thus, all the information required by the ECB questionnaire was contained in the Austrian questionnaire and the variables were adapted and renamed as required for the ECB version after the data imputation phase.
2.6.2 Euro loops
All questions involving monetary amounts were asked in a loop to avoid data entry errors and to obtain a range containing the approximate amount if respondents were unable or unwilling to state specific amounts. This section describes the structure of euro loops (see the online appendix on “Euro loops” for a schematic overview).
In the first step, respondents were asked to provide specific amounts (“How much…?”) in any given currency. After the information had been recorded, respondents were asked to confirm the amount and the currency. (“You said the amount is … [currency]. Is that correct?”)
If no specific amount was provided, respondents were asked to indicate a range (“Could you provide a range for the amount?”). Rather than having an upper and a lower bound, the range could also be limited at one end and open at the other end (e.g. “no more than EUR ...” or “at least ATS …”). If respondents stated a range, the interviewer continued again by asking respondents to first indicate the currency and then confirm the range and the currency.
If respondents were unable (“Don’t know”) or unwilling (“No answer”) to indicate an individual range themselves, they were asked to choose a range from a list. Depending on the question at hand, interviewers used one of three lists of ranges (see table 1).
Code | List A (EUR) | Code | List B (EUR) | Code | List C (EUR) |
---|---|---|---|---|---|
A | 1 – below 101 | A | 1 – below 10,001 | A | 1 – below 1,001 |
B | 101 – below 201 | B | 10,001 – below 50,001 | B | 1,001 – below 2,501 |
C | 201 – below 301 | C | 50,001 – below 75,001 | C | 2,501 – below 5,001 |
D | 301 – below 401 | D | 75,001 – below 100,001 | D | 5,001 – below 7,501 |
E | 401 – below 501 | E | 100,001 – below 150,001 | E | 7,501 – below 10,001 |
F | 501 – below 751 | F | 150,001 – below 200,001 | F | 10,001 – below 15,001 |
G | 751 – below 1,001 | G | 200,001 – below 300,001 | G | 15,001 – below 20,001 |
H | 1,001 – below 1,501 | H | 300,001 – below 400,001 | H | 20,001 – below 25,001 |
I | 1,501 – below 2,001 | I | 400,001 – below 500,001 | I | 25,001 – below 30,001 |
J | 2,001 – below 3,001 | J | 500,001 – below 750,001 | J | 30,001 – below 35,001 |
K | 3,001 – below 5,001 | K | 750,001 – below 1 million | K | 35,001 – below 40,001 |
L | 5,001 – below 7,501 | L | more than 1 million – 3 million | L | 40,001 – below 50,001 |
M | 7,501 – below 10,001 | M | more than 3 million – 5 million | M | 50,001 – below 75,001 |
N | 10,001 – below 25,001 | N | more than 5 million – 10 million | N | 75,001 – below 100,001 |
O | 25,001 – below 50,001 | O | more than 10 million | O | 100,001 – below 200,001 |
P | more than 50,001 | | | P | 200,001 – below 300,001 |
 | | | | Q | 300,001 – below 500,001 |
 | | | | R | 500,001 – 1 million |
 | | | | S | more than 1 million |
Source: HFCS Austria 2021, OeNB.
The three lists of predefined ranges (A to C) are based on the (unweighted) empirical distribution of the answers to numerous questions in the first wave of the HFCS in Austria and were used for survey waves two to four. This evidence showed that, for specific questions, the main part of the distribution called for smaller and hence more specific ranges than the remaining parts of the distribution. List A was used for questions about consumption expenditure and loan repayments. List B was used for questions related to properties and investment in self-employment businesses, and list C was typically used for outstanding loans and incomes. Questions about financial assets were aligned either with list A or list C, depending on the distribution of assets as observed in the first wave of the survey. 14 The predefined ranges referred to amounts in euro only. Interviewers moved on to the confirmation question as soon as the respondent had chosen a range. The lists of predefined ranges were presented in the form of showcards for all questions involving monetary amounts (see also section 3.5.4).
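To illustrate how a predefined range list partitions euro amounts, the following minimal sketch maps an amount to its letter code in list A. The lower bounds are taken from table 1; the function name is hypothetical, and treating exactly EUR 50,001 as belonging to range P is an assumption, since the table labels that range "more than 50,001".

```python
from bisect import bisect_right
from string import ascii_uppercase

# Lower bounds of ranges A to P in list A (EUR), taken from table 1.
LIST_A_LOWER_BOUNDS = [
    1, 101, 201, 301, 401, 501, 751, 1001,
    1501, 2001, 3001, 5001, 7501, 10001, 25001, 50001,
]


def list_a_code(amount_eur):
    """Return the letter code of the list A range that contains the amount."""
    if amount_eur < LIST_A_LOWER_BOUNDS[0]:
        raise ValueError("list A starts at EUR 1")
    # bisect_right counts how many lower bounds are <= the amount.
    index = bisect_right(LIST_A_LOWER_BOUNDS, amount_eur) - 1
    return ascii_uppercase[index]


code_small = list_a_code(100)      # falls into "1 – below 101"
code_mid = list_a_code(750)        # falls into "501 – below 751"
code_large = list_a_code(60_000)   # falls into the open-ended top range
```

Lists B and C could be handled the same way by swapping in their lower bounds.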
Only if a respondent also refused to choose from the list of predefined ranges was the status of the question recorded as not answered (“Don’t know” or “No answer”). The information recorded with the ranges was especially important for multiple imputation (see chapter 5).
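The fallback sequence of a euro loop (specific amount, then a self-reported range, then a range from a predefined list, and only then "not answered") can be summarized in a short sketch. The class, field and function names are hypothetical and do not correspond to variables in the HFCS dataset; the confirmation question that followed each step is omitted here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class EuroAnswer:
    kind: str                          # "amount", "own_range", "list_range" or "not_answered"
    amount: Optional[float] = None
    lower: Optional[float] = None      # None means the range is open at the lower end
    upper: Optional[float] = None      # None means the range is open at the upper end
    currency: Optional[str] = None
    list_code: Optional[str] = None    # letter code from list A, B or C
    reason: Optional[str] = None       # "Don't know" or "No answer"


def record_euro_answer(amount=None, own_range: Optional[Tuple] = None,
                       list_code=None, currency="EUR", reason="Don't know"):
    """Walk through the euro loop steps in the order described in the text."""
    if amount is not None:             # step 1: specific amount in any currency
        return EuroAnswer("amount", amount=amount, currency=currency)
    if own_range is not None:          # step 2: range stated by the respondent
        lower, upper = own_range
        return EuroAnswer("own_range", lower=lower, upper=upper, currency=currency)
    if list_code is not None:          # step 3: predefined ranges, in euro only
        return EuroAnswer("list_range", list_code=list_code, currency="EUR")
    return EuroAnswer("not_answered", reason=reason)  # step 4: item nonresponse


# A respondent who can only say "at least ATS 50,000" (illustrative figures).
answer = record_euro_answer(own_range=(50_000, None), currency="ATS")
```

Keeping the range bounds rather than collapsing them into a missing value reflects the point made above that range information was especially important for multiple imputation.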
2.6.3 Recording farming households
Recording the (real) assets of farming households was found to be a particular challenge for respondents in the first waves, especially when it came to breaking down the assets into the household main residence and business assets. The business assets of farmers have been recorded in the loop on investment in self-employment businesses since the first wave. In order to elicit more precise answers, the second-wave questionnaire introduced a number of additional questions for farming households as well as some additional guidance. These additions and enhancements were retained in the questionnaire for the third and fourth wave. The procedure can be summarized as follows:
- Before the actual interview started, respondents were classified by interviewers as running an “Agricultural business” or as running “No agricultural business.” The classification was straightforward in all but a few cases. But even in the few cases where a respondent was incorrectly classified, the structure of the questionnaire ensured that all essential information was still obtained.
- Specifically, the following extra information was recorded for households classified as farmers:
- Was it possible to separate housing assets (i.e. the household main residence) from business assets? (in the main residence chapter of the questionnaire)
- If not, what percentage of the recorded value did respondents allocate to their main residence? (in the main residence chapter of the questionnaire)
- Does the value recorded for investment in a self-employment business include the main residence recorded? (in the investments into self-employment businesses chapter of the questionnaire)
- Moreover, for questions relating to the value of their main residence, the yes/no question on properties other than the main residence, as well as for the question about investment in a self-employment business and its value, farming households received detailed guidance as to which components of their household balance sheet were to be recorded under which position.
In addition, all interviewers were specifically trained to handle such cases (see also section 3.3). The additional information thus collected proved to be particularly relevant for multiple imputation (see also section 5.4).
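As a worked illustration of the percentage question for farming households, the sketch below splits a combined value into a main-residence part and a business part. The function name and the figures are hypothetical, and the report does not state that the split was computed in this way; the sketch only shows the arithmetic that such a percentage implies.

```python
def split_farm_value(recorded_value, residence_share_pct):
    """Split a combined value into (main residence, business assets)."""
    if not 0 <= residence_share_pct <= 100:
        raise ValueError("share must be a percentage between 0 and 100")
    residence = recorded_value * residence_share_pct / 100
    return residence, recorded_value - residence


# A combined value of EUR 800,000, of which 25% is attributed to the residence.
residence_value, business_value = split_farm_value(800_000, 25)
```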
2.6.4 Other information recorded
2.6.4.1 Contact attempts
Every household in the sample population had to have been contacted unsuccessfully on at least five separate occasions before it could be classified as a unit nonresponse (see also the contact rules in section 3.4). 15
The contact attempts were recorded in the dataset, 16 thus providing additional information. The exact time (year, month, day, hour and minute), mode and outcome of every single contact attempt were documented, as was the total number of contact attempts. Interviewers were instructed to write this information down on paper first and record it in the electronic questionnaire only following the completion of the workload (interview, refusal, etc.) on a particular household.
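The contact rule can be expressed as a simple check on the recorded attempts. The sketch below assumes that each attempt is stored as a (timestamp, mode, successful) tuple and that the mode label "in_person" exists; both are illustrative choices. It also assumes that the three-week period mentioned in footnote 15 is measured from the first to the last unsuccessful attempt, which is one possible reading of the rule.

```python
from datetime import datetime, timedelta


def may_classify_as_nonresponse(attempts):
    """Check the minimum contact effort before coding a unit nonresponse.

    attempts: list of (timestamp, mode, successful) tuples.
    """
    unsuccessful = [a for a in attempts if not a[2]]
    in_person = [a for a in unsuccessful if a[1] == "in_person"]
    # At least five unsuccessful attempts, at least two of them in person.
    if len(unsuccessful) < 5 or len(in_person) < 2:
        return False
    timestamps = [a[0] for a in unsuccessful]
    # Assumed reading: attempts must span at least three weeks.
    return max(timestamps) - min(timestamps) >= timedelta(weeks=3)


first = datetime(2021, 6, 1, 10, 0)  # hypothetical first contact attempt
attempts = [
    (first, "in_person", False),
    (first + timedelta(days=5), "phone", False),
    (first + timedelta(days=10), "in_person", False),
    (first + timedelta(days=15), "phone", False),
    (first + timedelta(days=22), "phone", False),
]
eligible = may_classify_as_nonresponse(attempts)
```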
2.6.4.2 Paradata
Two kinds of so-called paradata were collected: While the first type of paradata was collected for all households – including those that ultimately did not participate in the survey – the second type covered additional information on the households that were interviewed.
The first section covered all information that could be obtained without actually entering a household’s residence or completing an interview: the interviewer’s assessment of the building and construction type, the geographical location (urban or rural area), the condition of the building, the residential area and special security measures.
If an interview took place, interviewers also collected the following additional paradata: the condition of the dwelling’s interior, the interview language (in Austria all interviews were conducted in German), the interviewer’s assessment of the accuracy of the information provided, the village or town in which the interview was conducted, the number of people present during the interview and the interest they showed in the interview, the frequency with which respondents consulted documentation to answer questions, and the type of documentation used. In addition, interviewers had to submit written comments about the interview for every single household. These comments, to be made on five questions covering the interview as a whole, proved very helpful at different stages of the project.
The first section of paradata was recorded in the sample register file, which is not part of the user database due to anonymization requirements. It was used mainly to calculate nonresponse weights. 17 The second section (excluding interviewer comments) was recorded in variables HR0100 to HR1600 in the household data file, which is part of the HFCS dataset.
2.7 Interviewer documents
Among other things, interviewers had access to the following documentation to help them prepare for an interview and as a reference point during the interview (also available in the online appendix): 18
- the showcards which were used during interviews to provide respondents with a list of response options for several questions in the questionnaire,
- a glossary, which contained simple definitions of the terminology used in the questionnaire, and
- a copy of the study entitled “Vermögen der privaten Haushalte in Österreich – Gemeinsamkeiten und Unterschiede” (Fessler and Schürz, 2019) to illustrate how the data obtained in the third wave of the HFCS in Austria was used for analysis.
2.8 Participating countries
The fourth wave of the Eurosystem HFCS was conducted in the following euro area countries: Belgium, 19 Croatia, Germany, 20 Estonia, 21 Ireland, 22 Greece, Spain, 23 France, 24 Italy, 25 Cyprus, Luxembourg, 26 Latvia, Lithuania, Malta, 27 Austria, the Netherlands, Portugal, Slovenia, 28 Slovakia, and Finland. 29 Additionally, the fourth wave of the HFCS is expected to include the Czech Republic and Hungary.
The survey was prepared by the Household Finance and Consumption Network (HFCN) launched by the ECB. The aim was to achieve ex ante harmonization at as many levels of the survey as possible. In doing so, it was necessary to take features specific to individual countries into account, leading to discrepancies and to additional questions in some cases. In addition to the core output variables, the Austrian survey also collected data that are specific to Austrian households (e.g. information on foreign currency loans). Moreover, the answer options for some questions were categorized in greater detail in the national datasets. A case in point is the question about respondents’ marital status, which came with six answer options in the national dataset but only five answer options in the international dataset. The OeNB is planning to provide the country-specific details as additional information to the datasets that the ECB is expected to publish in summer 2023.
2.9 Online appendix
The following PDF documents are available in German for download from the Austrian HFCS website (www.hfcs.at/en) as an appendix to this chapter:
- the questionnaire,
- the euro loops,
- the paradata questions,
- the variable lists,
- the showcards, and
- the glossary.
1 For further details about the HFCN network and the HFCS survey (including the euro area blueprint questionnaire), see www.ecb.europa.eu/pub/economic-research/research-networks/html/researcher_hfcn.en.html (accessed on June 9, 2023).
2 Some euro area countries (such as Ireland and Estonia) did not participate in the first wave (HFCS 2010). In the fourth wave, some EU countries (Hungary and Czech Republic) that have not adopted the euro also participated in the survey.
3 The HFCS household definition is discussed in greater detail in chapter 6. However, this section outlines the definition as is required for this chapter to be able to stand alone.
4 This is a noncore variable specific to Austria that is not included in the international HFCS dataset.
5 See chapter 3 for a detailed description of the contact strategy.
6 See section 2.6.2 on the structure and navigation of loops.
7 For further details on consistency checks, see chapter 4.
8 The international core dataset only contains up to three inheritances/gifts. In other words, it does not contain any additional inheritances/gifts. Over one-quarter of households reported intergenerational transfers and gifts. Of these, fewer than 1% reported more than three.
9 ISCO: International Standard Classification of Occupations, see www.ilo.org/public/english/bureau/stat/isco/isco08/index.htm (accessed on June 9, 2023).
10 ÖNACE: ÖNACE is the national version of the Statistical Classification of Economic Activities in the European Community (NACE – Nomenclature statistique des activités économiques dans la Communauté européenne); see https://www.statistik.at/datenbanken/klassifikationsdatenbank (accessed on June 9, 2023).
11 Not all of the questions from the second part of this chapter of the questionnaire are part of the international core dataset.
12 This is a noncore variable specific to Austria that is not included in the international HFCS dataset.
13 There were very few cases where respondents reported a higher number of items than the maximum number of iterations in the loop.
14 See the questionnaire in the online appendix at https://hfcs.at/en/publikationen/dokumentation.html for a detailed overview of which ranges (list A, B or C) were used for which questions.
15 Interviewers had to make at least two contact attempts in person over a period of at least three weeks.
16 These variables were not included in the user database due to anonymization requirements.
17 These weights are used for the correction of the nonrandom participation of households in a survey and are needed to construct the final household weights (see chapter 7).
18 For a detailed description of the documents, see chapter 3.
19 Information on the survey in Belgium is available on https://www.nbb.be/en/publications-and-research/study-financial-behavior-households-household-finance-and-consumption (accessed on June 9, 2023).
20 Information on the survey in Germany is available at https://www.bundesbank.de/en/bundesbank/research/panel-on-household-finances (accessed on June 9, 2023).
21 Information on the survey in Estonia can be found on https://www.eestipank.ee/en/statistics/research-financial-behaviour-and-consumption-habits-estonian-households (accessed on June 9, 2023).
22 Information on the survey in Ireland is available at https://www.cso.ie/en/methods/socialconditions/hfcsurvey/ (accessed on June 9, 2023).
23 Information on the survey in Spain is available at www.bde.es/bde/en/areas/estadis/Otras_estadistic/Encuesta_Financi/ (accessed on June 9, 2023).
24 Information on the survey for France, where it is run under the Enquête Patrimoine, can be found at https://www.insee.fr/fr/metadonnees/source/serie/s1005 (accessed on June 9, 2023).
25 Information on the survey in Italy is available at www.bancaditalia.it/statistiche/tematiche/indagini-famiglie-imprese/bilanci-famiglie/index.html?com.dotmarketing.htmlpage.language=1 (accessed on June 9, 2023).
26 Information on the survey in Luxembourg is available at https://www.bcl.lu/en/Research/enquetes/hfcs/about/index.html (accessed on June 9, 2023).
27 Information on the survey in Malta is available at www.centralbankmalta.org/en/household-finance-and-consumption-survey (accessed on June 9, 2023).
28 Information on the survey in Slovenia is available at https://www.bsi.si/en/statistics/household-finance-and-consumption-hfcn (accessed on June 9, 2023).
29 Information on the survey in Finland can be found on https://stat.fi/en/statistics/vtutk (accessed on June 9, 2023).
3 Interviewers
This chapter provides an overview of the HFCS interviewers’ role and tasks. It describes how interviewers were supported and monitored in their work and how the data they collected were examined.
3.1 The interviewers’ role in the survey process
The information on households collected in the HFCS in Austria is generally considered to be sensitive. Therefore, the personal interviews conducted by trained interviewers played a major role in the survey process. Interviewers’ professionalism, profound knowledge of the survey’s subject matter, excellent interviewing skills and appropriate behavior are a precondition for surveys to be successful and therefore contribute in particular to the quality of the resulting data. To prepare for the HFCS, interviewers completed comprehensive training on the content and structure of the HFCS.
In the field phase and during the personal interviews, it was possible for interviewers to consult written reference material and, if necessary, receive support from the OeNB.
3.2 General information
The number of interviewers involved in the fourth wave of the HFCS was 47. While the survey company decided which interviewers to involve in this complex and sensitive survey, the OeNB reserved the right to withdraw individual interviewers if they did not meet the quality criteria.
In general, the interviewers had specific experience conducting household surveys, having been involved either in past waves of the HFCS in Austria or in surveys of a similar magnitude (e.g. the OeNB Household Survey on Housing Wealth 2008, EU-SILC or SHARE). In fact, almost 90% of the interviewers in the fourth wave of the HFCS in Austria had also conducted interviews in at least one of the first three waves. Payment for successfully completed interviews was calculated on the basis of the surveyed household size; a considerably lower remuneration was paid for the collection of paradata when interviews were not completed successfully. Travel expenses were also refunded. To be entitled to a refund of travel expenses for uncompleted interviews, the interviewers were required to have made at least two personal contact attempts and five contact attempts altogether.
The COVID-19 pandemic and HFCS interviewers
COVID-19 posed serious difficulties to HFCS interviewers in Austria. The challenges that had to be addressed during the fourth HFCS wave included, first and foremost, health risks that had to be handled but also challenges in interviewer training and breaks in the field period.
When the field period was originally due to start, the Austrian government announced the first COVID-19 restrictions. Some interviewers were trained in March 2020 and scheduled to start interviewing soon thereafter. However, the restrictions put in place made in-person interviews impossible so that the field period was eventually launched with a delay of about one year.
After this period, interviewer training was redesigned to take place online via Zoom meetings. This increased flexibility in training and reduced the health risks for everybody involved. While the training content remained unchanged, the schedule was split into two meetings on two days to help trainees keep a high level of concentration. Also, trainees were able to use the break between the two sessions to work on their additional take-home exercise interviews. Particular effort was put into keeping the training interactive in the online setting. Interviewers asked questions, including those that had come up during homework, throughout the training. Interviewers who had already been trained in March 2020 had a reduced training schedule of just one day to refresh their knowledge of the questionnaire. In total, four one-day refresher training sessions and seven complete two-day training sessions took place for wave four of the HFCS in Austria.
The extraordinary circumstances reduced the readiness of interviewers to work in the HFCS. Experienced and elderly interviewers in particular were more reluctant to work for the survey on account of increased health risks. As a result, the number of interviewers decreased from about 70 in wave three to 47 in wave four, which, in turn, led to a longer field period.
During the field period, maximum security measures were implemented to reduce the risk of infection for both respondents and interviewers. Interviewers were required to show proof of vaccination, recovery or a negative test (PCR test or antigen test performed by a pharmacist). Additionally, strict hygiene standards (washing hands, not touching one’s own face, etc.) were implemented, as was the wearing of FFP2 masks. No work-related health difficulties were reported in the HFCS.
3.3 Interviewer training
All interviewers conducting interviews in the HFCS were specially trained. The training content was developed by the OeNB in cooperation with the survey company. The interviewer training was designed before the COVID-19 pandemic. Accordingly, and based on the experience of the previous waves, we had planned one-day in-person training courses. After the lockdowns and the delay due to the COVID-19 situation in Austria, we conducted the training sessions as digital meetings. To keep concentration high and the digital workshops interactive, the training was split into two half-day sessions. The contents remained unchanged and are explained below. Additionally, interviewers had to conduct a take-home interview by themselves. Questions arising from this take-home exercise could be answered in the second half of the digital training. For more information on the changes due to COVID-19, see box 2 and chapter 10.
In total, about 15 distinct training sessions were scheduled. Four of these were short refresher sessions for interviewers who had first been trained before the COVID-19 delay and were trained again when the field period started.
3.3.1 Training unit 1
Introduction
First, a member of the OeNB HFCS team introduced the interviewers to the topic and the aims of the HFCS in Austria. This introduction also covered information about the use of data, including explanations why a central bank requires the data surveyed and how researchers use the data and communicate results to the media. Knowledge of these issues is considered to help interviewers’ motivation. The HFCS team representative also described the use of data and analytical approaches on the basis of examples and emphasized the importance of conducting interviews conscientiously and of all households in the sample taking part in the survey. Finally, the central role of interviewers in the HFCS data collection process was highlighted.
Overview of the questionnaire
Following the introduction, the participating interviewers were made familiar with the questionnaire: its chapter structure, the definition of “household” within the meaning of the HFCS, the identification of financially knowledgeable persons (FKP), how to distinguish an FKP from a reference person, loops, and the method used for recording amounts in euro (including the structure of a (euro) loop, see section 2.6.2).
3.3.2 Training unit 2
The briefing on the questionnaire started with a theoretical introduction, supported by additional information and documentation, where required. After that, the lecturer walked the workshop participants through the CAPI questionnaire using an unrealistically complex household as an example. This approach made it possible for participants to acquaint themselves with the essential elements of the questionnaire both in theory and practice. This training unit was split into the two blocks described below.
Questionnaire – theory
The first part of this training unit covered the preinterview questionnaire including the creation of a household matrix and the selection of the household’s financially knowledgeable person. In addition, the general characteristics of the household members, the questionnaire section on the household’s consumption behavior and the household’s real estate wealth and its financing were discussed. Explanations of how to treat farming households were also given ample time.
The treatment of other liabilities, private businesses and financial assets, as well as the section on inheritances and gifts were introduced. In particular, participants were walked through the range of financial assets to address possible misunderstandings, and they learned about the fundamentals of the stock and flow data in households’ balance sheets and how to record additional comments.
Lastly, the sections on household members’ employment status, income and retirement provisions were covered. The training then moved on to incomes at the household level and assessment questions. In particular, participants were acquainted with the reference period for income as well as the options for recording income (gross or, if the gross amount was not known, net of tax and social security contributions). The lecturers highlighted the importance of recording comments provided by the respondents.
Questionnaire – practice
Part two of this training unit covered the simulated interview. As in previous waves, interviewers asked the questions, and the workshop leader gave the answers. All types of questions were covered in this mock interview. Not every question could be reached, but understanding and discussing a question does not require going through all of its iterations. For example, the interview covered more than one item of one loop and one mortgage to explain all the details of loops and loans, but three iterations were not covered for every possible type of loan. This part of the workshop was particularly interactive. Interviewers asked many questions, which could be addressed and resolved.
3.3.3 Training unit 3
Interviewers’ tasks, contact specifications and paradata
The key initial task for interviewers was to convince the selected households to take part in the HFCS. In this respect, the interviewers were provided with a comprehensive list of reasons in favor of participating, as well as information on data security and the contact details for people at the survey company and the OeNB who they could turn to in case of problems. In training unit 3, the interviewers were given exact, detailed specifications on how to proceed when contacting households (see section 3.4). Among other things, interviewers were required to document their attempts to contact the selected households and compile all paradata (see section 2.6.4). The lecturers highlighted in particular that accuracy in compiling information was of utmost importance and that interviewers thus contributed substantially to data quality.
Guidance on interviewer communication
In the second part of this training unit, interviewers received guidance on how to communicate during interviews, for instance with regard to providing explanations or querying answers. In addition, they were trained not to express their personal opinions if respondents asked them questions. Likewise, interviewers learned to repeat and explain questions in the most neutral way possible (using the glossary, if necessary). Comments received from the previous waves of the HFCS in Austria helped to highlight typical interview situations.
3.3.4 Training unit 4
Documents and other material
In training unit 4, lecturers and interviewers once again went through all the documentation and material made available to the interviewers, which had been used in training units 1 to 3 (see section 3.5). This provided the participating interviewers with another opportunity to ask questions on all aspects of the HFCS.
Organizational information
Finally, interviewers were provided with organizational information, such as the addresses of households that they had to contact. Also, they received information about the incentives for households that completed an interview and interviewers’ remuneration.
3.4 Contact strategies and specifications
The process of establishing contact with the households in the HFCS sample took place according to detailed specifications provided by the OeNB. One or two weeks prior to the first contact attempt by the interviewer, the survey company sent the households selected in the sample an individualized advance letter signed by the OeNB governor as well as an information leaflet. This prior notification enabled respondents to prepare in advance for interviewer visits. By consulting the information material provided, as well as the HFCS website (www.hfcs.at/en), households were able to familiarize themselves with the survey topic, consider whether they wanted to take part and, if so, prepare useful documents (such as bank account statements, etc.).
With the advance letters having been sent, interviewers had to make up to five contact attempts with each household. At least two of these contact attempts were to be made personally (by visiting the household’s address in person and trying to establish contact); at least one attempt was to be made at the weekend and another outside normal working hours (9:00 a.m. to 5:00 p.m.). All contact attempts had to be spread out over a period no shorter than three weeks. This approach was necessary in order to rule out distortions as a result of selective participation (e.g. many single-person households cannot be reached during the day and can only be contacted in the evening or at the weekend).
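The contact specifications above lend themselves to a simple automated check of the contact form data. The following is a minimal sketch only, not the procedure used by the survey company: the `ContactAttempt` record and the function names are hypothetical, and it assumes that "outside normal working hours" means a weekday attempt before 9:00 a.m. or from 5:00 p.m. onward, counted separately from the weekend attempt.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ContactAttempt:
    """Hypothetical record of one documented contact attempt."""
    when: datetime   # date and time of the attempt
    in_person: bool  # True for a visit to the address, False e.g. for a phone call


def outside_working_hours(when: datetime) -> bool:
    """Assumption: a weekday attempt before 9:00 a.m. or from 5:00 p.m. onward."""
    return when.weekday() < 5 and not (9 <= when.hour < 17)


def protocol_gaps(attempts: list[ContactAttempt]) -> list[str]:
    """List the contact specifications that the documented attempts do not yet meet."""
    gaps = []
    if sum(a.in_person for a in attempts) < 2:
        gaps.append("fewer than two personal attempts")
    if not any(a.when.weekday() >= 5 for a in attempts):
        gaps.append("no weekend attempt")
    if not any(outside_working_hours(a.when) for a in attempts):
        gaps.append("no attempt outside working hours")
    times = [a.when for a in attempts]
    if times and max(times) - min(times) < timedelta(weeks=3):
        gaps.append("attempts span less than three weeks")
    return gaps


# Illustrative data for a household that was never reached.
attempts = [
    ContactAttempt(datetime(2021, 6, 1, 10, 0), in_person=True),    # Tuesday morning
    ContactAttempt(datetime(2021, 6, 5, 11, 0), in_person=False),   # Saturday
    ContactAttempt(datetime(2021, 6, 10, 18, 30), in_person=True),  # Thursday evening
    ContactAttempt(datetime(2021, 6, 23, 14, 0), in_person=False),  # Wednesday
]
remaining = protocol_gaps(attempts)
```

A check of this kind would only be meaningful for households that could not be reached; once contact is established, no further attempts are required.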
The interviewers were required to document each contact attempt. During at least one of the personal contact attempts, information on the exterior and the location of the property (see also section 2.6.4.2 on paradata) was recorded, even if no successful interview took place with the household in question.
The interviewers were instructed to carry with them all the necessary material (notebook computer, information material, participation incentives, etc.) during each personal contact attempt. This allowed them to react appropriately to different situations, e.g. if a household wanted to participate in the survey immediately, if they requested time to consider or wanted to make an appointment, or if they declined to be interviewed. If requested, interviewers also had to offer interview appointments at the weekend or in the evening as well as the option of meeting respondents outside their main residence (e.g. at the respondent’s office).
3.5 Documents and other supporting material
In addition to the specific training the interviewers received upfront, interviewers were provided with the following information and supporting material to be used during the interviews, where appropriate:
3.5.1 Letter by the OeNB governor to households
Shortly before the first personal contact attempt, all households received an individualized letter and an information leaflet (see online appendix) explaining what the survey was about, what objective it served, who to contact in case of questions, how the collected data would be used and that all data would be treated confidentially. Interviewers took this letter, which was signed by the OeNB governor, with them whenever they contacted households.
3.5.2 Incentives
As participation in the survey was voluntary, monetary incentives were used to increase households’ willingness to take part in the HFCS. Each household that successfully completed an interview received a SODEXO voucher with a value of EUR 10. The interviewers handed over the voucher to the respondents directly upon completion of the questionnaire.
3.5.3 Scientific study
The interviewers were instructed to have with them a copy of the study “Vermögen der privaten Haushalte in Österreich - Gemeinsamkeiten und Unterschiede” by Fessler and Schürz (2019) (see online appendix) during each contact attempt. This study is based on data taken from the third HFCS wave in Austria and gives an example of how survey data are used in a statistical context. Respondents thus had the opportunity to inform themselves how the information they provided was going to be used, which helped to increase confidence in the survey. Interviewer feedback after the first and second waves showed that initially reluctant respondents were more likely to participate in the survey after having received this information.
3.5.4 Showcards
To answer certain questions of the survey, respondents had to choose from a list of answers presented by the interviewer on showcards (see online appendix), which covered the following topics:
- Euro amount ranges A
- Euro amount ranges B
- Euro amount ranges C
- Questions for capturing the demographics of household members
- Relation to the reference person
- Housing costs including service charges
- Expenditure for travel and holidays
- Types of income
- Unexpected windfall gain – lottery
- Rent including service charges
- Five value changes
- Reasons for not renting out
- Economic sectors
- Types of life insurance contracts
- Types of mutual funds
- Banks
- Investment behavior
- Type of inheritance/gift
- Percentiles of the distribution
- Employment status I and II
- Employment classifications
- List of possible probabilities
- Health
- Functions
- Mobility I-III
- Taxation I-II
The questions that required interviewers to use a showcard were specifically marked in the questionnaire. The digital version of the questionnaire also contained references at places where the use of a showcard was required.
3.5.5 Contact form
Interviewers were required to document all information on contact attempts initially by hand on the contact form, which, upon conclusion of a household interview, was digitized with the same software that was used for the questionnaire.
Aside from the household’s identification number, the documentation comprised the date, time, type (e.g. personal or by telephone) and outcome (e.g. complete interview or ineligible address) of a contact attempt. Personal identification information (such as name, address or telephone number) was not part of the data and was not forwarded to the OeNB.
3.5.6 Interviewer manual
The interviewer manual distributed to all interviewers included all necessary information on the HFCS (e.g. the definition of a household) and served in particular as a reference point for the interviewers. In addition to an introduction to the questionnaire, its special features (see chapter 2) and all related documents, the manual also outlined the tasks of the interviewer. Furthermore, it provided guidance on how to locate households and convince them to take part in the HFCS. It also described the requirements for interviewer behavior and their interaction with the people contacted. Other important features were detailed contact specifications and answers to questions frequently asked during the first contact attempt. The manual additionally comprised essential legal texts on data protection that the interviewers had to be familiar with. Furthermore, the manual listed the contact data of the survey company (including a hotline telephone number) and the telephone number of the OeNB hotline in case the interviewers had any questions. The interviewer manual provides an extensive overview of the preparations for the HFCS and can therefore be found in the online appendix.
3.5.7 Glossary
Working for the HFCS required a basic understanding of a broad range of different financial instruments, investment opportunities and types of income, as well as the acquisition of real assets. Interviewers had at their disposal an alphabetical glossary (see online appendix) that provided explanations of technical terms. The glossary consisted of some 20 pages of explanations for all terms of key importance to the HFCS, such as mutual fund or household (according to the HFCS definition).
Already at the training stage, the interviewers were instructed to use this glossary to acquire relevant knowledge which they would be able to fall back on during interviews. By virtue of its references to the variables recorded in the survey, the glossary is also of importance when analyzing the collected data, as it explains the technical terms contained in the questionnaire.
3.6 Monitoring
To uphold the high quality standards of the HFCS, both the survey company and the OeNB monitored interviewer performance. The interviewers’ direct contact person and superior was a regional area manager who reported to field management at the central office in Vienna. The survey company monitored in particular the correct execution of the interviews by checking roughly one in every six interviews via telephone from Vienna. During these calls, the contacted respondents were asked to provide data on the composition of their household, the conduct and duration of the interview and the topics covered.
Furthermore, the data from completed household interviews were forwarded to the OeNB promptly, in 19 batches (including answers to queries) during the field phase, to enable OeNB staff experts to monitor interviewer performance in a timely manner (see section 4.4.1). In addition, the following interviewer performance indicators were examined: item nonresponse (both broken down by real assets and financial assets and in aggregate form for the entire interview), the relative duration 30 of an interview, the number of questions asked, the number of households surveyed successfully and unsuccessfully, and the resulting unit nonresponse, as well as the number and quality of interviewers’ comments. The specific comments to be made by the interviewers upon completion of each household interview were also examined.
The OeNB’s goal in this phase was to quickly identify and resolve difficulties with prompt analysis. Monitoring interviewers gave the OeNB a chance to address individual interviewers’ difficulties concerning certain topics or aspects by providing targeted guidance. The OeNB also had the possibility to withdraw with immediate effect interviewers from the survey that did not meet the quality requirements.
Additionally, owing to the COVID-19-related complications, a high-level steering committee with participants from the OeNB and the survey agency met regularly (in general every 2-4 weeks) to address potential problems directly.
3.7 Problems relating to interviewers
Shortcomings identified during the monitoring process were pointed out to the interviewers. For instance, if interviewers had difficulties entering the correct number of zeros for (large) numbers – a problem that was relatively easy to identify with the help of the numerous plausibility checks – they were asked to pay particular attention in subsequent interviews. The next batch of data was then examined for the persistence of these problems. In the case of some of the interviewers, monitoring also helped reduce the item nonresponse rate of the households they interviewed.
One interviewer had to be withdrawn entirely from the survey during the fieldwork owing to shortcomings in personal interaction when conducting interviews.
3.8 Survey of interviewers
The HFCS in Austria also entailed the systematic collection of information on the interviewers involved. The information provided by the interviewers on a voluntary basis included socio-economic information (age, gender, education, region), employment status including work experience as an interviewer, personality-related indicators and the interviewers’ financial situation. Interviewers also had the opportunity to document their experience working for the HFCS in Austria. This information is particularly relevant for the nonresponse adjustment of the complex survey weights (see chapter 7). The participation rate in the survey of interviewers was close to 100%.
3.9 Online appendix
The online appendix includes the letter by the OeNB governor to the households, the information leaflet, the showcards, the interviewer manual, the alphabetical glossary, as well as the exemplary study by Fessler and Schürz (2019).
30 During each interview, time logs were recorded at different points in the questionnaire.
4 Consistency checks and editing
4.1 Introduction
Here, data editing is understood as the amendment of electronically recorded observations collected through individual interviews, so as to correct any errors or logical inconsistencies that may have occurred during the survey, as well as the aggregation of information that was recorded via auxiliary variables, typically with a view to keeping the questionnaire as clear and user-friendly as possible. The editing process is thus essential for improving the quality and consistency of the datasets. 31
The raw data collected in surveys do not always contain the information that the questions were intended to elicit. As respondents in the HFCS occasionally either experienced difficulties in understanding the questions asked or had insufficient knowledge on the substance of the survey, they may sometimes have provided inaccurate information. At the same time, data entry errors may have occurred (see also chapter 3), or data may have been processed inaccurately. In the HFCS, great importance was attached to minimizing such errors up front through the structure and wording of the questionnaire; through checks and, if needed, adaptations during the field phase; and ex post through data editing, as will be outlined in this chapter. The COVID-19 pandemic had very little influence on the way editing was conducted in the HFCS.
This chapter provides insights into the consistency analyses and edits performed for the fourth HFCS wave in Austria, starting with information on the number of edits performed (section 4.2) and followed by explanations on the consistency checks conducted during and after the interviews (sections 4.3 and 4.4). Furthermore, we outline the flags used to highlight ex post adjustments of the observations recorded (section 4.5), provide a detailed account of ex post editing (section 4.6) and describe formatting and editing after multiple imputations (section 4.7). The chapter ends with concluding remarks (section 4.8).
4.2 Number and type of edits
All in all, around 59,200 of the more than 1.2 million observations collected in the fourth HFCS wave were edited, i.e. about 4.9% of all data points were amended (see table 2). This figure is comparable to that of previous HFCS waves in Austria and hence as expected.
The row “Total” indicates the full range of edits that were implemented. Edits that resulted in actual changes to the collected values, i.e. real changes, were limited to some 6,000 observations (see row “Edits based on expert judgment and follow-up phone calls”), which corresponds to a change rate of 0.5% (a slight reduction from previous waves). These changes involved primarily inconsistent values that were corrected based on subsequent queries and/or other information or were deleted and replaced through imputation. Just a little under 30% of all amendments (see row “Edits based on other survey information (e.g. verbatim records)”), i.e. a little over 16,000 observations, could be derived from the verbatim records and the use of a flexible questionnaire design for certain questions (e.g. questions about life insurance policies or total annual net income). All in all, around 1.3% of all observations were amended through this type of editing, making it the most prominent form. This indicates how important it is to allow for verbatim records on a large scale. Questionnaires as detailed as the one used for the HFCS in Austria must be user-friendly to ensure the participation of respondents and high quality standards. Various data, e.g. data on the occupation (ISCO code) of employed individuals, are only collected as verbatim responses to minimize the effort required from respondents. The flexibility for certain questions was achieved by giving respondents more options on how to respond to the question. For some income questions that asked for the gross value, net values could be provided instead if respondents did not know the gross value. The net value was then converted to its gross value during editing.
| | Total observations² | Number of edits | Share of edited observations in total observations (%) |
|---|---|---|---|
| Total¹ | 1,215,745 | 59,177 | 4.9 |
| Edits based on expert judgment and follow-up phone calls | | 6,112 | 0.5 |
| Edits based on other survey information (e.g. verbatim records) | | 16,155 | 1.3 |
| Deleted observations | | 36,910 | 3.0 |

Source: HFCS Austria 2021, OeNB.
¹ The line “Total” contains all edits.
² Includes only observable information. Filter missings are excluded.
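The shares reported in table 2 can be recomputed from the counts as a quick plausibility check. The snippet below is a small illustrative calculation; the variable and function names are ours, while the counts are taken from the table.

```python
# Counts taken from table 2.
total_observations = 1_215_745
edits = {
    "expert judgment and follow-up phone calls": 6_112,
    "other survey information (e.g. verbatim records)": 16_155,
    "deleted observations": 36_910,
}

total_edits = sum(edits.values())


def share(count: int, base: int) -> float:
    """Percentage share of count in base, rounded to one decimal place."""
    return round(100 * count / base, 1)


# Share of each edit type in all observations (last column of table 2).
shares_of_observations = {kind: share(n, total_observations) for kind, n in edits.items()}

# Share of edits derived from other survey information in all edits
# (the "a little under 30% of all amendments" mentioned in the text).
verbatim_share_of_edits = share(
    edits["other survey information (e.g. verbatim records)"], total_edits
)
```

The three edit types add up to the 59,177 edits in the row “Total”, and the recomputed shares match the published figures.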
In around 37,000 cases (i.e. around 3% of observations), observations were set to missing (“.”). 32 Edits of this type were made for different reasons, but mostly in the process of data cleanup (see section 4.6.2.1). Further, some items had been entered in a wrong position in the questionnaire. When transferring such information to the right position, the original entry had to be deleted, i.e. set to missing. In some cases, complete sets of entries were set to missing (“.”), among other things because the corresponding head variable had been edited. Below is an example 33 that demonstrates the editing process resulting from some of the reasons just outlined.
A case in point would be the duplicate recording of income from pensions, first under “Received employee income” and then under “Received income from public pensions.” Here, the head variable “Received employee income” (PG0100) was changed to “No” and the value recorded for this variable was deleted because the respective income figure had been adequately recorded under the pension income variable (PG0300 and PG0310).
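To make the mechanics of such an edit concrete, the following sketch shows how a head variable and its dependent entry could be reset together. It is an illustration under stated assumptions, not the OeNB's editing code: the key `employee_income_amount`, the `edit_flag` field and the rule that identical amounts indicate a duplicate are hypothetical, whereas in practice such cases were decided by expert judgment.

```python
MISSING = None  # stands in for the "." missing code used in the dataset


def drop_duplicate_employee_income(person: dict) -> dict:
    """Return an edited copy in which pension income that was also recorded
    as employee income is removed from the employee income section."""
    edited = dict(person)
    amount = person.get("employee_income_amount")  # hypothetical variable name
    duplicated = (
        person.get("PG0100") == "Yes"          # head variable: received employee income
        and person.get("PG0300") == "Yes"      # head variable: received public pension income
        and amount is not MISSING
        and amount == person.get("PG0310")     # assumed rule: identical amounts
    )
    if duplicated:
        edited["PG0100"] = "No"                     # edit the head variable
        edited["employee_income_amount"] = MISSING  # delete the dependent entry
        edited["edit_flag"] = "edited"              # hypothetical flag value
    return edited


record = {
    "PG0100": "Yes", "employee_income_amount": 18_000,
    "PG0300": "Yes", "PG0310": 18_000,
}
cleaned = drop_duplicate_employee_income(record)
```

Working on a copy keeps the originally recorded values available, in line with the practice of documenting every edit with a flag.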
4.3 Consistency checks during interviews
The HFCS is based on Computer-Assisted Personal Interviewing (CAPI). CAPI has a number of advantages over the use of paper-based questionnaires or phone-based interviews. Interviewers walk the respondents through the questionnaire on screen, using a laptop on which the survey software is installed. The information collected is checked for integrity and consistency as it is being entered. Any questions of clarification that the respondents may have raised can be resolved immediately either by the interviewer or with the aid of the supporting documentation, and thus errors can be prevented during data entry.
However, consistency checks during an interview are subject to limitations in terms of scope. An excessive number of consistency checks would make an interview exceedingly long, which would wear out respondents, lower the quality of the data collected and might even cause respondents to break off the interview.
Moreover, restrictions arise from the fact that all information which should be used for the consistency checks must already be available. These limitations do not apply to simple consistency checks linked to specific predefined benchmarks. Whenever certain limits are exceeded or undercut, pop-up warnings appear that allow the entry to be checked immediately. However, the information necessary for more complex consistency checks often does not become available until answers are received in later stages of the interview.
The digital version of the questionnaire used for the HFCS provided for close to 250 consistency checks, 34 typically in the form of “soft” checks that highlight potential issues but do not prevent the interview from proceeding if the interviewer decides to continue. Whenever a test criterion was violated, a warning message popped up.
For example, if a household with a disposable net monthly income of EUR 1,000 (enough to cover the household’s average consumption) indicated that, in addition to consumption expenses totaling EUR 900, it had typically supported nonhousehold members with EUR 200 per month in the past year, the following message popped up:
“The sum of total consumption expenditure and regular remittances to nonhousehold members exceeds the household’s total net income. Are the figures correct? If yes, please confirm the figure(s), or amend them as necessary.”
The initial figures may in fact be confirmed in the cross-check, possible reasons being that the figures reported referred to different time periods, that the remittances were financed by the sale of assets, or that the household’s income had since dropped as a result of one or more members losing their job. At any rate, inconsistencies would prompt the respondents to confirm or correct the total household income, remittances and consumption expenditure.
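The logic of this soft check can be sketched in a few lines. The code is illustrative only and does not reproduce the CAPI software: the function names are hypothetical, and all amounts are assumed to be monthly figures in euro.

```python
from typing import Optional

WARNING = (
    "The sum of total consumption expenditure and regular remittances to "
    "nonhousehold members exceeds the household's total net income. "
    "Are the figures correct?"
)


def soft_check_spending(net_income: float, consumption: float,
                        remittances: float) -> Optional[str]:
    """Return a warning if monthly spending exceeds monthly net income, else None."""
    if consumption + remittances > net_income:
        return WARNING
    return None


def may_proceed(warning: Optional[str], confirmed_by_interviewer: bool) -> bool:
    """A soft check does not block the interview once the figures are confirmed."""
    return warning is None or confirmed_by_interviewer


# Figures from the example in the text: EUR 1,000 income, EUR 900 consumption,
# EUR 200 remittances.
warning = soft_check_spending(1000, 900, 200)
```

Because the figures may be legitimately inconsistent, the check only asks for confirmation; the interview continues whether the values are confirmed or amended.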
Other consistency checks programmed into the digital version of the HFCS questionnaire in Austria would allow the survey to proceed only once an answer identified as incorrect or inconsistent had been amended. However, these so-called “hard” checks were only used in cases where a particular answer could definitely be ruled out.
If individuals stated, for instance, that they had lived in Austria for 40 years but gave their age as 30, the following error message would appear:
“The respondent has been living in Austria for longer than his/her age allows. This is not possible. Please correct the information as necessary.”
Thus, proceeding with the CAPI questionnaire required changing the age given to at least 40 years, or reducing the period of residence in Austria to 30 years or less (or changing both variables).
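In contrast to a soft check, a hard check rejects the entry outright. A minimal sketch of the residence example follows; `HardCheckError` and the function names are hypothetical and not taken from the survey software.

```python
class HardCheckError(ValueError):
    """Raised when an entry can definitely be ruled out and must be corrected."""


def hard_check_residence(age: int, years_in_austria: int) -> None:
    """Reject a period of residence in Austria that exceeds the respondent's age."""
    if years_in_austria > age:
        raise HardCheckError(
            "The respondent has been living in Austria for longer than "
            "his/her age allows. Please correct the information as necessary."
        )


def entry_accepted(age: int, years_in_austria: int) -> bool:
    """True if the interview may proceed with the given pair of values."""
    try:
        hard_check_residence(age, years_in_austria)
    except HardCheckError:
        return False
    return True


# Example from the text: age 30, 40 years of residence in Austria.
blocked = not entry_accepted(30, 40)
```

Either correction mentioned in the text (raising the age to at least 40 or lowering the period of residence to 30 years or less) makes the pair of values pass.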
4.4 Postinterview consistency checks
4.4.1 Expert data analysis
During the field phase of the fourth HFCS wave in Austria, household data deemed to be final by the survey company were forwarded to the OeNB in 19 batches. This means that the OeNB received household data roughly every two weeks during fieldwork. All batches of data were subjected promptly to expert data analysis. 35 On the one hand, these analyses served to improve the consistency of the data recorded for each household. On the other hand, they were used to check the survey software (in particular, to review the programming of the questionnaire) and the mechanisms used by the survey company to process the data.
The datasets for households actually interviewed and those for households that refused to participate were analyzed on a case-by-case basis. This made it possible to assess and optimize the success of interviewers in convincing households to participate. Hence, it was almost impossible for interviewers to cherry-pick “easy” or more readily accessible households, which would probably have created a bias toward certain households (e.g. those where housewives or pensioners live), thus distorting the data. The interviewers knew that the list of addresses was limited to the 6,300 households of the gross sample (see also chapter 6). This ensured that interviewers would not select the less difficult households and then move on to a new set of addresses. The incentive for interviewers to use the strictly limited address material as efficiently as possible was supported with a performance-related payment system and the relatively high effort that was required from interviewers to participate in the survey in the first place. Furthermore, area managers were advised to avoid allocating new households to interviewers before they had made sufficient effort to survey the households they were assigned at the time. The decision to exclude subsequent draws (substitute households) is among the key criteria for a successful survey, and is moreover essential for ensuring the representativeness of the sample (see e.g. Vehovar, 1999).
Initial analysis of the information on individual households during fieldwork covered the data provided on household structure, financial and real assets, debt and income, whether households had come to ownership of property by inheritance or gift, comments made by households or remarks made by interviewers, as well as the date, time and duration of the interviews. This information enabled a quick initial assessment of the interview’s quality. The microdata on every single household were checked for consistency regarding their content and reviewed by at least two analysts from the OeNB HFCS team. Issues requiring clarification were discussed by the whole team, which then decided on the way forward.
In addition, this stage of the process was also used to assess the interviewers (see also chapter 3) and to address errors or misunderstandings. The shortcomings identified in this process were often minor in their nature, but one interviewer whose results were not up to the required standards (e.g. regarding nonresponse) was excluded.
4.4.2 Follow-up queries
If individual data analysis identified a problem but not how it could be corrected, households were contacted again by the survey company to clarify uncertainties and ensure that data were recorded correctly. Given the timely submission of interview results to the OeNB (around every two weeks) and the subsequent checks by the HFCS team, the survey company was able to address any queries to the surveyed households promptly. A typical case of a data problem that was easy to spot and did not require queries was rewriting a negative sight account balance as a (positive) liability (overdrawn account) while setting the value of sight accounts to zero (see also section 4.6). This was simply a matter of adhering to the recording conventions as to where such liabilities should be recorded. Decisions on queries were always guided by the principle that any ex post data editing and the burden on participating households should be kept to a minimum. Many unusual results (e.g. particularly high asset values) were confirmed or corrected in the course of queries. All in all, follow-up queries (by phone) were necessary to confirm specific details of some 230 households. This is a similar percentage of households as in the third wave.
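The recording convention for overdrawn sight accounts mentioned above amounts to a simple rule. The sketch below illustrates it with a hypothetical function name and a plain pair of return values rather than the actual HFCS variables.

```python
def apply_overdraft_convention(sight_balance: float) -> tuple[float, float]:
    """Return (sight account value, overdraft liability) under the convention
    that a negative balance is recorded as a positive liability."""
    if sight_balance < 0:
        return 0.0, -sight_balance
    return sight_balance, 0.0


# A balance recorded as -350 becomes a sight account value of 0
# and an overdraft liability of 350.
asset, liability = apply_overdraft_convention(-350.0)
```

Since the rule is deterministic, edits of this kind could be made without contacting the household again.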
4.4.3 Investigation of outliers
The checks on a case-by-case basis were aimed in particular at recognizing and processing outliers (exceptionally high or low values), which were recorded above all for wealth variables, the size of the household income or the size of the dwelling. Any outliers that were not removed from the dataset were generally not the result of interview errors but actually confirmed by the follow-up queries. Our recommendation for future studies based on HFCS data is therefore not to generally exclude outliers from the analysis, but rather to incorporate them in computations through the use of suitable methods.
4.4.4 Technical review of filtering and consistency
During the field phase, the consistency checks programmed into the digital version of the questionnaire and the rounds of expert data analysis were complemented with detailed automated consistency checks of the transmitted data.
All hard checks were applied repeatedly to the observations, for instance, in order to assess whether respondents might have given answers that precluded moving on to subsequent questions, thus requiring changes. The technical review also covered the questionnaire’s complete set of filters to prevent programming errors leading to extensive and costly follow-up queries. Comprehensive tests of the questionnaire’s programming prior to the start of fieldwork as well as a pilot survey of 50 households made it possible to largely exclude programming errors from the outset. Minor difficulties did occur, such as incorrect filtering implemented for the summary question for uncollateralized loans ((a)hc1200; see section 2.5.2.3). This deficiency was identified and corrected in a timely manner. 36 These filter checks also ensured that the coding of variables was consistent throughout the questionnaire. 37
4.5 Flags
All edits (and imputations – see chapter 5) were documented with flag variables, which indicate how the individual HFCS observations were established (see table 3 for a list of the flags used to classify the observations). The flags used can be divided into five groups: flags used to identify recorded information (group I), incomplete or inadequate observations (group II), observations that were not recorded or later deleted for anonymization (group III), ex post edits (group IV) and imputed observations of different types (group V). To comply with requirements set out for the common (international) datasets covering all participating countries, some flags were aggregated for the international datasets (section 4.7).
| Group | Flag | Description |
|---|---|---|
| Group I | 0 | “Not applicable (i.e. skipped due to routing)” |
| | 1 | “Recorded as collected, complete observation” |
| | 2 | “Recorded as collected, but moved in iteration” |
| | 12 | “Recorded as found in other source, not collected in survey” |
| Group II | 1050 | “Not imputed, originally: Don’t know” |
| | 1051 | “Not imputed, originally: No answer” |
| | 1052 | “Not imputed, originally not collected due to missing answer to a higher order question” |
| | 1053 | “Not imputed, originally collected from a range” |
| | 1054 | “Not imputed, collected value deleted” |
| | 1055 | “Not imputed, value not collected due to a CAPI error” |
| | 1056 | “Not imputed, set to missing due to incorrect answer to a higher-order question” |
| | 1057 | “Not imputed, collected value deleted but range information available” |
| | 1058 | “Not imputed, set to missing due to red button” |
| | 1075 | “Not imputed, specific answer code” |
| Group III | 2050 | “Missing, set to missing for anonymization purposes” |
| | 2051 | “Missing, set to missing because data were not collected” |
| Group IV | 3050 | “Edited, set to modified value as considered incorrect or unreliable” |
| | 3051 | “Edited, adjusted on the basis of other information obtained in the (national) survey” |
| | 3052 | “Edited, adjusted on the basis of verbatim records” |
| | 3053 | “Edited, set to missing (.)” |
| | 3075 | “Edited, set on the basis of follow-up with household” |
| | 3076 | “Edited, set on the basis of follow-up with interviewer” |
| Group V | 4050 | “Imputed, originally: Don’t know” |
| | 4051 | “Imputed, originally: No answer” |
| | 4052 | “Imputed, originally not collected due to missing answer to a higher order question” |
| | 4053 | “Imputed, originally collected from a range” |
| | 4054 | “Imputed, collected value deleted” |
| | 4055 | “Imputed, value not collected due to a CAPI error” |
| | 4056 | “Imputed, originally value not recorded due to incorrect answer to a higher-order question” |
| | 4057 | “Imputed, collected value deleted but range information available” |
| | 4058 | “Imputed, set to missing due to red button” |

Source: HFCS Austria 2021, OeNB
Group I
The flags allocated to group I were used to identify recorded information. Specifically, all observations recorded during the interview were flagged “1” while all observations set to missing (“.”) because the question was skipped due to routing were flagged “0.” Information recorded in loops (see section 4.6.2.4) was paired with a flag of “2” if it had to be moved in the iteration of a loop. In other words, flag “2” observations were retained in the dataset exactly as they were recorded, but assigned a new iteration number. Observations flagged “12” were not collected in the survey but taken from external data or other sources.
Group II
Recorded observations that were incomplete or inadequate were assigned group II flags. Such observations include cases where the respondent was unable or refused to answer the question (entries of “Don’t know” or “No answer”), or proved unable to give a specific figure and provided a range instead. Included here are also observations that were not available on account of edits of either the variable in question or a head variable (flags “1054” and “1056”). If the edited observation was available as a range, it was assigned a flag of “1057.” If an observation was not available due to a CAPI error, it was given a flag of “1055.” Observations that were not available because questions in a loop were skipped were flagged “1058” and special missing values were flagged “1075.” In these cases, alternative information was collected.
For example, if gross income was unknown, but information on net income was provided, the variable for gross income was flagged “1075.”
Observations with group II flags were not imputed (see chapter 5).
Group III
Group III flags identify observations and/or variables that were not recorded or that were recorded but later deleted from the datasets on account of anonymization requirements.
Group IV
Flags from group IV indicate an ex post edit of an entry. The following types of ex post edits can be distinguished: edits as a result of logical inconsistencies (flag “3050”); calculations that were adjusted using other information obtained in the survey, for instance with regard to life insurance contracts (see section 4.6.2.9 for details; flag “3051”); coding that was subsequently adjusted on the basis of verbatim records (see section 4.6.2.3; flag “3052”); edits made to delete a value and set the observation to missing, as in the case of duplicate entries (flag “3053”); and queries put to households (flag “3075”) and interviewers (flag “3076”).
Group V
Flags from group V mirror those from group II. If it was possible to impute missing values, the first digit of the flag was changed to “4.” For instance, if respondents had provided a range, which was subsequently imputed, rather than a specific figure, this observation was flagged with “4053” after multiple imputations. This ensures that all information can be tracked even after the imputations.
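The mirroring between group II and group V flags can be illustrated with a minimal Python sketch. The function names `imputed_flag` and `flag_group` are hypothetical and not part of the HFCS datasets; the flag values are taken from table 3, where flag “1075” has no group V counterpart.

```python
# Group II flags that have a group V counterpart in table 3 (1050 to 1058).
# Flag 1075 ("specific answer code") has no mirror in group V.
IMPUTABLE_GROUP_II = set(range(1050, 1059))

GROUP_BY_THOUSANDS = {0: "I", 1: "II", 2: "III", 3: "IV", 4: "V"}


def imputed_flag(flag: int) -> int:
    """Return the group V flag mirroring a group II flag, e.g. 1053 -> 4053."""
    if flag not in IMPUTABLE_GROUP_II:
        raise ValueError(f"flag {flag} has no group V counterpart")
    # Changing the first digit from 1 to 4 is the same as adding 3000.
    return flag + 3000


def flag_group(flag: int) -> str:
    """Return the flag group (I to V) based on the flag's thousands digit."""
    return GROUP_BY_THOUSANDS[flag // 1000]
```

For instance, `imputed_flag(1053)` yields 4053 and `flag_group(3052)` yields "IV", so the original reason for a missing value remains visible after imputation.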
Chart 3 indicates how questions were typically structured in the HFCS questionnaire. Let us take employee income to give an example for the structure of question blocks 38 and the use of flags.
For example, the head variable for recording employee income serves to ascertain whether or not a household has an income of this kind. If this yes/no question was answered with “Yes” the amount was recorded and the interview continued with the next head variable in the questionnaire – in this case, the question on self-employment income. If a household had no income of this kind, or if the respondent failed to provide the necessary information (i.e. responded with “Don’t know” or “No answer”), the interview continued with the question on self-employment income (the next head variable). Depending on which answers were given, all the observations recorded were initially flagged “1” or “0.” If the response to a subsequent question (e.g. on employment) revealed that a “No” given to the question on employee income was in fact incorrect, the initial response was corrected and flagged “3050” (Edited, set to modified value as considered incorrect or unreliable) and the corresponding data entry field for the value was flagged for imputation. Following imputation, the value was then reflagged “4056” (Imputed, originally value not recorded due to incorrect answer to a higher-order question).
Or, if the question on a household member’s highest education qualification (variable (A)PA0200) was answered by selecting the category “Other qualification” and if that answer was subsequently found to match one of the predefined categories, the observation was flagged “3052” (Edited, adjusted on the basis of the verbatim records) in the flag variable of the individual dataset.
This flag system allows the origin of observations in the HFCS to be tracked. In the Austrian questionnaire the section on pension policies was structured differently to reflect national circumstances and to be consistent with previous waves. This section was adapted ex post to fit the requirements for the international data.
To allow for the merging of datasets, no flags were used to encode the variables for identifying households and individuals, nor were the country codes and the imputation’s iteration number flagged. The flags described here provide for a more detailed breakdown by category than those incorporated into the international HFCS dataset that can be obtained from the ECB. For reasons of international consistency, the flags were aggregated prior to being submitted to the ECB (see section 4.7).
4.6 Ex post editing
4.6.1 Case-by-case review
A detailed case-by-case review of all households allowed inconsistencies to be identified and eliminated through follow-up queries and ex post editing. Specifically, respondents’ answers were checked for plausibility against known benchmarks, including descriptive statistics (e.g. on average income) compiled on the basis of completed HFCS interviews and external sources of data. Moreover, the review process heavily relied on auxiliary variables that recorded information in aggregated form and/or in a variety of other ways.
Both interviewers who produced nonstandard results (see chapter 3) and follow-up queries made by the survey company were reviewed in particular detail.
Expert assessments were generally used to resolve the following issues through ex post edits:
- Double entries: Cases where an inheritance, for instance, was recorded under both “Household main residence inherited” and in the “Inheritances received” chapter, or where the same income was recorded in two different income categories, had to be corrected.
- Missing or additional “zeros”: In a few cases interviewers added or left out a zero by accident when recording amounts; this had to be amended accordingly.
- Implausible values: Values that remained implausible after follow-up queries had to be set to missing and were subsequently imputed.
- Often, information could be gained from the many additional comments made by respondents. If the additional comments made it necessary to change the collected information, changes were made accordingly.
- Data entry errors by interviewers: For instance, a wrongly recorded date of a contact attempt by an interviewer was easy to correct because of the potential other contact attempts and the immediate data transmission of completed households.
- Also, all data obtained through follow-up queries were used in this step to correct individual observations in the dataset where necessary.
Such edits related to the whole questionnaire, not just to individual variables. Amendments to recorded data were kept to a minimum and – wherever follow-up queries and/or the use of auxiliary variables (such as verbatim records) failed to provide further information – inconsistent observations were set to missing and flagged for imputation. Inconsistent or implausible observations were processed with great care and only deleted if there was absolutely no doubt about the inconsistency.
4.6.2 Structural editing
4.6.2.1 Data cleanup
When answering the HFCS questions, respondents occasionally gave inaccurate answers but subsequently corrected those answers as they proceeded through the questionnaire. These corrections also necessitated a change in the sequence of questions following the initial question because the new answers called for different filter settings. The “wrong” initial path through the questionnaire, however, remained in place for transparency reasons and had to be subsequently corrected.
4.6.2.2 Currency conversion
Respondents could specify any amount in various currencies (see chapter 2). The edits set out below relate both to specific amounts and ranges indicated by the respondents (predefined ranges had to be specified in euro).
Typically, amounts were given either in euro or in Austrian schillings. In particular, the value of the main residence (both the purchase price and the current value) was often given in Austrian schillings. All Austrian schilling amounts were subsequently converted into euro at the irrevocably fixed conversion rate of EUR 1 = ATS 13.7603. 39 Some amounts were also given in Deutsche mark (DEM). These amounts were also converted at the irrevocably fixed conversion rate, namely EUR 1 = DEM 1.95583.
In a few cases amounts were also given in Swiss francs and pound sterling. For the amounts outstanding on a loan and the current value of a property, the average exchange rate for the month in which the interview took place was used. 40 For income in a foreign currency, the average exchange rate for the year 2020 was applied. 41
In one instance, the value dated from before the introduction of the euro and was given in a foreign currency other than Austrian schillings. This case concerned the value of an inheritance received in 1992, which was given in US dollars. The amount was first converted into Austrian schillings on the basis of the exchange rate applicable at the time and then from Austrian schillings into euro at the fixed ATS/EUR conversion rate.
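The conversions at fixed rates can be sketched in a few lines of Python. The two fixed conversion rates are those stated in the text; the function names and the parameter `ats_per_usd` are hypothetical, and the historical USD/ATS rate must be supplied by the user because it is not given here.

```python
# Irrevocably fixed conversion rates (units of national currency per euro).
UNITS_PER_EUR = {"EUR": 1.0, "ATS": 13.7603, "DEM": 1.95583}


def to_euro(amount: float, currency: str) -> float:
    """Convert an amount in EUR, ATS or DEM into euro at the fixed rate."""
    return amount / UNITS_PER_EUR[currency]


def pre_euro_foreign_to_euro(amount: float, ats_per_unit: float) -> float:
    """Two-step conversion for pre-euro amounts in another currency.

    The amount is first converted into Austrian schillings at the
    historical rate (ats_per_unit, supplied by the user) and then into
    euro at the fixed ATS/EUR rate.
    """
    return to_euro(amount * ats_per_unit, "ATS")
```

A purchase price of ATS 1,000,000, for example, corresponds to roughly EUR 72,672.83 under this conversion.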
4.6.2.3 Verbatim records
For many questions, respondents were given the option of choosing the category “Other” and providing a verbatim response, mainly with a view to making the questionnaire as user-friendly as possible. Thus, a verbatim description could be recorded if it was not possible to assign a respondent’s answer to a predefined category during the interview. The verbatim entries were used to assign answers to specific categories ex post, which proved to be possible in the majority of cases. Wherever this could not be done, the initial categorization of the observation as “Other” was retained. Some data, such as data on the occupation (ISCO coding in the variable PE0300) of an employed individual or the main activity (NACE coding in the variable PE0400) of the company where the individual is employed, were collected entirely in verbatim form and coded ex post. All observations subjected to ex post edits on the basis of verbatim records were flagged “3052” (see section 4.5 for details on the flags).
4.6.2.4 Navigation of loops
As outlined in detail in section 2.6.1, some pieces of information were recorded in loops, which required interviewers to run through an identical set of questions for each individual item from a group of items owned by the household. Information on the following items was collected using loops:
- mortgages on the main residence
- real estate assets apart from the main residence
- mortgages secured against these other properties
- unsecured loans from family and friends
- other unsecured loans
- businesses owned by the household
- life insurance policies
- inheritances and gifts
Below we provide an explanation of the edits which were required because of loop questioning.
Recording sequence
The sequence of items that were covered in loops followed a predefined order. With regard to mortgages secured against the main residence, for instance, the first iteration of questions related to the mortgage with the highest amount outstanding, the second iteration of the loop to the mortgage with the second-highest outstanding amount and the third iteration to the third-highest loan amount outstanding. Some respondents did not always adhere to this sequence. Such cases were recoded in the editing process – with the exception of the loop questions on inheritances, for which no recoding was carried out because respondents were prompted to discuss the inheritances received in descending order of relevance for the household’s current wealth situation. Moreover, they were instructed to indicate amounts as transferred rather than current amounts. After all, certain inheritances could have gained (or lost) more in value than others since the inheritance date; or inherited residential property might since have been passed on to children, causing it to be irrelevant for the household’s wealth situation at the time of the interview.
Every variable within a loop that was replaced with observations recorded for the same variable in another iteration was flagged “2” (see section 4.5). Wherever a variable set to missing in one iteration was replaced with the same variable set to missing in another iteration, it was flagged “0” (Not applicable (filtered out)).
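The recoding of the loop sequence can be sketched as follows. This is a simplified illustration, not the procedure actually used: it considers a single variable (the outstanding amount), represents "set to missing" as `None`, and assumes that recorded values which stay in place keep flag “1.” The function name `reorder_loop` is hypothetical.

```python
def reorder_loop(amounts):
    """Sort loop iterations by descending outstanding amount.

    amounts: list of amounts in recorded order; None means set to missing.
    Returns a list of (value, flag) tuples in the new order, where moved
    recorded values get flag 2, unmoved recorded values flag 1 and
    missing entries flag 0.
    """
    # Missing entries go last; recorded entries are sorted high to low.
    order = sorted(
        range(len(amounts)),
        key=lambda i: (amounts[i] is None, -(amounts[i] or 0)),
    )
    result = []
    for new_pos, old_pos in enumerate(order):
        value = amounts[old_pos]
        if value is None:
            flag = 0
        elif new_pos != old_pos:
            flag = 2  # recorded as collected, but moved in iteration
        else:
            flag = 1  # recorded as collected, position unchanged
        result.append((value, flag))
    return result
```

If a respondent reported a loan of EUR 5,000 first and one of EUR 20,000 second, both values would be kept unchanged but swapped between iterations and flagged “2.”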
Skipping questions
In order to avoid breaking off an interview in mid-loop, respondents were allowed to skip parts of loop questions and to proceed directly to the summary questions, where either the residual sum total of the not yet recorded loans and/or businesses (more than three loans or businesses) or the sum total of all loans and/or businesses was recorded. If questions within the loop for inheritances and gifts were skipped, information on the sum total of all inheritances was always requested in the summary question. As the summary questions from all sections of the dataset to be sent to the ECB were supposed to cover only any items that went beyond the first three itemized loans, real estate assets and private businesses, the relevant summary responses had to be edited accordingly. For ease of reference, examples of such edits are described below on the basis of the section of the questionnaire dealing with other unsecured loans (see section 2.5).
In the 6 cases in which a household had taken out only one unsecured loan and had skipped questions within a loop, the type of edit depended on whether the respondent had (1) indicated the outstanding amount only in response to the summary question; or (2) both when going through the first loop of questions and in answering the summary question; or (3) neither during the first loop of questions nor in answer to the summary question. If the respondent had indicated the outstanding amount only in answer to the summary question (variant 1), this amount was entered as the answer to the appropriate question (in the first loop) and the entry under the summary question was set to missing. If the respondent had indicated identical amounts in answering both the loop and the summary question (variant 2), the latter was set to missing since it was a duplicate entry. 42 Where no amount was given at all, neither within the loop nor in the summary (variant 3), only the summary question was set to missing.
In cases where a household had taken out two unsecured loans and had skipped questions within a loop, 43 the type of edit depended on whether the respondent had (1) specified the value of the highest outstanding loan and indicated an aggregate amount in response to the summary question; or (2) indicated outstanding amounts in response to both question loops and the summary question; or (3) specified an amount only in the answer to the summary question; or (4) given no amounts at all, neither in the answers to the loop item questions nor in the answer to the summary question.
If variant 1 was the case, the amount outstanding for the lower of the two loans was taken to be the difference between the amount given in the answer to the summary question and that given in the first loop. This, however, was only done if the sum total of the two outstanding loans exceeded the amount outstanding from the first loan. If it was lower, it was assumed that the amount given in the answer to the summary question was not the sum total of the two outstanding loans, but rather the amount outstanding for the second loan. In both instances, the summary question was subsequently set to missing. If variant 2 was the case, the amount given in response to the summary question was set to missing. If only the sum total of the two outstanding loans was given (variant 3), it was used as the upper bound for both the first and the second loan for the imputation model. This was the case for one household that held two other unsecured loans and had skipped some loop questions. If no amounts were given at all, neither in response to the loop questions for each of the two outstanding loans, nor in answer to the summary question (variant 4), the summary question was set to missing.
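Variant 1 for households with two loans can be expressed as a short decision rule. The sketch below is illustrative only; the function name and the dictionary keys are hypothetical, and the case in which both amounts are equal is not covered by the description above, so it is treated here (as an assumption) like the case in which the summary amount is lower.

```python
def split_two_loans(first_loan: float, summary: float) -> dict:
    """Derive the second loan amount from the summary answer (variant 1).

    first_loan: outstanding amount recorded in the first loop iteration.
    summary: amount given in response to the summary question.
    """
    if summary > first_loan:
        # Summary is interpreted as the total of both loans.
        second_loan = summary - first_loan
    else:
        # Summary is interpreted as the amount of the second loan itself.
        second_loan = summary
    # In both instances the summary question is set to missing afterwards.
    return {"loan1": first_loan, "loan2": second_loan, "summary": None}
```

With a first loan of EUR 10,000 and a summary answer of EUR 15,000, the second loan would be set to EUR 5,000; with a summary answer of EUR 4,000, the second loan would be set to EUR 4,000.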
The editing procedure followed in cases with three loans and skipped loop questions prior to the recording of the individual amounts outstanding, was similar to that used for two loans when loop questions were skipped. 44
All edits were again flagged correspondingly. The example laid out above covers all potential variants. Not all of these variants necessarily occur in every wave of the HFCS in Austria.
Summary questions
Every loop of questions ended with summary questions (see chart 2 in chapter 2). The variables for these questions exclusively contained information on any additional items beyond the first three (in some instances five) iterations per household. As indicated in chart 2, the summary questions were ultimately also put to all respondents who had refused to indicate the number of a given item in the household. In such cases of nonresponse, the information provided here was used for multiple imputations (chapter 5) and deleted from the dataset ex post.
4.6.2.5 Sight account balances and overdrafts
A few households misreported a negative balance on their household sight account as a negative value of sight accounts (HD1110). For this, however, a separate variable was available. In this area there were also occasional duplicate entries, as well as misplaced entries, that subsequently had to be edited.
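The recording convention for overdrawn sight accounts can be sketched as follows. The function name is hypothetical, and the sketch deliberately ignores duplicate and misplaced entries, which required case-by-case edits.

```python
def edit_sight_account(balance: float):
    """Apply the recording convention for negative sight account balances.

    Returns (sight_account_value, overdraft_liability). A negative balance
    is rewritten as a positive liability and the sight account value is
    set to zero; otherwise the liability is None, meaning "no edit".
    """
    if balance < 0:
        return 0.0, -balance
    return balance, None
```

A reported balance of EUR -350 would thus become a sight account value of zero and an overdraft liability of EUR 350.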
4.6.2.6 Rent variables
The HFCS questionnaire included questions on the amount of housing rent paid both excluding and including utilities. In the case of some households, the given rent excluding utilities was higher than, or equal to, rent including such costs, which is logically impossible as housing cannot be “run” free of charge. Some of these households entered just the utility costs under the item “Rent including utilities.” In the course of editing, these were added to the amount entered under “Rent excluding utilities” to obtain the “Rent including utilities.” In the case of other households, the “Rent including utilities” was set to missing and flagged for imputation, with the “Rent excluding utilities” serving as the lower bound to the “Rent including utilities.”
In addition, the item “Rent including utilities” was set as the upper bound for the variable “Rent excluding utilities” and used for imputations whenever the answer to the latter was not an amount (i.e. read “Don’t know,” “No answer” or “Rent excluding utilities unknown”) (see also section 5.4.6 on the use of bounds in the imputations).
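The two treatments of inconsistent rent answers can be summarized in a minimal sketch. The function name and the argument `incl_is_utilities_only` are hypothetical; the latter stands in for the analysts' case-by-case judgment that a household had entered only the utility costs under “Rent including utilities.”

```python
def edit_rent(rent_excl, rent_incl, incl_is_utilities_only=False):
    """Return (rent_excl, rent_incl, lower_bound_for_imputation).

    Answers are inconsistent if rent excluding utilities is higher than,
    or equal to, rent including utilities.
    """
    if rent_excl < rent_incl:
        return rent_excl, rent_incl, None  # consistent, no edit needed
    if incl_is_utilities_only:
        # Household entered only utilities: add them to the net rent.
        return rent_excl, rent_excl + rent_incl, None
    # Otherwise set to missing; net rent serves as the lower bound.
    return rent_excl, None, rent_excl
```

For example, answers of EUR 600 (excluding) and EUR 150 (including) would be edited to EUR 750 including utilities if the second figure was judged to be utilities only.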
4.6.2.7 Agricultural businesses
As defined in the HFCS, farmers are owners of an agricultural business. Separating the asset components of households that own an agricultural business sometimes posed a problem to respondents, in particular with regard to their main residence and the investments in their business. Such cases, therefore, had to be analyzed separately. In this context, the extra questions and guidance added to the questionnaire for the second wave and kept for the following waves (see also section 2.6.3) proved very helpful during the various steps of data processing.
Very few farmers did not report their agricultural business as an investment in self-employment businesses. For these households, data on investments in a self-employment business had to be imputed. The NACE code for such businesses was set to that for “agricultural businesses,” and at least the individual who stated that he/she worked as a farmer was deemed to be employed in this agricultural business. The legal form of the respective business was edited to read “sole proprietorship.” The additional guidance implemented for the second wave was again used in the fourth and continued to keep the number of such cases low.
For all farmers, additional auxiliary variables were created for the combined value of the main residence and the agricultural business (business assets) as well as for the main residence’s share in this amount. For households that were not able to separate their assets and specify the share themselves, information on the total value and on the main residence’s share was used. For households that had specified both the value of their main residence and that of their private business, as required, the combined value and the share of the main residence was calculated. If information was partially missing, it was flagged for imputation (see section 5.3).
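The relationship between the auxiliary variables for farmers can be sketched as follows. The function names are hypothetical, and the sketch assumes that the share is expressed as a fraction between 0 and 1.

```python
def combine(main_residence: float, business: float):
    """Return (combined value, main residence's share) from both values."""
    combined = main_residence + business
    return combined, main_residence / combined


def split(combined: float, residence_share: float):
    """Return (main residence value, business value) from total and share."""
    main_residence = combined * residence_share
    return main_residence, combined - main_residence
```

A main residence worth EUR 300,000 and business assets worth EUR 700,000 give a combined value of EUR 1,000,000 and a main residence share of 0.3; conversely, a total of EUR 1,000,000 with a share of 0.25 splits into EUR 250,000 and EUR 750,000.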
The category of agricultural businesses was subjected to case-by-case reviews. Particularly complex cases were clarified through follow-up queries and corrected where necessary.
4.6.2.8 Individual variables for investments in self-employment businesses
The variables for household members employed in a business owned by the household were edited as follows:
To be able to cover even unusually large households, variables were created for up to 18 individuals per household for the CAPI version of the questionnaire. The largest household successfully interviewed in Austria had only 7 members, however, so all variables in excess of that number were deleted from the dataset. Moreover, the coding was changed from yes/no questions for each household member (the type of coding used in Austria) to the list of individual IDs that were required for the internationally available dataset (which only contains six variables for individuals).
At the same time, all NACE codes for household members employed in the business were checked against the information contained in the P-file and corrected where necessary.
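The recoding from yes/no answers per household member to a list of individual IDs can be sketched as below. The function name, the input layout (a mapping from individual ID to a yes/no answer) and the error raised when more than six members are employed in the business are assumptions made for illustration; the text does not describe how such a case would be handled.

```python
def employed_member_ids(works_in_business: dict, slots: int = 6) -> list:
    """Convert yes/no answers per member into a fixed-length list of IDs.

    works_in_business: mapping of individual ID -> True/False.
    slots: number of ID variables in the target dataset (six in the
           internationally available dataset).
    """
    ids = [pid for pid, works in sorted(works_in_business.items()) if works]
    if len(ids) > slots:
        raise ValueError("more employed members than available ID variables")
    # Unused ID variables are left empty (None).
    return ids + [None] * (slots - len(ids))
```

For a household in which members 1 and 3 work in the business, the result is `[1, 3, None, None, None, None]`.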
4.6.2.9 Life insurance policies
Information on assets held in life insurance contracts was recorded through questions ensuring that the answers were both as precise as possible and prone to only a few errors. There was, for instance, no direct question on the total value of such assets, but rather a series of questions on the start of payments, the frequency of payments (monthly, yearly or single payment), the type of life insurance (benefits to be provided at the death of the policy holder or at a given date, or a hybrid form) and the amount of the current payments for every single life insurance contract in the household. For all life insurance policies with a set payout date and/or all hybrid policies, the value of the assets held in life insurance contracts was calculated as the sum total of all payments. In cases where one or several details were not given, the remaining observations were used as bounds for the values to be imputed. Insurance policies which do not pay out capital if the insured lives beyond the term period do not constitute wealth; they were therefore excluded from this calculation.
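A rough sketch of the "sum total of all payments" calculation is given below. It rests on several simplifying assumptions that are not taken from the text: payments are constant over time, the number of payments is counted in whole years between the start year and the survey year, and the labels for policy types and frequencies are hypothetical.

```python
PAYMENTS_PER_YEAR = {"monthly": 12, "yearly": 1}


def life_insurance_value(policy_type, start_year, frequency, payment,
                         survey_year):
    """Approximate the asset value of one contract as the sum of payments.

    Returns None for pure term policies ("death_only"), which do not
    constitute wealth and are excluded from the calculation.
    """
    if policy_type == "death_only":
        return None
    if frequency == "single":
        return float(payment)
    years_paid = survey_year - start_year
    return float(payment * PAYMENTS_PER_YEAR[frequency] * years_paid)
```

Under these assumptions, a policy with a set payout date, started in 2011 with monthly payments of EUR 50 and evaluated in 2021, would be valued at EUR 6,000.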
4.6.2.10 Income variables
The following categories (variable name in parentheses) of personal income were recorded separately for every member of the household who was 16 years of age or older:
- employee income (PG0100 and PG0110)
- income from self-employment (PG0200 and PG0210)
- income from public pensions (PG0300 and PG0310)
- income from private and occupational pension plans (PG0400 and PG0410)
- income from unemployment benefits (PG0500 and PG0510)
This information was supplemented by the following income categories that were recorded per household:
- income from public social transfers (HG0100 and HG0110)
- income from private transfers (HG0200 and HG0210)
- income from other private transfers (HG0250 and HG0260)
- income from real estate assets (HG0300 and HG0310)
- income from financial investments (HG0400 and HG0410)
- income from investments in self-employment businesses or partnerships (HG0500 and HG0510)
- income from other sources (HG0600 and HG0610)
In the case of the first four personal income categories, respondents could indicate their net income if they did not recall their gross annual income (see chapter 2). Likewise, respondents could indicate their net income from financial investments if they did not know their gross income in this category.
Where only a net amount was entered for individual incomes, the gross income was calculated with the aid of the Austrian finance ministry’s gross-to-net calculator, 45 based on information on the type of income, the structure of the household (with reference to the tax credits for single parents and single earners), the employment status and age of any children, the province and the respondent’s employment status (employed (holding a “blue collar” manual job; or an office, sales or services job) or retired). 46 Wherever both parents were gainfully employed, the single earner’s tax credit was assigned to the main earner, i.e. the parent with the higher income (as long as the legal requirements were fulfilled and the partner did not earn more than EUR 6,000 per annum). COVID-19 related family support was also appropriately taken into account.
Given the far greater scope for tax deductions for self-employed people, the gross-to-net conversion of income from self-employment was not generally based on the precise figures. Precise conversions were made only for gross annual incomes below EUR 11,000, which are tax-free, albeit sometimes subject to social insurance costs. For such cases the gross amount was set equal to the net. For all other gross (or net) incomes from self-employment (22 individuals), 47 a range was created for the purpose of imputing specific amounts by adding EUR 10,000 to and by subtracting EUR 10,000 from the gross (or net) amount converted subject to the conditions for employees in office, sales or service jobs. This range reflected the uncertainty that such a conversion entails, without losing the important information of the actual range within which the value is placed. This range then was used as a bound in the imputation of a precise value (see section 5.4.3).
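The construction of imputation bounds for self-employment income can be sketched as follows. The function name is hypothetical, and the converted amount is passed in as an argument because the gross-to-net calculation itself (under the conditions for employees in office, sales or service jobs) is not reproduced here.

```python
TAX_FREE_THRESHOLD = 11_000  # EUR, annual income below this is tax-free
HALF_WIDTH = 10_000          # EUR added to and subtracted from the amount


def self_employment_bounds(reported: float, converted: float):
    """Return (lower, upper) bounds for the converted income.

    reported: the gross (or net) amount given by the respondent.
    converted: its counterpart obtained under the employee conditions.
    """
    if reported < TAX_FREE_THRESHOLD:
        # Below the threshold, gross is set equal to net: a precise value.
        return reported, reported
    return converted - HALF_WIDTH, converted + HALF_WIDTH
```

A reported net income of EUR 30,000 with a converted gross amount of, say, EUR 42,000 would thus enter the imputation with bounds of EUR 32,000 and EUR 52,000.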
To calculate the gross income from financial investments, 25% withholding tax was added to amounts given for net income.
When individuals reported income from more than one type of employment, a slightly more complex method for calculating the gross and net income values had to be applied to account for the complexities of the Austrian tax system in such situations. For instance, individuals could receive their main income from a job with the status of employee and additionally earn some money from self-employed work. Other common cases included pensioners who topped up their public pension with income from self-employment or employment. When converting gross income to net, all forms of an individual’s income were taken into account. Sometimes respondents reported their net income from one source and their gross income for another. The gross and net values were approximated by converting one of the values to its counterpart gross or net value, applying the conversion to the sum of both net or both gross and using this amount to calculate the still missing net or gross value of the income not used in the initial step.
If the net amount was only recorded as a range, the upper and lower bounds were converted into gross values that were subsequently used in the imputations. All converted values were flagged “3051.”
Status of response (flag) | Number of persons | Share in % |
---|---|---|
Number of persons receiving employee income | 1,782 | 100 |
Answer recorded, complete observation (flag 1) | 989 | 55.5 |
Not imputed, originally: “Don’t know” (flag 1050) | 30 | 1.7 |
Not imputed, originally: “No answer” (flag 1051) | 23 | 1.3 |
Not imputed, originally not collected due to higher order missing (flag 1052) | 3 | 0.2 |
Not imputed, originally collected from a range (flag 1053) | 52 | 2.9 |
Not imputed, collected value deleted (flag 1054) | 5 | 0.3 |
Not imputed, set to missing due to incorrect answer to higher order question (flag 1056) | 10 | 0.6 |
Not imputed, collected value deleted, but range information available (flag 1057) | 24 | 1.3 |
Edited, set to modified value as considered incorrect or unreliable (flag 3050) | 3 | 0.2 |
Edited, adjusted on the basis of other information obtained in the (national) survey (flag 3051) | 626 | 35.1 |
Edited, adjusted on the basis of verbatim records (flag 3052) | 1 | 0.1 |
Edited, set on the basis of follow-up with the household (flag 3075) | 16 | 0.9 |
Source: HFCS Austria 2021, OeNB.
Using flags as a basis, table 4 gives an indication of the number of edits relating to employee income. The table also illustrates the use of flag variables (see also section 4.5).
The question on the amount of employee income received (variable PG0110) was put to a total of 1,782 individuals. 989 respondents (55.5%) expressed their annual income in gross terms. A further 30 respondents (1.7%) answered “Don’t know” and 23 individuals (1.3%) opted for “No answer.” 52 respondents (2.9%) specified their income amount using a range. For 24 individuals, a range could be calculated from other information given, i.e. the range was given for net income and converted. The responses of 10 individuals (about 1%) were edited and mostly set to missing and flagged for imputation. 626 of the respondents (35.1%) provided their net income, which was then converted as described earlier. The responses of the remaining 16 individuals (around 0.9%) were corrected on the basis of follow-up queries.
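The shares reported in table 4 can be reproduced from the flag counts with a short tabulation. The sketch below is only illustrative: the counts are copied from table 4, while the variable names are hypothetical and not part of the HFCS data.

```python
# Number of persons per flag value for employee income (PG0110), as in table 4.
flag_counts = {
    1: 989, 1050: 30, 1051: 23, 1052: 3, 1053: 52, 1054: 5,
    1056: 10, 1057: 24, 3050: 3, 3051: 626, 3052: 1, 3075: 16,
}

# Total number of persons who were asked the question.
total = sum(flag_counts.values())

# Share of each flag value in percent, rounded to one decimal place.
shares = {flag: round(100 * n / total, 1) for flag, n in flag_counts.items()}

for flag, n in flag_counts.items():
    print(f"flag {flag:>4}: {n:>5} persons, {shares[flag]:>5.1f} %")
```

The same tabulation, applied to the flag variable of any other item, indicates how many observations were recorded directly, edited or left to be imputed.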
4.6.2.11 ISCO and NACE classification
As required by the euro area blueprint questionnaire, the main occupation of respondents was recorded (in variable PE0300) using the occupation codes and titles set out in the International Standard Classification of Occupations (ISCO-08). Making individual members of each household classify their jobs themselves, however, would have been extremely difficult for respondents without any advance knowledge of the ISCO codes, possibly giving rise to misclassifications. Therefore, verbatim answers were recorded for the Austrian HFCS question on job titles and/or main job tasks. That information was later paired with the corresponding ISCO-08 codes, as published by Statistics Austria (in German). 48 As required by the ECB, classification was based on the two-digit ISCO codes (major subgroups). To this end, the verbatim record of the job title and related main tasks was supplemented with individual data relevant for ISCO classification (in particular, the respondent’s qualifications and the main activity of the company where the respondent worked). The variable PE0300 to be submitted to the ECB was first flagged “3051” (Edited, adjusted on the basis of other information obtained in the (national) survey) and aggregated in a next step (see section 4.7).
Also, the main activity of the company (PE0400) where the respondent worked was first recorded verbatim and then assigned a single-digit NACE rev. 2 code. 49
4.6.2.12 Highest education qualification
To account for the latest developments in the International Standard Classification of Education (ISCED), the highest education qualification of all household members has been recorded in substantially more detail since the second HFCS wave. Respondents were not asked about ISCED categories, however, but were prompted to indicate their qualification based on Austria’s academic degree hierarchy (bachelor, master (including previous versions) and doctorate). Some respondents may have chosen the bachelor degree incorrectly. As this is a fairly new degree in Austria, the answers of respondents aged 50 or above were double-checked and, in case of a likely inconsistency, edited to a master degree (or an earlier version); the corresponding variable was flagged “3051.”
4.6.2.13 Exclusion of successful interviews
A total of 44 households with a successful interview were dropped from the final data because they either did not belong to the target population or the quality of their interview in terms of item nonresponse did not meet the required standard. These interviews included the work of the one interviewer who had to be excluded from interviewing in the HFCS (see also section 3.7).
4.6.2.14 CAPI errors encountered with the questionnaires
A few edits resulted from CAPI errors encountered during interviews.
- A few young persons below the age of 16 did not reach the questions on intra-household asset distribution (ahd1940x, ahd1950x), although they should have been asked for this information. This CAPI error was corrected by setting the observations to missing so that they could be imputed.
- One household did not reach the summary question for uncollateralized loans ((a)hc1200); this was also corrected in the imputation model.
- The CAPI mistake for fixed intervals on business participation (hd0801i) yielded more information than necessary, and this surplus information was therefore deleted.
- The wrong fixed interval list for questions of gains / losses in income / consumption during the beginning of the COVID-19 pandemic (ahv0210, ahv0610) was appropriately taken care of in the editing process.
- Finally, the missing information of pe0470 due to a CAPI error was left missing.
4.7 Formatting and editing after multiple imputations
Any information collected at a greater degree of granularity in Austria than required for the international dataset was processed further upon imputation so as to bring the level of aggregation into line with the international requirements. The most important aggregations can be summarized as follows:
- Marital status: The categories “Married and living together with spouse” and “Married, but separated” were aggregated as “Married.”
- Education: Categories specific to Austria were paired with ISCED 50 2011 codes 51 and classified as ISCED level 0 (“Early childhood education or no education”) 52 ; ISCED level 1 (“Primary education”); ISCED level 2 (“Lower secondary education”); ISCED level 3 (“Upper secondary education”); ISCED level 4 (“Post-secondary nontertiary education”); ISCED level 5 (“Short-cycle tertiary education”); ISCED level 6 (“Bachelor’s or equivalent level”); ISCED level 7 (“Master’s or equivalent level”); and finally ISCED level 8 (“Doctoral or equivalent level”).
- Employment status/relationship: More detailed categories were aggregated.
- Main residence – tenure status: More detailed categories were aggregated.
- Reasons for refinancing: With regard to collateralized loans, “For the conversion of a foreign currency loan” was available in Austria as an additional category for this variable. This category was added to “Other” in the international dataset.
- Loan repayments: The installments for repaying (secured and unsecured) bullet loans were set to “0” as such loans are repaid with a single lump sum upon maturity. Assets accumulated for repayment can be analyzed on the basis of variables that are specific to Austria.
- Use of additional real estate property: In this variable, “Buy-to-let apartment” was available as an additional category in Austria; in the international core dataset, this category was added to “Other.”
- For the international version the question on loans secured with further property was structured differently than in the previous waves in that loans secured by real estate were asked together with different characteristics of the property instead of in a separate loop as before. The Austrian questionnaire kept the previous structure of asking about further properties and any loans secured with these in two separate loops. To bring the data into the structure required by the international dataset, a question on which property was used to secure a particular loan was used to link the two loops.
- Due to the high complexity of bringing the flags of the national data into the structure required by the international dataset, the flags in the international dataset corresponding to the variables concerning loans secured by other real estate next to the household’s main residence were simplified by using the flag value “0” if the national value was missing and “13” otherwise.
- Purpose of private and noncollateralized loans: The categories “To finance a deposit for the housing association” and “To support friends and family” were allocated to “Other.”
- Rejection of a loan application: Information of this type was recorded in three variables with multiple responses; it was then aggregated into two variables.
- Business’ legal form: More detailed categories were aggregated.
- Money in savings plans with building and loan associations and life insurance policies: Data recorded on these two investment methods are aggregated into savings (HD1200 and HD1210).
- Type of assets received (survey questions on inheritances and gifts, HH030$a-i): The sequence based on values was abandoned.
- Provider of assets (survey questions on inheritances and gifts): More detailed categories were aggregated.
- Relative change of income due to the COVID-19 pandemic: This question was not asked but calculated based on net income and the absolute change of income at the household level due to the COVID-19 pandemic.
- Purpose of saving: The sequence ordered by relevance was abandoned.
- Relative change of consumption due to the COVID-19 pandemic: This question was not asked but calculated based on the absolute change of consumption at the household level due to the COVID-19 pandemic.
- Type of dwelling (paradata): More detailed categories were aggregated.
In Austria, more specific flags were used in some areas (see also section 4.5). To conform to international standards, these flags were, in general, aggregated as follows:
- flag “1057” was recoded as “0”
- flag “1058” was recoded as “1050”
- flags “3075”, “3076”, “1075” and “2” were recoded as “1”
- flags “1055” and “1056” were recoded as “1050”
- flags “4055” and “4056” were recoded as “4052”
- flag “3051” was recoded as “1”
- flag “3052” was recoded as “1”
- flag “3053” was recoded as “0”
- flag “4057” was recoded as “4053”
- flag “4058” was recoded as “4050”
The following variable is an exception to this rule:
- The variable featuring rent excluding utilities (HB2300) was flagged “1075” to identify the special cases when only the rent including utilities was known to respondents. These costs were subsequently imputed, with the rent including utilities serving as bounds, and reflagged “4053.”
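The recoding rules listed above can be expressed as a simple lookup. The following sketch is illustrative only: the mapping is copied from the list, whereas the function name, the treatment of unlisted flags (passed through unchanged) and the handling of the HB2300 exception in the same step are assumptions made for this example.

```python
from typing import Optional

# National flag value -> flag value in the international dataset (from the list above).
NATIONAL_TO_INTERNATIONAL = {
    1057: 0, 1058: 1050,
    3075: 1, 3076: 1, 1075: 1, 2: 1,
    1055: 1050, 1056: 1050,
    4055: 4052, 4056: 4052,
    3051: 1, 3052: 1, 3053: 0,
    4057: 4053, 4058: 4050,
}


def recode_flag(flag: int, variable: Optional[str] = None) -> int:
    """Map a national flag value to its international counterpart."""
    # Exception: rent excluding utilities (HB2300) flagged 1075 is imputed
    # with the rent including utilities as a bound and reflagged 4053.
    if variable == "HB2300" and flag == 1075:
        return 4053
    # Assumption: flags not named in the list are left unchanged.
    return NATIONAL_TO_INTERNATIONAL.get(flag, flag)
```

For instance, `recode_flag(3051)` returns 1, while `recode_flag(1075, variable="HB2300")` returns 4053.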
In analogy to the first waves, some of the additional data over and above those of the ECB’s HFCS datasets, which are collected at the national level and contain all the variables specified by the ECB, will probably be available from the OeNB as of summer 2023. The additional information includes additional variables, as well as a more detailed breakdown of certain variables. Datasets may be merged on the basis of both the identification numbers and imputation numbers. Because flags were aggregated, the internationally available data contain less flag information for Austria than the national data. Users who need more detailed flag information can contact the HFCS team at the OeNB.
4.8 Concluding remarks and online appendix
The underlying rationale of editing was to edit only those observations that had clearly not been recorded correctly. In cases of ambiguity, the first possibility considered was to conduct ex post checks by phone. This option allowed many observations either to be corrected or to be confirmed.
Knowledge of the steps undertaken to check the consistency of the data is essential both for analyzing the data and for understanding how the observations originated. In addition, the use of flags makes it possible for users to develop an imputation model of their own, to dispense with imputations, or to resolve the problem of item nonresponse in another way.
The online appendix contains the information provided here on the edits and consistency checks applied in the HFCS in Austria, including a list of the consistency checks programmed into the digital version of the questionnaire. 53
31 See e.g. Kennickell (2011) and Bledsoe and Fries (2002) for information on the editing measures used in the Federal Reserve’s Survey of Consumer Finances.
32 The cleanup statistics do not reflect irrelevant variables cleaned up following the skipping of certain questions in a loop (see also sections 2.6.1 and 4.6.2.4).
33 Examples given in this chapter are in italics.
34 A list of all the consistency checks that were programmed into the digital version of the questionnaire can be found in the online appendix.
35 The data were cross-checked against the results of the first HFCS waves as well as external data sources such as the EU-SILC (conducted by Statistics Austria).
36 Only one observation for one household needed to be corrected in a follow-up enquiry due to this error.
37 All HFCS variables were assigned value labels that explain the coding. The coding of the individual variables is also included in the questionnaire (available in the online appendix).
38 See chapter 2 for details of the structure of the whole questionnaire.
39 See www.oenb.at/isaweb/report.do;jsessionid=31767F3B9E6FA661A8A4CD5CB700B5A7?report=2.12 (accessed on June 9, 2023).
40 See https://www.oenb.at/Statistik/Standardisierte-Tabellen/zinssaetze-und-wechselkurse/Wechselkurse.html (accessed on June 9, 2023).
41 See http://fxtop.com/en/historical-exchange-rates.php?MA=1 (accessed on June 9, 2023) for historical exchange rates.
42 Where the amounts given were not identical, the one specified in response to the loop questions on the first loan was deemed to be more relevant than that given as an answer to the summary question. The reasoning behind this procedure is that the loop questions relating to the first loan contained a question explicitly asking for the amount outstanding on an unsecured loan, so the amount given there was regarded as more trustworthy.
43 In the HFCS wave in Austria, only two households opted for this route.
44 In the fourth wave in Austria there was no such case.
45 See https://onlinerechner.haude.at/BMF-Brutto-Netto-Rechner/ (accessed on June 9, 2023). Available in German only.
46 Apprentices were categorized as manual workers in the conversion, while civil servants were treated as employees in office, sales or service jobs on grounds of their more favorable taxation.
47 These cases only include individuals whose income is sourced only from self-employment. Cases where there are further sources of income have to be treated differently as is discussed later in this section.
48 For further information, see the Statistics Austria classification database https://www.statistik.at/en/databases/classification-database (accessed on June 9, 2023) in the “Occupations” section.
49 For further information, see the Statistics Austria classification database https://www.statistik.at/en/databases/classification-database (accessed on June 9, 2023) in the “Economic activities” section.
50 International Standard Classification of Education.
51 For explanations, see https://www.klassifikationsdatenbank.at/KDBWeb/kdb.do?FAM=BILD&&NAV=EN&&KDBtoken=null (accessed on June 9, 2023).
52 Only very few of the participants in the fourth wave in Austria had completed less than the compulsory education.
53 All documents included in the online appendix are available at www.hfcs.at.
5 Multiple imputations
5.1 Introduction
A common problem with voluntary surveys is item nonresponse, i.e. the fact that some survey participants do not answer all questions. 54 This is especially the case with surveys that pose complicated or sensitive questions (e.g. about income or wealth).
Disregarding the problem of missing information due to item nonresponse would lead to biased estimates. For the HFCS data, we therefore used multiple imputation with chained equations.
The idea behind this approach is to substitute missing values in the dataset with several values that have been estimated based on an iterative Bayesian model. The main aim of this procedure is to impute in such a way that the associations between all variables are preserved in terms of maintaining the correlation structure of the dataset. Under this approach, the missing values of each variable are estimated by taking into account a maximum number of available variables. To account for the uncertainty of the missing values, not just one value per missing value is imputed, but several (in the case of the HFCS, five).
This data imputation approach is also used by similar surveys, such as the U.S. Survey of Consumer Finances (SCF – see Kennickell, 1998) and the Spanish Survey of Household Finances (EFF – see Barceló, 2006).
As multiple imputation is a very time-consuming process, most institutions that carry out surveys, including the HFCS, provide users with datasets that are already imputed. This ensures that all users can work with the same imputed datasets. In the case of the HFCS, users can identify every imputed value of any variable by looking at the corresponding flag variable (section 4.5). Thus, they can carry out nonresponse analyses or imputations on their own, or use other methods for dealing with item nonresponse in their analyses.
This chapter is structured as follows: In section 5.2, we present data on item nonresponse in the HFCS. Section 5.3 describes the imputation procedure used, and in section 5.4 we explain the imputation model specification and how the imputations were executed. Finally, some imputation results are presented in section 5.5.
5.2 Item nonresponse
Table 5 shows selected statistics on item nonresponse for the interviews from the fourth wave. On average, each household has 17.3 missing values, which means that item nonresponse was limited to 0.9% of all the questions (variables) addressed to each household. However, the respective percentage for the euro variables amounts to 3.9%. This suggests that questions of this kind might be perceived as sensitive or difficult to answer. These figures are below the ones of the third wave (for more information on the changes due to COVID-19 see box 3 and chapter 10).
There are different ways of analyzing datasets that include variables with missing values. 55 In most statistical packages, the default method is the complete-case analysis method. This method entails deleting all households that have missing values in any of the variables of interest and basing the analyses solely on complete observations. However, the loss of information resulting from this method leads to two problems: First, it biases estimates if complete observations differ systematically from incomplete ones; second, even if an estimate is unbiased, the estimation would be less precise due to the observations lost. To illustrate how significant the loss of information would be in the case of the HFCS, table 6 shows item nonresponse rates across some selected variables.
Statistic | Mean | Median | Minimum | Maximum |
---|---|---|---|---|
Number of variables asked | | | | |
all variables | 1,917.8 | 1,947.0 | 1,382 | 2,241 |
euro variables | 117.6 | 122.0 | 42 | 166 |
Number of variables with missing values | | | | |
all variables | 17.3 | 8.0 | 0 | 370 |
euro variables | 4.6 | 2.0 | 0 | 57 |
Share of variables with missing values in % | | | | |
all variables | 0.9 | 0.4 | 0 | 19 |
euro variables | 3.9 | 1.6 | 0 | 47.6 |
Source: HFCS Austria 2021, OeNB.
Note: Interval responses are considered as missing values with regard to the corresponding euro variable and are not included as a separate variable. A question addressed to several household members is entered as several variables, one for each household member.
For example, when asked about the value of their main residence, 84.3% of households provided a specific amount (column 3 of table 6). The other 15.7% of households are item nonrespondents: Either they provided a (prespecified or individual) range (11.4%, column 4), responded with “Don’t know” or “No answer” (4.0%, column 5) or their response was set to missing 56 (0.4%, column 6) in the editing process. Nonresponse rates 57 vary substantially across items. Variables with high nonresponse rates include e.g. the monthly amount paid as rent excluding utilities (100% – 65.3% = 34.7%) and the household’s gross income from financial investments (100% – 57.2% = 42.8%). However, 34.5% and 32.1% of households provided at least a range for these two questions, which confirms the importance of asking range questions when a euro question remains unanswered. Range questions provide valuable and often very precise information (see the online appendix and section 2.6.2 for the questionnaire and information on the design of euro loops). Variables with low nonresponse rates include, for example, the amount spent on food consumed at home (100% – 91.8% = 8.2%).
Variable | Household has item: Yes (1) | Household has item: Unknown (2) | Responses by households that have the item: Amount (3) | Range (4) | “Don’t know”/“No answer” (5) | Other missing values¹ (6) |
---|---|---|---|---|---|---|
Value of main residence² | 43.0 | 0.0 | 84.3 | 11.4 | 4.0 | 0.4 |
HMR mortgage 1: amount still owed | 11.1 | 0.2 | 85.5 | 5.5 | 9.0 | 0.0 |
Monthly amount paid as rent excluding utilities | 49.5 | 0.0 | 65.3 | 34.5 | 0.2 | 0.0 |
Other property 1: current value | 9.8 | 0.2 | 78.2 | 15.6 | 4.9 | 1.3 |
Other property mortgage 1: amount still owed | 1.2 | 0.0 | 88.9 | 3.7 | 7.4 | 0.0 |
Value of sight accounts | 99.7 | 0.0 | 83.2 | 8.6 | 8.0 | 0.1 |
Value of saving accounts | 98.7 | 1.3 | 81.8 | 6.0 | 7.5 | 4.7 |
Value of publicly traded shares | 4.9 | 0.3 | 68.8 | 14.3 | 15.2 | 1.8 |
Amount owed to household | 6.0 | 0.1 | 87.0 | 3.6 | 9.4 | 0.0 |
Employment status (main activity) (person 1) | 100.0 | 0.0 | 100.0 | 0.0 | 0.0 | 0.0 |
Gross employee income (person 1) | 41.9 | 0.0 | 93.5 | 2.2 | 3.4 | 0.8 |
Gross income from unemployment benefits (person 1) | 4.8 | 0.0 | 85.3 | 8.3 | 6.4 | 0.0 |
Gross income from financial investments | 61.4 | 5.2 | 57.2 | 32.1 | 10.2 | 0.6 |
Gift/inheritance 1: value | 37.1 | 1.2 | 81.5 | 8.4 | 10.0 | 0.1 |
Amount spent on food at home | 100.0 | 0.0 | 91.8 | 8.1 | 0.1 | 0.0 |
Source: HFCS Austria 2021, OeNB.
¹ Missing values due to editing measures and exits from loops.
² Based on the HB0900 variable.
Note: All values are percentages. HMR = household main residence.
Table 6 (column 2) also shows another aspect of item nonresponse in the HFCS: There are variables known as branch variables (see chart 3 in chapter 4) that may also have missing values due to nonresponses to a higher-order question (head variable) and that are thus set to missing. For example, before the euro question on gross income from financial investments is asked, households are asked a yes/no question determining whether they have this type of income or not. Only those that answer affirmatively (61.4%) are then asked the question on the amount of income; the other households, including the 5.2% of households that did not answer the yes/no question, automatically skip the euro question. As it is unknown, however, whether the 5.2% of households that did not answer the yes/no question have a positive gross income from financial investments or not, their nonresponses must also be considered as second-order (or higher-order) missing values when analyzing nonresponse to a euro question.
Covariates | Coefficient | Standard error |
---|---|---|
Female (person 1) | 0.0298 | (0.116) |
Age (person 1) | 0.00832* | (0.00504) |
Tertiary education level (person 1) | –0.0465 | (0.144) |
Employed/self-employed (person 1) | –0.0997 | (0.167) |
Residence is in Vienna | 0.389*** | (0.136) |
Size of main residence | 0.00423*** | (0.00107) |
Household size | 0.0841 | (0.0619) |
Constant | –2.780*** | (0.399) |
Observations¹ | 2,261 | |
Source: HFCS Austria 2021, OeNB.
¹ The remaining 32 observations of the dataset show missing values in one of the covariates and/or filter missing remarks in the dependent variable and are thus not included in the regression.
Note: Standard errors in parentheses; *** p<0.01, ** p<0.05, * p<0.1.
Thus, if a complete-case analysis were to be carried out with the HFCS data, the loss of information and the resulting loss in precision of unbiased estimates would be considerable owing to the large amount of variables with higher-order missing values. Furthermore, as complete observations usually differ systematically from incomplete ones, complete-case analysis would bias the estimates.
For illustrative purposes, table 7 shows a regression of nonresponse to the question regarding the balance of sight accounts (“1” if the value is missing, “0” otherwise) on several explanatory variables. We can see that item respondents differ significantly from item nonrespondents: respondents tend to be younger, to live in smaller main residences and not to live in Vienna. Respondents also tend to live in smaller households, although this coefficient is not statistically significant. Thus, a complete-case analysis of the value of sight accounts would bias the estimates toward a population with these household characteristics.
The COVID-19 pandemic and multiple imputations
The HFCS multiple imputation procedure did not change in wave 4 compared to the previous waves. However, the COVID-19 pandemic influenced the item nonresponse rates and, consequently, had an impact on some parameters of the imputation model.
While unit nonresponse increased from wave three to wave four (see box 5 in chapter 7 and chapter 10.7), item nonresponse rates decreased. The number of households that participated in wave 4 was smaller than in wave 3. But the households that did participate in the survey in wave 4 tended to answer more items on average. Among all euro variables in the survey, the average share of euro variables with missing values per household decreased from 4.9% to 3.9% (see also table 1 in chapter 10.5 for more details).
Consequently, a lower number of variables had to be imputed. The total number of variables imputed decreased by 134, from 907 in wave three to 773 in wave four (see table 2 in chapter 10.5 for more details).
However, at the same time, due to the smaller net sample size in wave 4 resulting from the increase in unit nonresponse, the number of variables that could not be imputed with regressions because of insufficient variance or observations increased. These variables were imputed with ad hoc methods, such as hotdeck imputation, after the regular HFCS imputation procedure had been completed. Among the variables imputed in both surveys, 5.2% were imputed using ad hoc methods in wave 4 compared to 3.4% in wave 3. Despite this, these variables are typically not used as predictors in the chained regression equations from the regular HFCS imputation procedure, and, therefore, the broad conditioning approach applied in the selection of predictors is still valid.
5.3 HFCS imputation procedure
To impute HFCS data, we have chosen a procedure implemented in the statistics software Stata by Royston (2004) in which all variables to be imputed are estimated in regression equations (chained equations). 58 It can be summarized in the following steps: 59
- Step 1: Select the P variables Y1, Y2, …, YP to be imputed.
- Step 2: Fill the missing values of Y1, Y2, …, YP with randomly selected values which were actually observed.
- Step 3: For each Y1, Y2, …, YP
  - run a Bayesian regression of the variable to be imputed on a broad set of independent variables, which is chosen from among the HFCS variables without missing values and the variables selected in step 1 (except the one being regressed); the regression sample is restricted to those observations that are not missing in the dependent variable;
  - randomly draw a vector of regression parameters from their posterior distribution;
  - calculate the corresponding predicted values and use them as the imputed values;
  - replace the missing values of the imputed variable with its imputed values.
- Step 4: Repeat step 3 t times. Each time, replace previous imputed values with updated ones obtained from the latest regression. This creates the first imputation sample (or implicate).
- Step 5: Repeat steps 3 and 4 M times independently to obtain M imputation samples.
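The five steps above can be sketched in Python as follows. This is a minimal illustration, not the Stata implementation used for the HFCS: it assumes that all variables are continuous and modeled with a normal linear regression under a flat prior, uses every other column as a predictor, and adds residual noise to the predicted values. The function names, the default of 10 cycles and the seed are hypothetical; only the five implicates are taken from the text.

```python
import numpy as np


def draw_imputations(y_obs, X_obs, X_mis, rng):
    """One posterior draw from a normal linear regression (flat prior)."""
    n, k = X_obs.shape
    xtx_inv = np.linalg.inv(X_obs.T @ X_obs)
    beta_hat = xtx_inv @ X_obs.T @ y_obs
    resid = y_obs - X_obs @ beta_hat
    # Draw the residual variance, then the coefficients given that variance.
    sigma2 = resid @ resid / rng.chisquare(n - k)
    chol = np.linalg.cholesky(xtx_inv)
    beta = beta_hat + np.sqrt(sigma2) * (chol @ rng.standard_normal(k))
    # Predicted values plus residual noise (the noise term is an assumption).
    return X_mis @ beta + rng.normal(0.0, np.sqrt(sigma2), size=X_mis.shape[0])


def chained_imputation(data, cycles=10, implicates=5, seed=0):
    """Return a list of `implicates` completed copies of `data` (NaN = missing)."""
    rng = np.random.default_rng(seed)
    missing = np.isnan(data)
    # Step 1: the variables to be imputed are the columns with missing values.
    targets = [j for j in range(data.shape[1]) if missing[:, j].any()]
    results = []
    for _ in range(implicates):              # step 5: M independent implicates
        filled = data.copy()
        for j in targets:                    # step 2: random observed values
            observed = data[~missing[:, j], j]
            filled[missing[:, j], j] = rng.choice(observed, size=missing[:, j].sum())
        for _ in range(cycles):              # step 4: repeat step 3 t times
            for j in targets:                # step 3: one regression per variable
                others = np.delete(filled, j, axis=1)
                X = np.column_stack([np.ones(len(filled)), others])
                obs = ~missing[:, j]
                filled[~obs, j] = draw_imputations(filled[obs, j], X[obs], X[~obs], rng)
        results.append(filled)
    return results
```

In this sketch each implicate restarts from its own random fill, so the implicates are independent of each other. Observed values are never overwritten; only the cells that were missing in the input change from cycle to cycle.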
The basic idea behind this procedure is to impute missing values for each of the P variables with missing values by drawing predictions based on a Bayesian regression model specific to that variable (step 3). To preserve the associations between variables with missing (true) values and variables with complete observations, each regression model contains a broad set of independent variables with complete observations.
Furthermore, the procedure is multivariate in the sense that the estimation of the missing values is repeated (t times); variables that are being conditioned on in each regression are replaced by the observed values or those currently being imputed (step 4). It is important that each regression model also contains a broad set of independent variables with missing values in order to preserve the joint distribution of variables with missing values. If t tends to infinity, the imputations of missing values of Y1, Y2, …, YP in each cycle are expected to converge to an approximation of a draw from their joint posterior predictive distribution.
In the final step (step 5), the procedure provides multiple imputations of each missing value by repeating steps 3 and 4 M times independently. This is done to take into account the uncertainty of the imputed values when estimating any variances with imputed variables with missing values. The M imputations of the missing values of Y1 ,Y2 , …, YP are expected to converge to an approximation of M draws from the joint posterior predictive distribution of the missing values.
Although it is theoretically possible that the sequence of draws based on the regressions above might not converge to a stationary predictive distribution, simulation studies provide evidence that the approach yields estimates that are unbiased (Van Buuren et al., 2006). Furthermore, separate regressions for each variable reflect the data better, given that the HFCS data contain a large number of variables, many of which have bounds, skip patterns, bracketed (i.e. range) responses, interactions or constraints in relation to other variables. This approach thus makes more sense than specifying a joint distribution for all variables together, as is the case for example in the joint modeling approach. 60
It should be noted that the HFCS imputation procedure is based on the assumption that the nonresponse probabilities of variables with missing values are only dependent on observed information – never on unobserved information such as the variables with missing values themselves. In the literature this assumption is referred to as the ignorability assumption.
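In one common notation (introduced here for illustration and not taken from the report), let R denote the matrix of response indicators and let the data be split into an observed part and a missing part. The condition described above, usually called missing at random, can then be written as:

```latex
\Pr\left(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}\right)
  = \Pr\left(R \mid Y_{\mathrm{obs}}\right)
```

That is, once the observed information is conditioned on, the probability of nonresponse carries no further information about the missing values themselves.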
Before running through the five steps above, we need to prepare the data and specify all the parameters of our imputation model: e.g. the selection of variables to be imputed, the imputation order, the regression model for each variable, the number of cycles t, the number of imputation samples M, etc. The next section describes how this was done.
5.4 Creating the imputations
5.4.1 Choosing the variables to be imputed
In step 1 of the HFCS imputation procedure, we have to select the variables Y1 , Y2 , …, YP to be imputed. Our strategy is to impute as many variables with missing values as possible, which amounts to around 65% of such variables (or 91% of all missing values). This proportion is about the same as in the last wave and in any case includes all balance sheet variables. The remaining variables with missing values are not imputed with the HFCS imputation procedure due to insufficient variance or a lack of sufficient observations to run a regression. 61
The imputation of as many variables as possible is intended to minimize the number of cases in which users are forced to conduct a complete-case analysis with HFCS data because the variables they are interested in have not been imputed. Another important reason for adopting this strategy is that we do not want to bias the correlation structure of the data with our imputations. If we were to reject many variables for imputation, we could not use them in the regression models as independent variables with missing values either, and we would thus bias the associations between the unimputed variables with missing values and the imputed ones.
5.4.2 Imputation order
As mentioned in the section on the HFCS imputation procedure, one of the weaknesses of the procedure is that it does not enable us to prove, in theoretical terms, that the sequence of drawn predictions based on the Bayesian regressions converges to a stationary predictive distribution. In practice, however, it has been found that choosing a particular order of Y1, Y2, …, YP often aids convergence. Therefore, we order the variables to be imputed by their degree of missingness, starting with the variables with the fewest missing values and ending with the variables with the most missing values. Variables with the same degree of missingness are imputed in a fixed random order. Head variables are always imputed before their corresponding branch variables. For example, the variable indicating whether a household has a mortgage or not is always imputed before the mortgage amount is imputed, even if the degree of missingness is the same for both variables.
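The ordering rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `imputation_order`, the input dictionaries, the variable names and the seed are hypothetical and not taken from the HFCS production code; only the rule itself (ascending missingness, a fixed random order for ties, head before branch) comes from the text.

```python
import random

def imputation_order(missing_share, head_of, seed=2021):
    """Return the variables ordered for imputation.

    missing_share: dict mapping variable name -> share of missing values
    head_of:       dict mapping branch variable -> its head variable
    Both inputs and the seed are hypothetical illustrations.
    """
    rng = random.Random(seed)  # fixed seed, so ties are broken in a fixed random order
    tie_break = {v: rng.random() for v in sorted(missing_share)}
    ranked = sorted(missing_share, key=lambda v: (missing_share[v], tie_break[v]))

    ordered, placed = [], set()

    def place(var):
        if var in placed:
            return
        head = head_of.get(var)
        if head is not None and head in missing_share:
            place(head)  # a head variable always precedes its branch
        placed.add(var)
        ordered.append(var)

    for var in ranked:
        place(var)
    return ordered

# Hypothetical example: mortgage indicator (head) and mortgage amount (branch)
shares = {"has_mortgage": 0.05, "mortgage_amount": 0.05, "food_at_home": 0.01}
order = imputation_order(shares, {"mortgage_amount": "has_mortgage"})
```

In this example the head variable is placed before its branch even though both have the same degree of missingness, which mirrors the mortgage example in the text.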
5.4.3 Types of regression models
In step 3, we define a regression model for each variable to be imputed. Depending on the type of the variable, we choose from four different types of regression models. For continuous variables, we use a range regression model, 62 because all of our continuous variables are bounded either from above or from below, or both (see section 5.4.6 for more details). For binary variables, we use a logit model; and for ordinal and nominal variables, we use ordered logit and multinomial logit models, respectively. 63
5.4.4 Use of weights in regressions
Generally speaking, there is little debate about the need to use weights for the estimation of descriptive parameters (means, proportions, totals, etc.). There is, however, some debate about the use of weights when fitting regression models to survey data. This issue also arises when fitting the regressions in step 3 of the HFCS imputation procedure. Starting with the second wave, we have used weights as predictors only (section 5.4.7). Put differently, our approach has been not to estimate weighted regressions, in line with the current trend in imputation (see e.g. Frumento et al., 2012). As argued in the literature, multiple imputations are only meant to appropriately predict missing values (and their uncertainty). Units should not be weighted until later, when statements about the population are to be made on the basis of an analysis of the final dataset.
5.4.5 Variable transformations
Before imputing variables with missing values, we transform several of them, as this has proved to be extremely helpful in improving the imputed values of these variables and, hence, in improving the quality of the imputed values in general. Once the imputations are finished, we transform all variables back into their original measure.
One important transformation of continuous variables involves using the natural logarithm. These types of variables usually have a highly skewed distribution; using the logarithm helps to bring the distribution closer to the assumption of normality that is necessary for the forecast. Another very helpful transformation for year variables is to impute time periods rather than years. For example, instead of imputing the purchase year of a house, we impute the time elapsed since the house was purchased. In such cases, the logarithmic transformation mentioned above is carried out on the durations and not on the years.
Another transformation used for some variables with values between “0” and “1” is the log-odds transformation (log(y/(1–y))), for example for the amount of an outstanding consumer loan (HC0801 to HC0803). Instead of imputing these variables individually, the original amount of the consumer loan (HC0601 to HC0603) is imputed as a first step. Additionally, an indicator showing whether the amount outstanding is smaller than the original amount of the loan, is imputed, and if so, the outstanding amount is imputed as a percentage of the original amount. This share is imputed as a log-odds transformation, considerably improving the quality of the imputed values. Subsequently, the individual variables (HC0801 to HC0803) are calculated from the original loan amounts and shares.
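The log-odds transformation and its inverse can be illustrated as follows. This is a minimal sketch: the function names and all numbers are hypothetical, and the "imputed" value merely stands in for a draw from the imputation model.

```python
import math

def to_log_odds(share):
    """Map a share strictly between 0 and 1 to the real line: log(y / (1 - y))."""
    return math.log(share / (1.0 - share))

def from_log_odds(z):
    """Inverse transformation; the result always lies strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

original_amount = 20_000.0   # hypothetical original loan amount in EUR
observed_share = 0.35        # hypothetical share of the loan still outstanding
z = to_log_odds(observed_share)  # scale on which the share would be imputed

imputed_z = -0.2             # stand-in for a value drawn by the imputation model
outstanding = from_log_odds(imputed_z) * original_amount
```

Because the back-transformed share always lies between 0 and 1, the derived outstanding amount can never exceed the original loan amount.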
For categorical variables, two types of transformations may be used. First, some of the nominal variables can be transformed into ordinal variables by reordering categories. This improves the stability of the imputation model, as fewer parameters need to be estimated for ordinal regression models than for multinomial regression models. Second, multiple response variables are transformed into several binary variables by generating one binary variable for each response category (“1” if the category applies, “0” otherwise). This makes it possible to impute more than one response category for the same question per imputation sample.
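The second transformation, from a multiple response variable to one binary variable per category, might look as follows. The function name and the response categories are hypothetical examples and do not correspond to actual HFCS codes.

```python
def to_binary_indicators(selected, categories):
    """Create one 0/1 indicator per response category.

    selected:   set of categories chosen by the respondent, or None if the
                question was not answered (all indicators then stay missing)
    categories: list of all possible response categories
    """
    if selected is None:
        return {c: None for c in categories}
    return {c: int(c in selected) for c in categories}

# Hypothetical categories of a multiple response question
CATEGORIES = ["home_purchase", "renovation", "vehicle"]

answered = to_binary_indicators({"home_purchase", "vehicle"}, CATEGORIES)
unanswered = to_binary_indicators(None, CATEGORIES)
```

Each indicator can then be imputed with its own logit model, so that several categories may be imputed for the same household.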
A transformation that is done for both continuous variables with missing values and categorical variables with missing values involves splitting the original variable into head and branch variables; this is done when there is a certain heterogeneity in the original variable. For example, some loan-length variables have the value “–4,” indicating that “The loan has no set length.” When imputing such a loan-length variable, it would not make sense to run the regression over these observations together with those observations that do provide a loan-length value. In such cases, the variables are split into two: (1) a binary head variable indicating whether the loan has a set term or not (imputed with a logit regression model), and (2) a continuous branch variable indicating the loan length if the loan has a set term (imputed with range regression).
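A sketch of the head/branch split for a loan-length variable is given below. The code –4 is taken from the text; the function name and the use of `None` for missing values are assumptions made for illustration.

```python
NO_SET_LENGTH = -4  # code from the text: "The loan has no set length"

def split_loan_length(value):
    """Split a loan-length value into (head, branch).

    head:   1 if the loan has a set term, 0 if not, None if missing
    branch: loan length if the loan has a set term, otherwise None
    """
    if value is None:            # missing: both parts remain to be imputed
        return None, None
    if value == NO_SET_LENGTH:   # no set term: the branch is not applicable
        return 0, None
    return 1, value

with_term = split_loan_length(25)
without_term = split_loan_length(-4)
missing = split_loan_length(None)
```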
A further transformation, which is carried out both for continuous and categorical variables with missing values, is that of individual IDs. 64 Individual variables are modeled and imputed separately for each ID in order to avoid biased imputations (section 5.4.8); this should ensure that household members with the same IDs display relatively homogenous characteristics if they are modeled together. For this reason, respondents are grouped into new individual ID categories created specifically for the imputations prior to imputation. The criteria for this categorization are as follows: All male financially knowledgeable persons (FKPs), all male partners of FKPs that were individual 2 and all other FKPs are classified as individual 1 (ID = 1). All female partners of FKPs that were individual 2 and all women that were individual 1 before their male partners became individual 1 are classified as individual 2 (ID = 2). All other people are ordered and numbered by age in descending order and are numbered starting with ID = 3.
In the case of households with members that engage in farming, we use a special transformation of the variables for the value of the household’s business(es) (HD0801 to HD0803) and the variable for the value of the household’s main residence (HB0900). Instead of imputing these variables individually, we first impute the sum of these variables and, additionally, the percentage of this sum that is attributable to the farm. Then we calculate the individual variables (HD0801 to HD0803 and HB0900) based on the sum and percentages imputed. The reason for using this transformation is that it considerably improves the imputed values, as some households with members that engage in farming did not state separate values for their main residence and their agricultural business but indicated only the combined value (see section 4.6.2.7 for further details).
5.4.6 Bounds
As mentioned above, we use range regression models to impute continuous variables in step 3 because all such variables are bounded either from above or from below, or both. These bounds are used to avoid the imputation of values that are not defined or that are inconsistent with other variables in the survey. We distinguish between general bounds and individual bounds.
General bounds, which are the same for all households and individuals, are used to avoid imputing values that are not defined or are very unrealistic. Examples of this type of bound include nonnegativity constraints on continuous or count variables (e.g. income or age). For all households the lower bound for these variables is zero. For some continuous variables, we assume that a value above or below a particular general bound cannot occur in practice. As a case in point, the lower bound for the year a loan was taken out (HB1301 to HB1303) is 1945. We assume that no loan that is still outstanding in Austria was taken out, renegotiated or refinanced more than 77 years ago. The use of such empirical bounds helps avoid imputing extreme outliers of these variables without providing biased results. More examples of general bounds include percentage variables (e.g. share of homeownership), where we set the lower bound to zero and the upper bound to 100, or some year variables (e.g. the purchase or inheritance year of the household’s main residence), where the upper bound is 2022, i.e. the year in which the last survey interviews were carried out.
Unlike general bounds, individual bounds take different values depending on each household or individual; they usually ensure consistency with other variables from the same household. Most of the HFCS bounds fall into this category. For example, when imputing the amount spent on food eaten at home, we set the total consumption expenditure estimated by the household as the upper bound. Inversely, when imputing the total estimated consumption expenditure, we set the sum of the amounts spent on food and drink consumed at home and outside of the home as the lower bound. Individual bounds are also used when a household provides a range (either prespecified or individual) in a euro question instead of a specific value. Such ranges are requested if respondents do not provide specific amounts in response to euro questions; they prove very useful for imputation purposes, as they yield valuable and precise information on the missing value from a euro question (see also section 5.2 in connection with table 6).
Individual bounds in the HFCS are, for example, also used when imputing rents (e.g. rent including utilities is used as an upper bound for rent excluding utilities and vice versa), or when imputing several count variables (e.g. the birth year of the oldest household member is used as a lower bound for the year of acquisition of the main residence). If an observation has more than one lower and/or upper bound (e.g. general and individual bounds), we take the lower and/or upper bound that is the most restrictive.
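Choosing the most restrictive bound amounts to taking the largest of all lower bounds and the smallest of all upper bounds. The following sketch shows this; the function name and the euro amounts are hypothetical.

```python
import math

def effective_bounds(lower_bounds, upper_bounds):
    """Combine several bounds into the most restrictive interval."""
    lower = max(lower_bounds, default=-math.inf)  # largest lower bound
    upper = min(upper_bounds, default=math.inf)   # smallest upper bound
    if lower > upper:
        raise ValueError("bounds are mutually inconsistent")
    return lower, upper

# Hypothetical euro question:
# general lower bound 0, range response 500 to 1,000,
# individual upper bound 800 (e.g. total consumption expenditure)
lo, hi = effective_bounds([0, 500], [1000, 800])
```

The range regression would then only produce imputed values inside the resulting interval.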
5.4.7 Selecting predictors
As mentioned above, one of the main goals of imputation is to preserve the distribution among variables with missing values and variables with complete observations – and also that among variables with missing values themselves. Therefore, when choosing predictors for the imputation model, it is not sufficient to select the most accurate predictors for each variable to be imputed. Such an approach could bias the correlation structure between the variable to be imputed and the excluded variables. Furthermore, ignoring variables that are determinants of nonresponse for the variable to be imputed makes the ignorability assumption (see section 5.3) less plausible.
Thus, we choose as many predictors as possible (broad conditioning approach). In a large dataset, such as that of the HFCS containing several hundred variables, it is, however, not feasible to include all variables, as this may lead to both multicollinearity problems and computational problems. In line with Van Buuren et al. (1999) and Barceló (2006), we have therefore adopted the following strategy for selecting predictor variables:
- Include the variables that are determinants of nonresponse. These are necessary to satisfy the ignorability assumption on which our imputation model relies (see section 5.3). Variables included as typical determinants of nonresponse in the HFCS imputation model are, for example, variables that describe the household (e.g. estimated household income, household size, number of children) and household members (e.g. age, education, sex and employment status of the household’s first individual and his/her partner) as well as stratification variables (e.g. province, municipality size) and information provided by the interviewers (e.g. standard of living, type of neighborhood, building condition, interview atmosphere, etc.). The latter pieces of information (paradata) were extremely important for the imputations, since they provided plausible explanations for item nonresponse for many variables.
- In addition, include variables that are well suited to predicting and explaining the relevant variable to be imputed. This is the classic criterion for using predictors, and it helps to reduce the statistical uncertainty surrounding the imputations. These predictors are identified by their correlation with the variable to be imputed. For example, when imputing loan variables, we typically use the original loan amount (as mentioned above), the repaid loan amount or principal outstanding as predictors because, in most regressions, these variables can explain a considerable amount of variance. When imputing the market value of various types of real estate property, we usually include the purchase value and the length of time (in years) for which the household has already owned the respective property. The loan variables mentioned above are usually connected logically (e.g. the outstanding principal is the original loan amount minus the sum of all loan repayments). However, in the course of imputation, it is not possible to preserve all of these logical connections, in particular if several of these variables are being imputed simultaneously.
- Remove the aforementioned predictor variables that have too many missing values in the subsample of missing observations of the variable to be imputed and substitute them with more complete predictors of these predictors. As a rule of thumb, predictors where the percentage of observed cases within this subsample is below 50% are removed and replaced by more complete predictors. This criterion helps to make the imputations more robust. Typical predictors of predictors include essential household characteristics, such as household size, the number of children, region, age, as well as the employment and marital status of the first individual.
- Include all variables that appear in the models that will be applied to the data after imputation. In other words, consider which economic theories might be tested based on the data and include those variables as predictors that are expected, according to these theories, to influence or explain the variable to be imputed. Failure to do so will tend to bias the results of potential data users when testing the hypothesis of one particular model. For example, the HFCS data provide detailed information on different components of households’ wealth, e.g. real assets or financial assets. This information is used for the analysis of wealth effects on consumption. Therefore, we use these variables both for the imputation of consumption expenditure and for the imputation of wealth variables.
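The 50% rule of thumb from the third criterion above can be sketched as follows. The function name, the record layout (a list of dictionaries with `None` for missing values) and the data are hypothetical; the sketch only identifies the predictors to be removed and does not choose their replacements.

```python
def filter_predictors(target, candidates, records, min_observed=0.5):
    """Keep predictors that are sufficiently observed where the target is missing.

    Returns (kept, dropped). A predictor is dropped if its share of observed
    cases within the subsample of missing target values is below min_observed.
    """
    missing_rows = [r for r in records if r.get(target) is None]
    kept, dropped = [], []
    for p in candidates:
        if not missing_rows:          # nothing to impute: keep everything
            kept.append(p)
            continue
        observed = sum(r.get(p) is not None for r in missing_rows) / len(missing_rows)
        (kept if observed >= min_observed else dropped).append(p)
    return kept, dropped

# Hypothetical data: three households with a missing mortgage amount
records = [
    {"mortgage_amount": None, "income": 30_000, "purchase_value": None},
    {"mortgage_amount": None, "income": 45_000, "purchase_value": None},
    {"mortgage_amount": None, "income": None, "purchase_value": 200_000},
    {"mortgage_amount": 80_000, "income": 50_000, "purchase_value": 250_000},
]
kept, dropped = filter_predictors("mortgage_amount", ["income", "purchase_value"], records)
```

Here income is observed for two of the three households with a missing mortgage amount and is kept, while the purchase value is observed for only one of them and would be replaced by more complete predictors.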
Obviously, many variables in the survey – for example, the income, age or education of the first individual – fulfill more than one criterion for selecting predictors.
We also include the final survey weights in all regression models (see the discussion in section 5.4.4) and an interaction term, as well as a main effect dummy for each of the abovementioned predictor variables that were only asked of a subsample of the households asked about the variable to be imputed. For example, suppose that we want to impute a household’s consumption expenditure using the mortgage amount as one of our predictors. While every household in the sample was asked about consumption expenditure, not all of them were asked about mortgage amounts. If, for those households that do not have a mortgage, we just set the mortgage amount to zero (which corresponds to an interaction term), the estimates would be biased, because the information on whether a household has a mortgage or not would be omitted. This information should thus be additionally included as a main effect dummy in the regression model. But again, not all households were asked whether they have a mortgage, just homeowners. Thus, we should also include a homeowner dummy in the regression.
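The mortgage example can be written out as a small helper that builds the three regressors for one household. The function and column names are hypothetical; the sketch only illustrates the combination of an interaction term with the two main effect dummies.

```python
def mortgage_predictors(is_homeowner, has_mortgage, mortgage_amount):
    """Build predictors for a question that was asked of a subsample only."""
    owner = 1 if is_homeowner else 0
    mortgage = 1 if (is_homeowner and has_mortgage) else 0
    # Interaction term: the amount counts only where a mortgage exists
    amount = float(mortgage_amount) if mortgage else 0.0
    return {
        "homeowner": owner,                        # main effect dummy
        "has_mortgage": mortgage,                  # main effect dummy
        "mortgage_amount_x_has_mortgage": amount,  # interaction term
    }

tenant = mortgage_predictors(False, False, None)
owner_without_mortgage = mortgage_predictors(True, False, None)
owner_with_mortgage = mortgage_predictors(True, True, 120_000)
```

The two dummies allow the regression to distinguish tenants, homeowners without a mortgage and homeowners with a mortgage, even though the amount term is zero for the first two groups.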
Finally, the number of predictors is restricted by the size of the subsample for which the regression is estimated. In cases where the subsample size is smaller than the number of predictors selected according to the above strategy, we use the Akaike information criterion to choose the subset of predictors which best fits the data, ensuring that, if possible, each of the above four predictor categories is represented in each regression equation. Typically, the number of predictors used for each regression model is around 20% of the number of observations for the variable to be imputed. More details on the specification of subsamples can be found in the next section.
5.4.8 Specification of subsamples
Each regression in step 3 is estimated over a subsample consisting of all households and individuals that were asked the question pertinent to the variable to be imputed. For example, if a household has two mortgages and we want to impute the outstanding amount of the second mortgage, then we impute this missing value by regressing over the subsample of households that have at least two mortgages. If we also included the households that only have one mortgage when imputing the second mortgage amounts, we would ignore systematic differences between the first and second mortgages. For example, we would ignore the fact that the outstanding amount of the first mortgage is always higher than the second one, because mortgages are ordered by importance, which would introduce a bias to our estimates. 65
A further example is the imputation of individual variables. These are also only regressed over the subsample of people that share the same ID. To ensure the homogeneity of people with the same IDs, respondents are grouped into new ID categories created specifically for the imputation (see section 5.4.5), and which then form the mentioned subsamples. When imputing question by question, as we do, the bias will be very small, though at the cost of precision because, consequently, the subsample sizes are often small.
5.4.9 Number of cycles
In step 4, the number of cycles (or iterations) t determines how often step 3 is repeated. As t tends to infinity, the imputed values should converge to a draw from the joint posterior predictive distribution of the variables with missing values. However, according to Van Buuren et al. (1999), in practice, convergence in these models usually occurs very quickly during the first few iterations. Given the large computational effort required for the HFCS imputation model, we set the iteration number for the HFCS imputation model at t = 10, which is in line with other similar surveys, like the SCF (Kennickell, 1998) and the EFF (Barceló, 2006).
Typically, we check convergence graphically by plotting the mean of the imputed values against the iteration number t. Convergence is judged to have occurred as soon as the pattern of the imputed means becomes random and a definite trend can no longer be observed.
In the fourth wave of the HFCS, we additionally examined the convergence of selected variables using the Gelman-Rubin convergence diagnostic, which is used very frequently in the literature (for more details, see e.g. Cowles and Carlin, 1996). According to this diagnostic, convergence of a variable is reached when the variance of an estimate of this variable (e.g. the mean, median or other percentiles) between chains of multiple imputation samples is relatively small compared to the variance of the same estimate within chains, i.e. across cycles. 66 All variables examined in the fourth wave of the HFCS meet this criterion. 67
Of course, such tests (just like any other diagnostic test to assess chain convergence) can never confirm the existence of convergence (see section 5.3). But they are useful for pointing out weaknesses of the imputation model or other unusual results that could indicate nonconvergence.
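The diagnostic can be computed directly from the formula given in footnote 66. The sketch below assumes that each imputation sample forms one chain and that a chain holds the value of an estimate (e.g. the mean of the imputed values) at each of the t cycles; this layout, the function name and the numbers are illustrative assumptions.

```python
import math
from statistics import mean, variance

def gelman_rubin(chains):
    """Square root of (t - 1)/t + BV/WV for chains of equal length t.

    BV: variance of the chain means (between-chain variance)
    WV: mean of the variances within each chain (within-chain variance)
    """
    t = len(chains[0])
    if len(chains) < 2 or t < 2 or any(len(c) != t for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    bv = variance([mean(c) for c in chains])
    wv = mean([variance(c) for c in chains])
    return math.sqrt((t - 1) / t + bv / wv)

# Hypothetical chains of an estimate over t = 5 cycles
similar = [[100, 102, 98, 101, 99], [101, 99, 100, 102, 98]]
diverging = [[100, 102, 98, 101, 99], [201, 199, 200, 202, 198]]

r_similar = gelman_rubin(similar)
r_diverging = gelman_rubin(diverging)
```

Chains that fluctuate around the same level yield a value below the usual thresholds, whereas chains that settle at different levels yield a much larger value.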
5.4.10 Number of imputation samples
In the last step (step 5), we choose the number of realizations m = 1,2,…, M that we want to have from the joint posterior predictive distribution of the missing data or, put more simply, the number of samples to be generated through multiple imputation. Setting M too low leads to standard errors of estimates that are too low and to p-values that are too low. However, Schafer and Olsen (1998) show that the gains in efficiency of an estimate rapidly diminish after the first few M imputation samples. They claim that good inferences can already be made with M = 3 to M = 5. In line with the international standards set by the ECB and other similar surveys (like the SCF or EFF), we set the number of imputations at M = 5.
5.5 Selected results
After imputation, the HFCS dataset is five times bigger than before, because it consists of M = 5 multiple imputation samples (also referred to as “implicates”). Table 8 provides first insights into the imputation output. It shows the weighted means of selected euro variables in both the multiple imputation samples and the original unimputed sample.
For several variables the means are, on average, higher after imputation than before imputation. If imputations are close to the true values, the result suggests that households that do not respond to the relevant variables tend to be households with higher (unobserved) amounts in these variables. For example, the mean value of the first gift/inheritance (without main residence) is EUR 97,930 before imputation. After the respective imputations, it increases to EUR 120,590 in m = 1, EUR 109,209 in m = 2, EUR 118,374 in m = 3, EUR 117,520 in m = 4, and EUR 120,175 in m = 5. (Note that the ECB has changed the recording of inherited HMRs, moving it into the loop.) Thus, on average the imputations increase the mean value of the first gift/inheritance from EUR 97,930 to EUR 117,174, i.e. by 20%. Additionally, about 40% of the values imputed in this context are based on range responses by households, which suggests that households with more valuable inheritances tend to respond to this question less often than households with less valuable inheritances. Further large increases in comparison to the unimputed sample occur when imputing mortgage loans. Households’ range responses again play an important part here, as they provide valuable and often very precise information for the imputations (see also table 6).
However, for other variables, the mean does not change significantly, or even decreases in some cases. For example, the mean amount spent on food eaten at home does not change significantly after imputation, due to the low item nonresponse rate of this variable (see table 6). The mean gross income from financial investments is even lower after imputation than before imputation, which suggests households receiving a substantial amount of income in this category are more likely to know it.
Finally, table 8 also shows that the uncertainty of imputations can vary a lot depending on the variables. For some variables (e.g. other property mortgage 1), the means show a relatively high variance among the five multiple imputation samples, signaling the uncertainty of the imputed values due to the lower number of observations for these variables. For other variables (e.g. gross income from unemployment benefits or the monthly amount paid as rent) the mean values show a relatively low variance among the five multiple imputation samples, which in turn signals a higher precision of the imputed values. Had we conducted a single imputation of the variables – with only one imputation sample – instead of multiple imputations, the variance of the estimates would be too low, since the uncertainty behind the imputed values would be disregarded, and they would thus be treated like true values. The variance among the five multiple imputation samples is within the range of the previous waves.
 | Mean before imputation | Multiple imputation sample means | | | | |
---|---|---|---|---|---|---|
 | m=0 | m=1 | m=2 | m=3 | m=4 | m=5 |
EUR | | | | | | |
Value of main residence¹ | 380,145 | 384,192 | 382,733 | 383,990 | 383,539 | 386,126 |
HMR mortgage 1: amount still owed | 88,621 | 85,730 | 87,748 | 89,044 | 88,749 | 88,290 |
Monthly amount paid as rent | 439 | 415 | 411 | 408 | 413 | 412 |
Other property 1: current value | 317,025 | 308,453 | 317,038 | 335,956 | 355,280 | 363,385 |
Other property mortgage 1: amount still owed | 113,494 | 110,837 | 114,354 | 118,980 | 106,954 | 104,792 |
Value of sight accounts | 4,967 | 5,366 | 5,114 | 5,055 | 5,162 | 5,050 |
Value of saving accounts | 27,320 | 32,145 | 31,133 | 31,834 | 30,514 | 31,928 |
Value of publicly traded shares | 31,485 | 77,291 | 28,197 | 30,489 | 33,184 | 32,386 |
Gross employee income (person 1) | 35,033 | 35,208 | 35,214 | 35,088 | 35,040 | 35,226 |
Gross income from unemployment benefits (person 1) | 8,410 | 8,411 | 8,381 | 8,076 | 8,287 | 8,326 |
Gross income from financial investments | 443 | 324 | 321 | 324 | 338 | 328 |
Gift/inheritance 1: value | 97,930 | 120,590 | 109,209 | 118,374 | 117,520 | 120,175 |
Amount spent on food at home | 413 | 416 | 417 | 416 | 416 | 417 |
Source: HFCS Austria 2021, OeNB.
¹ Based on the HB0900 variable.
Note: All means are estimated over the observations “Household has item = yes.” The number of these observations may vary across the different imputation samples m if we impute whether households have the relevant item or not. HMR = household main residence.
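The figures for the first gift/inheritance quoted in the text can be reproduced from the corresponding row of the table. The sketch below averages the five implicate means and also computes their variance across implicates, which is the between-imputation component used when combining multiply imputed estimates; the within-imputation component would require standard errors, which the table does not report. The variable names are illustrative.

```python
from statistics import mean, variance

# Row "Gift/inheritance 1: value" (EUR) from the table
before_imputation = 97_930
implicate_means = [120_590, 109_209, 118_374, 117_520, 120_175]  # m = 1, ..., 5

# Average over the five multiple imputation samples
pooled_mean = mean(implicate_means)

# Relative change compared with the unimputed sample
relative_change = pooled_mean / before_imputation - 1

# Spread of the means across implicates (between-imputation variance)
between_variance = variance(implicate_means)
```

Rounded, the pooled mean is EUR 117,174 and the relative increase is about 20%, in line with the values stated in the text.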
5.6 Concluding remarks
We have shown that imputation is necessary for analyzing the HFCS dataset because, compared with complete-case analysis, it decreases the nonresponse bias of estimates when complete observations differ systematically from incomplete ones. It also decreases the loss of information in analyses because no observations need to be deleted. We chose a multiple imputation with chained equations to create five multiple imputation samples. For information on analyzing multiply imputed data in Stata, please see the HFCS User guide (chapter 9).
54 A common related problem that occurs in surveys is unit nonresponse, which means that no questions are answered at all because, for example, a household declined to take part in the survey. This problem is addressed with the construction of HFCS nonresponse weights (chapter 7).
55 For a comprehensive study, see Little and Rubin (2019).
56 See chapter 4 for more details.
57 The nonresponse rate is calculated by subtracting the value in the “amount” column (column 3) in table 6 from 100%.
58 This procedure is also known by several other names, including “stochastic relaxation,” “regression switching,” “sequential regression,” “incompatible MCMC” and “fully conditional specification.”
59 Albacete (2014) provides further technical details on the imputation procedure used for the Austrian Household Survey on Housing Wealth 2008, which is identical to that used for the HFCS.
60 See Little and Rubin (2019) for an overview of imputation techniques.
61 A very small fraction of these variables that could not be imputed with the HFCS imputation procedure were imputed with ad hoc methods such as hotdeck imputation after the HFCS procedure had been completed (for more information on the changes due to COVID-19 see box 3 and chapter 10). This is because their imputation is considered very important as they are used, for example, to calculate important aggregate variables, such as total household income.
62 The range regression model is a generalized version of the Tobit model. It is used to account for censoring from below and/or above. See Cameron and Trivedi (2005) for more details.
63 The only exceptions are the nominal variables on the three-digit International Standard Classification of Occupations (ISCO) and the three-digit European statistical classification of economic activities (Nomenclature of Economic Activities – NACE) classifications, which were difficult to estimate with a multinomial logit model because they contain a very large number of categories (74 and 121, respectively). In these two cases, the predictive mean matching (PMM) procedure was used for each missing value: first, a value is predicted by linear regression; second, the observed value that is closest to the regression-predicted value is imputed.
64 In the dataset, financially knowledgeable persons are designated with the ID = 1 by default.
65 Even if, in such cases, we could introduce a large number of interaction terms to our model to reduce the bias, there might still be unobserved differences between the two groups.
66 The Gelman-Rubin diagnostic is the square root of [(t–1)/t + BV/WV], with BV denoting the between-chain variance and WV the within-chain variance. Gelman-Rubin values below 1.1 to 1.2 are usually considered to denote convergence.
67 The following variables were tested: HB0900, HD1110, HD1210, HD1510, HB1701, HB2801, HB4400, HI0100, HI0200 and HI0310.
6 Sampling
6.1 Introduction
The sampling design for the fourth wave of the HFCS in Austria was specifically developed by the OeNB in collaboration with the survey company IFES (Institut für empirische Sozialforschung GmbH), which also executed the survey. Sampling is understood as the selection of a set of units (i.e. a sample) from the whole population on the basis of which conclusions can be derived about the behavior of the whole population. Thus, the units of the sample should be representative of the whole population; in other words, an analysis of the sample (using appropriate weights) is expected to lead to the same estimates as an analysis of the whole population. Another criterion of major importance for the HFCS – coverage of households in all geographic regions – is achieved through stratified sampling, i.e. by dividing a country into smaller units from which the sampling units are drawn. Although some degree of statistical uncertainty cannot be ruled out, sampling – together with imputation and weighting – serves to produce best unbiased estimates (and confidence intervals). It further keeps uncertainty as low as possible taking restrictions like cost, time and practicability into account. Therefore, every survey is highly dependent on the quality of its sampling design.
The survey sample was initially drawn before the outbreak of COVID-19 in Austria. After the delay of the field period, we kept the initial sample. As the time between the sample design (including the underlying background data) and contacting the households was extraordinarily long, we expected somewhat more neutral dropouts because people had moved away or died. Furthermore, the gross sample included 11 addresses whose residents declined to be contacted during the delay of the field period. They were classified as “inaccessible” and treated as addresses with an unknown eligibility status.
This chapter describes the sampling procedure for the HFCS in Austria and is structured as follows: First, we define the target population (section 6.2) and provide a short overview of the sampling design in box 4. This part is followed by a description of the required external data on geography and population (section 6.3). Next, we detail the stratification process (section 6.4) and the two stages of drawing the survey’s sample population (section 6.5), which form the main part of the sampling procedure. Section 6.6 completes the chapter with some concluding remarks.
6.2 Target population and sampling frame
The first step in determining the sampling procedure is to define the target population of the survey. The survey is intended to cover all households living permanently in Austria, independent of citizenship and/or residence status. In line with the common ECB definition, a household in the HFCS is defined as
“a person living alone or a group of people who live together in the same private dwelling and share expenditures, including the joint provision of the essentials of living. Employees of other residents (i.e. live-in domestic servants, au-pairs, etc.) and roommates without other family or partnership attachments to household members (e.g. resident boarders, lodgers, tenants, visitors, etc.) are considered separate households.” 68
More specifically, according to the ECB’s definition the following persons are to be regarded as household members if they share household expenses (which includes both benefiting from and contributing to their coverage):
- persons usually resident, related to other members
- persons usually resident, not related to other members
- persons usually resident, but temporarily absent from dwelling (for reasons of holiday travel, work, education or similar)
- children of household being educated away from home
- persons absent for long periods, but having household ties: persons working away from home
- persons temporarily absent but having household ties: persons in hospital, nursing home, boarding school or other institution.
Sampling in the HFCS in Austria
The HFCS in Austria is based on a stratified, two-stage cluster sampling design:
“Stratified sampling” ensures that the data collection units – i.e. households for our purposes – are drawn from all parts of the population. Stratification in the Austrian HFCS was carried out geographically (based on NUTS-3 regions 69 ) and by municipality size.
“Two-stage cluster sampling” refers to a process where primary sampling units (PSUs) are first selected from each geographical unit (i.e. stratum) and secondary sampling units (SSUs) are then drawn from within each selected PSU. The two-stage sampling design of the Austrian HFCS (see the infographic below) entails, first, selecting a random sample of enumeration districts (the smallest geographical unit for which statistical data are available) from each stratum and, second, selecting a random sample of households (postal addresses) from each sampled enumeration district. As in the previous waves of the HFCS in Austria, 70 the probability of being drawn during this first stage is proportional to the number of households in the respective PSUs. 71 The households constitute the secondary sampling units (SSUs) and are selected at random from within a drawn PSU. The two-stage cluster design reduces costs due to relatively small distances between the 12 households (8 households in strata with over 50,000 inhabitants) selected within each PSU, while it ensures a sufficient number of PSUs within the individual stratum.
This guarantees that households from every single stratum are invited to take part in the survey. In total, the gross sample of the fourth wave of the Austrian HFCS consists of 188 strata, 598 unique PSUs and 6,300 households.
In the case of the HFCS, the target population does not include households that are institutionalized, such as households living in
- homes for elderly people,
- military compounds,
- monasteries,
- prisons, and
- boarding schools.
Additionally, the Austrian HFCS does not cover homeless people. People without a residence were not reached with the survey, as sampling was based on fixed dwellings (see section 6.3). At the same time, the HFCS in Austria is not limited to households officially registered at their main residences.
In order to draw a sample from this target population, we would need a complete list of households in Austria. As such a list does not exist, we use a complete list of postal addresses in Austria as our sampling frame. These external data, explained in more detail below, provide the best possible sampling frame in the sense that (almost) all households in Austria appear in the data (and appear only once) and that the data are highly up to date.
6.3 Background – the (external) datasets used
Given the definition of the target population, geographical data as well as data on households in Austria are needed. Drawing a representative sample requires the target population to be correctly represented by the sampling frame. The frame data are perfect “if every element appears on the list separately, once, only once and nothing else appears on the list” (Kish, 1995, p. 53). In practice, it is not possible to achieve this theoretical optimum. The Austrian HFCS has been designed to come as close to this goal as the available data sources allow. The following sections give more details on the data which provided the basis for the sampling used in the HFCS in Austria.
For the HFCS in Austria, we relied on two different sources: We used data from Statistics Austria for the purpose of stratification and for selecting a random sample of PSUs (primary sampling units; in Austria those are the enumeration districts) and we used post office data to draw the households, the actual SSUs (secondary sampling units), at random. The advantage of the post office data is that they are up to date and that the data fit the HFCS definition of households.
6.3.1 Statistics Austria
We used information about the geographical structure of Austria, i.e. data on the NUTS-3 regions, and the enumeration districts (PSUs) from Statistics Austria. 72 These enumeration districts are the smallest territorial units in Austria for which basic data characteristics are collected by Statistics Austria by default (each enumeration district contains around 461 dwellings on average). 73
In addition, we relied on the municipality directory of 2019 (to categorize by municipality size). Unlike in the early waves, we did not use population data from Statistics Austria’s register-based census from 2011 to determine the number of sampling units to be drawn within each stratum. Instead, we relied on post office data, which also contained the population (households) of each stratum (see section 6.3.2 below).
6.3.2 Austrian Post Office
Once the appropriate primary sampling units have been randomly selected, information on the households is needed to complete the sample selection. The dataset of choice for the purpose of the HFCS was a dataset of postal addresses for sale from the Austrian Post Office, based on the assumption that the number of households living in each building corresponds to the number of postal addresses. Specifically, we used a commercial product called “Adress.Certified” developed by the Austrian Post Office. This address register contains information about individual buildings (including street name, house number and whether the building is used privately or commercially). It can be purchased in combination with a product called “DATA.DOOR,” which is a directory of post office-certified address codes (shortened to PAC), i.e. a directory of all addresses in Austria which mail can be delivered to. This information is available in disaggregated form. In Austria, there are about 4,390,000 private mail delivery points. Addresses identified by the Austrian Post Office as vacation homes have already been excluded.
Thus, our starting point was some 4.39 million private mail addresses. Very few remaining commercial addresses and ineligible addresses had to be removed after the first contact by the interviewer (e.g. if the interviewer arriving at a given address reported the address as incorrect or found it to house a commercial building) and were given weights of zero, since they do not belong to the target population (see chapters 4 and 7). Moreover, households at secondary residences whose main residence address was identifiable as such were also excluded from the dataset of the sampling frame or given a weight of zero to ensure that every household was included only once in the list of post office-certified address codes. After removing these commercial addresses and ineligible addresses, the total of all weights comes to roughly 4.07 million, which means that Austria has an estimated 4.1 million households.
The post office data we used do not reflect whether a given address is a household’s main residence or not or whether the residence at that address is registered in the Austrian residence registry (Zentrales Melderegister). Yet they provide a realistic picture of households and thus meet the HFCS requirement of reflecting actual living situations. Unlike other data sources (e.g. EU SILC), the post office data cover households at addresses that are registered as a secondary home or that are not registered at all, but fulfill the HFCS definition of a household. They have thus been included in the sampling frame because they have a post-certified address code. 74
6.3.3 Profile.Address and IFES
To identify the names of the households that correspond to the selected postal addresses – information that is not evident from the datasets described so far – the survey company, IFES, relied on its own databases or obtained the corresponding names from a commercial provider called “Profile.Address.”
This information was needed in the contact phase when households received individualized letters of invitation to participate in the survey. 75
6.4 Stratification and sample size
6.4.1 Stratification
The Austrian HFCS essentially used two indicators for stratification, the first one being the 35 NUTS-3 regions (see chart 4).
The DATA.DOOR data from the Austrian Post Office contain information that serves to map each address to a province, municipality and enumeration district as defined by Statistics Austria. With the exception of the capital city Vienna, each NUTS-3 region was divided further into the following eight municipality population size categories:
- Up to 2,000 inhabitants
- 2,001 – 3,000 inhabitants
- 3,001 – 5,000 inhabitants
- 5,001 – 10,000 inhabitants
- 10,001 – 20,000 inhabitants
- 20,001 – 50,000 inhabitants
- 50,001 – 1 million inhabitants
- Over 1 million inhabitants
The large “50,001 – 1 million inhabitants” category essentially contains just the provincial capitals. Vienna is a special case as it is the capital city and the only city in Austria with more than 1 million inhabitants; it was subdivided into its 23 districts.
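The mapping from a municipality's population to one of the eight size classes can be sketched as follows. This is a minimal illustration, not the procedure actually used by the OeNB or IFES; the function names `size_class` and `stratum_key` and the NUTS-3 code in the usage note are assumptions made for the example. Vienna, which is stratified by its 23 districts, and the subsequent merging of small strata are not modeled.

```python
import bisect

# Inclusive upper bounds of the first seven size classes listed above;
# anything larger falls into the eighth class ("over 1 million").
UPPER_BOUNDS = [2_000, 3_000, 5_000, 10_000, 20_000, 50_000, 1_000_000]
LABELS = [
    "up to 2,000",
    "2,001-3,000",
    "3,001-5,000",
    "5,001-10,000",
    "10,001-20,000",
    "20,001-50,000",
    "50,001-1 million",
    "over 1 million",
]


def size_class(inhabitants: int) -> str:
    """Return the municipality size class for a given population count."""
    if inhabitants < 0:
        raise ValueError("population cannot be negative")
    # bisect_left gives the index of the first upper bound >= inhabitants.
    return LABELS[bisect.bisect_left(UPPER_BOUNDS, inhabitants)]


def stratum_key(nuts3_code: str, inhabitants: int) -> tuple[str, str]:
    """Combine a NUTS-3 region with the size class (hypothetical stratum label)."""
    return (nuts3_code, size_class(inhabitants))
```

For example, `stratum_key("AT111", 2_500)` pairs the region code (an illustrative input) with the class "2,001-3,000".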
This very fine stratification yielded 198 strata. Where the number or proportional share of households per stratum was too small to allow the selection of enumeration districts, individual strata were merged with neighboring strata to increase the share of households and thus ensure the selection of at least one PSU from each stratum. This exercise left the HFCS with 188 strata for sampling, covering all households in Austria. The distribution of strata across provinces and municipality size categories can be seen in table 9.
Municipality size (number of inhabitants) 1

Province | up to 2,000 | 2,001–3,000 | 3,001–5,000 | 5,001–10,000 | 10,001–20,000 | 20,001–50,000 | 50,001–1 million | over 1 million | Total |
---|---|---|---|---|---|---|---|---|---|
Vienna | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 23 | 23 |
Lower Austria | 7 | 7 | 7 | 7 | 6 | 5 | 1 | 0 | 40 |
Burgenland | 3 | 2 | 1 | 2 | 1 | 0 | 0 | 0 | 9 |
Styria | 6 | 5 | 6 | 6 | 5 | 1 | 1 | 0 | 30 |
Carinthia | 3 | 3 | 3 | 3 | 2 | 1 | 1 | 0 | 16 |
Upper Austria | 5 | 5 | 5 | 5 | 3 | 2 | 1 | 0 | 26 |
Salzburg | 3 | 2 | 2 | 2 | 2 | 1 | 1 | 0 | 13 |
Tyrol | 5 | 5 | 3 | 3 | 3 | 0 | 1 | 0 | 20 |
Vorarlberg | 2 | 2 | 2 | 2 | 2 | 1 | 0 | 0 | 11 |
Total | 34 | 31 | 29 | 30 | 24 | 11 | 6 | 23 | 188 |

Source: Statistics Austria (municipality directory 2019).
1 Municipality size accounts for municipality mergers up to and including 2019.
Each stratum contained about 86 PSUs on average, which in turn contained around 482 households 76 on average.
6.4.2 Sample size
The variance of estimates based on the underlying data will be smaller the larger the sample is. At the same time, the cost of data collection increases with sample size. Therefore, a balance has to be found in order to yield reasonably precise estimates whilst taking into account the given budget constraints. Furthermore, given the focus of the survey and the analyses that the HFCS is to support, the HFCS should produce enough observations to allow for an analysis of subpopulations (e.g. indebted households, which are only a (small) fraction of the sampling frame) and provide some insight into the regional differences within Austria. We know from previous OeNB surveys (e.g. the three previous waves of the HFCS in Austria in 2010, 2014 and 2017 and the Household Survey on Housing Wealth in 2008) that at least 2,000 households need to be successfully interviewed and that the unit nonresponse rate can be expected to reach some 40% to 50% (with expected differences between Vienna and the rest of Austria). 77 With some leeway for extraordinary circumstances, the HFCS survey was designed to yield a sample of 3,000 successfully interviewed households, with an expected participation rate of about 37% in big municipalities (more than 50,000 inhabitants) and an average of around 41% in the rest of Austria. This leeway was particularly helpful during the COVID-19 pandemic. These participation rates are estimates based on the experience of past surveys. The participation rates recorded in the previous waves in a stratum were used to determine the exact number of households to be drawn in the respective stratum in the fourth wave.
Province | % of households | Target sample | Gross sample | Number of households per PSU (enumeration district) | Number of PSUs to be drawn |
---|---|---|---|---|---|
Vienna | 25.1 | 754 | 1,936 | 8 | 242 |
Lower Austria | 18.6 | 558 | 1,020 | 8 / 12 | 87 |
Burgenland | 3.2 | 97 | 180 | 12 | 15 |
Styria | 13.7 | 412 | 824 | 8 / 12 | 81 |
Carinthia | 6.1 | 184 | 376 | 8 / 12 | 38 |
Upper Austria | 15.0 | 451 | 892 | 8 / 12 | 85 |
Salzburg | 6.0 | 180 | 368 | 8 / 12 | 37 |
Tyrol | 8.0 | 240 | 476 | 8 / 12 | 45 |
Vorarlberg | 4.1 | 123 | 228 | 12 | 19 |
Total | 100 | 3,000 | 6,300 | | 649 |

Source: Austrian Post, data.door October 2019, HFCS Austria 2021, OeNB.
The targeted net sample of n = 3,000 was divided between the nine provinces, based on their share of private addresses as collected by the Austrian Post Office (table 10, column 1). These figures, which corresponded to the targeted number of secondary sampling units (SSUs, column 2), were subsequently translated into gross samples of SSUs based on the estimated participation rates (column 3). Due to the shorter distances between buildings, 8 households were selected in Vienna and in strata with more than 50,000 inhabitants, whilst this number was 12 in the rest of Austria (column 4). The number of PSUs to be drawn in each province was calculated on this basis (column 5).
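The arithmetic linking columns 3 to 5 of table 10 can be checked with a short sketch. It takes the gross sample and the number of PSUs per province from the table and derives how many PSUs must have received 8 and how many 12 addresses for the two figures to agree. The function name `split_psus` is an assumption made for this illustration, and the resulting split is derived here rather than reported in the source.

```python
# Gross sample and number of PSUs per province, copied from table 10.
TABLE_10 = {
    "Vienna": (1936, 242),
    "Lower Austria": (1020, 87),
    "Burgenland": (180, 15),
    "Styria": (824, 81),
    "Carinthia": (376, 38),
    "Upper Austria": (892, 85),
    "Salzburg": (368, 37),
    "Tyrol": (476, 45),
    "Vorarlberg": (228, 19),
}


def split_psus(gross: int, n_psus: int, small: int = 8, large: int = 12) -> tuple[int, int]:
    """Return (PSUs with `small` addresses, PSUs with `large` addresses).

    Solves small * a + large * b = gross together with a + b = n_psus.
    """
    n_large, remainder = divmod(gross - small * n_psus, large - small)
    if remainder or not 0 <= n_large <= n_psus:
        raise ValueError("gross sample and PSU count are inconsistent")
    return n_psus - n_large, n_large


if __name__ == "__main__":
    for province, (gross, n_psus) in TABLE_10.items():
        n8, n12 = split_psus(gross, n_psus)
        print(f"{province}: {n8} PSUs with 8 addresses, {n12} PSUs with 12 addresses")
```

In Vienna, for instance, all 242 PSUs carry 8 addresses (242 × 8 = 1,936), whereas in Burgenland all 15 PSUs carry 12 addresses (15 × 12 = 180).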
In total the Austrian HFCS sample design produced 649 (598 unique) PSUs across all strata and a gross sample size of 6,300 households that were invited to participate in the HFCS (see box 5 in chapter 7 for information on the number of households interviewed successfully). Drawing the PSUs was done with replacement sampling, which entails some PSUs being drawn multiple times (see section 6.5.1). Drawing possible substitute addresses was excluded from the HFCS to begin with to ensure that all households from the gross sample would be interviewed with the same commitment so as to prevent data distortions (see also section 4.4.1).
6.5 The two stages of the random draw
The Austrian HFCS is based on a stratified, two-stage cluster sampling design:
- stage one: random draw of PSUs (enumeration districts) from each stratum
- stage two: random draw of a predefined number of households (postal addresses) from each PSU
6.5.1 First stage
We chose the smallest territorial units in Austria, the so-called enumeration districts (of which there are 8,825), as the PSUs for the Austrian HFCS. On average, an enumeration district contains 461 households, but there are also PSUs with only a few households. Such units were aggregated with neighboring units to ensure that each PSU contains at least 50 households and that at least one PSU can be chosen per stratum. This aggregation process narrowed the number of PSUs down to 8,430, which then had 482 households on average. The description above shows that the number of PSUs to be drawn in each province is determined a priori by the chosen sample size and stratification. To translate the numbers allocated to each province (table 10) into the desired number of PSUs within a given stratum, the total number of PSUs in the respective province was distributed proportionally according to the number of households in the respective stratum. For example, the 85 PSUs to be drawn in Upper Austria (table 10) were divided up among the 26 strata in this province according to their population shares.
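A proportional distribution of a province's PSUs across its strata could look like the sketch below. The source does not state how fractional shares were rounded, so the largest-remainder rule used here is an assumption, as are the function name `allocate_psus` and the stratum labels in the usage note. The sketch also does not enforce the minimum of one PSU per stratum, which the HFCS secured by merging small strata beforehand.

```python
def allocate_psus(households_by_stratum: dict[str, int], n_psus: int) -> dict[str, int]:
    """Distribute n_psus across strata in proportion to their household counts.

    Rounding uses the largest-remainder rule (an assumption of this sketch).
    """
    total = sum(households_by_stratum.values())
    # Exact (fractional) share of PSUs for every stratum.
    quotas = {s: n_psus * h / total for s, h in households_by_stratum.items()}
    # Start from the integer parts of the quotas ...
    allocation = {s: int(q) for s, q in quotas.items()}
    leftover = n_psus - sum(allocation.values())
    # ... and hand the remaining PSUs to the strata with the largest remainders.
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - allocation[s], reverse=True)
    for stratum in by_remainder[:leftover]:
        allocation[stratum] += 1
    return allocation
```

With three hypothetical strata of 50,000, 30,000 and 20,000 households and 10 PSUs to distribute, the function returns 5, 3 and 2 PSUs respectively.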
First stage: Statistics Austria (columns 1 and 2). Second stage: Austrian Post Office (columns 3 to 6) and Profile.Address/IFES (column 7).

Municipality code (1) | Enumeration district (2) | Postal code (3) | Street (4) | House number (5) | Mail delivery point (PAC) (6) | Name of household (7) |
---|---|---|---|---|---|---|
90101 | 90101001 | XXXX | Sample street | 6 | 101255765 | John Doe |
90101 | 90101001 | XXXX | Sample street | 6 | 101255766 | Jane Doe |
90101 | 90101002 | XXXX | Sample street | 9 | 101255767 | John Doe |
90101 | 90101001 | XXXX | Sample street | 10 | 101255768 | Jane Doe |

Source: Statistics Austria, Austrian Post Office, Profile.Address/IFES.
After determining how many PSUs were to be drawn per stratum, the PSUs – unlike during the first wave of the HFCS in Austria 78 , but as in the second and third waves – were drawn with a probability proportional to their size (measured in terms of the number of households in a PSU). 79 The purpose of this approach is to reduce the standard errors of estimates by reducing sample design weight variance (see also section 7.2.2). Likewise, it ensures that every household within a stratum has the same probability of being drawn in the gross sample of the HFCS. PSUs were drawn with replacement, meaning that a PSU can be drawn multiple times. As a result, a total of 649 PSUs were drawn in the fourth wave of the HFCS in Austria, only 598 of which were unique.
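Drawing PSUs with replacement and with a probability proportional to size can be illustrated with Python's standard library. This is a minimal sketch, not the procedure implemented by IFES: the function name `draw_psus`, the PSU identifiers and the household counts are hypothetical, and the treatment of PSUs whose computed probability exceeds one (footnote 71) is not modeled.

```python
import random
from collections import Counter


def draw_psus(psu_sizes: dict[str, int], n_draws: int, rng: random.Random) -> Counter:
    """Draw n_draws PSUs with replacement, probability proportional to size.

    psu_sizes maps a PSU identifier to its number of households.
    The returned Counter records how often each PSU was drawn.
    """
    ids = list(psu_sizes)
    weights = [psu_sizes[i] for i in ids]
    # random.choices samples with replacement using relative weights.
    return Counter(rng.choices(ids, weights=weights, k=n_draws))


if __name__ == "__main__":
    # Hypothetical stratum with four PSUs of different size.
    sizes = {"PSU-A": 900, "PSU-B": 450, "PSU-C": 300, "PSU-D": 50}
    draws = draw_psus(sizes, n_draws=5, rng=random.Random(2021))
    print("total draws:", sum(draws.values()), "unique PSUs:", len(draws))
```

Because a PSU may appear more than once, the number of unique PSUs can be smaller than the number of draws, which is how 649 draws can yield 598 unique PSUs.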
6.5.2 Second stage
With 649 (598 unique) PSUs having been randomly drawn, we turn to the second stage in which households are selected.
Mail delivery points were randomly selected from each PSU drawn, with 8 being chosen in Vienna or in strata with more than 50,000 inhabitants and 12 being chosen in all other strata. In this process, every household in a given PSU has an equal probability of being selected in the sample, which equals the number of mail delivery points drawn (8 or 12) divided by the number of households in that PSU. This procedure resulted in a gross sample of 6,300 households in Austria.
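The second stage amounts to a simple random sample without replacement within each drawn PSU. The sketch below assumes a list of mail delivery point codes per PSU; the function name `draw_addresses`, the flag `big_stratum` and the example codes are hypothetical and serve only to illustrate the 8-versus-12 rule described above.

```python
import random


def draw_addresses(pacs: list[str], big_stratum: bool, rng: random.Random) -> list[str]:
    """Select mail delivery points from one drawn PSU without replacement.

    big_stratum is True for Vienna and for strata with more than 50,000
    inhabitants (8 addresses per PSU) and False otherwise (12 addresses).
    """
    k = 8 if big_stratum else 12
    if len(pacs) < k:
        raise ValueError("PSU contains fewer addresses than required")
    # Every address has the same inclusion probability k / len(pacs).
    return rng.sample(pacs, k)


if __name__ == "__main__":
    # Hypothetical PSU with 482 mail delivery points.
    psu = [f"PAC{i:04d}" for i in range(482)]
    print(draw_addresses(psu, big_stratum=False, rng=random.Random(7)))
```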
6.5.3 Practical implementation
Table 11 illustrates how the data from the second stage of the sampling was used, after the PSUs had been chosen in the first stage (column 2): Austrian Post Office data (column 6) were used to determine the appropriate mail delivery point, which gives the address, but not the holder of this address. To identify the name of the household corresponding to the selected postal address, the survey company, IFES, used its own databases or, where necessary, bought the corresponding name from the company “Profile.Address” (column 7).
Since the first contact with a household is very important for a successful interview, every household selected for the HFCS survey received an individualized letter signed by the governor of the OeNB. This letter contained information on the survey and an invitation to take part (see section 3.5.1). 80
6.6 Concluding remarks
This chapter provides information on the sampling design as developed for the fourth wave of the HFCS in Austria on the basis of the designs for the first three waves. Improvements reflect the experience of past waves. As described, the survey is based on stratified, two-stage cluster random sampling, consisting of a random draw (proportional to the number of households) of primary sampling units (PSUs; here, enumeration districts) from each stratum plus a random draw (with an identical probability of being drawn within a given PSU) of households (postal addresses as available from an Austrian Post Office database) from the selected PSUs.
The sampling method used for the HFCS has a number of advantages, with the following aspects being particularly important:
- Drawing PSUs with a probability proportional to their size (in terms of the number of households) reduces the variance of the design weights and thus makes the sampling design more efficient than a design in which all PSUs are drawn with the same probability.
- As sampling does not differentiate between main residences and second homes (as recorded in the residence registry), all households that correspond to the HFCS household definition have a positive probability of being selected.
- The very fine stratification structure ensures that all segments of the Austrian population are represented in the survey.
At the same time, given the topics covered by the HFCS it would be desirable to oversample certain groups of the population, such as wealthy households, to improve the efficiency of estimates for these subgroups. However, management decided against the implementation of such an oversampling scheme.
68 See ECB (2013a), p. 80f. and ECB (2016), p. 11f.
69 See https://www.statistik.at/en/services/tools/services/regional-information/regional-divisions (accessed on June 9, 2023). Austria is divided into 35 NUTS-3 regions. These regions typically consist of several neighboring political districts or correspond to urban areas including the capital cities of the provinces.
70 In the first wave this was not the case.
71 In accordance with the literature (see Williams, 2014; or Valliant et al., 2013), if the calculation of this probability was “greater than one,” then these PSUs were put into the sample with probability one.
72 See also https://www.statistik.at/en/services/tools/services/regional-information/regional-divisions (accessed on June 9, 2023).
73 The estimated number of households in Austria according to the HFCS definition (4.1 million) divided by the number of enumeration districts (8,825) yields 460.8.
74 The post-certified addresses for some 4.4 million households compare with about 3.9 million household addresses documented by other sources (such as the microcensus based on the residence registry; see also section 7.2.4).
75 For more details on the contact strategy, see section 3.4.
76 The difference between this figure and the 461 households per PSU cited earlier is the result of aggregation (see also section 6.5.1). Similar to the data above we used the roughly 4.1 million households according to the HFCS definition in the 8,430 enumeration districts resulting from the aggregation to calculate this average.
77 The experience of regional differences between participation rates in past waves of the HFCS in Austria was taken into account in the sample design for the fourth wave.
78 During the first wave, the probability of drawing a PSU within a stratum was identical for every PSU.
79 Mathematically, the probability of an enumeration district being drawn in a given stratum can be expressed as the number of households in a given PSU divided by the total number of households in this stratum times the number of enumeration districts drawn.
80 See the online appendix for the invitation letter.
7 Construction of survey weights
7.1 Introduction
Survey weights are usually constructed for two reasons: first, to make the sample representative of the target population and second, to reduce sampling variance.
The target population of the HFCS consists of all households in Austria, with a household being defined as an individual or a group of people who live together in the same private dwelling and share expenses. 81 However, the sample may contain several types of biases that may cause a misrepresentation of this target population: unequal probability sampling bias, frame bias and nonresponse bias (see chart 5).
The unequal probability sampling bias arises because not every household has the same probability of being selected into the sample: the HFCS oversamples households in urban areas (like Vienna) to address the known problem of the relatively low survey participation propensity of urban households. To correct this misrepresentation, we constructed design weights, which are explained in section 7.2.2. Further details about the HFCS sampling design can be found in chapter 6.
Imperfections in the survey frame from which the sample is drawn can lead to frame bias. In the HFCS, the sampling frame is a list of all personal postal addresses in Austria (see chapter 6). Erroneous exclusion of households could imply an imperfection with respect to the target population. In other words, there is the possibility that households without a postal address, for example, one-person households living together in residential communities and sharing an address that contains only one of these households, were excluded. These types of households would then be underrepresented. Another imperfection of the frame could be caused by erroneous inclusion, that is, the inclusion of addresses not belonging to households, for example, those of companies or households in care residences. 82 Finally, there is a third type of imperfection called frame multiplicity, which means that households may be duplicated because they have two (or more) addresses, for example multiple domiciles of commuters. Depending on its type, the frame bias can be reduced by using design weights 83 (to address erroneous inclusion and frame multiplicity) or poststratification weights (to address erroneous exclusion). We explain the construction of these weights in more detail in sections 7.2.2 and 7.2.4.
The nonresponse bias is caused by the fact that only a subset of the households included in the gross sample is willing to participate in the survey. Certain groups of households have a lower probability of participating in the HFCS than other groups – a phenomenon widely corroborated in literature (see e.g. Kennickell and McManus, 1993). Thus, estimates for the sampling frame would be biased with respect to these group characteristics, even though they are unbiased for the participating population. Using nonresponse weights can correct this bias (section 7.2.3).
Furthermore, as mentioned above, survey weights can help to reduce sampling variance and, hence, to increase the precision of the estimators. Ideally, the precision of the estimators should be improved by stratification prior to sampling. However, some variables (e.g. household size) that would have been very good for stratification and, thus, for improving the precision of the estimators, were not available until after the sample had been drawn and the sample households had been contacted. Some of the gain in precision that would have been possible by using these variables for stratification can be achieved by using them for poststratification. These poststratification weights were also used to correct for erroneous exclusion (see section 7.2.4). 84
The construction of survey weights is very important for the HFCS. The following sections will explain how design, nonresponse and poststratification weights were constructed and how the final set of survey weights was derived from these weights. Finally, we will present some descriptive results that take these weights into account.
7.2 Construction of survey weights
7.2.1 Weight components
We aim to construct a final survey weight $w_i$ for every household $i$ that is relatively small for households that are overrepresented in the sample compared to the target population and relatively large for households that are underrepresented. However, as already mentioned in the introduction, households may be misrepresented with respect to the target population for various reasons. Therefore, a specific adjustment using weights is required for every type of misrepresentation. In the HFCS, three types of weights are used: design weights $w_i^{D}$, nonresponse weights $w_i^{NR}$, and poststratification weights $w_i^{PS}$. The product of these three weights yields the final survey weight $w_i$:
$$w_i = w_i^{D} \cdot w_i^{NR} \cdot w_i^{PS}$$
Although some HFCS variables are asked at the individual level rather than the household level, no weights were constructed for individuals because the main focus of the survey is the household level.
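As a small illustration of how the three components combine, the sketch below multiplies them for a single household. The function name `final_weight` and the numerical values in the usage note are hypothetical and are not taken from the HFCS data.

```python
def final_weight(design: float, nonresponse: float, poststratification: float) -> float:
    """Final survey weight as the product of the three weight components."""
    if min(design, nonresponse, poststratification) < 0:
        raise ValueError("weight components must be non-negative")
    return design * nonresponse * poststratification
```

A household with a design weight of 1,000, a nonresponse weight of 2.0 and a poststratification weight of 1.1 would receive a final weight of about 2,200, i.e. it would stand for roughly 2,200 households in the target population.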
7.2.2 Design weights
Design weights help reduce the unequal probability sampling bias, as well as rectify erroneous inclusion and frame multiplicities. In the HFCS, we compute the design weights on the basis of two-stage cluster sampling and the selection probabilities of the primary sampling units (PSUs) and the secondary sampling units (SSUs). In the first stage, the smallest territorial units, the so-called enumeration districts (PSUs), are drawn; then in the second stage, the households (SSUs) within these enumeration districts are drawn (see section 6). The probability that the ith household in the jth enumeration district is selected into the sample is the product of the selection probability for the enumeration district and the selection probability for the household, under the condition that the household’s enumeration district is selected. The inverse of this product is the preliminary design weight. The calculation of the design weight mirrors the two steps of the sampling procedure:
- Step 1: Calculate the probability that a certain PSU is selected. As described in section 6, this sampling probability is defined depending on the relative number of households in a PSU. The probability that PSU j will be selected in stratum h is
$$P(\text{PSU}_{hj}) = m_h \cdot \frac{M_{hj}}{N_h}$$
where $M_{hj}$ represents the number of households in this enumeration district (h,j), $m_h$ the number of PSUs to be drawn in this stratum, and $N_h$ the number of households in this stratum.
- Step 2: Calculate the probability that an SSU is selected. Under the condition that a PSU is chosen, each household in this enumeration district has the same probability of being chosen. Thus, the probability of being selected is given by
$$P(i \mid \text{PSU}_{hj}) = \frac{m_{hj}}{M_{hj}}$$
where $m_{hj}$ is the number of households to be drawn in the PSU (i.e. 8 in a stratum with a population of over 50,000 and 12 in the rest of Austria). As above, $M_{hj}$ is the number of households in this enumeration district.
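Overall, the ex ante selection probability Prob(i) for each household i is given by multiplying the two partial probabilities. This probability may be shown as:
$$\text{Prob}(i) = m_h \cdot \frac{M_{hj}}{N_h} \cdot \frac{m_{hj}}{M_{hj}} = \frac{m_h \cdot m_{hj}}{N_h}$$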
The design weight ($w_i^{D}$) is calculated by inverting this probability. For example, a household with a probability of selection equal to 0.001 has a preliminary design weight of 1,000 = 1/0.001, which is much higher than that of a household with a probability of selection equal to 0.009, which would be 111 ≈ 1/0.009.
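The two steps and the inversion can be sketched in a few lines. The parameter names follow the symbols defined in steps 1 and 2 (number of PSUs drawn in the stratum, households in the PSU, households in the stratum, households drawn in the PSU); the function names and the example figures are hypothetical. Capping the PSU probability at one follows footnote 71.

```python
def psu_probability(m_h: int, M_hj: int, N_h: int) -> float:
    """Step 1: probability that PSU j is selected in stratum h (capped at 1)."""
    return min(1.0, m_h * M_hj / N_h)


def household_probability(m_hj: int, M_hj: int) -> float:
    """Step 2: probability that a household is selected, given its PSU was drawn."""
    return m_hj / M_hj


def design_weight(m_h: int, M_hj: int, N_h: int, m_hj: int) -> float:
    """Preliminary design weight: inverse of the overall selection probability."""
    prob = psu_probability(m_h, M_hj, N_h) * household_probability(m_hj, M_hj)
    return 1.0 / prob


if __name__ == "__main__":
    # Hypothetical stratum: 40,000 households, 3 PSUs drawn, 12 households per PSU.
    print(design_weight(m_h=3, M_hj=500, N_h=40_000, m_hj=12))
    print(design_weight(m_h=3, M_hj=400, N_h=40_000, m_hj=12))
```

Both calls return approximately 1,111 because the PSU size cancels out, which illustrates why households within the same stratum share the same design weight as long as the cap is not binding.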
This procedure ensures that every household that has the same probability of selection within a stratum on account of the sample design also has the same design weight. The design weights vary across the strata, due to the differing assumptions on the willingness of a household to participate, which determine the SSUs to be drawn, and the different size of the strata as a result of the number of households.
Our sample included 352 random Viennese addresses that could not be contacted before the field phase ended, because only a small number of interviewers were available under the extraordinary working conditions of the COVID-19 pandemic (see chapter 4 and chapter 10 for more details). For the construction of the survey weights, these 352 addresses were left out of the sample and, as a consequence, the design weights of their strata were adjusted in order to still represent the frame population (see Box 5 for more details).
The COVID-19 pandemic and the construction of survey weights
The HFCS procedure of constructing survey weights was challenged by COVID-19 in several dimensions.
On the one hand, unit nonresponse strongly increased, by 11 percentage points from about 50% of eligible households in the previous wave to 61% in wave 4. All in all, 57% of eligible households actively refused to take part in the survey, 12 percentage points more than in the previous wave. As a consequence, the net sample size decreased by about 800 households, and accordingly the variance of the population estimates increased, which makes them less precise.
On the other hand, the unweighted net sample is biased toward households with older household members. The proportion of households with a reference person aged 65 or older increased by 11 percentage points from about 20% in the previous wave to 31% in wave 4. In order to correct for this bias, we kept, as in the previous wave, the average age in the municipality as a predictor of household participation in the survey when constructing nonresponse weights, and we introduced “age of the reference person” as an additional poststratification variable when constructing poststratification weights.
Finally, due to the low number of interviewers in wave 4, the HFCS gross sample included 352 random Viennese addresses distributed across 15 strata that could not be processed in time before the end of the field phase was declared. As these addresses were never contacted, it is unknown whether they are eligible for the HFCS or not, and their paradata are also not available. For the construction of the survey weights, they were dropped from the sample. Therefore, the design weights in the corresponding strata had to be adjusted so that they sum up again to the population total. The construction of the survey weights itself then followed the same procedure as in previous waves and as described in chapter 7. The survey weights remain robust against alternative approaches of how to treat these 352 addresses (see chapter 10.7 for more details).
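A minimal Python sketch of such an adjustment is given below. It assumes that the weights of the remaining addresses in a stratum are scaled by a common factor so that the stratum total is preserved. The report does not spell out the exact adjustment rule, so the proportional rescaling, the function name and the figures are illustrative assumptions.

```python
def rescale_after_dropping(weights, dropped):
    """Drop flagged addresses and rescale the remaining design weights
    so that they sum to the original stratum total (assumed rule)."""
    total_before = sum(weights)
    kept = [w for w, d in zip(weights, dropped) if not d]
    factor = total_before / sum(kept)
    return [w * factor for w in kept]


# Hypothetical stratum: 10 addresses with weight 600, 2 of them never contacted.
stratum_weights = [600.0] * 10
never_contacted = [False] * 8 + [True] * 2
adjusted = rescale_after_dropping(stratum_weights, never_contacted)
print(adjusted[0], sum(adjusted))
```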
Finally, although the sampling frame was carefully prepared and cleaned before sampling, our sample still included some ineligible (see box 6) or duplicated observations (see also section 4.6.2.13), for example company addresses, addresses of care homes or secondary residences. We flagged all such cases detected during the fieldwork as ineligible or duplicated in our sample by setting the design weights equal to zero. As a result, the design weight total decreased from about 4.39 million to 4.34 million.
Province | Mean | Median | Minimum | Maximum |
---|---|---|---|---|
Vienna | 690 | 585 | 0 | 1,102 |
Lower Austria | 790 | 776 | 0 | 1,197 |
Burgenland | 784 | 796 | 0 | 1,002 |
Styria | 720 | 691 | 0 | 1,186 |
Carinthia | 715 | 568 | 0 | 1,202 |
Upper Austria | 735 | 769 | 0 | 1,224 |
Salzburg | 702 | 715 | 0 | 1,095 |
Tyrol | 730 | 733 | 0 | 1,109 |
Vorarlberg | 792 | 801 | 421 | 1,044 |
Total | 730 | 735 | 0 | 1,224 |
Source: HFCS Austria 2021, OeNB |
Table 12 shows some statistics of the obtained HFCS design weights across Austria’s provinces. Carinthia and Vienna are the provinces with the lowest median weights, which is plausible, as households living in urban areas were oversampled during the fourth HFCS wave in Austria because of their low willingness to participate, which would have created a bias had they not been reweighted downward using the design weights. Very similar results were found in wave 3.
Unit nonresponse in the HFCS in Austria
In the fourth wave of the HFCS in Austria, successful interviews were conducted with 2,293 households from the gross sample, which comprised 6,300 addresses. The remaining 4,007 addresses were classified either as unit nonresponse cases (3,576 households), ineligible addresses (64 addresses) or addresses of unknown eligibility (367 addresses).
The unit nonresponse cases are households as defined in the HFCS that were not interviewed successfully for several reasons. The most common reason was that households actively refused to take part in the survey, either by refusing to be interviewed, by breaking off the interview or by failing to keep the interview appointment and being subsequently unavailable for contact. This applied to a total of 3,342 households. Another reason was that no contact at all could be established with 60 households. The remaining 174 cases were nonrespondents for other reasons, such as illness or language barriers, or resulted from the ex post exclusion of interviews due to a high number of missing or unreliable values.
In addition, 64 addresses were classified as ineligible because they were not part of the target population, as they were, for instance, addresses of companies, empty buildings or second homes of households that could be reached via their main residence address. Finally, the eligibility status of another 367 addresses was impossible to ascertain. Most of them (352 addresses) were random Viennese addresses that could not be contacted due to the low number of interviewers resulting from the extraordinary circumstances under which they had to work during the COVID-19 pandemic (see Box 5 for more details). The eligibility status of the remaining 15 addresses was also impossible to ascertain, as the interviewers were unable to reach or find them. In accordance with how the eligibility statuses of the rest of the observed addresses in the sample were distributed, 0 of the 15 addresses were randomly chosen to be ineligible and, thus, all 15 to be eligible.
Expressed in percentages, the eligibility rate in the HFCS sample ultimately came to 99% and the nonresponse rate of the eligible households amounted to 61%. This means that successful interviews were conducted with 39% of the eligible households in the HFCS sample. This figure is well below that of the previous waves (see box 5 and chapter 10 for more information on the changes due to COVID-19). About 57% of the eligible households actively refused to take part in the survey.
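The rates quoted above can be retraced from the counts in this box with a few lines of Python. The sketch assumes that the 352 never-contacted Viennese addresses are excluded from the base and that the 15 remaining addresses of unknown eligibility are counted as eligible nonrespondents. The variable names are illustrative.

```python
gross_sample = 6300
interviews = 2293
nonresponse = 3576          # of which active refusals: 3,342
refusals = 3342
ineligible = 64
never_contacted = 352       # dropped before weighting (assumption: not in base)
unknown_as_eligible = 15    # unknown eligibility, all assigned "eligible"

processed = gross_sample - never_contacted
eligible = interviews + nonresponse + unknown_as_eligible

eligibility_rate = eligible / processed
response_rate = interviews / eligible
nonresponse_rate = 1 - response_rate
refusal_rate = refusals / eligible

for name, rate in [("eligibility", eligibility_rate),
                   ("response", response_rate),
                   ("nonresponse", nonresponse_rate),
                   ("refusal", refusal_rate)]:
    print(f"{name}: {rate:.1%}")
```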
The value of a household’s design weight can be interpreted as the number of households in the sampling frame that is represented by this household. For example, the median household in Vienna represents 585 households in the sampling frame.
7.2.3 Nonresponse weights
As described in box 6, not all households participated successfully in the survey. If household characteristics correlate with nonresponse, the respondent population is not a random subsample of the sampling frame and the sample is nonresponse biased (see chart 5). In the HFCS, this is indeed the case, as can be seen in table 13. The table shows a logit regression of household participation in the survey (1 if the household participated, otherwise 0) on a set of variables that explain participation in the survey. The results show on the one hand that households living in municipalities with higher personal incomes or with higher unemployment rates or with higher crime rates have a lower probability of participating. On the other hand, households that live in areas with higher average age had an increased probability of participating. This suggests that nonresponse is not random.
This bias can be corrected by using nonresponse weights, i.e. by attaching a higher nonresponse weight to households with a low probability of responding than to households with a high probability of responding. To calculate the response probabilities and the corresponding nonresponse weights, the weighting class adjustment method is combined with the model-based adjustment method (see Biemer and Christ, 2008). The weighting classes are chosen optimally using the method described by Haziza and Beaumont (2007). The algorithm can be summarized in the following three steps:
- Step 1: The logit regression model shown in table 13 was used to estimate the probability of response for each household (assuming that the household was selected into the sample).
- Step 2: The households’ response probabilities were grouped into seven classes. The number of classes and their resulting sizes are chosen optimally in line with Haziza and Beaumont (2007). To this end, a k-means algorithm is used to cluster households into a prespecified number of response classes with low variance and similar size. Next, the response propensities predicted by the logit regression model in step 1 are regressed on the class indicators using ordinary least squares (OLS). Beginning with one class, the number of classes is increased iteratively until the adjusted R² of this OLS regression exceeds 95%. This is the case for seven classes in the fourth wave of the HFCS in Austria. 85 Finally, the average response propensity for each class was calculated (unweighted total number of respondent households/unweighted total number of households). 86
- Step 3: The nonresponse weight of a class is obtained by inverting the average response propensity of the respective class.
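Steps 2 and 3 can be sketched in Python as follows. This is a simplified illustration, not the HFCS implementation: the propensities are synthetic rather than logit predictions, the one-dimensional k-means does not enforce similar class sizes, and all function names are hypothetical. Regressing the propensities on class indicators by OLS reproduces the class means, so R² equals one minus the ratio of within-class to total sum of squares.

```python
import random


def kmeans_1d(values, k, iters=100):
    """Plain 1-D k-means (Lloyd's algorithm), centres initialised at quantiles."""
    srt = sorted(values)
    n = len(srt)
    centres = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    labels = [0] * n
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        new = []
        for c in range(k):
            members = [v for v, l in zip(values, labels) if l == c]
            new.append(sum(members) / len(members) if members else centres[c])
        if new == centres:
            break
        centres = new
    return labels


def adjusted_r2(values, labels, k):
    """Adjusted R² of an OLS regression of values on k class indicators."""
    n = len(values)
    mean = sum(values) / n
    sst = sum((v - mean) ** 2 for v in values)
    sums, counts = [0.0] * k, [0] * k
    for v, l in zip(values, labels):
        sums[l] += v
        counts[l] += 1
    ssw = sum((v - sums[l] / counts[l]) ** 2 for v, l in zip(values, labels))
    r2 = 1 - ssw / sst
    return 1 - (1 - r2) * (n - 1) / (n - k)


# Synthetic propensities and simulated response outcomes (illustration only).
rng = random.Random(0)
propensity = [0.05 + 0.9 * i / 999 for i in range(1000)]
responded = [rng.random() < p for p in propensity]

# Step 2: increase the number of classes until adjusted R² exceeds 95%.
k = 1
while True:
    labels = kmeans_1d(propensity, k)
    if adjusted_r2(propensity, labels, k) > 0.95:
        break
    k += 1

# Step 3: nonresponse weight = inverse of the unweighted class response rate.
class_weight = {}
for c in range(k):
    total = sum(1 for l in labels if l == c)
    resp = sum(1 for l, r in zip(labels, responded) if l == c and r)
    class_weight[c] = total / resp

print(k, {c: round(w, 3) for c, w in class_weight.items()})
```

Using class averages instead of the individual predicted propensities limits the influence of extreme predictions on the resulting weights.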
Table 13: Response propensity estimates based on a logit regression model
Covariates | Coefficients | Standard errors |
---|---|---|
Paradata on the interview, place of residence and neighborhood | | |
Household interview order | 0.000219 | (0.000260) |
Building characteristics (reference group: detached single-family house) | | |
Semi-detached single-family house | –0.140 | (0.154) |
Single-family townhouse | 0.115 | (0.152) |
Residential farm building | 0.0824 | (0.178) |
Apartment in a (high-rise) apartment building | 0.848*** | (0.0812) |
Student dormitory/rented room | 1.465* | (0.764) |
Other type of building | –1.012** | (0.411) |
Building design characteristics (reference group: premium) | | |
Very good | –0.0620 | (0.176) |
Medium | –0.402** | (0.180) |
Basic | –0.0690 | (0.207) |
Very basic | –0.226 | (0.291) |
Location characteristics (reference group: city center) | | |
Between the city center and suburbs | 0.0736 | (0.0882) |
Suburbs and city outskirts | 0.0264 | (0.0984) |
Countryside | 0.114 | (0.114) |
Graffiti in the neighborhood (reference group: many) | | |
Location – graffiti = 2, some | 0.410 | (0.642) |
Location – graffiti = 3, few | 0.607 | (0.636) |
Location – graffiti = 4, none at all | 0.375 | (0.631) |
Paradata not available | –19.976 | . |
Interviewer characteristics | | |
Female interviewer | 0.0219 | (0.0630) |
Interviewer’s age | 0.00173 | (0.00275) |
Interviewer’s second-stage tertiary education | –0.0895 | (0.0722) |
Interviewer’s working experience in months | 0.000301 | (0.000343) |
Variables at the municipality level | | |
Average per capita income per municipality in 2020 | –1.70e–05* | (9.63e–06) |
Share of employees in the primary sector per municipality in 2019 | 0.00256 | (0.00434) |
Share of university-trained population per municipality in 2019 | 0.00153 | (0.00558) |
Unemployment rate per municipality in 2019 | –0.0640*** | (0.0113) |
Average age of population per municipality in 2021 | 0.0313** | (0.0158) |
Variables at the district level | | |
Average crime rate per district in 2021 | –0.00480*** | (0.00175) |
Constant | –1.306 | (0.975) |
Observations¹ | 5,881 | |
Source: HFCS Austria 2021, OeNB. |
¹ The remaining 67 observations in the dataset are ineligible, including 3 observations with unknown eligibility, and are therefore not included in the regression.
Note: Standard errors in parentheses; *** p<0.01, ** p<0.05, * p<0.1.
The advantage of this approach is that it stabilizes the nonresponse weights because the response propensities predicted by the regression model vary widely and can contain extreme values. 87 Information collected through interviewer surveys (see section 3.8), e.g. their level of education and experience, was found to correlate strongly and statistically significantly with households’ response propensity and was therefore used in step 1. Additionally, sample design information and municipal or district-level information was used, which may also explain willingness to participate with statistical significance.
Response classes | Predicted response propensity (%) | Weight |
---|---|---|
I | 0 to 18 | 6.465 |
II | 18 to 27 | 3.729 |
III | 27 to 33 | 3.754 |
IV | 33 to 39 | 2.766 |
V | 39 to 45 | 2.385 |
VI | 45 to 54 | 2.048 |
VII | 54 to 100 | 1.652 |
Source: HFCS Austria 2021, OeNB. |
The HFCS nonresponse weights are shown in table 14. A value was calculated for each of the seven response groups and, by design, households with a high response propensity were assigned a lower weight than those with a low response propensity. Nonrespondent households were assigned a nonresponse weight equal to zero.
7.2.4 Poststratification weights
Erroneous exclusion may – as mentioned above – be an imperfection in the HFCS frame with respect to the target population. We may have missed households without postal addresses, which means that these types of households would be underrepresented. If an external dataset covering these households and all others in our target population existed, we could use it to adapt our sample to this external dataset accordingly; we could put more weight on households without postal addresses so that the estimated size of the target population in the HFCS would be the same as the one in the external dataset.
Unfortunately, such a dataset does not exist in Austria. Similar surveys, like the EU SILC (EU Statistics on Income and Living Conditions) or the Austrian microcensus, target a different population of households due to their specific household definition. While the target population of the HFCS includes all households (according to the above definition), the EU SILC and the Austrian microcensus only include households living in a dwelling officially registered in the central residence registry as their main residence. This definition excludes a subset of households included in the HFCS household definition, namely all households living in dwellings that are not registered as a main residence or not registered at all. There are various reasons why in some cases households’ actual main residences are not registered as such. For instance, students studying away from home may keep their main residence at their parents’ address even though they are already a household of their own according to the HFCS definition; others may have just forgotten to register the address where they actually live as their main residence. Statistics Austria also acknowledges these problems and others when using main residence addresses for sampling households via the Austrian residence registry. 88
Given that these datasets also suffer from erroneous exclusion, it does not make sense to reweight the entire sample according to the target population size of these datasets. However, following an adjustment made in the second wave of the HFCS in Austria, the survey establishes whether a household’s residence is registered as a main residence or not, so it would appear to make sense to reweight this group of households to the microcensus. In particular, this may deliver a better picture with regard to the proportions of households in the Austrian provinces, as the Austrian microcensus uses a much larger sample than the HFCS. For the small group of remaining households in the HFCS sample that are not registered at their main residence, reweighting to the microcensus does not seem sensible. Yet the erroneous exclusion bias in the HFCS sample is likely to be very small in this case, as the vast majority of households do have postal addresses. We constructed poststratification weights that put more weight on households with a lower probability of being included in the frame and less weight on households with a higher probability. We adjusted the HFCS sampling frame size only for households registered at their main residence according to the microcensus. Households not registered at their main residence are then added. 89 Since the second wave, this has increased comparability between the HFCS and the microcensus and, at the same time, reduced the erroneous exclusion bias. Furthermore, poststratification weights can also reduce the sampling variance and, hence, increase the precision of the estimators; moreover, they can eliminate sample-specific random misrepresentations of the target population (see section 7.1).
The HFCS poststratification weights are constructed following the poststratification cell adjustment method (Biemer and Christ, 2008) and using the Austrian microcensus data (2021 Q4) available during the field phase of the HFCS in Austria. The procedure was as follows:
- Step 1: Choose suitable predictors for including a household in the HFCS frame and cross-tabulate these variables to compute the poststratification cells. Different poststratification cells were defined depending on the registration status. For households registered at their main residence, the province, the tenure status of the main residence, household size and the age of the reference person serve as poststratification variables. No poststratification is performed for all other households, because, as described above, they are not included in the external dataset.
- Step 2: Calculate the average propensity to be included in the sampling frame for each cell.
- Step 3: In each cell, adjust the propensity by a constant factor so that the weighted cell total matches the external dataset total.
- Step 4: Obtain the poststratification weight by inverting the average in-frame inclusion propensity for each cell.
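The cell adjustment can be sketched in Python as follows. The sketch treats the in-frame propensity of a cell as the ratio of the weighted sample total to the external (microcensus) total, so that its inverse is the poststratification weight. The cell labels, weights and external totals are invented for illustration, and the function name is hypothetical.

```python
def poststratification_weights(cells, weights, external_totals):
    """Return one poststratification weight per cell.

    cells           : cell label of each household
    weights         : design weight times nonresponse weight of each household
    external_totals : household totals per cell from the external dataset
    """
    weighted_total = {}
    for cell, w in zip(cells, weights):
        weighted_total[cell] = weighted_total.get(cell, 0.0) + w
    # In-frame propensity = weighted sample total / external total;
    # the poststratification weight is its inverse.
    return {c: external_totals[c] / t for c, t in weighted_total.items()}


# Hypothetical cells (province, tenure, household size).
cells = [("Vienna", "tenant", "1"), ("Vienna", "tenant", "1"),
         ("Tyrol", "owner", "2 to 4")]
weights = [1000.0, 1500.0, 2000.0]
external = {("Vienna", "tenant", "1"): 3000.0, ("Tyrol", "owner", "2 to 4"): 1800.0}

ps_weight = poststratification_weights(cells, weights, external)
adjusted = [w * ps_weight[c] for c, w in zip(cells, weights)]
print(ps_weight, adjusted)
```

A weight above 1 indicates a cell that is underrepresented relative to the external dataset, a weight below 1 an overrepresented cell.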
Households registered at their main residence were grouped according to size, one group containing households with only one individual, another group containing those with two to four individuals and one containing those with five or more individuals. 90 This ensures that larger households are not underrepresented in the HFCS sample. Single-person households were further grouped according to the age of the reference person, one group containing households with a reference person aged 0-64 and another group containing households with a reference person aged 65+. This ensures that older households are not overrepresented in the HFCS sample, which was especially important in this wave (see Box 5). Additionally, households registered at their main residence were grouped into tenants 91 and (part) owners. Moreover, the households were assigned to the nine provinces.
Table 15 shows the HFCS poststratification weights – 67 values, i.e. one value per combination of registration status, province, tenure status of the main residence, household size and age of the reference person. The table shows, for example, that single tenant households are underrepresented in the HFCS sampling frame, as they exhibit above-average poststratification weights.
7.2.5 Final weights
Three different weights were constructed to account for the different reasons as to why a household may misrepresent the target population. As we have seen, each of these weights can be interpreted as an inverted probability. The product of these yields a new inverted probability, which is the HFCS final weight wi:
$$w_i = w_{Di} \cdot w_{NRi} \cdot w_{PSi}.$$
Province | Official main residence, homeowners: 1 person, aged 0-64 | Official main residence, homeowners: 1 person, aged 65+ | Official main residence, homeowners: 2 to 4 persons | Official main residence, homeowners: 5 or more persons | Official main residence, tenants: 1 person, aged 0-64 | Official main residence, tenants: 1 person, aged 65+ | Official main residence, tenants: 2 to 4 persons | Official main residence, tenants: 5 or more persons | Other households |
---|---|---|---|---|---|---|---|---|---|
Vienna | 1.405 | 0.458 | 0.794 | 3.316 | 1.589 | 0.391 | 0.929 | 1.950 | 1 |
Lower Austria | 1.852 | 0.975 | 1.039 | 2.820 | 1.065 | 0.557 | 0.837 | | 1 |
Burgenland | 1.433 | 0.589 | 0.523 | 0.854 | 3.192 | 0.626 | 1.887 | | 1 |
Styria | 0.525 | 0.616 | 0.710 | 1.494 | 0.850 | 0.766 | 1.494 | 2.691 | 1 |
Carinthia | 1.674 | 0.782 | 0.953 | | 1.063 | 0.816 | 1.403 | 4.307 | 1 |
Upper Austria | 2.533 | 0.528 | 0.888 | 1.184 | 1.757 | 0.513 | 1.143 | 4.371 | 1 |
Salzburg | 1.372 | 0.662 | 0.994 | 1.872 | 1.559 | 0.336 | 0.577 | 1.495 | 1 |
Tyrol | 2.862 | 1.179 | 1.387 | 1.855 | 1.006 | 0.511 | 1.030 | | 1 |
Vorarlberg | 0.791 | 0.663 | 2.129 | | 1.534 | 0.472 | 2.683 | | 1 |
Source: HFCS Austria 2021, OeNB. |
Note: Empty cells result from the aggregation of cells by household size (see footnote 90).
The final weight wi incorporates all three adjustments and can be interpreted as the inverted probability that household i is in the net sample. Households with a high probability of being in the net sample have a lower final weight and represent fewer households in the target population than households with a low probability of being in the net sample.
The combination of nonresponse weights and poststratification weights results in 469 different weight adjustment cells based on registration status, province, tenure status of the main residence, household size, age of the reference person and the response propensity classes described above. Each household is represented in precisely one of these cells.
Finally, once we have taken the design weights into account, we obtain the HFCS final weights, whose distribution is shown in chart 6. The HFCS final weights range from 318 to 10,475, with the mean being 1,773 and the median 1,558. Their distribution is slightly skewed to the right, which is not atypical for unequal probability sample designs. After all, households with a higher probability of selection (below average design weights) dominate the sample. This effect is reinforced by the further weight adjustments.
7.3 Selected results
Table 16 shows the impact of the HFCS final weights on estimations by comparing selected weighted and unweighted mean values of HFCS variables. For example, we can see that households in Vienna were slightly upweighted from 22.2% to 23%. This means that despite their oversampling, overall households in Vienna were underrepresented in the sample with respect to the target population. The comparison also shows that households with higher income or higher net wealth were underrepresented in the unweighted sample, which is probably caused by these households’ higher nonresponse rate.
The use of the final HFCS weights is sufficient when calculating the weighted statistics shown in table 16. To calculate the correct variances or standard errors of these estimators, however, replicate weights, which are described in chapter 8, are necessary.
Variable | Unweighted mean | Weighted mean |
---|---|---|
Household size (number of persons) | 1.95 | 2.09 |
% of households | | |
Vienna | 22.2 | 23.0 |
Lower Austria | 15.8 | 18.5 |
Burgenland | 3.9 | 3.2 |
Styria | 17.1 | 13.9 |
Carinthia | 6.5 | 6.4 |
Upper Austria | 15.5 | 16.1 |
Salzburg | 8.0 | 6.2 |
Tyrol | 8.4 | 8.4 |
Vorarlberg | 2.6 | 4.3 |
EUR | | |
Estimated household monthly net income | 2,721 | 2,848 |
Household net wealth | 249,780 | 293,000 |
Source: HFCS Austria 2021, OeNB. |
7.4 Concluding remarks
We constructed a set of final weights to correct imperfections in the unweighted HFCS sample with respect to the HFCS target population. These imperfections are unequal probability sampling bias, erroneous inclusion, frame multiplicity, erroneous exclusion and nonresponse bias.
While weighting the HFCS sample enables unbiased population estimates, it also increases the variance of the population estimates, which makes them less precise. 92 According to the unequal weighting effect (UWE) statistic developed by Kish (1995), the variance of HFCS population estimates may be increased by a maximum of 32.9% as a result of weighting (UWE = 1 + CV² = 1.329, where CV is the coefficient of variation of the weights). The increase in unit nonresponse worsened this value compared with the previous waves (see also Box 5 and chapter 10.7). However, it is still not necessary to apply weight trimming methods. Furthermore, a small increase in variance is acceptable in return for a significant reduction in the bias if it helps to avoid distorted results being classified as significant too often.
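The UWE statistic is straightforward to compute from a vector of weights. The Python sketch below uses the definition UWE = 1 + CV², with the coefficient of variation based on the population variance of the weights; the example weights are illustrative and unrelated to the HFCS data.

```python
def unequal_weighting_effect(weights):
    """Kish's unequal weighting effect: 1 + (coefficient of variation)^2."""
    n = len(weights)
    mean = sum(weights) / n
    variance = sum((w - mean) ** 2 for w in weights) / n  # population variance
    return 1 + variance / mean ** 2


print(unequal_weighting_effect([1.0, 1.0, 1.0, 1.0]))  # equal weights: no variance inflation
print(unequal_weighting_effect([1.0, 3.0]))            # unequal weights: UWE above 1
```

The same quantity can be written as n · Σw² / (Σw)², which shows that the UWE depends only on the relative dispersion of the weights and not on their scale.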
An explanation of how to correctly use the weights in Stata is provided in chapter 9, User guide.
81 Some special types of households, like those living in care residences (retirees, people in need of care), prisoners, etc., are excluded from this definition. For more details on the definition of the target population, see chapter 6.
82 Although addresses of companies and of households in care residences were removed from the sampling frame, some such addresses may still be erroneously included.
83 Sometimes referred to as noncoverage bias.
84 Poststratification weights can, moreover, correct a third type of sample-specific bias: the target population may be accidentally misrepresented by the specific households drawn into the sample.
85 In wave 3, the algorithm yielded eight classes as the optimal number of response classes.
86 The average response propensity is unweighted (with respect to the design weights) for efficiency reasons. See Little and Vartivarian (2003) for more details.
87 Another problem of the use of simple logit regression models, as highlighted by Iannacchione et al. (1991), is that such modeling does not ensure that the weighted sample marginal distributions conform to the population marginal distribution.
88 For the microcensus, see Haslinger and Kytir (2006), p. 512 f; for the EU SILC, see Statistics Austria (2018), p. 44–45.
89 Before the poststratification adjustment, the HFCS sampling frame encompassed 4,165,143 households, consisting of households registered at their main residence (4,103,836) and households not registered at their main residence (61,307 or about 1.5% of households). After the poststratification adjustment of households registered at their main residence, the population of these households comes to 3,872,661, which corresponds to the household population according to the 2016 Q4 microcensus. As a result, the final HFCS household population amounts to 3,933,968 (= 3,872,661 + 61,307).
90 Given the very low number in a poststratification cell, the cells were aggregated by household size in Lower Austria, Burgenland, Vorarlberg and Tyrol for main residence tenants, and in Carinthia and Vorarlberg for main residence homeowners.
91 Includes free users of main residences.
92 The poststratification step can restrict this increase in sample variance (see Levy and Lemeshow, 2008).
8 Construction of replicate weights for variance estimation
8.1 Introduction
The use of the final survey weights described in chapter 7 is sufficient when estimating population parameters. However, calculating the correct variances or standard errors of these estimators requires replicate weights, which are described here. HFCS sampling involves a variety of complex features, such as stratification, multistage sampling, proportional-to-size sampling in the first stage or sampling without replacement in the second stage. In addition, the design weights are adjusted for nonresponse and poststratification. Ignoring these features in statistical analysis will bias the estimated variances of point estimators. For example, if stratification is ignored, the standard errors will be too large, and if clusters are ignored, the standard errors will be too small. Furthermore, if design weights are ignored, the sampling distributions of the statistics underrepresent the observations with a low selection probability and overrepresent those with a high selection probability (see Kolenikov, 2010).
A problem that occurs frequently when statistical analysis takes into account a complex survey design with all its features is that the mathematical functions of the variance estimators are unknown. Therefore, performing a statistical analysis requires methods developed especially for the purpose of variance estimation. There are two general categories of variance estimation methods: replicate weight methods (also called replication or resampling methods) and linearization. 93
Until recently, linearization was preferred to replication in the literature, as linearization requires less computational power. However, linearization comes with the major disadvantage that data protection regulations prevent some information necessary for linearization from being shared. When replicate weights are used, data availability and privacy are not an issue. After all, replicate weights incorporate the information contained in numerous variables that are not available to data users (e.g. stratum and primary sampling unit (PSU) variables) without disclosing these variables themselves. Leaving out this information makes it virtually impossible for individual respondents to be identified from the data by the data user (see Heeringa et al., 2017).
Moreover, the linearization method is unsuitable for estimating the variance of nonlinear statistics (medians, quartiles, etc.), as it requires computing derivatives of continuous functions; however, quantile functions, for instance, are discontinuous. Replicate weights, by contrast, are well suited for estimating the variance of such statistics (see Heeringa et al., 2017).
Given the data protection requirements mentioned above and because the HFCS data facilitate in particular the analysis of distributional parameters such as medians and quantiles, we use – in accordance with ECB requirements – replicate weights for variance estimation in the HFCS. 94 In the following section, we describe how replicate weights were constructed for the HFCS in Austria.
8.2 Construction of replicate weights
8.2.1 The replication method
The replication method aims to estimate the variance of an estimated population parameter. The idea behind this is, to begin with, to estimate population parameters for individual subsets (so-called replicates) of the sample observations. The variability of these estimated population parameters across all replicates is subsequently calculated, resulting in the desired variance of the estimated population parameter (see Levy and Lemeshow, 2008).
Instead of saving a whole sample for each replicate, it is more practical to vary the final survey weights. For example, instead of removing a sample observation to construct a certain replicate, it can be given a weight of zero in the replicate. Then the weights of the other observations in the same stratum need to be increased to ensure that the totals are unbiased for each replicate r (see Kolenikov, 2010). The replicate weights wi(r) for r = 1,…, R will be published together with the HFCS dataset.
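How a data user might turn a set of replicate weights into a variance estimate is sketched below in Python for a weighted median. The sketch assumes the common bootstrap formula, i.e. the mean squared deviation of the replicate estimates from the full-sample estimate; the divisor and centering used for the HFCS are not stated in this section, and the data and function names are purely illustrative.

```python
def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total."""
    half = sum(weights) / 2
    cumulative = 0.0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative >= half:
            return value


def replicate_variance(values, final_weights, replicate_weights, estimator):
    """Variance estimate from R replicate weight vectors (assumed formula)."""
    theta = estimator(values, final_weights)
    estimates = [estimator(values, rw) for rw in replicate_weights]
    return sum((t - theta) ** 2 for t in estimates) / len(estimates)


# Toy data: four households and three replicate weight vectors.
wealth = [10, 20, 30, 40]
w_final = [1, 1, 1, 1]
w_reps = [[2, 0, 1, 1], [0, 2, 1, 1], [1, 1, 2, 0]]

variance = replicate_variance(wealth, w_final, w_reps, weighted_median)
print(weighted_median(wealth, w_final), variance)
```

The estimator is passed in as a function, so the same routine works for means, shares or other quantiles.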
There are different methods to form such replicates. The three major replication methods used in survey literature are balanced repeated replication, jackknife repeated replication and bootstrap replication. Although in most simulations in the literature, the estimators of all three replication methods converge towards one another as the sample size increases, simulation studies have shown that bootstrap and balanced repeated replication are better suited to quantile estimation than jackknife (see Kovar et al., 1988). Finally, as balanced repeated replication works only in designs with exactly two PSUs per stratum, which is not the case in the HFCS in Austria, we decided to use the (rescaling) bootstrap procedure proposed by Rao and Wu (1988) and enhanced by Rao et al. (1992). This procedure is also in line with the provisions of the ECB’s Household Finance and Consumption Network.
The bootstrap procedure forms replicates based on repeated with-replacement sampling of the PSUs within a stratum. The idea is to mimic the original sampling procedure in order to obtain approximations for the sampling distributions of the relevant statistics.
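One replicate of a rescaling bootstrap in the spirit of Rao et al. (1992) can be sketched in Python as follows. The sketch assumes that n_h − 1 PSUs are drawn with replacement from the n_h PSUs of each stratum, which is a common choice but is not confirmed by this section. It also omits the re-adjustment for nonresponse and poststratification within each replicate. All names and data are hypothetical.

```python
import random
from collections import defaultdict


def rao_wu_replicate(weights, strata, psus, rng):
    """Return one vector of rescaled bootstrap replicate weights."""
    psus_in_stratum = defaultdict(set)
    for h, j in zip(strata, psus):
        psus_in_stratum[h].add(j)

    multiplier = {}
    for h, ids in psus_in_stratum.items():
        ids = sorted(ids)
        n_h = len(ids)
        if n_h < 2:
            raise ValueError(f"stratum {h!r} has fewer than two PSUs")
        # Draw n_h - 1 PSUs with replacement and count how often each is hit.
        hits = dict.fromkeys(ids, 0)
        for _ in range(n_h - 1):
            hits[rng.choice(ids)] += 1
        # Rescale so that the multipliers in a stratum sum to n_h.
        for j in ids:
            multiplier[(h, j)] = hits[j] * n_h / (n_h - 1)

    return [w * multiplier[(h, j)] for w, h, j in zip(weights, strata, psus)]


# Toy sample: stratum A with 2 PSUs, stratum B with 3 PSUs, 2 households each.
strata = ["A"] * 4 + ["B"] * 6
psus = [1, 1, 2, 2, 1, 1, 2, 2, 3, 3]
weights = [100.0] * 10

replicate = rao_wu_replicate(weights, strata, psus, random.Random(1))
print(replicate, sum(replicate))
```

Households in PSUs that are not drawn receive a replicate weight of zero, while the weights of the drawn PSUs in the same stratum are scaled up, which is why every stratum needs at least two PSUs.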
8.2.2 Sampling error calculation model
To mimic the original sampling procedure, we create a sampling error calculation model that is a simplification (see Heeringa et al., 2017) of the actual complex sample design (see chapter 6).
In the HFCS in Austria, one necessary simplification of the sampling error calculation model compared with the original sampling procedure is to collapse, i.e. merge, strata with one single PSU because the bootstrap procedure requires at least two PSUs per stratum. Due to the specific stratification of the HFCS sample design, single-PSU strata are quite common in the sample: Only one PSU was drawn from 90 out of 188 strata. For the sampling error calculation model, every single-PSU stratum is paired with the geographically nearest stratum to form a single pseudo stratum, taking into account how many PSUs are in this stratum. Aggregation is carried out with the nearest stratum containing a smaller number of PSUs, reducing the frequency of necessary aggregations. Although collapsing the strata produces an upward bias in the estimated variance, this bias is kept as small as possible by collapsing geographically close strata, which keeps the PSUs within one pseudo-stratum very homogeneous. In this context it must be pointed out that upward biases of standard errors lead to a loss in statistical power. In general, however, this is more acceptable than negative biases of standard errors, which lead to results that are too often considered statistically significant.
Table 17 shows how stratum size (number of PSUs drawn per stratum) changes when the HFCS sampling error calculation model is used instead of the original HFCS sample design: When collapsing strata in the sampling error calculation model, their number decreases from 188 to 134, which means stratification is still very high. Moreover, the mean stratum size increases from 3.2 PSUs to 4.2 PSUs per stratum.
| Design strata | Pseudo strata |
---|---|---|
Number of strata | 188 | 134 |
Mean size | 3.2 | 4.2 |
Median size | 2.0 | 2.0 |
Minimum size | 1 | 2 |
Maximum size | 37 | 37 |
Source: HFCS Austria 2021, OeNB.
Note: Stratum size as measured by PSUs drawn per stratum.
Another simplification performed in the HFCS sampling error calculation model in contrast to the original sample design is to assume that sampling variance stems mostly from the first stage of sampling (i.e. the selection of PSUs, and not that of households within each PSU). Therefore, two-stage sampling is reduced to single-stage sampling where all gross sample households within drawn PSUs are selected in the replicate sample.
In addition, all PSUs have the same probability of being selected into the replicate sample. Thus, the sampling error calculation model simplifies sampling by making a PSU’s probability of being drawn independent of its size as measured by the number of households.
No further simplifications are required by the sampling error calculation model. The nonresponse and poststratification weight adjustments are implemented in the same way as in the original weighting procedures (see chapter 7), and a finite population correction 95 is performed.
8.2.3 Construction of replicate weights
The algorithm used to construct the HFCS replicate weights comprises the following steps:
- Step 1: Draw m_h PSUs with replacement within each pseudo stratum h.
- Step 2: Adjust the final survey weights of the drawn observations to create a new set of replicate weights. In particular, apply the same nonresponse and poststratification weight adjustments (sections 7.2.3 and 7.2.4) as for the final design weights and perform a finite population correction.
- Step 3: Repeat steps 1 and 2 R times to obtain r = 1,…, R sets of replicate weights.
In step 1, the number of PSUs m_h drawn in each stratum of size n_h is set to m_h = n_h - 1. This choice is often made to ensure the efficiency of the bootstrap estimators without violating the natural parameter ranges (see Kolenikov, 2010).
In step 2, the final survey weights must be adjusted because some PSUs may be duplicates and some may not have been drawn at all. As a consequence, each replicate will be biased with respect to the target population and therefore, to obtain the replicate weights, the design weights must be adjusted in the same way they were adjusted when constructing the final survey weights (see chapter 7). In addition, a finite population correction is required, as households (SSUs) are sampled without replacement in the original HFCS sample design (see footnote 3). 96
Finally, in step 3, the higher the number of replicates R is, the more precise the standard error estimates are. We choose R = 1,000, which lies at the upper end of the usual recommendations found in the literature (see Kolenikov, 2010).
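Steps 1 and 3 can be illustrated with a minimal Python sketch of the rescaling bootstrap for the case m_h = n_h - 1, in which a PSU drawn k times has its weights multiplied by n_h / (n_h - 1) · k. This is an illustration under simplifying assumptions, not the HFCS production code: the function name `bootstrap_factors` and the toy design are hypothetical, and the nonresponse and poststratification adjustments as well as the finite population correction of step 2 are deliberately omitted. As a result, nonselected PSUs receive a weight of exactly zero here, whereas the published replicate weights are small but positive.

```python
import random
from collections import Counter

def bootstrap_factors(psus_by_stratum, rng):
    """Return one replicate's weight multiplier for every PSU.

    In each (pseudo) stratum with n_h PSUs, m_h = n_h - 1 PSUs are drawn
    with replacement; a PSU drawn k times gets the factor n_h / m_h * k.
    """
    factors = {}
    for stratum, psus in psus_by_stratum.items():
        n_h = len(psus)
        if n_h < 2:
            raise ValueError(f"stratum {stratum!r} must contain at least two PSUs")
        m_h = n_h - 1
        draws = Counter(rng.choices(psus, k=m_h))  # sampling with replacement
        for psu in psus:
            factors[psu] = n_h / m_h * draws[psu]  # 0 if the PSU was not drawn
    return factors

# Hypothetical toy design: two pseudo strata; households as (PSU, final weight)
psus_by_stratum = {"A": ["a1", "a2", "a3"], "B": ["b1", "b2"]}
households = [("a1", 100.0), ("a2", 120.0), ("a3", 80.0), ("b1", 200.0), ("b2", 150.0)]

rng = random.Random(2021)
replicates = []
for _ in range(5):  # the HFCS uses R = 1,000
    factors = bootstrap_factors(psus_by_stratum, rng)
    replicates.append([weight * factors[psu] for psu, weight in households])
```

Within each stratum the factors sum to n_h, so the number of PSUs represented stays the same in every replicate. The requirement of at least two PSUs per stratum is the reason for collapsing single-PSU strata in the sampling error calculation model.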
Table 18 shows some descriptive statistics of a selection of HFCS replicate weights. We can see that owing to the homogeneous weighting adjustments, the mean and the total sum of replicate weights remain unchanged. Moreover, compared with the final survey weights in the HFCS, the replicate weights have smaller minimum values, although none is equal to zero. These values correspond to the nonselected PSUs, which, instead of being assigned a weight equal to zero, are assigned a small positive weight in the finite population correction. The fact that the replicate weights also have larger maximum values than the final survey weights can be explained by the weight adjustments that were carried out: As some PSUs are not drawn in the replicates, and in order to obtain the same estimated population sizes as in the original sample, the weights of the observations in the drawn PSUs must be increased.
| Mean | Median | Minimum | Maximum | Total |
---|---|---|---|---|---|
Final survey weights | 1,773 | 1,558 | 318 | 10,476 | 4,066,627 |
1st set of replicate weights | 1,773 | 1,184 | 6 | 16,422 | 4,066,627 |
2nd set of replicate weights | 1,773 | 1,166 | 4 | 20,439 | 4,066,627 |
3rd set of replicate weights | 1,773 | 1,178 | 7 | 35,642 | 4,066,627 |
998th set of replicate weights | 1,773 | 1,192 | 5 | 19,600 | 4,066,627 |
999th set of replicate weights | 1,773 | 1,079 | 5 | 16,068 | 4,066,627 |
1,000th set of replicate weights | 1,773 | 1,197 | 6 | 20,083 | 4,066,627 |
Source: HFCS Austria 2021, OeNB.
Note: Statistics refer to successfully interviewed households only.
8.3 Concluding remarks
We constructed 1,000 sets of replicate weights to enable HFCS data users to correctly estimate the standard errors of point estimators in the HFCS. This is necessary because the complex features of the HFCS survey design, which comprises amongst other things stratification, several stages of cluster sampling and weighting adjustments, bias the variance estimators if data users ignore them.
While it is true that correctly calculating the standard errors by using replicate weights requires more computational power than analyzing the data without using replicate weights, in practice it is not necessary to use all 1,000 sets of replicate weights for variance estimation. Thus, for example, it is possible to perform variance estimations using fewer replicates more quickly but less precisely. The number of replicates used depends on the type of estimator and the size of the population surveyed (see e.g. Pattengale et al., 2010). For instance, estimating the means for the total population will, as a rule, require fewer replicates than estimating the medians for specific population subgroups.
See the HFCS User guide (chapter 9) for an explanation of how to use the replicate weights correctly in Stata.
93 For a comprehensive overview of variance estimation methods, see Levy and Lemeshow (2008) or Heeringa et al. (2017).
94 In combination with multiple imputations, variance estimation of nonlinear statistics by means of resampling weights is still largely unexplored.
95 The finite population correction accounts for the reduction in variance that occurs when sampling without replacement from a finite population. This type of sampling is used in the sample design of the second stage of the HFCS in Austria.
96 In the HFCS sample design, PSUs are drawn with replacement, SSUs without. Although the sampling error calculation model ignores the second stage, a finite population correction was performed to allow for the fact that households are not allowed to appear twice in the sample. Finite population correction reduces the bias of a higher variability of replicate weights.
9 User guide
9.1 Introduction
As we have seen in the previous chapters, the HFCS data have special features that must be taken into account when analyzing the data. The data are multiply imputed and contain survey weights and replicate weights. Owing to the structure of the survey, the HFCS data are also stored in several files. These files differ in terms of the data level (household or individual), the implicate (each implicate is a separate file) and the type of data, i.e. whether the data were collected (survey variables) or constructed (derived, i.e. aggregated, variables and replicate weights).
This chapter 97 provides Stata 98 code that users can employ step by step to account for all of these features. 99 The code extracts were provided by Sébastien Pérez-Duarte 100 and have been slightly altered and expanded for release here. The ECB is expected to also make several program codes available in summer 2023 when publishing the dataset. In this chapter, Stata program code is contained in the blue boxes. When copied into the Stata command window, 101 it must be run in the sequence outlined below; altering the sequence may break the code. Additionally, the online appendix contains a do-file “user_guide.do” with all the steps that are laid out below. 102 We also include an R version of the user guide (called “user_guide_r_1.0.R”) in the online appendix, which provides code for the R environment that closely mirrors the Stata code explained below. Since the steps are similar, we do not repeat the detailed explanations for R in this chapter.
This chapter first explains how to merge the separate files and then describes one way to set up the structure for imputations and survey information. Finally, it provides some examples of simple estimation commands and how they are used.
9.2 Merging the data files
The core HFCS data, which contain all internationally agreed variables, consist of the five multiply imputed samples or implicates at the household level (files H1–H5), the corresponding samples at the individual level (files P1–P5) and the corresponding set of aggregated variables 103 (files D1–D5). Before creating a new dataset containing all these files, users must specify the path to the datasets and the folder containing the do-files on their computers. The variables used for merging are the household identifier “sa0010,” the implicate number “im0100” and the country identifier “sa0100.”
********************************************************************
***Merging the files of the HFCS data
********************************************************************
*Set macro for the path to the data (must be specified by the user)
global hfcsdata="path to the appropriate folder where the data are stored"
*Set macro for the path to the do-files (must be specified by the user)
global hfcsdofile="path to the appropriate folder where the do-files are stored"
*Set working directory
cd "$hfcsdata"
*Reshaping and merging the p and h files together (wide format)
forvalues i=1(1)5 {
use "$hfcsdata\P`i'.dta", clear
drop id hid survey
foreach var of varlist sa0010-fra0500 {
local `var'lab: variable label `var'
}
gen idpers_temp1="_"
egen idpers_temp2=concat(idpers_temp1 ra0010)
drop ra0010
reshape wide ra0?0* fra0?0* ra0020 fra0020 ra0030 fra0030 ra0040 fra0040 p* fp* ///
, i(sa0010 sa0100) j(idpers_temp2) string
drop idpers_temp*
foreach j of varlist ra* fra* p* fp* {
local last2car=substr("`j'", `=length("`j'")-1', 1)
local last1car=substr("`j'", length("`j'"), 1)
if "`last2car'"=="1" {
local firstcar=substr("`j'",1, `=length("`j'")-3')
label variable `firstcar'_`last2car'`last1car' ///
"``firstcar'lab' - `last2car'`last1car'"
}
else {
local firstcar=substr("`j'",1, `=length("`j'")-2')
label variable `firstcar'_`last1car' "``firstcar'lab' - `last1car'"
}
}
save "$hfcsdata\P`i'_temp.dta", replace
clear
use "$hfcsdata\H`i'.dta", clear
merge 1:1 sa0010 sa0100 im0100 using "$hfcsdata\P`i'_temp.dta", nogen
save "$hfcsdata\M`i'.dta", replace
erase "$hfcsdata\P`i'_temp.dta"
}
*Merging the core with the derived variables
forvalues i=1(1)5 {
use "$hfcsdata\M`i'.dta", clear
merge 1:1 sa0010 im0100 sa0100 using "$hfcsdata\D`i'.dta"
save "$hfcsdata\temp`i'.dta", replace
}
*Merging the implicates together
*(The temporary files are kept for configuring the multiply imputed data and are only erased following this procedure.)
use "$hfcsdata\temp1.dta", clear
forvalues j=2(1)5 {
append using "$hfcsdata\temp`j'.dta"
}
*Drop unnecessary variables and labels
drop _merge
label drop _merge
*Save the HFCS data
save "$hfcsdata\hfcs.dta", replace
So-called M-files are created by reshaping the P-files (using the reshape command), including the appropriate labeling of the P-file variables, and by merging the resulting dataset with the H-files. The reshaping of the P-Files is based on a temporary string variable for a correct naming of the personal level variables. The M-files are provided in wide format, 104 i.e. one line of the data matrix contains information on a given household and the information on each individual within a household is included in a separate variable. Merged with the D-files, these M-files yield the entire HFCS dataset in the “hfcs.dta” file.
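To make the wide format concrete, the following Python sketch (an illustration only; the helper `reshape_wide` and all values are hypothetical) turns person-level rows into one row per household, appending the person identifier to each variable name in the same way as the Stata code above, e.g. “ra0200_1” for the first person.

```python
def reshape_wide(person_rows, id_keys, person_key):
    """Collapse person-level rows into one wide row per household.

    Every person-level variable gets the suffix "_<person id>".
    """
    households = {}
    for row in person_rows:
        hh_id = tuple(row[key] for key in id_keys)
        wide = households.setdefault(hh_id, dict(zip(id_keys, hh_id)))
        suffix = f"_{row[person_key]}"
        for name, value in row.items():
            if name not in id_keys and name != person_key:
                wide[name + suffix] = value
    return list(households.values())

# Hypothetical person file: household 101 has two members, household 102 has one
person_rows = [
    {"sa0010": 101, "sa0100": "AT", "ra0010": 1, "ra0200": 2, "ra0300": 54},
    {"sa0010": 101, "sa0100": "AT", "ra0010": 2, "ra0200": 1, "ra0300": 57},
    {"sa0010": 102, "sa0100": "AT", "ra0010": 1, "ra0200": 1, "ra0300": 33},
]
wide_rows = reshape_wide(person_rows, id_keys=("sa0010", "sa0100"), person_key="ra0010")
```

In this sketch, variables for persons who do not exist in a household are simply absent from that household's row, whereas Stata's reshape command fills them with missing values.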
9.3 Multiple imputations
The next step is to import both the original data and the imputed values into Stata’s mi (i.e. mi estimate commands for appropriate use of the multiple imputation structure). As the original data are not part of the HFCS data files, we have to construct them from the information about whether observations vary across implicates (indicating multiple imputation and, hence, missing values) and from the information about missing values taken from the flags. 105 Finally, original and imputed data must be imported and registered. Users should take note of the “IMPUTEDVARS” macro in the program code below, which contains a string listing all imputed variables once the corresponding loop has been executed. Moreover, the aggregated variables are registered as having been passively imputed. If registration was successful, running the mi varying command should yield only a few variables (e.g. the implicate number “im0100”) and the flags should be shown as “unregistered varying.”
********************************************************************
***Preparing the data for mi import
********************************************************************
*Create the zero implicate to simulate the original data
*Use one implicate of the data
use "$hfcsdata\temp1.dta", clear
*Replace the implicate number by "0" to simulate the original data
replace im0100=0
*Append all other implicates
append using "$hfcsdata\hfcs.dta"
*String variables do not work well with mi commands and need to be encoded as numeric variables
foreach var of varlist hb* hc* hd* hg* hh* hi* hr* pa* pe* pf* pg* ra* sa0100 ///
sb1000 {
capture confirm numeric variable `var'
if _rc {
rename `var' `var'_string
encode `var'_string, gen(`var')
drop `var'_string
}
}
*Set as soft missing (".") in im0100==0 all values varying, and also those whose flags set them as imputed
global IMPUTEDVARS=""
foreach var of varlist hb* hc* hd* hg* hh* hi* hr* pa* pe* pf* pg* ra* {
capture confirm numeric variable `var'
if !_rc {
tempvar sd count
quietly bysort sa0100 sa0010 : egen `sd'=sd(`var')
quietly bysort sa0100 sa0010 : egen `count'=count(`var')
quietly count if ( (`sd'>0 & `sd' <. ) | `count'<6 | ///
(f`var'>4000 & f`var'<5000) ) & im0100==0
if r(N)>0 global IMPUTEDVARS "$IMPUTEDVARS `var'"
quietly replace `var'=. if ( (`sd'>0 & `sd' <. ) | `count'<6 | ///
(f`var'>4000 & f`var'<5000) ) & im0100==0
drop `sd' `count'
disp ".", _continue
}
}
*Here we need to set all derived variables for im0100==0 to missing because they are passively imputed
foreach var of varlist d* hb3001-hb40033 hb4099 hb4105 hb4205 {
local type1: type `var'
local type2=substr("`type1'",1,3)
if "`type2'"!="str" {
replace `var'=. if im0100==0
}
}
*Drop unnecessary variables
drop id _merge
*Save the HFCS data
save "$hfcsdata\hfcs.dta", replace
*Erase temporary files that will not be needed anymore
forvalues i=1(1)5 {
erase "$hfcsdata\temp`i'.dta"
}
********************************************************************
*****Import as multiply imputed data
********************************************************************
*Import the imputation structure of the data into Stata
mi import flong, m(im0100) id(sa0100 sa0010) clear
*Register the variables that are imputed
mi register imputed $IMPUTEDVARS
*Register derived variables as passively imputed
mi register passive d*
*Check whether all imputed variables are registered
mi varying
*Save the HFCS-data with mi structure
save "$hfcsdata\hfcs.dta", replace
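The construction of the zero implicate can be summarized in a short Python sketch. It is a simplified illustration, not a translation of the Stata code: the function `build_zero_implicate`, the data layout and all values are hypothetical, and only the flag range between 4000 and 5000 is taken from the Stata code above. A value is treated as originally missing if it varies across implicates or if its flag marks it as imputed.

```python
def build_zero_implicate(implicates, variables):
    """Reconstruct the "original" data from M implicates.

    implicates: list of dicts mapping household id -> {variable: value, "f" + variable: flag}
    A value is set to None if it varies across implicates or is flagged as imputed.
    """
    zero = {}
    for hh_id, first_row in implicates[0].items():
        row = dict(first_row)  # copy, so the implicates stay untouched
        for var in variables:
            distinct = {imp[hh_id][var] for imp in implicates}
            flag = first_row.get("f" + var, 0)
            if len(distinct) > 1 or 4000 < flag < 5000:
                row[var] = None  # plays the role of Stata's soft missing "."
        zero[hh_id] = row
    return zero

# Hypothetical data with two implicates (the HFCS has five) and invented flag values
implicates = [
    {1: {"hb0900": 250000, "fhb0900": 1},
     2: {"hb0900": 180000, "fhb0900": 4050},
     3: {"hb0900": 90000, "fhb0900": 4051}},
    {1: {"hb0900": 250000, "fhb0900": 1},
     2: {"hb0900": 195000, "fhb0900": 4050},
     3: {"hb0900": 90000, "fhb0900": 4051}},
]
zero = build_zero_implicate(implicates, ["hb0900"])
```

Household 1 keeps its collected value, household 2 is set to missing because its values differ across implicates, and household 3 is set to missing on the basis of its flag alone.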
9.4 Survey variables
Having configured the data as multiply imputed, we can designate the data as complex survey data, identify variables that contain information about the survey design and specify the default method for variance estimation. In our case, all this information is contained in the final survey weights (hw0010) and in the 1,000 sets of replicate weights (wr0001–wr1000), which are provided in a separate file and hence have to be merged with the data first.
********************************************************************
***Setting up Complex Survey Design
********************************************************************
*Encode country indicator
use "$hfcsdata\W.dta", clear
rename sa0100 sa0100_string
encode sa0100_string, gen(sa0100)
drop sa0100_string
save "$hfcsdata\Wtemp.dta", replace
*Using the HFCS data with mi structure
use "$hfcsdata\hfcs.dta", clear
*Merging the data with replicate weights
merge m:1 sa0100 sa0010 using "$hfcsdata\Wtemp.dta"
*Drop unnecessary variable and files
drop _merge
erase "$hfcsdata\Wtemp.dta"
*Setting the appropriate survey structure using replicate weights
mi svyset [pw=hw0010], bsrweight(wr0001-wr1000) vce(bootstrap)
*Save the HFCS-data with mi svyset structure
save "$hfcsdata\hfcs.dta", replace
9.5 Standard estimation procedures
The data are now ready to be analyzed in Stata. After writing “mi estimate: svy:” followed by the estimation command in question, Stata will provide correct estimates and standard errors, taking into account both the multiple imputation framework and the replicate weights. 106 The esampvaryok option can be useful when the sample size varies across implicates due to imputations. 107 Stata versions before Stata 12 do not allow the use of replicate weights together with multiply imputed data. For these versions, the option vceok (used after the mi estimate command, e.g. “mi estimate, vceok:…”) can be used as a workaround. It should be noted that in order to calculate the correct variance for subsamples of households (see second example in the following program code), Stata requires a dummy variable for each of these subsamples combined with the use of the option for subpopulations (“…svy, subpop(dummy)…”). 108 Alternatively, it is possible to use the option over(variable) for certain estimation commands (see last example in the following program code).
********************************************************************
***Using Standard Estimation Procedures
********************************************************************
*Using the HFCS-data with mi svyset structure
use "$hfcsdata\hfcs.dta", clear
*Mean of current value of primary housing unit
mi estimate, esampvaryok vceok: svy: mean hb0900
*Mean of current value of primary housing unit for part owner of the primary housing unit
gen partowner=(hb0300==2)
mi estimate, esampvaryok vceok: svy, subpop(partowner): mean hb0900
*Proportions of owner/renter of primary housing unit
mi estimate, esampvaryok vceok: svy: proportion hb0300
*Ratio of current to acquisition value of primary housing unit
mi estimate, esampvaryok vceok: svy: ratio hb0900 hb0800
*Regression of current value of primary housing on acquisition value and year of acquisition
mi estimate, esampvaryok vceok: svy: regress hb0900 hb0800 hb0700
*Average level of deposits by gender of the first person
mi estimate, esampvaryok vceok: svy: mean da2101, over(ra0200_1)
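How the two sources of uncertainty are combined can be sketched with Rubin's combination rules (see footnote 107). The following Python snippet is an illustration only: the function name `combine_rubin` and the numbers are hypothetical. It assumes that a point estimate and a replicate-based variance have already been computed separately for each of the M implicates.

```python
from statistics import fmean, variance

def combine_rubin(estimates, variances):
    """Combine per-implicate results with Rubin's rules.

    estimates: point estimate from each implicate
    variances: sampling variance of that estimate in each implicate
    Returns the combined point estimate and its total variance.
    """
    m = len(estimates)
    q_bar = fmean(estimates)        # combined point estimate
    within = fmean(variances)       # average sampling variance
    between = variance(estimates)   # variance across implicates (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total

# Hypothetical results for five implicates
estimates = [10.0, 12.0, 11.0, 13.0, 14.0]
variances = [4.0, 4.0, 4.0, 4.0, 4.0]
combined_estimate, total_variance = combine_rubin(estimates, variances)
standard_error = total_variance ** 0.5
```

The total variance exceeds the average within-implicate variance whenever the estimates differ across implicates, which reflects the additional uncertainty due to imputation.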
9.6 Additional estimation procedures
To calculate medians or other quantiles, we use a separate Stata program, called medianize, which was developed by the ECB (the respective do-file can be found in the online appendix). It must be used with caution, as it is not yet a standard feature of Stata and has so far been tested only in limited environments. Other Stata features used are the tabstat command and analytical weights.
********************************************************************
***Including Additional Estimation Procedures
********************************************************************
*ECB-written command to calculate medians (and some other quantile statistics), which should be run before the estimation command
capture program drop medianize
do "$hfcsdofile\medianize.do"
*Median of amount still owed on the first loan collateralized with primary housing unit
mi estimate, esampvaryok vceok: svy: medianize hb1701
*Median of amount still owed on the first loan collateralized with primary housing unit over gender of first person
mi estimate, esampvaryok vceok: svy: medianize hb1701, over(ra0200_1)
*10th percentile of amount still owed on the first loan collateralized with primary housing unit over gender of first person
mi estimate, esampvaryok vceok: svy: medianize hb1701, over(ra0200_1) stat(p10)
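For readers who want to see what a weighted quantile means in this context, the following Python sketch inverts the weighted empirical distribution function. It is an illustration only and not the algorithm implemented in medianize, which may treat ties and interpolation differently; the function name `weighted_quantile` and the numbers are hypothetical.

```python
def weighted_quantile(values, weights, q=0.5):
    """Smallest value whose cumulative weight share reaches q."""
    pairs = sorted(zip(values, weights))
    threshold = q * sum(weight for _, weight in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= threshold:
            return value
    return pairs[-1][0]  # guards against rounding when q is close to 1

# Hypothetical loan amounts and survey weights
amounts = [1.0, 2.0, 3.0, 4.0]
weights = [1.0, 1.0, 1.0, 5.0]
weighted_median = weighted_quantile(amounts, weights)        # heavy last household dominates
unweighted_median = weighted_quantile(amounts, [1.0] * 4)    # equal weights
tenth_percentile = weighted_quantile(amounts, weights, q=0.1)
```

Because quantiles are not smooth functions of the weights, their variance estimates tend to need more replicates than those of means.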
9.7 Online appendix
The online appendix contains the Stata code described above and the do-files necessary to estimate certain quantiles. Additionally, an R version is provided. The code in the online appendix will be updated as required, for instance to include user guide program code for other HFCS-relevant topics.
97 The authors refrain from making a judgment about which programs to use and with which settings. In particular if the size of the subsamples varies in each iteration, the estimation of discontinuous estimators does not comply with the assumptions of the results evidenced in literature (e.g. Little and Rubin, 2002). It is the responsibility of users to check whether individual estimation commands are valid and adequate under particular conditions.
98 The codes were written for Stata version 15.1 or higher and may not be valid for previous Stata versions.
99 Any changes and improvements made to the code are continuously updated in the online appendix. Any adjustments made since the release of the first wave of the HFCS were included in this program code.
100 European Central Bank.
101 Due to the way Stata handles line breaks, they may need to be deleted if the program code is copied by hand.
102 The two macros containing the individual path to the data and the additional do-files must be specified before execution. Given the size and structure of the data and depending on software and hardware specifications, executing the do-file may require a long time.
103 The ECB is expected to make the definitions of the aggregated variables and the datasets available in summer 2023.
104 It is also possible to merge the data files in “long” format using an almost identical code without needing to reshape the personal files.
105 All missing values (including “Don’t know,” “No answer” and skip patterns) are set to “.” and are paired with specific flags reflecting different types of missing values (e.g. skipped observations are flagged with a “0”). Flag variables have the same variable name, but their names are preceded by an “f.”
106 A correct point estimate of statistics can be carried out on the basis of the final survey weights. Replicate weights are needed to calculate a variance estimator.
107 Rubin’s combination rules (see e.g. Little and Rubin, 2002) were derived on the assumption that the same set of observations is used in each imputed data set. Thus, they may not necessarily apply when the sets of observations used in the data analysis differ. This is why mi estimate generates an error when this happens. When the subsets used in each complete data analysis differ relatively little, the conventional formulas may still be applicable. In this case, users can choose to use the esampvaryok option or find a better way to deal with the violation of the assumption of Rubin’s combination rules described above. To our knowledge, this issue has not yet been addressed in literature.
108 The use of an if-condition does not account for the uncertainty of the subsample size and therefore yields incorrect variance estimators.
10 Changes from the third to the fourth wave of the HFCS due to COVID-19
10.1 Introduction
The HFCS has now been conducted four times in Austria. The field phase of the fourth wave lasted from late May 2021 to February 2022. The fourth wave was dominated by three countrywide lockdowns before the start of the field phase (the initial field period was planned for spring/summer 2020, the time of the first lockdown) and one during the field phase. These lockdowns imposed travel restrictions, restrictions on going out, compulsory mask wearing, compulsory testing, distance learning at schools and universities, etc. Under these circumstances, some parts of the HFCS had to be changed in the fourth wave so that the survey could take place despite COVID-19 while maintaining the high quality standards of the previous waves. After all, HFCS data are used by all relevant institutions in Austria as well as the research community around the world for a broad range of research.
This chapter offers a short but comprehensive insight into what has changed from the third to the fourth HFCS wave due to COVID-19 for readers who already have experience in evaluating data from previous waves in Austria. Furthermore, this chapter provides the foundation for evaluations based on at least the last two waves of the HFCS in Austria, which require an understanding of how the survey waves differ from each other.
The structure of this chapter mirrors the structure of the documentation as a whole: Following an overview of the key COVID-19-related changes to the questionnaire (section 10.2) and to interviewer training and selection (section 10.3), we discuss editing measures (section 10.4), the multiple imputation process (section 10.5) and the sampling design (section 10.6). The final two sections deal with the construction of survey weights (section 10.7) and replicate weights (section 10.8). The user guide (chapter 9) is not discussed here, since it was left broadly unchanged (except for an extension to include R code). The chapter finishes with concluding remarks.
10.2 Questionnaire
The HFCS questionnaire used in Austria has traditionally been based on the internationally agreed core questionnaire and has repeatedly been adapted on the basis of experience gained in previous waves. We finished work on the questionnaire, its translation into German and associated programming before the COVID-19 pandemic hit Austria. Changes made necessary by the extraordinary circumstances created by the pandemic, such as the reference year for income (updated from 2019 to 2020 due to the delay of the field period) or an additional set of COVID-19-specific questions, were implemented later. The additional questions, which were also internationally harmonized, were designed to cover the impact of COVID-19 on the core information collected in the HFCS. These questions related to working status, personal finances and savings. For each one, respondents could select one or more options describing how the pandemic had changed the situation for their household, e.g. whether the household saved more, less or the same amount during the pandemic. Additionally, respondents were asked by how much income and consumption had changed (in absolute amounts) as well as how public expenses could be financed. We integrated this additional set of questions into the consumption section.
In contrast to some other countries taking part in the HFCS, we kept the computer assisted personal interview (CAPI) technique for this survey wave, because it is the best available method given the complexity of the survey.
The field period was initially planned to start in March 2020, but the pandemic made it necessary to postpone the start by more than one year. The Austrian field period of wave four eventually started in May 2021 and lasted until February 2022. Despite this extended period, about 350 randomly selected Viennese addresses of the initial gross sample were not contacted – due to the low number of interviewers – and eventually left out of the sample (see section 10.7). Furthermore, because of (regional) restrictions and warnings, the field period had to be interrupted (from November 22, 2021, to December 12, 2021) and restarted. No interviews were conducted during lockdowns.
Health risks made interviewing particularly difficult during this survey wave. However, once a household could be convinced to voluntarily participate, and as safety measures (such as wearing FFP2 masks) were complied with, interviewing was comparable to previous waves.
10.3 Interviewers
COVID-19 posed serious difficulties to HFCS interviewers in Austria. The challenges that had to be addressed during the fourth HFCS wave included, first and foremost, health risks that had to be handled but also challenges in interviewer training and breaks in the field period.
After the delayed start of the field period, interviewer training was redesigned to take place online via Zoom meetings. This increased flexibility in training and reduced the health risks for everybody involved. While the training content remained unchanged, the schedule was split into two meetings on two days to help trainees keep a high level of concentration. Also, trainees were able to use the break between the two sessions to work on their additional take-home exercise interviews. Particular effort was put into keeping the training interactive in the online setting. Interviewers asked questions for clarification – also those that came up during homework – throughout the training. Interviewers that had been trained already in March 2020 had a reduced training schedule of just one day to refresh their knowledge of the survey questionnaire. In total, four one-day refresher training sessions and seven complete two-day training sessions took place for wave four of the HFCS in Austria.
The extraordinary circumstances reduced the readiness of interviewers to work in the HFCS. Experienced and older interviewers in particular were more reluctant to work for the survey on account of increased health risks. As a result, the number of interviewers decreased from about 70 in wave three to 47 in wave four, which, in turn, led to a longer field period. The reduction in the number of interviewers may have had other consequences, such as interviewer effects. Further methodological analyses concentrating on the boundaries of the distribution as well as the variance estimation based on the survey data could shed some light on these issues and their connection to specific results.
During the field period, maximum safety measures were implemented to reduce the risk of infection for both respondents and interviewers. Interviewers were required to show proof of vaccination or recovery or a negative test (PCR test or antigen test performed by a pharmacist). Additionally, strict hygiene standards (washing hands, not touching one’s own face, etc.) were enforced, as was the wearing of FFP2 masks. No work-related health difficulties were reported in the HFCS.
10.4 Consistency checks and editing
As mentioned above, the field period was interrupted from November 22, 2021, to December 12, 2021. In general, health-related discussions were at the forefront during this time, which is also when the consistency analyses took place. To be able to react quickly to any development, a high-level steering group was set up that met regularly throughout the period.
Once data were collected, the procedure to assess quality and correct potential mistakes stayed the same. Also, follow-up enquiries by phone were conducted as usual in the HFCS in Austria.
10.5 Multiple imputations
The pandemic also left its mark on households’ nonresponse behavior. While unit nonresponse increased (see section 10.7 on the changes in weighting due to COVID-19), item nonresponse decreased. Table 19 shows several statistics for item nonresponse per household in the fourth wave compared to the previous wave. It can be seen that the mean (median) share of euro variables with missing values among all euro variables per household decreased by 1 percentage point from 4.9% (2.6%) to 3.9% (1.6%).
| | Third wave (2017) | | | | Fourth wave (2021) | | | |
|---|---|---|---|---|---|---|---|---|
| | Mean | Median | Minimum | Maximum | Mean | Median | Minimum | Maximum |
| Number of variables asked | | | | | | | | |
| All variables | 1,994.1 | 2,005.0 | 1,440 | 2,506 | 1,917.8 | 1,947.0 | 1,382 | 2,241 |
| Euro variables | 116.7 | 118.0 | 62 | 167 | 117.6 | 122.0 | 42 | 166 |
| Number of variables with missing values | | | | | | | | |
| All variables | 20.7 | 10.0 | 0 | 467 | 17.3 | 8.0 | 0 | 370 |
| Euro variables | 5.8 | 3.0 | 0 | 78 | 4.6 | 2.0 | 0 | 57 |
| Share of variables with missing values in % | | | | | | | | |
| All variables | 1.0 | 0.5 | 0 | 19 | 0.9 | 0.4 | 0 | 19 |
| Euro variables | 4.9 | 2.6 | 0 | 54.5 | 3.9 | 1.6 | 0 | 47.6 |

Source: HFCS Austria 2017 and 2021, OeNB.
Note: Interval responses are considered as missing values with regard to the corresponding euro variable and are not included as a separate variable. A question addressed to several household members is entered as several variables, one for each household member.
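The per-household statistics in table 19 can be illustrated with a minimal sketch. It assumes that, for each household, the share is the number of variables with missing values divided by the number of variables asked, and that the table then summarizes these shares across households. The function name and the four example households are hypothetical and are not HFCS data.

```python
from statistics import mean, median


def missing_share(asked: int, missing: int) -> float:
    """Share (in %) of variables with missing values among the variables asked."""
    if asked <= 0:
        raise ValueError("asked must be positive")
    return 100.0 * missing / asked


# Hypothetical households: (euro variables asked, euro variables with missing values)
households = [(120, 0), (100, 5), (80, 4), (150, 30)]

# One share per household, then summary statistics across households
shares = [missing_share(asked, missing) for asked, missing in households]
summary = {
    "mean": mean(shares),
    "median": median(shares),
    "minimum": min(shares),
    "maximum": max(shares),
}
print(summary)
```

Because the share is computed per household first, the mean share generally differs from the ratio of the mean number of missing variables to the mean number of variables asked.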
Consequently, the total number of variables to be imputed over the whole survey decreased by 134 from 907 in wave 3 to 773 in wave 4 (see table 20). Most of this reduction is attributable to the reduction in item nonresponse rather than to changes in the set of variables (variables that are either new or no longer included in wave 4).
| | Third wave (2017) | | Fourth wave (2021) | |
|---|---|---|---|---|
| | Number of variables | % | Number of variables | % |
| All variables to be imputed | 907 | 100.0 | 773 | 100.0 |
| Distinct from other wave | 274 | 30.2 | 140 | 18.1 |
| New/old variable | 110 | 12.1 | 118 | 15.3 |
| New/old missing | 164 | 18.1 | 22 | 2.8 |
| Common in both waves | 633 | 69.8 | 633 | 81.9 |
| Chained equations | 602 | 66.4 | 593 | 76.7 |
| Ad hoc methods | 31 | 3.4 | 40 | 5.2 |

Source: HFCS Austria 2017 and 2021, OeNB.
However, at the same time, the number of variables that cannot be imputed using the regular HFCS imputation procedure via chained regression equations increased. Variables with missing values are not imputed with the HFCS imputation procedure when they have insufficient variance or when they have insufficient observations for running a regression. A very small fraction of these variables are imputed with ad hoc methods such as hotdeck imputation after the HFCS procedure has been completed. Due to the smaller net sample size in wave 4 resulting from the increase in unit nonresponse (see section 10.6 on the changes in sampling due to COVID-19), ad hoc imputation methods had to be used more often than in the previous wave. Table 20 shows that, of the variables to be imputed in both waves, 40 are imputed using ad hoc methods in wave 4 (5.2% of all variables to be imputed in that wave) compared to only 31 (3.4%) in wave 3. 109
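To make the idea of an ad hoc method concrete, the following is a minimal sketch of a simple random hotdeck: each missing value is replaced by the value of a randomly chosen observed donor. This is a generic textbook variant, not the OeNB implementation; the function name, the use of `None` for missing values, the seed and the example values are all assumptions made for illustration. In practice, donors are usually drawn within imputation classes of similar units.

```python
import random


def hotdeck_impute(values, rng):
    """Replace each missing value (None) with a randomly drawn observed value."""
    donors = [v for v in values if v is not None]
    if not donors:
        raise ValueError("no observed donors available")
    return [v if v is not None else rng.choice(donors) for v in values]


# Hypothetical euro variable with two missing values
rng = random.Random(2021)  # fixed seed so the sketch is reproducible
observed = [1200.0, None, 800.0, None, 1500.0]
completed = hotdeck_impute(observed, rng)
print(completed)
```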
Apart from this, the HFCS imputation procedure described in chapter 5 remained the same.
10.6 Sampling
The survey sample was initially drawn before the outbreak of COVID-19 in Austria. After the delay in the field period, we kept the initial sample. As the time between the sample design (including the underlying frame data) and contacting the households was extraordinarily long, we expected somewhat more neutral dropouts, as people moved away or died in the meantime. Furthermore, the gross sample included 11 addresses that declined to be contacted during the delay of the field period. They were classified as “inaccessible” and treated as addresses with an unknown eligibility status. Other than that, the sampling procedure did not need to be altered from previous waves.
10.7 Construction of survey weights
The pandemic posed several challenges to the construction of survey weights. On the one hand, it had a clear impact on households’ nonresponse behavior. While item nonresponse decreased (see section 10.5 on the changes in multiple imputation due to COVID-19), unit nonresponse strongly increased. Table 21 shows several response behavior indicators in comparison with the previous wave. It can be seen that the response rate decreased by 11 percentage points from about 50% of eligible households in the previous wave to 39% in this wave. 57% of eligible households actively refused to take part in the survey, 12 percentage points more than in the previous wave. As a result, the net sample size decreased by about 800 observations. Although the HFCS weighting procedure described in chapter 7 should still ensure unbiased population estimates under such circumstances, the strong increase in unit nonresponse necessarily increases the variance of the population estimates, which makes them less precise. According to the unequal weighting effect (UWE) statistic developed by Kish (1995), the variance of HFCS population estimates may be increased in this wave by a maximum of 32.9% as a result of weighting, a figure more than double that of the previous wave.
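Kish's unequal weighting effect is commonly written as one plus the squared coefficient of variation of the weights, which is equivalent to n·Σw²/(Σw)². The sketch below assumes this textbook form; the function name and the example weights are hypothetical and are not HFCS weights. Under this form, the 32.9% reported above would correspond to a UWE of 1.329.

```python
def kish_uwe(weights):
    """Unequal weighting effect: n * sum(w^2) / (sum(w))^2, i.e. 1 + CV^2 of the weights."""
    n = len(weights)
    total = sum(weights)
    if n == 0 or total <= 0:
        raise ValueError("weights must be non-empty with a positive sum")
    return n * sum(w * w for w in weights) / total ** 2


# Equal weights give no variance inflation from weighting
equal = kish_uwe([2.0, 2.0, 2.0, 2.0])

# Hypothetical unequal weights
unequal = kish_uwe([1.0, 3.0])

# Approximate maximum variance increase in percent due to weighting
increase_pct = (unequal - 1.0) * 100.0
print(equal, unequal, increase_pct)
```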
| Response behavior indicator | Third wave (2017) | Fourth wave (2021) |
|---|---|---|
| Gross sample size | 6,280 | 6,300 |
| Net sample size | 3,072 | 2,293 |
| Response rate (%) | 49.8 | 39.0 |
| Refusal rate (%) | 45.3 | 56.8 |
| Cooperation rate (%) | 50.6 | 39.5 |
| Contact rate (%) | 98.5 | 98.7 |
| Eligibility rate (%) | 98.2 | 98.9 |

Source: HFCS Austria 2017 and 2021, OeNB.
Note: Response rate = achieved interviews / eligible sample units; refusal rate = sample units refusing to participate / eligible sample units; cooperation rate = achieved interviews / contacted sample units; contact rate = contacted sample units / eligible sample units; eligibility rate = eligible units / gross sample size.
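The definitions in the note to table 21 can be written down as a small sketch. The disposition counts used here are hypothetical, because the underlying counts of eligible and contacted units are not reported in this section; the class and function names are likewise illustrative. The definitions imply that the response rate equals the cooperation rate times the contact rate, which the reported percentages satisfy up to rounding.

```python
from dataclasses import dataclass


@dataclass
class Dispositions:
    gross: int       # gross sample size
    eligible: int    # eligible sample units
    contacted: int   # contacted sample units
    refused: int     # sample units refusing to participate
    interviews: int  # achieved interviews (net sample)


def response_indicators(d: Dispositions) -> dict:
    """Response behavior indicators in %, following the note to table 21."""
    return {
        "response": 100.0 * d.interviews / d.eligible,
        "refusal": 100.0 * d.refused / d.eligible,
        "cooperation": 100.0 * d.interviews / d.contacted,
        "contact": 100.0 * d.contacted / d.eligible,
        "eligibility": 100.0 * d.eligible / d.gross,
    }


# Hypothetical counts, not taken from the HFCS
example = Dispositions(gross=1000, eligible=980, contacted=960, refused=500, interviews=400)
r = response_indicators(example)

# Identity implied by the definitions: response = cooperation * contact
identity_gap = abs(r["response"] - r["cooperation"] * r["contact"] / 100.0)

# The same identity applied to the percentages reported in table 21
wave3_check = round(50.6 * 98.5 / 100.0, 1)
wave4_check = round(39.5 * 98.7 / 100.0, 1)
print(r, wave3_check, wave4_check)
```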
A further challenge possibly related to the COVID-19 impact on households’ nonresponse behavior is the fact that the unweighted HFCS sample is biased toward households with older household members. Actually, the estimation of the logit regression used for constructing the nonresponse weights shows a significant positive effect of average age in the municipality on the probability of household participation in the survey (see chapter 7). However, despite including this variable when constructing the nonresponse weights, the age bias still remained. Table 22 shows the age distribution of all household members – after weighting with nonresponse weights, but before weighting with post-stratification weights – in comparison with the previous wave. It can be seen that the proportion of household members aged 65 years or above is 11 percentage points higher than in wave 3 (31% instead of 20%). For this reason, we decided to include the age of the household’s reference person as an additional post-stratification variable when constructing the post-stratification weights (see chapter 7). With this step, we reduced the proportion of individuals aged 65 or over further, to 25%.
| Age | Third wave (2017), % | Fourth wave (2021), % |
|---|---|---|
| 1–19 years | 19.0 | 15.7 |
| 20–64 years | 60.4 | 52.9 |
| 65 years and over | 20.5 | 31.4 |

Source: HFCS Austria 2017 and 2021, OeNB.
Note: Unweighted for post-stratification but weighted for nonresponse.
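The effect of adding an age variable to the post-stratification can be illustrated with a minimal cell-based sketch: weights are scaled within each age group so that the weighted group totals match external population totals. This is a simplified stand-in, not the procedure of chapter 7, which uses the age of the household's reference person together with other post-stratification variables; the function name, group labels, weights and target totals are hypothetical.

```python
from collections import defaultdict


def poststratify(weights, groups, targets):
    """Scale weights within each group so that group totals match the targets."""
    totals = defaultdict(float)
    for w, g in zip(weights, groups):
        totals[g] += w
    # One adjustment factor per group: target total / current weighted total
    factors = {g: targets[g] / totals[g] for g in totals}
    return [w * factors[g] for w, g in zip(weights, groups)]


# Hypothetical sample in which the group aged 65+ is overrepresented
weights = [10.0, 10.0, 10.0, 10.0]
groups = ["under 65", "under 65", "65+", "65+"]
targets = {"under 65": 30.0, "65+": 10.0}  # hypothetical population totals

adjusted = poststratify(weights, groups, targets)
share_65_before = 100.0 * sum(w for w, g in zip(weights, groups) if g == "65+") / sum(weights)
share_65_after = 100.0 * sum(w for w, g in zip(adjusted, groups) if g == "65+") / sum(adjusted)
print(adjusted, share_65_before, share_65_after)
```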
Finally, there was one more COVID-19-related issue when constructing the survey weights of wave 4: The HFCS sample included 352 Viennese addresses distributed across 15 strata that could not be processed in time before the end of the field phase was declared. Which addresses were left open within each stratum is purely random. The reason why they could not be processed in time is the low number of interviewers available during the pandemic (see chapter 3). As these addresses were never contacted, it is unknown whether they are eligible for the HFCS or not, and their paradata are also not available. For the construction of the survey weights, we decided to treat them as if they had never been part of the gross sample and dropped them from the sample. Therefore, we just had to adjust the design weights in the corresponding strata so that they again sum up to the population total. The construction of the survey weights then followed the same procedure as in previous waves and as described in chapter 7. We also tried an alternative treatment of these addresses, but the nonresponse weights proved quite robust to it (see table 23). 110
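The adjustment of the design weights after dropping addresses can be sketched as a rescaling within each affected stratum. The sketch assumes a single proportional factor per stratum, so that the weights of the remaining addresses again sum to the original stratum total; the function name and the example stratum are hypothetical.

```python
def rescale_design_weights(weights, kept):
    """Drop addresses with kept == False and rescale the remaining design
    weights so that they sum to the original stratum total."""
    total = sum(weights)
    kept_total = sum(w for w, k in zip(weights, kept) if k)
    if kept_total <= 0:
        raise ValueError("no addresses left in the stratum")
    factor = total / kept_total
    return [w * factor for w, k in zip(weights, kept) if k]


# Hypothetical stratum: five addresses with design weight 200 each, one left unprocessed
stratum_weights = [200.0] * 5
processed = [True, True, True, True, False]

new_weights = rescale_design_weights(stratum_weights, processed)
print(new_weights, sum(new_weights))
```

Because the unprocessed addresses are described as purely random within each stratum, a common factor per stratum is a natural choice in this sketch.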
| Response classes | Final nonresponse weights | Alternative nonresponse weights (bootstrap sample) | | | |
|---|---|---|---|---|---|
| | | Mean | Standard deviation | Minimum | Maximum |
| I | 6.465 | 6.356 | 0.602 | 5.301 | 6.911 |
| II | 3.729 | 3.902 | 0.035 | 3.811 | 4.008 |
| III | 3.754 | 3.602 | 0.039 | 3.522 | 3.740 |
| IV | 2.766 | 2.721 | 0.047 | 2.591 | 2.903 |
| V | 2.385 | 2.289 | 0.015 | 2.237 | 2.322 |
| VI | 2.048 | 1.874 | 0.014 | 1.854 | 1.961 |
| VII | 1.652 | 1.670 | 0.004 | 1.660 | 1.683 |

Source: HFCS Austria 2021, OeNB.
Note: The alternative nonresponse weights included the additional 352 Viennese addresses, for which the paradata variables had to be multiply imputed.
10.8 Construction of replicate weights for variance estimation
As usual in the HFCS in Austria, the construction of replicate weights follows the same procedure as the construction of final household weights for a random subsample of households from the gross sample. Thus, all the changes explained in section 10.7 transfer in the same way to the construction of replicate weights. Other than that, the use of replicate weights need not be altered from previous waves.
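How replicate weights are typically used for variance estimation can be shown with a generic sketch. It assumes the common bootstrap convention in which the variance of an estimate is the average squared deviation of the replicate estimates from the full-sample estimate; the exact estimator used for the HFCS is documented elsewhere in this report and may differ. All names, values and weights below are hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted mean of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def replicate_variance(values, final_weights, replicate_weights):
    """Average squared deviation of replicate estimates from the full-sample estimate."""
    theta = weighted_mean(values, final_weights)
    replicate_estimates = [weighted_mean(values, rw) for rw in replicate_weights]
    return sum((t - theta) ** 2 for t in replicate_estimates) / len(replicate_estimates)


# Hypothetical data: two households, final weights, and two sets of replicate weights
values = [0.0, 10.0]
final_weights = [1.0, 1.0]
replicate_weights = [[2.0, 0.0], [0.0, 2.0]]

variance = replicate_variance(values, final_weights, replicate_weights)
print(variance)
```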
10.9 Concluding remarks
This chapter provided a brief but comprehensive overview of the changes from the third to the fourth wave of the HFCS in Austria due to COVID-19. Further, more detailed questions may arise from this overview, like whether interviewer characteristics have changed due to COVID-19 and whether they had an effect on unit nonresponse bias or variance, or whether (other than interviewer-related) determinants of unit nonresponse have changed due to COVID-19. Such questions go beyond the scope of this publication and are left for future research. For more detailed information on specific aspects of this publication, please see the relevant chapters or sections in this documentation.
109 As the variables imputed with ad hoc methods are typically not used as predictors in the chained regression equations from the regular HFCS imputation procedure, the broad conditioning approach applied in the selection of predictors is still valid (see chapter 5.4.7).
110 The alternative treatment consisted in leaving the 352 addresses in the sample and treating them in the same way as the other addresses in the sample that have an unknown eligibility status because interviewers were unable to reach or find them. This alternative procedure required the imputation of the paradata for the 352 Viennese addresses. The paradata variables were imputed according to the probability distributions of the observed paradata of the remaining addresses. Alternative nonresponse weights were then computed, and the whole process was repeated 1,000 times. Some descriptive statistics of these alternative nonresponse weights over the 1,000 bootstrap samples are shown in table 23.
References
Albacete, N. 2014. Multiple imputation in the Austrian Household Survey on Housing Wealth. In: Austrian Journal of Statistics. 43(1). 5–28.
Albacete, N. and P. Lindner. 2013. Household vulnerability in Austria – a microeconomic analysis based on the Household Finance and Consumption Survey. In: Financial Stability Report 25. OeNB. 57–73.
Albacete, N. and P. Lindner. 2015. Foreign currency borrowers in Austria – evidence from the Household Finance and Consumption Survey. In: Financial Stability Report 29. OeNB. 93–109.
Albacete, N. and P. Lindner. 2017a. Simulating the impact of borrower-based macroprudential policies on mortgages and the real estate sector in Austria – evidence from the Household Finance and Consumption Survey 2014. In: Financial Stability Report 33. OeNB. 52–68.
Albacete, N. and P. Lindner. 2017b. How strong is the wealth channel of monetary policy transmission? A microeconometric evaluation for Austria. In: Monetary Policy & the Economy Q2/17. OeNB. 32–53.
Albacete, N. and M. Schürz. 2013a. Vergleich der Einkommensmessung für Haushalte in Österreich: HFCS versus EU-SILC. In: Statistiken – Daten und Analysen Q2/13. OeNB. 88–89.
Albacete, N. and M. Schürz. 2013b. Interviewereffekte beim HFCS Austria 2010. In: Statistiken – Daten und Analysen Q3/13. OeNB. 57–68.
Albacete, N. and M. Schürz. 2014a. Paradaten im HFCS Austria 2010 – Teil 1: Evaluierung von Non-Response-Fehlern. In: Statistiken – Daten und Analysen Q1/14. OeNB. 81–97.
Albacete, N. and M. Schürz. 2014b. Paradaten im HFCS Austria 2010 – Teil 2: Evaluierung von Messfehlern. In: Statistiken – Daten und Analysen Q3/14. OeNB. 54–64.
Albacete, N. and M. Schürz. 2015. Interviewereffekte auf Haushaltsvermögen am Beispiel des Household Finance and Consumption Survey Austria 2010. In: Statistiken – Daten und Analysen Q4/15. OeNB. 55–63.
Albacete, N., M. Andreasch and P. Lindner. 2018. Verschuldung der privaten Haushalte in Österreich. In: Statistiken Sonderheft. OeNB. June.
Albacete, N., P. Lindner and K. Wagner. 2016. Eurosystem Household Finance and Consumption Survey 2014. Methodological notes for Austria. Monetary Policy & the Economy Q2/16 – Addendum. OeNB.
Albacete, N., P. Fessler and P. Lindner. 2016. The distribution of residential property price changes across homeowners and its implications for financial stability in Austria. In: Financial Stability Report 31. OeNB. 62–81.
Albacete, N., P. Fessler and P. Lindner. 2018. One policy to rule them all? On the effectiveness of LTV, DTI and DSTI ratio limits as macroprudential policy tools. In: Financial Stability Report 35. OeNB. 67–83.
Albacete, N., P. Fessler and P. Lindner. 2022. The Wealth Distribution and Redistributive Preferences: Evidence from a Randomized Survey Experiment. OeNB Working Paper 239. https://www.oenb.at/dam/jcr:f551a3bf-cd08-4212-8cb2-9fd059342401/WP239.pdf (accessed on June 6, 2023).
Albacete, N., P. Fessler and M. Propst. 2020. Mapping financial vulnerability in CESEE: understanding risk-bearing capacities of households is key in times of crisis. In: Financial Stability Report 39. OeNB. July. https://www.oenb.at/dam/jcr:fe93c400-de9f-4163-9644-66b54afc8d60/09_Mapping_financial_vulnerability_in_CESEE.pdf (accessed on June 6, 2023).
Albacete, N., S.T. Dippenaar, P. Lindner and K. Wagner. 2019. Eurosystem Household Finance and Consumption Survey 2017. Methodological notes for Austria. Monetary Policy & the Economy Q4/18 – Addendum. OeNB.
Albacete, N., P. Fessler, F. Kalleitner and P. Lindner. 2021. How has COVID-19 affected the financial situation of households in Austria? In: Monetary Policy and the Economy Q4/20 – Q1/21. OeNB. https://www.oenb.at/dam/jcr:d05a7f28-e7aa-43b8-b54d-9ffbe5ff3861/07_mop_Q4_20-Q1_21_How-has-COVID-19-affected-the-financial-situation.pdf (accessed on June 6, 2023).
Albacete, N., P. Lindner, K. Wagner and S. Zottel. 2012. Eurosystem Household Finance and Consumption Survey 2010. Methodological notes for Austria. Monetary Policy & the Economy Q3/12 – Addendum. OeNB.
Albacete, N., J. Eidenberger, G. Krenn, P. Lindner and M. Sigmund. 2014. Risk-bearing capacity of Households – Linking micro-level data to the macroprudential toolkit. In: Financial Stability Report 27. OeNB. 95–110.
Albacete, N., I. Gerstner, N. Geyer, P. Lindner, N. Prinz and V. Woharcik. 2022. Effects of interest rate and inflation shocks on household vulnerability in Austria: a microsimulation using HFCS data. In: Financial Stability Report 44. OeNB. November. https://www.oenb.at/dam/jcr:21081bd8-1f4c-4525-8d52-236873981585/05_FSR_44_Effects-of-interest-rate.pdf (accessed on June 6, 2023).
Andreasch, M., P. Fessler and M. Schürz. 2013. HFCS des Eurosystems – Möglichkeiten und Einschränkungen von Ländervergleichen im Euroraum. In: Statistische Nachrichten 9/2013. 842–51.
Banca d’Italia. 2012. Sample surveys – Household income and wealth in 2010. Supplements to the Statistical Bulletin XXII(6).
Barceló, C. 2006. Imputation of the 2002 wave of the Spanish Survey of Household Finances (EFF). Banco de España Documentos ocasionales 0603.
Beer, C. and K. Wagner. 2017. Households’ housing expenditure in Austria, Germany and Italy. In: Monetary Policy & the Economy Q4/17. OeNB. 48–61.
Bekhtiar, K., P. Fessler and P. Lindner. 2019. Risky assets in Europe and the US: risk vulnerability, risk aversion and economic environment. ECB Working Paper Series No 2270. https://www.ecb.europa.eu/pub/pdf/scpwps/ecb.wp2270~9c72a27c18.en.pdf (accessed on June 6, 2023).
Biemer, P. and S. Christ. 2008. Constructing the survey weights. In: P. Levy and S. Lemeshow. Sampling of populations: Methods and applications. 4th edition. Wiley. 489–516.
Bledsoe, R. and G. Friess. 2002. Editing the 2001 Survey of Consumer Finances. Annual Meeting of the American Statistical Association. Joint Statistical Meetings. New York. August. 11–15.
Bover, O. 2011. The Spanish Survey of Household Finances (EFF): Description and methods of the 2008 wave. Banco de España Occasional Paper 1103.
Bricker, J., A. B. Kennickell, K. B. Moore and J. Sabelhaus. 2012. Changes in U.S. family finances from 2007 to 2010: Evidence from the Survey of Consumer Finances. In: Federal Reserve Bulletin 98(2). 1–80.
Cameron, A. and P. Trivedi. 2005. Microeconometrics: Methods and applications. Cambridge University Press.
Cochran, W. G. 1977. Sampling techniques. 3rd edition. Wiley.
Cowles, M. K. and B. P. Carlin. 1996. Markov chain Monte Carlo convergence diagnostics: A comparative review. In: Journal of the American Statistical Association 91/434. June. 883–904.
Drescher, K., P. Fessler and P. Lindner. 2020. Helicopter money in Europe: New evidence on the marginal propensity to consume across European households. In: Economics Letters 195. October. https://www.sciencedirect.com/science/article/abs/pii/S0165176520302603 (accessed on June 6, 2023).
ECB. 2011. Core output variables catalogue. www.ecb.int/home/pdf/research/hfcn/core_output_variables.pdf?c6a87a29f0c1cdf4b92526aceef3efea (accessed on June 6, 2023).
ECB. 2013a. The Eurosystem Household Finance and Consumption Survey: Methodological report for the first wave. ECB Statistics Paper Series 2. April.
ECB. 2013b. The Eurosystem Household Finance and Consumption Survey: Results of the first wave. ECB Statistics Paper Series 2. April.
ECB. 2016. The Eurosystem Household Finance and Consumption Survey: Methodological report for the second wave. ECB Statistics Paper 17. December.
Fessler, P. and M. Schürz. 2013. Cross-country comparability of the Eurosystem Household Finance and Consumption Survey. In: Monetary Policy & the Economy Q2/13. OeNB. 29–50.
Fessler, P. and M. Schürz. 2015. Private wealth across European countries: The role of income, inheritance and the welfare state. ECB Working Paper Series 1847.
Fessler, P. and M. Schürz. 2017. Zur Verteilung der Sparquoten in Österreich. In: Monetary Policy & the Economy Q3/17. OeNB. 13–33.
Fessler, P. and M. Schürz. 2019. Vermögen der privaten Haushalte in Österreich - Gemeinsamkeiten und Unterschiede. In: Soziale Mobilität und Vermögensverteilung (BMASK). 71–92.
Fessler, P. and M. Schürz. 2022. Structuring the Analysis of Wealth Inequality – Using the Functions of Wealth: A Class-Based Approach. In: Measuring Distribution and Mobility of Income and Wealth. National Bureau of Economic Research. October. https://www.nber.org/books-and-chapters/measuring-distribution-and-mobility-income-and-wealth/structuring-analysis-wealth-inequality-using-functions-wealth-class-based-approach (accessed on June 6, 2023).
Fessler, P., K. Jäger-Gyovai and T. Messner. 2015. What can we learn from Eurosystem Household Finance and Consumption Survey data? An application to household debt in Slovakia. In: Focus on European Economic Integration Q2/15. OeNB. 76–87.
Fessler, P., P. Lindner and E. Segalla. 2014. Net wealth across the euro area – why household structure matters and how to control for it. ECB Working Paper 1663.
Fessler, P., P. Lindner and M. Schürz. 2016. Eurosystem Household Finance and Consumption Survey 2014: First results for Austria (second wave). In: Monetary Policy & the Economy Q2/16. OeNB. 35–96.
Fessler, P., P. Lindner and M. Schürz. 2019. Eurosystem Household Finance and Consumption Survey 2017 for Austria. In: Monetary Policy & the Economy Q4/18. OeNB. 36–66.
Fessler, P., E. List and T. Messner. 2017. How financially vulnerable are CESEE households? An Austrian perspective on its neighbors. In: Focus on European Economic Integration Q2/17. OeNB. 58–79.
Fessler, P., P. Mooslechner and M. Schürz. 2012. Eurosystem Household Finance and Consumption Survey 2010: First results for Austria. In: Monetary Policy & the Economy Q3/12. OeNB. 23–62.
Frumento, P., F. Mealli, B. Pacini and D. B. Rubin. 2012. Evaluating the effect of training on wages in the presence of noncompliance, nonemployment, and missing outcome data. In: Journal of the American Statistical Association 107(498). 450–66.
Haslinger, A. and J. Kytir. 2006. Stichprobendesign, Stichprobenziehung und Hochrechnung des Mikrozensus ab 2004. In: Statistische Nachrichten 6/2006. 510–19.
Haziza, D. and J.-F. Beaumont. 2007. On the construction of imputation classes in surveys. In: International Statistical Review 75. 25–43.
Heeringa, S. G., B. T. West and P. A. Berglund. 2017. Applied survey data analysis. 2nd edition. Chapman Hall/CRC Press.
Iannacchione, V. G., J. G. Milne and R. E. Folsom. 1991. Response probability weight adjustments using logistic regression. In: Proceedings of the American Statistical Associations, Section on Survey Methods. 637–42.
Kennickell, A. B. 1998. Multiple imputation in the Survey of Consumer Finances. In: Proceedings of the Section on Business and Economics Statistics. 1998 Annual Meetings of the American Statistical Association. 63–74.
Kennickell, A. B. 2005. The good shepherd: Sample design and control for wealth measurement in the Survey of Consumer Finances. Federal Reserve Board. January.
Kennickell, A. B. 2011. Look again, editing and imputation of the SCF panel data. Prepared for the Joint Statistical Meeting in Miami, Florida. August 3.
Kennickell, A. B., P. Lindner and M. Schürz. 2021. A new instrument to measure wealth inequality: distributional wealth accounts. In: Monetary Policy & the Economy Q4/21. OeNB. https://www.oenb.at/dam/jcr:37664c81-2d0d-409e-8d33-a19fc2b25854/05_mop_q4_21_A-new-instrument-to-measure-wealth-inequality.pdf (accessed on June 6, 2023).
Kennickell, A. B. and D. McManus. 1993. Sampling for household financial characteristics using frame information on past income. In: Proceedings of Survey Research Methods. Section of the American Statistical Association. 88–97.
Kish, L. 1995. Survey sampling. Wiley.
Kolenikov, S. 2010. Resampling variance estimation for complex survey data. In: The Stata Journal 10(2). 165–99.
Kovar, J. G., J. N. K. Rao and C. F. J. Wu. 1988. Bootstrap and other methods to measure errors in survey estimates. In: The Canadian Journal of Statistics 16. 25–45.
Levy, P. and S. Lemeshow. 2008. Sampling of populations: Methods and applications. 4th edition. Wiley.
Lindner, P. 2021. Finanzvermögen der privaten Haushalte aus Perspektive der Mikrodaten. In: Statistiken Sonderheft: Einkommen, Konsum und Vermögen der Haushalte – Sektorale Volkswirtschaftliche Gesamtrechnungen in den letzten 20 Jahren. OeNB. https://www.oenb.at/dam/jcr:5ff19df4-e600-43ec-9782-915b7dce16ba/05_SH_Sektorale-VGR_2021_Lindner.pdf (accessed on June 6, 2023).
Lindner, P. and M. Propst. 2020. Interviewdauer des HFCS in Österreich. In: Statistiken – Daten und Analysen Q2/20. OeNB. https://www.oenb.at/dam/jcr:4d0725ce-c21d-45e3-9405-886f1d55ca2d/08_statistiken_Q2_20_Interviewdauer-des-HFCS.pdf (accessed on June 6, 2023).
Lindner, P. and M. Schürz. 2015. Varianten der Messung von Haushaltsvermögen im HFCS in Österreich. In: Statistiken – Daten und Analysen Q2/15. OeNB. 52–70.
Lindner, P. and M. Schürz. 2017. Kommentare von Respondenten des Household Finance and Consumption Survey zur Befragung. In: Statistiken – Daten und Analysen Q4/17. OeNB. 50–63.
Lindner, P. and M. Schürz. 2019. The joint distribution of wealth, income and consumption in Austria: a cautionary note on heterogeneity. In: Monetary Policy & the Economy Q4/19. OeNB. https://www.oenb.at/dam/jcr:ed43b82c-cbaa-45db-bce2-07d4e1d1faf9/joint_distribution_mop_%20q4_19_screen-5.pdf (accessed on June 6, 2023).
Lindner, P. and M. Schürz. 2021. Matching survey data on wealth to register data on pension entitlements: what challenges need to be addressed? In: Statistiken – Daten und Analysen Q3/21. OeNB. https://www.oenb.at/dam/jcr:d4d8b02f-8a0e-4dac-82cd-89e96709952e/08_PB_statistiken_Q3_21_Matching%20survey%20data.pdf (accessed on June 6, 2023).
Lindner, P. and V. Redak. 2017. The resilience of households in bank bail-ins. In: Financial Stability Report 33. OeNB. 88–101.
Lindner, P., M. Schürz and J. C. Zhan. 2014. Methodische Verbesserungen im HFCS. In: Statistiken – Daten und Analysen Q4/14. OeNB. 71–83.
Lindner, P., T. Mathä, G. Pulina and M. Ziegelmayer. 2022. Borrowing constraints, own labour and homeownership: Does it pay to paint your walls? In: Applied Economics. Published online on November 16, 2022. https://www.tandfonline.com/doi/full/10.1080/00036846.2022.2133893 (accessed on June 6, 2023).
Little, R. J. A. and D. B. Rubin. 2019. Statistical analysis with missing data. 3rd edition. Wiley Series in Probability and Statistics.
Little, R. J. A. and S. Vartivarian. 2003. On weighting the rates in non-response weights. In: Statistics in Medicine 22(9). 1589–99.
OeNB. 1979. Mitteilungen des Direktoriums der Oesterreichischen Nationalbank.
OeNB. 1998. Statistisches Monatsheft. December.
Pattengale, N. D., M. Alipour, O. R. P. Bininda-Emonds, B. M. E. Moret and A. Stamatakis. 2010. How many bootstrap replicates are necessary? In: Journal of Computational Biology 17(3). 337–54.
Rao, J. N. K. and C. F. J. Wu. 1988. Resampling inference with complex survey data. In: Journal of the American Statistical Association 83. 231–41.
Rao, J. N. K., C. F. J. Wu and K. Yue. 1992. Some recent work on resampling methods for complex surveys. In: Survey Methodology 18. 209–17.
Royston, P. 2004. Multiple imputation of missing values. In: Stata Journal 4(3). 227–41.
Schafer, J. L. and M. K. Olsen. 1998. Multiple imputation for multivariate missing data problems: A data analyst’s perspective. In: Multivariate Behavioral Research 33. 545–71.
Statistics Austria. 2013. Bevölkerungsstand 1.1.2013.
Statistics Austria. 2018. Standard-Dokumentation Metainformationen (Definitionen, Erläuterungen, Methoden, Qualität) zu EU-SILC 2017.
Valliant, R., J. A. Dever and F. Kreuter. 2013. Practical tools for designing and weighting survey samples. Springer.
Van Buuren, S. and C. G. M. Oudshoorn. 1999. Flexible multivariate imputation by MICE. TNO-rapport PG 99.054. TNO Prevention and Health. Leiden.
Van Buuren, S., H. C. Boshuizen and D. L. Knook. 1999. Multiple imputation of missing blood pressure covariates in survival analysis. In: Statistics in Medicine 18(6). 681–94.
Van Buuren, S., J. P. Brand, C. G. Groothuis-Oudshoorn and D. B. Rubin. 2006. Fully conditional specification in multivariate imputation. In: Journal of Statistical Computation and Simulation 76(12). 1049–64.
Vehovar, V. 1999. Field substitution and unit nonresponse. In: Journal of Official Statistics 15. 335–50.
Wagner, K. 2014. Intergenerational transmission: How strong is the effect of parental homeownership? In: Monetary Policy & the Economy Q2/14. OeNB. 49–64.
Williams, R. L. 2014. Survey sampling and weighting. In: Encyclopedia of Health Economics. Culyer, A. J. (ed.). Elsevier. 371–74.