Collecting tenure security data that are comparable across countries requires a consistent sampling approach. Prindex’s data team gives all vendors guidance on sampling requirements and expectations, and reviews each vendor’s sampling proposal for every country. Because not all countries will have sufficiently good (or sufficiently recent) census data, we anticipate some divergence in how the underlying sample frame is constructed. The following paragraphs set out the key features of our sampling approach.
We use a 3-stage cluster sampling design.
Our target sample population is adults (i.e. 18 years and above). As we aim to interview a representative sample of the adult population, not the head of household or the person most knowledgeable about the dwelling or land, we use a randomisation process to select which adult in the household is interviewed.
The first stage of sampling involves the identification of clusters (sampling units) of households. The ratio of interviews to clusters is 10:1, which limits the problems associated with intra-cluster correlation. Sampling units are stratified by population size and/or geography. Where population information is available, units are selected with probability proportional to population size; otherwise simple random sampling is used. Samples are drawn independently of previous years’ surveys. This approach to stratified selection is in line with that taken by leading polling organisations such as Gallup.
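The first-stage selection can be illustrated with a minimal sketch of systematic selection with probability proportional to size, run within a single stratum. This is one common textbook way of drawing such a sample and is not necessarily the procedure each vendor follows; the function names, the unit names and the population figures are hypothetical.

```python
import random


def clusters_needed(n_interviews, interviews_per_cluster=10):
    """Number of clusters implied by a fixed interviews-per-cluster ratio."""
    return -(-n_interviews // interviews_per_cluster)  # ceiling division


def pps_systematic_sample(unit_populations, n_clusters, seed=None):
    """Select units with probability proportional to population size.

    Walks the cumulative population total and picks the unit containing
    each of n_clusters equally spaced points after a random start.
    A unit larger than the sampling interval can be picked more than once.
    """
    rng = random.Random(seed)
    total = sum(unit_populations.values())
    interval = total / n_clusters
    start = rng.random() * interval  # random start in [0, interval)
    targets = [start + i * interval for i in range(n_clusters)]

    selected = []
    cumulative = 0
    idx = 0
    for name, population in unit_populations.items():
        cumulative += population
        # Pick this unit once for every target that falls inside its range.
        while idx < len(targets) and targets[idx] < cumulative:
            selected.append(name)
            idx += 1
    return selected


# Hypothetical stratum: five sampling units and their adult populations.
units = {"A": 1200, "B": 800, "C": 3000, "D": 500, "E": 1500}
chosen = pps_systematic_sample(units, clusters_needed(20), seed=1)
print(chosen)
```

Where no population figures are available, the same stratum could instead be sampled with `random.sample`, which gives every unit an equal chance of selection.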
In the second stage, we select sampled households using randomised selection procedures. Unless an outright refusal occurs, interviewers make up to three attempts (separated by two hours within the same day) to survey the sampled household. If an interviewer cannot obtain an interview at the initial household, a simple substitution method is employed: the interviewer tries the neighbouring household on the right and, if unsuccessful, the one on the left, thereafter alternating households.
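One possible reading of this substitution rule is sketched below. It assumes that households in a cluster can be numbered along a line, that "right" means the next higher position, and that "alternating" means moving one step further out on each side in turn. These are illustrative assumptions and the function name is hypothetical.

```python
def substitution_order(start, n_households):
    """Yield household positions in the order an interviewer would try them.

    Order: the sampled household, its right neighbour, its left neighbour,
    then alternating right and left at increasing distance.
    """
    yield start
    step = 1
    while start + step < n_households or start - step >= 0:
        if start + step < n_households:
            yield start + step  # household on the right
        if start - step >= 0:
            yield start - step  # household on the left
        step += 1


# Sampled household at position 2 on a street of six households.
print(list(substitution_order(2, 6)))  # [2, 3, 1, 4, 0, 5]
```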
In the third stage, respondents are randomly selected within the nominated households. Interviewers list all eligible household members and their ages and birthdays. Gender matching is undertaken in some Middle Eastern and Asian countries where cultural restrictions so dictate. The respondent is selected by means of the Kish grid (see https://surveymethods.com/blog/what-is-the-kish-selection-procedure/ for an accessible explanation). The interviewer does not inform the person who answers the door of the selection criteria until after the respondent has been identified.
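The idea behind within-household selection can be sketched as follows. This is a simplified stand-in, not the printed Kish selection tables: it lists eligible adults in a fixed order and picks one with equal probability, using a random draw seeded by a questionnaire number so that the choice is fixed in advance and reproducible. The field names and the `questionnaire_id` parameter are hypothetical.

```python
import random


def select_respondent(members, questionnaire_id, min_age=18):
    """Pick one eligible adult from a household roster with equal probability."""
    # Fixed listing order (oldest first, then by name) so that the draw
    # does not depend on the order in which members were reported.
    eligible = sorted(
        (m for m in members if m["age"] >= min_age),
        key=lambda m: (-m["age"], m["name"]),
    )
    if not eligible:
        return None
    # Seeding by questionnaire number means the interviewer cannot
    # influence which adult is chosen.
    rng = random.Random(questionnaire_id)
    return eligible[rng.randrange(len(eligible))]


household = [
    {"name": "Amina", "age": 44},
    {"name": "Bo", "age": 16},
    {"name": "Chidi", "age": 47},
    {"name": "Dara", "age": 21},
]
chosen = select_respondent(household, questionnaire_id=1042)
print(chosen["name"])
```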
In countries where telephone penetration is over 80%, we will explore the use of telephone interviews rather than face-to-face interviews; in the 2018 data collection, this is relevant only for the UK. If this approach is used in future data collection, random digit dial (RDD) or a nationally representative list of phone numbers will be used. In select countries where cell phone penetration is high, a dual sampling frame will be adopted. Random respondent selection will be achieved using either the latest birthday method or the Kish grid method. At least three attempts will be made to reach a person in each household, spread over different days and times of day. Appointments will be made for call-backs that fall within the survey data collection period.
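The latest birthday method mentioned above can be sketched as below: among the adults in the household, the respondent is the one whose birthday fell most recently on or before the interview date. The field names and function names are hypothetical, and treating a 29 February birthday as 28 February in non-leap years is an assumption made for the example.

```python
from datetime import date


def days_since_birthday(birthday, today):
    """Days elapsed since the most recent anniversary of the birthday."""
    def anniversary(year):
        try:
            return date(year, birthday.month, birthday.day)
        except ValueError:
            # 29 February in a non-leap year: assume 28 February instead.
            return date(year, 2, 28)

    latest = anniversary(today.year)
    if latest > today:
        latest = anniversary(today.year - 1)
    return (today - latest).days


def latest_birthday_respondent(adults, today):
    """Return the adult whose birthday passed most recently."""
    return min(adults, key=lambda a: days_since_birthday(a["birthday"], today))


adults = [
    {"name": "Ana", "birthday": date(1980, 7, 1)},
    {"name": "Ben", "birthday": date(1975, 5, 30)},
    {"name": "Cara", "birthday": date(1990, 1, 10)},
]
picked = latest_birthday_respondent(adults, today=date(2018, 6, 15))
print(picked["name"])  # Ben
```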
Data Preparation & Weighting
The dataset goes through a rigorous quality assurance process before being publicly released. This includes quality checks once 10% and 50% of the data have been collected, which test the results for completeness, for suspicious patterns that might suggest interviewer misbehaviour, and for improper sampling. New data are collected as necessary.
The data are weighted to ensure they are representative at the national level. Base sampling weights are constructed to account for oversamples and household size. Weighting by household size adjusts for the probability of selection. Population statistics are used to adjust the data by age, gender and, where possible, location (e.g. urban vs. rural). Design effects and margins of error are calculated both with and without accounting for intra-class correlation coefficients. For the margin of error, we assume a 95% confidence level.
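The quantities described above can be illustrated with standard textbook formulas. The sketch below is not confirmed as Prindex’s exact calculation: it assumes that the household-size weight is proportional to the number of eligible adults, uses Kish’s approximation for the design effect from unequal weights, uses 1 + (m - 1) × ICC for the design effect from clustering, and combines the two by multiplication, which is a common approximation. The function names and input values are hypothetical.

```python
import math


def household_size_weights(adults_per_household):
    """Weights proportional to eligible adults, scaled to a mean of 1.

    One adult is interviewed per household, so an adult in a household
    with k adults has a 1/k chance of selection and gets weight k.
    """
    raw = [float(k) for k in adults_per_household]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


def weighting_design_effect(weights):
    """Kish's approximate design effect from unequal weights."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2


def clustering_design_effect(cluster_size, icc):
    """Design effect from clustering: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


def margin_of_error(n, deff=1.0, p=0.5, z=1.96):
    """Half-width of a 95% confidence interval for a proportion."""
    return z * math.sqrt(deff * p * (1 - p) / n)


weights = household_size_weights([1, 2, 3, 2])
deff_w = weighting_design_effect(weights)
deff_c = clustering_design_effect(cluster_size=10, icc=0.05)

# Margin of error without and with the intra-class correlation term.
moe_without_icc = margin_of_error(1000, deff=deff_w)
moe_with_icc = margin_of_error(1000, deff=deff_w * deff_c)
print(round(moe_without_icc, 4), round(moe_with_icc, 4))
```

Using p = 0.5 gives the widest margin for a given sample size, so it is a conservative default when the true proportion is unknown.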