Paul A. Smith, James Dawber S3RI/Dept of Social Statistics & Demography, University of Southampton
Executive Summary

- Probability sampling has a well-developed, relatively straightforward, design-based estimation framework, providing the best approach to making inference about a population.
- Non-probability sampling includes a diverse range of methods that are not easily described under a single framework; model-based methods are required when making inference from a non-probability sample, to adjust for differences between the sample and known population information. Inference from a non-probability sample is only as good as the model and assumptions that are used.
- Sampling in longitudinal studies requires a precise definition of the target population, which may not merely be a finite population, but instead a dynamic population or superpopulation.
- Problems with non-response, attrition and under-coverage should be anticipated and factored into the design of the longitudinal study using model-based methods, rather than addressed post hoc.
- Having a representative sample is an important aim for a national-level longitudinal study, as this ensures that the data will have a wider potential to be used in the distant future across different disciplines. A probability sample is the best starting point to ensure this.
- Non-probability sampling for longitudinal studies may be useful to supplement the main sample for specific populations which it is impractical to reach with a probability sample, but the ways to analyse such combinations of data from different sample types need more research.
1 Introduction

Sampling is an essential process in obtaining data from which inference can be made about the wider population. Samples can be collected based on either probability or non-probability sampling methods. Probability sampling methods are characterised by the use of randomisation with known, non-zero probabilities of selection, whereas non-probability sampling methods do not have this property. Several approaches to non-probability sampling are used according to the specific requirements for data collection.
In this report we first briefly introduce probability and non-probability sampling and their respective strengths and weaknesses. We then introduce the idea of design- and model-based inference (plus a hybrid approach) and why this distinction is especially important in non-probability sampling. Next we focus on sampling problems specific to longitudinal studies and the potential to use non-probability approaches. We briefly address the impacts of opt-in versus opt-out designs. In Annex A, we review existing longitudinal studies with specific emphasis on the use of non-probability sampling methods.
The strengths and weaknesses of non-probability sampling are essentially the opposite of those of probability sampling. Since a non-probability sample cannot be expected to be representative of the target population, the estimates cannot be guaranteed to be unbiased. However, the costs can be considerably lower, since the samples can be acquired more conveniently. The non-probability sampling approach is therefore intended to be more practical at the expense of being theoretically less rigorous.
Non-probability sampling encompasses a broad range of methods varying in sophistication. At its simplest, non-probability sampling includes convenience sampling, where attempts to reduce bias are not strongly considered. But non-probability sampling methods can be more methodical than this: for example, auxiliary data can be used to adjust a non-probability sample in an attempt to minimise the potential bias, and these types of non-probability design with a theoretical basis were considered in Baker et al. (2013). Inference can be made from a non-probability sample with a model-based approach, which utilises a model which, if it holds, gives unbiased estimates. The risk of bias is controlled not through the randomness in the sample design (as in probability samples) but through assumptions that are used to build the model and that can only be validated by information external to the study design (Koch & Gillings, 2006). Building a suitable model can involve considerable statistical expertise, and moves the expenditure of resources towards the estimation phase compared with design-based approaches. The assumed model cannot normally be assessed from the sample information, and if the model does not hold then estimation will be biased. These model-based assumptions are therefore stronger than the assumptions required in design-based sampling, but nevertheless provide a framework for inference from non-probability samples.
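As a concrete, hypothetical illustration of such an adjustment, the sketch below post-stratifies a non-probability sample to known population shares. The cell labels and shares are invented for the example; a real application would use many more characteristics and would need to validate the underlying model assumption that the cells explain the selection mechanism.

```python
from collections import Counter

def poststratify(sample_cells, pop_shares):
    """Post-stratification sketch: give each respondent a weight so that
    the weighted share of each cell matches its known population share.
    `sample_cells` lists each respondent's cell label; `pop_shares` maps
    cell -> population proportion. Weights sum to the sample size."""
    n = len(sample_cells)
    counts = Counter(sample_cells)
    return [pop_shares[c] * n / counts[c] for c in sample_cells]

# Invented example: the sample over-represents cell "A" (3 of 4 units)
# relative to a 50/50 population split, so cell "B" is up-weighted.
weights = poststratify(["A", "A", "A", "B"], {"A": 0.5, "B": 0.5})
print(weights)
```

The weights only remove bias to the extent that the chosen cells capture the differences between volunteers and the population, which is exactly the model assumption discussed above.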
Shifts to web-based sampling, as well as more accessible administrative data, have brought about increased interest in non-probability sampling and corresponding model-based methods for inference. In response, Rivera (2018) reviews some non-probability sampling methods, with guidelines for researchers about best practices for their implementation. Brick (2011) outlines future directions for survey sampling beyond probability sampling, highlighting that methods for making inference from web panels are of particular importance. He discusses how there are no clearly foreseeable paradigm shifts in sampling theory but suggests that the future of sampling will be dynamic, highlighting how model-based inference and multiple-frame surveys may be of particular importance. An important report on non-probability sampling is provided by Baker et al. (2013). The purpose of that report was to examine under what conditions non-probability sampling will yield useful inferences about the population. Among its conclusions are that non-probability sampling methods must be based on models that address challenges relating to both sampling and estimation. The authors also suggest that the reason model-based methods are not more frequently used is the difficulty of testing the model assumptions, which requires significant statistical expertise. Lastly, they acknowledge that there is no universal framework for non-probability methods and that the current framework will need to become more coherent before it gains wider acceptance amongst researchers.
3.1 Design-based versus model-based inference
Rather than compare probability and non-probability sampling, we focus on design- and model-based approaches to inference. This is more relevant to studies which aim to make inference about a wider population, which is typically the case in social surveys. Sterba (2009) provides a good, detailed overview of design- and model-based frameworks.
Design-based inference is the more common and intuitive method where inference is made from a random sample to a population. This framework was developed by Neyman (1934) and is the basis for probability sampling.
Model-based inference is less commonly used in surveys, although widely used in general scientific inquiry; with this framework it is possible to make inference from a non-probability sample under certain model assumptions. Note that model-based inference can also be used with probability samples, and this combination may be very efficient. The general model-based framework was introduced by Fisher (1922). For all intents and purposes, model-based inference can be thought of as making adjustments to the non-representative sample in such a way that inference can still be justifiably made. Some of these methods will be discussed in the next section, but essentially the model overcomes (or measures) the potential bias that is otherwise induced by the non-probability sampling.
An important concept for model-based inference is the notion of a superpopulation. A superpopulation is a (hypothetical, unobserved) set of units from which the population values are assumed to be realised. Generally, model-based inference is made from the sample to this superpopulation, as opposed to the population as in design-based inference. It could be considered as finding a general relationship rather than describing a relationship in a specific population.
It is important to realise that design and model-based approaches are not mutually exclusive, and in fact they are often used together. For example, a model can be used to adjust for non-response in a design-based survey; and if we have good information on the likely causes of nonresponse, we may be able to use this model information to adjust the design. In this case both the design and the model are required for inference. We discuss other hybrid approaches in Section 5.
Thompson (2015) provides a useful review of the approaches and methods used when analysing longitudinal complex survey data. She reduces the methods for analysing longitudinal data to two broad types: the first is a design-based approach with model-assisted methods to account for non-response and attrition; the second is a purely model-based approach which accounts for design features. She also notes that the latter option requires complex models which tend to exceed the capabilities of readily available statistical software.
4 Model-based methods for non-probability samples

Several methods aim to make inference from non-probability samples using models. These methods include sample matching, propensity scoring and quota matching.
Sample matching aims to reduce selection bias by matching a non-probability sample to a target population based on a reference data source such as the population census. Certain characteristics of the individuals must first be identified to match on. For example, sex, age and ethnicity may be the selected characteristics, and respondents are sampled until the sample matches the population composition. In this case the model assumption is that the selected variables account for all the variation in the outcome, which may be more plausible if a large number of characteristics is used.
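The matching step can be caricatured as a quota-filling routine over joint cells of the chosen characteristics. The pool, cell labels and quotas below are hypothetical; real implementations match on many more variables and draw from large volunteer panels.

```python
def quota_match(pool, targets):
    """Select volunteers from `pool` (a list of (unit_id, cell) pairs)
    until each cell's quota is filled. `targets` maps a matching cell,
    e.g. a sex-age combination, to the number of respondents required
    so that the sample mirrors the population composition."""
    remaining = dict(targets)
    sample = []
    for unit_id, cell in pool:
        if remaining.get(cell, 0) > 0:  # this cell still needs respondents
            sample.append(unit_id)
            remaining[cell] -= 1
    return sample

# Hypothetical volunteer pool and population-derived quotas.
pool = [(1, "F18-34"), (2, "F18-34"), (3, "M18-34"),
        (4, "M35-54"), (5, "F18-34"), (6, "M18-34")]
targets = {"F18-34": 2, "M18-34": 1, "M35-54": 1}
print(quota_match(pool, targets))  # → [1, 2, 3, 4]
```

Note that volunteers arriving after a cell's quota is filled are simply discarded, which is why the representativeness of the result rests entirely on the choice of matching variables.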
Another method, known as propensity score matching, is used when comparing treatment versus control without randomisation. This method relies on modelled conditional probabilities of group membership given observed covariates (propensity scores).
a pure probability sample for most variables in one example.
Another hybrid approach is a sequential sampling design, which includes multiple inverse sampling and general inverse sampling, introduced by Chang et al. (1998) and Salehi & Seber (2004) respectively. This design is useful when targeting hard-to-reach subgroups: simple random sampling (SRS) is undertaken repeatedly until a desired sample size is reached for the targeted strata. Similar to sequential sampling is adaptive sampling, where, more generally, the repetitions in the sampling are conditional on what has already been observed. For example, adaptive cluster sampling first undertakes SRS, followed by resampling around individuals with certain clustered characteristics. Sampling hard-to-reach subgroups is a challenging area with various techniques in use, such as network sampling, snowball sampling, capture-recapture methods and respondent-driven sampling. Shaghaghi et al. (2011) and Heckathorn and Cameron (2017) provide an overview of these methods.
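A simplified caricature of the SRS-until-quota idea is sketched below. This is not the Chang et al. or Salehi & Seber estimator; the function name, batch size and stopping rule are illustrative only.

```python
import random

def inverse_sample(frame, stratum_of, target_stratum, k, batch=5, seed=1):
    """Draw simple random samples in batches, without replacement, until
    at least `k` units from the hard-to-reach `target_stratum` have been
    observed (or the frame is exhausted). Returns all sampled units."""
    rng = random.Random(seed)
    remaining = list(frame)
    sample, hits = [], 0
    while hits < k and remaining:
        batch_units = rng.sample(remaining, min(batch, len(remaining)))
        for u in batch_units:
            remaining.remove(u)
            sample.append(u)
            if stratum_of(u) == target_stratum:
                hits += 1
    return sample

# Illustrative frame where units divisible by 10 form a rare stratum.
s = inverse_sample(range(100), lambda u: u % 10 == 0, True, k=3)
print(len(s), sum(1 for u in s if u % 10 == 0))
```

Because the final sample size is random, proper estimation from such a design needs the dedicated inverse-sampling estimators cited above rather than standard SRS formulas.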
Berzofsky et al. (2009) present a summary of hybrid methods before applying a method to national business establishment surveys in the USA. This "model-aided sampling" technique is a hybrid between probability, quota and general inverse sampling which can be used without any introduction of bias. Occupations that have lower response are targeted through this method, ensuring that all groups are well represented, with appropriate weightings. They also argue that response burden is reduced overall by not needing larger samples than specified in occupations that do have high response rates. Elliott (2009) also presents a hybrid approach to estimation from a non-probability sample, using "pseudo-weights" and a probability sample with similar predictive covariates.
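The pseudo-weight idea can be sketched minimally as follows (this is a caricature, not Elliott's published estimator): assume the probability of each volunteer appearing in the non-probability sample has already been modelled, for example by a logistic regression sharing covariates with the parallel probability sample, and form inverse-propensity weights.

```python
def pseudo_weights(propensities):
    """Turn modelled inclusion probabilities for a non-probability
    sample into inverse-propensity pseudo-weights, normalised so the
    weights sum to the sample size."""
    raw = [1.0 / p for p in propensities]
    scale = len(raw) / sum(raw)
    return [r * scale for r in raw]

# Volunteers judged unlikely to appear in the sample stand in for more
# of the population, so they receive larger pseudo-weights.
print(pseudo_weights([0.5, 0.25, 0.25]))
```

The quality of the resulting estimates depends entirely on how well the propensity model transfers between the two samples, which is the key assumption in this family of hybrid methods.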
6 Sampling in longitudinal studies
6.1 Population definition
Sampling in a longitudinal study has additional difficulties compared with cross-sectional studies. The two primary difficulties concern how the population should be defined and how to manage attrition (Smith et al. 2009).
Defining a population from which to sample is more complicated for longitudinal studies, because the population is not necessarily fixed at a selected time point. Hence the population must not only be defined geographically, but also temporally. Furthermore, the superpopulation may need to be defined as well, for model-based inference. The representativity of a sample can only be defined relative to a suitable population or frame. Defining a (super)population is important for the design of the study, the probabilities used in the sampling design, the weighting methods and the focus of the analysis. For example, a sample can be representative of the population at the start, end or throughout the study period, each having different implications for the study design.
6.2 The need for representativeness
Goldstein et al. (2015) present a discussion on the importance of representativeness, particularly for longitudinal studies. Representativeness is a key factor, and challenge, when assessing the sampling design of a study. Some of the key ideas from this paper are discussed below.
An important distinction between scientific and population inference is highlighted. For the latter it is important to have a representative sample, whereas in the former it is more important to cover the range of characteristics in the population. Scientific inferences are concerned with establishing causal relationships, which do not necessarily have to be population-wide or population-specific relationships. In such cases a purposive sampling approach could be as appropriate as, or more appropriate than, probability sampling. For example, randomised controlled trials do not require probability sampling, only a random assignment of cases to groups; the purpose is to establish causality from a treatment, not to generalise to a specific population. Hence the purposes of the longitudinal study must be considered, along with what kind of inference will best suit those purposes.
Generally the purpose of a longitudinal study is to enable longitudinal estimation, so the population must be longitudinal too. A longitudinal population is dynamic in nature, meaning that people join and leave the population over time. Making inference about a dynamic population creates difficulties, which is why the population may be restricted to the cross-sectional population at the beginning of the study. This requires the assumption that the population will not change much; however, over long periods this assumption becomes hard to justify. Lynn in Goldstein et al. (2015) argues that when defining a population, the representativeness of the sample should be considered not just in relation to a study population but also to a policy population at which the study is targeted, that is, the population which will be affected by the study findings. The target population should then be carefully defined at the beginning of the study, from which the appropriateness of design- versus model-based approaches can be assessed.
It is worth pointing out that the population defined in the study has immediate consequences for how the parameters should be defined. Furthermore, if the statistical bias is defined as the difference between a sample estimate and the population parameter, then the way the population is defined has implications for how the bias is defined. If the population, and hence the bias, is not well defined, the case for probability sampling as the optimal way to reduce that bias is weakened, and non-probability sampling may perhaps be considered. These factors should all be considered in conjunction when designing a study.
A case is made in the discussion by O'Muircheartaigh in Goldstein et al. (2015) that probability sampling should be used in longitudinal studies to maximise the cross-disciplinary use of the data, because in no discipline is a probability sample inferior to a non-probability sample. This is a strong indication that a design-based approach should be attempted first, with model-based methods then used to supplement the unavoidable limitations of the sample representativeness. It is also suggested that these model-based approaches, with their corresponding model assumptions, should be testable and well justified in practice.
6.3 Attrition
Attrition is unavoidable in longitudinal studies, and various weight adjustment approaches have been suggested, but all require the use of models. For example, Schmidt and Woll (2017) suggest an approach based on logistic regression where probabilities of drop-out are predicted using key auxiliary variables, and the weights are inversely proportional to these probabilities. Cumming and Goldstein (2016) argue that reweighting methods can be inefficient and propose multiple imputation methods.
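The reweighting step of an approach in the spirit of Schmidt and Woll (2017) can be sketched as follows. The logistic-regression fit that produces the retention probabilities is assumed to have been done separately, and the probability floor is an illustrative safeguard against extreme weights, not part of the cited method.

```python
def attrition_weights(base_weights, retention_probs, floor=0.02):
    """Adjust design weights for attrition: divide each remaining
    respondent's weight by their modelled probability of having been
    retained at the current wave. Probabilities are floored to avoid
    extreme weights for very unlikely stayers."""
    return [w / max(p, floor) for w, p in zip(base_weights, retention_probs)]

# Respondents whose characteristics predict drop-out get larger weights,
# compensating for similar people who left the study.
print(attrition_weights([1.0, 2.0], [0.5, 0.8]))  # → [2.0, 2.5]
```

As with all model-based adjustments, the correction only removes attrition bias to the extent that the auxiliary variables in the drop-out model explain who actually left.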
Both these approaches have less control over the sample characteristics than a probability-based approach. They have been widely used for specialist populations where there is no frame, but as a supplementary sample suitable models would be needed to borrow strength, as discussed for on-line panels. Further research would be needed to investigate how these approaches could be integrated, but this does not seem a priority area currently. We did not find any examples of non-probability samples being used as boosts in major longitudinal studies, and it may well be that studies of specialist populations are better undertaken and analysed separately.
There is perhaps a need for a pilot investigation of the longitudinal stability of samples collected by network sampling approaches (particularly respondent-driven sampling, where it is not the researcher doing the recruitment). The first response is the most difficult to obtain, and in a longitudinal setup contact information would be available for further waves; but attrition in these groups might be high.
6.5 Conclusions
The ESRC-funded longitudinal studies are a key component of the UK's data infrastructure, and therefore have multiple uses, including both scientific and population inference. Probability samples are the only currently available practical approach that provides an uncontested basis for both types of inference, and they additionally offer a relatively straightforward design-based analytical pathway to a wide range of researchers. The availability of this approach does not preclude the use of model-based approaches, and indeed the discipline of probability sampling may offer additional protection against bias in the modelling by producing samples with known properties.
Further research on the appropriate models and tools for combining probability and nonprobability elements would be valuable (Skinner in Goldstein et al. 2015). If sufficiently generic approaches can be found, this may open the way to making nonprobability components of the major longitudinal surveys viable and thereby increase the available information at relatively low cost.
7 Opt-in versus Opt-out

A probability sampling approach relies on response rates being high enough to ensure sample representativeness. Recent findings from the Life Study revealed that response rates were too low to provide a representative sample (Dezateux et al., 2016). A factor believed to have caused this was the opt-in recruitment model used, as opposed to an opt-out model: in the former, consent to participate must be actively affirmed, whereas in the latter participants are included unless they withdraw.
Previous studies have assessed the response rate differences between opt-in and opt-out approaches. Junghans et al. (2005) performed a randomised controlled trial in two general practices in England, where patients with angina were recruited. They found that the recruitment rate was 38% (96/252) for the opt-in and 50% (128/258) for the opt-out approach. In Pennsylvania, USA, a similar trial was done with diabetic patients, where a recruitment letter for opt-in got 12.7% (63/496) of patients enrolling, compared to 38.3% (28/73) for opt-out (Aysola et al., 2018). More relevantly to longitudinal studies, Bray et al. (2015) compared opt-in and opt-out approaches aiming to re-engage participants who had dropped out of the Avon Longitudinal Study of Parents and Children (ALSPAC). They found that only 3% (4/150) in the opt-in arm consented to continue, compared to 31% (46/150) in the opt-out arm. There is also evidence that opt-out relative to opt-in significantly improves response rates in children (Spence et al., 2015) and adolescents (Severson and Ary, 1983).
There is evidence that opt-out approaches to recruitment receive higher response rates in a number of settings. In the examples provided, the response rates are below 50% for both approaches; this relatively low level creates challenges in making the sample representative. Given these challenges, it seems sensible to aim for opt-out approaches to recruitment where it is ethical to do so. Adjustments for non-response can then be made using auxiliary data and appropriate weighting or model-based methods.
8 Recommendations and further research

- For a national-level longitudinal study, probability sampling should be used to ensure representativeness and to provide a robust resource which can be used for general purposes into the future. Non-probability samples do not currently offer sufficient protection against the risk of bias in population inference; scientific inference does not deteriorate if based on a large probability sample.
- Models for non-response and attrition rates should be used to adjust the sample requirements at the design stage of the longitudinal study; model-based methods will still be needed to make adjustments for the actual outcomes. Auxiliary data sources will be useful for the modelling, potentially including linked administrative data.
- Longitudinal study designs should include precise definitions of the population, allowing a clear understanding of what should be sampled.
- Non-probability sampling for longitudinal studies should only be considered in situations where probability designs are not feasible (for example if there is no reliable sampling frame, or if the target population is hard to reach).
- There may be potential to use non-probability samples to boost a probability sample for certain population subgroups, but the challenge of producing suitable models to combine these two types of data is not yet worked out, and more research in this area is needed.
9 References

Aysola, J., Tahirovic, E., Troxel, A. B., Asch, D. A., Gangemi, K., Hodlofski, A. T., … Volpp, K. (2018). A Randomized Controlled Trial of Opt-In Versus Opt-Out Enrollment Into a Diabetes Behavioral Intervention. American Journal of Health Promotion, 32(3), 745–752. https://doi.org/10.1177/
Baker, R., Brick, J. M., Bates, N. A., Battaglia, M., Couper, M. P., Dever, J. A., ... & Tourangeau, R. (2013). Summary report of the AAPOR task force on non-probability sampling. Journal of Survey Statistics and Methodology, 1(2), 90-143. https://doi.org/10.1093/jssam/smt008 (see also the full report at https://www.aapor.org/AAPOR_Main/media/MainSiteFiles/NPS_TF_Report_Final_7_revised _FNL_6_22_13.pdf).
Berzofsky, M., Williams, R., & Biemer, P. (2009). Combining probability and non-probability sampling methods: Model-aided sampling and the O*NET data collection program. Survey Practice, 2(6), 2984. https://doi.org/10.29115/SP-2009-
Blom, A., Bosnjak, M., Cornilleau, A., Cousteaux, A.S, Das, M., Douhou, S & Krieger, U.
and N. L. Johnson). https://doi.org/10.1002/0471667196.ess1235.pub
Lucas, S. R. (2016). Where the Rubber Meets the Road: Probability and Nonprobability Moments in Experiment, Interview, Archival, Administrative, and Ethnographic Data Collection. Socius. https://doi.org/10.1177/
MacInnis, B., Krosnick, J. A., Ho, A. S., & Cho, M. J. (2018). The Accuracy of Measurements with Probability and Nonprobability Survey Samples: Replication and Extension. Public Opinion Quarterly, 82(4), 707-744.
Nærde, A., Janson, H., & Ogden, T. (2014). BONDS (The Behavior Outlook Norwegian Developmental Study): A prospective longitudinal study of early development of social competence and behavior problems. ISBN 978-82-93406-00-6. Oslo, Norway: The Norwegian Center for Child Behavioral Development.
Neyman, J. (1934). On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection. Journal of the Royal Statistical Society, 97(4), 558-625.
Rivera, J. D. (2018): When attaining the best sample is out of reach: Nonprobability alternatives when engaging in public administration research, Journal of Public Affairs Education. https://doi.org/10.1080/15236803.2018.
Salehi, M., & Seber, G. A. (2004). A general inverse sampling scheme and its application to adaptive cluster sampling. Australian & New Zealand Journal of Statistics, 46(3), 483-494.
Schmidt, S. C., & Woll, A. (2017). Longitudinal drop-out and weighting against its bias. BMC medical research methodology, 17(1), 164.
Severson, H. H., & Ary, D. V. (1983). Sampling bias due to consent procedures with adolescents. Addictive Behaviors, 8(4), 433-437. https://doi.org/10.1016/0306-4603(83)90046-
Shaghaghi, A., Bhopal, R. S., & Sheikh, A. (2011). Approaches to recruiting ‘hard-to-reach’ populations into research: a review of the literature. Health promotion perspectives, 1(2), 86. https://doi.org/10.5681/hpp.2011.
Smith, P., Lynn, P. & Elliot, D. (2009). Sample design for longitudinal surveys. Pp 21-33 in P. Lynn (ed) Methodology of longitudinal surveys. Wiley: Chichester.
Spence, S., White, M., Adamson, A. J., & Matthews, J. N. (2015). Does the use of passive or active consent affect consent or completion rates, or dietary data quality? Repeat cross-sectional survey among school children aged 11–12 years. BMJ open, 5(1). https://doi.org/10.1136/bmjopen-2014-
Sterba S. K. (2009). Alternative Model-Based and Design-Based Frameworks for Inference From Samples to Populations: From Polarization to Integration. Multivariate behavioral research, 44(6), 711–740. https://doi.org/10.1080/
Stephenson, B. C. (1979). Probability Sampling with Quotas: An Experiment, Public Opinion Quarterly, 43, 477–496. https://doi.org/10.1086/
Sudman, S. (1966). Probability Sampling with Quotas. Journal of the American Statistical Association, 61(315), 749-771. https://doi.org/10.2307/
Thompson, M. E. (2015). Using Longitudinal Complex Survey Data. Annual Review of Statistics and its Application, 2, 305–320. http://dx.doi.org/10.1146/annurev-statistics-010814-
Thompson, S.K. (1990). Adaptive cluster sampling. Journal of the American Statistical Association 85(412): 1050–1059. https://doi.org/10.1080/01621459.1990.10474975.
Thurber, K. A., Banks, E., Banwell, C., & LSIC Team (2015). Cohort Profile: Footprints in Time, the Australian Longitudinal Study of Indigenous Children. International journal of epidemiology, 44(3), 789–800. https://doi.org/10.1093/ije/dyu
Wilson, I., Huttly, S. R., & Fenn, B. (2006) A Case Study of Sample Design for Longitudinal Research: Young Lives, International Journal of Social Research Methodology, 9:5, 351-365. https://doi.org/10.1080/
Yang, K., & Banamah, A. (2014). Quota Sampling as an Alternative to Probability Sampling? An Experimental Study. Sociological Research Online, 19(1), 1–11. https://doi.org/10.5153/sro.
is clear that the two components can be used together to strengthen any inference made, using similar methods to Elliott (2009) for example. The Life Study was stopped prematurely for reasons discussed in Dezateux et al. (2016).
These four longitudinal examples highlight three different scenarios where non-probability sampling is preferable to probability sampling: when there is no reliable sampling frame, when targeting hard-to-reach populations, and when scientific inference rather than inference about population quantities is the aim of the study.
Annex B
Longitudinal surveys are generally characterised by a need for population-representative sampling as a main strategy. According to Lynn (in Goldstein et al., 2015), the features that create this need include limited advance knowledge about the estimation parameters, the inability to specify in advance all the estimation requirements that will flow from the study, and the long timescale between data collection and analysis/policy implementation.
More often than not, longitudinal surveys aim to have a stable sample with high response rates across multiple waves. Commonly, the initial sample of such longitudinal surveys is representative of the population of interest. One of the advantages of having a representative initial sample is that the initial respondents can then be used to adjust analysis at subsequent waves to account for any bias arising from attrition (Goldstein et al., 2015).
Because longitudinal studies are intended to be used as data resources for primary and secondary analysis, it is important that the initial sample is representative of the population. Lynn (in Goldstein et al., 2015) argues that this can be regarded as a safety net, ensuring that the population distribution of (multiple) topics of interest can be covered, therefore allowing for the analysis of various research topics not envisioned at the design stage.
Further benefits of probability sampling, according to Lehtonen (in Goldstein et al., 2015), lie in the flexibility it allows for the controlled selection of participants as well as its ability to “provide a basis for proper statistical inference under the actual sampling design” (p.469).
However, one disadvantage that longitudinal studies cannot get around is that outcomes measured longitudinally can only be based on those people who were in the ‘population’ at both points in time (i.e. baseline and outcome). People who ‘enter the population’ after the baseline measure (through being born or immigrating into the country), as well as people who leave the ‘population’ before the outcome measurement (through death or emigration), cannot contribute to the estimate (Lynn in Goldstein et al., 2015).
Muniz-Terrera and Hardy (in Goldstein et al., 2015) highlight the complication that even in situations where a random, representative sample is selected at the onset of a longitudinal survey, this representativeness is not likely to last over time. The original sample is bound to change, while the study population will also change due to loss at follow-up. They use the example of the MRC National Survey of Health and Development (NSHD), for which the initial sample was made up of:

- all babies born to women with husbands in non-manual and agricultural employment;
- one in four births by women with husbands in manual employment.

Wadsworth (1991) explains how the aim of employing this sampling scheme was to achieve similar numbers of children in both groups. With the sample aged 69 at the latest (24th) data collection, the participants are no longer representative of the entire population aged 69 in England, Scotland and Wales.
Because of population demographics changing naturally, but also due to immigration and emigration occurring over the lifetime of the cohort, any estimates coming from this study can only be representative of the British-born population of 1946. This constitutes a disadvantage of random samples for longitudinal surveys in the long term.
One way of dealing with this would be for national cohort studies to boost the sample so as to maintain representativeness, as in the 1958 British Birth Cohort (NCDS) and the 1970 British Birth Cohort, which topped up the sample with immigrants born in the cohort members’ reference birth week, identified using school records. The original birth sweep for NCDS collected information about 17,415 (98%) of all new-borns in Great Britain in that week. On three occasions, at ages 7, 11 and 16, the sample was topped up with children (and subsequently teenagers) who had been born overseas in the relevant week and had subsequently moved to Great Britain. The augmented sample was made up of 18,558 cohort members (Cls.ucl.ac.uk, 2019).