Coverage Technical Report, Census of Population, 2021
7. Census Undercoverage Study

The primary objective of the Census Undercoverage Study (CUS) is to estimate the number of persons in the 2021 Census target population who were not enumerated at the national, provincial and territorial levels. A sample of individuals was drawn from six sampling frames independent of the 2021 Census. The data for the selected persons (SPs) were linked with tax data and other administrative sources to obtain recent information about their usual residence, contact addresses, household members, and related groups of persons.

A set of complex automated linkages and manual searches was done to find the SP in the 2021 Census Response Database (RDB). The census coverage studies (CCS), including the CUS, were carried out based on the version of the RDB that was available in mid-October 2021 (i.e., before the end of census processing). This version, which predates the final 2021 RDB, was called the CCS-RDB. There are a few minor differences between the CCS-RDB and the later versions of the census databases. The CCS-RDB, a database of persons, comprises all the records of enumerated persons, except three record groups: census records imputed through whole household imputation (WHI); census records that were added late (after processing for the CUS began), although, unlike in the two preceding cycles, no such records were added in 2021; and census records called “incomplete enumerations.” Section 7.4.6 provides more information on incomplete enumerations.

When a search produces no matches, multimode collection is done to determine whether the SP was a member of the target population and to get additional information (including addresses) to help find the SP in the CCS-RDB. At the end of the search, each SP is classified as out-of-scope (deceased, emigrated, temporarily outside Canada), enumerated or missed. A small number of non-response cases, consisting mostly of persons who could not be traced through collection, must be processed and are used to adjust respondent weights based on a non-response adjustment model.

7.1 Sampling

The sampling frame for the CUS target population, which includes all persons who should have been enumerated in the 2021 Census, is constructed from six frames independent of the 2021 Census. The first five frames were used to select a sample to estimate undercoverage in the 10 provinces, while estimates for the three territories were calculated using samples from the last frame only.

At the provincial level, sampling began with the persons who were in the 2016 Census target population. This includes all persons enumerated in the 2016 Census and all persons missed by the 2016 Census, represented by the portion of the sample of SPs in the 2016 CUS who were classified as missed. To account for persons added to the target population since the last census, intercensal (i.e., between the 2016 and 2021 censuses) births and immigrants were added, as were non-permanent residents as of Census Day in 2021. The data sources for these frames are as follows:

  • Census frame: Persons who were enumerated in the 2016 Census and appear in the 2016 CCS-RDB.
  • Missed frame: There is no comprehensive list of missed persons. However, there is a representative sample of these persons: the 2016 CUS sample of SPs classified as missed. They are all included in the 2016 sample with their 2016 weights.
  • Birth frame: Vital statistics data on intercensal births. Since the final vital statistics file on births is only available late, the CUS sample of births is drawn from a mix of preliminary, final and raw vital statistics data files.
  • Immigrant frame: Administrative data from Immigration, Refugees and Citizenship Canada (IRCC) on immigrants who arrived in Canada during the intercensal period.
  • Non-permanent resident frame: Administrative data from IRCC on persons claiming refugee status on Census Day and persons with a valid work or study permit on Census Day.

For each territory, the main survey frame consisted of health insurance files for persons eligible for health care on Census Day. Although this frame has excellent coverage, it is incomplete, so the sampling weight must be adjusted. Each frame for a given territory is independent of the other territory frames and is used to estimate the undercoverage only for that given territory. In addition, the territory frames are not used to estimate undercoverage in the provinces. In the 2021 CUS, non-permanent residents in the territories who had work or study permits and were not already included in health insurance files were added to the territory frames.

None of the first five frames for the provinces covered persons who had emigrated or who were outside Canada during the 2016 Census and did not complete a 2016 Census questionnaire and who returned during the intercensal period (“returning Canadians within a province”). According to the 2021 Census long-form questionnaire, the number of persons in this group was estimated at 252,089. In addition, the number of persons returning from a territory to a province was estimated at 13,426. Added to this number were 120 persons from reserves and settlements that were incompletely enumerated in 2016 and enumerated in 2021, and 8,489 persons from reserves or settlements who had returned in 2016 and were enumerated in 2021, but who were excluded from the 2016 Census frame. Also, persons born after the 2016 Census outside Canada or in the territories who have Canadian citizenship and who returned to one of Canada’s 10 provinces by Census Day in 2021 were not covered by the first five CUS frames. According to the 2021 Census long-form questionnaire, the number of persons in this group was estimated at 16,925. Coverage error estimates do not include these populations, estimated at a total of 291,049 persons.
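The total quoted above can be reproduced from its components. The short check below uses only the figures given in the preceding paragraph; the dictionary labels are paraphrases introduced here for readability.

```python
# Populations not covered by the first five provincial frames
# (figures from the text; labels are shortened paraphrases).
excluded = {
    "returning Canadians within a province": 252_089,
    "returned from a territory to a province": 13_426,
    "reserves incompletely enumerated in 2016, enumerated in 2021": 120,
    "reserves or settlements, excluded from the 2016 Census frame": 8_489,
    "born after the 2016 Census outside Canada or in the territories": 16_925,
}

total_excluded = sum(excluded.values())
print(f"{total_excluded:,}")  # 291,049
```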

One problem with using multiple sampling frames is the possibility that the same person could be included in more than one frame. For example, a person in the immigrant frame may have been in Canada on a work permit in May 2016 and therefore may have been enumerable in the 2016 Census. That person would then be in both the immigrant frame and the census frame if they were enumerated, or in the missed frame if they were not enumerated. Consequently, it is important to identify all cases of frame overlap. Otherwise, estimates may be too high because some people are included twice in the frames. Whenever possible, this overlap is identified when the sampling frames are constructed, but some overlap is also identified later using information provided by respondents.
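As a minimal sketch of overlap detection, assume each frame can be reduced to a set of person identifiers. This is a simplification: in practice, overlap is found through record linkage and respondent information, and the identifiers and frame contents below are hypothetical.

```python
from itertools import combinations

def frame_overlaps(frames):
    """Return the identifiers shared by each pair of frames.

    `frames` maps a frame name to a set of person identifiers
    (hypothetical keys; the report does not describe such a key).
    """
    overlaps = {}
    for (name_a, ids_a), (name_b, ids_b) in combinations(frames.items(), 2):
        common = ids_a & ids_b
        if common:
            overlaps[(name_a, name_b)] = common
    return overlaps

# Hypothetical example: person 3 held a work permit in May 2016
# (census frame) and later immigrated (immigrant frame).
example_frames = {
    "census": {1, 2, 3},
    "immigrant": {3, 4},
    "birth": {5, 6},
}
overlap = frame_overlaps(example_frames)
```

A person found in two frames would be kept in only one of them so that they are not counted twice in the estimates.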

The sample design varied by frame depending on the type of list used. A one-stage stratified design was used for the 2016 Census frame. The stratification methodology was significantly changed for the 2021 CUS. Prior to stratification, several deterministic linkages were done. First, the frame was linked with the tax data, and over 96% of the persons were linked. Then the frame was linked with the vital statistics death files. It was also linked with IRCC files to find non‑permanent residents in the frame. Finally, it was linked with the 2021 RDB using the monster match program, which is also used for the processing of the CUS sample. This process provides suggestions for potential enumeration and an indicator of the strength of each suggestion. Some suggestions are strong enough for the person to be considered enumerated without the suggestion having to be checked. These cases are called self-enumerations. Following these linkages, the frame was stratified. Two take-all strata were created: the deceased stratum and the self-enumerated stratum (labelled “auto-enumerated” in Table 7.1.1). Next, six take-some strata were created taking into account the probability of enumeration of persons (strength of the suggestion in the 2021 RDB), the tax situation and the likelihood of being out of scope of the census. However, persons enumerated on reserves and settlements in the 2016 Census were placed in separate strata using the same criteria, but with some strata grouped together because the population is smaller and more homogeneous.

Second, the take-some strata were stratified by province. For those residing in the six smallest provinces in 2016, the stratification province was the province of residence in 2016 (in the 2016 RDB). For persons in the four largest provinces in 2016, the derivation of the stratification province varied by stratum. In the strata with high probability of enumeration in the 2021 RDB, the province of potential enumeration in the 2021 RDB was used. Otherwise, where the person was linked to the tax data, the most recent province of residence based on these data was used. As a last resort, the province listed in the 2016 RDB was used.
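The rule cascade described above can be sketched as follows. The field names and the `FramePerson` structure are hypothetical, the set of six smallest provinces is an assumption based on population size (the report does not list them), and the fall-through when a 2021 suggestion is missing is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: the six smallest provinces by population (not listed in the report).
SMALLEST_SIX = {"NL", "PE", "NS", "NB", "MB", "SK"}

@dataclass
class FramePerson:
    province_2016: str                        # province in the 2016 RDB
    high_enumeration_probability: bool        # stratum with strong 2021 RDB suggestion
    province_2021_suggestion: Optional[str]   # province of potential enumeration
    province_tax: Optional[str]               # most recent province in tax data, if linked

def stratification_province(person: FramePerson) -> str:
    """Derive the stratification province following the order given in the text."""
    if person.province_2016 in SMALLEST_SIX:
        return person.province_2016
    if person.high_enumeration_probability and person.province_2021_suggestion:
        return person.province_2021_suggestion
    if person.province_tax:
        return person.province_tax
    return person.province_2016  # last resort

moved = FramePerson("ON", False, None, "AB")
print(stratification_province(moved))  # AB
```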

The missed frame is a sample-based frame because there is no list of all persons missed in the 2016 Census. The sample for this frame consists of all cases classified as “missed” in the 2016 CUS. Although the sample was not stratified as such, implicit stratification was inevitable because the 2016 missed cases were from different frames and strata.

To construct the birth frame, copies of intercensal birth registrations were obtained from vital statistics through the National Routing System, which provides faster access to these data. The frame contains all births between May 10, 2016, and May 10, 2021, inclusively. The frame was then stratified by the mother’s province of residence or, if this information was not available, by the province of birth.

The immigrant frame was constructed with records from IRCC. The immigrant frame contains all persons who immigrated to Canada between May 10, 2016, and May 10, 2021, inclusively. Those who were non-permanent residents on Census Day in 2016 were removed from the immigrant frame because they were already covered by the 2016 Census frame or by the 2016 missed frame. The immigrant frame was stratified by province. The province was derived based on information available in an address file provided by IRCC and in the IRCC immigration file. The most likely province of residence on Census Day in 2021 was selected. Then, immigrants from all provinces were separated into two strata by their immigration date. The first stratum consisted of immigrants who arrived between May 10, 2016, and April 30, 2020, and the second consisted of immigrants who arrived between May 1, 2020, and May 10, 2021, because newer immigrants are usually more likely to be missed in the census.

The non-permanent resident frame (persons who hold a work or study permit and refugee claimants) was constructed with IRCC records. Non-permanent residents as of Census Day in 2016 and intercensal immigrants were removed from the 2021 non‑permanent resident frame. The frame was stratified by province, according to the most likely province of residence on Census Day in 2021. To this end, a deterministic linkage of the frame with the tax data was done. The IRCC address file and the various IRCC non-permanent resident files were also used. At the end of the process, a number of non-permanent residents had no associated provinces of residence (residents with an open permit), so they were placed in a national stratum.

In the provinces, the total size of the 2021 sample was determined to achieve two main objectives. First, the 2021 CUS collection budget was to remain the same as the 2016 CUS collection budget (but adjusted for unit cost increases between 2016 and 2021). Only a portion of the persons in the sample required collection, and proportions varied by frame and stratum. Second, the CUS sought to obtain similar standard errors for the undercoverage rate among provinces of comparable size. The aim was to produce smaller standard errors for the larger provinces than for the smaller provinces, as this would help to obtain a small standard error at the national level. Where possible, standard errors were not to be higher than those obtained in 2016.

Starting in 2020, by constantly updating the parameters used to calculate the standard error of undercoverage and the number of persons requiring collection, sample size simulations by frame and stratum were done to calculate the appropriate standard errors at all levels (national, provincial, age and gender). The frames and results of the 2016 CUS were used to make these simulations. Since some survey frames were ready before others, sample sizes were determined for these frames before establishing sizes for other frames and strata. Among other things, the sample size of the stratum for the 2016 missed frame was already set because everyone who was classified as “missed” in the 2016 CUS was selected. Then, the size of the first stratum of the immigrant frame was determined in the summer of 2020, and so on for the other strata and frames (births and non‑permanent residents). The sample allocation was completed in November 2021 with the stratification of the 2016 Census frame as described above.

In several strata, a total size was determined for all ten provinces, and then a power-allocation scheme was used to allocate the total sample among the provinces. Minimum sample sizes were also set in the smallest provinces.
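A power allocation distributes the sample in proportion to a size measure raised to a power between 0 and 1, which gives small provinces more than a proportional share. The sketch below is illustrative only: the exponent, the size measure, the minimum and the three provinces are hypothetical, and the report does not state the values actually used.

```python
def power_allocation(sizes, total, power=0.5, minimum=0):
    """Allocate `total` in proportion to size**power, with a per-province floor.

    Provinces falling below `minimum` are fixed at the minimum and the
    remainder is reallocated among the others. Results are not rounded.
    """
    fixed = {}
    free = dict(sizes)
    allocation = {}
    while free:
        remaining = total - sum(fixed.values())
        weight_sum = sum(size ** power for size in free.values())
        allocation = {
            name: remaining * size ** power / weight_sum
            for name, size in free.items()
        }
        below = [name for name, n in allocation.items() if n < minimum]
        if not below:
            break
        for name in below:
            fixed[name] = minimum
            del free[name]
            del allocation[name]
    allocation.update(fixed)
    return allocation

# Hypothetical size measures for three provinces.
example = power_allocation({"A": 100, "B": 400, "C": 10_000}, total=300,
                           power=0.5, minimum=30)
```

With these hypothetical inputs, province A is raised to the minimum of 30 and the remaining 270 units are shared between B and C in proportion to the square roots of their sizes.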

In addition, for some strata of the sampling frame, sub-stratification by sex and age group was performed to ensure that there were sufficient numbers of persons missed from these domains. Similarly, the allocation of the sample to the reserve strata of the census frame was carried out to obtain a precision for the undercoverage on reserves at least as good as in the 2016 CUS. The final total allocated sample was 32,534 SPs across the frames in the provinces. Table 7.1.1 shows the final sample allocation by stratum for all provinces. According to this sample allocation, the target standard errors for the undercoverage rate ranged from 0.16% to 0.42% at the provincial level, and the target was 0.09% for the provinces as a whole. It should be noted that the resulting allocation does not guarantee that this level of precision will be achieved, because assumptions were made about several parameters that enter the calculation of the standard error of the undercoverage (strata and frame sizes, missed rate, CUS collection response rates, etc.). In addition, the COVID-19 pandemic may have affected the accuracy of these assumptions, including the number of immigrants and non-permanent residents, interprovincial migration and missed rates in the 2021 Census.

Table 7.1.1
Sample allocation, sampling frames and strata for all provinces

Sampling frame | Strata within each province | Number of people
Take-all total | ... | 26,944,027
2016 Census | Deceased | 1,239,662
2016 Census | Auto-enumerated in a province | 25,704,365
Take-some total | ... | 32,534
2016 Census | Off reserves TS_1: Strong suggestions of enumeration | 5,559
2016 Census | Off reserves TS_2: Strong suggestions of incomplete enumeration | 369
2016 Census | Off reserves TS_3: High probability of being out of scope | 510
2016 Census | Off reserves TS_4: Medium suggestions of enumeration | 757
2016 Census | Off reserves TS_5: High probability of being missed | 5,041
2016 Census | Off reserves TS_6: Others | 1,712
2016 Census | Reserves TS_7: Strong or medium suggestions of enumeration | 270
2016 Census | Reserves TS_8: High probability of being missed | 505
2016 Census | Reserves TS_9: Others | 200
2016 Census | Reserves TS_10: Newfoundland and Labrador and Prince Edward Island | 60
2016 missed | No further stratification | 4,821
Births | No further stratification | 5,978
Immigrants | Between May 10, 2016, and April 30, 2020 | 2,593
Immigrants | Between May 1, 2020, and May 10, 2021 | 588
Non-permanent residents | No further stratification | 3,571

... not applicable
TS = take-some
Source: Statistics Canada, 2021 Census Undercoverage Study.

The sampling methodology for the territories was similar to that of the census frame for the provinces. The persons included in the sampling frame for each of the territories were linked to the tax data and then to the 2021 RDB, using the monster matching process, which is also used for the processing of the CUS sample (see Section 7.2.1). Following these steps, the frame was stratified, taking into account the strength of the linkage with the 2021 RDB, the location of the enumeration and recent tax activity. A take-all self-enumeration stratum was formed in each territory, and six take-some strata were formed (see Table 7.1.2). For the first and sixth strata, a sub-stratification by sex and three age groups (0 to 17 years, 18 to 29 years and 30 years of age and older) was performed.

For sample allocation to the territories, the first step was to determine the total sample to be allocated to each territory in order to achieve similar and adequate precision of the undercoverage. In 2021, the target standard error for the undercoverage rate was approximately 0.40% in Yukon and the Northwest Territories (an improvement from 2016) and 0.60% in Nunavut (similar to 2016). Using the results of the 2016 CUS, assumptions about missed rates, undercoverage rates and other parameters were derived for each stratum. For the first take-some stratum, the sample size was set manually in each territory because this stratum had very little effect on the precision of the undercoverage rate but more effect on the precision of the enumeration rate. This is important for the calculation of a calibration factor at the time of weighting. In addition, the workload of the employees who had to check the sample of this stratum had to be taken into account. Similarly, a sample size was set manually for the fourth stratum because it represented persons who are almost certainly out of scope but who still require some research work by CUS employees. Then, iteratively, an optimal distribution of the total sample was made among the other take-some strata, including the six substrata of the last stratum. An approximate total size was initially set, then the precision of the optimal distribution was calculated, and this was repeated by increasing or decreasing the total size until the desired precision for the undercoverage rate in each territory was obtained. The final total allocated sample was 4,285 SPs across the frames in the territories.
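The iterative procedure can be illustrated with a simplified sketch. It assumes a Neyman-type allocation as the “optimal distribution” (the report does not name the method), ignores the finite population correction and the manually sized strata, and uses the standard error of the estimated missed proportion within the frame as a stand-in for the standard error of the undercoverage rate. All stratum sizes, missed rates and the target are hypothetical.

```python
import math

def optimal_allocation(strata, n_total):
    """Neyman-type allocation: n_h proportional to N_h * sqrt(p_h * (1 - p_h))."""
    weights = [size * math.sqrt(p * (1 - p)) for size, p in strata]
    total_weight = sum(weights)
    return [n_total * w / total_weight for w in weights]

def rate_standard_error(strata, allocation):
    """Standard error of the stratified estimate of the missed proportion."""
    population = sum(size for size, _ in strata)
    variance = sum(
        (size / population) ** 2 * p * (1 - p) / n
        for (size, p), n in zip(strata, allocation)
    )
    return math.sqrt(variance)

def find_total_size(strata, target_se, start=500, step=10):
    """Increase or decrease the total size until the target is just met."""
    def se(n):
        return rate_standard_error(strata, optimal_allocation(strata, n))

    n = start
    while se(n) > target_se:          # too imprecise: increase the total
        n += step
    while n - step > 0 and se(n - step) <= target_se:  # more than needed: decrease
        n -= step
    return n

# Hypothetical strata: (population size, assumed missed rate).
STRATA = [(5000, 0.02), (1500, 0.10), (800, 0.30)]
TARGET_SE = 0.004  # 0.40%, matching the order of magnitude quoted in the text
total_size = find_total_size(STRATA, TARGET_SE)
final_allocation = optimal_allocation(STRATA, total_size)
```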

Table 7.1.2 shows the allocation by stratum for all territories.

Table 7.1.2
Sample allocation, strata by territory

Strata | Yukon | Northwest Territories | Nunavut | Total
Take-all: Auto-enumerated within its territory | 27,881 | 26,696 | 16,981 | 71,558
Take-some total | 1,156 | 1,331 | 1,798 | 4,285
TS_1: Strong suggestions of enumeration | 530 | 440 | 468 | 1,438
TS_2: Medium suggestions of enumeration | 57 | 196 | 356 | 609
TS_3: Strong suggestions of incomplete enumeration | 30 | 30 | 44 | 104
TS_4: Strong suggestions of enumeration outside its territory | 53 | 78 | 70 | 201
TS_5: High probability of being out of scope | 97 | 83 | 96 | 276
TS_6: High probability of being missed (substratification)
  Females, 0 to 17 years | 30 | 59 | 158 | 247
  Females, 18 to 29 years | 48 | 44 | 69 | 161
  Females, 30 years and older | 109 | 117 | 157 | 383
  Males, 0 to 17 years | 33 | 61 | 132 | 226
  Males, 18 to 29 years | 54 | 61 | 65 | 180
  Males, 30 years and older | 115 | 162 | 183 | 460

TS = take-some
Source: Statistics Canada, 2021 Census Undercoverage Study.

Table 7.1.3 shows the sample allocation for Canada, the provinces and the territories.

Table 7.1.3
Sample size for Canada, provinces and territories

Provinces and territories | Take-all strata (number of people) | Take-some strata (number of people)
Canada | 27,015,585 | 36,819
All provinces | 26,944,027 | 32,534
  Newfoundland and Labrador | 393,554 | 1,551
  Prince Edward Island | 106,063 | 1,437
  Nova Scotia | 696,275 | 1,943
  New Brunswick | 579,964 | 1,680
  Quebec | 6,668,208 | 4,298
  Ontario | 10,415,555 | 7,126
  Manitoba | 947,750 | 2,579
  Saskatchewan | 794,538 | 2,540
  Alberta | 2,940,437 | 4,215
  British Columbia | 3,401,683 | 5,015
  NPR-CA | 0 | 150
All territories | 71,558 | 4,285
  Yukon | 27,881 | 1,156
  Northwest Territories | 26,696 | 1,331
  Nunavut | 16,981 | 1,798

NPR-CA = non-permanent residents without a known province
Source: Statistics Canada, 2021 Census Undercoverage Study.

A systematic sampling method within the strata was used to select samples. Here is the list of sorting variables used to obtain an efficient sample (implicit stratification), classified by sampling frame:

  • 2016 Census frame: sex, age, Code M (Note 1), 2016 geography, tax situation, reason for potentially being out of scope and likely province in 2021 (if stratified in the six smallest provinces);
  • Birth frame: age on Census Day, sex, age group of mother and postal code;
  • Immigrant frame: age group, sex and country of birth;
  • Non-permanent resident frame: type of permit, age group, sex and country of birth;
  • Territories frame: sex, age, Code M, tax situation and municipality of residence.

No sampling was required for the 2016 missed frame, as all persons classified as missed in the 2016 CUS were selected for the 2021 CUS sample.
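Systematic sampling from a sorted list spreads the sample evenly across the sorting variables, which is what produces the implicit stratification mentioned above. The sketch below is a generic illustration with a fractional interval and a random start; the record layout and the sort key are hypothetical and do not reproduce the CUS selection program.

```python
import random

def systematic_sample(units, n, sort_key, seed=None):
    """Sort the units, then select every (N / n)-th unit from a random start."""
    ordered = sorted(units, key=sort_key)
    size = len(ordered)
    if not 0 < n <= size:
        raise ValueError("sample size must be between 1 and the stratum size")
    interval = size / n
    start = random.Random(seed).random() * interval  # random start in [0, interval)
    # min() guards against floating-point rounding at the upper end.
    return [ordered[min(int(start + i * interval), size - 1)] for i in range(n)]

# Hypothetical stratum of 100 persons sorted by sex, then age.
stratum = [{"id": i, "sex": "F" if i % 2 else "M", "age": i % 90} for i in range(100)]
selected = systematic_sample(stratum, 10, lambda u: (u["sex"], u["age"]), seed=1)
```

Because the list is sorted by sex and then age, each run of consecutive units is similar on those variables, so taking one unit per interval yields a sample balanced on them without defining explicit strata.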

Following the selection of the provincial and territorial samples, the samples were prepared by checking the quality of the information for the variables of interest (i.e., geographic and demographic variables); for example, the accuracy of names and the validity of birth dates were checked. Addresses were standardized to facilitate subsequent processing activities. To update the geographic information, especially for the census sample and the missed persons, whose information dated from 2016, the sample records were linked with Canada Revenue Agency (CRA) records, including personal income tax records for 2015 to 2021 and Canada Child Tax Benefit records for 2016 to 2022. CRA files and vital statistics data were also used to check whether any selected persons had died. This preparation stage was important because it helped to determine which selected persons were enumerated and to contact and interview those who were not found.

7.2 Processing and classification

7.2.1 Processing

The objective of processing is to provide information for the classification of SPs for the purposes of non-response adjustment and estimation. Specifically, processing is carried out to:

  • determine whether the SPs are enumerated in the Census Response Database
  • determine whether the SPs are in the census target population
  • provide further information for non-response adjustment.

The processing results were recorded in a classification assigned to each SP for estimation and tabulation purposes (see Section 7.4 and Section 9).

Most of the processing work involved automated and computer-assisted searching of the census coverage studies version of the 2021 Census Response Database (CCS-RDB) to determine whether the SP was enumerated.

Various elements of information were used for searching, including surnames, given names and birth dates. Telephone numbers and addresses associated with the SP or members of their household were also used. Questionnaires in which the SP could have been listed were identified from a variety of sources, including the following:

  • matches with the CCS-RDB using the birth date and sex of the SP and members of the household, or the SP’s name, postal code or telephone number;
  • selection addresses from the sampling frame;
  • address updates from tax records;
  • information from the computer-assisted telephone interview (CATI) (see Section 7.3).

The first step after sample preparation was to search the CCS-RDB for each SP by processing all SPs with the addresses available from the sampling frame and tax data. There were two outcomes. When the SP was found, they were usually classified as “enumerated,” and no further processing was required, except for SPs who were later identified through vital statistics information as being deceased before the census. When the SP was not found, the case was sent for collection. While collection was taking place, the CCS-RDB search continued. When CATI data were available, researchers could determine whether each SP was part of the census target population. If so, the CATI data could enable further searching.

Searching for the SP was done both automatically and manually by coding staff guided by subject matter experts. To ensure coding uniformity, coding staff were provided with a highly detailed procedure manual that spelled out the specific steps for coding the search results. Automated searches were conducted first. For addresses obtained from a match with the CCS-RDB, there was a corresponding census questionnaire. A measure of similarity between the census questionnaire and the data available for the survey was calculated. When this measure was above a specified threshold, it was automatically concluded that the SP was enumerated at that address. In these cases, neither this address nor the SP’s other addresses needed to be processed by the coding staff. Computer programs also determined when one address was a duplicate of another. These duplicate addresses also did not need to be processed.
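The automated decision described above can be sketched as a weighted similarity score compared with a threshold. The report does not give the similarity measure, the fields, the weights or the threshold; everything below is a hypothetical stand-in that uses the standard library's `difflib.SequenceMatcher` for string comparison.

```python
from difflib import SequenceMatcher

# Hypothetical field weights and threshold (not taken from the report).
WEIGHTS = {"surname": 0.3, "given_name": 0.2, "birth_date": 0.4, "sex": 0.1}
THRESHOLD = 0.9

def field_similarity(a, b):
    """Similarity in [0, 1] between two field values; missing values score 0."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()

def record_similarity(sp, census, weights=WEIGHTS):
    """Weighted average of field similarities between an SP and a census record."""
    total = sum(weights.values())
    score = sum(
        w * field_similarity(sp.get(field, ""), census.get(field, ""))
        for field, w in weights.items()
    )
    return score / total

def auto_enumerated(sp, census):
    """True when the score is above the threshold, so no manual check is needed."""
    return record_similarity(sp, census) > THRESHOLD

sp = {"surname": "Tremblay", "given_name": "Marie", "birth_date": "1980-01-01", "sex": "F"}
same = dict(sp)
other = {"surname": "Smith", "given_name": "John", "birth_date": "1975-06-30", "sex": "M"}
```

Cases scoring at or below the threshold would go to manual review rather than being rejected outright.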

For other cases, a manual linkage was conducted using DocLink’s Interactive Verification Application (DIVA), an application developed specifically for this operation. The coding staff used a number of tools for this process, such as Geographical Reference Files, electronic telephone directories and the Street Attributes File. There were often suggested census questionnaires or census collection units that matched the address that was used as the first step for searching. Staff could also search the CCS-RDB using flexible parameters further in the process (searching by name, date of birth, etc.). The results of the manual search were then automatically edited via DIVA built-in edits to minimize errors. A file containing the search results was then produced. The data from this file were used to classify SPs.

7.2.2 Classification

Processing provides the information required to determine whether SPs were:

  • included in the “census target population” or “out of scope” (not included)
  • “classified” or “not classified”
  • “listed” or “not listed”
  • “identifiable” or “non-identifiable”
  • “enumerated”
  • “missed.”

Some SPs fit into more than one category, which will be explained in greater detail in this section.

7.2.2.1 “Target population” or “out-of-scope” classification

The “census target population” includes the group of persons mentioned in Section 2.2. An SP is considered “out of scope” if they are not in the census target population. Each SP classified as “out of scope” is assigned one of the following statuses: deceased, emigrated or represented in another frame. For a person to be classified as deceased, they must appear as deceased in at least two administrative sources (vital statistics death files, income tax files, death files), or in the CUS collection interview. Permanent or temporary emigrants were also identified through a collection interview, based on certain criteria and on the responses about their place of residence on Census Day, the amount of time spent outside Canada, their intention to return to live in Canada and the reason they were outside Canada on Census Day. Other SPs were also classified as “listed emigrants,” regardless of whether they were respondents during collection. These are non-permanent residents (from the 2016 Census and missed frames) who no longer had a work or study permit in 2021 and had not obtained immigrant status since 2016.

SPs classified as “represented in another frame” include cases selected in a province but classified in one of the three territories. Cases selected in a territory but classified in a province or another territory are also classified as “represented in another frame.”

SPs classified in the census target population were either “enumerated,” “missed” or “not classified” (see Section 7.2.2.2). An SP was considered “enumerated” if they were in the CCS-RDB. SPs in the census target population were classified as “missed” if they were neither “enumerated” nor “not classified.”

7.2.2.2 Classification for non-response and non-response adjustment

Whether an SP was classified as “listed” or “not listed” depended on the usefulness of the addresses provided and the CATI information. In many cases, collection provided information and one or more addresses that could not be found from other sources. In other cases, all the addresses and all the information obtained through collection could be found from other sources.

An SP was “listed” if they were classified without using CATI data; even if data were collected, the addresses and information collected through the interview were not required.

A person was considered “not classified” if it was possible to determine whether they were in the target population but not whether they were missed. This occurred when the place of residence on Census Day, as defined in Section 2.4, was known but not identified in the CCS-RDB. Persons whose place of residence on Census Day was not specific enough (e.g., only the name of a large city) and persons without a fixed address were included in this category.

SPs for whom one or more of the characteristics in the list above could not be determined were considered non-respondents. There are three types of non-respondents:

  • An SP was “not identified” when it could not be determined whether they were listed. In other words, since the information about the SP was incomplete, it was impossible to link the SP with the CCS-RDB or to collect their information through an interview.
  • An SP was “not traced” when it could not be determined whether they were included in the census target population.
  • A “not classified” SP was deemed to be a partial non-respondent. It was known that the person was in the target population but not whether they were missed or enumerated.
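The classification rules of Sections 7.2.2.1 and 7.2.2.2 can be summarized as a decision sequence. The sketch below is one possible reading: the order of the checks and the four input flags are assumptions introduced for illustration, and the “listed” versus “not listed” distinction is left out.

```python
from enum import Enum
from typing import Optional

class Status(Enum):
    OUT_OF_SCOPE = "out of scope"
    ENUMERATED = "enumerated"
    MISSED = "missed"
    NOT_CLASSIFIED = "not classified"
    NOT_TRACED = "not traced"
    NOT_IDENTIFIED = "not identified"

# The three types of non-respondents named in the text.
NON_RESPONDENTS = {Status.NOT_IDENTIFIED, Status.NOT_TRACED, Status.NOT_CLASSIFIED}

def classify(identifiable: bool,
             in_target_population: Optional[bool],
             found_in_ccs_rdb: bool,
             census_day_address_usable: bool) -> Status:
    """Assign a status; `in_target_population` is None when it is unknown."""
    if not identifiable:
        return Status.NOT_IDENTIFIED      # too little information to link or interview
    if in_target_population is False:
        return Status.OUT_OF_SCOPE        # deceased, emigrated, or in another frame
    if found_in_ccs_rdb:
        return Status.ENUMERATED
    if in_target_population is None:
        return Status.NOT_TRACED          # scope could not be determined
    if not census_day_address_usable:
        return Status.NOT_CLASSIFIED      # in scope, but missed/enumerated unknown
    return Status.MISSED

status = classify(True, True, False, True)
print(status.value)  # missed
```

Checking scope before enumeration reflects the remark in Section 7.2.1 that an SP found in the CCS-RDB could still be reclassified if vital statistics showed a death before the census.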

7.2.2.3 Distribution of the sample by classification

Table 7.2 shows the distribution of the sample by classification and sampling frame. This table excludes persons in the take-all strata as these persons were classified (enumerated or deceased) prior to sample selection. Classification is determined from specific combinations of the characteristics listed above. Initially, a total sample of 36,819 SPs was selected in the provinces and territories. Of that number, 22,083 SPs were classified as “enumerated,” 7,453 as “missed,” and 5,171 as non‑respondents, of which 169 were “not classified.” The other 2,112 SPs were classified as “out of scope”: 583 “deceased,” 938 “emigrants” (permanent or temporary), 405 persons outside the universe of the territories or provinces, and 186 persons out of scope for other reasons. A non-response adjustment was made during estimation (see Section 7.4). It is important to note that for the purposes of classification and, therefore, estimation, the definition of a non-respondent differs from the usual one, under which a non-respondent is a unit for which data collection was attempted but not completed. This is because classification is based on data from several sources, including collection. To prevent any confusion, Section 7.3 on collection refers to “completed collection” rather than “response.”
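The counts in the preceding paragraph can be checked against the total sample and expressed as shares. The figures are taken from the text; the shares are computed here for illustration, and only the enumerated share (60.0%) is confirmed by the Total column of Table 7.2.

```python
TOTAL_SAMPLE = 36_819

classification = {
    "enumerated": 22_083,
    "missed": 7_453,
    "non-respondents": 5_171,
    "out of scope": 2_112,
}
out_of_scope = {
    "deceased": 583,
    "emigrants": 938,
    "outside the universe of the territories or provinces": 405,
    "other reasons": 186,
}

# The components add up to the totals quoted in the text.
assert sum(classification.values()) == TOTAL_SAMPLE
assert sum(out_of_scope.values()) == classification["out of scope"]

shares = {
    name: round(100 * count / TOTAL_SAMPLE, 1)
    for name, count in classification.items()
}
print(shares["enumerated"])  # 60.0
```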

7.2.2.4 Implications of the classification

“Traced” SPs are SPs for whom it was possible to determine whether they were included in the census target population. For purposes of estimation and tabulation, traced SPs who were also classified were the respondents. Since names, including those of household members, and addresses were available in the CCS-RDB, and since the tools for consulting the database were sufficiently powerful, it was possible to verify whether an SP was enumerated at an address even if the address provided was vague.

The usefulness of knowing whether an SP was enumerated is self-evident. SPs who were in the census target population but who were not enumerated and were therefore classified as “missed” formed the basis for the undercoverage estimate. We also wanted to classify SPs according to the above-mentioned characteristics so that the most appropriate respondents could be chosen to represent non-respondents.

Lastly, except for SPs who were not classified, the Census Day address (usual place of residence) of each SP in the census target population was determined. This is the address where, according to census instructions, the SP should have been enumerated. If the SP was enumerated, the enumeration address was considered to be the Census Day address, despite other information provided that may suggest that the census instructions were not well understood.

For more information on processing and classification, see Parenteau (2023).

Table 7.2
Classification of selected people, sampling frames for Canada

| Classification | 2016 Census (number) | 2016 Census (%) | 2016 missed (number) | 2016 missed (%) | Births (number) | Births (%) | Immigrants (number) | Immigrants (%) | Non-permanent residents (number) | Non-permanent residents (%) | Territorial frames (number) | Territorial frames (%) | Total (number) | Total (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Total | 14,983 | 100.0 | 4,821 | 100.0 | 5,978 | 100.0 | 3,181 | 100.0 | 3,571 | 100.0 | 4,285 | 100.0 | 36,819 | 100.0 |
| Enumerated | 7,354 | 49.1 | 3,122 | 64.8 | 5,210 | 87.2 | 2,610 | 82.0 | 2,015 | 56.4 | 1,772 | 41.4 | 22,083 | 60.0 |
| Enumerated: listed | 7,201 | 48.1 | 3,112 | 64.6 | 5,206 | 87.1 | 2,604 | 81.9 | 1,997 | 55.9 | 1,760 | 41.1 | 21,880 | 59.4 |
| Enumerated: not listed | 153 | 1.0 | 10 | 0.2 | 4 | 0.1 | 6 | 0.2 | 18 | 0.5 | 12 | 0.3 | 203 | 0.6 |
| Missed | 4,156 | 27.7 | 710 | 14.7 | 432 | 7.2 | 284 | 8.9 | 630 | 17.6 | 1,241 | 29.0 | 7,453 | 20.2 |
| Missed: listed | 821 | 5.5 | 86 | 1.8 | 68 | 1.1 | 22 | 0.7 | 49 | 1.4 | 238 | 5.6 | 1,284 | 3.5 |
| Missed: not listed | 3,335 | 22.3 | 624 | 12.9 | 364 | 6.1 | 262 | 8.2 | 581 | 16.3 | 1,003 | 23.4 | 6,169 | 16.8 |
| Out of scope | 882 | 5.9 | 433 | 9.0 | 102 | 1.7 | 100 | 3.1 | 188 | 5.3 | 407 | 9.5 | 2,112 | 5.7 |
| Out of scope: listed | 505 | 3.4 | 327 | 6.8 | 79 | 1.3 | 10 | 0.3 | 104 | 2.9 | 293 | 6.8 | 1,318 | 3.6 |
| Out of scope: not listed | 377 | 2.5 | 106 | 2.2 | 23 | 0.4 | 90 | 2.8 | 84 | 2.4 | 114 | 2.7 | 794 | 2.2 |
| Non-response | 2,591 | 17.3 | 556 | 11.5 | 234 | 3.9 | 187 | 5.9 | 738 | 20.7 | 865 | 20.2 | 5,171 | 14.0 |
| Non-response: traced not classified | 87 | 0.6 | 17 | 0.4 | 17 | 0.3 | 2 | 0.1 | 10 | 0.3 | 36 | 0.8 | 169 | 0.5 |
| Non-response: identified not traced | 2,492 | 16.6 | 539 | 11.2 | 217 | 3.6 | 185 | 5.8 | 728 | 20.4 | 829 | 19.3 | 4,990 | 13.6 |
| Non-response: not identified | 12 | 0.1 | 0 | 0.0 | 0 | 0.0 | 0 | 0.0 | 0 | 0.0 | 0 | 0.0 | 12 | 0.0 |

Provincial strata: 2016 Census, 2016 missed, Births, Immigrants and Non-permanent residents. Territorial strata: Territorial frames.
Note 1 (2016 Census and Territorial frames columns): Excluding the take-all strata.
Source: Statistics Canada, 2021 Census Undercoverage Study.

7.3 Collection

7.3.1 Overview

Head office staff in Ottawa worked closely with staff in the Statistics Canada regional offices (ROs) to collect data during the survey phase of the Census Undercoverage Study (CUS). The suggestions and recommendations made by the ROs as a result of conducting the 2016 CUS were incorporated into the design and operations of the 2021 survey.

The main purpose of CUS collection is to find (trace) the correct selected persons (SPs) and collect demographic and address information so they can be classified as enumerated, missed or out of scope for the census. The classification results are used to estimate the number of persons who were missed, or undercovered, in the census. To help find and classify the SPs, the Census Day address and household composition were collected, as well as any other address where the SP may have been enumerated. Other information, such as the SP’s mother tongue, was also collected for the coverage study tables.

The CUS take-some sample size was 36,819 (Section 7.1 describes the sample design). Pre-collection processing attempted to find these cases in the CCS-RDB, in vital statistics and in other administrative files. The cases that were matched or found in those files, and that could thus be classified as either enumerated or deceased before Census Day, were not sent to collection. All remaining unclassified cases were sent to collection. The total number of cases sent to collection (the collection sample size) was 13,096. During the collection period, the processing team continued to try to match some of the cases, and those that could be classified were removed from collection (see Table 7.3.2 for these counts).

By design, collection was by proxy for SPs who were younger than 18 years. Proxy respondents were also used when the SP was not available during the collection period or was difficult to reach. Overall, 34% of the completed cases were by proxy, and a higher percentage of proxy cases were completed by interviewers than by self-response.

For deceased SPs, it was important to determine whether they had died before, on or after Census Day, since different questionnaire flows were used, depending on the date of death. In some cases (for example, by matching tax records and vital statistics), it was established prior to collection that the SP had died before Census Day. These cases were not sent for collection. However, when in doubt, cases were sent for collection with a note indicating that the SP may be deceased.

It was imperative that the correct SP (or a proxy for the correct SP) be interviewed. If data were collected about the wrong person, the matching and resulting classification would be incorrect. The computer-assisted telephone interview (CATI) system was designed to instruct interviewers to verify that the person they were interviewing was the correct SP at the beginning of the interview. If an interview was completed with someone other than the SP (e.g., someone with a similar name and date of birth), the case was sent back to collection to be completed with the correct person.

The CUS is a mandatory multi-mode survey. The main data collection mode is CATI, and the secondary mode is self‑enumeration. For 2021, the CUS used web-based electronic questionnaires for both modes as it transitioned to the Integrated Collection and Operation System, which is a standardized collection application developed at Statistics Canada. Previously, the CUS self‑response mode used paper questionnaires. The transition to an electronic questionnaire was a big improvement, as it decreased respondent burden and reduced operating time and costs associated with mailing out paper questionnaires and manually entering the returned data.

A third collection mode, personal visits by field interviewers, was planned. The plan for the 2021 CUS was to continue to use field interviews on a limited scale, as in previous cycles (in the 2016 CUS, only 0.5% of cases were completed by field interviewers). Instead of the paper questionnaires used in the past, field interviewers would have used a laptop and the same application as telephone interviewers. However, all in-person interviews were cancelled at the collection planning stage because of the COVID-19 pandemic.

7.3.2 Operations

Data collection for the CUS began in all ROs on March 28, 2022. The last day of active collection was November 4, 2022. Table 7.3.2 shows the distribution of cases loaded into CATI from head office over time. The majority of cases were sent at the start of collection on March 28 and consisted of adult cases from all frames except Nunavut. The adjusted total represents the number of cases sent to collection, excluding the cases removed from collection.

Table 7.3.2
Total cases in collection

| Description | Count |
|---|---|
| Cases started March 28, 2022: Adults in all frames except Nunavut | 9,922 |
| Cases started April 27, 2022: Minors in all frames (including most of the birth frame) except Nunavut | 1,822 |
| Cases started June 6, 2022: Nunavut frame and remaining birth frame cases | 1,352 |
| Total cases sent | 13,096 |
| Cases dropped by head office: Collection no longer required (classified in processing as either enumerated or out of scope) | 309 |
| Adjusted total | 12,787 |

Source: Statistics Canada, 2021 Census Undercoverage Survey.

Introductory letters explaining the CUS and advising the SP (or proxy) that they had been selected for the survey were sent for all cases that started collection in March and April and that had a valid mailing address. The letters provided a phone number that recipients could call if they had questions or wanted to complete the survey with the RO. Cases without a contact phone number (requiring tracing) were also provided with a secure access code and a link to the self-response questionnaire. Introductory letters were not sent for the cases starting in June; instead, these cases received the reminder letters sent in July. These reminder letters were sent for all cases not yet completed near the midway point of collection. A second reminder letter was sent one month later. All reminder letters contained secure access codes and links to the self-response questionnaire. New for the 2021 CUS, email reminders were sent near the end of collection for all incomplete cases that had a valid email address.

Near the end of collection, in an effort to boost response rates, the Toronto and Western ROs began a process similar to the field interview visits done in the past. If there was an address for an SP close to where an interviewer was visiting for another survey, they would visit the address to try to find the SP. If they located the SP or confirmed that the address was the SP’s residence, they requested a phone number and time for the RO to call back to complete the interview. If they were speaking to the SP, they could also provide a secure access code to complete the questionnaire online. If the SP was not there, the interviewer tried to collect any contact information that could be useful for tracing.

Data quality analysis was performed to verify the completeness and accuracy of each case. Cases with missing or ambiguous data in key fields, or where the data collected were for someone other than the SP, were reactivated and sent back to collection for follow-up. There were 41 reactivated cases in the 2021 CUS. Cases that passed the data quality analysis were compiled into batches for processing, as described in Section 7.2.1.

Quality management of the collection operation involved a two-day virtual training session for regional data collection managers, who in turn trained their interviewers. Weekly meetings between head office and ROs were held during collection to discuss progress and address any issues that arose. A ticket-based communication tool was used to centralize and facilitate communication between head office and ROs. It tracked all questions and issues and ensured that each one was resolved in a timely manner. RO managers allocated resources to the survey while balancing the needs of other surveys taking place in their region. Sustained efforts to interview persons who initially refused to participate in the survey improved response rates.

Detailed management reports were created at head office on a daily and weekly basis to document survey collection progress. The reports presented the number of cases collected and response rates by province of selection and sampling frame.

7.3.3 Tracing

As part of the sample preparation, cases were linked to tax and other administrative data to provide updated contact information for the SP and their household members. In some cases, initial CATI data were outdated or incomplete, and tracing was required. Tracing is the process of searching for contact information for either an SP or a suitable proxy, and it is a major part of the CUS.

Tracing leads were loaded into the CATI application as alternate contacts prior to collection, and additional leads were sent to the ROs as they were found in processing during the collection period. More tracing source files were sent to collection for the 2021 CUS (29 files, compared with 13 in 2016), and an improvement in processing meant that only new phone numbers and addresses were sent to the ROs, with no duplication of previous sources.

The CUS had agreements with and received tracing information from 11 provinces and territories, 9 of which used deemed employees. Head office sent files containing names of SPs, which were matched with health care files and sent back with updated contact information. Having a deemed employee meant that both the name and date of birth of the SP could be supplied, making it easier to match the files.

At the start of data collection, only 2.1% of the cases had insufficient contact information and needed to be traced. Because of the quality and quantity of tracing sources provided by head office, 90.6% of the completed cases used phone numbers that were provided by head office. Another 8.6% of the completed cases were contacted with a new phone number that was found by the RO tracing efforts, and a final 0.8% were completed when respondents called in to the RO.

7.3.4 Collection statistics

Many statistics were monitored throughout the data collection period, and they were analyzed after collection was completed.

Table 7.3.4.1 shows the provincial and territorial completion rates by collection method. Of the 7,702 completed cases, 87.6% were completed by CATI and 12.4% by online self-response.

Table 7.3.4.1
Completion counts and rates by collection method for Canada, provinces and territories of selection

| Provinces and territories | Cases sent | Interviewer: cases completed | Interviewer: completion rate (%) | Self-response: cases completed | Self-response: completion rate (%) | Total: cases completed | Total: completion rate (%) |
|---|---|---|---|---|---|---|---|
| Canada | 12,787 | 6,745 | 52.7 | 957 | 7.5 | 7,702 | 60.2 |
| Newfoundland and Labrador | 503 | 291 | 57.9 | 33 | 6.6 | 324 | 64.4 |
| Prince Edward Island | 487 | 278 | 57.1 | 52 | 10.7 | 330 | 67.8 |
| Nova Scotia | 620 | 374 | 60.3 | 37 | 6.0 | 411 | 66.3 |
| New Brunswick | 522 | 286 | 54.8 | 34 | 6.5 | 320 | 61.3 |
| Quebec | 1,315 | 769 | 58.5 | 93 | 7.1 | 862 | 65.6 |
| Ontario | 2,406 | 1,214 | 50.5 | 235 | 9.8 | 1,449 | 60.2 |
| Manitoba | 852 | 451 | 52.9 | 47 | 5.5 | 498 | 58.5 |
| Saskatchewan | 832 | 425 | 51.1 | 51 | 6.1 | 476 | 57.2 |
| Alberta | 1,375 | 703 | 51.1 | 100 | 7.3 | 803 | 58.4 |
| British Columbia | 1,746 | 828 | 47.4 | 152 | 8.7 | 980 | 56.1 |
| Yukon | 460 | 239 | 52.0 | 29 | 6.3 | 268 | 58.3 |
| Northwest Territories | 632 | 345 | 54.6 | 33 | 5.2 | 378 | 59.8 |
| Nunavut | 950 | 529 | 55.7 | 56 | 5.9 | 585 | 61.6 |
| NPR-CA | 87 | 13 | 14.9 | 5 | 5.7 | 18 | 20.7 |

NPR-CA = non-permanent residents without a known province
Source: Statistics Canada, 2021 Census Undercoverage Survey.
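As a check on how the rates in Table 7.3.4.1 are derived, the short Python sketch below recomputes the Canada-level figures. It assumes that each completion rate is the number of completed cases divided by the adjusted number of cases sent, rounded to one decimal place; the function name `completion_rate` is illustrative only.

```python
def completion_rate(completed: int, sent: int) -> float:
    """Completion rate in percent, rounded to one decimal place."""
    return round(100 * completed / sent, 1)


# Canada-level counts taken from Table 7.3.4.1.
cases_sent = 12_787
interviewer = 6_745
self_response = 957
total_completed = interviewer + self_response

rates = {
    "interviewer": completion_rate(interviewer, cases_sent),
    "self_response": completion_rate(self_response, cases_sent),
    "total": completion_rate(total_completed, cases_sent),
}

# Share of completed cases that were completed by an interviewer (CATI).
cati_share = round(100 * interviewer / total_completed, 1)

print(total_completed, rates, cati_share)
```

Under this assumption, the recomputed values agree with the published Canada row and with the 87.6% CATI share quoted in the text.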

Table 7.3.4.2 shows the completion rates by sampling frame and collection method. As expected from previous cycles, the non-permanent resident frame had the lowest completion rate (49.4%), as SPs in this frame tend to be more mobile and have less contact information, which makes tracing more difficult.

Table 7.3.4.2
Completion counts and rates by sampling frame and collection method for Canada

| Sampling frames | Cases sent | Interviewer: cases completed | Interviewer: completion rate (%) | Self-response: cases completed | Self-response: completion rate (%) | Total: cases completed | Total: completion rate (%) |
|---|---|---|---|---|---|---|---|
| Total | 12,787 | 6,745 | 52.7 | 957 | 7.5 | 7,702 | 60.2 |
| 2016 Census | 6,773 | 3,720 | 54.9 | 482 | 7.1 | 4,202 | 62.0 |
| 2016 missed | 1,310 | 691 | 52.7 | 84 | 6.4 | 775 | 59.2 |
| Births | 671 | 377 | 56.2 | 40 | 6.0 | 417 | 62.1 |
| Immigrants | 553 | 280 | 50.6 | 87 | 15.7 | 367 | 66.4 |
| Non-permanent residents | 1,438 | 564 | 39.2 | 146 | 10.2 | 710 | 49.4 |
| Yukon | 460 | 239 | 52.0 | 29 | 6.3 | 268 | 58.3 |
| Northwest Territories | 632 | 345 | 54.6 | 33 | 5.2 | 378 | 59.8 |
| Nunavut | 950 | 529 | 55.7 | 56 | 5.9 | 585 | 61.6 |

Source: Statistics Canada, 2021 Census Undercoverage Survey.

Table 7.3.4.3 shows the completion rates by sex and age group. Completion rates were lowest for persons aged 20 to 44 years, for both sexes, and highest for females aged 45 years and older (68.7%).

Table 7.3.4.3
Completion counts and rates by collection method, sex and age group for Canada

| Sex and age groups | Cases sent | Interviewer: cases completed | Interviewer: completion rate (%) | Self-response: cases completed | Self-response: completion rate (%) | Total: cases completed | Total: completion rate (%) |
|---|---|---|---|---|---|---|---|
| Both sexes | 12,783 | 6,745 | 52.8 | 957 | 7.5 | 7,702 | 60.3 |
| Both sexes: 0 to 19 years | 1,930 | 1,062 | 55.0 | 140 | 7.3 | 1,202 | 62.3 |
| Both sexes: 20 to 29 years | 2,420 | 1,198 | 49.5 | 169 | 7.0 | 1,367 | 56.5 |
| Both sexes: 30 to 44 years | 4,697 | 2,303 | 49.0 | 389 | 8.3 | 2,692 | 57.3 |
| Both sexes: 45 years and older | 3,736 | 2,182 | 58.4 | 259 | 6.9 | 2,441 | 65.3 |
| Males | 6,952 | 3,609 | 51.9 | 496 | 7.1 | 4,105 | 59.0 |
| Males: 0 to 19 years | 963 | 530 | 55.0 | 74 | 7.7 | 604 | 62.7 |
| Males: 20 to 29 years | 1,273 | 623 | 48.9 | 99 | 7.8 | 722 | 56.7 |
| Males: 30 to 44 years | 2,678 | 1,305 | 48.7 | 199 | 7.4 | 1,504 | 56.2 |
| Males: 45 years and older | 2,038 | 1,151 | 56.5 | 124 | 6.1 | 1,275 | 62.6 |
| Females | 5,831 | 3,136 | 53.8 | 461 | 7.9 | 3,597 | 61.7 |
| Females: 0 to 19 years | 967 | 532 | 55.0 | 66 | 6.8 | 598 | 61.8 |
| Females: 20 to 29 years | 1,147 | 575 | 50.1 | 70 | 6.1 | 645 | 56.2 |
| Females: 30 to 44 years | 2,019 | 998 | 49.4 | 190 | 9.4 | 1,188 | 58.8 |
| Females: 45 years and older | 1,698 | 1,031 | 60.7 | 135 | 8.0 | 1,166 | 68.7 |

Note: This table excludes four cases for which the sex was unknown.
Source: Statistics Canada, 2021 Census Undercoverage Survey.

7.4 Estimation

CUS estimation was carried out in two parts. First, the SPs were weighted, and then the census undercoverage was calculated. Weighting consists of determining the initial sampling weights of SPs and applying a series of adjustments to these initial weights to obtain the SPs’ final weights. These steps are described in Sections 7.4.1 to 7.4.4. The methodology for calculating census undercoverage is described in Section 7.4.6.

7.4.1 Calculating the initial weights

For SPs of all sampling frames except the 2016 missed frame, initial weights were based on the inverse of the probability of being selected in the sample. However, the initial weight of an SP from the 2016 missed frame corresponds to the final weight assigned to it during the 2016 CUS when the SP was classified as “missed.”

7.4.2 Initial weight adjustments

The weights of SPs from the 2016 Census frame who were enumerated more than once in 2016 were adjusted downward to account for the fact that these individuals had more than one chance of being selected.

Next, influential initial weights in the 2016 missed frame were adjusted. The objective was to reduce the effect of high and influential weights on estimates and standard errors by trimming these initial weights. Some of the 4,821 people in the 2016 missed frame had a very high initial weight. The method used was to truncate weights at a multiple of the median weight in each trimming group. The trimming groups were formed by the province of selection and five age groups. The weight of a person with a weight above the threshold was reduced to the threshold value. The weight removed by truncation was redistributed evenly to other persons in the trimming group.
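The trimming step can be illustrated with a minimal Python sketch for a single trimming group. Several details are assumptions made for illustration, since the text does not specify them: the multiplier value (5 here), the use of a single pass, and the choice to share the removed weight only among persons whose weight was not truncated. The function name `trim_weights` is hypothetical.

```python
from statistics import median


def trim_weights(weights, multiplier):
    """Cap weights at multiplier * median and spread the excess evenly.

    Single pass (assumption): the weight removed from truncated units is
    shared equally among the units that were not truncated, so the total
    weight of the trimming group is preserved.
    """
    threshold = multiplier * median(weights)
    capped = [min(w, threshold) for w in weights]
    excess = sum(weights) - sum(capped)
    receivers = {i for i, w in enumerate(weights) if w <= threshold}
    if excess == 0 or not receivers:
        return capped
    share = excess / len(receivers)
    return [c + share if i in receivers else c for i, c in enumerate(capped)]


# Illustrative weights for one trimming group (not CUS data).
example = [100.0, 120.0, 110.0, 90.0, 2000.0]
trimmed = trim_weights(example, multiplier=5)
print(trimmed)
```

In this toy group the median is 110, so the threshold is 550; the one very large weight is cut to 550 and the 1,450 removed is shared equally among the other four persons. A production method would also need a rule for receivers pushed above the threshold by the redistribution, which this sketch does not handle.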

7.4.3 Non-response adjustment

To reduce statistical bias, the initial weights of respondents had to be adjusted to account for non-response. The weight of persons who could not be classified (non-respondents) was redistributed among persons who were classified (respondents). There are three types of non-response. First, there are the unidentified persons (only 12 SPs). The initial weights of these persons were transferred to identified persons in each sampling stratum.

The second type of non-response involves untraced persons (4,990 SPs). The adjustment involved forming response homogeneity groups (RHGs) among unlisted persons (listed persons being the persons classified without the help of CUS collection) and transferring the weight of untraced persons to unlisted traced persons within the RHGs.

The first step in the creation of the RHGs was to group unlisted persons (12,337 SPs) into main groups based on their estimated propensity to be in the target population. The groups were formed based on an analysis of the correlation between several tax indicators, particularly those for 2020 and 2021, and the final classification for unlisted traced persons. Up to seven main groups were created based on the sampling frame. These main groups were also strongly correlated with the likelihood to respond. The second step in creating RHGs was to group unlisted persons based on their likelihood to respond in each domain, with a domain being defined by crossing a sampling frame with a main group. In each domain, the likelihood to respond was analyzed using a national logistic regression model (and regional, when the data allowed it) and an analysis of multi-level, cross-frequency tables. For the models, several auxiliary variables available for both traced and untraced persons were used: variables available in the sampling frames (e.g., age, sex, relationship to other household members, country of origin, and type of non-permanent resident), variables available in the tax data for related persons (e.g., whether they were in certain files, frequency of address changes since 2016, and type of address), variables related to contact information (e.g., number and sources of telephone numbers, address availability and link of last known address with the 2021 Census), and a few other variables. Thus, the auxiliary variables that were significantly correlated with the likelihood to respond were determined and used to form the RHGs. In most domains, the RHGs were formed within the province or territory of selection. Therefore, the adjustment consisted of transferring the weight of untraced persons to unlisted traced persons within each RHG.

The third non-response adjustment was the adjustment for unclassified persons (169 SPs). An unclassified person is a person who had their primary residence in a given province or territory on Census Day (thus in the census target population), but for whom it was not certain whether they were missed or enumerated. Using the same principle as with untraced persons, homogeneous groups of classified persons were formed within each sampling frame and province of classification. The adjustment consisted of transferring the weight of unclassified persons to unlisted classified persons within each homogeneous group.
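The weight transfers described in this section can be sketched as a standard weighting-class adjustment. The Python example below is a simplified illustration, not the CUS implementation: it assumes that, within each group, the weight of non-respondents is transferred to respondents in proportion to the respondents' own weights, and the record layout (`group`, `weight`, `responded`) is hypothetical.

```python
from collections import defaultdict


def transfer_weights(units):
    """Move non-respondent weight to respondents within each group.

    Each unit is a dict with keys 'group', 'weight' and 'responded'.
    Respondent weights are multiplied by (total group weight) /
    (respondent group weight); non-respondents get a weight of zero.
    Assumes every group contains at least one respondent.
    """
    total = defaultdict(float)
    responding = defaultdict(float)
    for u in units:
        total[u["group"]] += u["weight"]
        if u["responded"]:
            responding[u["group"]] += u["weight"]

    adjusted = []
    for u in units:
        if u["responded"]:
            factor = total[u["group"]] / responding[u["group"]]
            adjusted.append(u["weight"] * factor)
        else:
            adjusted.append(0.0)
    return adjusted


# Illustrative records for two response homogeneity groups (not CUS data).
units = [
    {"group": "A", "weight": 10.0, "responded": True},
    {"group": "A", "weight": 10.0, "responded": True},
    {"group": "A", "weight": 20.0, "responded": False},
    {"group": "B", "weight": 5.0, "responded": True},
    {"group": "B", "weight": 15.0, "responded": False},
]
adjusted = transfer_weights(units)
print(adjusted)
```

The total weight in each group is unchanged by the adjustment, which is why the quality of the groups (how similar respondents and non-respondents are within them) drives the bias reduction.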

7.4.4 Final adjustments to the weights for classified persons

7.4.4.1 Adjustment for influential weights

At this stage, some SPs have a weight that is high and considered influential in their province of classification. To reduce the effect of high and influential weights on provincial estimates and their standard errors, an adjustment to influential weights was made in the five provincial frames. The method used was to truncate weights at a multiple of the median weight in each trimming group. There are two types of influential weights at this stage.

First, there are SPs whose province of classification is different from the province of selection. As a result, their weight can be very high compared with the weights of other SPs in the province of classification. Consider, for example, an SP selected in Ontario with a large weight, who is classified in Prince Edward Island. In this situation, the weight is truncated at the threshold established for the trimming group. A threshold of between four and six times the median weight of each group was used. The trimming groups were formed according to the province of classification and five age groups. The weight removed from an SP was redistributed evenly to the other SPs in the same province of selection, the same sampling frame, the same classification (enumerated, missed or out-of-scope person), the same status (listed or unlisted) and the same age group. Therefore, the influential weight of a missed SP in a given province of classification was allocated to other missed persons, but in the province of selection of the SP. For this first type of influential weight, there were 49 SPs whose weight was truncated (i.e., 33 enumerated persons and 16 missed persons).

The second type of influential weight relates to the SPs from the 2016 missed frame only, who still had a high and influential weight within their province of classification even though it was identical to the province of selection (which is, in fact, the province of classification in 2016). For this type of influential weight, the threshold was set at four times the median weight in the trimming group. The weight removed from SPs was redistributed evenly to the other SPs in the same province of classification and the same classification, thus having no effect on the estimate of provincial undercoverage. For this second type of influential weight, there were 95 SPs whose weight was truncated (i.e., 10 enumerated persons, 55 missed persons and 30 out-of-scope persons).

7.4.4.2 Weight calibration for the birth frame

For the birth frame sample, enumerated persons were calibrated to take into account cases where a provincial sample would contain too many or too few enumerated persons. An automated deterministic linkage applied to the 2021 CCS-RDB helped to determine the control totals per province for the enumerated persons calibration group. Then, for the other persons in the frame, a linkage to the tax data determined their province of residence on Census Day (otherwise, the province of selection was used) to determine the control totals per province for the non-enumerated persons calibration group. In addition, control totals by single year of age (0 to 4 years) were calculated. Calibration was carried out by raking on two margins: the 20 control totals described above as the first margin, and the 5 calibration groups by age as the second margin. Statistics Canada’s Generalized Estimation System (G-EST) was used for this purpose.
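Raking on two margins can be illustrated with a small pure-Python sketch. This is not G-EST and makes no claim about how G-EST is implemented; it is a generic iterative proportional adjustment in which weights are rescaled to match the control totals of each margin in turn until both sets of totals are met. The function name `rake`, the category labels and all numbers are hypothetical.

```python
def rake(weights, margin_a, margin_b, totals_a, totals_b,
         max_iter=100, tol=1e-8):
    """Rescale weights alternately so both margins match their totals.

    margin_a and margin_b give each unit's category on the two margins;
    totals_a and totals_b map categories to control totals. The two sets
    of control totals are assumed to sum to the same overall total, and
    every category is assumed to contain at least one unit.
    """
    w = list(weights)
    for _ in range(max_iter):
        for labels, totals in ((margin_a, totals_a), (margin_b, totals_b)):
            sums = {}
            for wi, lab in zip(w, labels):
                sums[lab] = sums.get(lab, 0.0) + wi
            w = [wi * totals[lab] / sums[lab] for wi, lab in zip(w, labels)]
        # After the second margin is matched, re-check the first one.
        sums_a = {}
        for wi, lab in zip(w, margin_a):
            sums_a[lab] = sums_a.get(lab, 0.0) + wi
        if all(abs(sums_a[k] - t) <= tol * t for k, t in totals_a.items()):
            break
    return w


# Toy data: first margin = calibration group, second margin = age in years.
weights = [10.0, 10.0, 10.0, 10.0]
group = ["enumerated", "enumerated", "other", "other"]
age = ["0", "1", "0", "1"]
raked = rake(weights, group, age,
             {"enumerated": 30.0, "other": 20.0},
             {"0": 24.0, "1": 26.0})
print(raked)
```

In the CUS setting, the first margin would carry the 20 provincial control totals and the second the 5 age groups; the toy example uses two categories per margin only to keep the arithmetic easy to follow.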

7.4.4.3 Weight calibration for the immigrant frame

For the immigrant frame sample, a calibration of the number of persons in certain calibration groups was carried out to take into account cases where a provincial sample would contain too many or too few enumerated persons or persons in other groups. An automated deterministic linkage applied to the 2021 CCS-RDB helped to determine the control totals per province for the enumerated persons calibration group. Then, for the other persons in the frame, a linkage to the tax data determined their tax status (active or non-active) and their province of residence on Census Day (otherwise, the province of selection was used) to determine control totals by province for the other non-enumerated persons calibration groups. In the four largest provinces, three control totals were determined: for enumerated persons, for persons with recent fiscal activities, and for other persons. However, in the other six provinces, only two control totals were determined: for enumerated persons and for other persons. Thus, 24 control totals were formed. A simple poststratification method was then used to calibrate the immigrant frame.
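A simple poststratification of this kind can be sketched in Python as follows. The example assumes that each SP belongs to exactly one calibration group and that weights in a group are scaled by a common factor so that they sum to the group's control total; the function name `poststratify` and the numbers are illustrative, not CUS values. The last lines reproduce the count of 24 control totals from the text.

```python
def poststratify(weights, groups, control_totals):
    """Scale weights so each group's weighted total equals its control total.

    Assumes every group named in `groups` has a control total and a
    positive weighted sample total.
    """
    sums = {}
    for w, g in zip(weights, groups):
        sums[g] = sums.get(g, 0.0) + w
    return [w * control_totals[g] / sums[g] for w, g in zip(weights, groups)]


# Toy data for one province with two calibration groups (not CUS data).
weights = [2.0, 3.0, 5.0, 10.0]
groups = ["enumerated", "enumerated", "other", "other"]
calibrated = poststratify(weights, groups,
                          {"enumerated": 10.0, "other": 30.0})

# Four provinces with three control totals, six provinces with two.
n_control_totals = 4 * 3 + 6 * 2

print(calibrated, n_control_totals)
```

Because the calibration groups do not overlap, no iteration is needed: one scaling factor per group is enough, which is the main practical difference from the raking used for the birth frame.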

7.4.4.4 Post-stratification adjustment for the territories

After the initial weight adjustment, the estimated number of enumerated persons in the territories has traditionally been lower than the comparable census count. This is due to undercoverage of the census target population in health insurance files. To address this undercoverage, the weights of the SPs selected in each territory were adjusted so that the estimated number of enumerated persons equalled the comparable census count for that territory. The adjustments were made for six calibration groups (by age and gender) in each territory.

7.4.4.5 Adjustment for overlap of frames or strata

For a small number of SPs in the five provincial frames, the weight at this stage is not the final weight. A further adjustment must be made to take into account overlap between the sampling frames or, in some cases, overlap between the census frame strata (i.e., overcoverage in 2016) that was noted only after the CUS collection in 2021. The few SPs who overlap frames are mostly SPs from the immigrant frame or the non-permanent resident frame who turned out to be covered by the 2016 Census frame (i.e., enumerated in 2016). This information was not known when these sampling frames were prepared. Therefore, an adjustment factor was calculated that takes into account the probability of selection in both sampling frames.

7.4.5 Weighted distribution by classification

Table 7.4.5 shows the weighted distribution of SPs by classification and sampling frame. For a reminder of the definitions, see Section 7.2. Only SPs found in the CCS-RDB were classified as “enumerated.” Persons who were in the target population but not in the CCS-RDB were classified as “missed.” The remaining SPs were classified as “out of scope” (e.g., deceased or emigrated).

Table 7.4.5
Weighted classification of selected people, sample frames for Canada

| Classification | 2016 Census (number) | 2016 Census (%) | 2016 missed (number) | 2016 missed (%) | Births (number) | Births (%) | Immigrants (number) | Immigrants (%) | Non-permanent residents (number) | Non-permanent residents (%) | Territorial frames (number) | Territorial frames (%) | Total (number) | Total (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Total | 32,933,387 | 100.0 | 2,830,944 | 100.0 | 1,855,111 | 100.0 | 1,072,833 | 100.0 | 1,140,539 | 100.0 | 137,867 | 100.0 | 39,970,681 | 100.0 |
| Enumerated | 29,127,257 | 88.4 | 1,784,797 | 63.0 | 1,646,438 | 88.8 | 874,651 | 81.5 | 639,236 | 56.0 | 94,583 | 68.6 | 34,166,962 | 85.5 |
| Enumerated: listed | 29,023,031 | 88.1 | 1,773,009 | 62.6 | 1,643,876 | 88.6 | 871,537 | 81.2 | 626,405 | 54.9 | 94,272 | 68.4 | 34,032,130 | 85.1 |
| Enumerated: not listed | 104,226 | 0.3 | 11,788 | 0.4 | 2,562 | 0.1 | 3,114 | 0.3 | 12,831 | 1.1 | 311 | 0.2 | 134,832 | 0.3 |
| Missed | 2,083,885 | 6.3 | 662,494 | 23.4 | 164,767 | 8.9 | 130,942 | 12.2 | 387,586 | 34.0 | 32,760 | 23.8 | 3,462,434 | 8.7 |
| Missed: listed | 243,914 | 0.7 | 41,300 | 1.5 | 16,954 | 0.9 | 7,987 | 0.7 | 14,693 | 1.3 | 5,567 | 4.0 | 330,415 | 0.8 |
| Missed: not listed | 1,839,971 | 5.6 | 621,194 | 21.9 | 147,813 | 8.0 | 122,955 | 11.5 | 372,893 | 32.7 | 27,193 | 19.7 | 3,132,019 | 7.8 |
| Out of scope | 1,722,245 | 5.2 | 383,653 | 13.6 | 43,906 | 2.4 | 67,240 | 6.3 | 113,717 | 10.0 | 10,524 | 7.6 | 2,341,285 | 5.9 |
| Out of scope: listed | 1,402,710 | 4.3 | 206,632 | 7.3 | 25,675 | 1.4 | 3,964 | 0.4 | 32,613 | 2.9 | 7,768 | 5.6 | 1,679,362 | 4.2 |
| Out of scope: not listed | 319,535 | 1.0 | 177,021 | 6.3 | 18,231 | 1.0 | 63,276 | 5.9 | 81,104 | 7.1 | 2,756 | 2.0 | 661,923 | 1.7 |

Provincial strata: 2016 Census, 2016 missed, Births, Immigrants and Non-permanent residents. Territorial strata: Territorial frames.
Source: Statistics Canada, 2021 Census Undercoverage Study.

7.4.6 Calculating census undercoverage

Note the following definitions:

$C$ = published census count of the number of persons in the target population

$\hat{U}$ = undercoverage estimate
= estimate of the number of persons not included in $C$ who should have been

$\hat{M}$ = estimate of the number of persons in the CUS target population who were not enumerated
= sum of the final weights of persons considered to be missed

$X$ = the number of persons included in $C$ who could not be identified with certainty as enumerated in the CUS.

Census population undercoverage was estimated by the number (weighted) of missed persons less the number of persons counted in the census (term C) but excluded from the CCS-RDB:

$$\hat{U} = \hat{M} - X$$

$X$ has three components: imputations, incomplete enumerations and late enumerations.

Imputations: The SP’s address on Census Day refers to a dwelling for which an enumeration was imputed. This was the case in particular for non-response dwellings for which another household’s data were used in WHI.

Some enumerations in the census database were deemed too incomplete to be used by the CUS to determine whether an SP was enumerated. Incomplete enumerations in this context usually involve missing or invalid date of birth or name data (e.g., “?”, “Mr.”, “Unknown” or “Person 1”). An SP enumerated in this manner was classified as “missed.” This was referred to as a “CUS incomplete enumeration.” This category of enumeration also includes certain types of collective dwellings for which only the number of usual residents was collected in the census (no names or dates of birth). Data for people living in these collective dwellings were imputed from the RDB.

At the national level, $X$ made up slightly less than half of $\hat{M}$. The value of $X$ increased from 2016 because of an increase in the number of persons imputed as part of the WHI and the increase in imputations in certain types of collective dwellings (incomplete enumerations).

Table 7.4.6 shows the national numbers for the various components of the population undercoverage estimate, namely the numbers for the three components of the term $X$.

Table 7.4.6
Components of the population undercoverage estimate for Canada

| Components | Number of people |
| --- | --- |
| Estimate of M | 3,462,434 |
| Total X | 1,564,558 |
| X for imputed people | 931,346 |
| X for late enumerations | 0 |
| X for CUS incomplete enumerations | 633,212 |
| Estimate of U | 1,897,876 |

CUS = Census Undercoverage Study
M = number of people in the Census Undercoverage Study (CUS) target population who were not enumerated
X = number of people included in the published census count but who could not be identified with certainty as enumerated in the CUS
U = undercoverage
Source: Statistics Canada, 2021 Census Undercoverage Study.
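
As a check on the arithmetic, the national figures in Table 7.4.6 can be recombined using the relationship between U, M and X given earlier in this section. The following is a minimal sketch only: the figures are copied from the table, while the function and variable names are illustrative assumptions and not part of the CUS processing system.

```python
# Figures copied from Table 7.4.6 (national level).
M_HAT = 3_462_434  # weighted number of persons classified as missed

# The three components of X. Names are illustrative, not CUS identifiers.
X_COMPONENTS = {
    "imputed people": 931_346,
    "late enumerations": 0,
    "CUS incomplete enumerations": 633_212,
}


def undercoverage(m_hat, x_components):
    """Return (total X, estimate of U), where U = M - X."""
    x_total = sum(x_components.values())
    return x_total, m_hat - x_total


x_total, u_hat = undercoverage(M_HAT, X_COMPONENTS)
x_share = x_total / M_HAT  # share of M accounted for by X

print(f"Total X:       {x_total:,}")
print(f"Estimate of U: {u_hat:,}")
print(f"X as a share of M: {x_share:.1%}")
```

The recomputed totals agree with the "Total X" and "Estimate of U" rows of the table, and the share of X in M is consistent with the statement that X made up slightly less than half of M at the national level.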

Lastly, the variance of the undercoverage estimates was calculated as follows:

$$v(\hat{U}) = v(\hat{M} - X) = v(\hat{M})$$

where $v(\hat{M})$ = estimated variance of $\hat{M}$ as determined by the CUS design.
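
The second equality above appears to rest on treating X as a known count with no sampling variability. That is an assumption inferred from the formula rather than stated in the text. Under it, the step can be written out as:

```latex
\begin{aligned}
v(\hat{U}) &= v(\hat{M} - X) \\
           &= v(\hat{M}) + v(X) - 2\,\mathrm{cov}(\hat{M}, X) \\
           &= v(\hat{M}) \quad \text{since } v(X) = 0 \text{ and } \mathrm{cov}(\hat{M}, X) = 0 .
\end{aligned}
```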

The variance was calculated using the classic bootstrap resampling method. To that end, weights of 500 bootstrap replicates were produced.
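
To illustrate how replicate weights yield a variance estimate, here is a minimal sketch of a simple with-replacement bootstrap on a toy sample. It is not the CUS procedure: the sample, the weights, the value of X, the resampling scheme (which ignores stratification and any rescaling used in practice) and all function names are hypothetical. Only the number of replicates, 500, comes from the text.

```python
import random

B = 500  # number of bootstrap replicates, as stated in the text


def missed_total(weights, missed):
    """Weighted count of sampled persons classified as missed."""
    return sum(w for w, is_missed in zip(weights, missed) if is_missed)


def replicate_weights(weights, rng):
    """One replicate: draw n units with replacement and multiply each
    unit's weight by the number of times it was drawn (simplified scheme)."""
    n = len(weights)
    counts = [0] * n
    for _ in range(n):
        counts[rng.randrange(n)] += 1
    return [w * c for w, c in zip(weights, counts)]


def bootstrap_variance(estimate, replicate_estimates):
    """Mean squared deviation of replicate estimates from the full-sample estimate."""
    return sum((r - estimate) ** 2 for r in replicate_estimates) / len(replicate_estimates)


rng = random.Random(2021)

# Hypothetical toy sample: 200 persons, integer weights, about 10% missed.
weights = [rng.randint(50, 150) for _ in range(200)]
missed = [rng.random() < 0.10 for _ in range(200)]
X = 1_000  # hypothetical fixed count, treated as having no sampling error

m_hat = missed_total(weights, missed)
m_reps = [missed_total(replicate_weights(weights, rng), missed) for _ in range(B)]
u_reps = [m - X for m in m_reps]  # subtracting a constant shifts every replicate equally

v_m = bootstrap_variance(m_hat, m_reps)
v_u = bootstrap_variance(m_hat - X, u_reps)

print(f"v(M) = {v_m:.1f}, v(U) = {v_u:.1f}")
```

Because X is subtracted from the full-sample estimate and from every replicate alike, the deviations are unchanged and the two variance estimates coincide, which mirrors the equality between the variance of the undercoverage estimate and the variance of the estimate of M.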

