3.1 Sources
Although census data collection and processing have to meet high quality standards, it is very difficult to eliminate all potential errors. There are two kinds of population coverage error. Population undercoverage refers to the exclusion of persons who should have been enumerated, and population overcoverage refers to the inclusion of persons who were enumerated more than once (generally twice). Overcoverage also includes persons who were enumerated but should not have been. However, this type of error is considered negligible; consequently, it is not measured.
Undercoverage can occur in the first stage of the census if the list of dwellings used for the dwelling universe is incomplete. This risk is higher, for example, if a dwelling is under construction. Conversely, overcoverage can occur if a dwelling is listed twice.
Coverage error can also occur during the field data collection stage. Respondent error is responsible for coverage error when the person completing the census form omits someone whose usual place of residence, according to census rules, is the dwelling concerned; this is undercoverage. The person may also include someone whose usual place of residence is not the dwelling concerned; there is overcoverage if this person has already been enumerated at their usual place of residence or somewhere else. In most cases, it is easy to determine a person’s usual place of residence. However, as stated in the previous section, the process is sometimes more complex, and special rules have been developed for determining an individual’s usual place of residence. The rules are spelled out in the census questionnaire, but the list is long, and there can be comprehension difficulties. Coverage error may result when the rules are not consulted or are incorrectly applied. The idea of using Census Day as the reference date for determining usual residence may also be misunderstood, and this can lead to coverage error.
Coverage errors may also be committed during the processing stage at any point when records for persons or households are added to or removed from the census database. Records can be deleted by mistake. Questionnaires may be linked to the wrong record or returned too late to be included.
Even though efforts are made to enumerate the homeless population, the risk of undercoverage is high. Some other living arrangements are also susceptible to coverage error. For example, young adults newly away from home may be either undercovered, because neither their roommates nor their parents include them in the census questionnaire, or overcovered, because they are included in both census questionnaires. Persons who maintain a second residence because of their employment can also cause coverage error.
Users should also be aware of the extent to which reserves and settlements participated in the 2021 Census. In some cases, enumeration was not permitted by the community or was interrupted before it could be completed. These geographic areas (63 in all in 2021, an increase from 14 in 2016) are considered incompletely enumerated reserves and settlements. There are no 2021 data for incompletely enumerated reserves and settlements, and those areas are not included in the totals. Similar problems have occurred in previous censuses. For example, 22 reserves and settlements were incompletely enumerated in the 2006 Census, and 31 in the 2011 Census.
The population estimates for the 63 incompletely enumerated reserves and settlements are based on a model. However, since no reliable source is available to verify the assumptions in the model, the estimates must be used with caution. For more information, see Section 12.2.
3.2 Control
Potential sources of coverage error were recognized during the planning stage of the 2021 Census, and the following measures were taken to minimize the associated risks:
- Collection unit (CU) boundaries were carefully defined and mapped to ensure that no geographic areas were left out or included twice.
- List/leave areas: The enumerator’s manual contained instructions on how to enumerate a CU so as to minimize the risk of missing dwellings. The total number of dwellings from the 2016 Census was provided to field operations supervisors to help them identify significant changes. In addition, when the listing operation resulted in a substantial difference in the number of dwellings relative to the 2016 Census, the listing was checked. Lastly, specific quality control procedures were applied to the CU to evaluate and correct any changes made in the listing.
- Mail-out areas: Mail-out was based on a list of addresses from Statistics Canada’s Address Register. This list was updated regularly, and listing activities were carried out (in the field and remotely) mainly in areas where frame errors were more likely. These listing activities were carried out continuously, but more intensively in the two years preceding the census. The work of enumerators was closely monitored.
- In 2021, the mail-out with drop-off (MODO) methodology was introduced. MODO areas are those where all dwellings have addresses, the majority of which are mailable. In these mixed areas, dwellings with a valid mailing address (one corresponding to the civic address) were mailed the regular mail-out material, just as in mail-out areas, while dwellings without one received an invitation letter dropped at their door by a census employee. MODO areas were introduced to maximize the number of dwellings enumerated by mail-out. As with mail-out areas, MODO was based on a list of addresses from Statistics Canada’s Address Register, and the list was updated regularly.
- Special procedures were defined for the enumeration of the population residing on reserves.
- Advertisements informed Canadians about the census and indicated what to do if they did not receive a questionnaire.
- The Census Help Line (CHL) was available to answer any questions about the census, including questions about coverage.
- When calls were received at the CHL regarding a dwelling that may have been missed by the census, a process was in place to examine the surrounding area for other potential missed dwellings.
- There was a “Whom to include” section in the questionnaire so respondents could determine which persons should be included. Also, slightly more than 84% of the responses to the 2021 Census were obtained through the Internet, and the electronic questionnaire included additional verification questions when respondents reported a dwelling as unoccupied or non-existent, or if they had a problem determining whether a person should be included or not.
- In the questionnaire, respondents were asked to indicate whether any persons had not been listed because they were not sure they should be included. The electronic questionnaire provided guidance so respondents could make the right decision. In the other cases, a telephone follow-up was subsequently carried out with the respondent to determine whether the persons in question should be listed in the questionnaire or not.
- Telephone follow-up was carried out after questionnaires were reviewed for coverage inconsistencies or to verify household status, including questionnaires containing only foreign residents or persons temporarily present.
These procedures, along with appropriate staff training, supervisory checks and quality controls during the collection and processing stages, helped to reduce the number of coverage errors.
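One of the controls listed above for list/leave areas is a comparison of the number of dwellings listed in each collection unit with the 2016 Census count, with the listing checked when the difference is substantial. The sketch below illustrates that kind of flag. The 10% relative threshold, the `CollectionUnit` type and the `needs_listing_check` function are hypothetical; the actual criterion used in the census is not given here.

```python
from dataclasses import dataclass


@dataclass
class CollectionUnit:
    cu_id: str
    dwellings_2016: int    # dwelling count from the 2016 Census
    dwellings_listed: int  # dwellings found by the 2021 listing operation


def needs_listing_check(cu: CollectionUnit, threshold: float = 0.10) -> bool:
    """Return True when the listed count departs substantially from 2016.

    The 10% relative threshold is a hypothetical value for illustration.
    """
    if cu.dwellings_2016 == 0:
        # Any dwellings found in a previously empty CU warrant a check.
        return cu.dwellings_listed > 0
    change = abs(cu.dwellings_listed - cu.dwellings_2016) / cu.dwellings_2016
    return change > threshold


# Hypothetical collection units for illustration.
units = [
    CollectionUnit("CU-001", 200, 204),  # 2% change: not flagged
    CollectionUnit("CU-002", 150, 120),  # 20% drop: flagged
    CollectionUnit("CU-003", 0, 12),     # new dwellings: flagged
]
flagged = [cu.cu_id for cu in units if needs_listing_check(cu)]
print(flagged)  # ['CU-002', 'CU-003']
```

A relative threshold is used so that large and small collection units are judged on the same scale; in practice an absolute minimum difference could be added to avoid flagging very small units.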
3.3 Definitions
Algebraic definitions of coverage errors are presented in this section. Let $T$ denote the total or the “actual” number of persons targeted by the Census of Population. Let $C$ denote the published census count of persons in the target population. The error associated with using $C$ instead of $T$ is as follows:

$$\Delta = T - C$$

This error, denoted as $\Delta$, is the net population coverage error.

Let $U$ denote population undercoverage, the number of persons not included in $C$ who should have been.

The census count $C$ is composed of two elements:

$$C = E + I$$

where:
- $E$ is the number of persons enumerated. This is the number of persons who were listed on a census questionnaire.
- $I$ is the number of persons imputed. This is an estimate of the number of persons missed because their dwelling was classified as occupied but no questionnaire was returned (non-response), or was misclassified as unoccupied (and therefore no follow-up was done). For more information on whole household imputation (WHI), see Section 3.6 of the Sampling and Weighting Technical Report, Census of Population, 2021, Statistics Canada Catalogue no. 98-306-X.

Undercoverage $U$ compared with the published census count $C$ is therefore what remains of the persons who should have been listed on a census questionnaire and who were not accounted for by the WHI. In other words, it does not include the estimate of the number of persons who were not enumerated either because no completed census questionnaire was returned for the dwelling (non-response dwelling) or because the dwelling was misclassified as unoccupied (classification error) and did not receive a questionnaire.

The concept of undercoverage before the WHI also exists. This is what is referred to as Census of Population collection undercoverage. For more information, see Section 12.1.

Let $O$ denote population overcoverage, the number of excess enumerations included in $C$ that should not have been. $O$ has two components. The first is the excess enumerations of persons enumerated more than once; coverage studies focus on these excess enumerations. The second is persons who were enumerated but who were not in the census target population. For example, foreign residents visiting Canada who are listed on a census questionnaire as usual residents of a dwelling should not be included in $C$. Fictitious persons are another example. According to previous studies, the number of persons who are enumerated but are not in the census target population is generally very small and can be ignored. Consequently, census coverage studies do not measure this component of coverage error.

Since $U$ refers to persons who were not enumerated but should be included in $C$, and since $O$ denotes enumerations that should not be included in $C$, the difference between $T$ and $C$ is $U$ less $O$. That is:

$$\Delta = T - C = U - O$$

The actual number of persons in the census target population is therefore:

$$T = C + U - O$$

In practice, for reasons of cost and timeliness of the data produced, an estimate of $\Delta$ is given by $\hat{\Delta}$, based on sample studies, where:

$$\hat{\Delta} = \hat{U} - \hat{O}$$

$\hat{U}$ is an estimate of the number of persons not included in $C$ who should have been, and $\hat{O}$ is an estimate of the number of persons included in $C$ who should not have been. We can assume that overcoverage from persons included in $C$ who are not in the census target population is zero, since it is negligible. Consequently, $\hat{O}$ is simply an estimate of the number of duplicate enumerations. The purpose of census coverage studies is to determine the values of $\hat{U}$ and $\hat{O}$.

In summary, the actual population $T$ is composed of the census count $C$ and the net undercoverage $\Delta$:

$$T = C + \Delta = E + I + \Delta$$

This is referred to as net undercoverage because $U$ is generally larger than $O$ in the context of the current census in Canada. However, the opposite is possible, in which case $\Delta$ would be negative. $C$ consists of $E$ plus the number of persons added in WHI ($I$), and this imputation targets persons living in non-response dwellings or in occupied dwellings misclassified as unoccupied.

Census population coverage errors can generally be expressed as rates relative to the actual population. The undercoverage rate $u$ is $U$ as a percentage of $T$. The overcoverage rate $o$ is $O$ as a percentage of $T$. The net undercoverage rate $\delta$ is the difference between $U$ and $O$ as a percentage of the census target population. These three rates can be estimated by $\hat{u}$, $\hat{o}$ and $\hat{\delta}$, as follows:

$$\hat{u} = \frac{\hat{U}}{\hat{T}} \times 100, \qquad \hat{o} = \frac{\hat{O}}{\hat{T}} \times 100, \qquad \hat{\delta} = \frac{\hat{U} - \hat{O}}{\hat{T}} \times 100 = \hat{u} - \hat{o}$$

where $\hat{T} = C + \hat{U} - \hat{O}$ is the estimated census target population.

A positive net undercoverage rate indicates that the undercoverage rate is higher than the overcoverage rate. That is, the number of persons not included in the published census count $C$ is higher than the number of excess enumerations. This has generally been the case in all Canadian censuses. For some domains of interest, however, negative net undercoverage is sometimes observed.
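The relationships in this section (census count as enumerated plus imputed persons, net undercoverage as estimated undercoverage minus estimated overcoverage, target population as census count plus net undercoverage, and the three rates as percentages of the estimated target population) can be checked with a small calculation. The following is a minimal sketch; the class name and the input figures are hypothetical and are not census results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoverageEstimates:
    enumerated: float     # E: persons listed on a census questionnaire
    imputed: float        # I: persons added by whole household imputation
    undercoverage: float  # U-hat: estimated persons missed (from the CUS)
    overcoverage: float   # O-hat: estimated excess enumerations (from the COS)

    @property
    def census_count(self) -> float:
        # C = E + I
        return self.enumerated + self.imputed

    @property
    def net_undercoverage(self) -> float:
        # Delta-hat = U-hat - O-hat (negative if overcoverage dominates)
        return self.undercoverage - self.overcoverage

    @property
    def target_population(self) -> float:
        # T-hat = C + U-hat - O-hat = C + Delta-hat
        return self.census_count + self.net_undercoverage

    def rates(self) -> tuple[float, float, float]:
        """Return (undercoverage, overcoverage, net undercoverage) rates in %."""
        t = self.target_population
        u_rate = 100 * self.undercoverage / t
        o_rate = 100 * self.overcoverage / t
        return u_rate, o_rate, u_rate - o_rate


# Hypothetical figures for illustration only, not census results.
est = CoverageEstimates(enumerated=950, imputed=10, undercoverage=50, overcoverage=10)
print(est.census_count, est.net_undercoverage, est.target_population)  # 960 40 1000
print(est.rates())  # (5.0, 1.0, 4.0)
```

All three rates share the same denominator, the estimated target population, so the net undercoverage rate is simply the undercoverage rate minus the overcoverage rate.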
3.4 Evaluation
Two postcensal studies were carried out to estimate the 2021 Census population coverage error. The Census Undercoverage Study (CUS) provided estimates for population undercoverage, while the Census Overcoverage Study (COS) estimated population overcoverage. As previously mentioned, the Dwelling Classification Survey (DCS) does not contribute to census coverage error estimates since census counts are already adjusted to take DCS results into account.
The CUS and COS were conducted subsequent to field collection and census processing operations. Preliminary estimates of 2021 Census population coverage error were released on April 28, 2023. Following an in-depth validation exercise with the Centre for Demography and the provincial and territorial statistical focal points, final estimates were released on September 27, 2023. The data were released at the same time as the new official demographic estimates reflecting the update of the base population to the 2021 Census. Census population counts adjusted for net population undercoverage constituted the updated estimates of the base population.
A brief description of the methodology used in the two census coverage studies is presented below:
Census Undercoverage Study
In the CUS, a random sample of individuals representing the 2021 Census target population was selected from frames independent of the census. These frames are described in Section 7.1. The 2021 Census database was then searched to determine whether these persons had indeed been enumerated.
Where necessary, interviews were conducted, mostly by computer-assisted telephone interviewing from the regional offices (ROs), to collect information for use in additional searches of the 2021 Census database. An interview was completed for 60.2% of the 12,787 cases sent to the ROs. The sampling weights were then adjusted for non-response: the total sampling weight of non-respondents was redistributed among groups of respondents whose response probability was most similar to that of the non-respondents.
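The non-response adjustment described above can be illustrated with a response-homogeneity-group adjustment: within each group, respondents' weights are scaled up so that they absorb the weight of the group's non-respondents. This is a minimal sketch, assuming the groups have already been formed (for example, from estimated response probabilities); the dictionary keys and the function name are hypothetical.

```python
from collections import defaultdict


def adjust_for_nonresponse(cases):
    """Redistribute non-respondent weight to respondents in the same group.

    `cases` is a list of dicts with hypothetical keys:
      'weight'    - design (sampling) weight
      'group'     - response-homogeneity group label
      'responded' - True if the interview was completed
    Each group is assumed to contain at least one respondent.
    Returns adjusted weights in the same order; non-respondents get 0.
    """
    total = defaultdict(float)       # sum of all weights per group
    resp_total = defaultdict(float)  # sum of respondent weights per group
    for c in cases:
        total[c["group"]] += c["weight"]
        if c["responded"]:
            resp_total[c["group"]] += c["weight"]

    adjusted = []
    for c in cases:
        if c["responded"]:
            factor = total[c["group"]] / resp_total[c["group"]]
            adjusted.append(c["weight"] * factor)
        else:
            adjusted.append(0.0)
    return adjusted


# Hypothetical sample: two groups with different response patterns.
cases = [
    {"weight": 10.0, "group": "A", "responded": True},
    {"weight": 10.0, "group": "A", "responded": True},
    {"weight": 20.0, "group": "A", "responded": False},
    {"weight": 5.0, "group": "B", "responded": True},
    {"weight": 5.0, "group": "B", "responded": False},
]
adjusted = adjust_for_nonresponse(cases)
print(adjusted)  # [20.0, 20.0, 0.0, 10.0, 0.0]
```

The total weight (50 in this example) is preserved, so the adjusted sample still represents the same population; only its distribution among respondents changes.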
The estimate of population undercoverage is based on the number of persons in the CUS sample who were classified as “missed.” These persons were part of the target population for the 2021 Census, but no evidence of their enumeration could be found in the 2021 Census Response Database. In the CUS sample, 6,212 persons in the provinces and 1,241 in the territories were classified as missed.
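Because each sampled person carries a (non-response adjusted) weight, the undercoverage estimate is, in its simplest form, the sum of the weights of sampled persons classified as missed, which can also be tabulated by domain such as provinces and territories. The sketch below assumes that simple weighted-count estimator; the record layout, status labels and weights are hypothetical.

```python
from collections import defaultdict


def estimate_undercoverage(sample):
    """Return the estimated undercoverage overall and by domain.

    `sample` is an iterable of (domain, final_weight, status) tuples,
    where status is a hypothetical label such as "missed" or "enumerated".
    Only persons classified as missed contribute to the estimate.
    """
    by_domain = defaultdict(float)
    for domain, weight, status in sample:
        if status == "missed":
            by_domain[domain] += weight
    return sum(by_domain.values()), dict(by_domain)


# Hypothetical sampled persons with final weights.
sample = [
    ("provinces", 300.0, "missed"),
    ("provinces", 250.0, "enumerated"),
    ("territories", 40.0, "missed"),
    ("territories", 35.0, "missed"),
]
u_hat, u_by_domain = estimate_undercoverage(sample)
print(u_hat, u_by_domain)  # 375.0 {'provinces': 300.0, 'territories': 75.0}
```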
Census Overcoverage Study
Overcoverage was measured by first creating a list of potentially duplicated records on the 2021 Census database. Probabilistic and deterministic linkage methods were used to match the final 2021 Census database to itself, and then to match the final 2021 Census database against a list of persons who should have been enumerated according to administrative data sources. Probabilistic linkage estimates the probability that linked pairs are true matches based on the agreement patterns obtained by comparing the linkage variables between records. Deterministic linkage classifies a pair of records as a match when the linkage variables mostly agree, and as a non-match otherwise; no weight is given to the strength of the match.
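The contrast between the two linkage approaches can be sketched as follows. For the probabilistic side, the sketch uses the standard Fellegi–Sunter formulation (log-likelihood-ratio weights from m- and u-probabilities, converted to a match probability with a prior), which is a common choice but not necessarily the exact method used for the census. The linkage variables, their m- and u-probabilities, the prior and the three-of-four agreement rule are all hypothetical.

```python
import math

# Hypothetical linkage variables with assumed probabilities:
# m = P(agree | true match), u = P(agree | non-match).
M_U = {
    "surname": (0.95, 0.01),
    "given_name": (0.90, 0.02),
    "birth_date": (0.97, 0.003),
    "sex": (0.98, 0.5),
}


def agreement(rec_a, rec_b):
    """Agreement pattern: True/False for each linkage variable."""
    return {var: rec_a.get(var) == rec_b.get(var) for var in M_U}


def fellegi_sunter_weight(rec_a, rec_b):
    """Sum of log2 likelihood ratios over the linkage variables."""
    total = 0.0
    for var, agrees in agreement(rec_a, rec_b).items():
        m, u = M_U[var]
        total += math.log2(m / u) if agrees else math.log2((1 - m) / (1 - u))
    return total


def match_probability(rec_a, rec_b, prior=0.001):
    """Posterior P(true match | pattern), assuming conditional independence."""
    odds = prior / (1 - prior) * 2 ** fellegi_sunter_weight(rec_a, rec_b)
    return odds / (1 + odds)


def deterministic_match(rec_a, rec_b, min_agree=3):
    """Match if at least `min_agree` variables agree; strength is ignored."""
    return sum(agreement(rec_a, rec_b).values()) >= min_agree


# Hypothetical records.
a = {"surname": "Tremblay", "given_name": "Marie", "birth_date": "1990-05-01", "sex": "F"}
b = {"surname": "Tremblay", "given_name": "Maria", "birth_date": "1990-05-01", "sex": "F"}
c = {"surname": "Smith", "given_name": "John", "birth_date": "1970-01-01", "sex": "M"}

print(deterministic_match(a, b), deterministic_match(a, c))  # True False
print(round(match_probability(a, b), 2))
```

The deterministic rule returns only a yes/no decision, whereas the probabilistic weight distinguishes, for example, agreement on a rare value such as a full birth date from agreement on sex.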
A random sample of potential duplicates was selected from the list created from the linkages, and the pairs were manually verified by comparing demographic characteristics and names to identify true cases of overcoverage. Population overcoverage was estimated from this sample.
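Estimating overcoverage from the verified sample amounts to expanding the number of confirmed duplicates by the inverse of the sampling fraction. The sketch below assumes simple random sampling within hypothetical strata of the potential-duplicate list (with a single stratum it reduces to one simple random sample) and, as a simplification, counts each confirmed pair as one excess enumeration. The function name, keys and figures are hypothetical.

```python
def estimate_overcoverage(strata):
    """Stratified expansion estimate of excess enumerations.

    `strata` is a list of dicts with hypothetical keys:
      'pairs_on_list' - number of potential duplicate pairs in the stratum
      'verified'      - one boolean per sampled pair, True if manual review
                        confirmed a true duplicate
    Each confirmed pair is counted as one excess enumeration.
    """
    total = 0.0
    for s in strata:
        n = len(s["verified"])
        if n == 0:
            raise ValueError("each stratum needs at least one sampled pair")
        weight = s["pairs_on_list"] / n  # inverse of the sampling fraction
        total += weight * sum(s["verified"])
    return total


# Hypothetical strata, e.g. high- and low-confidence potential duplicates.
strata = [
    {"pairs_on_list": 1000, "verified": [True, True, True, False]},
    {"pairs_on_list": 200, "verified": [False, True, False, False, False]},
]
print(estimate_overcoverage(strata))  # 790.0
```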