Coverage Technical Report, Census of Population, 2021
8. Census Overcoverage Study

8.1 Overview

Overcoverage error occurs when in-scope individuals are enumerated more than once or when individuals who should not have been enumerated are included in the target population of a survey or a census. The purpose of the Census Overcoverage Study (COS) is to estimate the number of persons enumerated more than once in the Canadian Census of Population. (Note 1)

The 2021 COS consisted of two types of linkages, namely deterministic and probabilistic. The deterministic linkage (DL) identified definite pairs of duplicate persons, meaning those persons were enumerated more than once and hence represent overcoverage. The methodology was based on a modification of the Automated Match Study (AMS), which had been used in previous census cycles to evaluate the COS. The probabilistic linkage (PL) identified possible pairs of duplicate persons and was based on the methods used in past cycles of the COS. The COS used data from the 2021 Census Response Database and administrative data from the Canadian Statistical Demographic Database provided by the Census Research section of the Statistical Integration Methods Division. The COS sampling frame was created in multiple steps and included definite and possible pairs of duplicate persons identified with both the DL and PL, along with an extension of the sampling frame based on households. A sample of possible pairs of duplicate persons was drawn from the COS frame and sent for manual verification to determine whether the sampled pairs were indeed duplicate persons. An estimate of overcoverage was then obtained from the results of the manual verification of the sampled pairs and the definite pairs of duplicate persons identified by the DL.

8.2 Linkage steps

8.2.1 Data used for the linkages

Two sources of data were used for the linkages.

Firstly, the Census Coverage Studies version of the Census Response Database (CCS-RDB, referred to as the RDB in this chapter) was a version of the Census Response Database that did not include late or incomplete enumerations, or persons added through the whole household imputation process. The RDB contained a little over 35 million records and included responses from individuals living in both private and collective dwellings. It contained names (including given names and surnames), demographic information (including date of birth and sex) and geographic information (including province or territory and postal code, and census geographic variables such as collection unit (CU), census subdivision (CSD) and census metropolitan area (CMA)).

Secondly, administrative (ADM) data were used, based on the Canadian Statistical Demographic Database provided by the Census Research section of the Statistical Integration Methods Division. They comprised records from multiple ADM data sources and aimed to represent persons in scope for the census. The ADM data consisted of around 53 million records. They included names (given names and surnames), demographic information (including date of birth and sex) and geographic information (including province or territory and postal code).

The following matching variables were used in the linkages (when applicable):

  • names: given name(s) and surname(s) variables
  • demographic data: date of birth and sex variables
  • geographic data: province or territory and postal code, and census geographic variables.

8.2.2 Deterministic linkage

The purpose of the DL was to identify high-quality pairs of duplicate persons, consisting of two records from the RDB, which were classified as definite pairs of overcoverage. The deterministic matching programs traditionally used for the AMS were modified to include a comparison of names in the linkage criteria and to consider matches between a household living in a private dwelling and a household living in a collective dwelling.

The DL was based on the following series of operations:

  • Deterministic matching programs were used to identify household pairs that were “similar.” Similarity was defined in terms of the relative geographic proximity of the households (households within the same CU, households in different CUs within the same CSD, etc.) and the number of persons matched between them. Persons were matched on name, sex and date of birth. Two persons were said to be an exact match if their names matched and they had the same sex and the same day, month and year of birth. Two persons were said to be a near match if their names matched and either three of the four other components (sex and day, month and year of birth) agreed or only the day and month of birth were reversed. In every household pair, at least one of the two households lived in a private dwelling.
  • An initial list of possible pairs of duplicate persons was created from household pairs.
  • A verification sample was taken from the initial list of possible pairs of duplicate persons for manual verification purposes to confirm their high quality before classifying them as definite pairs of duplicate persons (i.e., overcoverage).
  • A final list of pairs of duplicate persons was determined, and they were classified as definite pairs of duplicate persons resulting from the DL.

There were 460,572 definite pairs of duplicate persons resulting from the DL.
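
The person-matching rules described in the first operation above can be sketched as follows. This is a minimal illustration under stated assumptions, not the DL programs themselves: the `Person` fields and the `classify_match` function are hypothetical names, names are assumed to be already standardized and compared by simple equality, and the "day and month reversed" case is read here as also requiring sex and year of birth to agree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    # Hypothetical record layout; names are assumed to be standardized already.
    name: str
    sex: str
    day: int
    month: int
    year: int


def classify_match(a: Person, b: Person) -> str:
    """Return 'exact', 'near' or 'none' for two person records."""
    # Both match types require the names to match (assumed: simple equality).
    if a.name != b.name:
        return "none"
    agreements = [a.sex == b.sex, a.day == b.day, a.month == b.month, a.year == b.year]
    if all(agreements):
        return "exact"
    # Near match, case 1: three of the four other components agree.
    if sum(agreements) == 3:
        return "near"
    # Near match, case 2 (assumed reading): only day and month are reversed.
    if a.sex == b.sex and a.year == b.year and a.day == b.month and a.month == b.day:
        return "near"
    return "none"
```

For example, two records with the same name and sex, born in the same year, where one reports 3 November and the other 11 March, would be classified as a near match under these assumptions.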

8.2.3 Probabilistic linkage

The purpose of the PL was to identify possible pairs of duplicate persons. The PL consisted of an internal probabilistic record linkage of the entire RDB to itself, referred to as the RDB-RDB linkage, and an external probabilistic record linkage of the RDB to ADM data, referred to as the RDB-ADM linkage. The RDB-RDB linkage resulted in pairs of RDB records, whereas the RDB-ADM linkage resulted in pairs where one record was from the RDB and the other record was from ADM data; pairs of RDB records were later derived from these.

The PL was conducted with G-Link, a probabilistic record linkage system designed at Statistics Canada that uses the Fellegi–Sunter method to solve large file linkage problems when there are no direct identifiers common to both sources (Fellegi and Sunter, 1969). As in past cycles, G-Link was used in 2021, and the following series of operations was carried out separately for the RDB-RDB and RDB-ADM linkages.

The first task in a probabilistic linkage is to build a set of potential pairs (also known as a linked set), which is used to estimate the characteristics of the set of true matched pairs. To do this, a set of selection criteria was applied, which reduced the Cartesian product of all the possible matches to a more manageable comparison space. Improvements were made in the 2021 selection criteria to overcome challenges that arose with the 2016 selection criteria. In addition, rather than use identical selection criteria for both the internal and external linkages, criteria were developed, tested and optimized separately for these two linkages. Many of the RDB pairs derived from the RDB-ADM linked set had corresponding RDB-ADM pairs that were captured by different criteria, suggesting that a direct comparison would not be able to capture them. The selection criteria of the internal RDB-RDB linkage returned a linked set of 86,429,651 RDB-RDB pairs. The selection criteria of the external RDB-ADM linkage returned 70,274,756 pairs. Of these, 41,474,581 involved multiple RDB records linked to the same ADM record; hence, the RDB-ADM linked set contained these 41,474,581 pairs.
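
The way selection criteria shrink the comparison space can be illustrated with a simple blocking sketch. The two criteria shown (same date of birth; same surname and postal code) and the field names are assumptions made purely for illustration; the 2021 selection criteria themselves are not listed in this chapter.

```python
from collections import defaultdict
from itertools import combinations


# Hypothetical selection criteria: each maps a record to a blocking key,
# or to None when the key cannot be built from the record.
def by_birth_date(rec):
    return rec.get("dob")


def by_surname_and_postal_code(rec):
    if rec.get("surname") and rec.get("postal"):
        return (rec["surname"], rec["postal"])
    return None


def candidate_pairs(records, criteria):
    """Return the union of within-block pairs over all selection criteria.

    records: dict mapping a record id to a dict of fields.
    """
    pairs = set()
    for criterion in criteria:
        blocks = defaultdict(list)
        for rec_id, rec in records.items():
            key = criterion(rec)
            if key is not None:
                blocks[key].append(rec_id)
        # Only records sharing a key are compared, not the full Cartesian product.
        for ids in blocks.values():
            pairs.update(combinations(sorted(ids), 2))
    return pairs
```

A pair is kept if it satisfies at least one criterion, which is why a pair missed by one criterion can still be captured by another.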

Once a linked set of pairs was obtained, the records of the pairs were compared by applying linkage rules in G-Link, which calculated the weights of the results of the linkage rules. Quality linkage rules that address all sets of characteristics for which two records agree were necessary to ensure the completeness of the COS sampling frame resulting from the PL. If some sets of characteristics are not addressed by the linkage rules, then pairs with such characteristics are likely to be assigned a lower linkage weight and to be rejected when thresholds are applied. Many improvements were made in the 2021 linkage rules to ensure that estimated linkage weights were well correlated with the likelihood of a pair being a true match. In 2021, more linkage variables were added to the rules, and the outcomes for existing rules that had been used in 2016 were modified, such as the rules on names. Outcomes based on census-specific geographic variables, such as the unique identifier of a dwelling (known as the FRAME_ID), CU and CSD, were added in 2021 and were applicable only to the RDB-RDB linkage.
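
As background, the sketch below shows how a linkage weight is built up in the standard Fellegi–Sunter formulation: a rule contributes log2(m/u) when its fields agree and log2((1 - m)/(1 - u)) when they disagree, where m and u are the probabilities of agreement among true matches and among non-matches. The rule names and the m and u values are hypothetical, and each rule is reduced to a binary agree/disagree outcome, whereas the COS linkage rules have richer, user-defined outcomes.

```python
import math

# Hypothetical (m, u) probabilities per rule, for illustration only.
RULES = {
    "surname": (0.95, 0.01),
    "given_name": (0.90, 0.05),
    "birth_date": (0.97, 0.001),
    "sex": (0.99, 0.5),
}


def field_weight(m, u, agrees):
    """Fellegi-Sunter weight of one binary comparison outcome."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def pair_weight(outcomes):
    """Sum the weights over all rules; outcomes maps rule name -> bool."""
    return sum(field_weight(*RULES[rule], agrees) for rule, agrees in outcomes.items())
```

In this formulation, agreement on a discriminating variable such as date of birth adds far more weight than agreement on sex, which is why a rule set that ignores an informative variable tends to under-weight true matches.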

A linkage weight threshold for each province and territory was then established separately for the RDB-RDB and RDB-ADM linkages. The objective in choosing linkage weight thresholds was to optimally partition the pairs from the linked set into two classes: potential matched pairs and unmatched pairs. As in previous cycles, provincial and territorial thresholds were chosen outside G-Link because the built-in tools for finding thresholds did not work well with the many user-defined linkage rules used by the COS PL. The thresholds were selected in two steps. First, a set of preliminary thresholds was selected. In general, choosing a lower threshold is somewhat subjective. The COS employed the guidelines for profile reviews developed by the Social Data Linkage Environment (SDLE) experts at Statistics Canada to assist in choosing a preliminary lower threshold. To avoid missing potential overcoverage, a fairly low threshold was initially selected. Second, a sample of pairs from above and below the preliminary threshold was selected, and the threshold was adjusted as required. The final threshold was chosen to meet a target missed match rate of 0.01. Because a set of definite pairs of duplicate persons was obtained through the DL step, no upper threshold was selected. All the pairs from the RDB-RDB and RDB-ADM linked sets whose weight was greater than the threshold were selected and considered to be potential pairs.
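
One possible reading of the threshold adjustment is sketched below: given a manually verified sample of pairs, find the highest threshold for which the share of true matches falling at or below it stays within the target missed match rate. The function name, the input layout and the unweighted treatment of the sample are all assumptions; the chapter does not specify how the adjustment was computed.

```python
def select_threshold(verified_sample, target_missed_rate=0.01):
    """Return the highest weight threshold meeting the target, or None.

    verified_sample: list of (linkage_weight, is_true_match) tuples.
    Pairs with weight strictly greater than the threshold are retained,
    so true matches with weight <= threshold count as missed.
    """
    total_matches = sum(1 for _, is_match in verified_sample if is_match)
    if total_matches == 0:
        raise ValueError("the verified sample contains no true matches")
    best = None
    # The number of missed matches can only grow as the threshold rises,
    # so the scan can stop at the first candidate that exceeds the target.
    for threshold in sorted({weight for weight, _ in verified_sample}):
        missed = sum(1 for w, is_match in verified_sample if is_match and w <= threshold)
        if missed / total_matches <= target_missed_rate:
            best = threshold
        else:
            break
    return best
```

A production version would also need to account for the sample design, since pairs sampled near the preliminary threshold would not all carry the same sampling weight.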

8.3 Creation of the Census Overcoverage Study sampling frame

The COS sampling frame was created in multiple steps and included linked pairs identified with the DL and the PL, along with an extension of the sampling frame based on households. Then, sampling units were created.

As previously described, the DL was used to identify a set of RDB-RDB pairs that were classified as definite pairs of duplicate persons. The PL was used to identify a set of potential pairs. The internal linkage of the RDB to itself identified potential RDB-RDB pairs that were classified as possible pairs of duplicate persons. RDB-RDB pairs needed to be derived from the RDB-ADM linked set to include them in the COS sampling frame. Potential pairs identified through the RDB-ADM linkage were converted into RDB-RDB pairs. Where two RDB records were linked to the same ADM record, those two RDB records became an RDB-RDB pair. One-to-one RDB-ADM pairs were not of interest, as the goal was to measure overcoverage (i.e., duplicate persons) on the RDB. The final set of RDB-RDB pairs derived from the RDB-ADM linkage contained 4,301,512 pairs, which were classified as possible pairs of duplicate persons.
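
The conversion of RDB-ADM pairs into RDB-RDB pairs can be sketched as follows. The function name and the identifiers are hypothetical; the logic simply pairs up RDB records that share an ADM record and drops one-to-one links.

```python
from collections import defaultdict
from itertools import combinations


def derive_rdb_pairs(rdb_adm_pairs):
    """Turn (rdb_id, adm_id) links into RDB-RDB pairs.

    Two RDB records linked to the same ADM record form one RDB-RDB pair.
    ADM records linked to a single RDB record yield no pair.
    """
    rdb_by_adm = defaultdict(set)
    for rdb_id, adm_id in rdb_adm_pairs:
        rdb_by_adm[adm_id].add(rdb_id)
    pairs = set()
    for rdb_ids in rdb_by_adm.values():
        # combinations() yields nothing when fewer than two records are present.
        pairs.update(combinations(sorted(rdb_ids), 2))
    return pairs
```

Note that an ADM record linked to three RDB records produces three RDB-RDB pairs, which is one way higher-order groups enter the frame.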

As in previous cycles, the frame was then enriched with additional pairs not already identified by the PL but created from the households of pairs linked by the internal and external linkage steps. The purpose of this step was to identify additional possible pairs of duplicate persons in the households of captured pairs that may not have been caught with the PL, because the PL was based on comparisons of individuals rather than households. Potential pairs from this step were known as extension pairs and classified as possible pairs of duplicate persons. To construct the set of extension pairs, a household pair was first produced for each RDB-RDB pair classified as a possible pair of duplicate persons resulting from the PL by adding the other household members to it. Second, sex and date of birth were used as variables to identify new RDB-RDB pairs by comparing the persons present in the household pair. Comparison rules were applied to identify pairs that might represent overcoverage cases. The extension pairs included pairs from two private households, or pairs where an individual from a private household was linked to an individual from a collective dwelling. Pairs where both records were from collective dwellings were excluded.
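
The construction of extension pairs can be sketched as below. The comparison rules actually applied are not given in this chapter, so the sketch assumes the simplest possible rule, exact agreement on sex and date of birth; the function name and data layout are likewise hypothetical, and the exclusion of pairs between two collective dwellings is not modelled.

```python
def extension_pairs(household_a, household_b, known_pairs):
    """Find new candidate pairs between the two households of a linked pair.

    household_a, household_b: dict mapping person id -> (sex, date_of_birth).
    known_pairs: set of sorted id tuples already on the frame.
    Assumed comparison rule: sex and date of birth both agree exactly.
    """
    new_pairs = set()
    for id_a, (sex_a, dob_a) in household_a.items():
        for id_b, (sex_b, dob_b) in household_b.items():
            if id_a == id_b:
                continue
            pair = tuple(sorted((id_a, id_b)))
            if pair in known_pairs:
                continue  # already captured by the DL or the PL
            if sex_a == sex_b and dob_a == dob_b:
                new_pairs.add(pair)
    return new_pairs
```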

The final linked set comprised pairs from the DL, extension pairs, pairs from the RDB-ADM PL and pairs from the RDB-RDB PL.

Table 8.3.1
Breakdown of possible pairs of duplicate people by linkage type

Linkage type    Frequency    Percent
DL                460,572       3.62
Extension         471,688       3.71
RDB-ADM         4,301,351      33.80
RDB-RDB         7,491,998      58.87

DL = deterministic linkage
RDB-ADM = probabilistic linkage of Census Response Database to administrative data
RDB-RDB = probabilistic linkage of Census Response Database to itself
Source: Statistics Canada, 2021 Census Overcoverage Study.

When pairs in the PL set were also found by the DL, the linkage type was set to DL. Then, the possible pairs of duplicate persons obtained from the DL, the PL and the extension were combined and deduplicated.

Since 2011, the COS has used interconnected record groups to estimate overcoverage in the census rather than record pairs. This is because overcoverage estimated by record pairs would be positively biased in the presence of triple or higher-order enumerations. Thus, mutually exclusive groups of connected RDB records were formed, where most of the groups of records on the frame resulted in one or two pairs (involving two or three records). For cases where the groups of records contained more than 10 links, a graph theory method was applied to reduce the group into small subgroups called “neighbourhoods” (Dasylva et al., 2015) to facilitate manual verification.
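
The move from pairs to interconnected record groups can be illustrated with a small union-find sketch. The function names are hypothetical, and the count of surplus enumerations assumes that every record in a group refers to the same person, which in the study is established only through verification.

```python
def connected_groups(pairs):
    """Group record ids into mutually exclusive connected components."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for record in list(parent):
        groups.setdefault(find(record), set()).add(record)
    return list(groups.values())


def surplus_enumerations(groups):
    """Overcoverage if each group is one person: group size minus one."""
    return sum(len(group) - 1 for group in groups)
```

With the pairs (1, 2), (2, 3), (1, 3) and (4, 5), counting pairs would give four cases of overcoverage, whereas the groups {1, 2, 3} and {4, 5} give three surplus enumerations. This is the positive bias that triple or higher-order enumerations introduce into a pair-based estimate.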

Lastly, the COS sampling frame consisted of three types of sampling units: pairs, groups and neighbourhoods. Sampling units were categorized by three process types: (1) DL-only, composed of pairs and groups or neighbourhoods of RDB records resulting from the DL; (2) PL-only, composed of pairs and groups or neighbourhoods of RDB records resulting from the RDB-RDB linkage, RDB-ADM linkage and extension pairs; and (3) PL-DL, composed of groups or neighbourhoods of RDB records from both the PL and DL (including extension pairs).
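
The assignment of a sampling unit to a process type can be sketched as follows. The function name and the string labels for linkage types are assumptions chosen to mirror the terms used in this chapter; extension pairs are treated as PL pairs, consistent with the description above.

```python
def process_type(linkage_types):
    """Classify a sampling unit from the linkage types of its pairs.

    linkage_types: iterable of labels such as "DL", "RDB-RDB",
    "RDB-ADM" or "Extension" (hypothetical labels).
    """
    kinds = set(linkage_types)
    if not kinds:
        raise ValueError("a sampling unit must contain at least one pair")
    has_dl = "DL" in kinds
    has_pl = bool(kinds - {"DL"})  # anything that is not DL counts as PL
    if has_dl and has_pl:
        return "PL-DL"
    return "DL-only" if has_dl else "PL-only"
```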

Table 8.3.2
Distribution of deterministic linkage-only, probabilistic linkage-only and probabilistic linkage-deterministic linkage pairs, groups and neighbourhoods in the 2021 Census Overcoverage Study sampling frame

                                  Process type
Sampling unit types    DL-only      PL-only     PL-DL        Total
Group                    4,822    1,635,296    86,930    1,727,048
Neighbourhood               64      161,641     6,493      168,198
Pair                   345,243    5,931,084         0    6,276,327
Total                  350,129    7,728,021    93,423    8,171,573

DL = deterministic linkage
PL = probabilistic linkage
PL-DL = probabilistic linkage-deterministic linkage (some of the pairs in the group were identified by the probabilistic linkage only, while others were identified by the deterministic linkage)
Source: Statistics Canada, 2021 Census Overcoverage Study.

8.4 Sample design

The first level of stratification was by linkage process type, resulting in three strata:

  • Stratum 1 consisted of DL pairs and groups or neighbourhoods made up of DL pairs only. This was treated as a take-all stratum, and sampling units in this stratum were classified as definite pairs of duplicate persons.
  • Stratum 2 consisted of PL pairs and groups or neighbourhoods that contained only PL pairs. A probabilistic sample was drawn from this stratum, and the pairs were sent for manual verification.
  • Stratum 3 consisted of groups or neighbourhoods that had a combination of PL and DL pairs. This stratum was further divided into two substrata. The first substratum was composed of groups and neighbourhoods that contained at least one DL pair that was sampled as part of the DL verification sample used to confirm the quality of these pairs. This substratum was treated as take-all. The second substratum was composed of groups and neighbourhoods that did not contain any DL pairs that were part of the DL verification sample. A probabilistic sample of PL-DL groups or neighbourhoods was drawn from it. PL pairs in groups with DL pairs belonging to the first substratum were sent for manual verification, along with the PL and DL pairs selected from the second substratum.

The targeted sample size was approximately 55,000 pairs from the PL-only stratum and around 4,500 pairs from the PL-DL stratum. In this section, intraprovincial means all RDB records in a sampling unit are from the same province or territory, and interprovincial means RDB records in a sampling unit are from more than one province or territory. Tables in this section present counts of pairs whether the sampling unit is a pair, group or neighbourhood. Groups and neighbourhoods are broken down into their constituent pairs to derive the count of pairs. For simplicity, sampled pairs were sent for manual verification rather than groups of records.
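
The intraprovincial and interprovincial definitions lead to a simple rule for assigning a sampling unit to one of the 13 provincial or territorial strata or to the interprovincial stratum. The sketch below is illustrative only; the function name and the two-letter codes are assumptions.

```python
# Hypothetical two-letter codes for the 13 provinces and territories.
PROVINCES_AND_TERRITORIES = frozenset(
    ["NL", "PE", "NS", "NB", "QC", "ON", "MB", "SK", "AB", "BC", "YT", "NT", "NU"]
)


def geographic_stratum(record_provinces):
    """Return the province code if all records share it, else 'Interprovincial'.

    record_provinces: province or territory code of each RDB record in the unit.
    """
    provinces = set(record_provinces)
    if not provinces:
        raise ValueError("a sampling unit must contain at least one record")
    unknown = provinces - PROVINCES_AND_TERRITORIES
    if unknown:
        raise ValueError(f"unknown province or territory code(s): {sorted(unknown)}")
    if len(provinces) == 1:
        return next(iter(provinces))
    return "Interprovincial"
```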

For the PL-only stratum, the sampling unit type substrata were further stratified into 14 strata: 13 provincial strata containing sampling units (pairs or interconnected record groups or neighbourhoods) where all of the records belong to the same province or territory, and an interprovincial stratum where the sampling units have records from different provinces or territories. As in 2016, the interprovincial units may be groups that also contain some intraprovincial pairs. This was unavoidable when using interconnected record groups. To better control the sample size, the group and neighbourhood sampling units were further stratified by the number of pairs in the group. Finally, the sampling units were sorted by the estimated overcoverage propensity in the case of groups or neighbourhoods and by their conditional match probabilities in the case of pairs, and a systematic sample was then drawn. (Note 2)

For the first PL-DL substratum, the DL pairs that were part of the verification sample had already been verified and so were not sent for manual verification. This was advantageous and allowed for a larger sample in the PL-DL substratum with at least one DL pair in the verification sample. The PL-DL groups for which none of the DL pairs were part of the verification sample were further stratified into 14 strata: 13 intraprovincial strata and an interprovincial stratum. As with the PL-only stratum, these 14 substrata were further stratified by the number of links to better control the sample size. The sampling units were then sorted by the estimated overcoverage propensity, and a systematic sample was drawn.
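
Sorting the units before drawing a systematic sample spreads the sample across the range of the sort variable. A minimal sketch of a fractional-interval systematic sample is given below; the function name, the `sort_key` argument and the use of a seeded random start are assumptions, and the exact procedure used in the COS may differ.

```python
import random


def systematic_sample(units, n, sort_key, seed=None):
    """Draw a systematic sample of n units after sorting by sort_key."""
    ordered = sorted(units, key=sort_key)
    population = len(ordered)
    if not 0 < n <= population:
        raise ValueError("n must be between 1 and the number of units")
    step = population / n  # sampling interval (may be fractional)
    start = random.Random(seed).random() * step  # random start in [0, step)
    # Clamp the index to guard against floating-point rounding at the upper end.
    return [ordered[min(int(start + i * step), population - 1)] for i in range(n)]
```

For example, sorting 100 units by an estimated propensity and drawing 10 of them selects one unit from each successive block of 10, so low-, medium- and high-propensity units are all represented.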

8.4.1 Deterministic linkage-only stratum

As mentioned above, the DL-only pairs and groups or neighbourhoods were considered definite matches and were not sent for manual verification. As shown in Table 8.3.2, there were fewer interconnected record groups among the DL pairs than among the PL pairs. In Table 8.4.1.1, which shows the breakdown of DL-only pairs by province or territory and interprovincial pairs, there were also fewer interprovincial DL-only pairs than interprovincial PL-only pairs (1.39% from Table 8.4.1.1 versus 18.34% from Table 8.4.2.2). This was what would be expected for pairs that were true matches.

Table 8.4.1.1
Frequency of deterministic linkage-only pairs by province or territory and interprovincial strata

Provinces and territories    Frequency    Percent
Newfoundland and Labrador        5,133       1.42
Prince Edward Island             1,678       0.47
Nova Scotia                      9,145       2.54
New Brunswick                    8,230       2.28
Quebec                          77,599      21.54
Ontario                        122,913      34.12
Manitoba                        11,908       3.31
Saskatchewan                    13,159       3.65
Alberta                         36,290      10.07
British Columbia                67,876      18.84
Yukon                              463       0.13
Northwest Territories              494       0.14
Nunavut                            391       0.11
Interprovincial                  5,001       1.39

Source: Statistics Canada, 2021 Census Overcoverage Study.

8.4.2 Probabilistic linkage-only stratum

Table 8.4.2.1 shows the number of pairs for each sampling unit type and an estimate of the number of sampling units needed to obtain approximately that many pairs in the sample. The allocation to pairs and groups or neighbourhoods was proportional to size.

Table 8.4.2.1
Frequency of pairs, sampling units and sample sizes by sampling unit type

Sampling unit types       Number of     Number of         Sample size    Percent of sample    Sample size
                          pairs         sampling units    (pairs)        (pairs)              (sampling units)
Group or neighbourhood    6,411,761     1,796,937         28,110         52                   9,599
Pair                      5,931,084     5,931,084         25,920         48                   25,920
Total                     12,342,845    7,728,021         54,030         100                  35,519

Source: Statistics Canada, 2021 Census Overcoverage Study.

Probabilistic linkage-only pairs

The PL-only pairs were first stratified by intraprovincial and interprovincial pairs. Table 8.4.2.2 below gives the breakdown of intra- and interprovincial pairs among the PL-only pairs. Sample allocation to the intra- and interprovincial substrata was proportional to size.

Table 8.4.2.2
Frequency of intraprovincial and interprovincial pairs among probabilistic linkage-only pairs and sample sizes

Types of pairs     Frequency of pairs    Percent    Number of sampled pairs
Intraprovincial             4,843,438      81.66                     22,004
Interprovincial             1,087,646      18.34                      4,753

Source: Statistics Canada, 2021 Census Overcoverage Study.

Within the intraprovincial pair stratum, a power allocation was used to allocate the PL-only pairs across provinces, with the measure of size taken to be the number of pairs in each province and q = ½. The pairs were then sorted by their conditional match probabilities, and a systematic sample was drawn. Note that the three territories were take-all. Table 8.4.2.3 shows the allocation of PL-only intraprovincial pairs by province or territory.

Table 8.4.2.3
Probabilistic linkage-only intraprovincial pairs sample allocation by province or territory

Provinces and territories    Frequency (sampled pairs)
Newfoundland and Labrador                          568
Prince Edward Island                               300
Nova Scotia                                        760
New Brunswick                                      748
Quebec                                           6,915
Ontario                                          5,600
Manitoba                                           730
Saskatchewan                                       701
Alberta                                          1,740
British Columbia                                 2,655
Yukon                                              383
Northwest Territories                              448
Nunavut                                            456
Total sample size                               22,004

Source: Statistics Canada, 2021 Census Overcoverage Study.
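
A power allocation assigns stratum h a sample size proportional to its measure of size raised to the power q, that is, n_h = n × M_h^q / Σ_k M_k^q. The sketch below illustrates this with made-up stratum sizes; it does not reproduce the allocation in Table 8.4.2.3, since the treatment of take-all strata and rounding is not modelled and the function name is hypothetical.

```python
def power_allocation(sizes, n, q=0.5):
    """Allocate a sample of size n across strata using a power allocation.

    sizes: dict mapping stratum -> measure of size (e.g., number of pairs).
    Returns unrounded allocations; rounding and take-all strata are
    deliberately left out of this sketch.
    """
    powered = {stratum: size ** q for stratum, size in sizes.items()}
    total = sum(powered.values())
    return {stratum: n * value / total for stratum, value in powered.items()}
```

With illustrative sizes of 100 and 400 pairs and n = 30, q = ½ gives allocations of 10 and 20, whereas a proportional allocation (q = 1) would give 6 and 24. The smaller stratum therefore receives a larger share of the sample than its size alone would warrant.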

The PL-only interprovincial pairs were further stratified by unique province combination and allocated proportional to size. There were 78 unique province combinations among the interprovincial pairs. Within the provincial combination substrata, pairs were sorted by their conditional match probabilities, and systematic sampling was used to draw the sample.

Probabilistic linkage-only groups and neighbourhoods

For the groups and neighbourhoods, the pairs were first stratified by intraprovincial and interprovincial groups. A group was considered interprovincial if it contained at least one interprovincial pair. Table 8.4.2.4 shows the breakdown of intra- and interprovincial groups in the PL-only stratum. The sample was allocated proportional to size between the intra- and interprovincial strata.

Table 8.4.2.4
Frequency of intraprovincial and interprovincial groups or neighbourhoods and sample sizes

Group types        Frequency of pairs    Percent    Number of sampled pairs    Number of sampled groups
Intraprovincial             4,070,750      63.49                     17,970                       6,656
Interprovincial             2,341,011      36.51                     10,140                       3,022

Source: Statistics Canada, 2021 Census Overcoverage Study.

Within the intraprovincial stratum, groups were allocated to provinces using a power allocation. Table 8.4.2.5 shows the allocation of PL-only intraprovincial sampling units by province or territory. As there were so few sampling units in the territories, these substrata were take-all. To better control the final sample size, the provincial strata were further stratified by group size in terms of the number of pairs in the group. The sample within each provincial stratum was allocated among group sizes proportional to the size. A minimum of one sampling unit was sampled within each stratum.

Table 8.4.2.5
Allocation of probabilistic linkage-only intraprovincial sampling units by province or territory

Group levels (provinces and territories)    Number of sampled pairs    Number of sampled groups
Newfoundland and Labrador                                       270                         112
Prince Edward Island                                            100                          43
Nova Scotia                                                     439                         187
New Brunswick                                                   450                         184
Quebec                                                        7,594                       2,514
Ontario                                                       4,949                       1,899
Manitoba                                                        385                         162
Saskatchewan                                                    342                         143
Alberta                                                       1,119                         467
British Columbia                                              2,123                         866
Yukon                                                            54                          24
Northwest Territories                                            68                          27
Nunavut                                                          64                          28
Total sample size                                            17,957                       6,656

Note: The three territories are take-all strata.
Source: Statistics Canada, 2021 Census Overcoverage Study.

The interprovincial group and neighbourhood stratum was divided into two substrata: those with a majority province or territory (i.e., a province or territory to which most pairs in the group belong) and those without a majority province or territory (i.e., the pairs within the group are split evenly among the provinces or territories involved). The breakdown of pairs by majority and no majority groups and neighbourhoods is given in Table 8.4.2.6.

Table 8.4.2.6
Frequency of pairs within probabilistic linkage-only interprovincial groups or neighbourhoods by group with a majority province or territory or group without a majority province or territory, and sample sizes

Group types                                 Frequency of pairs    Percent    Number of sampled pairs    Number of sampled groups
With a majority province or territory                  515,949         90                      9,099                       2,702
Without a majority province or territory                57,923         10                      1,041                         422

Source: Statistics Canada, 2021 Census Overcoverage Study.

Interprovincial groups and neighbourhoods with a majority province or territory were further stratified by dominant province or territory in the group and allocated using a power allocation. The sampling units within the provincial substrata were then stratified by the number of pairs in the groups. Allocation to group size was proportional to size. The sampling units were then sorted by expected overcoverage in the group and the proportion of intraprovincial pairs in the group, and a systematic sample was drawn. Because there were only 102 groups with a majority territory, these strata were take-all. At least four sampling units were drawn from each of the other strata. Table 8.4.2.7 shows the allocation of PL-only interprovincial sampling units with a majority province or territory by majority province or territory.

Table 8.4.2.7
Allocation of probabilistic linkage-only interprovincial sampling units with a majority province or territory by majority province or territory

Group levels (provinces and territories)    Number of sampled pairs    Number of sampled groups
Newfoundland and Labrador                                       282                          91
Prince Edward Island                                            127                          37
Nova Scotia                                                     481                         156
New Brunswick                                                   433                         144
Quebec                                                        2,000                         540
Ontario                                                       2,593                         685
Manitoba                                                        346                         108
Saskatchewan                                                    264                          99
Alberta                                                         898                         292
British Columbia                                              1,399                         448
Yukon                                                           129                          49
Northwest Territories                                           102                          39
Nunavut                                                          36                          14
Total sample size                                             9,090                       2,702

Note: The three territories are take-all strata.
Source: Statistics Canada, 2021 Census Overcoverage Study.

Groups with no majority province or territory were stratified by group size, and the sample was allocated proportional to size. Table 8.4.2.8 shows the allocation of PL-only interprovincial groups with no dominant province or territory by group size.

Table 8.4.2.8
Allocation of probabilistic linkage-only interprovincial groups with no dominant province or territory by number of pairs in groups

Number of pairs      Number of sampled pairs    Number of sampled groups
2                                        548                         274
3                                        372                         124
4                                         36                           9
5                                         55                          11
6                                          6                           1
7                                          7                           1
8                                          8                           1
9                                          9                           1
Total sample size                      1,041                         422

Source: Statistics Canada, 2021 Census Overcoverage Study.

8.4.3 Probabilistic linkage–deterministic linkage stratum

The breakdown of PL pairs and DL pairs in the PL-DL groups and neighbourhoods is shown in Table 8.4.3.1.

Table 8.4.3.1
Probabilistic linkage pairs and deterministic linkage pairs in the probabilistic linkage-deterministic linkage groups and neighbourhoods

Linked by    Frequency    Percent
PL             250,270      70.64
DL             104,020      29.36

PL = probabilistic linkage
DL = deterministic linkage
Note: The term “probabilistic linkage-deterministic linkage” means some of the pairs in the group were identified by the probabilistic linkage only, while others were identified by the deterministic linkage.
Source: Statistics Canada, 2021 Census Overcoverage Study.

As previously mentioned, a sample of DL pairs was drawn during the DL step and sent for manual verification to evaluate the quality of DL pairs and ensure that all DL pairs could be classified as definite pairs of duplicate persons. This sample was referred to as the DL verification sample. To use the DL verification sample, groups to which these sampled pairs belonged were treated as take-all strata, and the corresponding PL pairs, and any corresponding DL pairs not part of the DL verification sample, were sent for manual verification.

There were 1,010 sampled DL pairs among the pairs in the PLDL interconnected record groups. These pairs belonged to 929 groups. The breakdown of PL and DL pairs among these 929 groups is shown in Table 8.4.3.2.

Table 8.4.3.2
Probabilistic linkage pairs and deterministic linkage pairs among the 929 probabilistic linkage-deterministic linkage groups that contained deterministic linkage pairs that were part of the deterministic linkage verification sample

| Linked by | Frequency | Percent |
|---|---|---|
| PL | 2,553 | 71.31 |
| DL | 1,027 | 28.69 |

PL = probabilistic linkage
DL = deterministic linkage
Note: The term “probabilistic linkage-deterministic linkage” means some of the pairs in the group were identified by the probabilistic linkage only, while the deterministic linkage identified others.
Source: Statistics Canada, 2021 Census Overcoverage Study.

There were 17 DL pairs and 2,553 PL pairs sent for manual verification. The 1,010 DL pairs that were part of the DL verification sample had already been verified. Hence, they were not sent for manual verification.

An additional sample of 533 groups (1,930 pairs) was selected from the PLDL stratum. The PLDL stratum was stratified by group-level province or territory and group size, and the sample was selected so that the full PLDL sample was approximately proportional to size. The pair-level provincial breakdown of the full PLDL sample is given in Table 8.4.3.3.

Table 8.4.3.3
Breakdown of pairs in the probabilistic linkage-deterministic linkage sample by pair-level province or territory

| Provinces and territories | Frequency | Percent |
|---|---|---|
| Newfoundland and Labrador | 62 | 1.13 |
| Prince Edward Island | 32 | 0.58 |
| Nova Scotia | 87 | 1.58 |
| New Brunswick | 105 | 1.91 |
| Quebec | 1,583 | 28.76 |
| Ontario | 1,683 | 30.57 |
| Manitoba | 91 | 1.65 |
| Saskatchewan | 83 | 1.51 |
| Alberta | 311 | 5.65 |
| British Columbia | 714 | 12.97 |
| Yukon | 15 | 0.27 |
| Northwest Territories | 15 | 0.27 |
| Nunavut | 9 | 0.16 |
| Interprovincial | 715 | 12.99 |
| Total | 5,505 | 100.00 |

Note: The term “probabilistic linkage-deterministic linkage” means some of the pairs in the group were identified by the probabilistic linkage only, while others were identified by the deterministic linkage.
Source: Statistics Canada, 2021 Census Overcoverage Study.

The provincial strata were further stratified by number of links, and a systematic sample was drawn.
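Systematic selection within a single stratum can be sketched as follows. This is a minimal illustration only: the function name, the sort order of the units, the fractional interval and the random start are assumptions, since the report does not describe how the systematic sample was implemented.

```python
import random

def systematic_sample(units, n, seed=None):
    """Select n units from an ordered list using a random start and a fixed interval."""
    if not 0 < n <= len(units):
        raise ValueError("n must be between 1 and the number of units")
    rng = random.Random(seed)
    interval = len(units) / n        # fractional sampling interval (>= 1)
    start = rng.random() * interval  # random start in [0, interval)
    last = len(units) - 1
    # Take the unit at positions start, start + interval, start + 2 * interval, ...
    return [units[min(int(start + k * interval), last)] for k in range(n)]

# Hypothetical stratum of 20 groups, assumed to be sorted by number of links.
groups = [f"group_{i:02d}" for i in range(20)]
sample = systematic_sample(groups, 5, seed=1)
```

Sorting the units before selection spreads the sample across the range of the sort variable, which is the usual reason for pairing stratification with systematic selection.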

8.4.4 Final sample sizes (by pairs)

Table 8.4.4.1 below shows the final sample sizes for the PL-only and PLDL strata that were sent for manual verification. The DL-only stratum consisted of 360,280 pairs that were classified as definite pairs of duplicate persons.

Table 8.4.4.1
Final sample sizes for probabilistic linkage-only and probabilistic linkage-deterministic linkage strata sent for manual verification

| Strata | Number of pairs by stratum | Number of sampled pairs sent for manual verification (after de-duplication for overlapping neighbourhoods) |
|---|---|---|
| PL-DL (without 1,010 DL pairs that were part of the DL verification sample) | 92,494 | 4,495 |
| PL-only interprovincial pairs | 1,087,646 | 4,753 |
| PL-only intraprovincial pairs | 4,844,635 | 22,004 |
| PL-only intraterritorial groups (take-all) | 484 | 186 |
| PL-only intraprovincial groups | 4,070,750 | 18,153 |
| PL-only interterritorial groups with a majority territory (take-all) | 266 | 266 |
| PL-only interprovincial groups with a majority province | 2,184,665 | 8,815 |
| PL-only interprovincial groups with no majority province or territory | 156,079 | 1,040 |
| Total size | 12,437,019 | 59,712 |

PL = probabilistic linkage
DL = deterministic linkage
PL-DL = probabilistic linkage-deterministic linkage (some of the pairs in the group were identified by the probabilistic linkage only, while others were identified by the deterministic linkage)
Source: Statistics Canada, 2021 Census Overcoverage Study.

8.5 Manual verification operation

The manual verification operation was a clerical operation and had several objectives:

  • independently verify sampled pairs to determine whether they are overcoverage
  • review the household members associated with the sampled pairs to potentially identify additional cases of overcoverage not on the COS frame
  • code the potential cause of the overcoverage (i.e., overcoverage scenario).

Manual verification was done pair by pair. When a group or neighbourhood was sampled, all of the pairs that it contained were examined manually. However, coders were not provided the grouping information for the pairs in groups and neighbourhoods. Each pair was verified on its own. The pairs were examined only once, even if they belonged to more than one sampled neighbourhood.

The manual verification process consisted of a comprehensive examination of all available information on the RDB. As in 2016, it consisted of the following steps:

  1. comparing the sampled RDB persons based on the names, sex, birth date and relationships, as well as some additional information added in 2021
  2. comparing the RDB household members based on the same criteria
  3. weighing the evidence for or against overcoverage between two RDB persons and between two RDB households
  4. determining the overcoverage scenario if there was overcoverage (Table 8.5.1 provides a list of overcoverage scenario codes and their description).

Table 8.5.1
Overcoverage scenario codes

| Codes | Description |
|---|---|
| 1.1 | Two different FRAME_IDs for the same household; same or similar address |
| 1.2 | Two different FRAME_IDs for the same household; different address |
| 2.1 | Child of parents in separate households |
| 2.2 | Child (age 0 to 17) with other relative(s) |
| 2.3 | Child (age 0 to 17) with other unrelated adult(s) |
| 3.1 | Student or young adult (age 18 to 24) newly away from home |
| 3.2 | Young adult (age 18 to 24) entering or leaving married or common-law relationship |
| 3.3 | Young adult (age 18 to 24) with other relative(s) |
| 3.4 | Young adult (age 18 to 24) with other unrelated adult(s) |
| 4.1 | Adult (age 25 or older) newly away from home |
| 4.2 | Adult (age 25 or older) entering or leaving married or common-law relationship |
| 4.3 | Adult (age 25 or older) with other relative(s) |
| 4.4 | Adult (age 25 or older) with other unrelated adult(s) |
| 5.1 | One household not a private dwelling |
| 6.1 | Intrahousehold overcoverage (same FRAME_ID) |
| 7.1 | Other |

FRAME_ID = unique household identifier
Source: Statistics Canada, 2021 Census Overcoverage Study.

The sample was divided into batches of 500 household pairs (household A, household B). Each batch was assigned to a clerk (verifier). For each household pair in the batch, the verifier examined the records and decided whether the selected person in household A was a duplicate of the selected person in household B (i.e., a case of overcoverage). This selected pair of records was the sampled pair of interest. The verifier also identified any additional pairs of duplicate persons from each household pair and within each household.

When verifiers were uncertain of how to code a case, they were instructed to refer it to their supervisor, who in turn consulted with the Data Quality (DQ) team (a team of subject matter experts in the Coverage Measurement Section of the Statistical Integration Methods Division) or referred the case to the DQ team. In 2021, some complex sampled pairs were sent directly to the DQ team to verify. Complex sampled cases included

  • intra-household cases, that is, when the pair is from within a single household (for example, the same person is listed twice)
  • one-person households (when the pair comes from two different households, each of which has a household size of 1).

Experience from past cycles showed that these complex sampled cases required the expertise of the DQ team to code them properly. The DQ team was also able to consult additional sources of information to help make an accurate decision, such as questionnaire data from the current and past census cycles and linkages conducted by the Social Data Linkage Environment (SDLE) team at Statistics Canada. All sampled cases had to be coded with certainty, as no non-response was permitted.

Confidence in the coded results was required for the manual operation since the results directly contributed to the estimate of overcoverage. Thus, 100% verification was implemented, meaning that two different verifiers coded each batch. Once both verifiers had coded a batch, their results were compared across all coded fields. If any of the coding did not match, the case was sent to the DQ team to make an informed decision. The 100% verification strategy ensured high-quality coded results, and continuous feedback was also provided to the clerks throughout the manual verification operation.
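The comparison of two verifiers' results can be sketched as below. This is an illustration under stated assumptions: the field names (`overcoverage`, `scenario`), the case identifiers and the function name are hypothetical and are not taken from the COS processing system.

```python
def find_discrepancies(coder_a, coder_b):
    """Return the case IDs for which any coded field differs between two verifiers."""
    if coder_a.keys() != coder_b.keys():
        raise ValueError("both verifiers must code the same cases")
    return sorted(case for case in coder_a if coder_a[case] != coder_b[case])

# Hypothetical coded results for one small batch.
first_pass = {
    "pair_001": {"overcoverage": True, "scenario": "2.1"},
    "pair_002": {"overcoverage": False, "scenario": None},
    "pair_003": {"overcoverage": True, "scenario": "3.1"},
}
second_pass = {
    "pair_001": {"overcoverage": True, "scenario": "2.1"},
    "pair_002": {"overcoverage": False, "scenario": None},
    "pair_003": {"overcoverage": True, "scenario": "4.1"},  # scenario code differs
}

# Cases with any mismatch would be referred to the DQ team.
to_dq_team = find_discrepancies(first_pass, second_pass)
```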

8.6 Weighting and estimation

8.6.1 Weighting

The initial weight of a sampling unit was simply the inverse of its selection probability. The sampling units that were groups and neighbourhoods varied in terms of the number of pairs they contained. These units were stratified by the number of pairs during sampling to better control the final sample size. However, for the interprovincial groups and neighbourhoods, the weighted provincial or territorial counts may have differed from what was on the frame. Therefore, a calibration step was added to ensure correct representation of the number of pairs in each province and territory. The sampling weights of the interprovincial groups and neighbourhoods were calibrated so that the estimated number of intraprovincial and interprovincial pairs in each province and territory matched the corresponding frame counts. Statistics Canada’s Generalized Estimation System (G-EST) was used to perform the calibration. Table 8.6.1.1 shows the calibration factors for each province and territory.

Table 8.6.1.1
Average calibration factor (ratio of frame total to weighted estimate) by stratum and type of pair for intraprovincial and interprovincial groups and neighbourhoods

| Provinces and territories | Intraprovincial | Interprovincial |
|---|---|---|
| Newfoundland and Labrador | 0.76 | 0.74 |
| Prince Edward Island | 0.69 | 1.30 |
| Nova Scotia | 1.01 | 0.96 |
| New Brunswick | 0.99 | 0.88 |
| Quebec | 0.98 | 1.08 |
| Ontario | 0.99 | 0.99 |
| Manitoba | 1.36 | 1.07 |
| Saskatchewan | 1.13 | 0.90 |
| Alberta | 1.08 | 1.08 |
| British Columbia | 1.07 | 0.97 |
| Yukon | 1.47 | 1.21 |
| Northwest Territories | 1.40 | 0.42 |
| Nunavut | 1.22 | 2.80 |

Source: Statistics Canada, 2021 Census Overcoverage Study.
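The two quantities described in this section, the initial weight and the calibration factor, can be illustrated with a small sketch. It assumes a single ratio adjustment within one domain. G-EST calibrates to several totals at once, so this is a simplification, and all numbers and function names are hypothetical.

```python
def initial_weight(selection_probability):
    """Initial sampling weight: the inverse of the selection probability."""
    return 1.0 / selection_probability

def calibration_factor(frame_total, weights, pair_counts):
    """Ratio of the frame total to the weighted estimate of the number of pairs."""
    weighted_estimate = sum(w * c for w, c in zip(weights, pair_counts))
    return frame_total / weighted_estimate

# Hypothetical domain: three sampled groups, each selected with probability 0.25.
weights = [initial_weight(0.25)] * 3  # each group has an initial weight of 4.0
pair_counts = [2, 3, 4]               # number of pairs in each sampled group
frame_total = 40                      # number of pairs on the frame for this domain

factor = calibration_factor(frame_total, weights, pair_counts)
calibrated_weights = [w * factor for w in weights]
```

In this example the weighted estimate is 36 pairs against a frame count of 40, so the factor is greater than 1 and the weights are scaled up until the weighted count matches the frame.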

During the manual verification operation, verifiers identified cases of overcoverage in the households of sampled pairs that were not covered by the COS frame, and these pairs of duplicate persons were referred to as additional pairs of overcoverage found during manual verification. This occurred when the differences between the two records were too great for the pair to have been captured by the linkage processes. For example, if there were multiple typos, errors or too many differences in the fields used during the linkage process, the overcoverage pair was not on the COS frame.

This situation is illustrated in Figure 1 below. The oval with a blue outline represents the COS frame, while the oval with a green outline represents the target frame, which includes a small number of pairs that could not be captured with the linkage processes (i.e., the unobserved part of the target frame). The solid yellow oval represents the selected sample, which includes sampled person pairs, while the solid red oval represents the verified sample, which includes sampled person pairs and their household members. There are no weights directly associated with those pairs in the solid red oval that fall outside the COS frame (i.e., a small portion of the solid red oval falls in the unobserved portion of the target frame). The Generalized Weight Share Method (GWSM) (Lavallée, 2007) was used to derive weights for these pairs from the weights of the sampled pairs through which they were indirectly sampled. Hence, all the additional pairs of overcoverage found during manual verification had a weight derived for them, and they were added to the sample for the purpose of estimation. This approach replaced the AMS-based adjustment, used since the 2006 COS, which took into account overcoverage measured by the AMS outside the COS frame.

Figure 1. Illustration of selected sample, verified sample, Census Overcoverage Study frame and target frame

Description for Figure 1

This figure consists of four ovals. An oval with a green outline is the largest and represents the target frame = Census Overcoverage Study frame + unobserved part. An oval with a blue outline is within the oval with a green outline and represents the Census Overcoverage Study frame. A solid red oval represents the verified sample and is situated fully within the oval with a green outline, with a small part of it outside the oval with a blue outline (i.e., a small portion of the solid red oval falls in the unobserved portion of the target frame). A solid yellow oval represents the selected sample and is situated fully inside the solid red oval, the oval with a blue outline and the oval with a green outline.

Source: Statistics Canada, 2021 Census Overcoverage Study.

There were some limitations associated with the way additional pairs of overcoverage were identified. Duplicated single-person households or duplicated persons whose other household members have nothing in common within the unobserved part of the target frame would not be captured by manually verifying all household members of a sampled pair. Thus, it is acknowledged that the 2021 COS may still not represent the entire target frame of duplicate persons in the census. This would have also been the case when using the AMS to adjust the COS in previous cycles. However, the unobserved portion of the target frame is expected to be extremely small.
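The weight-share idea behind the GWSM can be sketched in its simplest cluster-level form: every unit in a cluster receives the sum of the weights of the sampled frame units linked to the cluster, divided by the total number of frame units linked to it. The report does not describe how clusters and links were defined in the COS. Treating a household pair as the cluster is an assumption made here for illustration, as are the function name and the numbers.

```python
def gwsm_weight(sampled_link_weights, total_links):
    """Weight shared by every unit in a cluster under the weight share method.

    sampled_link_weights: sampling weights of the sampled frame units linked to the cluster
    total_links: number of frame units linked to the cluster, sampled or not
    """
    if total_links <= 0:
        raise ValueError("a cluster needs at least one link to the frame")
    return sum(sampled_link_weights) / total_links

# Hypothetical household pair: three person pairs on the frame, one of them
# sampled with a weight of 120. An additional duplicate pair found in the same
# households during verification would then receive a weight of 120 / 3.
additional_pair_weight = gwsm_weight([120.0], total_links=3)
```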

8.6.2 Estimation

The results from the manual verification operation were processed to create overcoverage groups that were used for estimation. Overcoverage groups consisted of all RDB records that were linked together by verified overcoverage. The COS estimates were based on the sum of the overcoverage counted in each overcoverage group. For an overcoverage group that was a pair, the overcoverage count was simply 1. If the overcoverage group was contained within a small group of records (i.e., a group not broken into neighbourhoods), then:

Overcoverage = number of records in overcoverage group – 1.
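Building overcoverage groups from verified pairs amounts to finding connected sets of records. The sketch below does this with a simple union-find structure. The record identifiers and function name are hypothetical, and the report does not state which algorithm was actually used.

```python
from collections import defaultdict

def overcoverage_groups(verified_pairs):
    """Group records that are linked, directly or indirectly, by verified overcoverage."""
    parent = {}

    def find(record):
        parent.setdefault(record, record)
        while parent[record] != record:
            parent[record] = parent[parent[record]]  # path halving
            record = parent[record]
        return record

    for a, b in verified_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for record in list(parent):
        groups[find(record)].add(record)
    return list(groups.values())

# Hypothetical verified pairs: r1, r2 and r3 are the same person; so are r4 and r5.
pairs = [("r1", "r2"), ("r2", "r3"), ("r4", "r5")]
groups = overcoverage_groups(pairs)
counts = sorted(len(group) - 1 for group in groups)  # overcoverage per group
```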

For overcoverage groups broken down into neighbourhoods, overcoverage was counted in the following two steps:

  1. Calculate overcoverage in each neighbourhood whose anchor (i.e., the RDB record acting as the centre of the neighbourhood) was involved in verified overcoverage for that overcoverage group as follows:

$$\text{Overcoverage in the neighbourhood} = \frac{\text{number of records belonging to the overcoverage group} - 1}{\text{number of records belonging to the overcoverage group}}$$

  2. Add up the neighbourhood overcoverage to obtain the total overcoverage in the overcoverage group.

Domain overcoverage was obtained by prorating the total pair, group or neighbourhood overcoverage by the proportion of RDB records in the given domain among those that belonged to the overcoverage group.
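The counting and prorating rules above can be sketched as follows. The function names and the example group are hypothetical. The sketch assumes that each neighbourhood whose anchor is involved in verified overcoverage contributes (n - 1) / n, where n is the number of records in the overcoverage group.

```python
from collections import Counter

def group_overcoverage(n_records):
    """Overcoverage for a group not broken into neighbourhoods."""
    return n_records - 1

def neighbourhood_overcoverage(n_records):
    """Overcoverage contributed by one neighbourhood whose anchor is involved."""
    return (n_records - 1) / n_records

def prorate_by_domain(total_overcoverage, record_domains):
    """Split overcoverage across domains in proportion to the records in each domain."""
    counts = Counter(record_domains)
    n = len(record_domains)
    return {domain: total_overcoverage * c / n for domain, c in counts.items()}

# Hypothetical overcoverage group of 4 records: 3 in Ontario, 1 in Quebec.
domains = ["Ontario", "Ontario", "Ontario", "Quebec"]
total = group_overcoverage(len(domains))
by_domain = prorate_by_domain(total, domains)

# If the same group were broken into neighbourhoods and all 4 anchors were
# involved in verified overcoverage, the neighbourhood contributions would
# add up to the same total.
total_from_neighbourhoods = sum(neighbourhood_overcoverage(4) for _ in range(4))
```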

For interprovincial groups and neighbourhoods, the overcoverage calculated for a unit was multiplied by the calibrated weight to obtain the weighted estimate. Additional pairs of overcoverage found during manual verification were multiplied by their derived sampling weight from the use of the GWSM, to obtain the weighted estimate. Otherwise, the overcoverage calculated for a unit was multiplied by its initial sampling weight to obtain the weighted estimate. The variance of the estimate was calculated using G-EST.
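In symbols, the weighted estimate for a domain can be written compactly. The notation is introduced here for illustration and does not appear in the report: $s$ is the set of verified sampling units, $o_{u,d}$ is the overcoverage of unit $u$ prorated to domain $d$, and $w_u$ is the weight that applies to the unit (calibrated weight, GWSM-derived weight or initial sampling weight, as described above).

```latex
\hat{O}_d = \sum_{u \in s} w_u \, o_{u,d}
```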

8.7 Results

The 2021 COS estimated that 755,635 persons were enumerated more than once in the 2021 Census of Population. The results were examined by each of the components that led to the construction of the sampling frame, along with each component's contribution to the overall estimate of census overcoverage. Potential reasons why persons were counted more than once in the census were also examined.

8.7.1 Overcoverage by component

Each case of overcoverage (definite or manually verified) was characterized by the COS components that identified the pairs in its sampling unit. They are of four types:

  • DL-only: all the pairs in the overcoverage group were identified by the DL
  • PL-only: all the pairs in the overcoverage group were identified only by the PL
  • PLDL: some of the pairs in the group were identified by the PL only, while others were identified by the DL
  • overcoverage manual verification (OCMV): all the pairs in the overcoverage group were additional pairs of duplicate persons found during manual verification that were not on the COS sampling frame and for which an indirect sampling weight was derived using the GWSM.

It is important to remember that pairs identified by both the PL and DL steps were classified as DL, so the “DL-only” category includes all the groups that are made up only of pairs that were identified by the DL, even though some of those same pairs could also have been identified by the PL.

Table 8.7.1.1 presents the number of overcoverage cases estimated by each of the COS components and the percentage of the total estimated overcoverage that each represented, for Canada and for each province and territory.

Table 8.7.1.1
Contribution of each 2021 Census Overcoverage Study component to the total estimated overcoverage for each province and territory

| Provinces and territories | DL-only: estimated number | DL-only: % of total | PL-only: estimated number | PL-only: % of total | PL-DL: estimated number | PL-DL: % of total | OCMV: estimated number | OCMV: % of total | Total: estimated number | Total: standard error |
|---|---|---|---|---|---|---|---|---|---|---|
| Canada | 352,059 | 46.6 | 318,459 | 42.1 | 81,172 | 10.7 | 3,946 | 0.5 | 755,635 | 9,648 |
| Newfoundland and Labrador | 5,148 | 50.5 | 4,664 | 45.8 | 382 | 3.7 | 0 | 0.0 | 10,194 | 439 |
| Prince Edward Island | 1,678 | 51.0 | 1,284 | 39.0 | 311 | 9.5 | 16 | 0.5 | 3,289 | 191 |
| Nova Scotia | 9,200 | 47.6 | 8,412 | 43.5 | 1,639 | 8.5 | 94 | 0.5 | 19,344 | 736 |
| New Brunswick | 8,079 | 48.8 | 6,890 | 41.7 | 1,440 | 8.7 | 132 | 0.8 | 16,541 | 641 |
| Quebec | 76,760 | 42.1 | 80,126 | 43.9 | 25,242 | 13.8 | 385 | 0.2 | 182,513 | 5,915 |
| Ontario | 120,765 | 44.7 | 118,619 | 43.9 | 29,382 | 10.9 | 1,334 | 0.5 | 270,100 | 6,888 |
| Manitoba | 11,930 | 51.5 | 10,231 | 44.2 | 970 | 4.2 | 29 | 0.1 | 23,160 | 757 |
| Saskatchewan | 13,210 | 54.7 | 9,501 | 39.3 | 1,194 | 4.9 | 258 | 1.1 | 24,163 | 689 |
| Alberta | 36,902 | 47.3 | 35,085 | 44.9 | 5,527 | 7.1 | 570 | 0.7 | 78,084 | 2,736 |
| British Columbia | 66,976 | 53.2 | 42,762 | 34.0 | 15,017 | 11.9 | 1,078 | 0.9 | 125,832 | 2,778 |
| Yukon | 479 | 57.9 | 315 | 38.1 | 21 | 2.5 | 12 | 1.4 | 827 | 38 |
| Northwest Territories | 508 | 60.6 | 293 | 35.0 | 25 | 3.0 | 12 | 1.4 | 837 | 15 |
| Nunavut | 423 | 56.3 | 277 | 36.9 | 23 | 3.1 | 28 | 3.7 | 751 | 17 |

DL-only = all the pairs in the overcoverage group were identified by the deterministic linkage
PL-only = all the pairs in the overcoverage group were identified only by the probabilistic linkage
PL-DL = some of the pairs in the group were identified by the probabilistic linkage only, while others were identified by the deterministic linkage
OCMV = all the pairs in the overcoverage group were additional pairs of duplicate persons found during manual verification that were not on the Census Overcoverage Study sampling frame and for which an indirect sampling weight was derived using the generalized weight share method
Note: Coverage estimates may not necessarily add up to the totals because of rounding.
Source: Statistics Canada, 2021 Census Overcoverage Study.

At the national level, the DL-only and PL-only components represented 46.6% and 42.1%, respectively, of the total estimate of overcoverage, while the PLDL component represented 10.7%, and the OCMV component accounted for 0.5%.

The DL-only contribution to the total provincial or territorial estimate was highest for the Northwest Territories (60.6%) and Yukon (57.9%) and lowest for Ontario (44.7%) and Quebec (42.1%). The PL-only contribution was highest for Newfoundland and Labrador (45.8%) and Alberta (44.9%) and lowest for the Northwest Territories (35.0%) and British Columbia (34.0%). The PLDL contribution was highest in Quebec (13.8%) and British Columbia (11.9%) and lowest in the three territories (ranging from 2.5% to 3.1%). Lastly, the OCMV contribution was highest in the three territories (ranging from 1.4% to 3.7%) and lowest in Manitoba (0.1%) and Newfoundland and Labrador (0.0%). In Newfoundland and Labrador, the manual verification operation identified no additional pairs of duplicate persons that were not already on the COS sampling frame.

8.7.2 Overcoverage by scenario

Table 8.7.2.1 shows the estimated overcoverage by potential reason why the overcoverage occurred, called the overcoverage scenario, at the national, provincial and territorial levels for 2021. It is important to mention that these results are not comparable to the 2016 overcoverage results by scenario for two reasons:

  • The overcoverage scenario was coded during the manual verification operation. Since the DL-only pairs were considered as definite pairs of duplicate persons without manual verification, an overcoverage scenario is not available for those pairs.
  • The codes used for the scenarios were modified for the 2021 cycle to improve the consistency of the coding and the usefulness of the results.

Excluding the DL-only cases, almost 25% of all overcoverage at the national level is between two identical households. This proportion is a little lower for Newfoundland and Labrador and higher for British Columbia.
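The shares quoted above can be reproduced from Table 8.7.2.1 by rescaling a scenario's percentage to the overcoverage that remains once the deterministic linkage column (code 8.1) is removed. The short sketch below uses the published percentages for identical households (code 1.1). The function name is an illustrative choice.

```python
def share_excluding_dl(scenario_pct, dl_pct):
    """Scenario share, in percent, of overcoverage that was not identified by the DL only."""
    return 100 * scenario_pct / (100 - dl_pct)

# Percentages for code 1.1 (identical households) and code 8.1 (deterministic
# linkage) taken from Table 8.7.2.1.
canada = share_excluding_dl(12.5, 48.2)
newfoundland_and_labrador = share_excluding_dl(9.5, 49.8)
british_columbia = share_excluding_dl(14.6, 53.6)
```

Rounded to one decimal place, these work out to about 24.1% for Canada, 18.9% for Newfoundland and Labrador and 31.5% for British Columbia.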

When only overcoverage within non-identical households is considered and the DL-only cases are excluded again, the most frequent overcoverage scenario is a child enumerated by both parents in separate households, as was the case in 2016 and previous cycles. This is true for every province and territory, except for Nova Scotia and Nunavut. In Nova Scotia, the most frequent scenario was a student or young adult (age 18 to 24) newly away from home, while in Nunavut, it was a child (age 0 to 17) with other relative(s).

Table 8.7.2.1
Distribution of 2021 Census overcoverage by scenario for each province and territory

| Provinces and territories | 1.1 | 2.1 | 2.2 | 2.3 | 3.1 | 3.2 | 3.3 | 3.4 | 4.1 | 4.2 | 4.3 | 4.4 | 5.1 | 6.1 | 7.1 | 8.1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Canada | 12.5 | 11.3 | 0.8 | 0.3 | 5.8 | 1.2 | 0.5 | 0.7 | 3.6 | 3.5 | 3.9 | 1.6 | 2.4 | 0.6 | 3.2 | 48.2 |
| Newfoundland and Labrador | 9.5 | 11.5 | 1.2 | 0.0 | 7.7 | 2.5 | 0.3 | 0.0 | 1.6 | 4.9 | 3.6 | 1.0 | 3.0 | 0.3 | 3.3 | 49.8 |
| Prince Edward Island | 11.2 | 11.5 | 0.3 | 0.4 | 9.0 | 1.1 | 0.8 | 0.3 | 3.7 | 1.8 | 1.2 | 1.3 | 2.5 | 0.0 | 0.8 | 54.0 |
| Nova Scotia | 10.8 | 9.9 | 2.0 | 0.0 | 15.5 | 1.9 | 0.0 | 0.6 | 1.9 | 3.0 | 2.1 | 1.2 | 1.0 | 0.4 | 1.8 | 47.9 |
| New Brunswick | 11.0 | 10.5 | 1.7 | 0.5 | 6.4 | 3.4 | 0.4 | 0.4 | 2.8 | 4.5 | 2.7 | 0.4 | 2.2 | 0.0 | 3.1 | 50.0 |
| Quebec | 12.1 | 16.6 | 0.7 | 0.2 | 6.0 | 1.7 | 0.6 | 0.5 | 4.1 | 4.7 | 3.9 | 0.9 | 2.3 | 0.7 | 2.9 | 41.9 |
| Ontario | 13.0 | 10.6 | 0.5 | 0.1 | 5.4 | 0.6 | 0.2 | 0.9 | 4.1 | 3.0 | 4.9 | 1.6 | 2.3 | 0.7 | 3.0 | 49.2 |
| Manitoba | 10.5 | 8.8 | 2.4 | 0.5 | 5.1 | 1.7 | 1.1 | 0.9 | 3.2 | 3.5 | 2.4 | 1.8 | 4.1 | 0.3 | 2.5 | 51.2 |
| Saskatchewan | 9.3 | 11.1 | 1.5 | 0.8 | 4.0 | 0.6 | 1.5 | 1.1 | 2.4 | 1.5 | 3.7 | 1.7 | 3.1 | 0.2 | 2.8 | 54.6 |
| Alberta | 11.5 | 9.3 | 0.7 | 0.8 | 6.6 | 2.1 | 0.5 | 0.5 | 3.3 | 3.8 | 3.9 | 2.5 | 2.7 | 0.5 | 4.4 | 47.0 |
| British Columbia | 14.6 | 7.0 | 0.7 | 0.5 | 4.7 | 0.5 | 0.5 | 0.9 | 3.0 | 2.8 | 2.5 | 2.1 | 2.3 | 0.4 | 4.0 | 53.6 |
| Yukon | 9.2 | 13.1 | 0.4 | 0.2 | 4.0 | 0.8 | 0.8 | 0.3 | 0.8 | 3.1 | 1.9 | 1.4 | 1.4 | 0.1 | 3.0 | 59.5 |
| Northwest Territories | 10.8 | 8.2 | 2.1 | 1.0 | 1.4 | 0.7 | 0.6 | 0.5 | 0.9 | 2.7 | 2.9 | 2.7 | 1.8 | 0.5 | 1.7 | 61.4 |
| Nunavut | 13.6 | 4.5 | 8.8 | 1.2 | 1.6 | 1.1 | 2.7 | 0.7 | 1.2 | 0.8 | 4.8 | 1.2 | 3.2 | 0.4 | 2.3 | 51.7 |

All values are in percent. Column headers are overcoverage scenario codes:
1.1 = Identical households
2.1 = Child of parents in separate households
2.2 = Child (age 0 to 17) with other relative(s)
2.3 = Child (age 0 to 17) with other unrelated adult(s)
3.1 = Student or young adult (age 18 to 24) newly away from home
3.2 = Young adult (age 18 to 24) entering or leaving married or common-law relationship
3.3 = Young adult (age 18 to 24) with other relative(s)
3.4 = Young adult (age 18 to 24) with other unrelated adult(s)
4.1 = Adult (age 25 or older) newly away from home
4.2 = Adult (age 25 or older) entering or leaving married or common-law relationship
4.3 = Adult (age 25 or older) with other relative(s)
4.4 = Adult (age 25 or older) with other unrelated adult(s)
5.1 = One household not a private dwelling
6.1 = Intrahousehold overcoverage (same FRAME_ID)
7.1 = Other
8.1 = Deterministic linkage
FRAME_ID = unique household identifier
Note: Overcoverage by scenario is estimated at the pair level rather than the group level; hence, there is a small difference in the percentages when compared to Table 8.7.1.1.
Source: Statistics Canada, 2021 Census Overcoverage Study.
