Language databases

We group these databases into five categories: 1) the Census Programme (1901-), 2) official-language minority surveys, 3) education surveys, 4) literacy surveys, and 5) other surveys. The last group covers language-related intersections with immigration, Aboriginal, and economic topics. More broadly, we introduce each database’s general characteristics and development, and the ways in which each can be used efficiently.

Census programme (1901-)

The Census Programme remains the single most important source, spanning 115 years of language data. Not all current language questions were included in the 1901 census; instead, the seven current language questions were phased in gradually as the needs and interests of Canadians evolved over time. These historical developments are traced in detail in Section Census Demolinguistic Questions. For our purposes here, we highlight four key points about the Census Programme: its frequency, collection methods, sampling rate, and, most importantly for researchers, the availability of public aggregated and confidential data.

Note that the term Census Programme refers to the Census of Population from 1901 onwards as well as to the National Household Survey (NHS) of 2011. We therefore exclude the censuses of population prior to 1901 because they do not contain language questions. By contrast, we include the NHS because it was used instead of the long form 2B for that census year. Users should keep in mind that, unlike the long form 2B of other years, the NHS was based on voluntary responses, with sampling set at 33% and a total response rate of 68% (Howatson-Leo & Trépanier, 2012).
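As a rough, back-of-the-envelope illustration of what these two figures imply together, the sketch below multiplies the NHS sampling fraction by the reported response rate. It assumes the 68% rate applies uniformly to the sampled households and ignores weighting and non-response adjustments, so the result is only an approximation; the function name is ours, not Statistics Canada’s.

```python
# Illustrative arithmetic only: the constants come from the text above,
# and the uniform-response assumption is ours.
SAMPLING_RATE = 1 / 3   # NHS 2011 sampling fraction (about 33% of households)
RESPONSE_RATE = 0.68    # total response rate reported for the NHS


def responding_share(sampling_rate: float, response_rate: float) -> float:
    """Approximate share of all households that were sampled AND responded."""
    return sampling_rate * response_rate


share = responding_share(SAMPLING_RATE, RESPONSE_RATE)
print(f"Approximate share of households responding: {share:.1%}")
```

Under these assumptions, a little under a quarter of households actually returned an NHS questionnaire, which is one reason NHS estimates are usually compared with caution against long-form results from other years.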

With regard to the Census of Population, users should note that it is not a single survey but a group of slightly different questionnaires. The forms and their general intended purposes are as follows (Census of Population Questionnaires, 2016):

-Short Form 2A: Composed of about a dozen questions and sent out to all residents of all private dwellings.
-Short Form 3A: Sent to individuals who wish to be enumerated separately.
-Short Form 2C: Sent to Canadians residing abroad (e.g. Foreign Service, Canadian Forces, etc.)
-Long Form 2B (or in 2016 the 2A-L, 2A-R): Composed of about fifty questions and sent out to a sample of Canadian households.

From this point on, whenever we mention the Census or the Census Programme, we mean specifically Forms 2A and 2B and the NHS (2011), unless otherwise stated.

Census of Population Characteristics — Frequency, Methods, and Sampling Rates

In general, between 1971 and 2006, and again in 2016, the Census of Population has been composed of two simultaneous mandatory questionnaires: Short-form 2A and Long-form 2B (in 2016, the 2A-L and 2A-R). Normally, the short-form schedule contains questions on the individual’s name, date of birth, age, sex, marital status, mother tongue, and relationship to the principal respondent. It is sent out to all Canadian households. The long-form questionnaire, by contrast, is composed of about fifty additional questions on average and is sent only to a sample of Canadian households. In every census cycle, consultations on methodology and content are carried out to adjust the program as necessary given the data needs of Canadians. We highlight here three areas of change and development: frequency of collection, collection methods, and sampling rates.

For more than 100 years, the Census was carried out at ten-year intervals. As the need for reliable and more frequent data became more pronounced, the national requirement was changed to five years. The Statistics Act (1970) codified this change and mandated its introduction beginning with the 1971 census (Howatson-Leo & Trépanier, 2012: 6).

Another point to note is the way in which collection was carried out. Prior to 1971, enumerators were sent out to all major urban population centres and many rural areas; they were individually charged with the actual collection of information. From 1971 onwards, given rising literacy rates, this practice was changed to allow for self-enumeration for all but a few rural areas, Aboriginal reserves, and third-wave non-respondents. Under this system, participants receive and return their questionnaires through the postal service. This change was “expected to reduce the variance of census figures...while at the same time giving the respondent more time and privacy in which to answer the census questions—factors which might also be expected to yield more accurate responses” (Benjamin, Hovington, and Bankier, 2001: 12). By 2006, thirty-five years later, participants were also offered the option of completing their questionnaires online (Howatson-Leo and Trépanier, 2012: 7).

Lastly, it is worth noting that while “sampling was first used in the Canadian census in 1941” (Benjamin, Hovington, and Bankier, 2001: 12), the sampling rate for the long-form 2B schedule has fluctuated since then. Specifically, in the 1971 and 1976 Censuses of Population, a 1 in 3 fraction was used (Royce, 2011: 49). Subsequently, from 1981 to 2006, the census used a 1 in 5 fraction (Royce, 2011: 49). For the 2011 NHS, a 1 in 3 fraction was used again, and the 2016 Census used a 1 in 4 fraction. In other words, since 1971 sampling rates for the long-form 2B schedule have been variously set to 20%, 25%, or 33%, depending on the census year. See the table below for specific percentages from year to year.

Sampling rates
Census Year	Short Form 2A	Long Form 2B
1901	No Sampling	No Sampling
1911	No Sampling	No Sampling
1921	No Sampling	No Sampling
1931	No Sampling	No Sampling
1941	100% of Households (Population Schedule)	10% of Households¹ 27 Questions-Housing Schedule
1951	TBA	20% of Households² 24 Questions-Housing Schedule
1961	TBA	20% of Households³
1971	66% of Households^v	(1/3) 33% of Households⁴ 40 Questions ^a
1976	66% of Households^v	(1/3) 33% of Households⁵
1981	80% of Households*	(1/5) 20% of Households⁶
1986	80% of Households*	(1/5) 20% of Households⁷
1991	80% of Households* 9 Questions^b	(1/5) 20% of Households⁸ 53 Questions^b
1996	80% of Households*	(1/5) 20% of Households⁹
2001	80% of Households*	(1/5) 20% of Households¹⁰
2006	80% of Households^v 8 Questions	(1/5) 20% of Households¹¹ 61 Questions
2011	100% of Households^v	(1/3) 33% of Households¹² 66 Questions - NHS
2016	100% of Households^v	(1/4) 25% of Households^13v

* To be verified
v Verified

a Statistics Canada (1975) Methodology and Processing of the 1971 Census. In Statistics Canada (1975) 1971 Census of Canada, Public Use Sample Tapes User Documentation. p. 70.
b University of McGill (1999) Census Questions Since Confederation. Electronic Data Resources Center.

1 Statistics Canada (2015) Sampling and Weighting Technical Report. National Household Survey, 2011 (Cat. No. 99-002-X2011001). p. 48

2 Census of Population, 2016
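
The long-form sampling fractions quoted in this section (1 in 3, 1 in 5, 1 in 4) can be converted to the percentages shown in the table with a few lines of Python. This is a minimal sketch for readers who want to check the arithmetic; the dictionary and function names are ours and purely illustrative.

```python
from fractions import Fraction

# Long-form sampling fractions as quoted in the text of this section.
LONG_FORM_FRACTIONS = {
    "1971-1976": Fraction(1, 3),
    "1981-2006": Fraction(1, 5),
    "2011 (NHS)": Fraction(1, 3),
    "2016": Fraction(1, 4),
}


def as_percent(fraction: Fraction) -> float:
    """Convert a sampling fraction to a percentage, rounded to one decimal."""
    return round(float(fraction) * 100, 1)


for period, fraction in LONG_FORM_FRACTIONS.items():
    print(f"{period}: 1 in {fraction.denominator} = {as_percent(fraction)}%")
```

Note that a 1 in 3 fraction is 33.3%, which the table rounds down to 33%.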

Dataset Availability – National, Provincial, Territorial, and Metropolitan Aggregated Tables

Keeping the above characteristics in mind, we outline here where these data may be found. Data from 1991 to 2011 are available on Statistics Canada’s page for the Census Program Datasets (1991-). This page hosts tables at the national, provincial, and metropolitan geographic levels in .html format.

For historical datasets prior to 1991, several options are available. One option is to browse the publication entitled Historical Statistics of Canada, which contains 1,088 statistical tables on the social, economic, and institutional conditions of Canada from the start of Confederation in 1867 to the mid-1970s. Relevant topics in this series include population and immigration (including mother tongue) and education statistics. Another option is to access the datasets compiled by the IPUMS North Atlantic Population Project. These data are hosted by the Minnesota Population Center and are available for the years 1852, 1871, 1881, 1891, and 1901, although for language data only the 1901 dataset is useful. Lastly, users may also wish to consult datasets available through Library and Archives Canada for the years 1666-1921 (at various degrees of completeness). In general, however, it is important to note that census data for the 1931-1986 period are largely found only in print form at Statistics Canada or partner libraries throughout the country.

Dataset Availability – Microdata Files

Access to confidential 20% microdata from 1911-2011 is open only to researchers through the Research Data Centres (RDC) program. To be granted access, researchers must submit a research proposal to the RDC program. Currently, completed applications take on the order of 8-10 weeks to review.

Public Use Microdata Files (PUMFs), which are found through ODESI and made public under the Data Liberation Initiative (DLI), are accessible through any Canadian university’s library portal. You must be a registered student, researcher, or faculty member at a Canadian university to gain access. Please be aware that PUMFs are in fact abridged forms of microdata that employ either reduced sample sizes (2% in the case of NHS and Census data) or various data perturbation techniques that effectively hide personal identifiers. These techniques preserve confidentiality; however, they may also limit the potential uses of the data to some extent.
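
To give a sense of what a 2% sample means in practice, the sketch below shows how a count observed in such a file would be scaled up to a population estimate if every record carried the same weight. This is a simplifying assumption on our part: actual PUMFs are distributed with their own weight variables, which should always be used instead. The function names and the count of 1,200 records are hypothetical.

```python
def uniform_weight(sample_fraction: float) -> float:
    """Number of population units each record stands for under equal weighting."""
    if not 0 < sample_fraction <= 1:
        raise ValueError("sample_fraction must be in (0, 1]")
    return 1 / sample_fraction


def estimate_total(sample_count: int, sample_fraction: float) -> float:
    """Scale a count observed in the sample up to a population estimate."""
    return sample_count * uniform_weight(sample_fraction)


PUMF_FRACTION = 0.02  # 2% sample, as stated in the text above

# Hypothetical example: 1,200 records in the file report a given mother tongue.
print(uniform_weight(PUMF_FRACTION))
print(estimate_total(1_200, PUMF_FRACTION))
```

Under equal weighting, each record in a 2% file stands for roughly 50 people, so small language groups may be represented by very few records, which is one of the practical limits mentioned above.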

Official-language minorities

Another potential source, particularly for official language minorities, can be found in the form of the Survey on the Vitality of Official-Language Minorities (SVOLM; 2006). This survey is unrivaled in Canada for both its breadth and depth of coverage vis-à-vis language practices and trajectories, in the various spheres including home, public domain, and work.

In fact, the Survey on the Vitality of Official-Language Minorities is the only survey that focuses specifically on Canada’s official-language minorities, namely French-speaking persons outside Québec and English-speaking persons in Québec. Its origins go back to 2003, when the Official Language Branch of the Privy Council Office approached Statistics Canada about conducting a survey on the vitality of official-language minorities. The survey was therefore conducted within the 2003-2008 Action Plan for Official Languages.

The goals of the survey were twofold. First, it would collect information on issues affecting official-language minority communities, such as education, health, and justice. Second, it would produce information with a view to policy development and implementation. To this end, topics in the survey include demographic, linguistic, cultural, and social information about language minority parents and children: for example, daycare and school attendance, access to health care, civic participation, volunteering and social support, geographic mobility, economic and income characteristics, identity, and language practices at home, at work, in public, and during leisure activities. In other words, it includes a fairly comprehensive spread of socio-economic and language variables.

Of note is the fact that this survey comprises two universes. On the one hand, it surveys adults aged 18 and over, while on the other it surveys children under 18 years of age with at least one parent from the official minority language community. All in all, the final dataset contains 20,067 adults and 15,550 children. This data can be used to gain a deeper understanding of the current situation of individuals belonging to the official-language minority groups as well as their various trajectories.

Education

Users searching for data on second language education programs in Canada will note that this topic has been well represented since the 1990s. At the moment, one major survey captures information on second language education at the elementary and secondary levels. However, this survey is by no means the first. In fact, the Elementary-Secondary Education Survey (ESES) represents the latest harmonized iteration of several previous surveys on education that we outline below.

Beginning in 2003, Statistics Canada piloted the Elementary-Secondary Education Statistics Project (ESESP). The purpose of this project was to replace several other surveys that were collecting related information on school enrollments, graduations, expenditures, and staffing. These surveys include the Elementary-Secondary School Enrollment Survey, the Minority and Second Language Education Survey, the Secondary School Graduate Survey, the Elementary-Secondary Education Staff Survey, and the Survey of Uniform Financial System - School Boards.

Initial feedback was positive, and in 2010 the project became a permanent multi-year census that we now call the Elementary-Secondary Education Survey (ESES). Since it collects administrative data from every public, private, and home-school program in the country, it is in fact a census of education in which no sampling is done. The goal of this census is to produce relevant, comparable, and timely statistics, and to reduce the otherwise unwieldy response burden on education organizations and school principals. Users may find this census particularly useful for its content on core, intensive, immersion, and minority language programs in Canada.

Literacy

Where literacy is concerned, two key surveys are available to users. The first is the Programme for the International Assessment of Adult Competencies (PIAAC; 2010-), and the second is the Programme for International Student Assessment (PISA; 2000/2009).

The first survey covers adult education and training (ages 16-65) with a targeted focus on literacy. It evolved from two previous surveys on literacy which we outline below.

In 1994, the Organization for Economic Co-operation and Development (OECD) approached seven countries, including Canada, to create a comparable literacy baseline that could be useful across national, linguistic, and cultural boundaries. This initial survey was named the International Adult Literacy Survey (IALS). Its main purpose was to find out how well adults used printed information to function in society. The results from this first installment demonstrated a strong link between literacy and a country’s economic potential. Thereafter, the list of countries expanded to 16, and two more installments of the survey were carried out in 1996 and 1998.

A second attempt to measure literacy came in 2003 through the International Adult Literacy and Skills Survey (IALSS). Following the legacy of the literacy surveys of the 1990s, the IALSS (2003-2006) continued as a seven-country initiative that sought to collect information on nationally representative samples of adults aged 16-65. Participants were interviewed at home using psychometric tests that measured prose, document literacy, numeracy, and problem-solving skills. Like the previous surveys, the main purpose of this iteration was to find out how well adults used printed information to function in society. An added goal of this survey was to maintain continuity with the previous surveys so as to be able to trace literacy trajectories over time.

In light of these two previous literacy surveys, Statistics Canada piloted the Programme for the International Assessment of Adult Competencies (PIAAC) in 2010 and rolled out the final survey in 2011. Like the previous surveys, the PIAAC is an OECD-led initiative meant to collect information on individuals’ literacy, numeracy, and problem-solving skills. Nevertheless, in this iteration, the survey expanded to include 27 countries and their populations’ workplace skills, educational backgrounds, professional attainments, and ability to use information and communications technology. In total, approximately 49,000 Canadians were surveyed in 2012.

The second option for users interested in literacy data is the Programme for International Student Assessment (PISA; 2000/2009). This survey is part of a concerted effort by 65 countries, coordinated by the Organization for Economic Co-operation and Development (OECD), to collect data on 15-year-olds who are about to complete their mandatory schooling. The survey is a multi-cycle rotating test of reading, mathematics, science, and problem-solving skills. In Canada, the latest cycle saw approximately 20,000 15-year-olds participate from more than 850 schools. These schools comprised both English and French boards from all Canadian provinces.

Due to its rotational nature, users looking particularly for literacy data—rather than mathematics and science—are encouraged to begin their search with the 2000 and 2009 cycles. Moreover, users should also note the specific literacy framework used to conceptualize reading and writing skills (Knighton, Brochu, and Gluszynski, 2010: 14). We reproduce this framework in Section Second Language Education and Literacy of this tab.

Ethnocultural surveys

Besides the six core databases examined above, approximately 40 other surveys at Statistics Canada include language questions. For language policy researchers, the document entitled Statistics Canada Data Sources on Official-Language Minorities (Lafrenière, 2013) lists these surveys and outlines the strengths and limitations of each one. It also provides the conceptual framework Statistics Canada uses to examine language knowledge, language use, and language learning, along with both nominal and operational definitions of most language variables.

Of these 40 surveys, we note six in particular that we think complement the primary language-related surveys, especially in matters related to ethnocultural diversity, immigration, Aboriginal peoples, and the economic participation of the various linguistic groups. The six surveys listed below provide a good starting point that will appeal to a wide audience, while still retaining several language questions.

Ethnic Diversity Survey (EDS)
Longitudinal Immigration Database (IMDB)
Longitudinal Survey of Immigrants to Canada (LSIC)
Aboriginal Peoples Survey (APS)
Aboriginal Children’s Survey (ACS)
Labour Force Survey (LFS)

As this brief introduction demonstrates, Statistics Canada collects and maintains a large number of databases relevant to language research. These may take the form of the flagship census program or of various other cross-sectional, multi-cycle, or longitudinal surveys that touch on language-relevant subjects. In the next section, we give an overview of the five language variables and the seven questions used at Statistics Canada since 1901.