Defining Privacy
A critical investigation of Canadian political discourse

Text-Analysis

3.3 Concordances

A concordance is another method of electronic text analysis. Concordances serve the purpose of bringing together, or concording, passages of text that help to show how a word is used in context (Howard-Hill 4). Concordance outputs are not limited to whole words; they can also be tailored to show lists of letters, phrases, suffixes, and parts of speech (nouns, verbs, etc.) (Adolphs 5; McEnery and Hardie 35).

Introduction

The most common format for a concordance is known as a Key Word in Context, or KWIC, and it is arranged so that all instances of a search item are in the middle of the page (Adolphs 52; Baker 71; Tognini-Bonelli 13). This search item is often referred to as a ‘node’, and all of the words on the left and right of the node are called the ‘span’. Descriptions of concordance data label the node as N, and the items on the sides as N-1, N-2, N+1, N+2, etc. (Adolphs 52), depending on their distance and position in relation to the node. Figure 3-4 is an example of a KWIC generated from the Hansard corpus where N = privacy.

Figure 3-4: Selection of 25 random concordance lines
Displaying 25 of 918 matches:
imply unacceptable. That is why the Privacy Commissioner’s office was notified.
 the matter to the attention of the Privacy Commissioner of Canada. I also aske
table. That is why we called in the Privacy Commissioner and called in the RCMP
able. That is why we brought in the Privacy Commissioner. That is why we brough
ese victims and when will they take privacy protection seriously? (1455) Hon. D
 is why we took steps to inform the Privacy Commissioner of Canada and to bring
ystems to make sure that Canadians’ privacy is protected. That is why we have e
 happened. We have also advised the Privacy Commissioner of the situation. We h
. Speaker, the government takes the privacy of Canadians extremely seriously. T
ely unacceptable. The Office of the Privacy Commissioner has been notified and 
nment takes extremely seriously the privacy of Canadians and the loss by the de
ion. We will continue to do so. The privacy commissioner is investigating this.
ed before, the government takes the privacy of Canadians extremely seriously- S
mentioned, the government takes the privacy of Canadians extremely seriously. T
p greater chances for fraud. As the Privacy Commissioner now conducts her inves
g Bob Zimmer Access to Information, Privacy and Ethics Chair: Pierre-Luc Dussea
stioned Conservative legislation on privacy concerns, we were accused of standi
006. According to the Office of the Privacy Commissioner, this is one of the la
he Information Commissioner and the Privacy Commissioner. I will try to delinea
the Information Commissioner or the Privacy Commissioner. Each of them are offi
ve seen the value of an independent Privacy Commissioner working on behalf of a
a government agency. Canada’s first Privacy Commissioner, Inger Hansen, was wit
ssion at first, and then, under the Privacy Act, became an independent officer 
tion Commissioner, Auditor General, Privacy Commissioner, we say the Parliament
neral, Information Commissioner and Privacy Commissioner are all examples of of

Just above the KWIC is a line stating that this particular list contains 25 instances out of a total of 918 matches. This means that the concordance program found 918 occurrences of ‘privacy’ in this search. The node word, privacy, is found in the centre of the page, and each line spans a total of 79 characters (including letters, punctuation and spaces).
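The layout described above can be sketched in a few lines of Python. This is a minimal illustration rather than the program used to produce Figure 3-4: the function name `kwic` and its parameters are hypothetical, and the sketch assumes a fixed line width of 79 characters with the node centred and the span cut off mid-word on both sides, as in the figure.

```python
import re

def kwic(text, node, width=79):
    """Return Key Word in Context lines of `width` characters, node centred.

    Hypothetical helper for illustration only.
    """
    # Collapse runs of whitespace so line breaks in the source text
    # do not disturb the fixed-width layout.
    text = " ".join(text.split())
    # Characters available on each side of the node.
    side = (width - len(node)) // 2
    lines = []
    # Match the node as a whole word, ignoring case.
    for m in re.finditer(r"\b" + re.escape(node) + r"\b", text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - side):m.start()].rjust(side)
        right = text[m.end():m.end() + side].ljust(side)
        lines.append(left + m.group(0) + right)
    return lines
```

Calling `kwic(corpus_text, "privacy")` on a string holding the corpus would return one line per match, and `len()` of the result corresponds to the match total reported above the figure.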

The potential that concordance outputs have for the generation of hypotheses about corpora is immediately apparent (Adolphs 51). The nature of the concordance format provides a convenient layout for examining word or phrase use in context, along with the identification of trends or patterns in language use (Stubbs, Text and Corpus Analysis xviii). The example in Figure 3-4 shows 16 occurrences of the word ‘privacy’ in relation to the word ‘Commissioner’, one instance of the phrase ‘Privacy Act’, and one instance of the phrase ‘Access to Information, Privacy and Ethics Chair’. Of the remaining seven instances, when the words to the right of the node are examined, four include the phrase ‘privacy of Canadians’, two include the lemma ‘protect’, and the remaining instance contains the word ‘concern’. A lemma is a base-word from which other words can be constructed, even though they may differ in form or spelling (Baker, Hardie and McEnery 104; Sinclair, Corpus, Concordance, Collocation 41). ‘Protection’ and ‘protected’ are both variations on the lemma ‘protect’.

Figure 3-5: Selection of 25 concordance lines sorted alphabetically at N+1
stioned Conservative legislation on privacy concerns, we were accused of standi
are specifically intended to reduce privacy concerns and to increase accountabi
 identifying information because of privacy concerns. However, some files dealt
f law and strikes a balance between privacy concerns and investigations that ca
of the victims clearly outweigh the privacy concerns for the offenders. Bill C-
the government plans to address the privacy concerns of Canadians who have been
ringe on civil liberties. It raises privacy concerns that ought to be referred 
 Act. This would be just to protect privacy concerns of people who may have bee
DA allows for mediated solutions to privacy conflicts that can give both indivi
 verify that operational, legal and privacy considerations are met. With regard
 supposed to be protecting personal privacy data, but we see that is creating a
fear they have been hacked and that privacy data has been breached, it has to b
as once seen as the world leader in privacy data. Our Privacy Commissioner is d
ment seems to think that losing the privacy data of one million Canadian senior
powers and the authority to protect privacy data from hacking, how does she com
 which is address the protection of privacy data in the age of big data, with t
ith the NDP to protect the right to privacy declared victory yesterday when Bil
 that this bill follows the court’s privacy directives. Some of the bill’s word
at avoids litigation when resolving privacy disputes. PIPEDA also provides the 
arding individuals is a significant privacy enhancement. This dual approach wil
 hear from civil society groups and privacy experts and other jurisdictions whe
were repeated calls by Internet and privacy experts and civil society groups to
 only from civil society groups and privacy experts but from those familiar wit
 There have been concerns raised by privacy experts, by digital experts, that t
ns, in particular legal experts and privacy experts, raising concerns with the 

While concordances can be investigated manually in this manner, they can also be rearranged alphabetically on either side of the node. Figure 3-5 shows a sample of right node alphabetization. The concordance can be further sorted based on a selective number of objective criteria (Tognini-Bonelli 13). Using Figure 3-4 as an example, all of the lines containing the phrase ‘Privacy Commissioner’ have been filtered out as they were deemed unnecessary to this particular analysis. Alternatively, adding a second word to the concordance search (within a span of one or two words) can help identify particular themes of usage (Adolphs 55).
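Right-node alphabetization of the kind shown in Figure 3-5 amounts to sorting the lines on the first word after the node. The sketch below assumes each concordance line is held as a hypothetical `(left, node, right)` tuple; the function names are illustrative and are not taken from any particular concordance package.

```python
import re

def first_word_right(right):
    """Return the N+1 word (first word right of the node), lower-cased."""
    m = re.search(r"[A-Za-z][A-Za-z'’-]*", right)
    return m.group(0).lower() if m else ""

def sort_at_n_plus_1(lines):
    """Sort (left, node, right) concordance tuples alphabetically at N+1."""
    # sorted() is stable, so lines sharing the same N+1 word
    # keep their original corpus order.
    return sorted(lines, key=lambda line: first_word_right(line[2]))
```

Sorting on the left node instead would only require a key function that returns the last word of `left`.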

History

While computers make the production of concordances much easier, their history pre-dates the electronic age. Early concordance work was produced with the intention of studying quotations, allusions and figures of speech in literature, not everyday language (Sinclair, Corpus, Concordance, Collocation 42). What is considered to be the first concordance was hand-compiled for the Latin Vulgate Bible by Hugh of St Cher with the assistance of over five hundred monks in 1230 (McEnery and Hardie 37). Father Roberto Busa compiled the first automated concordance, a project which began in 1951 (Hockey; McEnery and Hardie 37), and by the 1960s scholars were beginning to see the value of concordances for the purpose of textual and literary analysis. The first generation of concordancers were held on large mainframe computers and used at a single site (McEnery and Hardie 37). They were generally only able to process non-accented characters from the Roman alphabet; accented characters would be replaced by a pre-determined sequence of characters, although these were not standardized and differed from site to site (Hockey; McEnery and Hardie 38). Early concordancers also had difficulty pinpointing the exact location of the citations in the text, as the raw textual information was stored on punch cards or tape. Variant spellings of words and the production of lemmatized lists were also problematic (Hockey).

The nature of the programming involved to create concordance outputs at this time required the assistance of a computer programmer or engineer, something that was not accessible to all scholars (McEnery and Hardie 38). The second generation of concordancers solved this issue, as they were available as software packages on IBM-compatible PCs (McEnery and Hardie 39). While these concordance programs suffered from many of the same limitations as earlier concordancers, they made electronic text analysis more accessible (McEnery and Hardie 39). Since the inception of automated concordancing in the 1960s, the methods, accessibility and scope have drastically improved. Currently, concordance programs exist as downloadable software, web-based applications, and packages of pre-made code for those interested in computer programming.

Theory

While the production of concordance outputs is essentially another method in the practice of electronic text analysis, this does not mean the technique is one of complete objectivity. Corpus data is not an ontological reality; it is constructed and delimited by the researcher in an attempt to gather meanings about the discourse under study (Teubert 4). In other words, although the corpus exists and is tangible in many ways, it is not a stand-in for the reality of the Parliament. It is a representation of reality that takes its own form and becomes an object in and of itself. Concordances provide the opportunity to examine language in context, and the structured nature of the output helps to ensure that analysts do more than pick examples that meet their preconceptions of the data (Stubbs, Text and Corpus Analysis 154). Yet the theoretical intention of the researcher is still present at every stage, from search choice to interpretation (Stubbs, Text and Corpus Analysis 154). What concordance outputs provide is the ability to present quantitative evidence of electronic text analysis that can be examined by all readers (Stubbs, Text and Corpus Analysis 154).

Concordances are what Stubbs refers to as “second-order data” (Words and Phrases 66). First-order data is the corpus, or what can be called the ‘raw data’; this data is too large for accurate observation and analysis, leading to the creation of second-order data, which comprises the word frequencies and concordance output (Stubbs, Words and Phrases 66). A large corpus generates a large number of concordance lines, and although these can be managed through sampling, further statistical processing can be done to create what Stubbs calls third-order data, which are known as collocates (Stubbs, Words and Phrases 67).

Words in the English language have a tendency to appear with other words (Stubbs, Words and Phrases 17), giving phrases or groups of words a meaning that transcends the value of each individual word if considered separately (Sinclair, Corpus, Concordance, Collocation 104). Collocates are words that co-occur with other words, and lists of these words can be generated algorithmically, accompanied by statistics that determine their significance (Stubbs, Words and Phrases 29).
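As a rough illustration of how such a list can be generated, the sketch below counts the words that fall within a fixed span on either side of a node. It is an assumption-laden simplification: the function name, the tokenizing pattern and the default span of two words are all hypothetical, and only raw co-occurrence counts are produced, not the significance statistics mentioned above.

```python
import re
from collections import Counter

def collocate_counts(text, node, span=2):
    """Count words occurring within `span` tokens either side of `node`.

    Hypothetical helper; raw counts only, no significance testing.
    """
    # Simple tokenizer: lower-cased alphabetic words, keeping internal
    # hyphens and apostrophes (e.g. 'law-abiding') as single tokens.
    tokens = re.findall(r"[a-z]+(?:['’-][a-z]+)*", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        # Positions N-span..N-1 and N+1..N+span around the node.
        window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
        counts.update(window)
    return counts
```

`collocate_counts(corpus_text, "privacy").most_common(20)` would then list the twenty most frequent neighbours of the node.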

In terms of this research, collocational statistics were generated but not used, simply because they did not provide any compelling or new evidence to support what had already been discovered through the frequency and concordance analysis. Notably, both Danielsson (112) and Wermter and Hahn (791) have come to the same conclusion regarding the usefulness of collocational data, arguing that frequency statistics alone provide strong enough evidence to support claims about language use.

The Hansard Concordances

A corpus as large as Hansard does not allow for the inspection of every concordance line, and there are many instances that are not worthy of inspection, such as the multiple instances of “Privacy Commissioner” in Figure 3-4. Sampling and alphabetical sorting make the manual inspection of concordance outputs easier and more efficient. That being said, Sinclair makes a valid point in saying that regardless of the thoroughness of the study, there will always be data left over to perform an even more comprehensive study (Corpus, Concordance, Collocation 65). Concordance analysis, much like word frequency calculation, has the purpose of identifying patterns of interest in the corpus that can be highlighted for further study.

Sampling

A preliminary method of reviewing concordance output consists of simply scanning down the list and noting any observable patterns. The concordance lines are produced in the order in which they occur, which in a sense becomes a timeline of the node word as it has been used in the corpus from the beginning to the end of the measurement period.

When faced with a large corpus such as Hansard, Sinclair suggests a methodical sampling method to make the analysis more manageable. This involves dividing the number of instances of the word by the number of concordance lines desired, using 25 concordance lines as a general standard (Sinclair, Reading Concordances xviii). For example, if there are 5000 instances of a word and 25 concordance lines are required, then 5000 is divided by 25 for a total of 200. This total is the gap between selections, meaning that 25 lines from every 200 lines should be sampled. Starting at concordance line no. 1, the first 25 concordance lines are selected, then lines 201 through 225, then 401 through 425 and so on until the last sample, which in this example begins at line no. 4801 (Sinclair, Reading Concordances xviii). The Hansard corpus was sorted in this manner, both by year and by Session of Parliament. This resulted in groups of seven to 18 concordance samples for each year, and 14 to 21 samples for each Parliament (depending, of course, on the frequency of ‘privacy’ for each section). Each concordance sample contained 25 lines.
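The arithmetic of this sampling procedure can be expressed as a short function. The sketch below is one reading of the description above, not a record of how the Hansard samples were actually drawn: the function name is hypothetical, and the handling of totals that do not divide evenly (integer division, with a minimum gap of one) is an assumption that the source does not specify.

```python
def sinclair_sample_ranges(total, lines_per_sample=25):
    """Return 1-based (first, last) line ranges for each concordance sample.

    Hypothetical helper. The gap between sample starts is
    total // lines_per_sample, e.g. 5000 // 25 = 200.
    """
    # Assumption: use integer division and never let the gap fall below 1.
    # For small totals the gap is shorter than a sample, so ranges overlap.
    gap = max(total // lines_per_sample, 1)
    ranges = []
    for start in range(1, total + 1, gap):
        end = min(start + lines_per_sample - 1, total)
        ranges.append((start, end))
    return ranges
```

For the worked example of 5000 instances, this yields 25 ranges beginning with lines 1 to 25 and 201 to 225, with the final sample starting at line 4801.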

Alphabetical Sorting

Once the samples were generated, the resulting concordance lines were sorted alphabetically. The lines were sorted on the right node at position N+1, the first word to the right of ‘privacy’ (see Figure 3-5 for an example of this type of sorting). This position yielded the highest number of duplicate lines for omission, those lines being: Privacy Act; Privacy Commissioner; and Access to Information, Privacy and Ethics. The concordance lines containing those phrases were omitted because they did not accurately represent the pattern of the use of the word ‘privacy’ as a means of determining its meaning. Each sample was then examined to determine any thematic patterns of word use.
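The omission step can be approximated with a simple phrase filter. The following sketch assumes each concordance line is a plain string, as in the figures; the names `OMIT_PHRASES` and `omit_phrases` are hypothetical, and matching is case-insensitive because the corpus contains both ‘Privacy Commissioner’ and ‘privacy commissioner’. A phrase that is cut off by the edge of the line would not be caught by this approach.

```python
import re

# Phrases named in the text as candidates for omission.
OMIT_PHRASES = (
    "Privacy Act",
    "Privacy Commissioner",
    "Access to Information, Privacy and Ethics",
)

def omit_phrases(lines, phrases=OMIT_PHRASES):
    """Return only the concordance lines that contain none of `phrases`."""
    # Word boundaries stop 'Privacy Act' from matching 'privacy activists'.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(p) for p in phrases) + r")\b",
        flags=re.IGNORECASE,
    )
    return [line for line in lines if not pattern.search(line)]
```

Applied to the 25 lines of Figure 3-4, a filter of this kind would leave the seven lines discussed earlier, in which ‘privacy’ appears with ‘of Canadians’, ‘protect’ or ‘concerns’.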

Figure 3-6: Selection of concordance lines with a ‘personal’ context
 sexual images of themselves in the privacy of their own home for their own per
 some other country determining the privacy of my personal information. It then
f their own home or a doctor in the privacy of their own doctor’s office. Only

Figure 3-7: Selection of concordance lines about ‘privacy and people’
isadvantage because it respects the privacy of veterans and their families. One 
 to do their job and protecting the privacy of law-abiding Canadians. Everyone 
t out that those same hunters whose privacy the government wants to protect als

Figure 3-8: Selection of concordance lines about ‘privacy and rights’
an while respecting the charter and privacy rights of Canadians. I also believe
n the civil liberties and rights to privacy of Canadians by passing information
t is trying to roll back Canadians’ privacy rights is not constitutional. Does

Figure 3-9: Selection of concordance lines with a ‘positive’ or ‘negative’ context
Positive context
line rules for business. Protecting privacy is good for Canadians, good for bus
n the bill that we think strengthen privacy protection for Canadians, including

Negative context
s a remarkable breach of Canadians’ privacy by their own government. Not only w
h as FATCA, which could violate the privacy of thousands—if not tens of thousan

Interpretation

Answering the research question asked of this section, the concordance output from the Hansard corpus identified the following patterns regarding the use of the word ‘privacy’: privacy is something personal and can imply ownership, information, or space (Figure 3-6); privacy affects certain groups of people, including Canadians, veterans, taxpayers, children, travelers, women, hunters, and law-abiding citizens (Figure 3-7); and privacy has something to do with rights, in the context of human rights, civil rights, constitutional rights, the Charter, and freedom of speech (Figure 3-8).

Grammatically, privacy is something that can be referenced in a negative or a positive light, and these phrases consist most commonly of verbs like breach and violate, or protect and strengthen (Figure 3-9); and privacy is often used as the first word in a phrase with nouns, such as ‘privacy interests’ or ‘privacy obligations’ (Figure 3-10).

Figure 3-10: Selection of concordance lines with ‘privacy’ as a phrase
t talk to people? There are serious privacy interests at stake, as well as the
as to where Joe Smith works, due to privacy situations. However, we can certain
. The first item is, do we know our privacy obligations? Some businesses are bu
nging for us to understand the full privacy implications of Bill C-11, such as

While there were certainly outliers in the samples collected, including phrases like “privacy on the other hand” or “privacy screen”, the overwhelming majority of examples fell into one or more of the previous categories.

Key Observation

In terms of the specific phrases identified in the previous section on frequency calculations, a closer look at the phrase ‘privacy rights’ shows that it is often used in conjunction with the word ‘Canadians’, or more interestingly, the phrase ‘law-abiding Canadians’ (shown in Figure 3-11). As discussed in Chapter 2, ‘privacy rights’ is not necessarily an accurate term, as there is no specific right to privacy in Canada. The connection between ‘privacy rights’ and ‘law-abiding Canadians’ is especially interesting, given that the judgment in R. v. Spencer ruled that privacy protections apply to all Canadians, even when they’ve clearly broken the law.

Figure 3-11: Selection of concordance lines with the phrase ‘law-abiding Canadians’
gling. I assure the member that the privacy rights of law-abiding Canadians are
r, I can assure the member that the privacy rights of law-abiding Canadians are
blic Safety, CPC): Mr. Speaker, the privacy rights of law-abiding Canadians are
ers. The minister claimed that “the privacy rights of law-abiding Canadians are
 something. This motion defends the privacy rights of law-abiding Canadians, an
ocrats do not support violating the privacy rights of law-abiding Canadians. Wh

Again, while it is hard to speculate on specific reasons for these trends without investigating the corpus more thoroughly, the concordance data provides yet another layer upon which to focus the investigation in the next chapter.
