Austen Said:

Patterns of Diction in Jane Austen's Major Novels


by Laura White

Beginnings

This project began when David Moberly, then an M.A. student at the University of Nebraska-Lincoln [UNL] (and currently a Ph.D. student at the University of Minnesota), approached me at the suggestion of Steve Ramsay (Susan J. Rosowski Associate Professor of English, UNL) about my interest in data-mining Jane Austen. Our first idea, checking Austen’s words against those of the Anglican liturgy and the King James Bible, was a bust, as Austen tended not to use the exact words of liturgy or scripture, to avoid irreverence. But with the help of Brian Pytlik Zillig (Professor of Libraries, UNL Center for Digital Research in the Humanities [CDRH]) and his brainchild, TokenX, a text visualization, analysis, and play tool, we began to look for patterns of speech in Austen, first with Pride and Prejudice [PP]. David coded all the dialogue of the novel, at which point we began to see how interesting results of this sort could be.

The size of Austen’s corpus, only six completed major novels, seems to augur against successful data-mining or “distant reading,” since an ordinary reader can read all six for himself or herself and presumably note important features of diction without the aid of computational analysis. And yet we found, even at this early stage, tantalizing results of the sort that ordinary readers would indeed miss. TokenX works by creating frequency tables that sort for the unique utterances of a given character (or type of character, such as “female” or “male”), that is, the words used by that character or group and by no other speaker. We invite you to play with the “Word Frequencies” section of the site to see what kind of interesting results emerge. In some cases, one can build an impressionistic picture of a given character from the list of their unique word choices; for instance, here are some of the words only Darcy uses (in these cases, twice each): accusations, carelessness, catch, consulting, defiance, existing, explaining, faithful, indirect, lessen, liberally, meanly, motives, offers, purposely, repetition, school, and support. Words unique to the narrator are perhaps less surprising, as many of them set the stage or describe action, such as “followed” (25 times), “breakfast” and “listened” (20 times each), and “pause” and “seated” (15 times each).

Much later in our project, Steve Ramsay found that running similarity algorithms on the vocabulary of the characters in PP reveals that the narrator, Elizabeth, and Darcy are more similar to one another in the words they use than they are to any other character. The surprise is that Mr. Collins also shares an enormous amount of vocabulary with this group, only slightly less than Elizabeth shares with Jane, her own sister. Users may wish to create word clouds from the frequency tables of unique utterances to render a given character, such as Darcy.
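TokenX’s internals are not described here, so what follows is only a minimal Python sketch of the two operations above. It assumes the coded dialogue has already been extracted into a dictionary that maps each speaker to a list of utterances.

The sketch builds a frequency table per speaker and filters it down to words no other speaker uses. It then compares vocabularies with Jaccard similarity, which is shared words divided by combined words. Jaccard is our stand-in, because the project’s similarity algorithms are not specified.

The function names are hypothetical. The speaker IDs and toy lines are invented and are not drawn from the novels.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokens; straight apostrophes are kept inside words."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def frequency_tables(utterances):
    """Hypothetical helper: map each speaker to a Counter of their words."""
    return {speaker: Counter(tokenize(" ".join(texts)))
            for speaker, texts in utterances.items()}

def unique_words(tables, speaker):
    """Words (with counts) that `speaker` uses and no other speaker does."""
    used_by_others = set()
    for other, counts in tables.items():
        if other != speaker:
            used_by_others.update(counts)  # iterating a Counter yields its words
    return Counter({w: n for w, n in tables[speaker].items()
                    if w not in used_by_others})

def jaccard(tables, a, b):
    """Shared vocabulary of two speakers as a fraction of their combined vocabulary."""
    va, vb = set(tables[a]), set(tables[b])
    union = va | vb
    return len(va & vb) / len(union) if union else 0.0

# Invented toy lines with hypothetical speaker IDs; not quotations from the novels.
toy = {
    "A": ["I will not lessen my defiance", "my motives were faithful"],
    "B": ["I will not be teased", "my motives are my own"],
    "C": ["I will read to my cousins", "my patroness is faithful"],
}
tables = frequency_tables(toy)
print(sorted(unique_words(tables, "A")))        # ['defiance', 'lessen', 'were']
print(round(jaccard(tables, "A", "B"), 3))      # 0.385
print(round(jaccard(tables, "A", "C"), 3))      # 0.286
```

This tokenizer splits hyphenated words and does not recognize curly apostrophes. Applying it to the full PP dialogue would require handling both.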

Traits

From the beginning, we aimed to search not just for individual speech but for patterns of speech among classes of characters. We assigned traits of gender, age, marital status, class, occupation, and kind of character. The categories within each trait are discussed below.

Gender

Assigning gender is unproblematic in Austen’s novels. Some of the more interesting results arise through searching by gender. For instance, we found that in Pride and Prejudice no female character uses the word “matrimony” while no male character uses the word “wedding” (female characters use this word eight times).
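Searching by a trait such as gender amounts to two steps. First, pool the word counts of every speaker who shares a trait value. Then check which pools never contain a given word.

Below is a minimal Python sketch of that idea. The layout of the `traits` dictionary, the function names, and the speaker IDs are hypothetical. The toy utterances are invented rather than taken from the novels.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokens; straight apostrophes are kept inside words."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def pool_by_trait(utterances, traits, trait):
    """Hypothetical helper: merge word counts of all speakers sharing a value of `trait`."""
    pooled = {}
    for speaker, texts in utterances.items():
        value = traits[speaker][trait]
        pooled.setdefault(value, Counter()).update(tokenize(" ".join(texts)))
    return pooled

def groups_never_using(pooled, word):
    """Trait values (e.g. 'female') whose speakers never use `word`."""
    # Indexing a Counter with a missing key returns 0 without adding the key.
    return sorted(value for value, counts in pooled.items() if counts[word] == 0)

# Invented toy data with hypothetical speaker IDs; not drawn from the novels.
utterances = {
    "f1": ["the wedding is fixed", "a wedding in the spring"],
    "f2": ["she spoke of the wedding"],
    "m1": ["the happy state of matrimony"],
    "m2": ["matrimony is a serious step"],
}
traits = {
    "f1": {"gender": "female"},
    "f2": {"gender": "female"},
    "m1": {"gender": "male"},
    "m2": {"gender": "male"},
}
pooled = pool_by_trait(utterances, traits, "gender")
print(groups_never_using(pooled, "matrimony"))  # ['female']
print(groups_never_using(pooled, "wedding"))    # ['male']
print(pooled["female"]["wedding"])              # 3
```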

Age

While “out” in Austen’s day described only young women of marriageable age, we found it a handy (and short) code for both marriageable male and female characters (the chief actors in most of the novels). Age is sometimes hard to discern: for instance, is Mrs. Weston of Emma [E] “young married” or “middle-aged”? We have coded her as “young married,” partly because she becomes pregnant and has a child within the action of the story. One would think that “deceased” would not be a necessary category, but there are points at which now-dead characters speak through indirect diction within the speech of the narrator or that of living characters.

Marital Status

Many characters marry during the course of the action, and so we have coded them as “married during action” or “mda” (examples include Mr. Elton in Emma and Maria and Mr. Rushworth in Mansfield Park [MP]). Users may wish to play with this designation if they are more interested in one stage of a character’s life than in the other (for instance, one can search for Mr. Elton alongside unmarried characters or married ones).

Class

As many critics have noted, Austen’s world is focused on the gentry and professional classes, with few aristocrats and a limited depiction of servants and other members of the working classes. Working-class speech, we found, was very rare. We assigned married female characters the class and occupation status of their husbands, in keeping with the social laws of Austen’s day; Mrs. Gardiner, for instance, has the class standing of her husband, a lawyer, and thus belongs to the professional class. However, assigning class rank to married women based on their husbands’ class becomes more problematic with characters who marry within the course of the novel (for instance, Lydia Bennet of PP is coded as “landed gentry” even though for several chapters at the end of the novel she has become a member of the professional classes, married as she is to an army officer, worthless though he might be).

Occupation

The range of occupations of Austen’s speaking characters is limited: clergyman (Austen’s clergymen are always Anglican priests), doctor, nurse, lawyer, naval officer, army officer, servant, and farmer. Most of her characters do not have any occupation at all, including almost all of the women.

Kind of Character

This trait is perhaps the most open to readerly interpretation. In most cases, assigning the hero and heroine is readily done. We judged that both Marianne and Elinor Dashwood in Sense and Sensibility [SS] as well as Elizabeth and Jane Bennet in PP are heroines; likewise, Colonel Brandon and Edward Ferrars (SS) and Mr. Darcy and Mr. Bingley (PP) are coded as heroes. There is one cad in each novel. There are more fools than any other sort of character; “neutral” was assigned to those relatively rare characters, such as Mrs. Weston (E) or Mrs. Gardiner (PP), who are reasonably wise and decent people. We have included the category of “heir” because five of the six novels feature a major character who is marked as being the heir of a substantial property and/or title (Henry Tilney in Northanger Abbey [NA], Mr. Collins [PP], Edward Ferrars [SS], Tom Bertram [MP], and Mr. Elliot in Persuasion [P]).

Questions left hanging: The Turn to FID

Coding direct speech can tell us a lot about Austen’s technique. However, much of each novel is rendered in indirect speech and free indirect discourse [FID]. In FID, the narrator renders not merely the point of view of a given character (focalization) but the flavor of that character’s speech or thought. In the most direct form of FID, the narrator ventriloquizes for the character; in other words, the reader senses that by changing third-person pronouns to first-person ones, one would have a rendering of the character’s direct speech or thought. For instance, when Colonel Brandon first calls on Elinor and Marianne in London, Elinor’s question to him in FID can readily be turned into direct dialogue: “. . . she asked if he had been in London ever since she had seen him last” becomes (inferentially) “she asked, ‘if you have been in London ever since I saw you last.’”

Outside of direct dialogue, free indirect discourse is the most common, economical, and sophisticated way novels (and other texts) relay information about thoughts and speech. For instance, early in Pride and Prejudice we learn what Mr. Bingley thought of the Meryton assembly at which he meets the Bennet family: “Bingley had never met with pleasanter people or prettier girls in his life; every body had been most kind and attentive to him, there had been no formality, no stiffness, he had soon felt acquainted with all the room; and as to Miss Bennet, he could not conceive an angel more beautiful” (Vol. I, Ch. 4). The narrator is speaking, but the language is really Bingley’s, including his clichés and hyperbole (e.g., “an angel more beautiful”). One cannot account for diction in Austen’s novels without paying attention to indirect speech, especially FID. Bingley must be credited with saying “an angel more beautiful,” not merely because it is his language but also because it is plainly not the language the narrator would choose for herself. Because FID always blends the speech of the narrator with that of a character, it allows tremendous narrative flexibility: narrators can vary how closely they mimic the language of their characters, and can make ironic points at a character’s expense even as they seem to let that character speak or think for himself or herself. FID’s use in the eighteenth-century novel (British, German, French, etc.) was rudimentary; one finds some use of FID in Goethe, for instance, and in Fanny Burney and Samuel Richardson, but no sustained or complex use. As many scholars have noted, it took Jane Austen, whose six novels were published between 1811 and 1817, to discover and exploit the full potential of FID, including its capacity to display complex ironies. Austen’s discovery of what FID could do was comparable in the history of the novel to the discovery of the atomic bomb in the history of warfare; thereafter, things were never the same, and FID became a basic feature of the novel as genre.

One cannot account for all the intricacies of Austen’s diction by simply coding direct dialogue and assigning everything else to the narrator, because the narrator often speaks in the voice of her characters. Furthermore, without coding all forms of indirect speech, one cannot answer questions such as which character’s diction most resembles that of the narrator, or even with which gender the narrator’s diction aligns most closely. After all, within everything we had initially coded as “narrator” there were numerous points at which the narrator and a given character are speaking at once (approximately 20–30% of the narrator’s language is FID). (You can look at the “Novel Visualization” section of the site to get a color-coded view of these different kinds of diction in the novel.) Accounting for all of the words in Austen’s novels thus requires marking shared speech, both indirect speech and FID. Marking FID is a special challenge, because by its nature it blends two voices without reliable grammatical markers.

Free indirect speech, free indirect discourse involves both a character's speech and the narrator's comments or presentation. Famously utilized by James Joyce, free indirect discourse is a more comprehensive method of representation--one which many times makes indistinguishable the thoughts of the narrator and the thoughts of a character. Thus, the method typically privileges the past tense, yet cannot be discerned through merely grammatical indicators.

--International Society for the Study of Narrative

Focalization

Throughout, one of the greatest challenges for the coder of Austen’s diction is distinguishing between focalization and FID. Focalization occurs when the narrator renders a character’s point of view, feelings, motives, or thoughts. All FID is a kind of focalization, but not all focalization is FID. To count as FID, the passage must have something of the flavor of the exact words the character would have used in either speech or thought; FID must be a kind of ventriloquism by the narrator. Focalizations that only convey a character’s feelings are often not FID, as in this passage from Emma: “Part of [Mrs. Weston’s] meaning was to conceal some favourite thoughts of her own and Mr. Weston’s on the subject, as much as possible. There were wishes at Randalls respecting Emma’s destiny, but it was not desirable to have them suspected” (Vol. I, Ch. 5). These sentences convey Mrs. Weston’s motive, her hope of concealing from Emma that she wishes Emma would marry Frank Churchill, but she would not have said to herself anything like “part of my meaning is to conceal some favourite thoughts of my own.” Similarly, when Emma leaps to insult Miss Bates, the narrator’s comment, “Emma could not resist,” is focalization without FID. What is described is the flash of succumbing to temptation; the words in Emma’s mind are presumably not “I cannot resist” but the clever and devastating thing she actually says to Miss Bates: “‘Ah! Ma’am, there may be a difficulty. Pardon me, but you will be limited as to number, only three at once.’” The task of distinguishing between focalization and FID becomes thornier when the narrator is describing the thoughts or speech of her wittier heroines, because it is sometimes difficult to tell the narrator’s ironies from those of her characters. Identifying FID for Catherine Morland (NA) was much easier, as Catherine’s mind is no match for that of Austen’s narrators.

Scaling

Free indirect discourse varies in intensity. Sometimes it seems as if one could simply replace the third-person pronouns with first-person ones and have the character’s direct speech, as in the earlier example of Elinor’s question to Colonel Brandon. In other cases, the FID is a less complete ventriloquism: the narrator’s speech has something of the flavor of a character’s speech or, in some cases, only hints at it. To complicate matters, the intensity can change within a single passage of FID, as in the following example. Elizabeth’s thoughts about Darcy are rendered in FID, but the central statement, “In that case he would return no more,” seems rendered exactly as Elizabeth would have thought it (the coding is reproduced here to illustrate how we expressed the scaling of FID intensity):

        <said who="aus.001.nar_eliz" direct="false" aloud="false"><certainty locus="name" target="aus.001.nar_eliz" degree=".5"/> If he had been wavering
              before as to what he should do, which had often seemed likely, the advice and entreaty of so near a relation might settle every doubt, and
              determine him at once to be as happy as dignity unblemished could make him.</said>
         <said who="aus.001.nar_eliz" direct="false" aloud="false"><certainty locus="name" target="aus.001.nar_eliz" degree="1"/> In that case he would
              return no more.</said>
         <said who="aus.001.nar_eliz" direct="false" aloud="false"><certainty locus="name" target="aus.001.nar_eliz" degree=".7"/> Lady Catherine might see
              him in her way through town; and his engagement to Bingley of coming again to Netherfield must give way.</said>

We found scaling to be the most difficult part of this project, for though FID clearly varies in intensity, ascribing it to a mathematical scale seems to offer a false scientific precision. We encourage readers to look at “Novel Visualization” for a visual record of the scaling, where more intense colors represent more intense FID.
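To make the scaling concrete, the following minimal Python sketch reads `said` elements like those above. It is an illustration, not the project’s own tooling.

- It uses the standard-library `xml.etree.ElementTree` module.
- It takes the `degree` attribute of each `certainty` child as that segment’s FID intensity.
- It counts the words in each segment.

The sketch assumes the fragment is wrapped in a single root element and, as in the excerpt, carries no namespace. In a full TEI file the elements sit in the `http://www.tei-c.org/ns/1.0` namespace and would have to be addressed as `{http://www.tei-c.org/ns/1.0}said`.

Weighting word counts by degree is a hypothetical summary measure. It only shows how the scale could feed a calculation, and it inherits the false precision noted above.

```python
import xml.etree.ElementTree as ET

# The three <said> elements from the example above, wrapped in a <p> root.
# Assumption: no TEI namespace, as in the excerpt.
FRAGMENT = """<p>
<said who="aus.001.nar_eliz" direct="false" aloud="false"><certainty locus="name"
  target="aus.001.nar_eliz" degree=".5"/> If he had been wavering before as to what he
  should do, which had often seemed likely, the advice and entreaty of so near a relation
  might settle every doubt, and determine him at once to be as happy as dignity
  unblemished could make him.</said>
<said who="aus.001.nar_eliz" direct="false" aloud="false"><certainty locus="name"
  target="aus.001.nar_eliz" degree="1"/> In that case he would return no more.</said>
<said who="aus.001.nar_eliz" direct="false" aloud="false"><certainty locus="name"
  target="aus.001.nar_eliz" degree=".7"/> Lady Catherine might see him in her way through
  town; and his engagement to Bingley of coming again to Netherfield must give way.</said>
</p>"""

def fid_segments(xml_text):
    """Yield (who, degree, word_count) for each <said> element."""
    root = ET.fromstring(xml_text)
    for said in root.iter("said"):
        cert = said.find("certainty")
        # Assumption: a segment without <certainty> counts as full intensity.
        degree = float(cert.get("degree", "1")) if cert is not None else 1.0
        # itertext() collects the text that follows the empty <certainty/> tag.
        words = "".join(said.itertext()).split()
        yield said.get("who"), degree, len(words)

def weighted_fid_words(segments):
    """Hypothetical summary: word counts scaled by each segment's degree."""
    return sum(degree * n for _, degree, n in segments)

segments = list(fid_segments(FRAGMENT))
for who, degree, n in segments:
    print(who, degree, n)   # e.g. aus.001.nar_eliz 0.5 45
print(round(weighted_fid_words(segments), 1))  # 46.6
```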

Novel Visualization

This section of the site allows the user to see patterns of indirect speech and FID, including the scaled intensity of FID, at a glance. You can scroll through the pages of a given novel, noting where indirect speech predominates and where it ebbs. You can also see exactly how we coded individual passages, and if you disagree with our interpretation of a given passage, we would be glad to hear your case for a different one (contact Laura White at

Almost half of Austen’s FID is introduced by a kind of formula: character + verb expressing thought or speech + “that” (e.g., “Mr. Gardiner added in his letter, that they might expect to see their father at home on the following day” [PP, Vol. III, Ch. 6]). These introductions, where they occur, have been coded throughout and can be seen in the novel visualizations.
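Because this formula has a recognizable surface shape, candidate introductions can be flagged with a regular expression. The Python sketch below is hypothetical throughout. The verb list, the crude name pattern, and the 40-character allowance for material such as “in his letter” are our assumptions, not the project’s coding rules. The same formula also introduces ordinary indirect speech, so every match is only a candidate for a human coder.

```python
import re

# Hypothetical list of verbs of thought or speech; the project's list is not given.
VERBS = ["added", "said", "replied", "thought", "felt", "knew", "believed",
         "hoped", "feared", "wrote", "observed", "declared"]

# Optional title + capitalized name, then a verb, then up to 40 characters with
# no sentence-ending punctuation (e.g. " in his letter"), then "that".
INTRO = re.compile(
    r"(?P<speaker>(?:(?:Mr|Mrs|Miss|Lady|Colonel|Sir)\.?\s+)?[A-Z][a-z]+)\s+"
    r"(?P<verb>" + "|".join(VERBS) + r")\b"
    r"(?P<gap>[^.;!?]{0,40}?),?\s+that\b"
)

def find_intros(text):
    """Return (speaker, verb, offset) for each candidate FID introduction.

    `offset` is the index just after "that", where the candidate FID begins.
    """
    return [(m.group("speaker"), m.group("verb"), m.end())
            for m in INTRO.finditer(text)]

sentence = ("Mr. Gardiner added in his letter, that they might expect to see "
            "their father at home on the following day.")
for speaker, verb, start in find_intros(sentence):
    print(speaker, "|", verb, "|", sentence[start:].strip())
# Mr. Gardiner | added | they might expect to see their father at home on the following day.
```

A hand-coded corpus like this one is what makes such a pattern testable. Its matches can be checked against the introductions already marked in the novels.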

Personography

The personography of Austen’s novels would be very simple if we were limited only to speaking characters and the narrator. But once one turns one’s attention to indirect speech and FID, the personography widens considerably. Sometimes living characters cite the dead; sometimes a group of people, such as Meryton society, renders judgment through the narrator in FID; and sometimes characters speak in chorus in indirect speech. On some occasions indirect speech is nested within indirect speech within direct speech, and there are other complex configurations (for instance, when the narrator renders Elizabeth’s thought in FID while incorporating within that FID the direct speech of Colonel Fitzwilliam). See Adventures in FID for examples of complex indirect speech.

Adventures in FID

Here, courtesy of Carmen Smith, are some examples of the many complex forms of diction Austen employs involving indirect speech and/or FID. Readers generally do not notice these complexities, just as we do not pay much attention to indirect speech or even FID in our own lives; they seem natural, like water to a fish. Coding for indirect speech, however, forces philological precision, as far as it can be attained. Ultimately, this project depends on professional judgment about who speaks what and to what degree. As Roland Barthes noted in S/Z,

Irony acts as a signpost, and thereby it destroys the multivalence that might be expected from quoted discourse. A multivalent text can carry out its basic duplicity only if it subverts the opposition between true and false, if it fails to attribute quotations (even seeking to discredit them) to explicit authorities, … in short, if it coldly and fraudulently abolishes quotation marks which … distribute the ownership of the sentences to their respective proprietors, like subdivisions of a field. (44-5)

Austen’s use of FID certainly “abolishes quotation marks” that “distribute the ownership of the sentences to their respective proprietors”; we aim to make judgments that put the marks back in, as it were, at least heuristically. We rely on informed inference throughout, especially in complex cases, and that inference in turn depends on interpreting characters’ mindstates.

Direct dialogue within direct dialogue (here, changing the speaker from Mrs. Bennet to Mrs. Bennet-as-Mrs. Long, and then in the second example from Darcy to Darcy-as-Elizabeth, yet in both cases remaining ‘aloud’):

‘And, my dear Jane, I never saw you look in greater beauty. Mrs. Long said so too, for I asked her whether you did not. And what do you think she said besides? “Ah! Mrs. Bennet, we shall have her at Netherfield at last.” She did indeed.’ (Vol. III, Ch. 12)

‘Your reproof, so well applied, I shall never forget: “had you behaved in a more gentleman like manner.” Those were your words.’ (Vol. III, Ch. 16; Darcy reciting past-Elizabeth to present-Elizabeth)

Characters not otherwise afforded presence (no direct dialogue in the book) expressing themselves through FID; here, the gardener of Pemberley and Darcy’s dead father, respectively:

Mr. Gardiner expressed a wish of going round the whole Park, but feared it might be beyond a walk. With a triumphant smile, they were told, that it was ten miles round. (Vol. III, Ch. 1)

‘My excellent father died about five years ago; and in his will he particularly recommended it to me, to promote his advancement in the best manner that his profession might allow, and if he took orders, desired that a valuable family living might be his as soon as it became vacant.’ (Vol. II, Ch. 12)

FID by a large, unnamed collection of people, here, Meryton society:

The Bennets were speedily pronounced to be the luckiest family in the world, though only a few weeks before, when Lydia had first run away, they had been generally proved to be marked out for misfortune. (Vol. III, Ch. 13)

Very short instance of FID inserted into narration, sometimes with a length of only one or two words (easy to miss!):

Lydia was exceedingly fond of him. He was her dear Wickham on every occasion… (Vol. III, Ch. 9)

Direct quotation from another character, physically absent from this scene, within FID-thought (here, Elizabeth’s):

…He had ruined for a while every hope of happiness for the most affectionate, generous heart in the world; and no one could say how lasting an evil he might have inflicted. [par] “There were some very strong objections against the lady,” were Colonel Fitzwilliam’s words, and these strong objections probably were, her having one uncle who was a country attorney, and another who was in business in London. (Vol. II, Ch. 10)

The attribution “said she” indicates direct dialogue, but in context this is clearly FID camouflaged as direct dialogue: the remark cannot have been said aloud, since it refers in the third person to Lady Catherine, the woman Elizabeth is looking at. In other words, the direct dialogue here constitutes FID:

…Elizabeth was determined to make no effort for conversation with a woman, who was now more than usually insolent and disagreeable. “How could I ever think her like her nephew?” said she, as she looked in her face. (Vol. III, Ch. 14)

Some examples of typical introductions of FID in Austen:

‘That’ phrase indicating FID beforehand, sometimes also with ‘that’ omitted but implied:

Mr. Gardiner added in his letter, that they might expect to see their father at home on the following day… (Vol. III, Ch. 6)

Entire sentence contextually leading into FID without direct cues:

On such encouragement to ask, Elizabeth was forced to put it out of her power, by running away. [par] But to live in ignorance on such a point was impossible; or at least it was impossible not to try for information… (Vol. III, Ch. 9)

Elizabeth did not know what to make of it. Had she not seen him in Derbyshire… (Vol. III, Ch. 11)

Inserted phrase, generally synonymous with “character thought”:

Jane’s delicate sense of honour would not allow her to speak to Elizabeth privately of what Lydia had let fall; Elizabeth was glad of it; – till it appeared whether her inquiries would receive any satisfaction, she had rather be without a confidante. (Vol. III, Ch. 9)

Regular introduction of FID with ‘that’ phrase, yet occurring within a letter by a different character, with a consequent change of speaker:

‘His own father did not long survive mine, and within half a year from these events, Mr. Wickham wrote to inform me that, having finally resolved against taking orders, he hoped I should not think it unreasonable for him to expect some more immediate pecuniary advantage…’ (Vol. II, Ch. 12)

Elaborate description introducing FID, contained in a prepositional phrase; it is particularly difficult to determine where the introduction ends and the FID begins:

The vague and unsettled suspicions which uncertainty had produced of what Mr. Darcy might have been doing to forward her sister’s match… (Vol. III, Ch. 10)

Text

Austen’s texts, unlike (say) Walt Whitman’s, are relatively unproblematic, with only a few genuine cruxes. Our texts were created by comparing two previously digitized, open-source editions: one from the University of Adelaide, the other from Project Gutenberg. When differences occurred that were not obvious optical character recognition (OCR) errors, a third source, noted in the metadata for each text, was consulted.
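The collation step can be sketched with Python’s standard-library `difflib`, which aligns two editions word by word and reports the spans where they differ.

The triage label is a hypothetical heuristic. A variant is marked as a likely OCR error only when the two readings become identical after normalizing a small, assumed table of OCR look-alikes (`OCR_CONFUSIONS`). Every other variant is left for a human editor to check against a third source.

The second edition string below is an invented variant of the opening sentence of PP.

```python
import difflib

# Hypothetical table of common OCR confusions (wrong -> right); not exhaustive.
OCR_CONFUSIONS = [("rn", "m"), ("vv", "w"), ("1", "l"), ("0", "o")]

def normalize_ocr(text):
    """Map common OCR look-alikes to one form so they compare equal."""
    for wrong, right in OCR_CONFUSIONS:
        text = text.replace(wrong, right)
    return text

def collate(edition_a, edition_b):
    """List word-level differences between two editions with a rough triage label."""
    words_a, words_b = edition_a.split(), edition_b.split()
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    report = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        a_text, b_text = " ".join(words_a[i1:i2]), " ".join(words_b[j1:j2])
        if normalize_ocr(a_text) == normalize_ocr(b_text):
            label = "likely OCR error"
        else:
            label = "consult third source"
        report.append((a_text, b_text, label))
    return report

a = ("It is a truth universally acknowledged, that a single man in possession "
     "of a good fortune, must be in want of a wife.")
# Invented variant: one OCR slip ("1" for "l") and one dropped comma.
b = ("It is a truth universa1ly acknowledged, that a single man in possession "
     "of a good fortune must be in want of a wife.")
for row in collate(a, b):
    print(row)
# ('universally', 'universa1ly', 'likely OCR error')
# ('fortune,', 'fortune', 'consult third source')
```

Punctuation variants like the dropped comma are exactly the cases a look-alike table cannot settle. That is why the sketch routes them to a third source instead of resolving them automatically.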

There are a few kinds of words or passages in the novels which are not coded as either character or narrator speech: chapter or volume markings (e.g., “Volume II, Chapter I”), quotations (e.g., from Shakespeare, Gray, or nursery rhymes), and putative documents (e.g., the passage on the Elliots in the Baronetage from Persuasion or the newspaper clipping about Mrs. Rushworth’s running off with Henry Crawford in Mansfield Park). Letters are treated throughout as direct speech from the letter-writer, though letters can and often do contain both indirect speech and free indirect speech (so coded).

Going forward: FID in the novel

If FID can be found computationally, scholars would be considerably better equipped to understand narrative computationally in general. That said, FID is notoriously hard to track. The scholarship on FID has confirmed what we suspected: FID has no grammatical markers as such (see Further Reading). However, having coded all of Austen’s major novels for FID, we are in a position to test that and other propositions about FID in the novel, especially as Austen’s innovations make her novels a linchpin of sorts in the history of FID and of the novel. We plan to deploy the encoded novels against the eighteenth-century novel (through the huge number of novels available digitally in ECCO, Eighteenth-Century Collections Online) and against the nineteenth-century novel (through the capacious hoard of Chadwyck-Healey’s Nineteenth-Century Fiction Collection). We expect our work to have three main outputs and objectives: 1) determining whether FID can be found computationally in Austen; 2) determining whether FID can be found computationally in novels more generally; and 3) providing the means to analyze the development of FID in the novel as genre. The last of these three ambitious goals is the most ambitious.

Further Reading on FID and interpreting characters’ mindstates

Banfield, Ann. Unspeakable Sentences: Narration and Representation in the Language of Fiction. Boston: Routledge and Kegan Paul, 1982.

Bray, Joe. “The Source of ‘Dramatized Consciousness’: Richardson, Austen, and Stylistic Influence.” Style 35.1 (2001): 18-29.

Bühler, Willi. Die ‘erlebte Rede’ im englischen Roman: ihre Vorstufen und ihre Ausbildung im Werke Jane Austens. Zurich: Max Niehans Verlag, 1937. In German.

Cohn, Dorrit. Transparent Minds: Narrative Modes for Presenting Consciousness in Fiction. Princeton: Princeton UP, 1978.

Dillon, George L. and Frederick Kirchhoff. “On the Form and Function of Free Indirect Style.” Poetics and Theory of Literature 1.3 (1976): 431-40.

Ferguson, Frances. “Jane Austen, Emma, and the Impact of Form.” Modern Language Quarterly 61.1 (2000): 157-80.

Finch, Casey and Peter Bowen. “‘The Tittle-Tattle of Highbury’: Gossip and the Free Indirect Style in Emma.” Representations 31 (1990): 1-18.

Flavin, Louise. “Free Indirect Discourse and the Clever Heroine of Emma.” Persuasions 13 (1991): 50-7.

Fletcher, Angus, and Mike Benveniste. “A Scientific Justification for Literature: Jane Austen’s Free Indirect Style as Ethical Tool.” Journal of Narrative Theory 43.1 (2013): 1-18.

Goldman, Alvin. Simulating Minds: The Philosophy, Psychology, and Neuroscience of Mindreading. Oxford: Oxford UP, 2006.

Gunn, Daniel. “Free Indirect Discourse and Narrative Authority in Emma.” Narrative 12 (2004): 35-54.

Hale, Dorothy. “Fiction as Restriction: Self-Binding in New Ethical Theories of the Novel.” Narrative 15 (2007): 187-206.

Horstkotte, Silke. “Seeing or Speaking: Visual Narratology and Focalization, Literature to Film.” Narratology in the Age of Cross-Disciplinary Narrative Research. Eds. Sandra Heinen and Roy Sommer. Berlin: de Gruyter, 2009. 170-92.

Keen, Suzanne. “Readers’ Temperaments and Fictional Character.” New Literary History 42 (2011): 295-314.

---. Empathy and the Novel. Oxford: Oxford UP, 2010.

Nazar, Hina. “The Imagination Goes Visiting: Jane Austen, Judgment, and the Social.” Nineteenth-Century Literature 59 (2004): 145-78.

Neumann, Anne Waldron. “Characterization and Comment in Pride and Prejudice: Free Indirect Discourse and ‘Double-Voiced’ Verbs of Speaking, Thinking, and Feeling.” Style 20.3 (Fall 1986): 364-94.

Page, Norman. The Language of Jane Austen. Oxford: Basil Blackwell, 1972.

Pascal, Roy. The Dual Voice: Free Indirect Speech and Its Functioning in the Nineteenth Century European Novel. Manchester: Manchester UP, 1977.

Phelan, James. Reading People, Reading Plots: Character, Progression, and the Interpretation of Narrative. Chicago: U of Chicago P, 1989.

Poovey, Mary. The Proper Lady and the Woman Writer: Ideology as Style in the Works of Mary Wollstonecraft, Mary Shelley, and Jane Austen. Chicago: U of Chicago P, 1985.

Shaw, Harry E. Narrating Reality: Austen, Scott, Eliot. Ithaca: Cornell UP, 1999.

Starr, G. Gabrielle. “Evolved Reading and the Science(s) of Literary Study.” Critical Inquiry 38 (2012): 418-25.

Tandrup, Birthe. “Free Indirect Style and the Critique of the Gothic in Northanger Abbey.” The Romantic Heritage: A Collection of Critical Essays. Ed. Karsten Engelberg. Copenhagen: U of Copenhagen, 1983. 81-92.

Tone, Anne. “A ‘Said He’ or a ‘Said She’: Speech Attribution in Austen’s Fiction.” Persuasions 34 (2012): 140-49.

Vermeule, Blakey. Why Do We Care About Literary Characters? Baltimore: Johns Hopkins UP, 2009.

Wajsberg, Jeffrey. Jane Austen’s Free Indirect Style: A Linguistic Ethnography. M.A. thesis. Vancouver: U of British Columbia, 2012.