Austen

About

Project Team

Laura White, Principal Investigator, John E. Weaver Professor of English

Carmen Smith, Associate Investigator, Ph.D. candidate, English

Brian Pytlik Zillig, Professor, Libraries and Fellow, CDRH

Laura Weakly, Metadata Encoding Specialist, CDRH

Karin Dalziel, Digital Design/Development Specialist, CDRH

Stephen Ramsay, Susan J. Rosowski Associate University Professor, English, and Fellow, CDRH (on project 2011-13)

Matthew Jockers, Assoc. Professor, English, and Fellow, CDRH (on project 2013-present)

Jessica Dussault, Programmer, CDRH

Encoding by

Carmen Smith
Laura Weakly
Danielle Metcalf, M.A., English
David Moberly, M.A., English
Stephanie Camerone, Ph.D. student, English
Samantha Adrales, B.A. student, English
Julie Ward, B.A., English

Download the Data

Novel TEI-XML

The TEI below contains the markup that was used to power the visualizations.

Pride and Prejudice
Persuasion
Northanger Abbey
Sense and Sensibility
Emma
Mansfield Park

Technical Information

Data

Documents follow the Text Encoding Initiative (TEI) P5 standard.

Technologies

Framework: Ruby on Rails 4
Search and Browse functionality: Apache Solr
Data Querying and Transformation: XSLT Scripts (powered by Saxon 9 HE)
Data Indexing: Ruby Scripts

Process

Data Creation

First, all documents were encoded following a sample encoding developed by Laura Weakly. More on this process can be found in the background section. XSLT was then used to transform the documents into HTML that preserved many elements of the TEI, including speaker and free indirect discourse (FID) information. For the visualizations, CSS and JavaScript were written to highlight various aspects of the markup. For the document search, an XSLT script was written to convert the TEI XML into the Solr XML ingest format, with each <said> element representing one document in the Solr index. The <said> elements were numbered automatically so that results can be viewed in document order.
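
The search conversion described above can be illustrated with a minimal sketch. The project did this step in XSLT; the version below uses Python's standard library instead, purely for illustration. The field names (id, novel, position, speaker, text), the speaker identifiers, and the sample fragment are assumptions, not the project's actual schema or data.

```python
import xml.etree.ElementTree as ET

# TEI P5 namespace; <said> is the TEI element for speech or thought.
SAID = "{http://www.tei-c.org/ns/1.0}said"

# Hypothetical TEI fragment, for illustration only.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<p><said who="#mrs_bennet">My dear Mr. Bennet, have you heard?</said></p>
<p><said who="#mr_bennet">I have
   not.</said></p>
</body></text></TEI>"""


def said_to_solr_add(tei_xml, novel_id):
    """Return Solr XML (<add>) with one <doc> per <said> element."""
    root = ET.fromstring(tei_xml)
    add = ET.Element("add")
    # iter() walks the tree in document order, so the running counter
    # can later serve as a sort key for "document order" results.
    for position, said in enumerate(root.iter(SAID), start=1):
        doc = ET.SubElement(add, "doc")
        fields = {
            # All field names below are assumed, not the project's schema.
            "id": f"{novel_id}_said_{position}",
            "novel": novel_id,
            "position": str(position),
            "speaker": said.get("who", ""),
            # Collapse line breaks and runs of spaces inside the speech.
            "text": " ".join("".join(said.itertext()).split()),
        }
        for name, value in fields.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = value
    return ET.tostring(add, encoding="unicode")


print(said_to_solr_add(SAMPLE_TEI, "pride_and_prejudice"))
```

Storing the position as its own field is one way to let a search interface sort hits by where they occur in the novel rather than by relevance.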

For the word frequencies, XSLT scripts were written to create a unique word list for each character and trait.
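
As a rough sketch of the word-list step (again in Python rather than the XSLT the project used), the function below counts the words inside each <said> element, grouped by its who attribute. The tokenizing rule, the "unknown" fallback, and the sample fragment are assumptions made for illustration; grouping by trait would work the same way with a different key.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

SAID = "{http://www.tei-c.org/ns/1.0}said"

# Hypothetical TEI fragment, for illustration only.
SAMPLE_SPEECH = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<p><said who="#elizabeth">I am not afraid of you.</said></p>
<p><said who="#darcy">You are not?</said></p>
<p><said who="#elizabeth">I am not.</said></p>
</body></text></TEI>"""


def word_counts_by_speaker(tei_xml):
    """Map each speaker id to a Counter of the words in their speech."""
    root = ET.fromstring(tei_xml)
    counts = defaultdict(Counter)
    for said in root.iter(SAID):
        speaker = said.get("who", "unknown")  # fallback label is assumed
        # itertext() also picks up text inside child elements of <said>.
        text = "".join(said.itertext()).lower()
        # Simple tokenizer: runs of letters, optionally with an apostrophe.
        counts[speaker].update(re.findall(r"[a-z]+(?:'[a-z]+)?", text))
    return counts


counts = word_counts_by_speaker(SAMPLE_SPEECH)
for speaker, words in counts.items():
    # sorted(words) is the unique word list; words[w] is the frequency.
    print(speaker, sorted(words))
```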

Website Creation

The first proof-of-concept version of Austen Said was developed in Apache Cocoon; the current iteration was then built in Ruby on Rails. Cocoon allowed for dynamic XSLT transformations, while in the Ruby on Rails version the files are preprocessed whenever the source files change.
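
The page does not say how changed files are detected. One common approach, sketched here in Python under that assumption, is to compare modification times and regenerate only the outputs that are missing or older than their source. The directory layout, file extensions, and function name are hypothetical.

```python
from pathlib import Path


def stale_sources(source_dir, output_dir, source_ext=".xml", output_ext=".html"):
    """Return source files whose generated output is missing or out of date."""
    stale = []
    for source in sorted(Path(source_dir).glob(f"*{source_ext}")):
        output = Path(output_dir) / (source.stem + output_ext)
        # Regenerate when there is no output yet, or when the source
        # was modified after the output was last written.
        if not output.exists() or output.stat().st_mtime < source.stat().st_mtime:
            stale.append(source)
    return stale
```

A build script could call this before running the XSLT step and transform only the files it returns, which trades Cocoon-style on-request transformation for faster page loads.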

Austen Said:

Patterns of Diction in Jane Austen's Major Novels