  1. Approach
  2. Data Mining
  3. Data Editing
  4. Display & Analysis
  5. Further Research


Basic Ideas

  • following the tradition of structuralist approaches in Literary Studies (Barthes 1972, Lotman 1977, Titzmann 1977, etc.)
  • basing the approach on automated data analysis
  • long-term objective: provide structural data which can be used, for example, to describe different compositional types of plays


Different Styles of Structural Composition

Examples: two plays written by Goethe

Network graphs: Iphigenie auf Tauris (1787) and Götz von Berlichingen (1773)


The Digital Spectator

  • combining Literary Studies with Social Network Analysis (many related publications since the early 2000s, see Bibliography)
  • specific definition of structure (inspired by Solomon Marcus, 1973): two characters are linked if both perform a speech act in the same segment of a play (act or scene)
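The linking rule above can be sketched in a few lines of Python. This is a minimal illustration, assuming each segment has already been reduced to the list of characters who speak in it; the function name `build_network` and the toy characters A to D are hypothetical, not taken from the DLINA code.

```python
from itertools import combinations


def build_network(segments):
    """Build a co-presence network from speaker lists.

    `segments` maps a segment label (act or scene) to the characters who
    perform at least one speech act in it. Returns (nodes, edges), where
    edges are undirected pairs stored in alphabetical order.
    """
    nodes = set()
    edges = set()
    for speakers in segments.values():
        present = sorted(set(speakers))  # a repeated speaker counts once
        nodes.update(present)
        # Link every pair of characters who speak in the same segment.
        for a, b in combinations(present, 2):
            edges.add((a, b))
    return nodes, edges


# Toy input with hypothetical characters, not taken from a real play.
scenes = {
    "I,1": ["A", "B"],
    "I,2": ["C", "A", "C"],
    "II,1": ["D"],
}
nodes, edges = build_network(scenes)
print(sorted(nodes))  # ['A', 'B', 'C', 'D']
print(sorted(edges))  # [('A', 'B'), ('A', 'C')]
```

Note that a character who only ever speaks alone (D here) still appears as an isolated node.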


465 Network Graphs

Poster of 465 drama networks

At a glance: 465 German-language dramas from 1731 to 1929 (figshare).



Data Mining → Data Editing → Display & Analysis

Data Mining


  • TextGrid Repository: the largest TEI-tagged corpus of German literary texts (contains 666 dramatic texts; cf. blog post)
  • workflow optimised for problematic data (faulty TEI, poor OCR, etc.)


DLINA Corpus 15.07 (»Codename Sydney«)

  • included only texts from 1731 to 1929
  • excluded texts according to the following criteria:
    • translations of foreign-language plays
    • texts without actual speakers (e.g., pantomimes)
    • fragments
    • plays with severely defective markup
  • result: 465 dramatic texts (Sydney corpus)
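The selection step can be expressed as a simple filter. This sketch assumes a hypothetical `Drama` record whose boolean flags encode the exclusion criteria; how those flags were actually determined is not modelled here, only the filtering logic.

```python
from dataclasses import dataclass


@dataclass
class Drama:
    # Hypothetical record; the flags stand for the exclusion criteria above.
    title: str
    year: int
    is_translation: bool = False
    has_speakers: bool = True
    is_fragment: bool = False
    markup_defective: bool = False


def select_corpus(dramas, start=1731, end=1929):
    """Keep dramas dated within [start, end] that meet no exclusion criterion."""
    selected = []
    for d in dramas:
        if not start <= d.year <= end:
            continue
        if d.is_translation or not d.has_speakers or d.is_fragment or d.markup_defective:
            continue
        selected.append(d)
    return selected


candidates = [
    Drama("Play 1", 1750),
    Drama("Play 2", 1700),                      # too early
    Drama("Play 3", 1800, is_translation=True),
    Drama("Play 4", 1810, has_speakers=False),  # e.g. a pantomime
    Drama("Play 5", 1900, is_fragment=True),
    Drama("Play 6", 1929),
]
print([d.title for d in select_corpus(candidates)])  # ['Play 1', 'Play 6']
```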

Data Editing

Extracting Structural Data

  • left the original TEI files untouched and extracted only the data we were interested in
  • introduced an intermediary format ("zwischenformat", XML; cf. blog post):
    • validated against a dedicated RELAX NG (RNG) schema
    • one zwischenformat file per drama
    • stores metadata, structural data and documentation
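A rough sketch of the extraction step using Python's standard `xml.etree.ElementTree`. It assumes scenes are marked as `<div type="scene">` and that each `<sp>` carries a `@who` attribute (where it is missing, one would have to fall back on the `<speaker>` label). The output element names `drama`, `segment` and `speaker` are hypothetical and do not reproduce the real zwischenformat schema.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Minimal TEI fragment for illustration only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<div type="act"><div type="scene">
  <sp who="#a"><speaker>A.</speaker><p>...</p></sp>
  <sp who="#b #c"><speaker>B. and C.</speaker><p>...</p></sp>
</div><div type="scene">
  <sp who="#a"><speaker>A.</speaker><p>...</p></sp>
</div></div>
</body></text></TEI>"""


def extract_scenes(tei_xml):
    """Return one list of speaker IDs (from @who) per scene, in document order."""
    root = ET.fromstring(tei_xml)
    scenes = []
    for div in root.iter(f"{TEI}div"):
        if div.get("type") != "scene":
            continue  # skip act-level divs; their scenes are visited separately
        speakers = []
        for sp in div.iter(f"{TEI}sp"):
            # @who may hold several space-separated references such as "#b #c".
            for ref in sp.get("who", "").split():
                ident = ref.lstrip("#")
                if ident not in speakers:
                    speakers.append(ident)
        scenes.append(speakers)
    return scenes


def to_intermediary(title, scenes):
    """Serialise the scenes; element names are hypothetical, not the real schema."""
    doc = ET.Element("drama", title=title)
    for n, speakers in enumerate(scenes, start=1):
        seg = ET.SubElement(doc, "segment", n=str(n))
        for ident in speakers:
            ET.SubElement(seg, "speaker", id=ident)
    return ET.tostring(doc, encoding="unicode")


scenes = extract_scenes(SAMPLE)
print(scenes)  # [['a', 'b', 'c'], ['a']]
print(to_intermediary("Toy drama", scenes))
```

Keeping the source TEI untouched and writing a separate small file, as described above, means the extraction can be re-run whenever the editing rules change.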


Editing Process

The extracted structural data was still full of bugs:

  • Errors due to automated conversion:
    • OCR errors
    • ...
  • Intrinsic problems:
    • variant forms of character names
    • ...

Complete editing rules including examples can be found on our blog.
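One way to handle variant character names is to normalise each speaker label and look it up in a hand-maintained alias table. The sketch below assumes such a table; `canonical_id`, `ALIASES` and its entries are hypothetical and do not reproduce the actual editing rules.

```python
def canonical_id(label, aliases):
    """Map a raw speaker label to a canonical character ID.

    Collapses whitespace, drops the trailing full stop common in speaker
    labels, case-folds, then looks the result up in an alias table; unknown
    labels fall back to the normalised string itself.
    """
    key = " ".join(label.split()).rstrip(".").casefold()
    return aliases.get(key, key)


# Hypothetical alias table, including one OCR misreading ("grat" for "graf").
ALIASES = {
    "der graf": "graf",
    "grat": "graf",
}

labels = ["DER GRAF.", "Graf", "  Grat. ", "Diener"]
print([canonical_id(label, ALIASES) for label in labels])
# ['graf', 'graf', 'graf', 'diener']
```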



Correction of structural bugs using a crowd-editing approach:

Screenshot Gamification

Display & Analysis

One web page for each of the 465 dramas, linking to four types of visualisation plus the source files
(all individual pages are listed here):

  • networks (sticky-node and static)
  • matrices
  • amounts
  • intermediary format files
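As an illustration of what a matrix view can encode, the sketch below tabulates which character speaks in which segment. Whether the DLINA matrices use exactly this character-by-segment layout is an assumption; `presence_matrix` is a hypothetical helper.

```python
def presence_matrix(segments):
    """Characters x segments table: 1 if the character speaks in that segment."""
    names = sorted({name for seg in segments for name in seg})
    rows = [[1 if name in seg else 0 for seg in segments] for name in names]
    return names, rows


segments = [["A", "B"], ["A", "B", "C"], ["C"]]  # toy data
names, rows = presence_matrix(segments)
for name, row in zip(names, rows):
    print(name, row)
# A [1, 1, 0]
# B [1, 1, 0]
# C [0, 1, 1]
```

Two characters are linked in the network exactly when their rows share a 1 in some column, so the network can be read off such a table.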


Example: G. E. Lessing's "Emilia Galotti" (1772)

Analysis thumbnails 1 to 4


Skit: The biggest chatterboxes in German literature

List of most talkative characters in German theatre plays

Cf. corresponding blog post.
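A talkativeness ranking needs a measure of speech volume per character. This sketch assumes the measure is the number of whitespace-separated words spoken; the blog post's actual metric may differ, and `words_per_speaker` is a hypothetical helper.

```python
from collections import Counter


def words_per_speaker(speeches):
    """speeches: iterable of (speaker, text); counts whitespace-separated tokens."""
    counts = Counter()
    for speaker, text in speeches:
        counts[speaker] += len(text.split())
    return counts


speeches = [("A", "I am leaving now."), ("B", "Wait!"), ("A", "No, I am leaving.")]
print(words_per_speaker(speeches).most_common())  # [('A', 8), ('B', 1)]
```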


Network size (median) by decade (1730–1930)

Cf. blog post "200 Years of Literary Network Data".
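Aggregating a size measure by decade is straightforward. This sketch assumes that network size means the number of characters (nodes) and uses invented values; `median_size_by_decade` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import median


def median_size_by_decade(plays):
    """plays: iterable of (year, n_characters); returns {decade: median size}."""
    by_decade = defaultdict(list)
    for year, size in plays:
        by_decade[year // 10 * 10].append(size)  # 1772 -> 1770
    return {decade: median(sizes) for decade, sizes in sorted(by_decade.items())}


toy = [(1772, 13), (1775, 9), (1779, 20), (1787, 5)]  # invented values
print(median_size_by_decade(toy))  # {1770: 13, 1780: 5}
```

The median is less sensitive than the mean to a few plays with unusually large casts.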


Network density (mean) by genre and century

Upcoming blog post "Network Values by Genre".
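Density is assumed here to be the standard measure for undirected graphs without self-loops: the share of possible character pairs that are actually linked, 2m / (n(n - 1)) for n characters and m links. The grouping below is a hypothetical sketch with invented data; the genre labels and the century bucketing (1700 for the 1700s) are assumptions.

```python
from collections import defaultdict
from statistics import fmean


def density(n_nodes, n_edges):
    """Density of an undirected simple graph: 2m / (n(n - 1))."""
    if n_nodes < 2:
        return 0.0
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


def mean_density(plays):
    """plays: iterable of (genre, year, n_nodes, n_edges).

    Groups by genre and century (1700 = the 1700s) and averages densities.
    """
    groups = defaultdict(list)
    for genre, year, n, m in plays:
        groups[(genre, year // 100 * 100)].append(density(n, m))
    return {key: fmean(values) for key, values in sorted(groups.items())}


toy = [("comedy", 1767, 4, 3), ("comedy", 1790, 5, 10), ("tragedy", 1801, 4, 6)]
print(mean_density(toy))  # {('comedy', 1700): 0.75, ('tragedy', 1800): 1.0}
```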

Further Research

  • more statistical data
  • bigger (German-language) corpus
  • foreign-language corpora
  • to sum it all up: using literary network data to evaluate and contribute to traditional Literary Studies

