

JABBER: The Jabberwocky Engine
Neil Hennessy

From OL3: open letter on lines online (2000)



"This is privileged information. It places the poet in the same vanguard of research as physics, molecular chemistry, and pure mathematics." Chris Dewdney

Specifications

The JABBER engine begins with a screen full of floating letters. When a letter comes into contact with another letter, a calculation occurs to determine whether they bond according to the likelihood that they would appear contiguously in the English lexicon. Letters accumulate in this fashion until the screen is filled with floating nonsense words.

Poetry

Kurt Schwitters’s Ursonate sound poem is ostensibly nonsense, yet it still sounds German. Even when trying to create nonsense, we unconsciously fall into the comfortable linguistic patterns of our mother tongue. When we hear language that we recognize as our own, meaning is only one of the effects that signal familiarity. If common syllables of English are combined in a random fashion, the results are still recognizable as English.

The most prominent example of this phenomenon in English is Lewis Carroll’s "Jabberwocky", which also provides the most common model for the creation of neologisms:

‘Twas brillig, and the slithy toves

Did gyre and gimble in the wabe;

All mimsy were the borogoves,

And the mome raths outgrabe.

Alice is bewildered by the poem, and turns to Humpty-Dumpty to help her interpret the hard words. She asks him what slithy means, and Humpty-Dumpty responds: "Well, ‘SLITHY’ means ‘lithe and slimy’. ‘Lithe’ is the same as ‘active’. You see it’s like a portmanteau–there are two meanings packed up into one word." The portmanteau word is a mixture of bits and pieces of multiple words that are put together in a manner that conforms to the rules of English word formation. The words are pronounceable and familiar, although their meaning must be guessed at through a whimsical etymology (a ’patymology, perhaps).

That Humpty-Dumpty must also define ‘lithe’, one of the constituent words of ‘slithy’, is significant in that it demonstrates that the effect of neologism is present whenever we encounter any word we don’t know. The words of our language have been naturalised so that we have a comfortable relation to a fixed set of letter combinations. Neologism disrupts this familiarity. JABBER codifies and automates the disruption of neologism so that each iteration of the generator provides a further estrangement from our language: the unfamiliar is smuggled into our language in the guise of the familiar.

JABBER falls somewhere between the sound poetry of Schwitters and the nonsense poetry of Carroll: the words are assembled from bits of other words as in "Jabberwocky", but like the Ursonate the words are constructed with no inherent semantic value.

 

Chemistry

The analogy between the formation of words and the formation of chemical compounds has a long tradition in Western philosophy. In the Theaetetus, Socrates relates: "Methought that I too had a dream, and I heard in my dream that the primeval letters or elements out of which you and I and all other things are compounded, have no reason or explanation." Although the letters have no reason or explanation, laws exist to combine them to form compounds. The chemical analogy is embedded within the Greek language: "The Greek word for alphabet, stoicheia, also carries the meaning elements with all the cosmological associations of that term. For the Greeks, the letters had an atomistic and elemental character. The letters were indecomposable: there was no smaller, more significant, or more basic elements of the cosmic order. It was from these units that the material form of the universe, and the natural world, was constructed" (Drucker 111).

Lucretius makes a similar analogy, but to different ends: "At a key moment in De Rerum Natura Lucretius draws the analogy between atoms and letters. In Book One he explains: ‘basic bodies take a certain structure, / And have defined positions, and exchange / Their blows in certain ways. The same bodies, / With only a slight change in their structure, / Are capable of forming wood or fire. / Like letters in the words for these same things, / Ignes and lignum: with slight transpositions, / They can be nominated ‘flames’ or ‘beams’’ Atoms then are to bodies what letters are to words: heterogeneous, deviant, and combinatory" (McCaffery, Rasula 532).

JABBER falls somewhere between the determined world of Plato and the indeterminate world of Lucretius: the indivisible letters constantly enact Lucretius’ minimal swerve of the clinamen in their random meanderings, yet when they collide they follow the laws of combination like the compounded elements of Plato. JABBER uses letter-atoms to form the word-molecules of a looking-glass world whose language is English spoken through a fun-house telephone.

Artificial Intelligence

The 40 phonemes of English are represented by combinations of 26 letters, and these combinations are governed by transcriptions of spoken language. This over-simplifies the complex historical process of orthography, but it is the implicit assumption of an ahistorical investigation of orthography: "Serres argues that all laws for combining (foedera coniunctorum) only arise after the fact of combining (coniuncta foederum) so that, in effect, the detection of order is simply the hindsight of chaos: ‘The laws of nature come from conjugation; there is no nature but that of compounds. In the same way, there are the laws of putting together letter-atoms to produce a text. These laws, however, are only federation. The law repeats the fact itself: while things are in the process of being formed, the laws enunciate the federated’" (Bök 95 n.14). The laws for combining letters inhere as latent information in the words in the lexicon: information that can be retrieved.

Once the laws are discovered, new words can be synthesised. Serres provides an uncanny description of JABBER in operation: "The alphabetical proto-cloud is without law and the letters are scattered at random, always there as a set in space, as language; but as soon as a text or speech appears, the laws of good formulation, combination, and conjugation also appear" (qtd. Bök 84). The lexicon contains the laws of good combination, and JABBER learns and then applies those laws to the alphabetical proto-cloud to fill in the gaps between the words in the dictionary.

RACTER is one of the most famous Natural Language Generators, and the author of the only book entirely written by a computer, The Policeman’s Beard is Half-Constructed. JABBER operates similarly to RACTER: whereas RACTER combines words according to grammatical rules to produce sentences, JABBER combines letters according to lexical rules to produce words.

Mathematics

Algorithm

The mathematical foundation of JABBER rests on probability theory. In the simple case when two letters collide, whether they bond or not is determined by a probabilistic calculation. If the letter ‘a’ collides with the letter ‘s’, the probability that they bond to form ‘as’ is given by the conditional probability

P(‘as’) = P(s|a)

which is the probability that an ‘s’ would follow an ‘a’ in the lexicon (read "the probability of s given a"). When the two letters collide, a random number between 0 and 1 is generated, and if the number is less than or equal to the conditional probability for the two-letter combination, the letters bond.
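The two-letter rule can be sketched in a few lines of Python. This is a minimal illustration, not the engine's own code: the toy `LEXICON`, the function names, and the way bigrams are counted (every adjacent pair in every word) are all assumptions made for the example.

```python
import random
from collections import Counter

# Toy lexicon, assumed for illustration only; JABBER's word list is not given.
LEXICON = ["as", "ash", "at", "ask", "sat", "has"]


def bigram_counts(words):
    """Count every pair of adjacent letters across the word list."""
    counts = Counter()
    for word in words:
        for first, second in zip(word, word[1:]):
            counts[first + second] += 1
    return counts


def conditional_probability(counts, first, second):
    """Estimate P(second | first): count(first+second) over all pairs starting with first."""
    total = sum(n for pair, n in counts.items() if pair[0] == first)
    return counts[first + second] / total if total else 0.0


def letters_bond(counts, first, second, rng=random):
    """Bond when a random draw from [0, 1) is at most P(second | first)."""
    p = conditional_probability(counts, first, second)
    # The p > 0 guard keeps a draw of exactly 0.0 from bonding an unseen pair.
    return p > 0 and rng.random() <= p


COUNTS = bigram_counts(LEXICON)
print(conditional_probability(COUNTS, "a", "s"))
```

In this toy lexicon ‘a’ is followed by ‘s’ four times and by ‘t’ twice, so the estimate of P(s|a) is 4/6, while a pair that never occurs, such as ‘az’, can never bond.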

When strings of length > 1 collide, the probability that they bond is conditioned on three letters. When two strings $a = a_1 a_2 \ldots a_n$ and $b = b_1 b_2 \ldots b_m$ collide, the probability that they bond to form $ab$ is

$$P(ab) = P(b_1 \mid a_{n-1} a_n)\, P(a_n \mid b_1 b_2)$$

For example, if the strings ‘ab’ and ‘ju’ collide, the probability that they would form the string ‘abju’ is equal to the product of the probability that ‘j’ would follow ‘ab’, and the probability that ‘b’ would precede ‘ju’. So the probability over an entire string $w = a_1 a_2 \ldots a_n$ is given by

$$P(w) = P(a_1)\, P(a_2 \mid a_1)\, P(a_2 \mid a_3 a_4)\, P(a_3 \mid a_1 a_2)\, P(a_3 \mid a_4 a_5) \cdots = \prod_{i} P(a_i \mid a_{i-2} a_{i-1})\, P(a_i \mid a_{i+1} a_{i+2})$$
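The collision rule for two strings, as in the ‘ab’ and ‘ju’ example, might be sketched as follows. The toy `LEXICON` and the helper names are hypothetical, and the sketch assumes that both conditionals are estimated from trigram counts. The text only defines the rule for strings longer than one letter, so shorter inputs are rejected rather than guessed at.

```python
from collections import Counter

# Toy lexicon, assumed for illustration only.
LEXICON = ["abjure", "abject", "cabin", "injury"]


def trigram_counts(words):
    """Count every run of three adjacent letters across the word list."""
    counts = Counter()
    for word in words:
        for i in range(len(word) - 2):
            counts[word[i:i + 3]] += 1
    return counts


def p_follows(counts, pair, letter):
    """P(letter | pair before it): share of trigrams starting with `pair` that end in `letter`."""
    total = sum(n for tri, n in counts.items() if tri[:2] == pair)
    return counts[pair + letter] / total if total else 0.0


def p_precedes(counts, letter, pair):
    """P(letter | pair after it): share of trigrams ending with `pair` that start with `letter`."""
    total = sum(n for tri, n in counts.items() if tri[1:] == pair)
    return counts[letter + pair] / total if total else 0.0


def bond_probability(counts, left, right):
    """P(left + right) = P(b1 | a[n-1] a[n]) * P(a[n] | b1 b2)."""
    if len(left) < 2 or len(right) < 2:
        raise ValueError("the rule is only stated for strings of length > 1")
    return (p_follows(counts, left[-2:], right[0])
            * p_precedes(counts, left[-1], right[:2]))


COUNTS = trigram_counts(LEXICON)
print(bond_probability(COUNTS, "ab", "ju"))
```

With this lexicon ‘ab’ is followed by ‘j’ in two of three cases and ‘ju’ is preceded by ‘b’ in one of two, so the bond probability for ‘abju’ is 2/3 × 1/2.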

The true normalized conditional probabilities were too small to be used in practice, so the probabilities are relativized against the greatest occurrence. For example, the normalized conditional probability for the string ‘as’ is given by

P(s|a) = as/(aa+ab+ac+ad+ae+af+ag+…+aw+ax+ay+az)

where as is the number of occurrences of the string ‘as’ in the lexicon, aa the number of ‘aa’, etc. The relativized probability for the string ‘as’ as used in JABBER is given by

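as/max(aa,ab,ac,ad,ae,af,ag,…,aw,ax,ay,az)

where max is the greatest value of aa,ab,…,az. Equivalently, P(s|a) is divided by the largest of the normalized probabilities P(a|a),…,P(z|a), so that the most frequent successor of ‘a’ receives a relativized probability of 1.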
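The difference between the two scalings can be illustrated with a short sketch. It assumes one reading of the formula above, namely that each count is divided by the largest count sharing the same first letter; the counts themselves are hypothetical and not taken from JABBER's tables.

```python
from collections import Counter

# Hypothetical bigram counts for pairs beginning with 'a' (not JABBER's data).
COUNTS = Counter({"as": 4, "at": 2, "ab": 1, "ax": 1})


def normalized(counts, first, second):
    """Count of the pair divided by the sum of all counts with the same first letter."""
    total = sum(n for pair, n in counts.items() if pair[0] == first)
    return counts[first + second] / total if total else 0.0


def relativized(counts, first, second):
    """Count of the pair divided by the largest count with the same first letter."""
    same_first = [n for pair, n in counts.items() if pair[0] == first]
    return counts[first + second] / max(same_first) if same_first else 0.0


for second in "stbx":
    print("a" + second, normalized(COUNTS, "a", second), relativized(COUNTS, "a", second))
```

Under this reading the ordering of the pairs is unchanged, but every value is scaled up so that bonds occur often enough to be visible on screen.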

A string $w = a_1 a_2 \ldots a_n$ with $|w| > 3$ is considered a complete word if $a_1 a_2 a_3$ is a valid prefix in the lexicon and $a_{n-2} a_{n-1} a_n$ is a valid suffix in the lexicon. For example, if the strings ‘spri’ and ‘ggle’ collide and bond to form ‘spriggle’, since ‘spr’ appears as a prefix in the lexicon (‘spray’, ‘sprint’, ‘spry’) and ‘gle’ appears as a suffix in the lexicon (‘eagle’, ‘haggle’, ‘wiggle’), ‘spriggle’ is a complete word.
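The completeness test lends itself to a small sketch. The lexicon below contains only the six example words named above, and the function names are hypothetical.

```python
# The six example words from the text stand in for the full lexicon.
LEXICON = ["spray", "sprint", "spry", "eagle", "haggle", "wiggle"]


def three_letter_prefixes(words):
    """Collect the first three letters of every word that has at least three."""
    return {word[:3] for word in words if len(word) >= 3}


def three_letter_suffixes(words):
    """Collect the last three letters of every word that has at least three."""
    return {word[-3:] for word in words if len(word) >= 3}


def is_complete(string, prefixes, suffixes):
    """A string longer than 3 letters is complete if it opens and closes like a lexicon word."""
    return len(string) > 3 and string[:3] in prefixes and string[-3:] in suffixes


PREFIXES = three_letter_prefixes(LEXICON)
SUFFIXES = three_letter_suffixes(LEXICON)
print(is_complete("spriggle", PREFIXES, SUFFIXES))
```

Neither ‘spri’ nor ‘ggle’ passes on its own here: ‘spri’ lacks a valid suffix and ‘ggle’ lacks a valid prefix, so completeness only arises once the two halves have bonded.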

If a string reaches length > 7 and it is not a complete word (does not have a valid prefix and suffix), the string separates into its constituent letters. This places a bound on how long words can get without becoming complete. If this bound were not in place, strings could accumulate to indefinite lengths before becoming complete words.

Compound words are created from two complete words, each of which has length < 6. So, if the complete words ‘erag’ and ‘isme’ collide, they can bond to form ‘eragisme’.
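The two length rules can be written down as simple predicates. In this sketch completeness is passed in as a flag rather than recomputed, the names are hypothetical, and `may_compound` only checks eligibility: the text says such words "can" bond, and whether a probabilistic test also applies is not stated.

```python
MAX_INCOMPLETE_LENGTH = 7     # an incomplete string longer than this breaks apart
MAX_COMPOUND_PART_LENGTH = 5  # "length < 6" for each half of a compound


def resolve(string, is_complete):
    """Return what stays on screen: the string itself, or its separate letters."""
    if len(string) > MAX_INCOMPLETE_LENGTH and not is_complete:
        return list(string)
    return [string]


def may_compound(left, right, left_complete, right_complete):
    """True when both strings are complete words short enough to be compounded."""
    return (left_complete and right_complete
            and len(left) <= MAX_COMPOUND_PART_LENGTH
            and len(right) <= MAX_COMPOUND_PART_LENGTH)


print(resolve("xqzvkwpt", False))
print(may_compound("erag", "isme", True, True))
```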

Analysis

The JABBER engine should be able to generate any word in the English lexicon as well as any properly formed portmanteau word, and should be incapable of generating any text with letter patterns that do not appear in the English lexicon. In order to analyse the performance of JABBER, we will use ‘Jabberwocky’ itself as our source text. In Through the Looking-Glass, Alice first encounters ‘Jabberwocky’ in a mirror:

sevot yhtils eht dna ,gillirb sawT’

ebaw eht ni elbmig dna eryg diD

,sevogorob eht erew ysmim llA

.ebargtuo shtar emom eht dnA

At first unable to understand what she reads, she eventually discovers how the poem is to be read: "She puzzled over this for some time, but at last a bright thought struck her. ‘Why, it's a Looking-glass book, of course! And if I hold it up to a glass, the words will all go the right way again.’" The nonsense of ‘Jabberwocky’ is well-formed, but intuitively the inverse ‘Ykcowrebbaj’ would seem likely to contain letter-patterns not found in English.

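For a source text $S$ with words $w_1, w_2, \ldots, w_m$, where $w_j = a_{j,1} a_{j,2} a_{j,3} \ldots a_{j,n}$, the probability of the words in $S$ being produced by JABBER is given by:

$$P(S) = \prod_{j=1}^{m} \prod_{i} P(a_{j,i} \mid a_{j,i-2}\, a_{j,i-1})\, P(a_{j,i} \mid a_{j,i+1}\, a_{j,i+2})$$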

where m is the number of words in S. Note that this gives the probability that the letters would bond, given that they have collided in the desired order. Let the first stanza of ‘Jabberwocky’ be S1 and the first stanza of ‘Ykcowrebbaj’ be S2; the probabilities calculated using the conditional probability tables in the JABBER engine are then

P(S1) = 8.5921e-060

P(S2) = 0

For S1, the probability of the text being produced works out to be quite small, but it is still possible. The extremely small figure might seem to imply that the first stanza of ‘Jabberwocky’ is highly unlikely to appear. Note, however, that the calculation gives the probability that, given that the necessary collisions occur, every word in the stanza would be generated during the same iteration of the engine, which is indeed highly unlikely. It would be a similar outcome to running the engine and finding ‘Jabberwocky’ floating around on-screen. What is important is that every word in S1 can be generated.

In the second case, the probability turned out to be 0. An examination of the necessary calculations shows that several letter-patterns in ‘Ykcowrebbaj’ never appear in the English lexicon: ‘yht’, ‘eht’, ‘lbm’, ‘rgt’, all of which are malformed according to standard English letter-patterns. The probability for each of these is 0, so when they are multiplied with the rest of the probabilities, the probability for the entirety of S2 is 0.
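The way a single unseen letter-pattern forces the whole product to 0 can be shown with a toy calculation. The three-word `LEXICON` and the function names are assumptions, and the sketch simplifies the boundary terms: a letter without two neighbours on one side contributes no factor on that side, so words shorter than three letters score 1.0. It will not reproduce the figures reported above.

```python
from collections import Counter

# Toy lexicon, assumed for illustration; note that 'slithy' itself is absent.
LEXICON = ["slither", "lithe", "pithy"]


def trigram_counts(words):
    """Count every run of three adjacent letters across the word list."""
    counts = Counter()
    for word in words:
        for i in range(len(word) - 2):
            counts[word[i:i + 3]] += 1
    return counts


def word_probability(counts, word):
    """Multiply the forward and backward trigram conditionals over one word."""
    prob = 1.0
    for i in range(len(word) - 2):
        tri = word[i:i + 3]
        if counts[tri] == 0:
            return 0.0  # an unseen letter-pattern zeroes the whole product
        follows = sum(n for t, n in counts.items() if t[:2] == tri[:2])
        precedes = sum(n for t, n in counts.items() if t[1:] == tri[1:])
        # P(third letter | first two) * P(first letter | last two)
        prob *= (counts[tri] / follows) * (counts[tri] / precedes)
    return prob


def text_probability(counts, words):
    """Product of the word probabilities over a whole source text."""
    prob = 1.0
    for word in words:
        prob *= word_probability(counts, word)
    return prob


COUNTS = trigram_counts(LEXICON)
print(text_probability(COUNTS, ["slithy"]))
print(text_probability(COUNTS, ["yhtils"]))
```

Even though ‘slithy’ is not in the toy lexicon, all of its three-letter patterns are, so it receives a non-zero probability; its mirror image ‘yhtils’ begins with the unseen pattern ‘yht’ and scores 0.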

Works Cited

Bök, Christian. ‘Pataphysics: The Poetics of an Imaginary Science. North York: York University, 1997.

Carroll, Lewis. Through the Looking-Glass. www.gutenberg.org

Drucker, Johanna. The Alphabetic Labyrinth: The Letters in History and the Imagination. London: Thames & Hudson, 1995.

McCaffery, Steve and Jed Rasula, eds. Imagining Language: An Anthology. Cambridge: The MIT Press, 1998.

Plato. Theaetetus. www.gutenberg.org


