Example 6: Man Bites Dog
April 21, 2018

Imagine that you open up your neighborhood newsletter and read the following headline:
Man Bites Dog!
You do a double-take. That can't be right. Surely, they meant to write "Dog bites man." Was this an editorial failure?
Your mental anguish over this headline stems from a conflict between syntax and semantics. Put another way, the order of these three words promotes an interpretation different from the one you would favor if the words were unordered. That is probably a good way to think about syntax in general: our linguistic capacity for organizing words into syntactic structures based on their order serves to promote some interpretations of the words over others.
In the examples on this page, we are going to use Etcetera Abduction to interpret the meaning of natural language words, first by ignoring their order. We'll see that syntax is superfluous in some cases, and that it is entirely possible to construct structured representations of the meaning of sentences by ignoring syntax altogether - but sometimes we'll get it wrong, as in the simple example "Man Bites Dog."
To begin, we need some way to represent an unordered set of words as our "observables." One way to do this is to have each word be its own literal, each represented as a predicate with no arguments. Technically, we would call this a "propositional logic" representation.
;; the observables
(man)
(bites)
(dog)
The important thing to note here is that the input observations in our interpretation problem are the words of the sentence. In Etcetera Abduction, we are searching for the most probable set of assumptions that logically entails the observations, so here we are looking for the most probable set of assumptions that logically entails these words. The interpretation of the sentence, then, is a set of assumptions: the "meaning" of the sentence is whatever explains why these words were observed.
For this particular trio of words, there are a variety of interpretations that account for their co-occurrence. At least one of them, however, posits some "biting" event in which some man and some dog are both participants. To encode this meaning, we'll need first-order logic. Indeed, the concepts of biting, man, and dog are different literals from the ones we have articulated as observations. The words can be represented as zero-argument propositions, but the concepts that explain them are going to be relational.
Here I provide the core concepts that explain the words - namely a man, a dog, and a biting event - each encoded using eventuality notation.
;; why word bites? maybe bite'
(if (and (bite' e x y) (etc1_word_bites 0.1 e x y))
    (bites))

;; why bites'? why not!
(if (etc0_bites 0.1 e x y)
    (bite' e x y))

;; why word man? may be a man
(if (and (man' e x) (etc1_word_man 0.1 e x))
    (man))

;; why man'? why not!
(if (etc0_man 0.1 e x)
    (man' e x))

;; why word dog? maybe a dog
(if (and (dog' e x) (etc1_word_dog 0.1 e x))
    (dog))

;; why dog'? why not!
(if (etc0_dog 0.1 e x)
    (dog' e x))
With these six axioms in place, we get the beginning of a structured representation of our three propositional observations.
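One way to see this interpretation for yourself (assuming the observables and the six axioms above are saved together in the man-bites-dog-v1.lisp file that this example builds up) is to ask etcabductionpy to graph the most probable interpretation, using the same command-line pattern that appears later in this example:

$ python -m etcabductionpy -i man-bites-dog-v1.lisp -g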
So far we have three new eventualities ($4, $2, and $6), one agent of biting ($5), one patient of biting ($3), one dog ($1), and one man ($7). What we would rather have is a more parsimonious interpretation, in which the agent of the biting ($5) is unified with either the dog ($1) or the man ($7), and the patient of the biting ($3) is unified with the other one. Facilitating these unifications will be the central goal of the axioms we write next, so that the best interpretation is a fully-connected graph.
For our first attempt, let's add some commonsense knowledge about dogs and men. One of the big fears of many dog owners is that their dog might bite someone. It's one of the things dogs occasionally do, and is the reason that people want to see dogs on short leashes. The following three axioms encode this bit of commonsense knowledge.
;; why a dog? maybe a dog biting a person
(if (and (person' e1 a) (bite' e2 d a) (etc1_dog 0.2 e e1 e2 d a))
    (dog' e d))

;; why person? why not!
(if (etc0_person 0.1 e x)
    (person' e x))

;; why person? maybe a man
(if (and (man' e1 x) (etc1_person 1.0 e e1 x))
    (person' e x))
It is the first of these three axioms that encodes the core bit of commonsense knowledge: when a person gets bitten, it's possible that there was a dog to blame. Or put another way, when you observe a dog, it might be engaged in some person-biting. The second axiom gives a prior for persons. The third axiom is somewhat interesting, though. It states that whenever you have a man, that man is always a person (probability of 1.0). This axiom encodes a simple bit of taxonomic knowledge. When forward-chaining on this axiom, we ascend the taxonomy: a man is a type of person. When backward-chaining on this axiom, we descend this same taxonomy: if it's a person, it might be a man.
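To see how this taxonomic pattern generalizes, here is a parallel axiom of the same shape. It is purely illustrative: the animal' predicate is my own addition and is not part of this example's knowledge base. Forward-chaining ascends the taxonomy (a dog is a type of animal), and backward-chaining descends it (if it's an animal, it might be a dog).

;; illustrative only: another bit of taxonomic knowledge
;; why animal? maybe a dog
(if (and (dog' e1 x) (etc1_animal 1.0 e e1 x))
    (animal' e x))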
With these three axioms in place, we have a new most-probable interpretation, in which we envision some dog ($6) and some man ($4), and it is the dog that is biting the man.
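A back-of-the-envelope calculation shows why this interpretation wins, given that Etcetera Abduction scores an interpretation as the product of the prior probabilities of its assumed etcetera literals. The unconnected interpretation assumes six independent etceteras of probability 0.1 each, for a score of 0.1^6 = 0.000001. The connected interpretation drops the etc0_dog prior and instead explains the dog via etc1_dog (0.2) and etc1_person (1.0), reusing the biting event and the man it has already assumed, for a score of roughly 0.1^5 x 0.2 x 1.0 = 0.000002, twice as probable.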
Not bad. We successfully constructed a structured first-order representation that captures the meaning of a sentence of three (unordered) words.
Back in the 1970s and 1980s, there were a number of natural language processing researchers who thought that syntax (derived from word order) was overrated as a focus of research effort, and that natural language understanding was better approached as a commonsense reasoning problem. The example above provides at least one approach toward deep interpretation without using syntax. You might apply the same approach to understand stories that have no syntax whatsoever, like: "Driving. Night. Raining. Curve. Crash. Coma." Also, this approach seems useful in understanding nonstandard grammatical constructions, e.g., coercing a non-transitive verb into a transitive one, as in "The dog barked the cat up the tree."
Of course the main problem here is that we got the wrong interpretation! The headline was "Man Bites Dog!" Reading this headline does indeed conjure up some vision of some man and some dog, but it is the man that is doing the biting, not the dog. How do we get the right interpretation?
The second thing we can try is to simply provide more commonsense knowledge. Just like we know that dogs are prone to biting people, we know that people are prone to biting food. Indeed, people bite into food around three meals a day. If we can envision the dog as a type of food, then we can get the right interpretation. Here are three axioms that get the job done:
;; why man'? Maybe eating food
(if (and (food' e1 y) (bite' e2 x y) (etc1_man 0.2 e e1 e2 x y))
    (man' e x))

;; why food? why not!
(if (etc0_food 0.1 e x)
    (food' e x))

;; why food? maybe a dog
(if (and (dog' e1 x) (etc2_dog 0.9 e e1 x))
    (food' e x))
Adding these axioms opens up some new interpretations of our three words. Still, the "correct" interpretation that we sought is only #2 in the list of most probable. We use the "--solution" (or "-s") flag when we want to graph solutions further down the list, as follows:
$ python -m etcabductionpy -i man-bites-dog-v1.lisp -g -s 2
It is not a totally outlandish idea; there are certainly some countries in the world where dogs are eaten as a type of food. Still, the likelihood that a given dog is food is probably not 90 percent, as we stated in our axiom, above. The whole approach seems somewhat wonky, though. There must be an easier way to have the man be the agent of the bite event.
Version 2: The sequence "man bites dog"
In natural language, the order of the words provides a lot of information that points you toward the intended interpretation. It is from the order of our three words that we can infer that the man is the agent of the bite, and the dog is the thing that was bitten.
To get Etcetera Abduction to do the right thing, we need to make this ordering part of the observation. While our simple no-argument propositions were good enough to represent an unordered set of input words, specifying both a word and its order requires first-order logic. There are lots of representational options, but one simple way is to posit that there are three constants, W1, W2, and W3, which are the three words. Then, the labels for the different words and their sequential order can be predications on these constants, as follows:
;; "man bites dog" version 2 ;; The observables (man W1) (bites W2) (dog W3) (seq W1 W2 W3)
Just as we did in version 1, we'll need axioms that bridge the gap between the words and the entities that they refer to. This time, however, we can make explicit that these words refer to these imagined entities, using a "ref" relation.
;; why word man? maybe referring to a man
(if (and (man' e x) (ref w x) (etc1_word_man 0.1 e w x))
    (man w))

;; why man'? why not!
(if (etc0_man 0.1 e x)
    (man' e x))

;; why word bites? Maybe referring to a biting eventuality
(if (and (bite' e x y) (ref w e) (etc1_word_bites 0.1 e x y w))
    (bites w))

;; why bites'? why not!
(if (etc0_bite 0.1 e x y)
    (bite' e x y))

;; why dog? Maybe referring to a dog
(if (and (dog' e d) (ref w d) (etc1_word_dog 0.1 e w d))
    (dog w))

;; why dog'? why not!
(if (etc0_dog 0.1 e x)
    (dog' e x))

;; why ref? Why not!
(if (etc0_ref 0.1 w c)
    (ref w c))
If we were to ignore the sequence literal (seq W1 W2 W3), we could use these axioms alone to get an unconnected interpretation that imagines a man, a dog, and a biting eventuality.
But we don't want to ignore the sequence literal - that is where all the important information lies. To exploit it, we need a bit of syntactic knowledge. Actually, we need two parts:
- A sequence of three words might be an expression involving a monotransitive verb, i.e., one in which an agent does something to a single patient, with the first word referring to the agent, the second to the verb's eventuality, and the third to the patient, and
- The three-argument eventuality literal bite' directly maps to a monotransitive expression.
Here's how I would express these two bits of syntactic knowledge:
;; why seq? maybe monotransitive construction
(if (and (monotransitive e x y)
         (ref s1 x)
         (ref s2 e)
         (ref s3 y)
         (etc1_seq 0.1 e x y s1 s2 s3))
    (seq s1 s2 s3))

;; why monotransitive? maybe bite'
(if (and (bite' e x y) (etc1_monotransitive 0.1 e x y))
    (monotransitive e x y))
When we add our (seq W1 W2 W3) literal back in as an input, the best interpretation makes all of the right unifications, leading us to interpret this input as meaning that a man is doing the biting and the dog is the thing that is bitten.
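As before, you can graph this best interpretation yourself. Assuming the version 2 observables and axioms are saved in a file named man-bites-dog-v2.lisp (my guess at the naming convention; adjust for whatever filename you actually use), the command-line pattern is the same:

$ python -m etcabductionpy -i man-bites-dog-v2.lisp -g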
Great! But that seemed like a lot of work for a pretty simple semantic role labeling problem. Wouldn't it be a lot easier just to apply the Stanford Parser to this text?
Yes it would! Indeed, if you apply the Stanford Parser to this three-word sequence, you'll get an output that is essentially equivalent to what we've done here using abduction, particularly when you look at the Universal dependencies section.
Stanford Parser

Your query:
man bites dog

Tagging:
man/NN bites/VBZ dog/NN

Parse:
(ROOT
  (S
    (NP (NN man))
    (VP (VBZ bites)
      (NP (NN dog)))))

Universal dependencies:
nsubj(bites-2, man-1)
root(ROOT-0, bites-2)
dobj(bites-2, dog-3)
Both the Stanford Parser and our Version 2 are finding structured representations that explain the ordered sequence of words, using very different methods. Still, there are some analogies to be drawn between the two algorithmic approaches, particularly around how the search is conducted.
If you were really ambitious, you could probably write a pretty good syntactic parser using Etcetera Abduction. The advantage of doing so is that you could provide an integrated account of language understanding that included syntax, semantics, and pragmatics all under the umbrella of logical abduction. This idea was one of the great achievements of Prof. Jerry Hobbs and his colleagues at SRI back in the 1990s, immortalized in the following famous NLP paper:
- Hobbs, Jerry R., Mark Stickel, Douglas Appelt, and Paul Martin (1993). Interpretation as Abduction. Artificial Intelligence 63(1-2): 69-142.
But since then, data-driven statistical parsers like Stanford's have gotten really good at the syntax part of the puzzle. So much so that even the few researchers who are really interested in the idea of "interpretation as abduction" will opt to use a high-performance statistical parser at the front end of the language interpretation pipeline. The current favorite among language-logicians is probably the Combinatory Categorial Grammar parsers coming out of the University of Edinburgh, but people have also had some luck converting the Stanford Parser's Universal Dependencies into a usable logical form for use as input literals. Give it a try!
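As a starting point, here is one plausible (and entirely hypothetical) encoding of the Universal Dependencies output above as input observables for Etcetera Abduction, reusing the word constants from version 2 but replacing the seq literal with dependency literals:

;; a hypothetical encoding of the dependency parse as observables
(man W1)
(bites W2)
(dog W3)
;; nsubj(bites-2, man-1) and dobj(bites-2, dog-3) from the parser
(nsubj W2 W1)
(dobj W2 W3)

Axioms analogous to the seq axiom in version 2 could then explain the nsubj and dobj literals in terms of the agent and patient of a referenced eventuality.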