Example 1: Chicken and Egg
July 11, 2017

Imagine that you see both a chicken and an egg. How should you make sense of this situation? What is the most-probable explanation of what you are seeing?
Etcetera Abduction can do a couple of things for you in this problem. First and foremost, it can provide you with the most-probable set of assumptions that logically entail the observations -- the best explanation. Second, it can tell you what is the highest estimate of the joint probability of these two observations, given your knowledgebase.
To see this in action, let's write a simple input file.
To start with, we'll list the observables as literals in first order logic.
;; The observables
(chicken C)
(egg E)
Here we use uppercase "C" and "E" to signify that these arguments are constants, not variables. You can think of "C" here as representing one particular chicken that we're seeing, and "E" as representing one particular egg.
In Etcetera Abduction, every predicate in the knowledge base needs a prior probability. Since we've used the predicates "chicken" and "egg", we'll need to provide the prior probabilities associated with both. Here's the format:
;; Prior probabilities of the observables
(if (etc0_chicken 0.001 x) (chicken x))
(if (etc0_egg 0.002 x) (egg x))
Here, these prior probabilities are expressed as axioms, or more specifically, as definite clauses. Definite clauses can be written this way, with one or more literals serving as the "antecedents" and exactly one literal serving as the "consequent." In these two axioms, literals for "chicken" and "egg" are provided as consequents. Importantly, the arguments to these two literals are both "x", which signifies a universally quantified variable. Basically, we're not encoding the specific prior probabilities of the chicken "C" or the egg "E", but rather the prior probabilities of observing any given chicken or egg.
The actual probabilities are included as constants (real-valued numbers) in the first argument position of the antecedent - the "etcetera literal" in these two axioms. Here, the prior probability of observing a chicken is given as 0.001 (p=0.1%), and an egg as 0.002 (p=0.2%). By convention, the predicates used in all etcetera literals in a knowledgebase begin with "etc", followed by an integer, an underscore "_", and then the predicate of the consequent. By further convention, all prior probabilities encoded in etcetera literals use the integer "0". With these conventions, we can readily guess that "etc0_chicken" and "etc0_egg" are encoding the prior probabilities of chickens and eggs.
If you'd like to use a different convention for naming your etcetera predicates, go ahead. What is required, however, is that the first argument of an etcetera literal is a real-valued number between 0.0 and 1.0 -- the quantified probability. As well, you must include as arguments (in any order) all of the other universally quantified variables that appear anywhere else in the axiom. In these two cases, this is only the "x" variable, which appears in both consequents. If you leave one out by mistake, you can get some extremely hard-to-debug reasoning errors.
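For example, suppose we later wrote a two-variable axiom using a hypothetical "nest" predicate (purely illustrative, not part of our chicken-egg knowledgebase). Both "x" and "y" appear elsewhere in the axiom, so both must be arguments of its etcetera literal:

;; Hypothetical: maybe the egg is explained by a nearby nest
(if (and (nest x) (etc9_egg 0.05 x y)) (egg y))

Writing (etc9_egg 0.05 y) here, and forgetting "x", would be exactly the kind of mistake that leads to those hard-to-debug errors.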
With just these axioms in place, we can already try to come up with some explanation for the observables. Let's invoke the program on our input file ("chicken-egg.lisp") and see what we get:
$ python -m etcabductionpy -i chicken-egg.lisp
((etc0_chicken 0.001 C) (etc0_egg 0.002 E))
The output is exactly one explanation, i.e., one set of assumptions that wholly entails the two observations. The explanation, however, is not very exciting: the best explanation given the knowledge base is to assume the etcetera literals that encode the prior probabilities, with the constants properly unified. And sure enough, if we assert these two literals as true and forward-chain through our knowledge base, we'll deduce the two observations exactly.
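Concretely, each assumed etcetera literal unifies with the antecedent of one of our prior-probability axioms, binding "x" to the corresponding constant. Here is a sketch of that forward-chaining step (written as comments, not input-file syntax):

;; (etc0_chicken 0.001 C) matches (etc0_chicken 0.001 x) with x = C, so (chicken C) follows
;; (etc0_egg 0.002 E) matches (etc0_egg 0.002 x) with x = E, so (egg E) follows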
Our tool provides a nice way to visualize these proofs of the observables as ".dot" files, which can easily be converted to visual graphs using the "graphviz" utility. To output a .dot file for the most-probable solution, just provide the "-g" flag (or "--graph") on the command line.
$ python -m etcabductionpy -i chicken-egg.lisp -g
digraph proof {
  graph [rankdir="TB"]
  n0 [label="etc0_chicken 0.001"];
  n1 [label="etc0_egg 0.002"];
  n2 [shape=box peripheries=2 label="(chicken C)"];
  n3 [shape=box peripheries=2 label="(egg E)"];
  n0 -> n2
  n1 -> n3
  {rank=same n2 n3}
}
If you are on a mac, and you have the graphviz library installed, then you can pipe this output to a utility that opens it up as an .svg file in Safari, like this:
$ python -m etcabductionpy -i chicken-egg.lisp -g | ./util/dot2safari
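If you're not on a mac, or don't have that utility, you should be able to render the same .dot output with graphviz directly; for example (assuming graphviz's "dot" command is on your path, and using "proof.dot" and "proof.svg" as arbitrary file names):

$ python -m etcabductionpy -i chicken-egg.lisp -g > proof.dot
$ dot -Tsvg proof.dot -o proof.svg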
In this visualization, the etcetera literals are the ovals, and the arrows point in to consequents. In these graphs, we've removed the arguments of the etcetera literals after the probabilities, because in real knowledgebases these lists can get pretty long, and that is not conducive to pretty-looking graphs. The important part is the numbers -- these are the probabilities of the assumptions, both prior probabilities and conditional probabilities. In Etcetera Abduction, the highest estimate of the joint probability of the observables, given the knowledge base, is simply the product of these numbers.

Here, we're assuming that the probabilities of the etcetera literals are independent of one another, precisely because we cannot or choose not to model the conditions of the universe that they represent. However, we don't consider this a "naive" assumption, at least not in the naive-Bayes sense of the word, because if we knew more about joint and conditional probabilities -- and encoded that knowledge in the knowledgebase -- then Etcetera Abduction could find a better (higher) estimate. The more you know, the better your joint-probability estimates are going to be. Without this extra knowledge, the only estimate you can come up with is to assume probabilistic independence of all the observations, which is exactly what is produced in the output above.
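For the solution above, that product is just the two priors: 0.001 * 0.002 = 0.000002. So, with only these axioms, our best estimate of the joint probability of seeing this particular chicken and this particular egg is p=0.000002.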
So now let's see what happens if we add a bit more knowledge to our knowledgebase. Let's add two more axioms. In the first, we'll provide some possible explanation for why we are seeing an egg: maybe it's from a hen! That happens around 10% of the time I see a hen (just a guess). That is, the conditional probability of seeing an egg, given a hen, is 0.1. We encode this conditional probability as follows:
;; Why egg? Maybe a hen
(if (and (hen x) (etc1_egg 0.1 x y)) (egg y))
Again, we use lowercase letters to signify universally quantified variables in this definite clause. This time, the antecedent has two literals - one that assumes a hen, and another that encodes the conditional probability (p=0.1). By convention, we're using a unique larger-than-zero integer in the predicate of the etcetera literal, and still using the consequent predicate after the underscore. When I see this predicate "etc1_egg", I think of it as encoding the "first real explanation for why we might be seeing an egg", namely that the conditions of the universe are just right for a hen and an egg to co-occur. Also, this etcetera literal has two additional arguments after the numeric probability, namely the two universally quantified variables that appear elsewhere in the axiom ("x" for the hen, and "y" for the egg).
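Roughly speaking, Etcetera Abduction uses this axiom in reverse: to explain the observation (egg E), it unifies the consequent (egg y) with (egg E) and then assumes the antecedent literals, sketched here as comments:

;; Back-chaining on (egg E), with y bound to E:
;; assume (hen x) and (etc1_egg 0.1 x E), and (egg E) follows

The assumed (hen x) literal can, in turn, either be explained by another axiom or assumed at its own prior probability.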
Next let's add some knowledge about chickens. About half of the time we have a chicken, it's a hen (or else it's a rooster). That is, the Pr(hen | chicken) = 0.5. BUT, this isn't the sort of probability we need just yet. Instead, we're interested in the conditional probability of a chicken given a hen. Well, I'm pretty sure that is 100% of the time (just a guess, again). We can encode this conditional certainty in the following manner:
;; Why chicken? Maybe it's a hen
(if (and (hen x) (etc1_chicken 1.0 x)) (chicken x))
This axiom says that one of the ways we might observe a chicken is if, in fact, it is a hen. And 100% of the time, when we have a hen, it's a chicken.
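Back-chaining on (chicken C) with this axiom means assuming (hen C) along with (etc1_chicken 1.0 C). Notice that this same assumed (hen C) can then serve as the "x" in our egg axiom, which is what will allow a single hen to explain both observations.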
But what about the other 50% of chickens that are roosters? We could make an axiom for roosters as well, if we like, but it's not going to help us in this problem. Still, this 50% number comes into play in another axiom that we absolutely must write, encoding the prior probability of hens. If the prior of a chicken was .1%, then the prior of a hen is going to be half that, or .05%. Here's the axiom for that:
;; The prior probabilities of assumed literals
(if (etc0_hen 0.0005 x) (hen x))
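This number follows directly from the figures above: Pr(hen) = Pr(chicken) * Pr(hen | chicken) = 0.001 * 0.5 = 0.0005.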
With these three new axioms inserted into "chicken-egg.lisp", we're ready to run the interpretation again. This time, we end up with 5 solutions, and a new and different one is on top:
$ python -m etcabductionpy -i chicken-egg.lisp
((etc0_hen 0.0005 C) (etc1_chicken 1.0 C) (etc1_egg 0.1 C E))
((etc0_chicken 0.001 C) (etc0_egg 0.002 E))
((etc0_egg 0.002 E) (etc0_hen 0.0005 C) (etc1_chicken 1.0 C))
((etc0_chicken 0.001 C) (etc0_hen 0.0005 $1) (etc1_egg 0.1 $1 E))
((etc0_hen 0.0005 $1) (etc0_hen 0.0005 C) (etc1_chicken 1.0 C) (etc1_egg 0.1 $1 E))
5 solutions.
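The solutions are ranked by their probability estimates -- for each one, the product of the probabilities of its assumptions:

0.0005 * 1.0 * 0.1 = 0.00005
0.001 * 0.002 = 0.000002
0.002 * 0.0005 * 1.0 = 0.000001
0.001 * 0.0005 * 0.1 = 0.00000005
0.0005 * 0.0005 * 1.0 * 0.1 = 0.000000025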
Let's graph the most-probable solution to see what it looks like.
$ python -m etcabductionpy -i chicken-egg.lisp -g | ./util/dot2safari
Great! We found a better explanation for the chicken and the egg: the chicken must actually be a hen, and this hen is responsible for the egg. The joint probability of the observations is estimated to be p=0.00005 (0.0005 * 0.1 * 1.0), which is higher than our previous solution of considering only the priors, p=0.000002 (0.001 * 0.002). We've found an explanation that beats the priors, and a new estimate of the joint probability of seeing both a chicken and an egg.
The old best solution, the one with only the prior probabilities, is still in this list of solutions above, but now it is in the 2nd position. There are also three solutions listed in positions 3 through 5 that are actually worse than assuming the priors. The last two of these include a constant "$1", which we haven't seen before. These are called "Skolem constants", and are used when a reasoning system needs to refer to a specific entity in the domain that hasn't been given a proper name, like "C" or "E". They show up in Etcetera Abduction when a variable appears in an antecedent but not in the consequent. In this specific case, the system is assuming that some other unknown hen, not the chicken "C", is responsible for the egg. A second chicken certainly would explain the observables, but two chickens are a lot less probable than one.