Example 10: Coin Flips, Confidence, and Classifiers

February 25, 2022

As we learned previously, etcetera abduction lacks the ability to reason about negation, which prevents us from expressing "exclusive or" (XOR) relationships; e.g., the same person can be both infected and healthy in the same interpretation.

However, there is a handy trick that you can use to approximate an XOR relationship using etcetera abduction, which we'll call the "XOR trick". This trick capitalizes on the fact that etcetera abduction uses exactly one axiom from the knowledge base to directly prove an observable in any given interpretation. If we craft our axioms just right, we can use the XOR trick to force etcetera abduction to pick exactly one from a set of options in a given interpretation.
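Schematically, the trick looks like this (the predicate names here are just illustrative placeholders, not drawn from the examples below). Instead of observing one of the two mutually exclusive options directly, we observe an XOR literal, and we write exactly two axioms whose consequent matches it, one for each option:

(xor_a_b' E1)

(if (and (a' e1)
	 (etc1_xor_a_b 1.0 e e1))
    (xor_a_b' e))

(if (and (b' e1)
	 (etc2_xor_a_b 1.0 e e1))
    (xor_a_b' e))

Since exactly one of these two axioms can be used to prove the XOR observation in any given interpretation, every interpretation commits to exactly one of the two options.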

Coin flips and snowstorms

Imagine you observe a coin being flipped:

(coinflip' E1 Coin1)

The probability of any such observation is its prior, and coin flips are very rare. But the prior is not what we're interested in here. Instead, we'd like to interpret two possible worlds.

In World 1:

(coinflip' E1 Coin1) (heads' E2 Coin1)

In World 2:

(coinflip' E1 Coin1) (tails' E2 Coin1)

We have a bit of knowledge about this domain. Heads and tails each happen 50% of the time when there is a coin flip. Coin flips are pretty rare.

(if (etc0_coinflip 0.01 e coin)
    (coinflip' e coin))

(if (and (coinflip' e1 coin)
	 (etc1_heads' 0.5 e e1 coin))
    (heads' e coin))

(if (and (coinflip' e1 coin)
	 (etc1_tails' 0.5 e e1 coin))
    (tails' e coin))

These are two different interpretation problems, but they have similar top interpretations. The most likely interpretation of World 1 is that the heads is explained by the coin flip (itself explained by its prior) along with an etcetera assumption of 50%. Similarly for World 2.

We can force etcetera abduction to reason about both of these possibilities in a single interpretation problem by observing an XOR, where the only explanations for the XOR are the two options:

(coinflip' E1 Coin1) (xor_heads_tails' E2 Coin1)

(if (and (heads' e1 coin)
	 (etc1_xor_heads_tails 1.0 e e1 coin))
    (xor_heads_tails' e coin))

(if (and (tails' e1 coin)
    	 (etc2_xor_heads_tails 1.0 e e1 coin))
    (xor_heads_tails' e coin))

> python -m etcabductionpy -i coinflips.lisp
((etc0_coinflip 0.01 E1 Coin1) (etc1_heads' 0.5 $1 E1 Coin1) (etc1_xor_heads_tails 1.0 E2 $1 Coin1))
((etc0_coinflip 0.01 E1 Coin1) (etc1_tails' 0.5 $1 E1 Coin1) (etc2_xor_heads_tails 1.0 E2 $1 Coin1))
((etc0_coinflip 0.01 $2 Coin1) (etc0_coinflip 0.01 E1 Coin1) (etc1_heads' 0.5 $1 $2 Coin1) (etc1_xor_heads_tails 1.0 E2 $1 Coin1))
((etc0_coinflip 0.01 $1 Coin1) (etc0_coinflip 0.01 E1 Coin1) (etc1_tails' 0.5 $2 $1 Coin1) (etc2_xor_heads_tails 1.0 E2 $2 Coin1))
4 solutions.

Now when we interpret the coin flip and the XOR together, the top two interpretations are equally likely: in one, the heads is explained by the coin flip and chance; in the other, the tails is. In each case, the assumptions entail the coin flip, exactly one of heads or tails, and the XOR observation as well.
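To spell out the arithmetic: each of these top interpretations is the product of the coin flip's prior, one 50% etcetera assumption, and the 1.0 XOR assumption, i.e., 0.01 * 0.5 * 1.0 = 0.005, so the two are exactly tied.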

Similarly, we can force etcetera abduction to simultaneously interpret mutually exclusive observations that are not equally likely.

(xor_sunny_snowstorm' E1)

(if (etc0_sunny 0.6 e)
    (sunny' e))

(if (etc0_snowstorm 0.2 e)
    (snowstorm' e))
  
(if (and (sunny' e1)
	 (etc1_xor_sunny_snowstorm 1.0 e e1 coin))
    (xor_sunny_snowstorm' e))

(if (and (snowstorm' e1)
	 (etc2_xor_sunny_snowstorm 1.0 e e1 coin))
    (xor_sunny_snowstorm' e))

Notice the probabilities of 1.0 assigned to the etcetera literals in these axioms. By our previous definitions, this is the conditional probability that it is either sunny or a snowstorm, given that it is sunny (and, implicitly, that sunny implies not snowstorm). Not super informative, but logically correct.

What is the best interpretation of this single observation? It depends on the prior probabilities of sunny and snowstorm: whichever is more likely will be the top interpretation. With nothing else observed, the top interpretation of a single XOR will always select the option with the largest prior probability; if more is known, an option with a smaller prior might appear in the top interpretation instead.

> python -m etcabductionpy -i sunny-snowstorm.lisp
((etc0_sunny 0.6 $1) (etc1_xor_sunny_snowstorm 1.0 E1 $1 $2))
((etc0_snowstorm 0.2 $1) (etc2_xor_sunny_snowstorm 1.0 E1 $1 $2))
2 solutions.
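Multiplying out the etcetera literals, the sunny interpretation scores 0.6 * 1.0 = 0.6 and the snowstorm interpretation scores 0.2 * 1.0 = 0.2, so the sunny option comes out on top, exactly as the priors dictate.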

A flash of light, maybe

What if we're uncertain about some observation? For example, you might have seen a flash of light in the dark, but maybe you imagined it. These are actually two different interpretation problems.

In World 1:

(dark' E1) (flash_of_light' E2 L)

In World 2:

(dark' E1)

You have some knowledge of this domain. You know that darkness sometimes happens. You know a bit about flashes of light and how they might be due to fireflies. Plus, you know that fireflies come out at night, but are otherwise very rare.

(if (etc0_dark 0.3 e)
    (dark' e))

(if (etc0_flash_of_light 0.01 e l)
    (flash_of_light' e l))

(if (and (firefly' e1 f)
	 (etc1_flash_of_light 0.8 e e1 l f))
    (flash_of_light' e l))

(if (etc0_firefly 0.01 e f)
    (firefly' e f))

(if (and (dark' e1)
	 (etc1_firefly 0.7 e e1 f))
    (firefly' e f))

In World 1, the flash of light might be a firefly (.8), because fireflies often come out in the dark (.7), and it's occasionally dark (.3). In World 2, the best interpretation is simply that it's occasionally dark (.3), and so the probability of this interpretation is going to beat the best interpretation of World 1.
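Concretely, the best interpretation of World 1 scores 0.3 * 0.7 * 0.8 = 0.168, while the best interpretation of World 2 scores 0.3, the prior of dark alone.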

Can we force etcetera abduction to consider both worlds simultaneously? Yes, by using the XOR trick.

(dark' E1) (xor_flash_of_light' E2 L)

(if (and (flash_of_light' e1 l)
	 (etc1_xor_flash_of_light 1.0 e e1 l))
    (xor_flash_of_light' e l))

(if (etc2_xor_flash_of_light 1.0 e l)
    (xor_flash_of_light' e l))

> python -m etcabductionpy -i flashoflight.lisp
((etc0_dark 0.3 E1) (etc2_xor_flash_of_light 1.0 E2 L))
((etc0_dark 0.3 E1) (etc1_firefly 0.7 $3 E1 $1) (etc1_flash_of_light 0.8 $2 $3 L $1) (etc1_xor_flash_of_light 1.0 E2 $2 L))
((etc0_dark 0.3 $4) (etc0_dark 0.3 E1) (etc1_firefly 0.7 $3 $4 $1) (etc1_flash_of_light 0.8 $2 $3 L $1) (etc1_xor_flash_of_light 1.0 E2 $2 L))
((etc0_dark 0.3 E1) (etc0_flash_of_light 0.01 $1 L) (etc1_xor_flash_of_light 1.0 E2 $1 L))
((etc0_dark 0.3 E1) (etc0_firefly 0.01 $3 $1) (etc1_flash_of_light 0.8 $2 $3 L $1) (etc1_xor_flash_of_light 1.0 E2 $2 L))
5 solutions.

The best explanation of the darkness and the XOR is going to be the prior probability of dark (.3) alone, i.e., that there was no flash of light at all.

But what if you are really, really, really confident that you saw a flash of light? Maybe not 100% confident, but at least 99% confident? We would like some way of factoring this confidence into the interpretation problem.

We need some way of tipping the scales of our interpretations in favor of World 1, while still allowing for that 1% chance that we are wrong. We want some way of saying that the likelihood of World 1 is 99% and that of World 2 is 1%, as follows:

In World 1:

(dark' E1) (flash_of_light' E2 L) (likelihood' E3 0.99)

In World 2:

(dark' E1) (likelihood' E3 0.01)

This looks promising, but we need some way of interpreting the likelihood observation with a prior probability that is exactly the same magnitude as the value in the likelihood literal. Fortunately we can do this with a simple variable substitution, as follows:

(if (etc0_likelihood pr e)
    (likelihood' e pr))
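For example, backchaining on the observation (likelihood' E3 0.99) with this axiom unifies pr with 0.99, producing the etcetera assumption (etc0_likelihood 0.99 E3), which is assumed with probability exactly 0.99.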

But how do we construct the XOR version, to force etcetera abduction to reason about both possible worlds in a single interpretation problem? We need the specific values representing our confidence to appear in the antecedents of our axioms, but we don't necessarily want to hard-code these numbers into our knowledge base axioms directly. One solution is to include the confidence values in the XOR observation itself:

(dark' E1) (xor_flash_of_light' E2 L 0.99 0.01)

Then these confidence values can be substituted in the right places in each of the explanations for the XOR:

(if (and (flash_of_light' e1 l)
	 (likelihood' e2 pr1)
	 (etc1_xor_flash_of_light 1.0 e e1 e2 l pr1 pr2))
    (xor_flash_of_light' e l pr1 pr2))

(if (and (likelihood' e1 pr2)
	 (etc2_xor_flash_of_light 1.0 e e1 l pr1 pr2))
    (xor_flash_of_light' e l pr1 pr2))

> python -m etcabductionpy -i flashoflight2.lisp
((etc0_dark 0.3 E1) (etc0_likelihood 0.99 $4) (etc1_firefly 0.7 $3 E1 $2) (etc1_flash_of_light 0.8 $1 $3 L $2) (etc1_xor_flash_of_light 1.0 E2 $1 $4 L 0.99 0.01))
((etc0_dark 0.3 $3) (etc0_dark 0.3 E1) (etc0_likelihood 0.99 $5) (etc1_firefly 0.7 $2 $3 $4) (etc1_flash_of_light 0.8 $1 $2 L $4) (etc1_xor_flash_of_light 1.0 E2 $1 $5 L 0.99 0.01))
((etc0_dark 0.3 E1) (etc0_likelihood 0.01 $1) (etc2_xor_flash_of_light 1.0 E2 $1 L 0.99 0.01))
((etc0_dark 0.3 E1) (etc0_flash_of_light 0.01 $1 L) (etc0_likelihood 0.99 $2) (etc1_xor_flash_of_light 1.0 E2 $1 $2 L 0.99 0.01))
((etc0_dark 0.3 E1) (etc0_firefly 0.01 $2 $3) (etc0_likelihood 0.99 $4) (etc1_flash_of_light 0.8 $1 $2 L $3) (etc1_xor_flash_of_light 1.0 E2 $1 $4 L 0.99 0.01))
5 solutions.

Now this single interpretation problem factors in our confidence in the two worlds, and the top interpretation is that there really was a flash of light: it has a high likelihood, and it is explained by a firefly, because fireflies come out in the dark, and because dark happens from time to time.
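In terms of the numbers, this top interpretation scores 0.3 * 0.99 * 0.7 * 0.8 * 1.0, which is about 0.166, while the best "imagined it" interpretation scores only 0.3 * 0.01 * 1.0 = 0.003, so the 99% confidence tips the scales decisively toward World 1.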

[Proof graph of the top interpretation: the prior of dark explains the darkness, which explains the firefly, which explains the flash of light; the flash of light and the 0.99 likelihood together explain the xor_flash_of_light observation.]

There is one simplification we can make to the XOR-trick axioms, in order to clean up our formulation and avoid unnecessary backchaining. Since there is only one possible explanation for a likelihood literal, and its probability is already specified, we can remove it as an antecedent and fold its uncertainty into the etcetera literal directly, swapping in the likelihood value for the 1.0 (1.0 * likelihood = likelihood). Our two XOR axioms are thus simplified as follows:

(if (and (flash_of_light' e1 l)
	 (etc1_xor_flash_of_light pr1 e e1 l pr2))
    (xor_flash_of_light' e l pr1 pr2))

(if (etc2_xor_flash_of_light pr2 e l pr1)
    (xor_flash_of_light' e l pr1 pr2))
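Note that this simplification leaves the interpretation probabilities unchanged: the top interpretation is still the product 0.3 * 0.7 * 0.8 * 0.99, about 0.166, exactly as before, but with one fewer literal to backchain on.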

Classifiers

With these preliminaries out of the way, we can now use the XOR trick to interpret the output of classifiers with etcetera abduction. A classifier assigns to an input a label from a set of mutually exclusive options. These assignments can be encoded as observations, as follows:

(assignment' E1 Classifier1 Input1 Label2)
(assignment' E2 Classifier1 Input2 Label4)
(assignment' E3 Classifier1 Input3 Label1)

A well-calibrated classifier can alternatively provide a confidence distribution over the set of possible labels for a given input. For example, a four-class classifier might provide the following distributions:

(four_class_distribution' E4 Classifier1 Input1 0.2 0.6 0.1 0.1)
(four_class_distribution' E5 Classifier1 Input2 0.1 0.1 0.1 0.7)
(four_class_distribution' E6 Classifier1 Input3 0.4 0.3 0.2 0.1)

Etcetera abduction can be forced to simultaneously consider all four possibilities for each input using the XOR trick, i.e., by treating each classification as a mutually exclusive explanation with its own likelihood.

(if (and (assignment' e1 classifier input Label1)
	 (etc1_four_class_distribution pr1 e e1 classifier input pr2 pr3 pr4))
    (four_class_distribution' e classifier input pr1 pr2 pr3 pr4))

(if (and (assignment' e1 classifier input Label2)
	 (etc2_four_class_distribution pr2 e e1 classifier input pr1 pr3 pr4))
    (four_class_distribution' e classifier input pr1 pr2 pr3 pr4))

(if (and (assignment' e1 classifier input Label3)
	 (etc3_four_class_distribution pr3 e e1 classifier input pr1 pr2 pr4))
    (four_class_distribution' e classifier input pr1 pr2 pr3 pr4))

(if (and (assignment' e1 classifier input Label4)
	 (etc4_four_class_distribution pr4 e e1 classifier input pr1 pr2 pr3))
    (four_class_distribution' e classifier input pr1 pr2 pr3 pr4))

The XOR trick also works if only a portion of the confidence distribution is observed, e.g. only the likelihoods of the top few class labels.

(top_two_distribution' E7 Classifier1 Input1 Label2 0.6 Label1 0.2)
(top_two_distribution' E8 Classifier1 Input2 Label4 0.7 Label1 0.1)
(top_two_distribution' E9 Classifier1 Input3 Label1 0.4 Label2 0.3)

(if (and (assignment' e1 classifier input firstlabel)
	 (etc1_top_two_distribution pr1 e e1 classifier input firstlabel secondlabel pr2))
    (top_two_distribution' e classifier input firstlabel pr1 secondlabel pr2))

(if (and (assignment' e1 classifier input secondlabel)
	 (etc2_top_two_distribution pr2 e e1 classifier input firstlabel pr1 secondlabel))
    (top_two_distribution' e classifier input firstlabel pr1 secondlabel pr2))

The explanation for any given assignment depends on the classifier. For example, maybe Classifier1 has been trained to classify the suit of a card drawn from a deck of poker cards, based on its photograph.

(if (and (spade' e1 input)
	 (etc1_Classifier1_assignment 1.0 e e1 input))
    (assignment' e Classifier1 input Label1))

(if (and (club' e1 input)
	 (etc2_Classifier1_assignment 1.0 e e1 input))
    (assignment' e Classifier1 input Label2))

(if (and (heart' e1 input)
	 (etc3_Classifier1_assignment 1.0 e e1 input))
    (assignment' e Classifier1 input Label3))

(if (and (diamond' e1 input)
	 (etc4_Classifier1_assignment 1.0 e e1 input))
    (assignment' e Classifier1 input Label4))

Above, all of the conditional probabilities are 1.0, e.g., the probability that Classifier1 assigns Label1 to an input, given that the input is a spade, is 100%. However, they need not all be 100%, and if the actual confusion matrix is known for a particular classifier, then those probabilities should be used instead. For example, there is some small probability that a club is misclassified as a spade:

(if (and (club' e1 input)
	 (etc_Classifier1_club_mislabeled_as_spade 0.024 e e1 input))
    (assignment' e Classifier1 input Label1))

If the input is drawn from a complete deck of cards, the prior probability of each suit is equal. If some of the cards are missing, then the priors may be different.

(if (etc0_spade 0.26 e input)
    (spade' e input))

(if (etc0_club 0.24 e input)
    (club' e input))

(if (etc0_heart 0.27 e input)
    (heart' e input))

(if (etc0_diamond 0.23 e input)
    (diamond' e input))

For a single four_class_distribution observation, the probability of any given interpretation would be the product of the prior of the suit, the conditional probability of the label given the suit, and the confidence of the classifier.
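For example, interpreting (four_class_distribution' E4 Classifier1 Input1 0.2 0.6 0.1 0.1) as a photograph of a club scores 0.24 * 1.0 * 0.6 = 0.144: the prior probability of a club, the probability that Classifier1 assigns Label2 to a club, and the classifier's confidence in Label2.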

[Proof graph of the top interpretation: Input1 is assumed to be a club (0.24) and assigned Label2 with confidence 0.6; Input2 a diamond (0.23), assigned Label4 with confidence 0.7; Input3 a spade (0.26), assigned Label1 with confidence 0.4.]

When the priors are all equal and the classifier is never confused, the top interpretation will always make the label assignment with the highest confidence. If there are more observations that include additional information about the situational context, then a lower-confidence classifier assignment may appear in the top interpretation.

Indeed, the only reason we would want to go through all of this trouble is to provide a mechanism for promoting lower-confidence class labels, when we know more about the situational context than the output of a single classifier. If you don't have any other information, just multiply a few numbers and be done with it! The opportunity here is to use etcetera abduction to interpret classifier output in knowledge-rich contexts.