
Example 3: Lemon or Lime

August 1, 2018

While grocery stores typically stock green limes in their fresh produce section, a lime will actually turn yellow when fully ripe. Fully ripe limes look very similar to lemons, which turn yellow just as they begin to ripen. People who have both a lemon tree and a lime tree in their backyard need to learn other ways to tell them apart, or else they run the risk of stirring up a lemon margarita. The natural approach is to be an empiricist, and learn to classify based on the conditional probabilities that connect limes and lemons to their features.

Here is a table that summarizes the features of 1000 observations of lemons and limes.

Fruit   Count   Yellow   Not yellow   Round   Not round
Lime      600      550           50     300         300
Lemon     400      350           50      30         370
Total    1000      900          100     330         670

Given a table of this sort, we should be able to make an educated guess about whether we've got a lemon or a lime in our hand, based on whether or not it is yellow and whether or not it is round. Let's use Etcetera Abduction to automatically make the classification for us.

First, we'll establish the prior probabilities of our two class labels, lemon and lime. These numbers are simply the number of fruit in our sample of each type, divided by the total number (1000), encoded as follows:

;; Prior probabilities of class labels
(if (etc0_lime 0.6 x) (lime x)) ;; 600 / 1000
(if (etc0_lemon 0.4 x) (lemon x)) ;; 400 / 1000

Second, we'll calculate all of the relevant conditional probabilities. For each of the four features, we calculate a conditional probability of the feature given a class label. This is simply the fraction of class instances that have the feature.

;; conditional probabilities of the features given the class
(if (and (lime x) (etc1_yellow 0.9166666667 x)) ;; 550 / 600
    (yellow x))
(if (and (lime x) (etc1_not_yellow 0.08333333333 x)) ;; 50 / 600
    (not_yellow x))
(if (and (lemon x) (etc2_yellow 0.875 x)) ;; 350 / 400
    (yellow x))
(if (and (lemon x) (etc2_not_yellow 0.125 x)) ;; 50 / 400
    (not_yellow x))
(if (and (lime x) (etc1_round 0.5 x)) ;; 300 / 600
    (round x))
(if (and (lime x) (etc1_not_round 0.5 x)) ;; 300 / 600
    (not_round x))
(if (and (lemon x) (etc2_round 0.075 x)) ;; 30 / 400
    (round x))
(if (and (lemon x) (etc2_not_round 0.925 x)) ;; 370 / 400
    (not_round x))
      

Third, we'll add the prior probabilities of the features. Yellow is a lot more common than not yellow, and not round is more common than round - at least according to the table above.

;; Prior probabilities of features

(if (etc0_yellow 0.9 x) (yellow x)) ;; 900 / 1000
(if (etc0_not_yellow 0.1 x) (not_yellow x)) ;; 100 / 1000
(if (etc0_round 0.33 x) (round x)) ;; 330 / 1000
(if (etc0_not_round 0.67 x) (not_round x)) ;; 670 / 1000
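
If you want to sanity-check all of the numbers above, here is a small Python sketch of our own (not part of etcabductionpy) that recomputes every prior and conditional probability directly from the table of counts; the dictionary of counts is just a transcription of that table.

# Recompute the priors and conditionals from the table of counts.
# Illustrative sketch only; the axioms above are what etcabductionpy actually uses.

counts = {
    "lime":  {"total": 600, "yellow": 550, "not_yellow": 50, "round": 300, "not_round": 300},
    "lemon": {"total": 400, "yellow": 350, "not_yellow": 50, "round": 30, "not_round": 370},
}
total = sum(c["total"] for c in counts.values())  # 1000

for fruit, c in counts.items():
    print(f"P({fruit}) = {c['total'] / total}")  # prior probability of the class label
    for feature in ("yellow", "not_yellow", "round", "not_round"):
        print(f"P({feature} | {fruit}) = {c[feature] / c['total']:.10f}")  # conditional

for feature in ("yellow", "not_yellow", "round", "not_round"):
    print(f"P({feature}) = {sum(c[feature] for c in counts.values()) / total}")  # feature prior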
	

Finally, we'll need to specify the observables. Let's imagine that we are staring at four different fruit, each with their own unique combination of features, as follows:

;; Observables: Four different fruit

(yellow FRUIT1)
(not_round FRUIT1)

(yellow FRUIT2)
(round FRUIT2)

(not_yellow FRUIT3)
(not_round FRUIT3)

(not_yellow FRUIT4)
(round FRUIT4)

Now let's ask Etcetera Abduction to give us an interpretation, and then graph it out.

$ python -m etcabductionpy -i lemon-lime.lisp -g | ./util/dot2safari
[Proof graph: each of the eight observations is explained directly by a feature prior, with no lemon or lime assumptions at all: (etc0_yellow 0.9) for FRUIT1 and FRUIT2, (etc0_not_yellow 0.1) for FRUIT3 and FRUIT4, (etc0_round 0.33) for FRUIT2 and FRUIT4, and (etc0_not_round 0.67) for FRUIT1 and FRUIT3.]

Uh oh. That doesn't look good. This is not at all what we were hoping for.

According to this graph, the most probable interpretation does not assume that these fruits are lemons or limes. Instead, the most probable interpretation is that these features of roundness and yellowness sometimes just happen in the world, and that it's best to accept this diversity without further explanation. That is, there is no particular reason to assume that these fruits belong to particular lime or lemon categories. Or at least, we didn't express our desire to classify these fruits to the software.

But our axioms look correct. Surely one of the possible interpretations has the classifications that we are looking for. Let's take a look at the second-most probable interpretation, using the -s (--solution) flag.

$ python -m etcabductionpy -i lemon-lime.lisp -s 2 -g | ./util/dot2safari
[Proof graph: FRUIT2 is assumed to be a lime (etc0_lime 0.6), and this assumption explains (round FRUIT2) via etc1_round 0.5 and (yellow FRUIT2) via etc1_yellow 0.9166666667. The remaining six observations are still explained directly by their feature priors: etc0_yellow 0.9 for FRUIT1, etc0_not_yellow 0.1 for FRUIT3 and FRUIT4, etc0_round 0.33 for FRUIT4, and etc0_not_round 0.67 for FRUIT1 and FRUIT3.]

Well, that is a little bit better. In the second-most probable interpretation, one of our fruits (FRUIT2) has been identified as a lime. That is a fine start, but what about the others? If you start graphing out all of the next-most-probable solutions, you'll find that it takes quite a while to find exactly the interpretation that you are looking for. You'll find some solutions where one fruit is classified, others where that same fruit has a different class altogether, others where there are two fruits that are classified, and still others where multiple classes are assigned to the same fruit!

So this really gets at the heart of the matter: interpretation is not the same thing as classification. In classification problems like this, we want exactly one class assigned to each instance - no more, no less. Any solution that doesn't meet this criterion is inadmissible.

At this point, you might be inclined to try to knowledge-engineer your way out of this problem. Can't we just add a few axioms to our knowledge base to constrain the set of possible solutions to only those that are admissible? Good idea, at least in principle. What we wish we could express is something like the following:

;; Anything with these features must be a lemon or a lime
(if (or (round x) (not_round x) (yellow x) (not_yellow x))
    (xor (lemon x)
         (lime x)))

Unfortunately, this isn't going to work. Why? This axiom is not a definite clause in first-order logic, and that is the only kind of axiom that etcabductionpy can handle. The disjunction in the antecedent (the 'or' part) requires some ability to express negation, which isn't supported. An equally hard part is in the consequent, where we want to express an exclusive-or relationship (the 'xor' part): either the x is a lime or a lemon, but not both. This too requires disjunction and negation, neither of which is supported by the current software.

There are other tricks you might try, but none of them are great. You could try removing all of the prior probabilities of the features - leaving only solutions where the features are explained by class labels. This sort-of works, except you are still left with solutions where the fruit are assigned two classes.

What we really want is some way to filter out the bad solutions, leaving only solutions where each fruit is assigned exactly one class label. Unfortunately, the software doesn't provide you with an easy way to do this filtering. However, we could write a short program to do this filtering for us -- we just need to figure out how to retain only admissible solutions.
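
Here is one way such a filter might look, as a short Python sketch of our own (the script name filter_admissible.py and this particular admissibility test are hypothetical, not part of etcabductionpy). It assumes, as in the listing further below, that the -a flag prints one solution per line as a parenthesized list of literals, and it keeps a solution only if every prior-probability (etc0) literal is a class label and every fruit receives exactly one.

# filter_admissible.py -- illustrative sketch, not part of etcabductionpy.
# Reads solutions from stdin, one per line, and prints the line number and text
# of each admissible solution: etc0 literals are used only for class labels,
# and each fruit gets exactly one class label.
import re
import sys

FRUITS = ("FRUIT1", "FRUIT2", "FRUIT3", "FRUIT4")
CLASS_PRIORS = ("etc0_lemon", "etc0_lime")

for number, line in enumerate(sys.stdin, start=1):
    # e.g. the literal (etc0_lime 0.6 FRUIT2) becomes the pair ("etc0_lime", "FRUIT2")
    priors = re.findall(r"\((etc0_\w+) [\d.]+ (\w+)\)", line)
    labels = {}
    admissible = True
    for predicate, fruit in priors:
        if predicate not in CLASS_PRIORS or fruit in labels:
            admissible = False  # a feature prior, or a second class label for this fruit
            break
        labels[fruit] = predicate
    if admissible and sorted(labels) == sorted(FRUITS):
        print(number, line.strip())

You would run it as, for example, python -m etcabductionpy -i lemon-lime.lisp -a | python filter_admissible.py, and the numbers it prints would correspond to the solution ranks used below.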

Alternatively, for this particular problem, there is a very hackish way we can do the filtering with a one-line shell script. Using the awk command, we can filter out all solutions that don't have exactly four occurrences of etc0 (awk's gsub returns the number of substitutions it makes, so substituting etc0 for itself simply counts the matches). By convention, etc0 is used to explain things by their prior probability, both for class labels and for features. As a consequence of our particular knowledge base, solutions containing exactly four of these prior probabilities will be exactly those where the priors are each assigned to class labels for the four fruit, and not to any of the features. Here's a bit of code that finds all solutions, assigns a consecutive number to each solution, and then filters out all of the ones that do not have exactly four prior probabilities.

$ python -m etcabductionpy -i lemon-lime.lisp -a | cat -n | awk 'gsub(/etc0/,"etc0") == 4'
	
   251	((etc0_lemon 0.4 FRUIT1) (etc0_lemon 0.4 FRUIT3) (etc0_lime 0.6 FRUIT2) (etc0_lime 0.6 FRUIT4) (etc1_not_yellow 0.08333333333 FRUIT4) (etc1_round 0.5 FRUIT2) (etc1_round 0.5 FRUIT4) (etc1_yellow 0.9166666667 FRUIT2) (etc2_not_round 0.925 FRUIT1) (etc2_not_round 0.925 FRUIT3) (etc2_not_yellow 0.125 FRUIT3) (etc2_yellow 0.875 FRUIT1))
   392	((etc0_lemon 0.4 FRUIT3) (etc0_lime 0.6 FRUIT1) (etc0_lime 0.6 FRUIT2) (etc0_lime 0.6 FRUIT4) (etc1_not_round 0.5 FRUIT1) (etc1_not_yellow 0.08333333333 FRUIT4) (etc1_round 0.5 FRUIT2) (etc1_round 0.5 FRUIT4) (etc1_yellow 0.9166666667 FRUIT1) (etc1_yellow 0.9166666667 FRUIT2) (etc2_not_round 0.925 FRUIT3) (etc2_not_yellow 0.125 FRUIT3))
   853	((etc0_lemon 0.4 FRUIT1) (etc0_lime 0.6 FRUIT2) (etc0_lime 0.6 FRUIT3) (etc0_lime 0.6 FRUIT4) (etc1_not_round 0.5 FRUIT3) (etc1_not_yellow 0.08333333333 FRUIT3) (etc1_not_yellow 0.08333333333 FRUIT4) (etc1_round 0.5 FRUIT2) (etc1_round 0.5 FRUIT4) (etc1_yellow 0.9166666667 FRUIT2) (etc2_not_round 0.925 FRUIT1) (etc2_yellow 0.875 FRUIT1))
  1067	((etc0_lime 0.6 FRUIT1) (etc0_lime 0.6 FRUIT2) (etc0_lime 0.6 FRUIT3) (etc0_lime 0.6 FRUIT4) (etc1_not_round 0.5 FRUIT1) (etc1_not_round 0.5 FRUIT3) (etc1_not_yellow 0.08333333333 FRUIT3) (etc1_not_yellow 0.08333333333 FRUIT4) (etc1_round 0.5 FRUIT2) (etc1_round 0.5 FRUIT4) (etc1_yellow 0.9166666667 FRUIT1) (etc1_yellow 0.9166666667 FRUIT2))
  2591	((etc0_lemon 0.4 FRUIT1) (etc0_lemon 0.4 FRUIT3) (etc0_lemon 0.4 FRUIT4) (etc0_lime 0.6 FRUIT2) (etc1_round 0.5 FRUIT2) (etc1_yellow 0.9166666667 FRUIT2) (etc2_not_round 0.925 FRUIT1) (etc2_not_round 0.925 FRUIT3) (etc2_not_yellow 0.125 FRUIT3) (etc2_not_yellow 0.125 FRUIT4) (etc2_round 0.075 FRUIT4) (etc2_yellow 0.875 FRUIT1))
  2776	((etc0_lemon 0.4 FRUIT3) (etc0_lemon 0.4 FRUIT4) (etc0_lime 0.6 FRUIT1) (etc0_lime 0.6 FRUIT2) (etc1_not_round 0.5 FRUIT1) (etc1_round 0.5 FRUIT2) (etc1_yellow 0.9166666667 FRUIT1) (etc1_yellow 0.9166666667 FRUIT2) (etc2_not_round 0.925 FRUIT3) (etc2_not_yellow 0.125 FRUIT3) (etc2_not_yellow 0.125 FRUIT4) (etc2_round 0.075 FRUIT4))
  3135	((etc0_lemon 0.4 FRUIT1) (etc0_lemon 0.4 FRUIT2) (etc0_lemon 0.4 FRUIT3) (etc0_lime 0.6 FRUIT4) (etc1_not_yellow 0.08333333333 FRUIT4) (etc1_round 0.5 FRUIT4) (etc2_not_round 0.925 FRUIT1) (etc2_not_round 0.925 FRUIT3) (etc2_not_yellow 0.125 FRUIT3) (etc2_round 0.075 FRUIT2) (etc2_yellow 0.875 FRUIT1) (etc2_yellow 0.875 FRUIT2))
  3361	((etc0_lemon 0.4 FRUIT2) (etc0_lemon 0.4 FRUIT3) (etc0_lime 0.6 FRUIT1) (etc0_lime 0.6 FRUIT4) (etc1_not_round 0.5 FRUIT1) (etc1_not_yellow 0.08333333333 FRUIT4) (etc1_round 0.5 FRUIT4) (etc1_yellow 0.9166666667 FRUIT1) (etc2_not_round 0.925 FRUIT3) (etc2_not_yellow 0.125 FRUIT3) (etc2_round 0.075 FRUIT2) (etc2_yellow 0.875 FRUIT2))
  3362	((etc0_lemon 0.4 FRUIT1) (etc0_lemon 0.4 FRUIT4) (etc0_lime 0.6 FRUIT2) (etc0_lime 0.6 FRUIT3) (etc1_not_round 0.5 FRUIT3) (etc1_not_yellow 0.08333333333 FRUIT3) (etc1_round 0.5 FRUIT2) (etc1_yellow 0.9166666667 FRUIT2) (etc2_not_round 0.925 FRUIT1) (etc2_not_yellow 0.125 FRUIT4) (etc2_round 0.075 FRUIT4) (etc2_yellow 0.875 FRUIT1))
  3600	((etc0_lemon 0.4 FRUIT4) (etc0_lime 0.6 FRUIT1) (etc0_lime 0.6 FRUIT2) (etc0_lime 0.6 FRUIT3) (etc1_not_round 0.5 FRUIT1) (etc1_not_round 0.5 FRUIT3) (etc1_not_yellow 0.08333333333 FRUIT3) (etc1_round 0.5 FRUIT2) (etc1_yellow 0.9166666667 FRUIT1) (etc1_yellow 0.9166666667 FRUIT2) (etc2_not_yellow 0.125 FRUIT4) (etc2_round 0.075 FRUIT4))
  4032	((etc0_lemon 0.4 FRUIT1) (etc0_lemon 0.4 FRUIT2) (etc0_lime 0.6 FRUIT3) (etc0_lime 0.6 FRUIT4) (etc1_not_round 0.5 FRUIT3) (etc1_not_yellow 0.08333333333 FRUIT3) (etc1_not_yellow 0.08333333333 FRUIT4) (etc1_round 0.5 FRUIT4) (etc2_not_round 0.925 FRUIT1) (etc2_round 0.075 FRUIT2) (etc2_yellow 0.875 FRUIT1) (etc2_yellow 0.875 FRUIT2))
  4265	((etc0_lemon 0.4 FRUIT2) (etc0_lime 0.6 FRUIT1) (etc0_lime 0.6 FRUIT3) (etc0_lime 0.6 FRUIT4) (etc1_not_round 0.5 FRUIT1) (etc1_not_round 0.5 FRUIT3) (etc1_not_yellow 0.08333333333 FRUIT3) (etc1_not_yellow 0.08333333333 FRUIT4) (etc1_round 0.5 FRUIT4) (etc1_yellow 0.9166666667 FRUIT1) (etc2_round 0.075 FRUIT2) (etc2_yellow 0.875 FRUIT2))
  5601	((etc0_lemon 0.4 FRUIT1) (etc0_lemon 0.4 FRUIT2) (etc0_lemon 0.4 FRUIT3) (etc0_lemon 0.4 FRUIT4) (etc2_not_round 0.925 FRUIT1) (etc2_not_round 0.925 FRUIT3) (etc2_not_yellow 0.125 FRUIT3) (etc2_not_yellow 0.125 FRUIT4) (etc2_round 0.075 FRUIT2) (etc2_round 0.075 FRUIT4) (etc2_yellow 0.875 FRUIT1) (etc2_yellow 0.875 FRUIT2))
  5715	((etc0_lemon 0.4 FRUIT2) (etc0_lemon 0.4 FRUIT3) (etc0_lemon 0.4 FRUIT4) (etc0_lime 0.6 FRUIT1) (etc1_not_round 0.5 FRUIT1) (etc1_yellow 0.9166666667 FRUIT1) (etc2_not_round 0.925 FRUIT3) (etc2_not_yellow 0.125 FRUIT3) (etc2_not_yellow 0.125 FRUIT4) (etc2_round 0.075 FRUIT2) (etc2_round 0.075 FRUIT4) (etc2_yellow 0.875 FRUIT2))
  5972	((etc0_lemon 0.4 FRUIT1) (etc0_lemon 0.4 FRUIT2) (etc0_lemon 0.4 FRUIT4) (etc0_lime 0.6 FRUIT3) (etc1_not_round 0.5 FRUIT3) (etc1_not_yellow 0.08333333333 FRUIT3) (etc2_not_round 0.925 FRUIT1) (etc2_not_yellow 0.125 FRUIT4) (etc2_round 0.075 FRUIT2) (etc2_round 0.075 FRUIT4) (etc2_yellow 0.875 FRUIT1) (etc2_yellow 0.875 FRUIT2))
  6056	((etc0_lemon 0.4 FRUIT2) (etc0_lemon 0.4 FRUIT4) (etc0_lime 0.6 FRUIT1) (etc0_lime 0.6 FRUIT3) (etc1_not_round 0.5 FRUIT1) (etc1_not_round 0.5 FRUIT3) (etc1_not_yellow 0.08333333333 FRUIT3) (etc1_yellow 0.9166666667 FRUIT1) (etc2_not_yellow 0.125 FRUIT4) (etc2_round 0.075 FRUIT2) (etc2_round 0.075 FRUIT4) (etc2_yellow 0.875 FRUIT2))

Good! We are left with exactly 16 admissible solutions, which is equal to 2^4, or two possible class labels for each of the four fruit. So every possible combination of classifications is represented in the list of all interpretations.

Furthermore, we can see that solution number 251 is the most probable of those that are admissible. Let's see what that particular one looks like.

$ python -m etcabductionpy -i lemon-lime.lisp -a -s 251 -g | ./util/dot2safari
[Proof graph: FRUIT1 and FRUIT3 are assumed to be lemons (etc0_lemon 0.4) and FRUIT2 and FRUIT4 to be limes (etc0_lime 0.6). Each fruit's two observed features are explained by the conditional probabilities given its class: (yellow FRUIT1) via etc2_yellow 0.875 and (not_round FRUIT1) via etc2_not_round 0.925; (not_yellow FRUIT3) via etc2_not_yellow 0.125 and (not_round FRUIT3) via etc2_not_round 0.925; (yellow FRUIT2) via etc1_yellow 0.9166666667 and (round FRUIT2) via etc1_round 0.5; (not_yellow FRUIT4) via etc1_not_yellow 0.08333333333 and (round FRUIT4) via etc1_round 0.5.]

Now that is much better! The final result? The best admissible interpretation of the observables is that we have two lemons (FRUIT1 and FRUIT3) and two limes (FRUIT2 and FRUIT4).

For those of you who have taken an introductory Artificial Intelligence course in your life, some of this may seem strangely familiar. Indeed, we have just used Etcetera Abduction to model a Naive Bayes Classifier. In general, a true Bayesian classifier tries to find the maximum joint probability of the observable features along with some number of possible class labels. In the Naive Bayes variant, the observation of any given feature given a class label is assumed to be probabilistically independent from that of any other feature. This is a wildly inappropriate assumption in many domains, even when talking about the yellowness and roundness of fruit. Still, Naive Bayes classification tends to perform quite well in practice for many problems, and is a standard baseline for comparison when considering fancier machine learning approaches.

The important thing to note is that we're not doing something merely analogous to Naive Bayes; we are computing our answer in exactly the same way as it is done in every Naive Bayes implementation, at least mathematically speaking. Naive Bayes algorithms typically iterate through each of the possible class labels, taking the product of the prior probability of the class and all of the conditional probabilities of the observable features given the class. For each of the four fruit in the interpretation above, this is exactly what is computed by multiplying the numbers in the three etcetera assumptions above each pair of observations (three assumptions per fruit): one is the prior probability of the class (lemon or lime), and the other two are the conditional probabilities of the observed features given the class. If you multiply these three numbers, you'll get exactly the same number computed in implementations of Naive Bayes for a candidate class. If in addition we knew the true joint probability of the observables (the denominator that Naive Bayes ignores), we could divide our product by this joint probability to compute the (naive) probability of the class. In classification tasks, this denominator is ignored because it is typically unknown and is constant across possible class labels. Accordingly, Naive Bayes implementations just look for the maximum product when making their class label predictions. Etcetera Abduction ranks all possible interpretations by probability, so the interpretation with the most likely class labels (for all four fruit at once!) shows up as the top result among the admissible solutions.
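
To make that arithmetic concrete, here is a small Python sketch of our own that computes the per-fruit products directly, using the probabilities from the axioms above. The winning class for each fruit matches the best admissible interpretation (solution 251) found above.

# Naive-Bayes-style products for the four fruit, using the numbers from the axioms.
# Illustrative sketch; etcabductionpy arrives at the same products by ranking interpretations.

prior = {"lemon": 0.4, "lime": 0.6}
conditional = {  # P(feature | class)
    "lemon": {"yellow": 0.875, "not_yellow": 0.125, "round": 0.075, "not_round": 0.925},
    "lime": {"yellow": 550 / 600, "not_yellow": 50 / 600, "round": 0.5, "not_round": 0.5},
}
observations = {
    "FRUIT1": ("yellow", "not_round"),
    "FRUIT2": ("yellow", "round"),
    "FRUIT3": ("not_yellow", "not_round"),
    "FRUIT4": ("not_yellow", "round"),
}

for fruit, (f1, f2) in observations.items():
    products = {c: prior[c] * conditional[c][f1] * conditional[c][f2] for c in prior}
    print(fruit, products, "->", max(products, key=products.get))
# Prints lemon for FRUIT1 and FRUIT3, and lime for FRUIT2 and FRUIT4,
# just as in solution 251.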

So, is Etcetera Abduction just a complicated way of doing Naive Bayes classification? No, but you can certainly use Etcetera Abduction in this manner if you so choose. But you shouldn't. It would be wildly inefficient to do so, as you would be exploring many, many, many solutions in the search space that are not actually admissible for your particular needs. If you are really only interested in classification, then you are much better off using a traditional classification algorithm.