On May 6, at the end of the lecture, I talked about how to misunderstand probability: space aliens, coincidences (no notes or video for this), and the granin story from Lecture notes 17.

The take-home message: When you want to assess the statistical significance of an observation (say, the alignment value between two strings), it is wrong to just compute the probability that the particular observation was produced by chance (under some random model - for example, an alignment between two randomly generated strings with the same character frequencies as the two given strings), and then reject the null hypothesis (that the observation was produced by chance, by some random process) if the probability of the observed value is "low" (where "low" is a user-defined threshold). It is wrong to do that, but it is very commonly done. Instead, one has to compute the total probability of all the possible outcomes whose individual probabilities are less than or equal to the probability of the observation. That total quantity, generally called a p-value, is what must be low in order to reject the null hypothesis. Alternatively, one can compute the total probability of all the possible outcomes with values higher than or equal to the observed value (if a higher value suggests more biological association).

Here is an extreme example of this error: Suppose we flip a fair coin 100 times and record the specific ordered sequence of heads and tails. The probability of that specific sequence is (1/2)**100, which is an extremely small number. But can we conclude, from the fact that the probability is so small, that the sequence was not produced by a random process, i.e., that it was produced by some non-random force or influence (space aliens or a strange distortion of local gravity)?
We cannot, once we realize that *every* specific sequence of 100 flips has that same tiny probability, so that the sum of the probabilities of all the outcomes (specific ordered sequences of heads and tails) whose individual probability is equal to or smaller than the probability of the observed sequence is 1! That is, we are guaranteed in this experiment (coin flipping) to produce a specific observation that has an extremely low probability. So when such a low-probability event is observed, we cannot use the fact that it has a low probability to conclude that some force of nature was involved, i.e., to reject the null hypothesis.

Mistakes of this kind in using probability are common, and the granin story (although a bit more involved) is one example. But it could be worse. Those mistakes at least take place in the context of experiments where one can enumerate all the possible outcomes and has a sensible random model, so a precise and meaningful null hypothesis can be stated. Then, in principle, the correct kind of p-value calculation can be done, even if it is complex or not actually practical.

A worse but related mistake is to say that an event has very low probability, and therefore conclude that it must be due to a non-random force, in a context where we don't know the set of possible events or the probabilities of each event. In those cases, it is meaningless to use this hypothesis testing approach - it gets us nowhere.

Question: What is the probability of a duck? What is the probability of life? What can it possibly mean to say that the probability of life is low?
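
Returning to the coin-flip example, where a random model is available, the difference between the probability of one observation and a p-value can be checked with a short Python sketch. This is only an illustration, not something from the lecture: the function names are made up here, and using the number of heads as the "value" of an outcome is an assumption chosen to illustrate the alternative (higher than or equal to the observed value) form of the calculation.

```python
from fractions import Fraction
from math import comb


def sequence_probability(n):
    """Probability of one specific ordered sequence of n fair coin flips."""
    return Fraction(1, 2 ** n)


def p_value_over_sequences(n):
    """Total probability of all sequences whose probability is less than
    or equal to that of the observed sequence. Every one of the 2**n
    sequences has the same probability, so all of them are counted."""
    return (2 ** n) * sequence_probability(n)


def p_value_heads_at_least(n, k):
    """Probability of seeing k or more heads in n fair flips.
    (Assumption for illustration: the number of heads is the statistic.)"""
    favorable = sum(comb(n, j) for j in range(k, n + 1))
    return Fraction(favorable, 2 ** n)


n = 100
# Tiny probability of the one sequence we happened to observe.
print(float(sequence_probability(n)))
# The p-value over specific sequences is exactly 1: nothing to reject.
print(p_value_over_sequences(n))
# With a statistic that orders the outcomes, the tail probability is informative.
print(float(p_value_heads_at_least(n, 50)))
print(float(p_value_heads_at_least(n, 70)))
```

The first number is tiny, yet the p-value computed over specific sequences is exactly 1, so the low probability of the observed sequence says nothing against the null hypothesis. Only when outcomes are ordered by some value (here, the hypothetical choice of the number of heads) does the tail probability separate an unremarkable result such as 50 heads from a surprising one such as 70 heads.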