What are my chances of being pregnant if the over-the-counter pregnancy test turns out to be positive? What are my chances of getting cancer if I smoke? Or what are my chances of having cancer if my mammogram is negative? Bayes rule can be used to answer all of these questions.
Brush up on Conditional Probability
Conditional probability is the probability of an event occurring, given that one or more other events have already occurred. If two events are independent of each other, then P(B|A) = P(B). In general (whenever P(A) > 0), P(B|A) is defined as below.
P(B|A) = P(A intersect B)/P(A)
NOTE: P(A and B) is the same as P(A intersect B).
Example: Out of 1000 people, Democratic Male = 200; Democratic Female = 300; Republican Male = 300 and Republican Female = 200.
A = being a Democrat and B = being a woman
P(A and B) = 300/1000 = 0.3 = 30%
P(A) = (200 + 300)/1000 = 0.5, so P(B|A) = P(A and B) / P(A) = 0.3/0.5 = 0.6 = 60%
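To make the arithmetic concrete, here is a minimal Python sketch that computes the same conditional probability directly from the counts. The dictionary layout and variable names are illustrative choices, not part of the original example. It also compares P(B|A) with P(B) = (300 + 200)/1000 = 0.5: since they differ, being a woman and being a Democrat are not independent in this data.

```python
import math

# Counts from the example: 1,000 people split by party and sex
counts = {
    ("Democrat", "Male"): 200,
    ("Democrat", "Female"): 300,
    ("Republican", "Male"): 300,
    ("Republican", "Female"): 200,
}
total = sum(counts.values())

# P(A and B): being a Democrat AND a woman
p_a_and_b = counts[("Democrat", "Female")] / total
# P(A): being a Democrat, summed over both sexes
p_a = sum(n for (party, _), n in counts.items() if party == "Democrat") / total
# P(B): being a woman, summed over both parties
p_b = sum(n for (_, sex), n in counts.items() if sex == "Female") / total

# Conditional probability: P(B|A) = P(A and B) / P(A)
p_b_given_a = p_a_and_b / p_a

print(p_a_and_b, p_a, p_b_given_a)      # 0.3 0.5 0.6
print(math.isclose(p_b_given_a, p_b))   # False, so A and B are dependent
```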
Bayes Rule
Bayes rule updates the probability of an event when a new piece of evidence arrives.
For example, in 2011, there were 98 pregnancies for every 1,000 women (9.8%) aged 15–44 in the United States. Over-the-counter pregnancy tests correctly return a positive result for 88% of pregnant women, and a negative result for 95% of women who are not pregnant. Given that a test is positive, what are my chances of being pregnant?
Terminology
Prior probability / Base rate: P(Preg=T) - the proportion of pregnant women = 9.8%
Posterior probability: P(Preg=T|Test=Pos) - given that a pregnancy test is positive, what is the probability of being pregnant?
Likelihood / Sensitivity: P(Test=Pos|Preg=T) - given that a woman is pregnant, what is the probability of the test being positive?
Evidence / Marginal likelihood: P(Test=Pos) - the total probability of observing the evidence (i.e., the probability of a positive test)
Specificity: P(Test=Neg|Preg=F) - given that a woman is not pregnant, what is the probability of the test being negative?
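To tie these terms to something countable, the sketch below assumes a hypothetical cohort of 100,000 women whose counts are derived from the rates in the example (9.8% pregnant, 88% of pregnant women testing positive, 95% of non-pregnant women testing negative) and recovers each quantity from the counts. The cohort size and variable names are assumptions made purely for illustration.

```python
# Hypothetical cohort of 100,000 women; counts derived from the stated rates
true_pos = 8_624    # pregnant, test positive      (0.88 * 9,800)
false_neg = 1_176   # pregnant, test negative      (9,800 - 8,624)
true_neg = 85_690   # not pregnant, test negative  (0.95 * 90,200)
false_pos = 4_510   # not pregnant, test positive  (90,200 - 85,690)

total = true_pos + false_neg + true_neg + false_pos

prior = (true_pos + false_neg) / total            # P(Preg=T)
sensitivity = true_pos / (true_pos + false_neg)   # P(Test=Pos | Preg=T)
specificity = true_neg / (true_neg + false_pos)   # P(Test=Neg | Preg=F)
evidence = (true_pos + false_pos) / total         # P(Test=Pos)
posterior = true_pos / (true_pos + false_pos)     # P(Preg=T | Test=Pos)

print(f"prior={prior:.3f} sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
print(f"evidence={evidence:.5f} posterior={posterior:.4f}")
# prior=0.098 sensitivity=0.88 specificity=0.95
# evidence=0.13134 posterior=0.6566
```

Reading the posterior off the counts (8,624 true positives out of 13,134 positives) gives the same answer that Bayes rule produces from the rates alone.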
Notation: 'Pr' = pregnant, 'not Pr' = not pregnant, 'Pos' = test is positive, 'Neg' = test is negative.
P(Pos | Pr) = P(Pos and Pr)/ P(Pr)
P(Pr| Pos) = P(Pr and Pos)/ P(Pos)
But P(Pos and Pr) = P(Pr and Pos), and rearranging the first equation gives P(Pos and Pr) = P(Pos | Pr) * P(Pr).
Therefore, P(Pr|Pos) = P(Pos | Pr) * P(Pr) / P(Pos)
When the denominator P(Pos) is not available, we can calculate it with the law of total probability:
P(Pos) = P( Pos |Pr) P(Pr) + P( Pos | not Pr) P(not Pr)
where P( Pos |not Pr) = 1 - P(Neg|not Pr)
```python
# Function to calculate Bayes Rule in Python
def calcProbBayesRule(prob_prior, prob_sensitivity, prob_evidence=None, prob_specificity=None):
    """Return the posterior probability as a percentage string, e.g. '7.76%'.

    If prob_evidence is not given, it is derived from prob_specificity
    with the law of total probability.
    """
    if prob_evidence is None:
        if prob_specificity is None:
            raise ValueError('prob_specificity cannot be None when prob_evidence is None')
        prob_not_prior = 1 - prob_prior
        # P(evidence) = P(evidence | prior) P(prior) + P(evidence | not prior) P(not prior)
        prob_evidence = (prob_sensitivity * prob_prior
                         + (1 - prob_specificity) * prob_not_prior)
    # Bayes rule: P(prior | evidence) = P(evidence | prior) * P(prior) / P(evidence)
    prob_posterior = prob_sensitivity * prob_prior / prob_evidence
    return str(round(prob_posterior * 100, 2)) + '%'
```
Example 1: When the denominator is known
Cancer and smoking: 5% of the population has cancer and 10% of the population are smokers. Also, 20% of the people with cancer are smokers. Given that a person is a smoker, what is the probability that he/she has cancer?
P(C) = 0.05; P(S) = 0.1
P(S|C) = 0.2
P(C|S) = P(S|C) * P(C) / P(S) = 0.2 * 0.05 / 0.1 = 0.1 (10%)
```python
# Calculation in Python
prob_cancer_given_smoking = calcProbBayesRule(0.05, 0.2, prob_evidence=0.1)
print(prob_cancer_given_smoking)  # Output: 10.0%
```
Example 2: When the denominator is unknown
Breast cancer and mammograms: 1% of women have breast cancer. 80% of mammograms detect breast cancer when it is there (and therefore 20% miss it). 9.6% of mammograms detect breast cancer when it’s not there (and therefore 90.4% correctly return a negative result).
P(C) = 0.01; P(not C) = 0.99
P(Test=T|C) = 0.8; P(Test=F|C) = 1 - P(Test=T|C) = 0.2
P(Test=T|not C) = 0.096; P(Test=F|not C) = 0.904
a) For a woman whose mammogram returns positive, what is the probability that she has breast cancer?
P(C|Test = T) = P(Test = T|C) * P(C)/ P(Test=T)
Since P(Test=T) is not given, it is derived with the law of total probability:
P(Test = T) = P(Test=T|C) P(C) + P(Test=T|not C) P(not C)
P(Test=T) = (0.8 * 0.01) + (0.096 * 0.99) = 0.10304 ≈ 0.103
P(C|Test=T) = 0.8 * 0.01 / 0.10304 ≈ 0.0776 (7.76%)
```python
# Calculation in Python
prob_cancer_given_test_positive = calcProbBayesRule(0.01, 0.8, prob_specificity=0.904)
print(prob_cancer_given_test_positive)  # Output: 7.76%
```
Therefore, for a woman whose mammogram returns positive, there is only about an 8% chance (7.76%) that she has breast cancer.
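As a sanity check on part a), here is a small Monte Carlo sketch that simulates screening a large group of women with the stated rates (1% prevalence, 80% detection, 9.6% false positives) and counts how many positive results belong to women who actually have cancer. The helper name, the sample size, and the fixed seed are illustrative assumptions; the estimate should land close to the 7.76% from Bayes rule, not match it exactly.

```python
import random


def simulate_posterior_given_positive(prior, sensitivity, false_pos_rate, n, seed=0):
    """Hypothetical helper: estimate P(C | Test=T) by simulating n women."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    positives = 0
    cancer_and_positive = 0
    for _ in range(n):
        has_cancer = rng.random() < prior
        # Positive with probability `sensitivity` if cancer, else `false_pos_rate`
        p_positive = sensitivity if has_cancer else false_pos_rate
        if rng.random() < p_positive:
            positives += 1
            if has_cancer:
                cancer_and_positive += 1
    return cancer_and_positive / positives


estimate = simulate_posterior_given_positive(0.01, 0.8, 0.096, n=500_000)
print(f"{estimate:.2%}")
```

Simulation is slower and noisier than the closed-form calculation, but it makes the base-rate effect visible: most positives come from the much larger group of women without cancer.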
b) For a woman whose mammogram returns negative, what is the probability that she has breast cancer?
P(C| Test=F) = P(Test = F|C) * P(C)/ P(Test = F)
P(Test=F) = 1 - P(Test=T) ≈ 1 - 0.103 = 0.897
P(C|Test=F) = 0.2 * 0.01 / 0.897 ≈ 0.0022 (0.22%)
```python
# Calculation in Python
prob_cancer_given_test_negative = calcProbBayesRule(0.01, 0.2, prob_evidence=0.897)
print(prob_cancer_given_test_negative)  # Output: 0.22%
```
Therefore, for a woman whose mammogram returns negative, there is only a 0.22% probability that she has breast cancer.
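Part b) rounds P(Test=F) before dividing. A minimal sketch that keeps full precision computes P(Test=F) from the law of total probability instead of from 1 - P(Test=T). The function name posterior_given_negative is hypothetical and not part of the original code.

```python
def posterior_given_negative(prior, sensitivity, specificity):
    """Hypothetical helper: P(C | Test=F) via Bayes rule.

    P(Test=F) comes from the law of total probability, so no
    intermediate rounding is needed.
    """
    p_neg_given_c = 1 - sensitivity      # P(Test=F | C): the test misses the cancer
    p_neg_given_not_c = specificity      # P(Test=F | not C): a correct negative
    p_neg = p_neg_given_c * prior + p_neg_given_not_c * (1 - prior)  # P(Test=F)
    return p_neg_given_c * prior / p_neg


p = posterior_given_negative(prior=0.01, sensitivity=0.8, specificity=0.904)
print(f"{p:.2%}")  # 0.22%
```

Here P(Test=F) works out to 0.89696, so the rounded value 0.897 used above does not change the answer at two decimal places.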
Example 3: Pregnancy and over-the-counter tests
Returning to the pregnancy example introduced above:
P(Pr) = 0.098; P(not Pr) = 0.902
P(Pos | Pr) = 0.88 (Sensitivity = 88%)
P(Neg | not Pr) = 0.95 (Specificity = 95%)
P(Pos | not Pr) = 1 - P(Neg | not Pr) = 0.05
P(Pos) = (0.098 * 0.88) + (0.05 * 0.902) = 0.13134
P(Pr | Pos) = P(Pos | Pr) * P(Pr) / P(Pos) = 0.88 * 0.098 / 0.13134 ≈ 0.6566 ≈ 66%
```python
# Calculation in Python
prob_preg_given_test_pos = calcProbBayesRule(0.098, 0.88, prob_specificity=0.95)
print(prob_preg_given_test_pos)  # Output: 65.66%
```
Therefore, given the US pregnancy rate in 2011, if the over-the-counter test turns out positive, there is still only about a 66% chance of being pregnant, whether you like it or not!!!
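The 66% figure depends heavily on the 2011 base rate. A short sketch that re-evaluates the same test (88% sensitivity, 95% specificity) at a few other priors makes this visible. The priors 0.25 and 0.5 are hypothetical values chosen only for illustration, and posterior_given_positive is a hypothetical helper that returns a float rather than a string.

```python
def posterior_given_positive(prior, sensitivity, specificity):
    """Hypothetical helper: P(Pr | Pos), with P(Pos) from the law of total probability."""
    evidence = sensitivity * prior + (1 - specificity) * (1 - prior)
    return sensitivity * prior / evidence


# Same test, different (partly hypothetical) base rates
for prior in (0.098, 0.25, 0.5):
    print(f"prior={prior:.3f} -> posterior={posterior_given_positive(prior, 0.88, 0.95):.2%}")
# prior=0.098 -> posterior=65.66%
# prior=0.250 -> posterior=85.44%
# prior=0.500 -> posterior=94.62%
```

The same test looks far more trustworthy when pregnancy is already likely, which is why the base rate has to enter the calculation.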