To drive the point home, lets straightway get started with the below hypothetical dataset of smoker data across three Indian cities:
First, let’s convert it to a contingency table:
City | non-smoker | smoker | total |
delhi | 6 | 5 | 11 |
kolkata | 3 | 6 | 9 |
mumbai | 7 | 7 | 14 |
total | 16 | 18 | 34 |
Now, Joint probability of delhi AND non-smoker = P(delhi ∩non-smoker) = 6/34= 0.18
Similarly, for all the other combinations joint probabilities can be calculated as:
City | non-smoker | smoker | total |
delhi | 0.18 | 0.15 | 0.32 |
kolkata | 0.09 | 0.18 | 0.26 |
mumbai | 0.21 | 0.21 | 0.41 |
total | 0.47 | 0.53 | 1.0 |
Marginal probabilities are the probabilities lies in the margin of the above table. and the meaning is , the marginal probability of person randomly selected will be from delhi is 0.32 .
Conditional probability that a randomly selected non-smoker person is from delhi =
P(delhi | non-smoker) = 6/16=0.38
Similarly, for the other combinations the conditional probabilities could be calculated as:
index | City | non-smoker | smoker |
0 | delhi | 0.38 | 0.28 |
1 | kolkata | 0.19 | 0.34 |
2 | mumbai | 0.45 | 0.4 |
The concept of conditional /marginal / joint probability is important to test dependency of the variables, how? lets keep it for some other day.
Hi, I am Shibashis, a blogger by passion and an engineer by profession. I have written most of the articles for mechGuru.com. For more than a decades i am closely associated with the engineering design/manufacturing simulation technologies. I am a self taught code hobbyist, presently in love with Python (Open CV / ML / Data Science /AWS -3000+ lines, 400+ hrs. )