Actually, I’d like to take the opportunity presented here in discussing “data science” in Dr. Cliff’s video to plug Dr. Tom Carpenter of Seattle Pacific University. He taught one of the online courses in Microsoft’s 10-course/project series in Machine Learning and Artificial Intelligence on edX.org (no longer available, though). However, he has a complete online course for free on YouTube, which looks very similar to the Microsoft course in content (the one video I looked at is identical to the same topic in the Microsoft course).
Tom Carpenter’s Data Science Research Methods Course [Full Course] - YouTube
What he did in his course was great. He’s an excellent, funny, and reasonably entertaining lecturer. But he didn’t immerse the students in math and data. Instead, his mission was more, “Don’t be fooled by numbers! Think about the numbers and how they were obtained. Think about experimental design! Could the numbers be misleading you?!” And in the Microsoft course he gave lots of often humorous examples of how you can be led astray by numbers you want to believe in!
For example, he discussed and illustrated how wrong you could be from just surveying customers who walk in the door, say, at Walmart. You’re more likely to get responses from folks who are very happy or very angry, leaving out the folks in the middle who think, “Meh, I’m okay, but I just don’t want to be bothered by a survey today.”
Relative to Dr. Cliff’s A/B testing, he emphasized that if you really want to do it right, you need two cohorts: those who test A first and then B, and the reverse, those who test B first and are then offered A to test. The reason is that testing A and B in a certain order may influence the outcome. He also discusses why you would do that as opposed to having one group test A, another group test B, and then just having each group rate its product independently for its features.
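To make the two-cohort idea concrete, here is a minimal sketch of how one might split testers into an "A first" group and a "B first" group at random. The function name, the tester labels, and the fixed seed are all made up for illustration; this is not taken from Carpenter’s course or from Dr. Cliff’s video.

```python
import random

def assign_counterbalanced(participants, seed=0):
    """Randomly split participants into two order groups.

    Half test A then B, the other half test B then A. With an odd
    number of participants, the second group gets the extra person.
    The seed is fixed only so the illustration is repeatable.
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "A_then_B": shuffled[:half],
        "B_then_A": shuffled[half:],
    }

# Hypothetical tester labels, purely for illustration.
testers = [f"tester_{i}" for i in range(1, 9)]
groups = assign_counterbalanced(testers)
for order, members in groups.items():
    print(order, members)
```

If the ratings for a product differ depending on whether it was tried first or second, that difference is an order effect rather than a real preference, and having both cohorts is what lets you see it.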
The lecture that really made me respect his teaching ability is the one on false positives and false negatives (it probably helps to digest the earlier material on statistical power first). The last part of the lecture, which illustrates which branch of an outcome tree you’re on when you get a false positive or a false negative, was the most illuminating.
Data Science Research Methods | False Positives and False Negatives - YouTube
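For anyone who wants to play with the outcome-tree idea, below is a rough sketch of the standard four-branch tree (true positive, false negative, false positive, true negative). It is not a reproduction of the lecture; the function names and the input numbers (a 10% chance that the effect is real, a 5% false-positive rate) are assumptions chosen only for illustration.

```python
def outcome_tree(prior_real, alpha, power):
    """Probability of landing on each branch of the outcome tree.

    prior_real: assumed chance that a real effect exists
    alpha:      false-positive rate when there is no real effect
    power:      chance of detecting the effect when it is real
    """
    return {
        "true_positive": prior_real * power,
        "false_negative": prior_real * (1 - power),
        "false_positive": (1 - prior_real) * alpha,
        "true_negative": (1 - prior_real) * (1 - alpha),
    }

def false_share_of_positives(prior_real, alpha, power):
    """Of all the 'we found something!' results, what fraction are false?"""
    tree = outcome_tree(prior_real, alpha, power)
    positives = tree["true_positive"] + tree["false_positive"]
    return tree["false_positive"] / positives

# Made-up inputs: same prior and alpha, high power vs. low power.
for power in (0.8, 0.2):
    share = false_share_of_positives(prior_real=0.1, alpha=0.05, power=power)
    print(f"power={power:.1f}: {share:.0%} of positive results are false")
```

With these made-up inputs, about 36% of the positive results are false at 80% power, and about 69% are false at 20% power. That is the practical reason low power hurts: the false positives make up a bigger share of whatever you "find."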
The bottom line is that Dr. Cliff’s experiment with only 4 participants probably doesn’t have enough statistical power to avoid being fooled by a false positive. (Power is related to how the likelihood of being fooled by random variation decreases as the number of subjects increases.) For example, Dr. Cliff may have happened to pick subjects who, from their makeup and past experience, prefer the premium, whereas if he did the same experiment 20 times over, the average result would be different.
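A quick back-of-the-envelope check shows how easily 4 subjects can fool you. The sketch below assumes a simplified setup that may not match the actual experiment: each subject picks one of two options, and in truth nobody has any preference (a 50/50 coin flip each). The function name is made up for illustration.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """Chance that at least k of n subjects pick the premium purely by luck,
    when each one independently picks it with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# With 4 subjects and no real preference at all:
print(prob_at_least(4, 4))    # all four pick the premium
print(prob_at_least(3, 4))    # at least three of four pick the premium

# The same 75% split with ten times as many subjects:
print(prob_at_least(30, 40))  # at least thirty of forty pick the premium
```

Under that coin-flip assumption, all four picking the premium happens 1 time in 16 (6.25%), and at least three of four happens 5 times in 16 (31.25%). The same 75% split among 40 subjects would happen by luck well under 1% of the time, which is the sense in which more subjects buy you protection from random variation.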
Carpenter’s message is: be sure you have a good experiment design that fairly addresses the question you’re trying to answer, and then be sure you have sufficient statistical power (subject numbers) to say it’s unlikely that the responses you got could be explained by random variation. An interesting thing he teaches is that the more black and white two choices are, with no variation in outcome, the fewer subjects you need to claim you found a difference; but when the perceived difference between A and B is much grayer, with overlapping response variation, you need larger numbers of subjects to claim a statistically valid result.
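The black-and-white versus gray point can be put into rough numbers with the textbook normal-approximation formula for comparing two group averages, n per group ≈ 2(z_alpha + z_power)² / d², where d is the difference between A and B measured in units of the response spread. This is a sketch under that approximation, not something taken from the course; the function name and the example effect sizes are assumptions for illustration.

```python
from math import ceil
from statistics import NormalDist

def subjects_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate subjects needed in each group (two-sided test).

    effect_size: difference between A and B divided by the spread
                 (standard deviation) of the responses.
    Uses the normal approximation, so it runs a little low when the
    answer is only a handful of subjects.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # cutoff for the false-positive rate
    z_power = z.inv_cdf(power)          # cutoff for the desired power
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size**2)

# Illustrative effect sizes: black-and-white, moderate, and gray.
for d in (2.0, 0.5, 0.2):
    print(f"effect size {d}: about {subjects_per_group(d)} subjects per group")
```

With the default 5% false-positive rate and 80% power, a stark difference (d = 2.0) needs only about 4 subjects per group, a moderate one (d = 0.5) about 63, and a gray one (d = 0.2) about 393. Halving the difference roughly quadruples the subjects you need.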
He also discusses at length correlation vs. causation and how one had better not fall into the trap of blindly turning a correlation into a causation - giving examples of how messed up you can get going that route.
Edit_Update: Watched the whole 1st video of Carpenter’s YouTube course and a big Microsoft logo flashed on the screen at the end of the video - so the course lectures are straight out of a Microsoft course that once cost $100 to take on edX.org (the edX version also had course lab projects and quizzes, though). Great course for teaching that one shouldn’t just see numbers and jump to a conclusion. I see that I gave his edX course a plug two years ago, even back then citing the false positives, false negatives lecture: What a Joke - Aspirin