Common mistakes while working with data:
- Cherry-picking: Selecting the results that fit your hypothesis and rejecting others
- Survivorship bias: Drawing conclusions from an incomplete set of data, because that data has ‘survived’ some selection criteria
- Sampling bias: Drawing conclusions about a population from a sample that is not representative of it
- Ignoring Simpson's paradox: A trend that appears within each subset of the data can reverse when the whole data is considered
- Confusing correlation with causation (false causality: assuming that because two events appear related, one must have caused the other)
- Not asking the right questions: Consider ‘why’ instead of ‘what’ while doing analysis
- Not defining the problem before working toward the goal (gathering the requirements and considering the constraints)
- Not cleaning and normalizing the data prior to analysis (including skipping outlier tests)
- Focusing too much on accuracy: The classic example is taking skewed data and basing your decision on accuracy alone, which will be high simply because one outcome dominates the data.
- Inadequate domain or technical knowledge
- Using the wrong visualization techniques
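
To make the Simpson's paradox item concrete, here is a minimal sketch in Python. The counts, the subgroup names `easy` and `hard`, and the option names `A` and `B` are all hypothetical, invented only for illustration. It compares success rates within each subgroup and then on the combined data.

```python
# Hypothetical counts, invented for illustration: (successes, trials)
# for two options, A and B, split into two subgroups.
counts = {
    "easy": {"A": (90, 100), "B": (800, 1000)},
    "hard": {"A": (300, 1000), "B": (20, 100)},
}


def rate(successes, trials):
    """Success rate as a fraction between 0 and 1."""
    return successes / trials


def overall_rate(option):
    """Success rate for one option after pooling all subgroups."""
    successes = sum(counts[g][option][0] for g in counts)
    trials = sum(counts[g][option][1] for g in counts)
    return rate(successes, trials)


# Per-subgroup comparison.
for group, by_option in counts.items():
    a = rate(*by_option["A"])
    b = rate(*by_option["B"])
    print(f"{group}: A={a:.2f} B={b:.2f}")

# Pooled comparison over the whole data.
print(f"overall: A={overall_rate('A'):.2f} B={overall_rate('B'):.2f}")
```

With these made-up numbers, A has the higher rate in both subgroups, yet B has the higher rate overall, because A was mostly tried on the hard cases. Looking only at the pooled figure, or only at the subgroups, would give opposite answers.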
By: Rohit Benny Abraham
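
The point about accuracy on skewed data can be sketched as follows. This assumes the skewed data are binary class labels and that the decision being judged is a classifier's predictions; the labels and the helper names `accuracy` and `recall` are hypothetical and written from scratch for the example.

```python
# Hypothetical skewed labels: 95 negatives (0) and 5 positives (1).
y_true = [0] * 95 + [1] * 5

# A "model" that ignores its input and always predicts the majority class.
y_pred = [0] * len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that were predicted as positive."""
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return true_pos / actual_pos if actual_pos else 0.0


print(f"accuracy={accuracy(y_true, y_pred):.2f}")
print(f"recall={recall(y_true, y_pred):.2f}")
```

Under these assumptions the always-negative predictor scores 95% accuracy while finding none of the positive cases, which is why a second metric such as recall is worth checking whenever the data is skewed.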