Common mistakes while working with data

  1. Cherry-picking: Selecting only the results that support your hypothesis and discarding the rest
  2. Survivorship bias: Drawing conclusions from an incomplete set of data because that data has ‘survived’ some selection criterion
  3. Sampling bias: Drawing conclusions about a population from a sample that does not represent it
  4. Ignoring Simpson’s paradox: A trend that appears in subsets of the data disappears or reverses when the data is considered as a whole
  5. Confusing correlation with causation (false causality): Assuming that because two events appear related, one must have caused the other
  6. Not asking the right questions
  7. Focusing on ‘what’ happened instead of asking ‘why’ it happened during analysis
  8. Not defining the problem before working toward the goal (gathering the requirements and identifying the constraints)
  9. Not cleaning and normalizing the data before analysis, including skipping checks for outliers
  10. Relying too heavily on accuracy: With skewed (imbalanced) data, a model that always predicts the majority class still scores a high accuracy, so decisions based on accuracy alone are misleading
  11. Inadequate domain or technical knowledge
  12. Using the wrong visualization techniques for the data
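Survivorship bias (item 2) can be seen in a small simulation. This is a minimal Python sketch using simulated, hypothetical fund returns and an assumed selection rule (funds that lost more than 5% are closed and vanish from the data); it is an illustration, not real market data.

```python
import random
import statistics

random.seed(42)

# Hypothetical annual returns for 1,000 funds (simulated, not real data).
all_returns = [random.gauss(0.0, 0.10) for _ in range(1000)]

# Assumed selection rule: funds that lost more than 5% were closed and
# dropped out of the database, so only the "survivors" remain visible.
survivors = [r for r in all_returns if r >= -0.05]

print(f"mean of all funds:      {statistics.mean(all_returns):+.3f}")
print(f"mean of survivors only: {statistics.mean(survivors):+.3f}")
```

Because the worst performers are filtered out before anyone looks, the survivors' average always overstates the true average.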
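Simpson’s paradox (item 4) is easiest to see with concrete counts. The sketch below uses hypothetical (successes, trials) counts for two treatments, X and Y, split by an assumed severity subgroup: X has the higher success rate in every subgroup, yet Y looks better once the subgroups are pooled, because X was mostly given to severe cases.

```python
# Hypothetical (successes, trials) per treatment and severity subgroup.
data = {
    "X": {"mild": (18, 20), "severe": (40, 80)},
    "Y": {"mild": (70, 80), "severe": (9, 20)},
}

def rate(successes, trials):
    """Success rate for one cell of the table."""
    return successes / trials

def overall(treatment):
    """Success rate after pooling all subgroups for one treatment."""
    cells = data[treatment].values()
    return sum(s for s, _ in cells) / sum(n for _, n in cells)

for group in ("mild", "severe"):
    print(f"{group}: X={rate(*data['X'][group]):.3f} Y={rate(*data['Y'][group]):.3f}")

print(f"overall: X={overall('X'):.3f} Y={overall('Y'):.3f}")
```

This prints X ahead in both subgroups (0.900 vs 0.875, 0.500 vs 0.450) but behind overall (0.580 vs 0.790), so the subgroup view and the pooled view lead to opposite decisions.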
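One common source of false causality (item 5) is a hidden confounder that drives two variables at once. This minimal sketch simulates a hypothetical confounder `z` that feeds both `x` and `y`, while `x` has no effect on `y`; the variable names and noise levels are assumptions for illustration only.

```python
import random

random.seed(0)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5

# Hypothetical confounder z (e.g. daily temperature) drives both x and y;
# in this simulation x has no causal effect on y at all.
n = 1000
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 0.5) for zi in z]  # e.g. ice-cream sales
y = [zi + random.gauss(0, 0.5) for zi in z]  # e.g. sunburn cases

print(f"corr(x, y)         = {pearson(x, y):.2f}")

# Here we know each variable is z plus noise, so subtracting z removes the
# confounder exactly; with real data you would regress each variable on z.
x_resid = [xi - zi for xi, zi in zip(x, z)]
y_resid = [yi - zi for yi, zi in zip(y, z)]
print(f"corr(x, y) given z = {pearson(x_resid, y_resid):.2f}")
```

The raw correlation is strong, but it almost vanishes once the confounder is accounted for, which is why a correlation on its own is not evidence of causation.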
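For item 9, a basic cleaning pass can be as simple as an outlier check followed by normalization. This sketch uses hypothetical sensor readings, Tukey’s 1.5 × IQR rule as the outlier test, and min-max scaling as the normalization; other tests and scalers are equally valid choices.

```python
import statistics

# Hypothetical sensor readings containing one suspicious value.
readings = [12, 14, 13, 15, 14, 13, 98, 12, 15, 14]

# Tukey's rule: flag values more than 1.5 * IQR outside the quartiles.
q1, _, q3 = statistics.quantiles(readings, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [r for r in readings if r < low or r > high]
cleaned = [r for r in readings if low <= r <= high]

# Min-max normalization of the cleaned values to the range [0, 1].
lo, hi = min(cleaned), max(cleaned)
normalized = [(r - lo) / (hi - lo) for r in cleaned]

print("outliers:", outliers)
print("normalized:", [round(v, 2) for v in normalized])
```

Normalizing before removing the outlier would squash every ordinary reading into a narrow band near 0, so the order of these steps matters.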
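The accuracy trap in item 10 shows up with any heavily skewed label set. This minimal sketch uses hypothetical labels (95 negatives, 5 positives) and a trivial "model" that always predicts the majority class, then compares accuracy with recall on the minority class.

```python
# Hypothetical labels: 95 negatives and 5 positives (heavily skewed).
y_true = [0] * 95 + [1] * 5

# A trivial "model" that always predicts the majority class.
y_pred = [0] * len(y_true)

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model found."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual = sum(t == positive for t in y_true)
    return tp / actual if actual else 0.0

print(accuracy(y_true, y_pred))  # 0.95
print(recall(y_true, y_pred))    # 0.0
```

The model looks 95% "accurate" while missing every positive case, so on imbalanced data it is safer to report recall, precision, or a balanced metric alongside accuracy.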

By: Rohit Benny Abraham