When starting to work with data, most people jump straight to explanatory analysis instead of first going through exploratory analysis. Psychology says that most people focus on the outcome rather than the process, but in data analytics it's the process that is of utmost importance. The following are the three most common bloopers:
- Lack of curiosity: If you aren't curious enough to ask questions of your data, you won't discover what it can tell you, and you won't be able to verify your assumptions against it. Understanding your data thoroughly before moving on to more complicated analysis is therefore essential. If you get stuck, look at similar projects in the same domain and build an understanding of the business problem you're trying to solve.
- Assuming the data is perfect to start with: Data is never as clean as you would like. Always remember to check for errors, and expect to find them. For example, a person's age or body temperature cannot be negative. Also, don't compute summary measures without looking at the distribution of the data; this is the most common mistake that aspiring data scientists make.
- Treating EDA as a waste of time: It is tempting to concentrate less on the data than on the algorithm, because you feel the algorithm will take care of accuracy in the end. But the quality of your final model depends far more on the data you're dealing with than on the algorithm. Always understand your data with the help of visualisations before making any informed decision.
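To make the second and third points concrete, here is a minimal sketch in Python using only the standard library. The `ages` values, the 0 to 120 validity range, and the helper names `find_invalid` and `text_histogram` are all made up for illustration; a real project would pick ranges from domain knowledge and would probably use a plotting library instead of a text histogram.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical sample: None marks a missing value, -1 and 450 are entry errors.
ages = [34, 29, -1, 41, 38, None, 27, 450, 33]


def find_invalid(values, low, high):
    """Return (index, value) pairs that are missing or outside [low, high]."""
    return [
        (i, v)
        for i, v in enumerate(values)
        if v is None or not (low <= v <= high)
    ]


def text_histogram(values, bin_width):
    """Return one line per non-empty bin so the distribution can be eyeballed."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return [
        f"{start:>5} to {start + bin_width - 1:<5} {'#' * counts[start]}"
        for start in sorted(counts)
    ]


# Step 1: expect errors and go looking for them (assumed valid range: 0 to 120).
invalid = find_invalid(ages, low=0, high=120)

# Step 2: look at the distribution before summarising anything.
raw = [v for v in ages if v is not None]
histogram = text_histogram(raw, bin_width=10)

# Step 3: compare summary measures before and after dropping invalid values.
clean = [v for v in ages if v is not None and 0 <= v <= 120]
summary = {
    "raw_mean": mean(raw),
    "clean_mean": mean(clean),
    "clean_median": median(clean),
}

if __name__ == "__main__":
    print("Invalid entries:", invalid)
    print("\n".join(histogram))
    print(summary)
```

In this made-up sample, two bad entries are enough to pull the raw mean far away from the cleaned mean and median, which is exactly the kind of problem a summary statistic hides and a quick look at the distribution reveals.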
These are some of the bloopers people make while working with data. There are many more mistakes to point out, but overcoming these top three pitfalls will help you gain a better understanding of both your data and your domain.

Gaurav Chavan