What skills do you need to be a data scientist? This episode of the DATAcated podcast covers the top five skills required to become a successful data scientist. These skills will also help those who are already data scientists to become even better. Host Kate Strachnyi talks with self-taught data scientist Matt Dancho, Founder of Business Science University. Listen to learn about his data scientist program and how you can apply those skills.
Listen to this episode on Anchor FM
You will want to hear this episode if you are interested in...
- Becoming a data scientist in 6 months [04:19]
- Choosing a programming language [07:06]
- Integrated Development Environments [11:18]
- Data wrangling [19:14]
- Data visualization [21:38]
- Machine learning [25:07]
- Cluster, regression, and classification [28:19]
- The importance of time series [30:03]
- Making models usable [34:00]
- Streamlining the process [41:36]
Python vs. R
When beginning the journey to becoming a data scientist, the first thing to decide is which programming language to learn. Typically that choice is either Python or R. In his research, Matt discovered that significantly more job postings require Python than R. While that might be read as a reason to learn Python, it doesn’t account for how many people apply for these positions: for every thirty-two Python applicants, there is one R applicant. By that measure, Matt argues, learning R provides a significant competitive advantage in the job market.
Machine learning
The popular way to start in machine learning is to spend five years learning theory, algorithms, math, and statistics. The smarter way is to apply machine learning by taking a model and experimenting with data. That process teaches the algorithms, how to use them, and a little about statistics. This real-world experience builds an understanding of the pros and cons of various algorithms.
The problem with simply learning theory is that it’s never applied. That can leave someone feeling that it’s necessary to master a subject entirely before trying to apply it. The reality is that mastering the subject will happen much more quickly during the process of applying the knowledge. Mistakes will be made, of course, but it’s through those hurdles that deeper learning happens.
Learning efficiently
Matt does not recommend upskilling in Python and R simultaneously because it would be too code-heavy. The second language can always be picked up later. He has seen many students still struggle with confidence in Python after a year of studying it. While learning, they doubt themselves and whether they could ever become data scientists. Because of Matt’s background with Excel, he became much more comfortable when he switched to R.
Later on, Matt picked up Python because he needed to write some software in it. He learned it much more quickly because he already understood R. Focusing on key tasks and not trying to learn everything at once helps simplify and expedite the process of becoming a data scientist. Success rates are greatly improved by having a streamlined process.
Resources & People Mentioned
- Business Science University
- R Shiny
- Dash Overview
- Affiliate link for $500 OFF R-Track System (Expires on Friday)
- Cheat Sheet Link
- Recent Article on Skills
Connect with Matt Dancho
Connect with DATAcated
- http://www.datacated.com/
- DATAcated on LinkedIn: https://www.linkedin.com/company/datacated1/
- Kate on LinkedIn: https://www.linkedin.com/in/kate-strachnyi-data/
- DATAcated on Twitter: https://twitter.com/datacated_
- DATAcated on YouTube: https://www.youtube.com/datacated