Mindset

That is a photo of a typical DS starter …

One person, many skills.

When I first started in Data Science, I read an enormous number of blog posts that were supposed to reveal a secret path to becoming a successful data scientist.

A lot of choices. Credits to Unsplash

Some advised going back to the college textbooks and rushing through pages of linear algebra and calculus, probability and statistics. Meanwhile, in the background, some screaming YouTube blogger shouts that mathematics is not essential and you had better start practicing now! The same guy, however, tells you that web scraping is an essential tool and you had better master Selenium. Your friend from computer science says that without knowing algorithms you are not a programmer, and that not knowing object-oriented programming in Java/C++ makes you worse than his little bro, who is a software engineer at Google. Your granny, with her huge industrial experience, may order you to learn Scala and start by implementing a spam filter on Hadoop. From the news you hear that data scientists are worried about their jobs and are switching to data engineering, where you have to master not only Big Data tools but also cloud architecture and DevOps.

Then people with strong opinions write that:

if you are not able to deploy your AI application, you are nothing; if you are not able to write a CNN from scratch, you are super nothing.

Kagglers urge you to dive into Kaggle, buy or rent a GPU desktop and start tweaking hyperparameters. Job postings demand an endless list of skills.

Even being just a data analyst requires you to know the language and culture of the country you are working in.

Did I mention communication and presentation skills, which are simply essential in life?

So this is a starter. He wants to multitask and acquire all the knowledge. And then what? To be in demand? To be cooler?

The starter wants to be Aang, born with all the skills.

In reality, the starter is a baby Aang who still has to go and master his talents.

That process is slow, so many give up. But I also see another problem, one that I ran into personally: what am I studying for? Who do I want to be in the end? That should be clarified from the beginning. Below are a few small tips. They are only meant to help, not to lead or persuade, so just read on.

So, will I….

1. sit all night over a math equation and optimize it? Will I enjoy seeing that changing lambda from 0.01 to 0.015 gave a 0.1% improvement? Am I excited to read a new research paper and figure out new ways to tackle the problem of vanishing gradients? If you are mathy, diligent, excited about new things and want to be part of a breakthrough in science, pick the machine learning or AI researcher path. Go for a PhD, go to a lab. However, you can do the same at home with a GPU and Kaggle, or just in your free time with research papers and nerdy colleagues.

2. Am I talkative, social, a bit creative, with many interests like politics, sports, stocks, air pollution, etc.? Do I enjoy documentation, writing, storytelling, visuals? Are basic statistics skills already in my skillset (or have I read the book “Naked Statistics”)? Then data analyst is for you! You can skip hard programming: the basics of pandas, SQL and a bit of R are about the top tech skills you will need. And Tableau or Power BI will make you THE terminator.

I also think a data analyst should have empathy, social awareness and marketing skills in his pocket.

3. The Data Engineer is a tank: a strong programmer and a database master. He knows big O notation and the cost of running everything on one thread. He needs to know one cloud platform, work with the tooling, and keep up with the latest technologies for ETL and data warehousing. He stays behind the scenes, but he delivers and manages the incoming data and also participates in deployment. (Sorry, I do not know much about these Hulks, but it just seems hard to be a data engineer.)

4. The Data Scientist is someone in the middle of all these: the person who wants to master everything mentioned above (maybe not in such depth). But his most crucial skill is being able to identify what the problem is. Should I throw my whole army and all my computing power at the iris dataset? Or is a simple one-layer neural network enough? What confidence levels should I work with? Is this result good enough, is it statistically significant, should I go deeper? What were the units of my variables? How much missing data can I have and still apply that algorithm? Why am I sure that the results are not biased? Can I explain my model, or is it just a black box? Are my stakeholders happy with a black-box model? Or maybe random forests, even with lower accuracy but more explainability, will be more useful, because we will see in which directions we can drive our marketing campaigns. One of the most important things is not to make a type 3 error: answering the wrong question.
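A question like "how much missing data is too much?" can start with a very simple count. Here is a minimal sketch in plain Python. The toy rows, the column names and the 30% cut-off are all made up for illustration; the right threshold depends on the problem and the algorithm.

```python
# Toy records; None marks a missing value. All names and values are hypothetical.
rows = [
    {"age": 34, "income": 52000, "city": "Riga"},
    {"age": None, "income": 48000, "city": None},
    {"age": 29, "income": None, "city": "Tartu"},
    {"age": 41, "income": None, "city": "Vilnius"},
]


def missing_share(records):
    """Return the fraction of missing (None) values for each column."""
    columns = records[0].keys()
    return {
        col: sum(r[col] is None for r in records) / len(records)
        for col in columns
    }


shares = missing_share(rows)

# Arbitrary illustrative cut-off, not a rule.
THRESHOLD = 0.3
too_sparse = [col for col, share in shares.items() if share > THRESHOLD]

print(shares)      # share of missing values per column
print(too_sparse)  # columns that exceed the cut-off
```

The number itself does not answer the question. It only tells you where to look before you decide whether to drop, impute or go back to the data owner.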

A Data Scientist should know how dirty the data is, how it smells. He should know the disadvantages of the main algorithms and know when, where and what can work better. He should avoid biases and cognitive tricks and work with a clear, unbiased head. That requires a lot of experience, good EDA skills, good visualization skills, CRITICAL thinking, a bit of sociology and psychology, a wide range of interests, networking, and being up to date with news in tech and society. He should be able to persuade, to criticize himself or others, and to do everything needed to solve a given problem. Sometimes it requires a multivariate LSTM, sometimes one SQL command.
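To make the "one SQL command" case concrete, here is a small sketch using Python's built-in sqlite3 module. The table, the columns and the numbers are invented. The point is only that a single GROUP BY can sometimes answer the business question with no model at all.

```python
import sqlite3

# In-memory database with a tiny, made-up orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (channel TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (channel, amount) VALUES (?, ?)",
    [("email", 20.0), ("email", 30.0), ("ads", 10.0), ("ads", 15.0), ("ads", 5.0)],
)

# The "one SQL command": order count and average order value per channel.
query = """
    SELECT channel, COUNT(*) AS n_orders, AVG(amount) AS avg_amount
    FROM orders
    GROUP BY channel
    ORDER BY avg_amount DESC
"""
result = conn.execute(query).fetchall()
conn.close()

print(result)
```

If the stakeholder only wanted to know which channel brings bigger orders, this is where the work can stop.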

And as a Senior DS, you should know how to manage teams, be patient with new learners and inspire them.

(This post needs a continuation, and there will be one after I dive deeper into the DS world. I will update you :)

Links to check further: Cassie Kozyrkov https://mlconf.com/blog/interview-with-cassie-kozyrkov-chief-decision-scientist-at-google-by-reshama-shaikh-program-committee-member/

https://towardsdatascience.com/focus-on-decisions-not-outcomes-bf6e99cf5e4f

Cassie is my hero! Try to read the books about decision making that she lists.