While the exact definition of "Data Scientist" continues to elude us, the job requirements seem to lean heavily on machine learning skills. They also include a wide range of other skills, from specific languages, frameworks and databases to data cleaning, web scraping, visualization, mathematical modeling and subject matter expertise. (This breakdown will be the subject of a future post, as I was having some trouble with my web scraper ;))
So for the typical "Data Scientist" role, many organizations want PhD-level academic training plus an assortment of nuts-and-bolts programming or database skills. Most of these job requirements read like a rich and complex recipe for "can't find the right candidate" (aka the Unicorn). So, as an extension to the Data Science Venn Diagram V2.0, I thought it would be helpful to clarify and make some important distinctions regarding Machine Learning skills.
Back in the 2002-2003 time frame, I spent a bunch of time trying to code my own Neural Networks. This was a very frustrating experience because bugs in these algorithms can be especially difficult to find, and it took time away from what I really wanted to do, which was building applications using machine learning. So I decided back then to use well-tested and fully debugged library algorithms over clunky homegrown algorithms whenever possible. These days there are so many powerful and well-tested ML libraries, why would anyone write one from scratch? The answer is, sometimes a new algorithm is needed.
First, some definitions will help clarify:
- ML Algorithm: A well-defined, mathematically based tool for learning from inputs, typically found in ML libraries. Take the example of sorting algorithms: BubbleSort, HeapSort, InsertionSort, etc. As a software developer, you do not want or need to create a new type of sort. You should know which works best for your situation and use it. The same applies to Machine Learning: Random Forests, Support Vector Machines, Logistic Regression, Backprop Neural Networks, etc. are all well-known algorithms which have certain strengths and limitations and are available in many ML libraries and languages. These are a bit more complicated than sorting, so more skill is required to use them effectively.
 
- ML Solution: An application which uses one or more ML Algorithms to solve a business problem for an organization (business, government etc).
- ML Researcher/Scientist: PhDs are at the top of the heap. They have been trained to work on leading-edge problems in Machine Learning, Robotics, etc. These skills are hard won and are well suited for tackling problems with no known solution. When you have a new class of problems which requires insight and new mathematics to solve, you need an ML Researcher. When they solve a problem, a new ML Algorithm will likely emerge.
- ML Engineer: A sharp software engineer with experience in building ML Solutions (or solving Kaggle problems). The ML Engineer's skills are different from the ML Researcher's. There is less abstract mathematics and more programming, database and business acumen involved. An ML Engineer analyzes the data available, the organizational objectives and the ML Algorithms known to operate on this type of problem and this type of data. You can't just feed any data into any ML Algorithm and expect a good result. Specialized skills are required in order to create high-scoring ML solutions. These include: Data Analysis, Algorithm Selection, Feature Engineering, Cross Validation, appropriate scoring and troubleshooting the solution.
 
- Data Engineer: A software engineer with platform- and language-specific skills. The Data Engineer is a vital part of the ML Solution team. This person or group does the heavy lifting when it comes to building data-driven systems. There are so many languages, databases, scripting tools and operating systems, each with its own set of quirks, secret incantations and performance gotchas. A Data Engineer needs to know a broad set of tools and be effective in getting the data extracted, scraped, cleaned, joined, merged and sliced for input to the ML Solution. Many of the skills needed to manage Big Data belong in the Data Engineer category.
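
To make the ML Engineer's side of this a little more concrete, here is a rough sketch of what "Algorithm Selection" plus "Cross Validation" can look like when you lean on library algorithms instead of writing your own. It is only an illustration under my own assumptions: it uses Python with scikit-learn, one of the library's bundled toy datasets, and a helper name (`compare_algorithms`) that I made up for this example. A real ML Solution would start from your own data and a scoring metric that matches the business objective.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_algorithms(X, y, folds=5):
    """Hypothetical helper: mean cross-validated accuracy per library algorithm."""
    # Three well-known algorithms, all taken from the library rather than hand coded.
    # Logistic Regression and the SVM are sensitive to feature scale, so they get
    # a scaling step; the Random Forest does not need one.
    candidates = {
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "svm": make_pipeline(StandardScaler(), SVC()),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    }
    scores = {}
    for name, model in candidates.items():
        # Each candidate is trained and scored on `folds` train/test splits.
        fold_scores = cross_val_score(model, X, y, cv=folds, scoring="accuracy")
        scores[name] = float(fold_scores.mean())
    return scores


if __name__ == "__main__":
    # Toy dataset that ships with scikit-learn (an assumption for this sketch).
    X, y = load_breast_cancer(return_X_y=True)
    ranked = sorted(compare_algorithms(X, y).items(), key=lambda kv: -kv[1])
    for name, score in ranked:
        print(f"{name}: {score:.3f}")
```

Accuracy is used here only because it is the simplest score to read. Choosing the appropriate scoring for the problem is part of the ML Engineer's job, and it often matters as much as the choice of algorithm.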
 