Data Science vs Data Engineering
The job of the Data Scientist is actually a fairly new trend, and yet other job titles are coming to us. “Is this really necessary?”, Some will ask. But the answer is clear: yes!
There are situations, every Data Scientist know: a recruiter calls, speaks about a great new challenge for a Data Scientist as you obviously claim on your LinkedIn profile, but in the discussion of the vacancy it quickly becomes clear that you have almost none of the required skills. This mismatch is mainly due to the fact that under the job of the Data Scientist all possible activity profiles, method and tool knowledge are summarized, which a single person can hardly learn in his life. Many open jobs, which are to be called under the name Data Science, describe rather the professional image of the Data Engineer.
Read this article in German:
“Data Science vs Data Engineering – Wo liegen die Unterschiede?“
What is a Data Engineer?
Data engineering is primarily about collecting or generating data, storing, historicalizing, processing, adapting and submitting data to subsequent instances. A Data Engineer, often also named as Big Data Engineer or Big Data Architect, models scalable database and data flow architectures, develops and improves the IT infrastructure on the hardware and software side, deals with topics such as IT Security , Data Security and Data Protection. A Data Engineer is, as required, a partial administrator of the IT systems and also a software developer, since he or she extends the software landscape with his own components. In addition to the tasks in the field of ETL / Data Warehousing, he also carries out analyzes, for example, to investigate data quality or user access. A Data Engineer mainly works with databases and data warehousing tools.
A Data Engineer is talented as an educated engineer or computer scientist and rather far away from the actual core business of the company. The Data Engineer’s career stages are usually something like:
- (Big) Data Architect
- BI Architect
- Senior Data Engineer
- Data Engineer
What makes a Data Scientist?
Although there may be many intersections with the Data Engineer’s field of activity, the Data Scientist can be distinguished by using his working time as much as possible to analyze the available data in an exploratory and targeted manner, to visualize the analysis results and to convert them into a red thread (storytelling). Unlike the Data Engineer, a data scientist rarely sees into a data center, because he picks up data via interfaces provided by the Data Engineer or provides by other resources.
A Data Scientist deals with mathematical models, works mainly with statistical procedures, and applies them to the data to generate knowledge. Common methods of Data Mining, Machine Learning and Predictive Modeling should be known to a Data Scientist. Data Scientists basically work close to the department and need appropriate expertise. Data Scientists use proprietary tools (e.g. Tools by IBM, SAS or Qlik) and program their own analyzes, for example, in Scala, Java, Python, Julia, or R. Using such programming languages and data science libraries (e.g. Mahout, MLlib, Scikit-Learn or TensorFlow) is often considered as advanced data science.
Data Scientists can have diverse academic backgrounds, some are computer scientists or engineers for electrical engineering, others are physicists or mathematicians, not a few have economical backgrounds. Common career levels could be:
- Chief Data Scientist
- Senior Data Scientist
- Data Scientist
- Data Analyst oder Junior Data Scientist
Data Scientist vs Data Analyst
I am often asked what the difference between a Data Scientist and a Data Analyst would be, or whether there would be a distinction criterion at all:
In my experience, the term Data Scientist stands for the new challenges for the classical concept of Data Analysts. A Data Analyst performs data analysis like a Data Scientist. More complex topics such as predictive analytics, machine learning or artificial intelligence are topics for a Data Scientist. In other words, a Data Scientist is a Data Analyst++ (one step above the Data Analyst).
And how about being a Business Analyst?
Business Analysts can (but need not) be Data Analysts. In any case, they have a very strong relationship with the core business of the company. Business Analytics is about analyzing business models and business successes. The analysis of business success is usually carried out by IT, and many business analysts are starting a career as Data Analyst now. Dashboards, KPIs and SQL are the tools of a good business analyst, but there might be a lot business analysts, who are just analysing business models by reading the newspaper…
Basically, data scientists are the ones responsible for interpreting and making use of the Data Engineer’s reports in the data center because these scientists pick up data via interfaces provided by the Data Engineer or provide by other resources as you’ve mentioned. That sounds like they are an important part of the cogs to work properly. Additionally, being able to work not only with data but also with mathematical models to bring further light to those data is extremely jaw-dropping. I’ll try to read more about these data scientists as they sound so cool.
I like these differentiations very much! If I get you right, in a perfect world the data engineer takes also the responsibility of preparing (cleaning) the data and the data scientist is mostly occupied with the mining part.