Posts

My Desk for Data Science

In my last post I anounced a blog parade about what a data scientist’s workplace might look like.

Here are some photos of my desk and my answers to the questions:

How many monitors do you use (or wish to have)?

I am mostly working at my desk in my office with a tower PC and three monitors.
I definitely need at least three monitors to work productively as a data scientist. Who does not know this: On the left monitor the data model is displayed, on the right monitor the data mapping and in the middle I do my work: programming the analysis scripts.

What hardware do you use? Apple? Dell? Lenovo? Others?

I am note an Apple guy. When I need to work mobile, I like to use ThinkPad notebooks. The ThinkPads are (in my experience) very robust and are therefore particularly good for mobile work. Besides, those notebooks look conservative and so I’m not sad if there comes a scratch on the notebook. However, I do not solve particularly challenging analysis tasks on a notebook, because I need my monitors for that.

Which OS do you use (or prefer)? MacOS, Linux, Windows? Virtual Machines?

As a data scientist, I have to be able to communicate well with my clients and they usually use Microsoft Windows as their operating system. I also use Windows as my main operating system. Of course, all our servers run on Linux Debian, but most of my tasks are done directly on Windows.
For some notebooks, I have set up a dual boot, because sometimes I need to start native Linux, for all other cases I work with virtual machines (Linux Ubuntu or Linux Mint).

What are your favorite databases, programming languages and tools?

I prefer the Microsoft SQL Server (T-SQL), C# and Python (pandas, numpy, scikit-learn). This is my world. But my customers are kings, therefore I am working with Postgre SQL, MongoDB, Neo4J, Tableau, Qlik Sense, Celonis and a lot more. I like to get used to new tools and technologies again and again. This is one of the benefits of being a data scientist.

Which data dou you analyze on your local hardware? Which in server clusters or clouds?

There have been few cases yet, where I analyzed really big data. In cases of analyzing big data we use horizontally scalable systems like Hadoop and Spark. But we also have customers analyzing middle-sized data (more than 10 TB but less than 100 TB) on one big server which is vertically scalable. Most of my customers just want to gather data to answer questions on not so big amounts of data. Everything less than 10TB we can do on a highend workstation.

If you use clouds, do you prefer Azure, AWS, Google oder others?

Microsoft Azure! I am used to tools provided by Microsoft and I think Azure is a well preconfigured cloud solution.

Where do you make your notes/memos/sketches. On paper or digital?

My calender is managed digital, because I just need to know everywhere what appointments I have. But my I prefer to wirte down my thoughts on paper and that´s why I have several paper-notebooks.

Now it is your turn: Join our Blog Parade!

So what does your workplace look like? Show your desk on your blog until 31/12/2017 and we will show a short introduction of your post here on the Data Science Blog!

 

Data Science vs Data Engineering

The job of the Data Scientist is actually a fairly new trend, and yet other job titles are coming to us. “Is this really necessary?”, Some will ask. But the answer is clear: yes!

There are situations, every Data Scientist know: a recruiter calls, speaks about a great new challenge for a Data Scientist as you obviously claim on your LinkedIn profile, but in the discussion of the vacancy it quickly becomes clear that you have almost none of the required skills. This mismatch is mainly due to the fact that under the job of the Data Scientist all possible activity profiles, method and tool knowledge are summarized, which a single person can hardly learn in his life. Many open jobs, which are to be called under the name Data Science, describe rather the professional image of the Data Engineer.


Read this article in German:
“Data Science vs Data Engineering – Wo liegen die Unterschiede?“


What is a Data Engineer?

Data engineering is primarily about collecting or generating data, storing, historicalizing, processing, adapting and submitting data to subsequent instances. A Data Engineer, often also named as Big Data Engineer or Big Data Architect, models scalable database and data flow architectures, develops and improves the IT infrastructure on the hardware and software side, deals with topics such as IT Security , Data Security and Data Protection. A Data Engineer is, as required, a partial administrator of the IT systems and also a software developer, since he or she extends the software landscape with his own components. In addition to the tasks in the field of ETL / Data Warehousing, he also carries out analyzes, for example, to investigate data quality or user access. A Data Engineer mainly works with databases and data warehousing tools.

A Data Engineer is talented as an educated engineer or computer scientist and rather far away from the actual core business of the company. The Data Engineer’s career stages are usually something like:

  1. (Big) Data Architect
  2. BI Architect
  3. Senior Data Engineer
  4. Data Engineer

What makes a Data Scientist?

Although there may be many intersections with the Data Engineer’s field of activity, the Data Scientist can be distinguished by using his working time as much as possible to analyze the available data in an exploratory and targeted manner, to visualize the analysis results and to convert them into a red thread (storytelling). Unlike the Data Engineer, a data scientist rarely sees into a data center, because he picks up data via interfaces provided by the Data Engineer or provides by other resources.

A Data Scientist deals with mathematical models, works mainly with statistical procedures, and applies them to the data to generate knowledge. Common methods of Data Mining, Machine Learning and Predictive Modeling should be known to a Data Scientist. Data Scientists basically work close to the department and need appropriate expertise. Data Scientists use proprietary tools (e.g. Tools by IBM, SAS or Qlik) and program their own analyzes, for example, in Scala, Java, Python, Julia, or R. Using such programming languages and data science libraries (e.g. Mahout, MLlib, Scikit-Learn or TensorFlow) is often considered as advanced data science.

Data Scientists can have diverse academic backgrounds, some are computer scientists or engineers for electrical engineering, others are physicists or mathematicians, not a few have economical backgrounds. Common career levels could be:

  1. Chief Data Scientist
  2. Senior Data Scientist
  3. Data Scientist
  4. Data Analyst oder Junior Data Scientist

Data Scientist vs Data Analyst

I am often asked what the difference between a Data Scientist and a Data Analyst would be, or whether there would be a distinction criterion at all:

In my experience, the term Data Scientist stands for the new challenges for the classical concept of Data Analysts. A Data Analyst performs data analysis like a Data Scientist. More complex topics such as predictive analytics, machine learning or artificial intelligence are topics for a Data Scientist. In other words, a Data Scientist is a Data Analyst++ (one step above the Data Analyst).

And how about being a Business Analyst?

Business Analysts can (but need not) be Data Analysts. In any case, they have a very strong relationship with the core business of the company. Business Analytics is about analyzing business models and business successes. The analysis of business success is usually carried out by IT, and many business analysts are starting a career as Data Analyst now. Dashboards, KPIs and SQL are the tools of a good business analyst, but there might be a lot business analysts, who are just analysing business models by reading the newspaper…