Data must have a certain quality and history
Data Science Blog: What prerequisites must be fulfilled to ensure that predictive analyses work adequately for customer behaviour?
The data must, of course, have a certain quality and history to recognize trends and cycles. Often, however, one can also create an advantage by using additional new data sources. Experience and creativity are enormously important for understanding what is possible, what improves the quality of our work, and what only increases the noise.
Data Science Blog: What external data sources do you need to integrate? How do you handle unstructured data?
As far as external data sources are concerned, we are very spoiled here in England. We use about 10,000 different signals on average, and these vary depending on the question. They might include signals that show the composition of the population, local traffic information, the proximity of sights, hospitals and schools, crime rates and many more. The influence of each signal also differs from problem to problem. For example, a high number of pickpocketing incidents can be a positive sign of the vibrancy of an area and an indication that people carry a lot of cash on average. For a fast food retailer with a presence in the city centre, this could have a positive influence on a decision to invest in a new outlet in the area; in another area, it could have the opposite effect.
Data Science Blog: What possibilities does data science provide for forensics or fraud detection?
Every customer is surrounded by thousands of data signals and produces and transmits more through his or her behaviour. This gives us a pretty good picture of the person online. As every kind of person has a certain behavioural pattern (and this also applies to fraudsters), it is possible to recognise or predict these patterns in time.
Data Science Blog: What tools do you use in your work? When do you rely on proprietary software or on open source?
This depends on what stage of the process we are in and on the goal defined. We divide our team into different groups: our Data Wranglers (who are responsible for extracting, generating and processing the data) work with different tools from our Data Modellers. Basically, our tool kit covers the entire range of SQL Server, R and Python, but sometimes also Matlab or SAS. More and more, we are working with cloud-based solutions. Data visualization and dashboards in Qlik, Tableau or Alteryx are usually passed on to other teams.
Data Science Blog: What does your working day as a data scientist look like, from the morning coffee until the end of the evening?
My role is perhaps best described as the player’s coach. At the beginning of a project, it is primarily about working with the client to understand and develop the project. New ideas and methods have to be developed. During a project, I manage the teams and the knowledge transfer; reviewing and questioning the models are my main tasks. In the end, I do the final sign-off of the project. Since I often run several projects at different stages at the same time, it is guaranteed never to be boring.