March 2021

Why Is Physical Security Vital for Data Security?

March 30, 2021/in Data Security/by Shannon Flynn

Modern businesses hold on to an increasing amount of sensitive and sometimes confidential data. As a result, they’ve had to invest in new technology and practices to keep that data safe.

Many of these businesses, when developing their data security or cybersecurity protocols, focus on the security of their hardware, software and business network. Prioritizing these assets is essential — however, if physical security gets left behind, even the best digital tech may not keep a company’s data safe.

There’s practically no stopping someone with physical access to your data storage from stealing info or compromising your business network.

This is why companies that prioritize digital security also need to carefully consider physical security — and what may happen when physical security is neglected.

Physical Access Can Allow Criminals to Bypass Even the Best Digital Security

It’s almost impossible to protect any device from a physical attack. If a hacker has sustained access to device hardware, they’ll be able to breach its defenses eventually — potentially giving them access to the information on that device, as well as any stored security credentials.

Devices that are digitally secured but not physically secured — like a laptop left behind in a coffee shop, or an IoT sensor in an unlocked case — can provide a valuable vector of attack for hackers. In some cases, that vector may be all they need to create serious trouble for a company.

Poor building security may enable hackers to sneak into server rooms or gain access to off-site devices, like IoT sensors. Often, hackers also gain access to hardware by theft (for example, swiping a laptop left sitting in a coffee shop) or use social engineering to gain remote access.

Even large devices that are rarely moved or accessed by staff — like servers in a data center — can be at risk.

This is why large, high-budget data centers often have what’s colloquially called a mantrap — a set of two interlocking doors, somewhat like an airlock, that one has to pass through to reach the server hardware. These doors serve as a final access check for the data center and help to minimize the risk of unauthorized server access.

These threats aren’t an abstraction — hackers and other criminals have used physical access to steal data in the past.

In 2015, for example, hackers stole five servers from the offices of a British charity, PlanUK. Those servers contained a wealth of information on donors, including names, addresses, bank account numbers and sort codes.

In 2018, the theft of a laptop exposed the data of more than 43,000 patients of the West Virginia-based Coplin Health System — part of the reason that laptop theft is ranked the number one cause of health data breaches.

Valuable Hardware and Essential Systems May Be at High Risk

Hackers may also use physical attack vectors if they need to gain access to critical infrastructure, which may otherwise be air-gapped from internet-connected systems and impossible to attack with digital-only methods.

This is part of why major physical security manufacturers dedicate entire product lines to facilities such as nuclear power plants, airports and international organizations, and why those kinds of institutions take physical security so seriously.

Enterprise-grade computer hardware can also be very valuable — making that hardware a major target. While you may expect criminals to be driven more by data or network access than by the resale value of your servers, theft for resale or reuse has happened before.

In 2018, for example, Icelandic criminals stole 600 bitcoin-mining servers in one of the biggest tech heists on record. Rising cryptocurrency prices may encourage some criminals to plan similar heists of powerful hardware. Owners of data centers, rendering farms and other facilities with high-value hardware should be aware of these risks, as well as how good physical security is necessary to keep their hardware safe.

Using Physical Security to Complement Your Digital Security Planning

Without strong physical security practices, your data can be vulnerable — even if you have a great digital security plan in place.

Hackers, when faced with strong cyber defenses, sometimes turn to physical attacks to gain access to critical hardware. In other cases, they may be after the hardware itself, for resale or personal use.

Even a basic physical security plan — one that involves ID verification and access control — can go a long way in complementing a digital security strategy and keeping data safe.

Support Vector Machines for Text Recognition

Hand Written Alphabet recognition Using Support Vector Machine

March 20, 2021/in Artificial Intelligence, Data Science, Data Science Hack, Machine Learning, Main Category, R Statistics/by Mohan Rai

Image classification is a common task, most often carried out with a module like OpenCV in Python or with pre-trained models, as in the case of the MNIST data set. The idea of using a Support Vector Machine (SVM) for the same task is to offer a simpler approach to a complicated process. Every algorithm has its pros and cons, and a support vector machine may prove counterproductive on data with very high dimension. With image data, however, we are really just working with an array. If the image is monochrome, it is just a 2-dimensional array; for a greyscale or color image stack we may have to process a 3-dimensional array. You can get more clarity on the array part if you go through this article on Machine learning using only numpy array.

While there are certainly advantages to using OCR packages like Tesseract, OpenCV or GPTs, I am putting forth this approach of using a simple SVM model for handwritten text classification. As a student learning linear regression, I learned the principle of “Occam’s Razor”, which basically means: keep things simple if they can explain what you want them to. In short, it is the law of parsimony: simplify, do not complicate. Applying the same principle to handwritten alphabet recognition is an attempt to simplify by using a classic algorithm, the Support Vector Machine. We break the problem of handwritten alphabet recognition into a simple process and avoid heavy packages. This is an attempt to create the data and then build a model using Support Vector Machines for classification.

Data Preparation

Create the data by hand instead of downloading it from the web. This will help you understand your data from the beginning. Write some letters on white paper and photograph them with your mobile phone, then store the photos on your hard drive. As this is a trial, we don’t want to spend a lot of time on data creation at this stage, so it’s a good idea to create two or three different characters for your dry run. You may need to change the code as you add more classes, but this is where the learning phase begins. We are now at the training level.

Data Structure

You can create the data yourself by taking standard pictures of handwritten text at a dimension of 200 x 200 pixels. Alternatively, you can use a pen tablet to write the letters by hand and save them as files. If you know any photo editing tools, you can use them as well. For ease of use, I have already created sample data and saved it in the structure shown below.

Image Source : From Author

You can download the data I have used: right-click on this download data link and open it in a new tab or window. Then unzip the folders and you should see the same structure and data as above in your downloads folder. I suggest you also create your own data and repeat the process, as this will help you understand the complete flow.

Install the Dependency Packages for RStudio

We will be using the jpeg package in R for image handling and the SVM implementation from the kernlab package. We also need to make sure that the image data has dimensions of 200 x 200 pixels, with a horizontal and vertical resolution of 120 dpi. You can vary the dimensions, for example moving up to 300 x 300 or down to 100 x 100. The higher the dimensions, the more compute power you will need. Experiment with the color channels and resolution later, once you have implemented it in the current form.
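The size requirement above can be checked before any features are extracted. The following is only a minimal sketch: check_image_dims is a hypothetical helper that is not part of the jpeg package, and the arrays are random stand-ins for what readJPEG would return (a matrix for a greyscale image, a height x width x channels array for a color image).

```r
# Hypothetical helper: confirm that an image array has the expected height and width.
# `img` is assumed to have the shape returned by jpeg::readJPEG().
check_image_dims <- function(img, expected = c(200, 200)) {
  d <- dim(img)
  if (is.null(d) || length(d) < 2) {
    return(FALSE)
  }
  all(d[1:2] == expected)
}

# Stand-in arrays so the sketch runs without any image files
grey_img   <- matrix(runif(200 * 200), nrow = 200, ncol = 200)
colour_img <- array(runif(200 * 200 * 3), dim = c(200, 200, 3))
small_img  <- matrix(runif(100 * 100), nrow = 100, ncol = 100)

check_image_dims(grey_img)    # TRUE
check_image_dims(colour_img)  # TRUE
check_image_dims(small_img)   # FALSE
```

In practice you would pass the result of readJPEG for each file and skip or resize any image for which the check returns FALSE.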

# install package "jpeg"

install.packages("jpeg", dependencies = TRUE)

# install the "kernlab" package for building the model using support vector machines

install.packages("kernlab", dependencies = TRUE)

Load the training data set

# load the "jpeg" package for reading the JPEG format files

library(jpeg)

# set the working directory for reading the training image data set

setwd("C:/Users/mohan/Desktop/alphabet_folder/Train")

# extract the directory names for using as image labels

f_train<-list.files()

# Create an empty data frame to store the image data labels and the extracted new features in training environment

df_train<- data.frame(matrix(nrow=0,ncol=5))

Feature Transformation

Since we don’t intend to use a typical CNN, we are going to use the white, grey and black pixel values for new feature creation. We will use the sum of all the pixel values of an image as a new feature called “sum”, the count of pixels equal to zero as “zero”, the count of pixels equal to one as “one”, and the count of pixels with values strictly between zero and one as “in_between”. The “label” feature is extracted from the names of the folders.
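To see what these four features look like before running them on real images, here is a small sketch on a toy 3 x 3 matrix. The helper name extract_features and the toy values are assumptions made for illustration; the arithmetic mirrors the feature definitions used in this article.

```r
# Hypothetical helper mirroring the four pixel features described in the text
extract_features <- function(img) {
  data.frame(
    sum        = sum(img),                 # total intensity of the image
    zero       = sum(img == 0),            # number of pixels equal to 0
    one        = sum(img == 1),            # number of pixels equal to 1
    in_between = sum(img > 0 & img < 1)    # number of pixels strictly between 0 and 1
  )
}

# Toy 3 x 3 "image" with pixel intensities between 0 and 1
toy <- matrix(c(0,    0,   1,
                1,    0.5, 1,
                0.25, 1,   0), nrow = 3, byrow = TRUE)

features <- extract_features(toy)
print(features)
# sum = 4.75, zero = 3, one = 4, in_between = 2
```

Note that the three counts always add up to the number of pixels (here 9), which is a quick sanity check on real images as well.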

# names the columns of the data frame as per the feature name schema

names(df_train)<- c("sum","zero","one","in_between","label")

# loop to compute as per the logic

counter<-1

for(i in 1:length(f_train))

{

setwd(paste("C:/Users/mohan/Desktop/alphabet_folder/Train/",f_train[i],sep=""))

data_list<-list.files()

for(j in 1:length(data_list))

{

temp<- readJPEG(data_list[j])

df_train[counter,1]<- sum(temp)

df_train[counter,2]<- sum(temp==0)

df_train[counter,3]<- sum(temp==1)

df_train[counter,4]<- sum(temp > 0 & temp < 1)

df_train[counter,5]<- f_train[i]

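counter <- counter + 1

} # end of the loop over the images in one label folder

} # end of the loop over the label folders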

# Convert the labels from text to factor form

df_train$label<- factor(df_train$label)

Support Vector Machine model

# load the "kernlab" package for accessing the support vector machine implementation

library(kernlab)

# build the model using the training data

image_classifier <- ksvm(label~.,data=df_train)

Evaluate the Model on the Testing Data Set

# set the working directory for reading the testing image data set

setwd("C:/Users/mohan/Desktop/alphabet_folder/Test")

# extract the directory names for using as image labels

f_test <- list.files()

# Create an empty data frame to store the image data labels and the extracted new features in testing environment

df_test<- data.frame(matrix(nrow=0,ncol=5))

# Repeat of feature extraction in test data

names(df_test)<- c("sum","zero","one","in_between","label")

# loop to compute as per the logic

for(i in 1:length(f_test))

{

temp<- readJPEG(f_test[i])

df_test[i,1]<- sum(temp)

df_test[i,2]<- sum(temp==0)

df_test[i,3]<- sum(temp==1)

df_test[i,4]<- sum(temp > 0 & temp < 1)

df_test[i,5]<- strsplit(x = f_test[i],split = "[.]")[[1]][1]

}

# Use the classifier named "image_classifier" built in train environment to predict the outcomes on features in Test environment

df_test$label_predicted<- predict(image_classifier,df_test)

# Cross Tab for Classification Metric evaluation

table(Actual_data=df_test$label,Predicted_data=df_test$label_predicted)
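A cross tab like the one above can be reduced to a few summary numbers. The sketch below uses made-up labels purely for illustration; they are not results from the data in this article. It assumes you want overall accuracy and per-class recall, and it gives both label vectors the same factor levels so that the table stays square even if a class is never predicted.

```r
# Made-up labels for illustration only
actual    <- c("A", "A", "B", "B", "C", "C")
predicted <- c("A", "B", "B", "B", "C", "A")

# Use one shared set of levels so rows and columns line up
lvls <- sort(unique(c(actual, predicted)))
confusion <- table(
  Actual_data    = factor(actual, levels = lvls),
  Predicted_data = factor(predicted, levels = lvls)
)

# Overall accuracy: correct predictions (the diagonal) over all predictions
accuracy <- sum(diag(confusion)) / sum(confusion)

# Per-class recall: correct predictions for a class over its actual count
recall <- diag(confusion) / rowSums(confusion)

print(confusion)
print(accuracy)  # 4 correct out of 6
print(recall)    # A = 0.5, B = 1, C = 0.5
```

With the article's objects, the same two lines would be applied to the table built from df_test$label and df_test$label_predicted.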

I recommend learning the SVM concepts that couldn’t be explained completely in this article by going through my free Data Science and Machine Learning video courses. We have created the classifier using the kernlab package in R, but I would advise you to study the mathematics involved in support vector machines to get a clear understanding.

Data Security for Data Scientists & Co. – Infographic

March 9, 2021/in Data Science News, Data Security, Education / Certification, Gerneral, Insights, Main Category/by Benjamin Aunkofer

Data becomes information and information becomes knowledge. For this reason, companies are nowadays also evaluated with regard to their data and their data quality. Furthermore, data is also the material that is needed for management decisions and artificial intelligence. For this reason, IT Security is very important and special consulting and auditing companies offer their own services specifically for the security of IT systems.

However, Data Scientists, Data Analysts and Data Engineers rarely work only with open data; far more often they work intensively with customer data. Therefore, every expert in the storage and analysis of data should have at least a basic knowledge of Data Security and work according to certain principles in order to guarantee the security of the data and the legality of the data processing.

There are a number of rules and principles for data security that must be observed. We at DATANOMIQ have summarized some of them, in our opinion the most important ones, in an infographic for Data Scientists, Data Analysts and Data Engineers. You can download the infographic here: DataSecurity_Infographic

Infographic – Data Security for Data Scientists, Data Analysts and Data Engineers

In-memory Caching in Finance

March 5, 2021/in InMemory, Insights, Main Category/by Edward Huskin

Big data has been gradually creeping into a number of industries through the years, and it seems there are no exceptions when it comes to what type of business it plans to affect. Businesses, understandably, are scrambling to catch up to new technological developments and innovations in the areas of data processing, storage, and analytics. Companies are in a race to discover how they can make big data work for them and bring them closer to their business goals. On the other hand, consumers are more concerned than ever about data privacy and security, taking every step to minimize the data they provide to the companies whose services they use. In today’s ever-connected, always online landscape, however, every company and consumer engages with data in one way or another, even if indirectly so.

Despite the reluctance of consumers to share data with businesses and online financial service providers, it is actually in their best interest to do so. It ensures that they are provided the best experience possible, using historical data, browsing histories, and previous purchases. This is why it is also vital for businesses to find ways to maximize the use of data so they can provide the best customer experience each time. Even the more traditional industries like finance have gradually been exploring the benefits they can gain from big data. Big data in the financial services industry refers to complex sets of data that can help provide solutions to the business challenges financial institutions and banking companies have faced through the years. Considered today as a business imperative, data management is increasingly leveraged in finance to enhance processes, their organization, and the industry in general.

How Caching Can Boost Performance in Finance

In computing, caching is a method used to manage frequently accessed data saved in a system’s main memory (RAM). By using RAM, this method allows quick access to data without placing too much load on the main data stores. Caching also addresses the problems of high latency, network congestion, and high concurrency. Batch jobs are also done faster because request run times are reduced—from hours to minutes and from minutes to mere seconds. This is especially important today, when a host of online services are available and accessible to users. A delay of even a few seconds can lead to lost business, making both speed and performance critical factors to business success. Scalability is another aspect that caching can help improve by allowing finance applications to scale elastically. Elastic scalability ensures that a business is equipped to handle usage peaks without impacting performance and with the minimum required effort.
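The idea can be sketched in a few lines of R. This is a minimal illustration, not a production cache: make_cache and slow_price_lookup are hypothetical names, the lookup returns a placeholder value rather than real market data, and there is no eviction or expiry, which a real in-memory cache would need.

```r
# Minimal in-memory cache backed by an R environment (a hash map held in RAM)
make_cache <- function() {
  store  <- new.env(hash = TRUE, parent = emptyenv())
  hits   <- 0
  misses <- 0

  fetch <- function(key, loader) {
    if (exists(key, envir = store, inherits = FALSE)) {
      hits <<- hits + 1
      return(get(key, envir = store, inherits = FALSE))
    }
    misses <<- misses + 1
    value <- loader(key)               # slow path: go to the main data store
    assign(key, value, envir = store)  # keep the result in memory for next time
    value
  }

  list(fetch = fetch,
       stats = function() c(hits = hits, misses = misses))
}

# Hypothetical slow lookup standing in for a database or remote service
slow_price_lookup <- function(symbol) {
  Sys.sleep(0.05)
  nchar(symbol) * 10  # placeholder value, not real market data
}

cache  <- make_cache()
first  <- cache$fetch("ACME", slow_price_lookup)  # miss: calls the loader
second <- cache$fetch("ACME", slow_price_lookup)  # hit: served from memory
print(cache$stats())
```

The second call returns without touching the slow lookup, which is the effect described above: repeated requests are served from RAM and the main data store sees less load.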

Below are the main benefits of big data and in-memory caching to financial services:

Big data analytics integration with financial models
Predictive modeling can be improved significantly with big data analytics so it can better estimate business outcomes. Proper management of data helps improve algorithmic understanding so the business can make more accurate predictions and mitigate inherent risks related to financial trading and other financial services.
Real-time stock market insights
As data volumes grow, data management becomes a vital factor to business success. Stock markets and investors around the globe now rely on advanced algorithms to find patterns in data that will help enable computers to make human-like decisions and predictions. Working in conjunction with algorithmic trading, big data can help provide optimized insights to maximize portfolio returns. Caching can consequently make the process smoother by making access to needed data easier, quicker, and more efficient.
Customer analytics
Understanding customer needs and preferences is the heart and soul of data management, and, ultimately, it is the goal of transforming complex datasets into actionable insights. In banking and finance, big data initiatives focus on customer analytics and providing the best customer experience possible. By focusing on the customer, companies are able to leverage new technologies and channels to anticipate future behaviors and enhance products and services accordingly. By building meaningful customer relationships, it becomes easier to create customer-centric financial products and seize market opportunities.
Fraud detection and risk management
In the finance industry, risk is the primary focus of big data analytics. It helps in identifying fraud and mitigating operational risk while ensuring regulatory compliance and maintaining data integrity. In this aspect, an in-memory cache can help provide real-time data that can help in identifying fraudulent activities and the vulnerabilities that caused them so that they can be avoided in the future.

What Does This Mean for the Finance Industry?

Big data is set to be a disruptor in the finance sector, with 70% of companies citing big data as a critical factor for the business. In 2015 alone, financial service providers spent $6.4 billion on data-related applications, with this spending predicted to increase at a rate of 26% per year. The ability to anticipate risk and pre-empt potential problems is arguably the main reason why the finance industry in general is leaning toward a more data-centric and customer-focused model. Data analysis is also not limited to customer data; getting an overview of business processes helps managers make informed operational and long-term decisions that can bring the company closer to its objectives. The challenge is taking a strategic approach to data management, choosing and analyzing the right data, and transforming it into useful, actionable insights.
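To get a feel for what a 26% annual increase implies, the arithmetic can be written out as below. This is only an illustration: it assumes the rate compounds evenly each year from the 2015 figure, which the numbers above do not state, and the outputs are simple projections rather than reported spending.

```r
# Figures quoted in the text
base_spend_bn <- 6.4    # spending in 2015, in billions of US dollars
growth_rate   <- 0.26   # predicted annual increase

# Assumption: the rate compounds evenly every year
years_ahead <- 0:3
projected   <- base_spend_bn * (1 + growth_rate)^years_ahead

growth_table <- data.frame(
  year               = 2015 + years_ahead,
  projected_spend_bn = round(projected, 2)
)
print(growth_table)
# 2015: 6.40, 2016: 8.06, 2017: 10.16, 2018: 12.80
```

Under that assumption, spending would roughly double within three years.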