Geschriebene Artikel über Big Data Analytics

Leveraging Data Science for Vaccine Access and Administration

As people across the world become eligible for receiving their doses, governments and pharmaceutical companies must act efficiently to provide those vaccines. Alongside distribution, people need more information about these newer vaccines. As a solution for both of these obstacles, data science is a useful tool for vaccine development, distribution, and access throughout the COVID-19 pandemic.

From the initial stages of social distancing and vaccine development to broadening access to the vaccines, data science uses information from countless resources to provide evidence-based, actionable recommendations. Governments and health care providers then act upon this data to help the public and move towards the global goal of eliminating the virus.

Vaccine Development

Since the pandemic began, vaccines have been a sign of hope and a return to a new normalcy. However, getting to effective vaccine distribution first required using data science to develop the doses themselves.

The COVID-19 vaccines have been some of the fastest-developed inoculations in history, which is partly because of the efforts of data scientists. Using machine learning, researchers were able to analyze the sequences of strains of the virus and establish what parts a vaccine would best respond to. Specifically, the sequences had to be those that would be less likely to mutate in the future and less likely to cause an adverse reaction in humans with an injection.

Machine learning helped scientists predict and theorize about which proteins would be the best to work within the SARS-CoV-2 strains. From there, they proceeded with creating vaccines that are now in use all over the world, like Pfizer’s or Moderna’s.

Then, as vaccines become more available, governments again rely on data science to dictate eligibility. Data analytics systems take into account exposure risk, demographics, jobs, and health conditions, which have helped countries break up eligibility into phases.

Supply and Demand

Vaccine supply chains had rocky beginnings throughout the world. In Germany, residents faced shortages of doses, where demand far outweighed the available vials. This type of shortage is especially dangerous, as it can lead to an increase in cases or a full-on spike.

To avoid these uneven dynamics, data science can provide more accurate projections of how many vaccines regions will need on a weekly basis. Data science systems that use machine learning can account for the population that’s eligible and historical COVID-19 vaccination numbers. Then, as eligibility opens up, these systems predict how many vaccines a facility or county will need in the future.

Vaccine administrative organizations can then communicate better with the government to request the doses they will properly handle and use weekly.

In the United States, West Virginia is working with data science dashboards to identify who is most at risk of contracting the virus. Then, they can request the right amount of vials each week and give them to the people who need them the most.

Information Access

As new vaccines and government mandates come into play, residents all over the world need more information to feel safe and to know what they should do. Vaccine scams, for example, have increased with distribution. These scams will ask for personal information like a Social Security number or a form of payment.

To avoid these scams and learn about the vaccine options available, the public needs more access to information. Data science is again helpful to distribute this information.

Google has become a leader in information access with its Intelligence Vaccine Impact initiative. With this program, Google uses machine learning and artificial intelligence (AI) to process data regarding government policy changes, vaccine availability, eligibility, and demographics. That way, people know when they can receive a vaccine, the research and information that supports the vaccines, and what scams to avoid along the way.

Then, based on the vaccine information Google gathers, data scientists can provide a clearer trajectory of the pandemic globally and locally as vaccines help cases go down.

A Data-Driven Path Forward

Data science provides solutions for vaccine access, distribution, and administration. With these powerful dynamics in place, it’s clear that data will lead the world towards a healthier future. Based on evidence from the pandemic, data helps governments and health care providers offer the best solutions for eliminating the virus and protecting people everywhere.

The Role Data Plays in Customer Relationship Management

Image Source: Pixabay

The longevity of your business is dependent on your customer relationships. The depth of your customer relationships is connected to how well they’re managed. How well you work your customer relationships relies on the data you collect about them, like buying behaviors, values, needs, and lifestyle choices.

In addition to collecting quality customer data, organizing and effectively analyzing this data is vital to creating content, products, services, and anything else that resonates with your current and potential customers. Building an accurate customer journey and successfully using data in customer relationship management results in retaining customers and acquiring new ones to help drive more sales.

How is data used to manage customer relationships and navigate your customer’s journey with your business? Let’s first examine the difference between business analytics and data science and their respective roles in the effective use of data. We’ll then touch on the role data plays in creating your customer journey map and the best ways to use data to educate your CRM team.

Business Analytics vs. Data Science: What’s the Difference?

Business analytics and data science both play integral roles in the collection and usage of data. Knowing the difference between business analytics and data science makes building and maintaining customer relationships more productive. When you see the difference between data science and business analytics, you can make informed decisions about marketing techniques, sales funnels, software use, and other operational choices using your CRM team’s data.

What is business analytics?

Business analytics is the process of collecting data on overall processes, software, and other products and services, understanding it, and using it to make better business decisions. Data mining strategies and predictive analytics are among the tools and techniques used to execute the above actions.

Business analysts are adept in:

  • Assessing and organizing raw data.
  • Analyzing data about how to help a business improve.
  • Forecasting and identifying patterns and sequences.
  • Data visualization and conceptualizing business “big pictures.”
  • Overall information optimization.
  • Working with stakeholders to help them understand data.

Business analytics differs from data science in that it’s primarily used to establish how customer, software, system, tool, and technique data influence business growth. It transforms information into actionable steps that incite progress within a company and connection with current and potential customers.

What is data science?

Data science analyzes the collection process, the most streamlined and efficient ways to process data, and the most effective ways to communicate complex information to an analyst or company leader responsible for making critical business decisions. Data science takes advantage of emerging technologies like artificial intelligence, algorithms, and machine learning systems.

Data scientists are adept in:

  • Evaluating specific questions and customer inquiries to develop data-driven strategies.
  • Managing and analyzing critical information with the help of advanced technology.
  • Detecting useful statistical patterns.
  • Cleaning or “scrubbing” data.
  • Making data easier to understand through efficient organization.
  • Understanding how to leverage algorithms.

Data science differs from business analytics in that its primary focus is on collecting useful data and understanding different ways to best process and implement the collected data. Data science isn’t necessarily concerned with choosing the best ways to implement the lessons learned from data collection. Instead, it focuses on clearly communicating what the data means and listing implementation strategies, and leaving the decision up to business analysts and company leaders.

The Role Data Plays in Creating a Strong Customer Journey

According to Lucidspark, “A customer journey map is a diagram that illustrates how your customers interact with your company and engage with your products, website, and/or services.” It gives you a holistic view of a customer’s life-cycle with your business, from how they’re introduced to your brand, retained long enough to make a purchase, and nurtured into a long-term customer relationship.

The collection of customer data allows you to honestly know your audience and target them with personalized sales and marketing campaigns. It’s essential to understand how to collect data and interpret and implement it to create personal relationships with customers that result in loyalty and retention.

Data’s role in creating a robust customer journey map includes:

  • Tracking why individual visitors don’t become customers.
  • Identifying gaps in user experience.
  • Understanding why people connect with your brand the way they do.
  • Identifying similarities your long-term customers have.
  • Identifying similarities new visitors have.
  • Highlighting what messaging resonates most with your customers.
  • Identifying marketing channels responsible for the most conversions.
  • Pinpointing the best ways to communicate with your customers.
  • Showing which digital platforms they make the most use of.
  • Highlighting specific pain points, challenges, and problems common among your customers.

Your customer journey map gives specific details about your customers that you can use to construct an accurate representation of how they’d experience your brand. The more detailed the data, the more precise the customer journey. You can accurately map out:

  • What problem or pain points do they have?
  • Precisely what would bring them to researching a solution?
  • How they’d choose your product as the solution?
  • How they’d choose your brand to purchase that product?
  • How do they make purchases in general?
  • How they’d potentially purchase your product?
  • How they’d receive your product and experience it?
  • How they’d ultimately become a loyal customer?

The Best Ways to Use Data to Inform Your CRM Team

How can you use data to inform your customer relationship management team?

Using data effectively to inform your CRM team starts with functional CRM software. A CRM software can help seamlessly establish a relationship with customers by ensuring all the right information is collected, stored, and grouped correctly. The organizational structure a CRM software offers makes it easier for your CRM team to leverage collected data and shift strategies to accommodate customer needs and values.

CRM software is only as efficient as your team members are. Each team member plays an integral role in analytics and understands human behavior that drives data collection. Practical data analysis can inform your CRM team and help them with things like:

  • Tailoring your social media content.
  • Tracking spending habits and purchasing patterns.
  • Catering website structure towards increased customer satisfaction.
  • Boosting organic traffic.
  • Identifying what data to collect and how to manage it.
  • Establishing goals for your marketing efforts.
  • Understanding the most efficient communication channels for your customers.
  • Identifying what lifestyle behaviors could influence them completing a purchase with your business.
  • Exploring paid advertising avenues.
  • Segmenting your customer base.
  • Constructing ads for digital platforms.
  • Furthering your business’ online presence.
  • Personalizing customer experiences with your brand through specific written content and visual media.
  • Converting visitors to paying customers through sales techniques.

The ultimate goal for data collection and CRM software is customer acquisition and retention. Your CRM team can also use data to engage with customers consistently, streamline marketing funnels, and improve profitability with better products and services. They’re responsible for finding out what specific data is essential to collect, how it relates to moving your business forward, how you can turn the data into actionable strategies.

Other ways data collection informs your CRM team and helps shape overall business strategy include:

  • Bettering customer resolution strategies and implementing simple systems for addressing customer issues.
  • Choosing the most effective marketing media, brand colors, and overall business aesthetic.
  • Generating personalized purchase recommendations and suggestions and automating how they’re sent to customers.
  • Identifying opportunities for customer retention and gaps in acquisition techniques.
  • Achieving a deeper understanding of the lifestyle your ideal customer lives.
  • Improving your consumer database with a clean, simple, concise organizational structure.

You Can Maneuver Your Customer Journey Through Effective Use of Data in CRM

Data shows your CRM team how your customers are interacting with the brand and experiencing each stage of the buyer’s journey. Data can show you exactly who and what to focus on in building brand awareness. Tailor your messaging, aesthetics, events, engagement techniques, and marketing to your ideal customer’s behaviors.

By intentionally tracking essential customer data, you can implement the following potential solutions to maintain a healthy connection with each customer:

  • Changing your brand colors to those that evoke emotions specific to how you want visitors to experience your product or service.
  • Experimenting with different Call-to-action or CTA button colors and messaging.
  • Shifting media and image choices to those most popular with loyal customers.
  • Focusing solely on social media platforms most used by your target audience.
  • Using marketing messaging to speak directly to their current experiences.
  • Using the platforms your target audience gets their information from.
  • Establishing a preferred communication method.
  • Offering a variety of payment options that appreciate advanced technology.

Your CRM team can use data to make sales and marketing decisions that best align with company goals. Collecting and using customer relationship management data is the best practice for any business looking to define their customer’s journey with their brand clearly.

5 AI Tricks to Grow Your Online Sales

The way people shop is currently changing. This only means that online stores need optimization to stay competitive and answer to the needs of customers. In this post, we’ll bring up the five ways in which you can use artificial intelligence technology in an online store to grow your revenues. Let’s begin!

1. Personalization with AI

Opening the list of AI trends that are certainly worth covering deals with a step up in personalization. Did you know that according to the results of a survey that was held by Accenture, more than 90% of shoppers are likelier to buy things from those stores and brands that propose suitable product recommendations?

This is exactly where artificial intelligence can give you a big hand. Such progressive technology analyzes the behavior of your consumers individually, keeping in mind their browsing and purchasing history. After collecting all the data, AI draws the necessary conclusions and offers those product recommendations that the user might like.

Look at the example below with the block has a carousel of neat product options. Obviously, this “move” can give a big boost to the average cart sizes.

Screenshot taken on the official Reebok website

Screenshot taken on the official Reebok website

2. Smarter Search Options

With the rise of the popularity of AI voice assistants and the leap in technology in general, the way people look for things on the web has changed. Everything is moving towards saving time and getting faster better results.

One of such trends deals with embracing the text to speech and image search technology. Did you notice how many search bars have “microphone icons” for talking out your request?

On a similar note, numerous sites have made a big jump forward after incorporating search by picture. In this case, uploaded photos get analyzed by artificial intelligence technology. The system studies what’s depicted on the image and cross-checks it with the products sold in the store. In several seconds the user is provided with a selection of similar products.

Without any doubt, this greatly helps users find what they were looking for faster. As you might have guessed, this is a time-saving feature. In essence, this omits the necessity to open dozens of product pages on multiple sites when seeking out a liked item that they’ve taken a screenshot or photo of.

Check out how such a feature works on the official Amazon website by taking a look at the screenshots of StyleSnap provided below.

Screenshot taken on the official Amazon StyleSnap website

Screenshot taken on the official Amazon StyleSnap website

3. Assisting Clients via Chatbots

The next point on the list is devoted to AI chatbots. This feature can be a real magic wand with client support which is also beneficial for online sales.

Real customer support specialists usually aren’t available 24/7. And keeping in mind that most requests are on repetitive topics, having a chatbot instantly handle many of the questions is a neat way to “unload” the work of humans.

Such chatbots use machine learning to get better at understanding and processing client queries. How do they work? They’re “taught” via scripts and scenario schemes. Therefore, the more data you supply them with, the more matters they’ll be able to cover.

Case in point, there’s such a chat available on the official Victoria’s Secret website. If the user launches the Digital Assistant, the messenger bot starts the conversation. Based on the selected topic the user selects from the options, the bot defines what will be discussed.

Screenshot taken on the official Victoria’s Secret website

Screenshot taken on the official Victoria’s Secret website

4. Determining Top-Selling Product Combos

A similar AI use case for boosting online revenues to the one mentioned in the first point, it becomes much easier to cross-sell products when artificial intelligence “cracks” the actual top matches. Based on the findings by Sumo, you can boost your revenues by 10 to 30% if you upsell wisely!

The product database of online stores gets larger by the month, making it harder to know for good which items go well together and complement each other. With AI on your analytics team, you don’t have to scratch your head guessing which products people are likely to additionally buy along with the item they’re browsing at the moment. This work on singling out data can be done for you.

As seen on the screenshot from the official MAC Cosmetics website, the upselling section on the product page presents supplement items in a carousel. Thus, the chance of these products getting added to the shopping cart increases (if you compare it to the situation when the client would search the site and find these products by himself).

Screenshot taken on the official MAC Cosmetics website

Screenshot taken on the official MAC Cosmetics website

5. “Try It On” with a Camera

The fifth AI technology in this list is virtual try on that borrowed the power of augmented reality technology in the world of sales.

Especially for fields like cosmetics or accessories, it is important to find ways to help clients to make up their minds and encourage them to buy an item without testing it physically. If you want, you can play around with such real-time functionality and put on makeup using your camera on the official Maybelline New York site.

Consumers, ultimately, become happier because this solution omits frustration and unneeded doubts. With everything evident and clear, people don’t have the need to take a shot in the dark what will be a good match, they can see it.

Screenshot taken on the official Maybelline New York website

Screenshot taken on the official Maybelline New York website

In Closing

To conclude everything stated in this article, artificial intelligence is a big crunch point. Incorporating various AI-powered features into an online retail store can be a neat advancement leading to a visible growth in conversions.

What is Data Warehousing and Data Mining – Know the difference between Data Warehousing and Data Mining

Getting started

Before we start off with Data Warehousing and Data Mining, let us first set the ground for the same. This will help in understanding why we need them in the first place. By the end of the post, you would feel much more acquainted with the two topics at hand. So here goes!

With the exponential increase in the generation and consumption of the data, the organizations have to deal with a humungous amount of data at their end. We all have heard the talks about data being the new oil, which is rather turning out to be a reality. Data is considered to be an extremely valuable asset for every organization and they attempt to put it to good use. The data assists the organizations in making business decisions that will generate significant revenues. It helps in understanding the current requirements of the market which is vital for some organizations to stay in business. Thus, it is essential for organizations to store the data somewhere which can be later utilized for analytical purposes.

 

Introduction to Data Warehousing

Data Warehousing is a process to collect and manage data from a variety of sources. The data can also come from different departments of an organization like Finance, Marketing, etc. The idea of constructing a Data Warehouse is to be able to use the data for analytical purposes and make decisions based upon the analysis. Data warehouses are a pivotal component of any analytical and business intelligence operations at an organization. As the organizations can generate data at various sources, we might need to use different tools in order to store the data at a single source. The process of data warehousing generally involves ETL (Extract – Transform – Load) tools that help to extract the data from different sources, transform the data into a suitable format, and load the data into a single source. There are some tools like Google BigQuery, Amazon Redshift, etc that allow you to connect with a vast number of sources to store the data in one place. The data warehouses can be implemented on-premise as well as on the cloud. On-premise data warehouses are implemented on the local networks of the organization while cloud data warehouses are implemented over the internet. There is always a trade-off in making decisions as to which one to choose because there are multiple factors to be considered like scalability, initial investment, recurring costs, security, speed, etc.

General steps to implement a data warehouse

  1. Determining the business objectives.

Every organization can have different business objectives that define success in its terms. Some organizations are involved in a constantly changing market which will require a large number of sources while others might just need to use the data for better administration purposes. Hence the key step to initiate the creation of a data warehouse is to include the stakeholders and determine their business objectives.

  1. Analyzing and obtaining the information regarding the objectives.

Once the business objectives are decided, the information regarding those objectives needs to be obtained. The information can be obtained using any periodic report, or any CRM application, etc depending upon the organization. An extensive amount of interaction with all the supervisors attending to that information can be crucial for this process. Interacting with the people that are daily involved in this routine can serve a lot of information. These people tend to know the bits and pieces of the entire task and any information obtained from these people can lead to better implementation of the data warehouse. This step also helps in identifying the key performance indicators for the desired objectives.

  1. Identifying the concerned departments in the organization.

The key performance indicators can be different for different organizations. For example, in an organization that deals with the manufacturing of different products, if their objective were to increase the revenue then the number of units sold would be one of the key performance indicators. Based on these indicators, the involvement of the concerned departments proves to be significant.

  1. Create a layout of the data warehouse.

With all the key performance indicators discovered, create a layout of the data warehouse. It will help in providing an overview of the entire data warehouse and the data that will be stored in it. It will determine what key indicators are being stored in the data warehouse and whether all the indicators required for our objectives exist or not. As the data will be pulled in periodically, think about all of the investment costs including the hardware costs and recurring costs.

  1. Locating the sources of data and its transformation.

After finalizing the layout, we need to locate the sources of the data and figure out how the data can be extracted from it. The data can be in a CRM application or any database, we need to export the data or make use of an ETL tool that can connect with the data source. As the data comes from different sources, there is a need to consolidate all of the data. Also, there are high chances that the data is not clean and needs some transformation. In case some data may not be extracted then we need the reconsider the layout of the data warehouse. Many times, these two steps are performed in a parallel fashion.

  1. Implementation of the data warehouse layout

After the objectives are set in place, the concerned stakeholders are looped into the plan, the information is collected and analyzed, a layout for creating a data warehouse is planned, the data sources are located and transformed, now it is time to put all the things to work. After the data from the warehouse can be accessed, the data needs to be pulled from the sources periodically. We need to monitor the data warehouse continuously and check for any irregularities.

 

Introduction to Data Mining

Data Mining is a process to extract insightful information from a large amount of raw data. The intent of this process is to find some trends and patterns which would help organizations in making data-driven decisions. This process is one of the steps of KDD or Knowledge Discovery in Database. KDD also includes different sub-processes like data cleaning, data transformation, data pre-processing, etc. There are a number of tools that we can use for performing Data Mining – Tableau and Power BI being two of them. We can also make use of certain packages in Python and R languages to extract information. Data Mining helps in analyzing a huge amount of data in a quick amount of time. It is an essential step in any data science project because it provides some exploratory insights which might tell us which features are very important in prediction or which features provide very little information. Data Mining helps the organization in various ways like analyzing their market expenditure, resource management, fraud detection, etc. There are a lot of Data mining techniques which include Association rules, Classification, Clustering, Regression, Outlier Detection, etc.  It is a very cost-effective solution for the organization and it can regularly provide new information and analysis depending upon the skillsets. The process of data mining also begins with the understanding of the business and the data. Developing a thorough knowledge about the business and its related data is extremely pivotal for the analysts to be able to perform some operations on it.

Examples of Data Mining

  1. Ever got any recommendations on an e-commerce store when you are buying a product? Like when you buy a smartphone, the website will show you some phone cases or accessories. This is what is known as Basket Analysis. In this analysis, the buying patterns of the customers and what they tend to buy along with the other products are analyzed. It not only helps in an e-commerce website but also is implemented at any supermarket or grocery store. It can be done using Association based learning and creating some rules.
  2. Fraud detection is one of the most vital use-cases of Data Mining. Banks have a lot at stake due to the fraudulent transactions because they have to bear the losses for these transactions. Data Mining can help in analyzing the data and catching these fraudulent activities. Although it is a tedious task, the organizations can try to extract some patterns that will help in getting a hold of these fraudsters.

 

The link between Data Warehousing and Data Mining

Although we will mention the differences that lie between these two terms, let us see how data warehousing data mining is linked to each other. In fact, data warehousing and data mining work in conjunction with each other. As mentioned earlier in the article, we all know that data mining includes the extraction of useful information from tons of data. In order to perform data mining, from where will the analyst obtain the data? Their search ends at the data warehouse itself as it is a single source of contact for all their data needs. The data mining process is provided with all the information that is required for the analysis from the data warehouse. In many instances, the analytical team is able to extract some useful information from the data merged from two completely different departments of the organizations or even from different offices of the same department.

Distinguishing between Data Warehousing and Data Mining

Parameter Data Warehousing Data Mining
Process It is a process of storing data from multiple sources. It is a process of using different methods to analyze the raw data.

 

Ideology The idea behind it was to centralize all the sources of data into one location for ease of use in analytical processes. The idea behind it was to use the data to find some trends and patterns and help the organization in making good decisions.
Requirements To implement a data warehouse, we need to locate the different means of sources and how the data can be extracted from those sources. In order to perform data mining, a data warehouse needs to be implemented to be able to look at all the sources and then analyze the raw data.
Maintenance The data pipelines need to be maintained and monitored to prevent any loss of data. The methods used for extracting information need to be maintained and monitored in order to check if they provide any useful information or not.
Periodicity The data is extracted and stored in the data warehouse periodically. The data needs to be analyzed periodically for continuously extracting useful information.
Tools Tools used for this process include Google BigQuery, Amazon Redshift, etc Tools used for this process include Tableau, Power BI, etc.
Benefits Easy access to the historic data of the organization Helps in detecting any fraudulent operations, financial and market analysis, etc.

 

End Notes

Any organization that plans to use the data at hand for analytical purposes needs to implement a data warehouse and different data mining techniques. It requires a good amount of skillset and resources for getting good use of it. Another element that is also vital in the entire data analysis process is the interpretation of the analysis. One should be able to correctly interpret what the data is trying to tell you because all the decisions are based on these interpretations. Bad decisions could really cost organizations a fortune of money. But the decisions that are spot-on can make the organizations earn a fortune of money as well. This explains the increasing demand for different positions such as Data Engineers, Data Scientists, and Data Analysts.

This article was centered on giving its readers an overview of data warehousing and data mining. It mentioned different steps that are generally involved during the implementation of a data warehouse. It illustrated a couple of examples of Data Mining and explained how data warehousing and data mining are linked to each other. And lastly, provided some distinguishing parameters between data warehousing and data mining.

Bag of Words: Convert text into vectors

In this blog, we will study about the model that represents and converts text to numbers i.e. the Bag of Words (BOW). The bag-of-words model has seen great success in solving problems which includes language modeling and document classification as it is simple to understand and implement.

After completing this particular blog, you all will have an overview of: What does the bag-of-words model mean by and why is its importance in representing text. How we can develop a bag-of-words model for a collection of documents. How to use the bag of words to prepare a vocabulary and deploy in a model using programming language.

 

The problem and its solution…

The biggest problem with modeling text is that it is unorganised, and most of the statistical algorithms, i.e., the machine learning and deep learning techniques prefer well defined numeric data. They cannot work with raw text directly, therefore we have to convert text into numbers.

Word embeddings are commonly used in many Natural Language Processing (NLP) tasks because they are found to be useful representations of words and often lead to better performance in the various tasks performed. A huge number of approaches exist in this regard, among which some of the most widely used are Bag of Words, Fasttext, TF-IDF, Glove and word2vec. For easy user implementation, several libraries exist, such as Scikit-Learn and NLTK, which can implement these techniques in one line of code. But it is important to understand the working principle behind these word embedding techniques. As already said before, in this blog, we see how to implement Bag of words and the best way to do so is to implement these techniques from scratch in Python . Before we start with coding, let’s try to understand the theory behind the model approach.

 Theory Behind Bag of Words Approach

In simple words, Bag of words can be defined as a Natural Language Processing technique used for text modelling or we can say that it is a method of feature extraction with text data from documents.  It involves mainly two things firstly, a vocabulary of known words and, then a measure of the presence of known words.

The process of converting NLP text into numbers is called vectorization in machine learning language.A lot of different ways are available in converting text into vectors which are:

Counting the number of times each word appears in a document, and Calculating the frequency that each word appears in a document out of all the words in the document.

Understanding using an example

To understand the bag of words approach, let’s see how this technique converts text into vectors with the help of an example. Suppose we have a corpus with three sentences:

  1. “I like to eat mangoes”
  2. “Did you like to eat jellies?”
  3. “I don’t like to eat jellies”

Step 1: Firstly, we go through all the words in the above three sentences and make a list of all of the words present in our model vocabulary.

  1. I
  2. like
  3. to
  4. eat
  5. mangoes
  6. Did
  7. you
  8. like
  9. to
  10. eat
  11. Jellies
  12. I
  13. don’t
  14. like
  15. to
  16. eat
  17. jellies

Step 2: Let’s find out the frequency of each word without preprocessing our text.

But is this not the best way to perform a bag of words. In the above example, the words Jellies and jellies are considered twice no doubt they hold the same meaning. So, let us make some changes and see how we can use ‘bag of words’ by preprocessing our text in a more effective way.

Step 3: Let’s find out the frequency of each word with preprocessing our text. Preprocessing is so very important because it brings our text into such a form that is easily understandable, predictable and analyzable for our task.

Firstly, we need to convert the above sentences into lowercase characters as case does not hold any information. Then it is very important to remove any special characters or punctuations if present in our document, or else it makes the conversion more messy.

From the above explanation, we can say the major advantage of Bag of Words is that it is very easy to understand and quite simple to implement in our datasets. But this approach has some disadvantages too such as:

  1. Bag of words leads to a high dimensional feature vector due to the large size of word vocabulary.
  2. Bag of words assumes all words are independent of each other ie’, it doesn’t leverage co-occurrence statistics between words.
  3. It leads to a highly sparse vector as there is nonzero value in dimensions corresponding to words that occur in the sentence.

Bag of Words Model in Python Programming

The first thing that we need to create is a proper dataset for implementing our Bag of Words model. In the above sections, we have manually created a bag of words model with three sentences. However, now we shall find a random corpus on Wikipedia such as ‘https://en.wikipedia.org/wiki/Bag-of-words_model‘.

Step 1: The very first step is to import the required libraries: nltk, numpy, random, string, bs4, urllib.request and re.

Step 2: Once we are done with importing the libraries, now we will be using the Beautifulsoup4 library to parse the data from Wikipedia.Along with that we shall be using Python’s regex library, re, for preprocessing tasks of our document. So, we will scrape the Wikipedia article on Bag of Words.

Step 3: As we can observe, in the above code snippet we have imported the raw HTML for the Wikipedia article from which we have filtered the text within the paragraph text and, finally,have created a complete corpus by merging up all the paragraphs.

Step 4: The very next step is to split the corpus into individual sentences by using the sent_tokenize function from the NLTK library.

Step 5: Our text contains a number of punctuations which are unnecessary for our word frequency dictionary. In the below code snippet, we will see how to convert our text into lower case and then remove all the punctuations from our text, which will result in multiple empty spaces which can be again removed using regex.

Step 6: Once the preprocessing is done, let’s find out the number of sentences present in our corpus and then, print one sentence from our corpus to see how it looks.

Step 7: We can observe that the text doesn’t contain any special character or multiple empty spaces, and so our own corpus is ready. The next step is to tokenize each sentence in the corpus and create a dictionary containing each word and their corresponding frequencies.

As you can see above, we have created a dictionary called wordfreq. Next, we iterate through each word in the sentence and check if it exists in the wordfreq dictionary.  On its existence,we will add the word as the key and set the value of the word as 1.

Step 8: Our corpus has more than 500 words in total and so we shall filter down to the 200 most frequently occurring words by using Python’s heap library.


Step 9: Now, comes the final step of converting the sentences in our corpus into their corresponding vector representation. Let’s check the below code snippet to understand it. Our model is in the form of a list of lists which can be easily converted matrix form using this script:

Multi-head attention mechanism: “queries”, “keys”, and “values,” over and over again

This is the third article of my article series named “Instructions on Transformer for people outside NLP field, but with examples of NLP.”

In the last article, I explained how attention mechanism works in simple seq2seq models with RNNs, and it basically calculates correspondences of the hidden state at every time step, with all the outputs of the encoder. However I would say the attention mechanisms of RNN seq2seq models use only one standard for comparing them. Using only one standard is not enough for understanding languages, especially when you learn a foreign language. You would sometimes find it difficult to explain how to translate a word in your language to another language. Even if a pair of languages are very similar to each other, translating them cannot be simple switching of vocabulary. Usually a single token in one language is related to several tokens in the other language, and vice versa. How they correspond to each other depends on several criteria, for example “what”, “who”, “when”, “where”, “why”, and “how”. It is easy to imagine that you should compare tokens with several criteria.

Transformer model was first introduced in the original paper named “Attention Is All You Need,” and from the title you can easily see that attention mechanism plays important roles in this model. When you learn about Transformer model, you will see the figure below, which is used in the original paper on Transformer.  This is the simplified overall structure of one layer of Transformer model, and you stack this layer N times. In one layer of Transformer, there are three multi-head attention, which are displayed as boxes in orange. These are the very parts which compare the tokens on several standards. I made the head article of this article series inspired by this multi-head attention mechanism.

The figure below is also from the original paper on Transfromer. If you can understand how multi-head attention mechanism works with the explanations in the paper, and if you have no troubles understanding the codes in the official Tensorflow tutorial, I have to say this article is not for you. However I bet that is not true of majority of people, and at least I need one article to clearly explain how multi-head attention works. Please keep it in mind that this article covers only the architectures of the two figures below. However multi-head attention mechanisms are crucial components of Transformer model, and throughout this article, you would not only see how they work but also get a little control over it at an implementation level.

1 Multi-head attention mechanism

When you learn Transformer model, I recommend you first to pay attention to multi-head attention. And when you learn multi-head attentions, before seeing what scaled dot-product attention is, you should understand the whole structure of multi-head attention, which is at the right side of the figure above. In order to calculate attentions with a “query”, as I said in the last article, “you compare the ‘query’ with the ‘keys’ and get scores/weights for the ‘values.’ Each score/weight is in short the relevance between the ‘query’ and each ‘key’. And you reweight the ‘values’ with the scores/weights, and take the summation of the reweighted ‘values’.” Sooner or later, you will notice I would be just repeating these phrases over and over again throughout this article, in several ways.

*Even if you are not sure what “reweighting” means in this context, please keep reading. I think you would little by little see what it means especially in the next section.

The overall process of calculating multi-head attention, displayed in the figure above, is as follows (Please just keep reading. Please do not think too much.): first you split the V: “values”, K: “keys”, and Q: “queries”, and second you transform those divided “values”, “keys”, and “queries” with densely connected layers (“Linear” in the figure). Next you calculate attention weights and reweight the “values” and take the summation of the reiweighted “values”, and you concatenate the resulting summations. At the end you pass the concatenated “values” through another densely connected layers. The mechanism of scaled dot-product attention is just a matter of how to concretely calculate those attentions and reweight the “values”.

*In the last article I briefly mentioned that “keys” and “queries” can be in the same language. They can even be the same sentence in the same language, and in this case the resulting attentions are called self-attentions, which we are mainly going to see. I think most people calculate “self-attentions” unconsciously when they speak. You constantly care about what “she”, “it” , “the”, or “that” refers to in you own sentence, and we can say self-attention is how these everyday processes is implemented.

Let’s see the whole process of calculating multi-head attention at a little abstract level. From now on, we consider an example of calculating multi-head self-attentions, where the input is a sentence “Anthony Hopkins admired Michael Bay as a great director.” In this example, the number of tokens is 9, and each token is encoded as a 512-dimensional embedding vector. And the number of heads is 8. In this case, as you can see in the figure below, the input sentence “Anthony Hopkins admired Michael Bay as a great director.” is implemented as a 9\times 512 matrix. You first split each token into 512/8=64 dimensional, 8 vectors in total, as I colored in the figure below. In other words, the input matrix is divided into 8 colored chunks, which are all 9\times 64 matrices, but each colored matrix expresses the same sentence. And you calculate self-attentions of the input sentence independently in the 8 heads, and you reweight the “values” according to the attentions/weights. After this, you stack the sum of the reweighted “values”  in each colored head, and you concatenate the stacked tokens of each colored head. The size of each colored chunk does not change even after reweighting the tokens. According to Ashish Vaswani, who invented Transformer model, each head compare “queries” and “keys” on each standard. If the a Transformer model has 4 layers with 8-head multi-head attention , at least its encoder has 4\times 8 = 32 heads, so the encoder learn the relations of tokens of the input on 32 different standards.

I think you now have rough insight into how you calculate multi-head attentions. In the next section I am going to explain the process of reweighting the tokens, that is, I am finally going to explain what those colorful lines in the head image of this article series are.

*Each head is randomly initialized, so they learn to compare tokens with different criteria. The standards might be straightforward like “what” or “who”, or maybe much more complicated. In attention mechanisms in deep learning, you do not need feature engineering for setting such standards.

2 Calculating attentions and reweighting “values”

If you have read the last article or if you understand attention mechanism to some extent, you should already know that attention mechanism calculates attentions, or relevance between “queries” and “keys.” In the last article, I showed the idea of weights as a histogram, and in that case the “query” was the hidden state of the decoder at every time step, whereas the “keys” were the outputs of the encoder. In this section, I am going to explain attention mechanism in a more abstract way, and we consider comparing more general “tokens”, rather than concrete outputs of certain networks. In this section each [ \cdots ] denotes a token, which is usually an embedding vector in practice.

Please remember this mantra of attention mechanism: “you compare the ‘query’ with the ‘keys’ and get scores/weights for the ‘values.’ Each score/weight is in short the relevance between the ‘query’ and each ‘key’. And you reweight the ‘values’ with the scores/weights, and take the summation of the reweighted ‘values’.” The figure below shows an overview of a case where “Michael” is a query. In this case you compare the query with the “keys”, that is, the input sentence “Anthony Hopkins admired Michael Bay as a great director.” and you get the histogram of attentions/weights. Importantly the sum of the weights 1. With the attentions you have just calculated, you can reweight the “values,” which also denote the same input sentence. After that you can finally take a summation of the reweighted values. And you use this summation.

*I have been repeating the phrase “reweighting ‘values’  with attentions,”  but you in practice calculate the sum of those reweighted “values.”

Assume that compared to the “query”  token “Michael”, the weights of the “key” tokens “Anthony”, “Hopkins”, “admired”, “Michael”, “Bay”, “as”, “a”, “great”, and “director.” are respectively 0.06, 0.09, 0.05, 0.25, 0.18, 0.06, 0.09, 0.06, 0.15. In this case the sum of the reweighted token is 0.06″Anthony” + 0.09″Hopkins” + 0.05″admired” + 0.25″Michael” + 0.18″Bay” + 0.06″as” + 0.09″a” + 0.06″great” 0.15″director.”, and this sum is the what wee actually use.

*Of course the tokens are embedding vectors in practice. You calculate the reweighted vector in actual implementation.

You repeat this process for all the “queries.”  As you can see in the figure below, you get summations of 9 pairs of reweighted “values” because you use every token of the input sentence “Anthony Hopkins admired Michael Bay as a great director.” as a “query.” You stack the sum of reweighted “values” like the matrix in purple in the figure below, and this is the output of a one head multi-head attention.

3 Scaled-dot product

This section is a only a matter of linear algebra. Maybe this is not even so sophisticated as linear algebra. You just have to do lots of Excel-like operations. A tutorial on Transformer by Jay Alammar is also a very nice study material to understand this topic with simpler examples. I tried my best so that you can clearly understand multi-head attention at a more mathematical level, and all you need to know in order to read this section is how to calculate products of matrices or vectors, which you would see in the first some pages of textbooks on linear algebra.

We have seen that in order to calculate multi-head attentions, we prepare 8 pairs of “queries”, “keys” , and “values”, which I showed in 8 different colors in the figure in the first section. We calculate attentions and reweight “values” independently in 8 different heads, and in each head the reweighted “values” are calculated with this very simple formula of scaled dot-product: Attention(\boldsymbol{Q}, \boldsymbol{K}, \boldsymbol{V}) =softmax(\frac{\boldsymbol{Q} \boldsymbol{K} ^T}{\sqrt{d}_k})\boldsymbol{V}. Let’s take an example of calculating a scaled dot-product in the blue head.

At the left side of the figure below is a figure from the original paper on Transformer, which explains one-head of multi-head attention. If you have read through this article so far, the figure at the right side would be more straightforward to understand. You divide the input sentence into 8 chunks of matrices, and you independently put those chunks into eight head. In one head, you convert the input matrix by three different fully connected layers, which is “Linear” in the figure below, and prepare three matrices Q, K, V, which are “queries”, “keys”, and “values” respectively.

*Whichever color attention heads are in, the processes are all the same.

*You divide \frac{\boldsymbol{Q} \boldsymbol{K}} ^T by \sqrt{d}_k in the formula. According to the original paper, it is known that re-scaling \frac{\boldsymbol{Q} \boldsymbol{K}} ^T by \sqrt{d}_k is found to be effective. I am not going to discuss why in this article.

As you can see in the figure below, calculating Attention(\boldsymbol{Q}, \boldsymbol{K}, \boldsymbol{V}) is virtually just multiplying three matrices with the same size (Only K is transposed though). The resulting 9\times 64 matrix is the output of the head.

softmax(\frac{\boldsymbol{Q} \boldsymbol{K} ^T}{\sqrt{d}_k}) is calculated like in the figure below. The softmax function regularize each row of the re-scaled product \frac{\boldsymbol{Q} \boldsymbol{K} ^T}{\sqrt{d}_k}, and the resulting 9\times 9 matrix is a kind a heat map of self-attentions.

The process of comparing one “query” with “keys” is done with simple multiplication of a vector and a matrix, as you can see in the figure below. You can get a histogram of attentions for each query, and the resulting 9 dimensional vector is a list of attentions/weights, which is a list of blue circles in the figure below. That means, in Transformer model, you can compare a “query” and a “key” only by calculating an inner product. After re-scaling the vectors by dividing them with \sqrt{d_k} and regularizing them with a softmax function, you stack those vectors, and the stacked vectors is the heat map of attentions.

You can reweight “values” with the heat map of self-attentions, with simple multiplication. It would be more straightforward if you consider a transposed scaled dot-product \boldsymbol{V}^T \cdot softmax(\frac{\boldsymbol{Q} \boldsymbol{K} ^T}{\sqrt{d}_k})^T. This also should be easy to understand if you know basics of linear algebra.

One column of the resulting matrix (\boldsymbol{V}^T \cdot softmax(\frac{\boldsymbol{Q} \boldsymbol{K} ^T}{\sqrt{d}_k})^T) can be calculated with a simple multiplication of a matrix and a vector, as you can see in the figure below. This corresponds to the process or “taking a summation of reweighted ‘values’,” which I have been repeating. And I would like you to remember that you got those weights (blue) circles by comparing a “query” with “keys.”

Again and again, let’s repeat the mantra of attention mechanism together: “you compare the ‘query’ with the ‘keys’ and get scores/weights for the ‘values.’ Each score/weight is in short the relevance between the ‘query’ and each ‘key’. And you reweight the ‘values’ with the scores/weights, and take the summation of the reweighted ‘values’.” If you have been patient enough to follow my explanations, I bet you have got a clear view on how multi-head attention mechanism works.

We have been seeing the case of the blue head, but you can do exactly the same procedures in every head, at the same time, and this is what enables parallelization of multi-head attention mechanism. You concatenate the outputs of all the heads, and you put the concatenated matrix through a fully connected layers.

If you are reading this article from the beginning, I think this section is also showing the same idea which I have repeated, and I bet more or less you no have clearer views on how multi-head attention mechanism works. In the next section we are going to see how this is implemented.

4 Tensorflow implementation of multi-head attention

Let’s see how multi-head attention is implemented in the Tensorflow official tutorial. If you have read through this article so far, this should not be so difficult. I also added codes for displaying heat maps of self attentions. With the codes in this Github page, you can display self-attention heat maps for any input sentences in English.

The multi-head attention mechanism is implemented as below. If you understand Python codes and Tensorflow to some extent, I think this part is relatively easy.  The multi-head attention part is implemented as a class because you need to train weights of some fully connected layers. Whereas, scaled dot-product is just a function.

*I am going to explain the create_padding_mask() and create_look_ahead_mask() functions in upcoming articles. You do not need them this time.

Let’s see a case of using multi-head attention mechanism on a (1, 9, 512) sized input tensor, just as we have been considering in throughout this article. The first axis of (1, 9, 512) corresponds to the batch size, so this tensor is virtually a (9, 512) sized tensor, and this means the input is composed of 9 512-dimensional vectors. In the results below, you can see how the shape of input tensor changes after each procedure of calculating multi-head attention. Also you can see that the output of the multi-head attention is the same as the input, and you get a 9\times 9 matrix of attention heat maps of each attention head.

I guess the most complicated part of this implementation above is the split_head() function, especially if you do not understand tensor arithmetic. This part corresponds to splitting the input tensor to 8 different colored matrices as in one of the figures above. If you cannot understand what is going on in the function, I recommend you to prepare a sample tensor as below.

This is just a simple (1, 9, 512) sized tensor with sequential integer elements. The first row (1, 2, …., 512) corresponds to the first input token, and (4097, 4098, … , 4608) to the last one. You should try converting this sample tensor to see how multi-head attention is implemented. For example you can try the operations below.

These operations correspond to splitting the input into 8 heads, whose sizes are all (9, 64). And the second axis of the resulting (1, 8, 9, 64) tensor corresponds to the index of the heads. Thus sample_sentence[0][0] corresponds to the first head, the blue 9\times 64 matrix. Some Tensorflow functions enable linear calculations in each attention head, independently as in the codes below.

Very importantly, we have been only considering the cases of calculating self attentions, where all “queries”, “keys”, and “values” come from the same sentence in the same language. However, as I showed in the last article, usually “queries” are in a different language from “keys” and “values” in translation tasks, and “keys” and “values” are in the same language. And as you can imagine, usualy “queries” have different number of tokens from “keys” or “values.” You also need to understand this case, which is not calculating self-attentions. If you have followed this article so far, this case is not that hard to you. Let’s briefly see an example where the input sentence in the source language is composed 9 tokens, on the other hand the output is composed 12 tokens.

As I mentioned, one of the outputs of each multi-head attention class is 9\times 9 matrix of attention heat maps, which I displayed as a matrix composed of blue circles in the last section. The the implementation in the Tensorflow official tutorial, I have added codes to display actual heat maps of any input sentences in English.

*If you want to try displaying them by yourself, download or just copy and paste codes in this Github page. Please maker “datasets” directory in the same directory as the code. Please download “spa-eng.zip” from this page, and unzip it. After that please put “spa.txt” on the “datasets” directory. Also, please download the “checkpoints_en_es” folder from this link, and place the folder in the same directory as the file in the Github page. In the upcoming articles, you would need similar processes to run my codes.

After running codes in the Github page, you can display heat maps of self attentions. Let’s input the sentence “Anthony Hopkins admired Michael Bay as a great director.” You would get a heat maps like this.

In fact, my toy implementation cannot handle proper nouns such as “Anthony” or “Michael.” Then let’s consider a simple input sentence “He admired her as a great director.” In each layer, you respectively get 8 self-attention heat maps.

I think we can see some tendencies in those heat maps. The heat maps in the early layers, which are close to the input, are blurry. And the distributions of the heat maps come to concentrate more or less diagonally. At the end, presumably they learn to pay attention to the start and the end of sentences.

You have finally finished reading this article. Congratulations.

You should be proud of having been patient, and you passed the most tiresome part of learning Transformer model. You must be ready for making a toy English-German translator in the upcoming articles. Also I am sure you have understood that Michael Bay is a great director, no matter what people say.

*Hannibal Lecter, I mean Athony Hopkins, also wrote a letter to the staff of “Breaking Bad,” and he told them the tv show let him regain his passion. He is a kind of admiring around, and I am a little worried that he might be getting senile. He played a role of a father forgetting his daughter in his new film “The Father.” I must see it to check if that is really an acting, or not.

[References]

[1] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, “Attention Is All You Need” (2017)

[2] “Transformer model for language understanding,” Tensorflow Core
https://www.tensorflow.org/overview

[3] “Neural machine translation with attention,” Tensorflow Core
https://www.tensorflow.org/tutorials/text/nmt_with_attention

[4] Jay Alammar, “The Illustrated Transformer,”
http://jalammar.github.io/illustrated-transformer/

[5] “Stanford CS224N: NLP with Deep Learning | Winter 2019 | Lecture 14 – Transformers and Self-Attention,” stanfordonline, (2019)
https://www.youtube.com/watch?v=5vcj8kSwBCY

[6]Tsuboi Yuuta, Unno Yuuya, Suzuki Jun, “Machine Learning Professional Series: Natural Language Processing with Deep Learning,” (2017), pp. 91-94
坪井祐太、海野裕也、鈴木潤 著, 「機械学習プロフェッショナルシリーズ 深層学習による自然言語処理」, (2017), pp. 191-193

[7]”Stanford CS224N: NLP with Deep Learning | Winter 2019 | Lecture 8 – Translation, Seq2Seq, Attention”, stanfordonline, (2019)
https://www.youtube.com/watch?v=XXtpJxZBa2c

[8]Rosemary Rossi, “Anthony Hopkins Compares ‘Genius’ Michael Bay to Spielberg, Scorsese,” yahoo! entertainment, (2017)
https://www.yahoo.com/entertainment/anthony-hopkins-transformers-director-michael-bay-guy-genius-010058439.html

* I make study materials on machine learning, sponsored by DATANOMIQ. I do my best to make my content as straightforward but as precise as possible. I include all of my reference sources. If you notice any mistakes in my materials, including grammatical errors, please let me know (email: yasuto.tamura@datanomiq.de). And if you have any advice for making my materials more understandable to learners, I would appreciate hearing it.

Why Is Physical Security Vital for Data Security?

Modern businesses hold on to an increasing amount of sensitive and sometimes confidential data. As a result, they’ve had to invest in new technology and practices to keep that data safe.

Many of these businesses, when developing their data security or cybersecurity protocols, focus on the security of their hardware, software and business network. Prioritizing these assets is essential — however, if physical security gets left behind, even the best digital tech may not keep a company’s data safe.

There’s practically no stopping someone with physical access to your data storage from stealing info or compromising your business network.

This is why companies that prioritize digital security also need to carefully consider physical security — and what may happen when physical security is neglected.

Physical Access Can Allow Criminals to Bypass Even the Best Digital Security

It’s almost impossible to protect any device from a physical attack. If a hacker has sustained access to device hardware, they’ll be able to breach its defenses eventually — potentially giving them access to the information on that device, as well as any stored security credentials.

Devices that are digitally secured but not physically secured — like a laptop left behind in a coffee shop, or an IoT sensor in an unlocked case — can provide a valuable vector of attack for hackers. In some cases, that vector may be all they need to create serious trouble for a company.

In some cases, poor building security may enable hackers to sneak into server rooms or gain access to off-site devices, like IoT sensors. Often, hackers also gain access to hardware either by theft — for example, swiping a laptop left sitting in a coffee shop — or by using social engineering to gain remote access.

Even large devices that are rarely moved or accessed by staff — like servers in a data center — can be at risk.

This is why large, high-budget data centers often have what’s colloquially called a mantrap — a set of two interlocking doors, somewhat like an airlock, that one has  to pass through to reach the server hardware. These doors serve as a final access check for the data center and help to minimize the risk of unauthorized server access.

These threats aren’t an abstraction — hackers and other criminals have used physical access to steal data in the past.

In 2015, for example, hackers stole five servers from the offices of a British charity, PlanUK. Those servers contained a wealth of information on donators, including names, addresses, bank account numbers and sort codes.

In 2018, the theft of a laptop exposed the data of more than 43,000 patients of the West Virginia-based Coplin Health System — part of the reason that laptop theft is ranked the number one cause of health data breaches.

Valuable Hardware and Essential Systems May Be at High Risk

Hackers may also use physical attack vectors if they need to gain access to critical infrastructure, which may otherwise be air-gapped from internet-connected systems and impossible to attack with digital-only methods.

This is part of why major physical security manufacturers dedicate entire product lines to physical security for nuclear power plants, for example, or airports or international organizations — and why those kinds of institutions take physical security so seriously.

Enterprise-grade computer hardware can also be very valuable — making that hardware a major target. While you may expect criminals to be driven more by data or network access than by the resale value of your servers, theft for resale or reuse has happened before.

In 2018, for example, Icelandic criminals stole 600 bitcoin-mining servers in one of the biggest tech heists on record. Rising cryptocurrency prices may encourage some criminals to plan similar heists of powerful hardware. Owners of data centers, rendering farms and other facilities with high-value hardware should be aware of these risks, as well as how good physical security is necessary to keep their hardware safe.

Using Physical Security to Complement Your Digital Security Planning

Without strong physical security practices, your data can be vulnerable — even if you have a great digital security plan in place.

Hackers, when faced with strong cyber defenses, sometimes turn to physical attacks to gain access to critical hardware. In other cases, they may also be after the hardware for sale or personal use.

Even a basic physical security plan — one that involves ID verification and access control — can go a long way in complementing a digital security strategy and keeping data safe.

Support Vector Machines for Text Recognition

Hand Written Alphabet recognition Using Support Vector Machine

We have used image classification as an task in many cases, more often this has been done using an module like openCV in python or using pre-trained models like in case of MNIST data sets. The idea of using Support Vector Machines for carrying out the same task is to give a simpler approach for a complicated process. There are some pro’s and con’s in every algorithm. Support vector machine for data with very high dimension may prove counter productive. But in case of image data we are actually using a array. If its a mono chrome then its just a 2 dimensional array, if grey scale or color image stack then we may have a 3 dimensional array processing to be considered. You can get more clarity on the array part if you go through this article on Machine learning using only numpy array. While there are certainly advantages of using OCR packages like Tesseract or OpenCV or GPTs, I am putting forth this approach of using a simple SVM model for hand written text classification. As a student while doing linear regression, I learn’t a principle “Occam’s Razor”, Basically means, keep things simple if they can explain what you want to. In short, the law of parsimony, simplify and not complicate. Applying the same principle on Hand written Alphabet recognition is an attempt to simplify using a classic algorithm, the Support Vector Machine. We break the  problem of hand written alphabet recognition into a simple process rather avoiding usage of heavy packages. This is an attempt to create the data and then build a model using Support Vector Machines for Classification.

Data Preparation

Manually edit the data instead of downloading it from the web. This will help you understand your data from the beginning. Manually write some letters on white paper and get the photo from your mobile phone. Then store it on your hard drive. As we are doing a trial we don’t want to waste a lot of time in data creation at this stage, so it’s a good idea to create two or three different characters for your dry run. You may need to change the code as you add more instances of classes, but this is where the learning phase begins. We are now at the training level.

Data Structure

You can create the data yourself by taking standard pictures of hand written text in a 200 x 200 pixel dimension. Alternatively you can use a pen tab to manually write these alphabets and save them as files. If you know and photo editing tools you can use them as well. For ease of use, I have already created a sample data and saved it in the structure as below.

Image Source : From Author

You can download the data which I have used, right click on this download data link and open in new tab or window. Then unzip the folders and you should be able to see the same structure and data as above in your downloads folder. I would suggest, you should create your own data and repeat the  process. This would help you understand the complete flow.

Install the Dependency Packages for RStudio

We will be using the jpeg package in R for Image handling and the SVM implementation from the kernlab package.  Also we need to make sure that the image data has dimension’s of 200 x 200 pixels, with a horizontal and vertical resolution of 120dpi. You can vary the dimension’s like move it to 300 x 300 or reduce it to 100 x 100. The higher the dimension, you will need more compute power. Experiment around the color channels and resolution later once you have implemented it in the current form.

# install package "jpeg"

  install.packages("jpeg", dependencies = TRUE)

 

# install the "kernlab" package for building the model using support vector machines

install.packages("kernlab", dependencies  = TRUE)

Load the training data set

# load the "jpeg" package for reading the JPEG format files
library(jpeg)

# set the working directory for reading the training image data set
setwd("C:/Users/mohan/Desktop/alphabet_folder/Train")

# extract the directory names for using as image labels

f_train<-list.files()
 

# Create an empty data frame to store the image data labels and the extracted new features in training environment

df_train<- data.frame(matrix(nrow=0,ncol=5))

Feature Transformation

Since we don’t intend to use the typical CNN, we are going to use the white, grey and black pixel values for new feature creation. We will use the summation of all the pixel values of a image  and save it as a new feature called as “sum”, the count of all pixels adding up to zero as “zero”, the count of all pixels adding up to “ones” and the sum of all pixels between zero’s and one’s as “in_between”. The “label” feature names are extracted from the names of the folder

# names the columns of the data frame as per the feature name schema

names(df_train)<- c("sum","zero","one","in_between","label")

# loop to compute as per the logic

counter<-1 

for(i in 1:length(f_train))

{

setwd(paste("C:/Users/mohan/Desktop/alphabet_folder/Train/",f_train[i],sep=""))

 

data_list<-list.files()

 

  for(j in 1:length(data_list))

  {

    temp<- readJPEG(data_list[j])

    df_train[counter,1]<- sum(temp)

    df_train[counter,2]<- sum(temp==0)

    df_train[counter,3]<- sum(temp==1)

    df_train[counter,4]<- sum(temp > 0 & temp < 1)

    df_train[counter,5]<- f_train[i]

    counter=counter+1

  }

}


# Convert the labels from text to factor form

df_train$label<- factor(df_train$label)

Support Vector Machine model

# load the "kernlab" package for accessing the support vector machine implementation

library(kernlab)


# build the model using the training data

image_classifier <- ksvm(label~.,data=df_train)

Evaluate the Model on the Testing Data Set

# set the working directory for reading the testing image data set

setwd("C:/Users/mohan/Desktop/alphabet_folder/Test")


# extract the directory names for using as image labels

f_test <- list.files()

 
# Create an empty data frame to store the image data labels and the extracted new features in training environment

df_test<- data.frame(matrix(nrow=0,ncol=5))

 
# Repeat of feature extraction in test data

names(df_test)<- c("sum","zero","one","in_between","label")

 
# loop to compute as per the logic

for(i in 1:length(f_test))

{

temp<- readJPEG(f_test[i])

df_test[i,1]<- sum(temp)

df_test[i,2]<- sum(temp==0)

df_test[i,3]<- sum(temp==1)

df_train[counter,4]<- sum(temp > 0 & temp < 1)

df_test[i,5]<- strsplit(x = f_test[i],split = "[.]")[[1]][1]

}
 

# Use the classifier named "image_classifier" built in train environment to predict the outcomes on features in Test environment

df_test$label_predicted<- predict(image_classifier,df_test)
 

# Cross Tab for Classification Metric evaluation

table(Actual_data=df_test$label,Predicted_data=df_test$label_predicted) 

I would recommend you to learn concepts of SVM which couldn’t be explained completely in this article by going through my free Data Science and Machine Learning video courses. We have created the classifier using the Kerlab package in R, but I would advise you to study the mathematics involved in Support vector machines to get a clear understanding.

Data Security for Data Scientists & Co. – Infographic

Data becomes information and information becomes knowledge. For this reason, companies are nowadays also evaluated with regard to their data and their data quality. Furthermore, data is also the material that is needed for management decisions and artificial intelligence. For this reason, IT Security is very important and special consulting and auditing companies offer their own services specifically for the security of IT systems.

However, every Data Scientist, Data Analyst and Data Engineer rarely only works with open data, but rather intensively with customer data. Therefore, every expert for the storage and analysis of data should at least have a basic knowledge of Data Security and work according to certain principles in order to guarantee the security of the data and the legality of the data processing.

There are a number of rules and principles for data security that must be observed. Some of them – in our opinion the most important ones – we from DATANOMIQ have summarized in an infographic for Data Scientists, Data Analysts and Data Engineers. You can download the infographic here: DataSecurity_Infographic

Data Security for Data Scientists, Data Analysts and Data Engineers

Data Security for Data Scientists, Data Analysts and Data Engineers

Download Infographic as PDF

Infographic - Data Security for Data Scientists, Data Analysts and Data Engineers

Infographic – Data Security for Data Scientists, Data Analysts and Data Engineers

In-memory Caching in Finance

Big data has been gradually creeping into a number of industries through the years, and it seems there are no exceptions when it comes to what type of business it plans to affect. Businesses, understandably, are scrambling to catch up to new technological developments and innovations in the areas of data processing, storage, and analytics. Companies are in a race to discover how they can make big data work for them and bring them closer to their business goals. On the other hand, consumers are more concerned than ever about data privacy and security, taking every step to minimize the data they provide to the companies whose services they use. In today’s ever-connected, always online landscape, however, every company and consumer engages with data in one way or another, even if indirectly so.

Despite the reluctance of consumers to share data with businesses and online financial service providers, it is actually in their best interest to do so. It ensures that they are provided the best experience possible, using historical data, browsing histories, and previous purchases. This is why it is also vital for businesses to find ways to maximize the use of data so they can provide the best customer experience each time. Even the more traditional industries like finance have gradually been exploring the benefits they can gain from big data. Big data in the financial services industry refers to complex sets of data that can help provide solutions to the business challenges financial institutions and banking companies have faced through the years. Considered today as a business imperative, data management is increasingly leveraged in finance to enhance processes, their organization, and the industry in general.

How Caching Can Boost Performance in Finance

In computing, caching is a method used to manage frequently accessed data saved in a system’s main memory (RAM). By using RAM, this method allows quick access to data without placing too much load on the main data stores. Caching also addresses the problems of high latency, network congestion, and high concurrency. Batch jobs are also done faster because request run times are reduced—from hours to minutes and from minutes to mere seconds. This is especially important today, when a host of online services are available and accessible to users. A delay of even a few seconds can lead to lost business, making both speed and performance critical factors to business success. Scalability is another aspect that caching can help improve by allowing finance applications to scale elastically. Elastic scalability ensures that a business is equipped to handle usage peaks without impacting performance and with the minimum required effort.

Below are the main benefits of big data and in-memory caching to financial services:

  • Big data analytics integration with financial models
    Predictive modeling can be improved significantly with big data analytics so it can better estimate business outcomes. Proper management of data helps improve algorithmic understanding so the business can make more accurate predictions and mitigate inherent risks related to financial trading and other financial services.
    Predictive modeling can be improved significantly with big data analytics so it can better estimate business outcomes. Proper management of data helps improve algorithmic understanding so the business can make more accurate predictions and mitigate inherent risks related to financial trading and other financial services.
  • Real-time stock market insights
    As data volumes grow, data management becomes a vital factor to business success. Stock markets and investors around the globe now rely on advanced algorithms to find patterns in data that will help enable computers to make human-like decisions and predictions. Working in conjunction with algorithmic trading, big data can help provide optimized insights to maximize portfolio returns. Caching can consequently make the process smoother by making access to needed data easier, quicker, and more efficient.
  • Customer analytics
    Understanding customer needs and preferences is the heart and soul of data management, and, ultimately, it is the goal of transforming complex datasets into actionable insights. In banking and finance, big data initiatives focus on customer analytics and providing the best customer experience possible. By focusing on the customer, companies are able to Ieverage new technologies and channels to anticipate future behaviors and enhance products and services accordingly. By building meaningful customer relationships, it becomes easier to create customer-centric financial products and seize market opportunities.
  • Fraud detection and risk management
    In the finance industry, risk is the primary focus of big data analytics. It helps in identifying fraud and mitigating operational risk while ensuring regulatory compliance and maintaining data integrity. In this aspect, an in-memory cache can help provide real-time data that can help in identifying fraudulent activities and the vulnerabilities that caused them so that they can be avoided in the future.

What Does This Mean for the Finance Industry?

Big data is set to be a disruptor in the finance sector, with 70% of companies citing big data as a critical factor of the business. In 2015 alone, financial service providers spent $6.4 billion on data-related applications, with this spending predicted to increase at a rate of 26% per year. The ability to anticipate risk and pre-empt potential problems are arguably the main reasons why the finance industry in general is leaning toward a more data-centric and customer-focused model. Data analysis is also not limited to customer data; getting an overview of business processes helps managers make informed operational and long-term decisions that can bring the company closer to its objectives. The challenge is taking a strategic approach to data management, choosing and analyzing the right data, and transforming it into useful, actionable insights.