AI Platforms – A Comprehensive Guide

A comprehensive guide compiled to introduce readers to AI platforms, their types, and benefits. A concluding section to discuss AI platform selection strategy with Attri’s Best of Breed approach to build AI platforms. 

Don’t you think that this century is really fortunate? In my opinion, the answer is yes; we witnessed technological transformations and their miracles that created substantial changes in our lifestyle. While talking about these life-changing technological revolutions, AI or artificial intelligence deserves a front seat due to its incredible contribution and capabilities. Now everyone knows AI has limitless potential simply from creating funny faces in mobile to taking informed and intelligent business decisions. In the last 50 years, we have progressed by leaps and bounds to give machines the ability to understand, help and mimic us.

Artificial intelligence enables machines to imitate human intelligence across a variety of domains ranging from problem-solving and reasoning to General Intelligence and in-depth knowledge representation. With tremendous progress in AI, another enabler came into existence and received attention—AI platforms. AI-platform is a layer that integrates all the tools and processes required to build, deploy and monitor ML models. In this article, we shall go through the various aspects of AI platforms covering a range of topics like AI Platform types, the benefits such platforms entail, selection strategy in detail as well as a brief look into Attri’s industry contribution with an Open AI Platform.

Diving Deeper With AI Platforms

The AI Platform acts as a layer over your current AI infrastructure and integrates all the tools and processes required to develop ML models. It provides you the flexibility to integrate all your ML models under a single roof. With this flexibility, you can create and deploy several ML models over the platform. Further, you can even monitor these models to confirm that they are serving their intended purpose. AI platform makes your AI adoption easy by attaining the following requirements–

  • Use of vast data to develop ML solutions.
  • Ensure transparency and reproducibility within a project
  • Accelerate collaboration and governance within teams
  • Ensure scalability for ever-growing machine learning demands

An ideal AI platform should ensure the following features for better addressing different challenges.

  • Seamless access control: Ensure robust access control to team members in order to conquer the challenge of centralized data access with AI projects.
  • Excellent monitoring: Integrate top-notch observability practices while developing ML models.
  • Data and technology-agnostic integration: Seamless experience to enterprises with infrastructure set up responsibility handed over to platform providers
  • All-inclusive Platform: Single platform to facilitate all underlying tasks from data preparation to model deployment
  • Continuous Improvement: Ability to produce and deploy models as a reproducible package and thereby integrate changes with models that are already in production
  • Rapid Processing: Faster data preparation and powerful visual interfaces

AI Platform Classification

With loads of AI platform providers available in the market, AI platform classification becomes a tough job, as it requires thinking separately on each platform’s offerings, its features, and cost factors. Also, you need to check whether AI solutions are open source AI platforms or proprietary offerings.

We have decided to present an AI platform classification based on its striking features and offerings. With this, we have classified AI platforms across three main classes—

  • AI cloud-based platforms
  • AI conversational platforms
  • No code AI platforms

Cloud based AI Platforms

All major cloud providers offer cloud-based AI platforms to boost businesses with AI capabilities. With cloud AI platforms, enterprises can leverage cloud providers’ matchless technical expertise to overcome affordability and data requirement challenges associated with AI implementation. Cloud-based AI offerings benefit businesses with economic AI solutions, defined and pre-packaged services, lower risks, and modern technology.

Amazon Web Services

AWS offers a comprehensive set of AI solutions to conquer major hurdles in the AI adoption journey of businesses. AWS has been recognized as the topmost cloud AI partner with its broad capable portfolio. AWS pre-trained models cater to diverse use cases like forecasting, recommendations, computer vision, language interpretation, customer engagement, and safety for deploying ML models at scale. Amazon also provides text analytics, NLP, chatbots, and document analysis solutions. Fully managed AWS packages amplify your experience with minimum resource requirements and wizard-based friendly model development experience. Hence, AWS is one of the top cloud AI partners that cater to your AI adoption needs.

Google cloud

 The Google Cloud Platform (GCP) is a Google offering for cloud-driven computing services devised to support multiple use cases such as hosting containerized applications, massive-scale data analytics platforms, and even applying ML and AI for business use cases. Google AI Platform is a Google Cloud offering that helps build, deploy and manage machine learning models in the cloud.

Google leverages enterprise AI experience through its consumer-facing products. Google helps improve customer satisfaction through Contact Center AI. Google offering DialogFlow CX is used to create advanced chatbots that handle customer messaging, response, and voice recognition. Digiflow is applied to create virtual agents for messaging services, mobile apps, and IoT devices.

Google’s Cloud Vision API is beneficial to recognize objects, logos, and landmarks within content or images. Google provides Natural Language API to bring more clarity in content classification, entities, syntax, and sentiments. Further, Google speech API helps in converting audio to text and recognizing 110 languages.

Google’s Cloud ML services facilitate better decision-making with end-to-end ML solutions. Google offers an all-inclusive ML development platform that enables effective decision-making backed by explainable AI, continuous evaluation, data labeling, pipelines, training, and what-if tool. This platform is based on the TensorFlow framework and it enables building predictive models for various scenarios.

Kubeflow is a Cloud-Native and open-source platform that helps you build portable ML pipelines that can be executed on-premises or on the cloud. With this, you can access Google technologies like TPUs, TensorFlow, and TFX tools as you deploy your ML models in production.

For expert ML developers, Google provides an Open Source AI platform with TensorFlow models that are trained for various scenarios. It offers an excellent prediction service using trained models.

Microsoft Azure

Similar to Amazon Web Services, Microsoft Azure ML capabilities are based on its real-time and live applications. Azure provides superior machine learning capabilities to develop, train, and deploy machine learning models through Azure Machine Learning, Azure Databricks, and ONNX.

  • Azure Machine Learning

A Python-based ML service to facilitate automated machine learning.

  • ONNX

An open-source model format enables machine learning through various frameworks and hardware platforms of the user’s choice.

  • Azure Cognitive Search

Formerly known as Azure Search,this is the only cloud search service that allows built-in AI capabilities to explore content effectively at scale. Microsoft empowers the user with cognitive search services like text analytics, translation, document analytics, custom vision, and Azure Machine Learning solutions.

IBM Cloud

IBM has brought Watson studio a data analysis application to accelerate innovation and ML-centric practices in business.  IBM Cloud AI Platform offers 170 services with more emphasis on data-speech conversions and analytics. Watson Studio offers an all-inclusive suite to work with data and train, build and deploy ML models.

An innovative giant IBM also brought AI based learning platform recently to aid academic stakeholder like students, researchers and teachers.

AI Conversational Platforms

Conversational AI opens new doors for automated conversations between an enterprise and its customers. These conversations include messaging or voice-based communication platforms to enable text or audio-based conversation.

Conversational platforms leverage your customer experience with a range of applications such as follow-up, guidance, or the resolution of customer queries and round-the-clock support. These platforms are beneficial to drive more leads, increase conversions by cross-selling and upselling, promotional efforts, customer research, queries resolution and customer feedback handling, etc.

AI technology helps systems to mimic human conversations to a certain level and with great accuracy. An AI offering- Natural Language processing is used to shape these conversations by understanding intent, text, speech, and languages.

Intelligent Virtual Assistants

The intelligent virtual assistants represent an advanced level of Conversational AI and their discussion is incomplete without a mention to Siri and Alexa. Most popular intelligent virtual assistants include Siri by Apple, Alexa by Amazon, Google Assistant, and Bixby by Samsung. While Alexa performs as a voice assistant for the home, Siri and Bixby stand as mobile assistants with numerous operations support like navigation, text-to-speech, response to weather, quick reply, and address search.

SAP Conversational AI

SAP Conversational AI is one of the leading conversational AI platforms. With its friendly UI and multiple versioning, it offers a better experience of mimicking human conversations. SAP Conversational AI Platform uses NLP to facilitate developing chatbot that works more humanely and serves your customers 24*7. Its striking features include—

  • Simple integration
  • NLP capabilities
  • Analytics tools to help you
  • Multi-language support

Clinc

A powerful self-learning Conversational AI Platform enriched with NLP capabilities and machine learning. It secures top position in the Conversational AI Platform list due to its learning from previous conversations and improving responses over time. Its feature set include—

  • No technical expertise required
  • Self-learning abilities
  • NLP capabilities

Kore.ai

An enterprise-grade Conversational AI Platform to cater to your consumer as well as staff needs. It helps to build a virtual chatbot for any suitable platform without compromising the safety and security standards. Its major features cover—

  • The high degree of customization for chatbots
  • Comprehensive analytics with FAQs and alerts
  • Simple integration with ML models and channels
  • Flexible deployment
  • Supported with a multi-pronged NLP engine

Mindmeld

It is an excellent option as a Deep-Domain Conversational AI Platform with NLP capabilities. It can be used for both text-based and voice-based virtual assistants. This platform effectively caters to multiple industries and their numerous use cases. Check its striking features list—

  • Open-source platform
  • NLP capabilities
  • Supports discovering on-demand video or music
  • Quick chat-based transactions

No Code AI Platforms

As discussed above, AI platform classification necessitates platform considerations from various perspectives. We are introducing another category of AI platforms—No Code AI Platforms. The motivation behind introducing these platforms is to encourage enterprise AI adoption while keeping AI implementation costs low and minimizing dependencies on skilled professionals. Many IT giants are now offering no-code AI Platforms to enterprises for their AI adoption.

Google ML Kit

Google ML Kit comes with Android and iOS and it facilitates the integration of functions with lesser codes or with minimum knowledge of machine learning algorithms. This open source AI Platform supports different features such as text recognition, face detection, and landmark recognition.

RapidMiner Studio

RapidMiner Studio enables powerful data analytics with drag and drop features. Rapidminer Studio allows easy integration with databases, warehouses, social media for easy data access by authorized persons.

ML Platform Selection Strategy

Having discussed so many types of ML platforms, their features, and offerings, the next question is–how to select the best ML Platform for an enterprise AI adoption. Well, to answer this Million-Dollar question, we need to consider a few key aspects, such as

  • Who will use and benefit from the AI Platform? It is required to find out AI platform users here, the data science team, analytics team, developers, and how the platform will benefit each stakeholder.
  • The next aspect is to explore the skill levels of AI platform users, are they competent to handle ML development and analytics requirements with years of experience
  • Proficiency of users with programming languages
  • The next point in finalizing the AI platform strategy is to conclude code-first or code-free approaches to streamline AI workflows. This aspect can be studied by thinking about different attributes such as data preparation ease, feature engineering automation, ML algorithms, Model Deployment ease, and platform integration aspects.

Once you come up with answers to these queries, you will be able to finalize the best AI Platform Selection strategy for your enterprise. It can be a unique cloud platform, or even it can be a hybrid solution with a “best-of-breed” approach.

All-in-one platform strategy involves getting one end-to-end platform for the entire AI project lifecycle from raw data prep to ETL to building and operationalizing models followed by monitoring and governance of systems.

The best-of-breed approach allows using the preferred and custom tools for each phase of the lifecycle and aligning these tools together to build a customized platform solution for AI adoption.

This approach offers an excellent AI platform solution for organizations looking for flexible, inexpensive, change-oriented AI solutions and having a DIY spirit. With this mix-and-match approach, you can combine APIs offered by different cloud platforms and deliver AI solutions that cater to your AI use cases. Organizations using the best-of-breed approach are more comfortable with technology shifts with their abilities to use, adopt and swap out tools as requirement changes.

Business Process AI Transformation Simplified With Attri’s Open AI Platform

At Attri, we provide AI platform solutions to diverse industry verticals. With our flagship Open AI Platform, we heighten your AI adoption experience with a rich array of platform features like—

  • Customizable best-of-breed architecture
  • Utilize existing infrastructure
  • AI as a platform solution
  • Reduced effort in migrating to a new technology
  • Centralized Monitoring and Governance
  • Explainable and Responsible AI

We help you achieve your business process transformation goals with our unique AI offerings such as Open AI Platform  and Open AI solutions.

Our AI platform assures multiple benefits to your enterprise while keeping AI adoptions costs low and ensuring faster AI implementations. We can summarize the benefits of Attri Open AI Platform as under–

No efforts in reinventing complete AI suites

Attri’s AI Platform integrates multiple AI services and eliminates the need for reinventing complete AI suites. The platform delights enterprises with scalability, the ability to reuse current infrastructure, and customizable architecture.

Accelerated Go To Market

Attri’s Open AI Platform ensures accelerated GTM with a sincere approach to testing, reviewing, and finalizing reference templates for different industries.

No vendor lock-in

With Open AI Platform, we bring client-friendly policies such as no vendor lock-in and flexibility to choose their preferred tools and technology.

High reliability

We keep our AI Platform highly reliable with a comprehensive testing approach. We also meet the growing requirements of enterprises by ensuring high scalability with our open AI platform.

Get connected with us for your enterprise AI adoption requirements.

Know more about our Open AI Platform…

Top 10 Python Libraries Of All Time

Python is a very popular and renowned language that has replaced several programming languages in the market. Its amazing collection of libraries makes it a convenient programming language for developers.

Python is an ocean of libraries serving an ample number of purposes and as a developer; you must possess sound knowledge of the 10 libraries. One needs to familiarize themselves with the libraries to go on and work on different projects. For the data scientist, it has been a charmer now.

Here today, for you this is a curated list of 10 Python libraries that can help you along with its significant features, when to use them, and also the benefits.

10 Best Python Libraries of All Times

  1. Pandas: Pandas is an open-source library that offers instant high performance, data analysis, and simple data structures. When can you use it? It can be used for data munging and wrangling. If one is looking for quick data visuals, aggregation, manipulation, and reading, then this library is suitable. You can impute the missing data files, plot the data, and make edits in the data column. Moreover, for renaming and merging, this tool can do wonders. It is a foundation library, and a data scientist should have in-depth knowledge about Pandas before any other library knowledge.
  1. TensorFlow: TensorFlow is developed by Google in collaboration with the Brain Team. Using this tool, you can instantly visualize any part of the graphical representation. It comes with modularity and offers high flexibility in its operations. This library is ideal for running and operating in large scale systems. So, as long as you have good internet connectivity, you can use it because it is an open-source platform. What is the beauty of this library? It comes with an unending list of applications associated with it.
  1. NumPy: NumPy is the most popular Python library used by developers. It is used by various libraries for conducting easy operations. What is the beauty of NumPy? Array Interface is the beauty of NumPy and it is always a highlighted feature. NumPy is interactive and very simple to use. It can instantly solve complicated mathematical problems. With this, you need not worry about daunting phases of coding and offering open-source contributions. This interface is widely used for expressing raw streams, sound waves, and other images. If you are looking to implement this into machine learning, you must possess in-depth knowledge about NumPy.
  1. Keras: Are you looking for a cool Python library? Well, Keras is the coolest machine learning python library. It runs smoothly on both CPU and GPU. Do you want to know where Keras is used? It is used in popular applications like Uber, Swiggy, Netflix, Square, and Yelp. Keras easily supports the fully connected, pooling, convolution, and recurrent neural networks. For any innovative research, it does fine because it is expressive and flexible. Keras is completely based on a framework, which enables easy debugging and exploring. Various large scientific organizations use Keras for innovative research.
  1. Scikit- Learn: If your project deals with complex data, it has to be the Scikit- Learn python library. This Python Machine Learning Library is associated with NumPy and SciPy. After various modifications, one such feather cross-validation is used for enabling more than one metric. It is used for extracting features and data from texts and images. It uses various algorithms to make changes in machine learning. What are its functions? It is used in model selection, classification, clustering, and regression. Various training methods like nearest neighbor and logistics regressions are subjected to minimal modification.
  1. PyTorch: PyTorch is the largest library which conducts various computations and accelerations. Also, it solves complicated application issues that are related to the neural networks. It is completely based on the machine language Torch, which is a free and open-source platform. PyTorch is new but gaining huge popularity and very much a favorite among the developers. Why such popularity? It comes with a hybrid end-user which ensures easy usage and flexibility. For processing natural language applications, this library is used. Do you know what the best part is? It is outperforming and taking the popularity of Tensor Flow in recent times.
  1. MoviePy: The MoviePy is a tool that offers unending functionality related to movies and visuals. It is used for exporting, modifying, and importing various video files. Do you want to add a title to your video or rotate it 90 degrees? Well, MoviePy helps you to do all such tasks related to videos. It is not a tool for manipulating data like Pillow. In any task related to movies and videos in python coding, you can no doubt rely on the functionality of MoviePy. It is designed to conduct all the aspects of a standard task and can get it done instantly. For any common task associated with videos, it has a MoviePy library.
  1. Matplotlib: Matplotlib is no doubt a quintessential python library whose presence can never be forgotten. You can visualize data and create innovative and interesting stories. When can you use it? You can use Matplotlib for embedding different plots into the application as it provides an object-oriented application program interface. Any sort of visualization, be it bar graph, histogram, pie chart, or graphs, Matplotlib can easily depict it. With this library, you can create any type of visualization. Do you want to know what visualizations you can create? You can create a histogram, Bar graph, pie chart, area plot, stem plot, and line plot. It also facilitates the legends, grids, and labels.
  2. Tkinter: Tkinter is a library that can help you create any Python application with the help of a graphical user interface. Tkinter is the most common and easy to use python library for developing apps with GUI. It binds python to the GUI tool kit which can be used in any modern operating system. To create a python GUI, Tkinter is the only best way to start instantly.
  3. Plotly: The Plotly is an essential graph plotting python library for developers. Users can import, copy, paste, export the data that needs to be analyzed and visualized. When can you use it? You can use Plotly to display and create figures and visual images. What is interesting is that it has amazing features for sending data to the various cloud servers.

What are the visual charts prepared with Plotly? You can create line pie, bubble, dot, scatter, and pie. One can also construct financial charts, contours, maps, subplots, carpet, radar, and logs. Do you have anything in your mind which needs to be represented visually? Use Plotly!

Finishing Up

In a nutshell, you have the best python libraries of recent times which contribute hugely to development. If your favorite python library didn’t make it in this list of the top 10 best python libraries, do not take offense.

Python comes with unending library packages, and these 10 are some of its popular and best-used ones. If you are a python developer, these are the best libraries you must have in-depth knowledge of.

Web Scraping Using R..!

In this blog, I’ll show you, How to Web Scrape using R..?

What is R..?

R is a programming language and its environment built for statistical analysis, graphical representation & reporting. R programming is mostly preferred by statisticians, data miners, and software programmers who want to develop statistical software.

R is also available as Free Software under the terms of the Free Software Foundation’s GNU General Public License in source code form.

Reasons to choose R

Reasons to choose R

Let’s begin our topic of Web Scraping using R.

Step 1- Select the website & the data you want to scrape.

I picked this website “https://www.alexa.com/topsites/countries/IN” and want to scrape data of Top 50 sites in India.

Data we want to scrape

Data we want to scrape

Step 2- Get to know the HTML tags using SelectorGadget.

In my previous blog, I already discussed how to inspect & find the proper HTML tags. So, now I’ll explain an easier way to get the HTML tags.

You have to go to Google chrome extension (chrome://extensions) & search SelectorGadget. Add it to your browser, it’s a quite good CSS selector.

Step 3- R Code

Evoking Important Libraries or Packages

I’m using RVEST package to scrape the data from the webpage; it is inspired by libraries like Beautiful Soup. If you didn’t install the package yet, then follow the code in the snippet below.

Step 4- Set the url of the website

Step 5- Find the HTML tags using SelectorGadget

It’s quite easy to find the proper HTML tags in which your data is present.

Firstly, I have to click on data using SelectorGadget which I want to scrape, it automatically selects the data which are similar to selected HTML tags. Before going forward, cross-check the selected values, are they correct or some junk data is also gets selected..? If you noticed our page has only 50 values, but you can see 156 values are selected.

Selection by SelectorGadget

Selection by SelectorGadget

So I need to remove unwanted values who get selected, once you click on them to deselect it, it turns red and others will turn yellow except our primary selection which turn to green. Now you can see only 50 values are selected as per our primary requirement but it’s not enough. I have to again cross-check that some required values are not exchanged with junk values.

If we satisfy with our selection then copy the HTML tag & include it into the code, else repeat this exercise.

Modified Selection by SelectorGadget

Step 6- Include the tag in our Code

After including the tags, our code is like this.

Code Snippet

If I run the code, values in each list object will be 50.

Data Stored in List Objects

Step 7- Creating DataFrame

Now, we create a dataframe with our list-objects. So for creating a dataframe, we always need to remember one thumb rule that is the number of rows (length of all the lists) should be equal, else we get an error.

Error appears when number of rows differs

Finally, Our DataFrame will look like this:

Our Final Data

Step 8- Writing our DataFrame to CSV file

We need our scraped data to be available locally for further analysis & model building or other purposes.

Our final piece of code to write it in CSV file is:

Writing to CSV file

Step 9- Check the CSV file

Data written in CSV file

Conclusion-

I tried to explain Web Scraping using R in a simple way, Hope this will help you in understanding it better.

Find full code on

https://github.com/vgyaan/Alexa/blob/master/webscrap.R

If you have any questions about the code or web scraping in general, reach out to me on LinkedIn!

Okay, we will meet again with the new exposer.

Till then,

Happy Coding..!

In-memory Data Grid vs. Distributed Cache: Which is Best?

Distributed caching has been a boon for IT professionals in the past due to its ability to make data always available even when offline. However, with the growing popularity of the Internet of Things (IoT) and the increasing amounts of data businesses need to process daily, distributed caching is slowly being overshadowed by a newer and more robust technology solution—the in-memory data grid (IMDG).

Distributed caches allow organizations to combine the amount of memory of computers within a network, boosting performance at minimum cost because there’s no need to purchase more disk storage or more high-end computers. Essentially, a data cache is distributed among all networked computers so that applications can use all available memory when needed. Memory is pooled into a single data store or data cache to provide faster access to data. Distributed caches are typically housed in a single physical server kept on site.

The main challenge of distributed caching today is that in-memory data grids can do distributed caching—and much more. What used to be complicated tasks for data analysts and IT professionals has been made simpler and more accessible to the layman. Data analytics, in particular, has become vital for businesses, especially in the areas of marketing and customer service. Nowadays, there are solutions available that present data via graphs and other visualizations to make data mining and analysis less complicated and quicker. The in-memory data grid is one such solution, and is one that’s gradually gaining popularity in the business intelligence (BI) space.

In-memory computing has almost pushed the distributed cache to a realm of obsolescence, so much so, that the remaining organizations that gold onto it as a solution are those that are afraid to embrace digital transformation or those that do not have the resources. However, this doesn’t mean that the distributed cache is less important in the history of computing. In its heyday, distributed caching helped solve a lot of IT infrastructure problems for a number of businesses and industries, and it did all of that at minimal cost.

Distributed Cache for High Availability

The main goal of the distributed cache is to make data always available, which is most useful for companies that require constant access to data, such as mobile applications that store information like user profiles or historical data. Common use cases for distributed caching include payment computations, external web service calls, and dynamic data like number of views or followers. The main draw, however, is how it allows users to access cached data whether the user is online or offline, which, in today’s always-connected world, is a major benefit. Distributed caches take note of frequently accessed data and keep them in process memory so there’s no need to repeatedly access disk storage to get to that data.

Typically, distributed caches offered simplicity through simple “put” and “get” operations through distributed key/value stores. They’re flexible enough, however, to handle more complicated processes through read-through and write-through instances that allow caches to read and write values to and from disk. Depending on the implementation, it can also handle ACID transactions, data replication, and active backups. Ultimately, distributed caching can help handle large, unpredictable amounts of data without sacrificing read consistency.

In-memory Data Grid for High Speed and Much More

The in-memory data grid (IMDG) is not just a storage solution; it’s a powerful computing solution that has the capability to do distributed caching and more. Designed to use RAM and eliminate the need for constant access to disk-based storage, an IMDG is able to process complex data for large-scale implementations at high speeds. Similar to distributed caching, it “distributes” the workload to a multitude of computers within a network, not only combining available RAM but also the computing power of all available computers.

An IMDG runs specialized software on each computer to enable this and to minimize movement of data to and from disk and within the network. Limiting physical disk access eliminates the bottlenecks usually caused by disk-based storage, since using disk in data processing means using an intermediary physical server to move data from one storage system to another. Consistent data synchronicity is also a highlight of the IMDG. This addresses challenges brought about by the complexity of data retrieval and updating, helping to speed up application development. An IMDG also allows both the application and its data to collocate in a single memory space to minimize latency.

Overall, the IMDG is a cost-effective solution because it all but eliminates the complexities and challenges involved in handling disk-based storage. It’s also highly scalable because its architecture is designed to scale horizontally. IMDG implementations can be scaled by simply adding new nodes to an existing cluster of server nodes.

In-memory Computing for Business

Businesses that have adopted in-memory solutions currently enjoy the platform’s relative simplicity and ease of use. Self-service is the ultimate goal of in-memory computing solutions, and this design philosophy is helping typical users transition into “power users” that expect high performance and more sophisticated features and capabilities.

The rise of in-memory computing may be a telltale sign of the distributed cache’s eventual exit, but it still retains its use, especially for organizations that are just looking to address current needs. It might not be an effective solution in the long run, however, as the future leans toward hybrid data and in-memory computing platforms that are more than just data management solutions.

Test-data management  support in Test Automation Development

Data is centric in testing of several applications because data is critical to organizations. Businesses are becoming more data-driven, and hence it is imperative that as Automation Test developers, the value of the test-data is understood and  completely harnessed during Test Automation development. The test-data involved in both Manual/Automation testing encompasses the test-data inputs, test-data outputs, and the test-data flow.

TestProject.io is the world’s first free cloud-based, community-powered test automation platform which caters to this important aspect of Test Automation development. The tool successfully adheres to the importance of keeping test-data centric in Automation Test solutions.

To start with, organizing and managing test data is very easy in TestProject. We are aware that as an application gets bigger and more tests are added, test data management becomes more difficult. This tool allows easy and clear management of the elements, tests, parameters by helping the Automation Test Developer associate data, be as an input or output in the UI as follows:

The tool makes the tests maintainable by allowing the Test data to be easily added, deleted, modified  making it  flexible in the perspective when business  requirements change. It also allows test data to be associated with Web, Android and iOS apps, allowing several types of input – web pages, JSON, PDFs etc. The test data can be also tested on several browsers such as Chrome, Firefox, Safari, Edge, Internet Explorer.

TestProject enables easy collaboration in a test automation team- by allowing/dis-allowing sharing of the test cases, test data etc as and when applicable. Eventually the team has shareable test repository which can be easily managed and controlled.

Sharing of parameters is available in levels –Test level and Project level. For example,

Hence, because of this, the test data can be easily re-usable, without having to mention the same test data repeatedly in some cases.

TestProject also has a “Secret Parameter” feature built in the smart test recorder that allows storing sensitive test data in an encrypted state.

There are also powerful Addons available in TestProject that can help the Automation Developers complete their tasks easily and quickly .For example, there are several  Random Data Generator Addons available. ‘Random Login Credentials Addon’ is one such Addon which generates random credentials to be entered for several tests.  Similarly, there are many more Random data generators available, such as for generating random dates, character/word/number etc as per several requirements. This definitely makes the job of an Automation developer much easier, and helps save time.

In TestProject, we can choose the input data source to be the default input parameters or to be associated with the data- driven method as follows :

The Data-driven Testing method of testing is necessarily important in cases when the coverage of any data variable comes into picture. We are aware that Data driven tests are tests that run multiple times, but with different values for some of the variables in the test. For example if you wanted to test that the username field on a login page could handle several different types of inputs you could create a separate test for each input, or you could use a data driven tests to drive the same login test multiple times, but just using a different username input each time. We are aware that Data-driven Testing is a very good approach if you have huge volumes of data to be tested for the same scripts.

One such support for Data driven testing in this tool is the Parameterization of variables. Once the parameters are added, like in the screenshot below, the parameter can be navigated to and picked for use.

In order to run a ‘Data-driven’ test, the Automation Developer would need to associate the test with various Data Sources. One such example is as follows, where the Developer can associate the test with the input CSV data source as follows:

Since it supports Data-driven test development, it results in stronger Test Coverage. That is, large volume of data can be managed and executed thereby improving regression testing and better coverage.

Speaking about data sources, TestProject also provides addons that help to work with several database as PostgreSQL, MySQL, MSSQL, Db2, Oracle. The tool can be easily linked with the databases by providing details as:

All this also shows the fact that the tool clearly separates the test cases and the test data and hence allows testers to test their applications using different data values and parameters without the need for changing test script/cases. While making a change in data sets such as addition, or deletion, doesn’t have implication with test cases.

Also, once the test is generated by the Automation developer, it can be viewed both in the ‘Manual Test’ view or the ‘Test document’ view. In both cases, once either of the options are chosen and they are downloaded, the test data is clearly mentioned in their respective columns in the documents.

For example, the ‘Manual Test’ document that gets generated automatically shows the Test Data used as,

And, the ‘Test’ document that gets generated automatically shows the Test Data’s default values used as,

While assesing the test results,  the tool clearly gives details on failures, helping the automation developer to easily debug the issue/ decide to open a defect. For example, the details are clearly showed as :

TestProject.io tool can also be easily integrated with many other tools, such as Jenkins, qTest, Slack etc, and the testcases/test data etc are easily synced during this association. Example, in the cases of Jenkins, we can associate the build step by linking it with the TestProject data source as follows:

Eventually, TestProject has emerged as a powerful test Automation framework, having very attractive features especially to the fact that it imparts the value of Test-data being centric in the  Automation Test tasks. Along with the fact that the tool supports the ideology of having the test-data to be the driving base to the whole Test Automation framework process, it  also enables sharing and syncing with other teams and tools during the development, management and execution of the Test Automation Solution.

Data Analytics and Mining for Dummies

Data Analytics and Mining is often perceived as an extremely tricky task cut out for Data Analysts and Data Scientists having a thorough knowledge encompassing several different domains such as mathematics, statistics, computer algorithms and programming. However, there are several tools available today that make it possible for novice programmers or people with no absolutely no algorithmic or programming expertise to carry out Data Analytics and Mining. One such tool which is very powerful and provides a graphical user interface and an assembly of nodes for ETL: Extraction, Transformation, Loading, for modeling, data analysis and visualization without, or with only slight programming is the KNIME Analytics Platform.

KNIME, or the Konstanz Information Miner, was developed by the University of Konstanz and is now popular with a large international community of developers. Initially KNIME was originally made for commercial use but now it is available as an open source software and has been used extensively in pharmaceutical research since 2006 and also a powerful data mining tool for the financial data sector. It is also frequently used in the Business Intelligence (BI) sector.

KNIME as a Data Mining Tool

KNIME is also one of the most well-organized tools which enables various methods of machine learning and data mining to be integrated. It is very effective when we are pre-processing data i.e. extracting, transforming, and loading data.

KNIME has a number of good features like quick deployment and scaling efficiency. It employs an assembly of nodes to pre-process data for analytics and visualization. It is also used for discovering patterns among large volumes of data and transforming data into more polished/actionable information.

Some Features of KNIME:

  • Free and open source
  • Graphical and logically designed
  • Very rich in analytics capabilities
  • No limitations on data size, memory usage, or functionalities
  • Compatible with Windows ,OS and Linux
  • Written in Java and edited with Eclipse.

A node is the smallest design unit in KNIME and each node serves a dedicated task. KNIME contains graphical, drag-drop nodes that require no coding. Nodes are connected with one’s output being another’s input, as a workflow. Therefore end-to-end pipelines can be built requiring no coding effort. This makes KNIME stand out, makes it user-friendly and make it accessible for dummies not from a computer science background.

KNIME workflow designed for graduate admission prediction

KNIME workflow designed for graduate admission prediction

KNIME has nodes to carry out Univariate Statistics, Multivariate Statistics, Data Mining, Time Series Analysis, Image Processing, Web Analytics, Text Mining, Network Analysis and Social Media Analysis. The KNIME node repository has a node for every functionality you can possibly think of and need while building a data mining model. One can execute different algorithms such as clustering and classification on a dataset and visualize the results inside the framework itself. It is a framework capable of giving insights on data and the phenomenon that the data represent.

Some commonly used KNIME node groups include:

  • Input-Output or I/O:  Nodes in this group retrieve data from or to write data to external files or data bases.
  • Data Manipulation: Used for data pre-processing tasks. Contains nodes to filter, group, pivot, bin, normalize, aggregate, join, sample, partition, etc.
  • Views: This set of nodes permit users to inspect data and analysis results using multiple views. This gives a means for truly interactive exploration of a data set.
  • Data Mining: In this group, there are nodes that implement certain algorithms (like K-means clustering, Decision Trees, etc.)

Comparison with other tools 

The first version of the KNIME Analytics Platform was released in 2006 whereas Weka and R studio were released in 1997 and 1993 respectively. KNIME is a proper data mining tool whereas Weka and R studio are Machine Learning tools which can also do data mining. KNIME integrates with Weka to add machine learning algorithms to the system. The R project adds statistical functionalities as well. Furthermore, KNIME’s range of functions is impressive, with more than 1,000 modules and ready-made application packages. The modules can be further expanded by additional commercial features.

Article series: 5 Clean Coding Tips – 5.Put yourself in somebody else’s shoes

This is the fifth of the article series “5 tips for clean coding” to follow as soon as you’ve made the first steps into your coding career, in this article series. Read the introduction here, to find out why it is important to write clean code if you missed it.

It might be a bit repetitive to bring up how important the readability of the code is, let’s do it anyway. In the majority of the cases you are writing for others, therefore you need to put yourself in their shoes to be able to assess how good the readability of your code is. For you, it all might be obvious because you wrote it. But it doesn’t have to be easy to read for someone else. If you have a colleague or a friend that has a bit of time for you and is willing to give you feedback, that is great. If, however, you don’t have such a person, having a few imaginary friends might be helpful in this case. It might sound crazy, but don’t close this page just yet. Having a set of imaginary personas at your disposal, to review your work with their eyes, can help you a lot. Imagine that your code met one of those guys. What would they say about it? If you work in a team or collaborate with people, you probably don’t have to imagine them. You’ve met them.

The_PEP8_guy – He has years of experience. He is used to seeing the code in a very particular way. He quotes the style guide during lunch. His fingers make the perfect line splitting and indentation without even his thoughts reaching the conscious state. He knows that lowercase_with_underscore is for variables, UPPER_CASE_NAMES are for constants and the CapitalizedWords are for classes. He will be lost if you do it in any different way. His expectations will not meet what you wrote, and he will not understand anything, because he will be too distracted by the messed up visual. Depending on the character he might start either crying or shouting. Read the style guide and follow it. You might be able to please this guy at least a little bit with the automatic tools like pylint.

The_ grieving _widow – Imagine that something happens to you. Let’s say, that you get hit by a bus[i]. You leave behind sadness and the_ grieving_widow to manage your code, your legacy. Will the future generations be able to make use of it or were you the only one who can understand anything you wrote? That is a bit of an extreme situation, ok. Alternatively, imagine, that you go for a 5-week vacation to a silent retreat with a strict no-phone policy (or that is what you tell your colleagues). Will they be able to carry on if they cannot ask you anything about the code? Review your code and the documentation from the perspective of the poor grieving_widow.

The_not_your_domain_guy – He is from the outside of the world you are currently in and he just does not understand your jargon. He doesn’t have to know that in data science a feature, a predictor and an x probably mean the same thing. SNR might shout signal-to-noise ratio at you, it will only snort at him. You might use abbreviations that are obvious to you but not to everyone. If you think that the majority of people can understand, and it helps with the code readability keep the abbreviations but just in case, document/comment them. There might be abbreviations specific to your company and, someone from the outside, a new guy, a consultant will not get them. Put yourself in the shoes of that guy and maybe make your code a bit more democratic wherever possible.

The_foreigner– You might be working in an environment, where every single person speaks the same language you speak, and it happens not to be English. So, you and your colleagues name variables and write the comments in your language. However, unless you work in a team with rules a strict as Athletic Bilbao, there might be a foreigner joining your team in the future. It is hard to argue that English is the lingua franca in programming (and in the world), these days. So, it might be worth putting yourself in the_foreigner’s shoes, while writing your code, to avoid a huge amount of work in the future, that the translation and explanation will require. And even if you are working on your own, you might want to make your code public one day and want as many people as possible to read it.

The_hurry_up_guy – we all know this guy. Sometimes he doesn’t have a body or a face, but we can feel his presence. You might want to write a perfect solution, comment it in the best possible way and maybe add a bit of glitter on top but sometimes you just need to give in and do it his way. And that’s ok too.

References:

[i] https://en.wikipedia.org/wiki/Bus_factor

Article series: 5 Clean Coding Tips – 4. Stop commenting the obvious

This is the fourth of the article series “5 tips for clean coding” to follow as soon as you’ve made the first steps into your coding career, in this article series. Read the introduction here, to find out why it is important to write clean code if you missed it.

Everyone will tell you that you need to comment your code. You do it for yourself, for others, it might help you to put down a structure of your code before you get down to coding properly. Writing a lot of comments might give you a false sense of confidence, that you are doing a good job. While in reality, you are commenting your code a lot with obvious, redundant statements that are not bringing any value. The role of a comment it to explain, not to describe. You need to realize that any piece of comment has to add information to the code you already have, not to double it.

Keep in mind, you are not narrating the code, adding ‘subtitles’ to python’s performance. The comments are there to clarify what is not explicit in the code itself. Adding a comment saying what the line of code does is completely redundant most of the time:

# importing pandas
import pandas as pd

# loading the data
csv_file = csv.reader(open'data.csv’)

# creating an empty data frame
data = pd.DataFrame()

A good rule of thumb would be: if it starts to sound like an instastory, rethink it. ‘So, I am having my breakfast, with a chai latte and my friend, the cat is here as well’. No.

It is also a good thing to learn to always update necessary comments before you modify the code. It is incredibly easy to modify a line of code, move on and forget the comment. There are people who claim that there are very few crimes in the world worse than comments that contradict the code itself.

Of course, there are situations, where you might be preparing a tutorial for others and you want to narrate what the code is doing. Then writing that load function will load the data is good. It does not have to be obvious for the listener. When teaching, repetitions, and overly explicit explanations are more than welcome. Always have in mind who your reader will be.

Article series: 5 Clean Coding Tips – 3. Take Advantage of the Formatting Tools.

This is the third of the article series “5 tips for clean coding” to follow as soon as you’ve made the first steps into your coding career, in this article series. Read the introduction here, to find out why it is important to write clean code if you missed it.

Unfortunately, no automatic formatting tool will correct the logic in your code, suggest meaningful names of your variables or comment the code for you. Yet. Gmail has lately started suggesting email titles based on email content. AI-powered variable naming can be next, who knows. Anyway, the visual level of the code is much easier to correct and there are tools that will do some of the code formatting on the visual level job for you. Some of them might be already existing in your IDE, you just need to look for them a bit, others need to be installed. One of the most popular formatting tools is pylint[i]. It is worth checking it out and learning to use it in an efficient way.

Beware that as convenient as it may seem to copy and paste your code into a quick online ‘beautifier’ it is not always a good idea. The online tools might store your code. If you are working on something that shouldn’t just freely float in the world wide web, stick to reliable tools like pylint, that will store the data within your working directory.

These tools can become very good friends of yours but also very annoying ones. They will not miss single whitespace and will not keep their mouth shut when your line length jumps from 79 to 80 characters. They will be shouting with an underscoring of some worrying color and/or exclamation marks. You will need to find your way to coexist and retain your sanity. It can be very distracting when you are in a working flow and warnings pop up all the time about formatting details that have nothing to do with what you are trying to solve. Sometimes, it might be better to turn those warnings off while you are in your most concentrated/creative phase of writing and turn them back on while the dust of your genius settles down a little bit. Usually the offer a lot of flexibility, regarding which warnings you want to be ignored and other features. The good thing is, they also teach you what are mistakes that you are making and after some time you will just stop making them in the first place.

References:

[i] https://www.pylint.org/

Article series: 5 Clean Coding Tips – 2. Name Variables in a Meaningful Way

This is the second of the article series “5 tips for clean coding” to follow as soon as you’ve made the first steps into your coding career, in this article series. Read the introduction here, to find out why it is important to write clean code if you missed it.

When it comes to naming variables, there are a few official rules in the PEP8 style guide. A variable must start with an underscore or a letter and can be followed by a number of underscores or letters or digits. They cannot be reserved words: True, False, or, not, lambda etc. The preferred naming style is lowercase or lowercase_with_underscore. This all refers to variable names on a visual level. However, for readability purposes, the semantic level is as important, or maybe even more so. If it was for python, the variables could be named like this:

b_a327647_3 = DataFrame() 
hw_abc7622 = DataFrame()  
a10001_kkl = DataFrame()

It wouldn’t make the slightest difference. But again, the code is not only for the interpreter to be read. It is for humans. Other people might need to look at your code to understand what you did, to be able to continue the work that you have already started. In any case, they need to be able to decipher what hides behind the variable names, that you’ve given the objects in your code. They will need to remember what they meant as they reappear in the code. And it might not be easy for them.

Remembering names is not an easy thing to do in all life situations. Let’s consider the following situation. You go to a party, there is a bunch of new people that you meet for the first time. They all have names and you try very hard to remember them all. Imagine how much easier would it be if you could call the new girl who came with John as the_girl_who_came_with_John. How much easier would it be to gossip to your friends about her? ‘Camilla is on the 5th glass of wine tonight, isn’t she?!.’ ‘Who are you talking about???’ Your friends might ask. ‘The_Girl_who_came_with_John.’ And they will all know. ‘It was nice to meet you girl_who_came_with_john, see you around.’ The good thing is that variables are not really like people. You can be a bit rude to them, they will not mind. You don’t have to force yourself or anyone else to remember an arbitrary name of a variable, that accidentally came to your mind in the moment of creation. Let your colleagues figure out what is what by a meaningful, straightforward description of it.

There is an important tradeoff to be aware of here. The lines of code should not exceed a certain length (79 characters, according to the PEP 8), therefore, it is recommended that you keep your names as short as possible. It is worth to give it a bit of thought about how you can name your variable in the most descriptive way, keeping it as short as possible. Keep in mind, that
the_blond_girl_in_a_dark_blue_dress_who_came_with_John_to_this_party might not be the best choice.

There are a few additional pieces of advice when it comes to naming your variables. First, try to always use pronounceable names. If you’ve ever been to an international party, you will know how much harder to remember is something that you cannot even repeat. Second, you probably have been taught over and over again that whenever you create a loop, you use i and j to denote the iterators.

for i in m:
    for j in n:

It is probably engraved deep into the folds in your brain to write for i in…. You need to try and scrape it out of your cortex. Think about what the i stands for, what it really does and name it accordingly. Is i maybe the row_index? Is it a list_element?

for element in list_of_words:
    for letter in word:

Additionally, think about when to use a noun and where a verb. Variables usually are things and functions usually do things. So, it might be better to name functions with verb expressions, for example: get_id() or raise_to_power().

Moreover, it is a good practice to name constant numbers in the code. First, because when you name them you explain the meaning of the number. Second, because maybe one day you will have to change that number. If it appears multiple times in your code, you will avoid searching and changing it in every place. PEP 8 states that the constants should be named with UPPER_CASE_NAME. It is also quite common practice to explain the meaning of the constants with an inline comment at the end of the line, where the number appears. However, this approach will increase the line length and will require repeating the comment if the number appears more than one time in the code.