5 Best Practices for Performing Data Backup and Recovery

Data backup and recovery are critical for any organization in the digital age. The field of data science has developed advanced, secure, user-friendly backup and recovery technology over recent years. For anyone new to data backup and recovery, it can be challenging to know where to start, especially when dealing with large quantities of data. There are some best practices in data backup and recovery that are beneficial for any user or organization. These tips will provide a jumping-off point for creating a customized data protection strategy.

1. Create a Frequent Backup Plan

One of the first steps to protecting data from loss is creating a plan or schedule for backups. Frequency is key for a quality backup schedule. Creating data backups only once or twice a year increases the risk of losing data in the intervening months between backups. The exact frequency will depend on individual circumstances to a certain extent, specifically the frequency with which new data is being created.

For individuals, weekly backups are recommended for devices like personal computers. Businesses and organizations have significantly more data to manage than individuals do. This means that more data has to be included in each backup, new data is created faster, and data storage is more expensive. As a result, organizations usually need more frequent backups than individuals, balanced against the cost of the additional storage.

After deciding on the timing of backups, consider the best way to execute them. For more frequent backups, automation may be a good idea. Automated backups eliminate the risk of anyone forgetting to initiate a backup and make large backups easier to manage.
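As a rough illustration of the automation idea, here is a minimal Python sketch of a backup job; the `run_backup` helper and directory names are hypothetical, and in practice such a script would be scheduled with a tool like cron or Windows Task Scheduler rather than run by hand.

```python
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path

def run_backup(source_dir: str, backup_dir: str) -> Path:
    """Create a timestamped zip archive of source_dir inside backup_dir."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # shutil.make_archive appends the .zip extension itself
    archive = shutil.make_archive(str(dest / f"backup-{stamp}"), "zip", source_dir)
    return Path(archive)

# Demo on throwaway directories (real paths would be configured, not temp dirs)
src = Path(tempfile.mkdtemp())
(src / "notes.txt").write_text("important data")
archive = run_backup(str(src), tempfile.mkdtemp())
print(archive.name)
```

Because the archive name embeds a timestamp, each scheduled run produces a distinct file, which also makes it easy to enforce a retention policy later.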

2. Vary Backup Locations and Media

One of the most common data backup and recovery tips is the 3-2-1 rule. This data backup strategy suggests keeping three copies of important files, stored on at least two distinct types of storage media, with at least one copy kept off-site.

The idea behind the 3-2-1 approach is to build resilience through redundancy and variation. Even if a hacker is able to access an on-site hard drive of sensitive data, they won’t be able to damage the isolated off-site copy of that data.

The 3-2-1 rule is simply a starting point for data storage methods. Individuals and organizations should carefully consider what backup and recovery media best suits their specific needs. The cloud might be ideal for one business’s data storage, while independent drives might be better for another. The key is to have some measure of variation in the types of backup media and where they are stored. You could use an offsite server, the cloud, or any other combination of backup storage options. Keeping at least one copy in a unique location is wise, though. In the event of a natural disaster, for example, this could be critical to recovering data lost on-site.
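To make the redundancy idea concrete, here is a minimal Python sketch of replicating one backup archive to several destinations; the `replicate` helper is purely illustrative, and real destinations would be a second drive, a NAS, or a cloud bucket rather than the temporary folders used in the demo.

```python
import shutil
import tempfile
from pathlib import Path

def replicate(archive: str, destinations: list[str]) -> list[Path]:
    """Copy one backup archive to several storage locations (3-2-1 style)."""
    copies = []
    for dest in destinations:
        target_dir = Path(dest)
        target_dir.mkdir(parents=True, exist_ok=True)
        # copy2 preserves file metadata such as modification time
        copies.append(Path(shutil.copy2(archive, target_dir)))
    return copies

# Demo: one original plus two copies gives three copies in total
original = Path(tempfile.mkdtemp()) / "backup.zip"
original.write_bytes(b"archive contents")
copies = replicate(str(original), [tempfile.mkdtemp(), tempfile.mkdtemp()])
print(len(copies))  # 2
```

In a real setup, the second destination would be the off-site copy, so that losing the primary site never means losing every replica at once.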

3. Plan for Extensive Data Storage

This next tip is especially important for organizations or individuals backing up large amounts of data. From the start, it is a good idea to plan for extensive storage needs. The cost of data storage may seem intimidating, but it is often better to face it up front and consider how much data storage will be needed in the long term.

You might start out using only a partition of cloud storage and a smaller backup server. Have a plan in mind for how you will expand your storage space as time goes on. Different niches and industries have different data storage needs. For example, organizations in the ad tech industry will need bulk data storage for app tracking data and media. This data can pile up rapidly, so a bulk storage plan is critical for a data backup and recovery strategy.

4. Regularly Test Backup and Recovery Measures

A crucial component of any data backup and recovery strategy is a schedule for testing the strategy. In the event that a recovery is needed, it will be extremely helpful for key team members to know how to proceed. Knowing that the recovery tactics in place have been tested recently offers some peace of mind, as well.

There are countless ways to test data backup and recovery strategies. Simulations are a popular method. For example, a data scientist could use an AI tool or a white-hat hacker to conduct a simulated cyberattack on the data, then run a recovery of that data afterward. Before running a test simulation, it is a good idea to back up the data and ensure that nothing is genuinely at risk of being lost, just in case the recovery strategy has unforeseen weaknesses.
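One simple, concrete piece of any such test is verifying that a restored file matches the original byte for byte. The sketch below does this with SHA-256 checksums; the `verify_backup` helper is illustrative, not part of any particular backup product.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Compute the SHA-256 digest of a file in streaming fashion."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_backup(original: Path, restored: Path) -> bool:
    """A restore test passes only if the checksums match exactly."""
    return sha256_of(original) == sha256_of(restored)

# Demo: an intact restore verifies, a corrupted one does not
tmp = Path(tempfile.mkdtemp())
(tmp / "orig.db").write_bytes(b"payroll records")
(tmp / "good.db").write_bytes(b"payroll records")
(tmp / "bad.db").write_bytes(b"payroll record!")
print(verify_backup(tmp / "orig.db", tmp / "good.db"))  # True
print(verify_backup(tmp / "orig.db", tmp / "bad.db"))   # False
```

Storing a checksum alongside each backup at creation time makes this check possible even when the original has since been lost, which is exactly the scenario a restore test is meant to cover.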

5. Budget for Security

One of the main goals of creating a data backup and recovery plan is protecting data from cyberattacks. So, it is important to make sure that the backup and recovery methods being used are secure. There are layers to this security, as well. For example, an organization might choose to back up some of its data in the cloud. Its first line of defense is the security of the cloud storage provider. The next line of defense might then be encryption of the files or documents the organization stores with that provider.

Security measures vary from case to case. A general rule of thumb, however, is to invest in the best security possible. Take the time to research the defenses of data storage providers before choosing one to partner with. Make sure on-site cybersecurity is resilient and up-to-date. Encrypt anything particularly sensitive, just in case. Cybersecurity is an investment, but budgeting for it may be the difference between recovering data and losing it.

Resilient Data Backup and Recovery

These best practices for performing a successful data backup and recovery will help get you started. The next step is to conduct thorough research on your personal or organizational data protection needs. The goal is to find a balance between budget and performance, where you are getting the most secure data storage possible at the best value.

Process Mining Tool provider process.science presents a new release

process.science, a specialist in the development of process mining plugins for BI systems, presents an upgraded version of its product ps4pbi. The following improvements have been added to the plug-in for Microsoft Power BI; identical upgrades will soon also be released for ps4qlk, the corresponding plug-in for Qlik Sense:

  • 3x faster performance: Improvements to the graph library make graph building approximately 300% more performant, which is particularly noticeable in complex processes
  • Navigator window: For a better overview of complex graphs, an overview window has been added that displays the entire graph and the position of the currently viewed area within the overall process
  • Activities legend: Activities can be assigned to specific categories and highlighted in different colors, for example by the source system in which an activity was carried out
  • Activity drill-through: Filters that have been set for selected activities can now be carried over into other dashboards
  • Value color scale: Activity values can be color-coded and assigned to freely selectable groupings, which makes the graph easier to grasp at first sight
process.science Process Mining on Power BI

Process mining is a business data analysis technique. The software used for this extracts data that is already available in the source systems and visualizes it in a process graph. The aim is continuous, real-time monitoring in order to identify optimization measures for processes, to simulate them, and to continuously evaluate them after implementation.

The process mining tools from process.science are integrated directly into Microsoft Power BI and Qlik Sense. A corresponding plug-in for Tableau is already in development. It is therefore not a complicated, isolated solution that requires a new setup in addition to existing systems. With process.science, the existing know-how about the BI system already in place and the existing infrastructure can be leveraged.

The integration of process.science into the BI systems has no influence on day-to-day business and bears absolutely no risk of system failures, as process.science does not intervene in the source system or any other program, but extends the respective business intelligence tool with a process perspective and various functionalities.

Contact person for inquiries:

process.science GmbH & Co. KG
Gordon Arnemann
Tel.: +49 (231) 5869 2868
Email: ga@process.science

Data Science with Python – Book Recommendations 2021

Data Science with Python – Current Book Recommendations

As a lecturer in data science and Python programming for universities and companies (employee training), I am of course regularly asked for literature recommendations in German. On that note, here is a recommendation of books that I use for my training explanations and examples, or that I can simply recommend in general.

The book Praktische Statistik für Data Scientists: 50+ essenzielle Konzepte mit R und Python (Animals) is currently one of my favorites among the books that explain statistics methodically, neither too dryly nor too example-driven, offering a fluently readable explanation of the most important principles of statistics, from descriptive, inductive, and explorative statistics through to machine learning. It includes programming code in both R and Python, which at this point I criticize rather than admire. Nevertheless, it is a very well-written and almost fluently readable book with great explanations.



I only know the book Einführung in Data Science: Grundprinzipien der Datenanalyse mit Python (Animals) from its first edition, but the second will certainly be no worse. This book stands out for its method orientation: it explains the principles of data science (statistics, machine learning) with Python, but without relying heavily on existing libraries. It covers the fundamental principles of data science with didactic added value and conveys a feeling for how the algorithms work.



Anyone who wants to focus entirely on machine learning knowledge will do well with Machine Learning mit Python und Keras, TensorFlow 2 und Scikit-Learn: Das umfassende Praxis-Handbuch für Data Science, Deep Learning und Predictive Analytics (mitp Professional). It relies heavily on the Scikit-Learn and TensorFlow libraries, and explains the workings of learning algorithms for classification and regression, as well as unsupervised machine learning, in considerable detail and with very instructive figures. In particular, it covers the fundamental principles of deep learning from the MLP to the CNN, bridging the gap from Python for machine learning to Python for deep learning.


If you want a quick start into machine learning with Python, Data Science mit Python: Das Handbuch für den Einsatz von IPython, Jupyter, NumPy, Pandas, Matplotlib und Scikit-Learn (mitp Professional) could be a good choice. You largely have to do without particularly detailed explanations of the machine learning algorithms, but in return the examples, solved with the typical Python libraries, are very extensive and immediately applicable. This book is somewhat more about the Python libraries for data science than about the underlying methods.



As an alternative to the aforementioned book, the competing publisher offers Datenanalyse mit Python: Auswertung von Daten mit Pandas, NumPy und IPython (Animals). It is particularly well suited for easily learning how the methods and data structures of Python's NumPy, Pandas, and Matplotlib work. Classical data analysis with descriptive statistics is more in the foreground here than machine learning, which also means that data analysis with Python is explained in great detail. It is likewise somewhat more of a Python book than a book about data science methods. In my opinion, it is particularly well suited for Python learners who have so far been used to analyzing data in SQL and now want to switch to Pandas.


All book recommendations are based on my experience as a lecturer. I have read and used all of these books intensively.
The links are so-called affiliate links. If you, as a reader, click on such an affiliate link and make a purchase through it, I, as the owner of the Data Science Blog, receive a commission without the purchase price of the item changing. I assure you that all income after taxes is reinvested 100% back into the Data Science Blog.