Understanding databases for storing, updating and analyzing data requires the understanding of two concepts: ACID and BASE. This is the first article of the article series Data Warehousing Basics.
The properties of ACID are being applied for databases in order to fulfill enterprise requirements of reliability and consistency.
ACID is an acronym, and stands for:
- Atomicity – Each transaction is either properly executed completely or does not happen at all. If the transaction was not finished the process reverts the database back to the state before the transaction started. This ensures that all data in the database is valid even if we execute big transactions which include multiple statements (e. g. SQL) composed into one transaction updating many data rows in the database. If one statement fails, the entire transaction will be aborted, and hence, no changes will be made.
- Consistency – Databases are governed by specific rules defined by table formats (data types) and table relations as well as further functions like triggers. The consistency of data will stay reliable if transactions never endanger the structural integrity of the database. Therefor, it is not allowed to save data of different types into the same single column, to use written primary key values again or to delete data from a table which is strictly related to data in another table.
- Isolation – Databases are multi-user systems where multiple transactions happen at the same time. With Isolation, transactions cannot compromise the integrity of other transactions by interacting with them while they are still in progress. It guarantees data tables will be in the same states with several transactions happening concurrently as they happen sequentially.
- Durability – The data related to the completed transaction will persist even in cases of network or power outages. Databases that guarante Durability save data inserted or updated permanently, save all executed and planed transactions in a recording and ensure availability of the data committed via transaction even after a power failure or other system failures If a transaction fails to complete successfully because of a technical failure, it will not transform the targeted data.
The ACID transaction model ensures that all performed transactions will result in reliable and consistent databases. This suits best for businesses which use OLTP (Online Transaction Processing) for IT-Systems such like ERP- or CRM-Systems. Furthermore, it can also be a good choice for OLAP (Online Analytical Processing) which is used in Data Warehouses. These applications need backend database systems which can handle many small- or medium-sized transactions occurring simultaneous by many users. An interrupted transaction with write-access must be removed from the database immediately as it could cause negative side effects impacting the consistency(e.g., vendors could be deleted although they still have open purchase orders or financial payments could be debited from one account and due to technical failure, never credited to another).
The speed of the querying should be as fast as possible, but even more important for those applications is zero tolerance for invalid states which is prevented by using ACID-conform databases.
ACID databases have their advantages but also one big tradeoff: If all transactions need to be committed and checked for consistency correctly, the databases are slow in reading and writing data. Furthermore, they demand more effort if it comes to storing new data in new formats.
In chemistry, a base is the opposite to acid. The database concepts of BASE and ACID have a similar relationship. The BASE concept provides several benefits over ACID compliant databases asthey focus more intensely on data availability of database systems without guarantee of safety from network failures or inconsistency.
The acronym BASE is even more confusing than ACID as BASE relates to ACID indirectly. The words behind BASE suggest alternatives to ACID.
BASE stands for:
- Basically Available – Rather than enforcing consistency in any case, BASE databases will guarantee availability of data by spreading and replicating it across the nodes of the database cluster. Basic read and write functionality is provided without liabilityfor consistency. In rare cases it could happen that an insert- or update-statement does not result in persistently stored data. Read queries might not provide the latest data.
- Soft State – Databases following this concept do not check rules to stay write-consistent or mutually consistent. The user can toss all data into the database, delegating the responsibility of avoiding inconsistency or redundancy to developers or users.
- Eventually Consistent –No guarantee of enforced immediate consistency does not mean that the database never achieves it. The database can become consistent over time. After a waiting period, updates will ripple through all cluster nodes of the database. However, reading data out of it will stay always be possible, it is just not certain if we always get the last refreshed data.
All the three above mentioned properties of BASE-conforming databases sound like disadvantages. So why would you choose BASE? There is a tradeoff compared to ACID. If databases do not have to follow ACID properties then the database can work much faster in terms of writing and reading from the database. Further, the developers have more freedom to implement data storage solutions or simplify data entry into the database without thinking about formats and structure beforehand.
While ACID databases are mostly RDBMS, most other database types, known as NoSQL databases, tend more to conform to BASE principles. Redis, CouchDB, MongoDB, Cosmos DB, Cassandra, ElasticSearch, Neo4J, OrientDB or ArangoDB are just some popular examples. But other than ACID, BASE is not a strict approach. Some NoSQL databases apply at least partly to ACID rules or provide optional functions to get almost or even full ACID compatibility. These databases provide different level of freedom which can be useful for the Staging Layer in Data Warehouses or as a Data Lake, but they are not the recommended choice for applications which need data environments guaranteeing strict consistency.