Cold, warm and hot redundancy: determining how much you need

Reliability requirements for process control systems differ from industry to industry, and some call for PLC redundancy in the automation system to keep people and equipment safe. Some processes need only minor or moderate intervention when something fails, while in others failures or delays cannot be tolerated. Here, we look at the different types of redundancy and where they should be used.

What is redundancy?

Redundancy comes in many forms depending on the application, but in every case it is about providing reliability and an alternative path for the process when a failure occurs.

An alternative response can be designed into a process control system at either the component or process level.

While most PLC vendors provide units with built-in redundancy for processor control and power supplies, extra control hardware and software can be installed to further reduce the risk of damage and inconvenience should a controller fail. Depending on the consequences of failure, increasing reliability through redundancy can be an easy decision. The tricky part is that not all vendors' solutions are equal, and it can be hard to know whether the marketing brochure is telling the whole story.

The form of redundancy depends on a number of factors and can be classed as cold, warm or hot.

Cold redundancy

Cold redundancy is best suited to non-critical processes where downtime is not a big concern and human intervention is possible.

For example, if a belt press machine in a large wastewater treatment plant fails, the control system will set off an alarm to inform the operator of the problem. Typically, these plants have several belt presses working in parallel, so the operator can decide to take the failed unit out of service, resume operation by starting another unit, and request a service for the failed one.

In this example, the plant has redundant equipment, so a PLC failure is not a major problem. Cold redundancy would consist of keeping an identical spare PLC, or parts thereof, close by, along with access to the latest PLC code so that the replacement CPU can be programmed easily. This design is acceptable because the loss of the press is unlikely to have a critical impact on operations and operator intervention is acceptable.

For processes which are more time critical, a warm or hot redundancy design is a better approach.

Warm redundancy

Warm redundancy design is suited to processes where time and response are important but a momentary outage is still acceptable.

Looking at a fluid transfer system as an example, if a valve fails to operate, the pump can be disabled and the system shut down. The acceptable shutdown time can range from a few seconds to minutes, or even longer, and is determined by the product and how long it takes to be damaged, contaminated or start to deteriorate. Generally, the process must be restored quickly and automatically to avoid any integrity issues.

Warm redundancy systems usually operate in shadow mode, where the primary and standby processors run identical software. The primary processor controls the system's inputs and outputs (I/O), and the standby processor takes control of the I/O if the primary processor goes offline, allowing the system to be maintained without losing process control.

During normal operation, the standby processor receives only periodic updates from the primary processor, usually at the end of each program scan, and each update may carry only a portion of the data. This means it may take a few program scans for the standby processor to catch up to the primary processor after the switchover has occurred. Most warm standby systems will halt the process for this period, although they will typically hold the last state of the outputs while the changeover occurs.
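The effect of partial end-of-scan updates can be illustrated with a small simulation. This is only a sketch under simplifying assumptions, not a model of any vendor's product: the function name, the tag names and the round-robin transfer of a fixed number of tags per scan are all hypothetical choices made for illustration.

```python
def run_warm_standby(num_tags=6, tags_per_sync=2, scans_before_failure=5):
    """Simulate a primary that sends only part of its data table per scan.

    All names and the round-robin transfer scheme are illustrative assumptions.
    Returns (primary_table, standby_table, stale_tag_names) at the moment
    the primary fails.
    """
    primary = {f"tag{i}": 0 for i in range(num_tags)}
    standby = dict(primary)
    names = list(primary)
    cursor = 0  # next tag to transfer to the standby

    for scan in range(1, scans_before_failure + 1):
        # The primary solves its logic: here every tag changes on every scan.
        for name in names:
            primary[name] = scan

        # End of scan: only a portion of the data is sent to the standby.
        for _ in range(tags_per_sync):
            name = names[cursor]
            standby[name] = primary[name]
            cursor = (cursor + 1) % num_tags

    # The primary fails here; any tag that differs is stale on the standby.
    stale = [name for name in names if standby[name] != primary[name]]
    return primary, standby, stale


if __name__ == "__main__":
    _, standby, stale = run_warm_standby()
    print("standby table at failure:", standby)
    print("stale tags:", stale)
```

In this toy model, the stale tags are the data the standby still has to rebuild after the switchover, which is why outputs are held in their last state while it catches up.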

The hardware for warm and hot redundancy systems is almost identical, so care needs to be taken when comparing the two types of system, as they can easily be confused. It is critical to talk to your system integrator about hardware selection so that you know what performance a system will achieve before specifying a certain brand or type of PLC system.

Hot redundancy

While the architecture of warm and hot redundancy systems is very similar, hot redundancy systems, unlike warm systems, provide instant process correction when a failure is detected. This makes hot redundancy the best solution for critical processes which cannot experience an outage for even a brief moment.

Examples of where a hot redundancy system is applicable would be a critical power system for a hospital, an air traffic control system or a process plant with large high-speed machinery. In such applications, if a primary controller fails, a backup controller needs to assume control immediately so that there are no delays in the transfer. Any delay could result in damage to critical equipment, supply breakers tripping, or delayed generator transfers that cause power glitches or a momentary complete loss of power.

For hot redundancy systems, the PLC programming software and hardware must be tightly coordinated so that the processors can constantly exchange messages and access common data, which is what makes a smooth transition possible. The most important thing is that the secondary processor has knowledge of every logic cycle of the primary processor to ensure data integrity. Some of the best systems available today transfer data in nanoseconds and keep the data tables of both processors updated throughout the scan cycle, so that when a failure occurs, control is transferred to the secondary within a single scan and the process does not experience even the slightest glitch.
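The idea of keeping both data tables current on every scan can be sketched in a few lines. The example below is a deliberately simplified illustration, not a description of how any particular PLC implements hot standby: the function name, the single "counter" tag and the scan loop are hypothetical.

```python
def run_hot_standby(total_scans=5, fail_at_scan=3):
    """Simulate a hot standby pair whose data tables are mirrored every scan.

    Names and structure are illustrative assumptions only.
    Returns a list of (scan_number, active_controller, counter_value).
    """
    primary_table = {"counter": 0}
    secondary_table = {"counter": 0}
    active = "primary"
    history = []

    for scan in range(1, total_scans + 1):
        # Simulated failure: the secondary takes over within the same scan.
        if active == "primary" and scan == fail_at_scan:
            active = "secondary"

        # The active controller solves the logic on its own data table.
        table = primary_table if active == "primary" else secondary_table
        table["counter"] += 1

        # While the primary is healthy, its full table is mirrored each scan,
        # so the secondary always holds the latest process state.
        if active == "primary":
            secondary_table.update(primary_table)

        history.append((scan, active, table["counter"]))

    return history


if __name__ == "__main__":
    for scan, controller, value in run_hot_standby():
        print(f"scan {scan}: {controller} wrote counter={value}")
```

Because the secondary already holds the complete, current table when the primary fails, the counter continues from the last value with no gap and no catch-up period.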

Unlike warm redundancy systems, hot systems provide a seamless transfer of the I/O during the changeover between processors.
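The selection criteria described for the three classes can be condensed into a simple decision rule. This is a rough sketch only: the function and parameter names are hypothetical, and a real selection would also weigh cost, vendor capability and the consequences of failure.

```python
from enum import Enum


class Redundancy(Enum):
    COLD = "cold"
    WARM = "warm"
    HOT = "hot"


def choose_redundancy(downtime_and_manual_recovery_ok: bool,
                      momentary_outage_ok: bool) -> Redundancy:
    """Map two yes/no questions about the process to a redundancy class.

    Hypothetical helper for illustration; the questions paraphrase the
    criteria given in the text for cold, warm and hot redundancy.
    """
    # Downtime is not a big concern and an operator can step in.
    if downtime_and_manual_recovery_ok:
        return Redundancy.COLD
    # Recovery must be automatic, but a brief outage is still acceptable.
    if momentary_outage_ok:
        return Redundancy.WARM
    # No outage can be tolerated, even for a moment.
    return Redundancy.HOT


if __name__ == "__main__":
    # Belt press with parallel units and an operator on site.
    print(choose_redundancy(True, True).value)
    # Fluid transfer system that must restart automatically.
    print(choose_redundancy(False, True).value)
    # Critical power system for a hospital.
    print(choose_redundancy(False, False).value)
```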

In the second part of this article, we will look at different brands of PLC and how they perform redundancy.