On February 5, 2021, the water treatment system for the city of Oldsmar, Florida, was breached, and the sodium hydroxide level was briefly raised from its normal level of 100 parts per million (ppm) to 11,100 ppm, a level more suitable for household cleaners than for human consumption.
While the FBI investigates the source and method of the attack, during a press conference held the following Monday, Pinellas County Sheriff Bob Gualtieri reminded the public that what is known is the following: “Somebody hacked into the system — not just once but twice — controlled the system, took control of the mouse, moved it around, opened the program, and changed the levels from 100 to 11,100 ppm with a caustic substance. Those are the facts.”
Whether this was the act of a nation-state or a script kiddie, one thing is certain: this is not the last time our vital industries (power, water, gas, energy) and their mission-critical systems will be targeted and exposed through a cyber attack, with potentially extensive (and expensive) business, health and safety consequences. Protiviti’s view, especially in the current environment, is that organizations should prioritize three things to help mitigate system disruptions and attempted breaches: securing (and, where possible, limiting) remote access connections to critical industrial control systems (ICS), focusing efforts on early detection, and maintaining an effective incident response program.
So, what can we learn from the near catastrophe at Oldsmar’s water treatment plant?
Establish Baselines for Early Detection
During the second breach, at 1:30 p.m., an alert plant operator detected the unusual activity, so the plant did not have to rely on downstream safety checks that might have kicked in approximately 36 hours after the alteration to the chemicals in the water. However, the same operator had observed activity stemming from the first breach, at 8 a.m., and did not initially see any cause for alarm. Automated early detection and alerting might not have produced such an interesting story and case study, but they would have resulted in a more timely and effective response.
An emphasis on early detection shifts the dependency away from an alert engineer viewing the right screen and toward preset alert thresholds that monitor conditions continually, even behind the scenes.
A prerequisite for such an implementation is establishing baselines customized to the industry. Continuing our example of water utilities, when the sodium hydroxide level is raised by a factor of more than 100 to a clearly unhealthy range, detective controls can be triggered to alert operators to potential manipulation or even unintentional human error. The same principle applies to access patterns: some organizations that operate out of a limited set of geographic locations generate alerts and require additional user verification when remote access is initiated from an unfamiliar location. Establishing baselines for expected system traffic and user patterns drives system owners to build a better understanding of their processes and focuses detection and subsequent triage efforts on anomalies.
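To make the idea of a baseline-driven alert concrete, here is a minimal Python sketch of a check that flags a requested chemical level change. It is an illustration under stated assumptions, not a description of the Oldsmar plant's controls: the Baseline class, the check_setpoint function, and the 50 to 150 ppm range and 1.5x change limit are all hypothetical values chosen around the 100 ppm level cited above. A real utility would derive these from its own process data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Baseline:
    """Expected operating envelope for one process value (all values hypothetical)."""
    low_ppm: float
    high_ppm: float
    max_change_factor: float  # largest allowed ratio between old and new value


def check_setpoint(current_ppm: float, requested_ppm: float, baseline: Baseline) -> list[str]:
    """Return the reasons a requested change looks anomalous (empty list if none)."""
    reasons = []
    # Absolute check: is the requested value inside the expected range?
    if not baseline.low_ppm <= requested_ppm <= baseline.high_ppm:
        reasons.append("requested value is outside the baseline range")
    # Relative check: is the jump (up or down) larger than operators normally make?
    low, high = sorted((current_ppm, requested_ppm))
    if low > 0 and high / low > baseline.max_change_factor:
        reasons.append("change factor exceeds the baseline limit")
    return reasons


# Hypothetical baseline around the 100 ppm level cited in the article.
NAOH_BASELINE = Baseline(low_ppm=50.0, high_ppm=150.0, max_change_factor=1.5)

# The change described in the incident: 100 ppm to 11,100 ppm.
for reason in check_setpoint(100.0, 11_100.0, NAOH_BASELINE):
    print("ALERT:", reason)
```

Running both an absolute and a relative check means the alert fires whether the change is made by an intruder or by an operator's slip of the keyboard, and it does not depend on anyone watching the screen at that moment.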
Formalize and Communicate Incident Response Procedures
A comprehensive incident response plan (IRP) provides a proven method for system resiliency and reduces recovery times when breaches occur, and sooner or later there will be a breach. Where an IRP already exists, we strongly recommend reviewing it from an ICS perspective. Such a review would take into account operational technology (OT) systems and their operators, including them as crucial contributors to incident response and triage. Periodic tabletop exercises should be conducted in which escalation chains and team triage actions are tested against simulated events, followed by an open discussion of lessons learned that can be incorporated into the IRP. Ideally, the aim is to reduce dependence on the ICS operator’s recollection of response procedures and to train team members to know their parts and to act, or escalate, appropriately during an actual incident.
Secure Remote Access
Considering that this blog is likely being read from a home office, it is fair to say that remote access is the present, and potentially the future, of our work models. While quick and hassle-free access matters to field engineers and vendors, remote access to mission-critical OT/ICS systems needs the most stringent of controls. An organization looking to implement a solution for secure and reliable remote diagnostics should, at minimum, consider just-in-time privileges, multifactor authentication and full session recording for comprehensive event logging. If a remote access solution is already in place, periodic security assessments by reputable third-party firms will help keep it aligned with the dynamic threat landscape.
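As a rough illustration of how just-in-time privileges and multifactor authentication fit together, the following Python sketch grants remote access only after an MFA check and lets the grant expire on its own. Every name here (AccessGrant, request_access, is_active, the 30-minute default and the "vendor-a" and "hmi-1" identifiers) is hypothetical and not taken from any particular remote access product. Session recording is left out and would sit alongside this logic in a real deployment.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class AccessGrant:
    """A time-boxed permission for one user on one system (hypothetical model)."""
    user: str
    system: str
    expires_at: datetime


def request_access(user: str, system: str, mfa_verified: bool, now: datetime,
                   ttl: timedelta = timedelta(minutes=30)) -> AccessGrant:
    """Issue a short-lived grant, but only if the MFA challenge was passed."""
    if not mfa_verified:
        raise PermissionError("multifactor authentication is required")
    return AccessGrant(user=user, system=system, expires_at=now + ttl)


def is_active(grant: AccessGrant, now: datetime) -> bool:
    """A grant is usable only until its expiry time; there is no standing access."""
    return now < grant.expires_at


# Example: a vendor requests access for a diagnostic session.
now = datetime(2021, 2, 5, 13, 30, tzinfo=timezone.utc)
grant = request_access("vendor-a", "hmi-1", mfa_verified=True, now=now)
print(is_active(grant, now + timedelta(minutes=10)))  # within the window
print(is_active(grant, now + timedelta(hours=1)))     # after the window has closed
```

The design choice worth noting is that access is denied by default: a credential that leaks is of little use once the grant has expired, and a new grant cannot be issued without passing the MFA step again.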
As more details are uncovered regarding this incident, we will gain a better understanding of the system and personnel failures that allowed the breach to occur in the first place. For now, we must act swiftly to evaluate existing setups and install protective measures so that they remain just as secure and reliable in our pandemic-induced work environments. Overarching all of the above recommendations is the need for security governance and an established program for securing ICS. See our previous post, Three Steps to Build an Effective Industrial Control Systems Security Program, and our podcast on building an effective industrial control systems security program.
We can appreciate the vigilance and quick thinking of the Oldsmar water treatment plant operator, which averted a potential disaster. Even so, we must treat this incident as a cautionary tale: ultimately, robust security controls, reliable monitoring, and clear communication and escalation channels are the best way to give ICS/OT systems and personnel the processes, methods and tools to resolve a similar incident.