Question:
Modern signalling systems require different safety integrity levels for communication between different systems. Provide a system architecture that shows a signalling system including:
i) The signallers’ interface
ii) The interlocking
iii) Object controllers
iv) Line side objects
(a) Your answer should describe the safety integrity requirements for each part of the signalling system.
(b) Identify the key system interfaces and describe the communication method, including an outline of the protocol requirements and how the safety integrity level is maintained.
Abstract
Securing a safety-critical system is a challenging task, because safety requirements have to be considered alongside security controls. We report on our experience in developing a security architecture for railway signalling systems, starting from the bare safety-critical system that requires protection. We use a threat-based approach to determine security risk acceptance criteria and derive security requirements. We discuss the executed process and make suggestions for improvements. Based on the security requirements, we develop a security architecture. The architecture is based on a hardware platform that provides the resources required for safety as well as security applications and is able to run these applications of mixed criticality (safety-critical applications and other applications run on the same device). To achieve this, we apply the MILS approach, a separation-based high-assurance security architecture, to simplify the safety case and security case of our approach. We describe the assurance requirements of the separation kernel subcomponent, which represents the key component of the MILS architecture. We further discuss the security measures of our architecture that are included to protect the safety-critical application from cyberattacks.
1. Introduction
The integration of commercial off-the-shelf (COTS) hardware and software into industrial control systems such as railway command and control systems (CCSs) is in progress. However, introducing COTS components into a previously proprietary safety system leads to novel security threats. The interplay between safety and security is an active research area, where many questions are yet to be answered. An extensive survey of approaches to combine safety and security in industrial control systems has been performed by Kriaa et al. [1]. Our study of the safety-security interplay is motivated by the lack of a security architecture for railway signalling.
Current train control is centralized in a signal box, also called an interlocking system (ILS), that controls a defined area of the tracks comprising multiple track switches (points) and signals. An example with a single point and signal is shown in Figure 1. If a train needs to move on the tracks, the ILS sets the points according to the desired route. If the movement of the train is considered safe, the ILS sets the signal for the route to clear. The aspect of the signal is observed by the train driver, who is then allowed to proceed safely on the journey. A route is considered safe for a train if it is not occupied by or reserved for another train, thus precluding collisions.
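The route-setting rule described above can be illustrated with a minimal Python sketch. It is a toy model, not an actual interlocking implementation: the class `Interlocking`, the section names, and the train identifiers are hypothetical, and real interlockings also check point positions, flank protection, and overlaps, which are omitted here.

```python
from dataclasses import dataclass, field


@dataclass
class Interlocking:
    """Toy interlocking that tracks occupied and reserved track sections."""
    occupied: set = field(default_factory=set)    # sections with a train on them
    reserved: dict = field(default_factory=dict)  # section -> train holding it

    def route_is_safe(self, train: str, route: list) -> bool:
        # A route is safe if no section is occupied by, or reserved for,
        # another train.
        for section in route:
            if section in self.occupied:
                return False
            if self.reserved.get(section, train) != train:
                return False
        return True

    def request_route(self, train: str, route: list) -> str:
        """Return the signal aspect shown for the requested route."""
        if not self.route_is_safe(train, route):
            return "stop"  # unsafe orders are discarded
        for section in route:
            self.reserved[section] = train
        return "clear"


ils = Interlocking()
print(ils.request_route("T1", ["A", "B"]))  # clear
print(ils.request_route("T2", ["B", "C"]))  # stop
```

The second request is refused because section B is already reserved for T1, which is the collision-precluding condition stated in the text.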
Points, signals, and other controllable objects are summarized under the term field elements. Earlier ILS generations used analogue signal transmission to set their field elements. Modern ILSs utilize IP networks to transmit their commands digitally to an object controller (OC) that in turn steers the field element by starting and stopping the point machine or turning the signal’s light bulbs on and off, respectively. This allows for decoupling energy supply and command transmission and thus for larger distances between ILS and field element. Since railway signalling networks are classified as critical infrastructure (CI), the impact of potential security incidents on the railway system can be huge. This calls for a security concept to ensure the robustness of railway signalling networks against cyberattacks. Furthermore, the railway system as a CI must meet national safety regulations. To address these issues, we execute a railway domain-specific requirements engineering process that has been proposed for German railways [2] but can be used as a template for international railway operation. The output of this process provides the foundation of a security architecture, which we propose for railway CCS.
Our contribution consists of the following. We report on our experience with the requirements engineering process of DIN VDE V 0831-104 and make suggestions for improvements. Then, we investigate the effect of these requirements on our case study, the safety-critical railway CCS. We show and discuss the derived security requirements for the case study. Subsequently, we use the identified threats and requirements to propose a security architecture for mixed-criticality systems such as the railway CCS, which runs safety-critical applications alongside non-safety-critical security applications. A mixed-criticality system must be carefully designed in order to maintain properties such as dependability and responsiveness under constrained resources and in the presence of attackers [3]. We propose the Secure Object Controller (SecOC), a security architecture based on a hardware platform that includes a hardware trust anchor. On top of the hardware, a separation kernel (SK) provides a software framework that allows running applications of mixed criticality on our platform. On the software platform, safety and security applications coexist. The security applications protect the safety-critical application from cyberattacks. Complementarily, the SK ensures that the security applications cannot exhaust the resources required by the safety application to fulfill its safety-critical task. Additionally, to enhance the security of the system, we apply security measures. An authenticated boot process uses the hardware trust anchor to ensure that only authorized software instances are executed on the hardware platform. A health monitor observes the system state during runtime and can report conspicuous state changes. A secure update mechanism allows for altering the system software, firmware, and configuration from authorized sources only. We further discuss our approach to handle assurance of the mixed-criticality system towards multiple certification authorities. Also, we show how the proposed architecture fulfills the requirements we derived.
The paper is organized as follows. Section 2 presents the system model. Section 3 details the conducted threat analysis. The elicited security requirements are presented in Section 4. Section 5 discusses the shortcomings of the conducted process as well as the suggested enhancements. Section 6 describes and discusses the proposed security architecture. Section 7 justifies that our security architecture fulfills the security requirements. Section 8 concludes the paper.
2. System Model
Our system model is a general railway signalling architecture composed of the interlocking layer and the field element layer, as depicted in Figure 2.
The interlocking layer encompasses the ILS and a maintenance and data management system (MDM). The MDM is responsible for updating the software of the components in the interlocking and field element layers, for time synchronization, and for the collection of diagnostic data mandated by legal requirements. The heart of the signalling network is the ILS. It is responsible for issuing commands to the field elements of the lower architectural layer to execute the orders of the traffic supervisor, who is not considered in our system model. The ILS guarantees safe train operation by discarding unsafe orders, e.g., orders that would lead to the collision or derailment of trains.
On the lower layer, point machines are driven to switch tracks and clearances are communicated to train drivers via light signals. The commands to the OCs are sent through the network using specialized railway protocols. The OCs, located in junction boxes close to the tracks, switch the power of their corresponding field element (points, signals). There is a one-to-one relation between field element and OC. Thus, OCs are spatially distributed with limited physical protection by junction boxes and are therefore accessible by an attacker. In contrast, physical access to the components of the interlocking layer is more difficult to obtain, as these components are subject to physical access control in a building.
The layers are connected via a network that is considered untrusted from a security perspective, because the railway operator neither has full control over its physical layout nor can guarantee its impenetrable protection. As a result, packets might be dropped, changed, rerouted, or read by an unknown third party.
The depicted building blocks are located in housings (indicated by the dashed line in Figure 2). This fact is important for the subsequent security analysis, because of the influence the housing can have on the attacker’s capabilities to access the components. It protects the internal network of the interlocking layer and the enclosed objects such that certain malicious actions, e.g., physical penetration, become infeasible.
According to DIN VDE V 0831-104 [2], we split the system model into logical zones featuring components that are assumed to have equal security requirements. We identify three zones, named Z.ILS, Z.MDM, and Z.OC, shown in Figure 2. Our definition of zones includes the list of objects in the zone, the logical and physical borders, the data flows between the zones, the interfaces (Ethernet), and the physical connections of each zone. In our system definition, zones Z.ILS and Z.MDM reside in the same building and are instantiated only once per railway station. Many zones of type Z.OC exist in a single railway station and have weaker housing compared to Z.ILS and Z.MDM. The number of Z.OC is determined by the number of field elements controlled by a railway station and can reach up to 250 entities.
We use the system model consisting of the three zones and their interconnections as input for the next process step where we analyse the threats to the system.
3. Risk Analysis
To derive security requirements, we follow the German prestandard DIN VDE V 0831-104 for security in railway signalling networks [2]. It is designed as a guideline for applying the IEC 62443 framework for industrial control system security to railway signalling, while complying with relevant railway safety standards such as EN 50128, EN 50129, and EN 50159.
In this section, we detail the conducted risk analysis, considering each of the zones of the system model (see Section 2) along with its components, their functions, the relevant data, and the communication between the components.
From a safety perspective, risk is expressed as the product of the probability that an event occurs and the loss associated with the event. This approach has proven to be impractical for security analyses [4]. Attacks on security do not follow a probabilistic model of occurrence, because they are intentional. Moreover, even if security attacks could be characterized by a stochastic process, probabilities could not be easily determined due to the lack of a sufficient number of samples and a constantly changing attacker landscape. Therefore, we conduct an explicit risk analysis, following one of the approaches proposed by DIN VDE V 0831-104, by using the presented system model.
3.1. Attacker Model
In order to describe the capabilities of an attacker, DIN VDE V 0831-104 uses a set of qualitative dimensions: resource, knowledge, and motivation. The resource dimension reflects the financial and workforce capacity of the attacker to prepare and launch an attack. The knowledge dimension describes what she knows about the system before attacking it and can use to create opportunities for a successful offensive. These dimensions are characterized by numerical values from low (2) over medium (3) to high (4), where attackers with basic capabilities are not considered [2]. Thus, generic knowledge (K = 2) comprises publicly available information such as protocol specifications and COTS hardware. An attacker with extended knowledge (K = 4) has access to some closed information usually only available to a small circle of experts (insiders) working with the system. Low financial resources (R = 2) comprise a few thousand Euro, while extended financial capacity (R = 4) provides resources in the magnitude of state actors.
In order to describe the attacker’s motivation, DIN VDE V 0831-104 introduces railway-specific mitigation factors related to the risk for the attacker of being discovered. The higher this risk for a particular attack, the lower the motivation to carry it out. This way, existing security controls can be taken into account.
The standard considers the following mitigation factors: location (LOC), traceability (TRA), and extent of attack (EXT) [2]. LOC determines whether an attack can be executed remotely (LOC = 0) or only with physical access to the system (LOC = 1). Remote attacks are considered to be more dangerous, as the chance for the attacker to remain undiscovered is higher. Similarly, if an attack can be traced to the attacker (TRA = 1), it is less dangerous than an untraceable attack (TRA = 0). EXT denotes the potential damage of an attack, which can be either critical (EXT = 1) or serious, in case fatalities might be involved (EXT = 0).
This attacker model is applied during the risk assessment to evaluate each potential threat and determine the security level required to protect the system against it (see Section 3.2). Though this approach does not explicitly employ traditional attacker categories including, e.g., basic user, cybercriminals, terrorists, insiders, or nation state [5], it uses the same idea to describe attacker’s capabilities.
In the specific context of this paper involving safety aspects, the only relevant goal of the attacker is to cause collisions or derailment of trains. This goal can also be achieved by simply blocking the tracks, exchanging the colors of a light signal, or detaching the light signal from the OC and applying current, so that the signal shows clear when it should show stop. For example, political activists from Greenpeace are reported to cause disruptions by sabotaging the railway to prevent the transportation of nuclear waste (http://www.greenpeace.org/international/en/news/Blogs/nuclear-reaction/greenpeace-block-nuclear-waste-transport/blog/35204/). Simple physical attacks of this kind, targeted against individual field elements, have always been possible and cannot be fully prevented by IT security mechanisms.
The digitalization in turn introduces much broader adversarial opportunities due to connectivity and usage of regular IT components. Thus, our goal is to protect the railway signalling network against one-affects-all attacks that can be applied to a multitude of field elements at the same time and can paralyze the complete infrastructure. Additionally, through the use of COTS products, attacks can become remotely executable in case the attacker finds a way to access the network shown in Figures 1 and 2. In contrast to such scenarios, physical attacks are much more restricted. As the hardware belonging to Z.ILS and Z.MDM is set up in a building with physical access control, it is reasonable to assume that an attacker cannot gain physical access to this hardware to modify or even replace the components. The OCs in Z.OC are installed in a junction box and are therefore physically accessible for the attacker to tamper with the hardware or replace it entirely. However, we assume that physically tampering with an OC is equivalent to the local attacks described before (blocking the tracks; changing the colors of a light signal). They do not scale to a one-affects-all attack, because each field element has to be attacked individually. Also, it is considered virtually impossible to protect against such physical attacks in a large-scale infrastructure like nationwide railway signalling [6, 7]. For this reason, protection against high-impact one-affects-all attacks is the main scope of this paper.
3.2. Threat Analysis and Preliminary Security Level
We perform a threat-based risk analysis as defined in DIN VDE V 0831-104 by systematically listing cyberattacks that threaten the components presented in our system model. This approach allows us to consider typical cyberattacks as well as infrastructure-specific and railway-specific attacks. As a result of the threat analysis, 67 threats are defined.
Some examples are given in Table 1. The table also shows the underlying attacker model of DIN VDE V 0831-104 as explained in the previous section.
Each threat was given an identifying name and a detailed description, where the description provides details about attack implementation and potential impact. For example, T.SI.Attacker.Malware describes an attacker who introduces malware to undermine the integrity of the railway CCS. A typical denial-of-service (DoS) attack is covered by the threat named T.RA.Attacker.DoS. A threat more typical for railway CCS describes an attacker that records and analyses the traffic of a signalling network in order to prepare a more sophisticated attack (T.DC.Attacker.TrafficAnalysis).
The threats are assigned to at least one of the foundational requirements (FRs) of IEC 62443, namely identification and authentication control (IAC), use control (UC), system integrity (SI), data confidentiality (DC), restricted data flow (RDF), timely response to events (TRE), and resource availability (RA).
In addition, each threat was assigned to the zones to which it is applicable. Out of all the 67 threats, 51 are applicable to Z.OC, 51 are applicable to Z.ILS and 47 are applicable to Z.MDM. While 38 threats are relevant for all the three zones (i.e., executable on different components), seven are applicable to exactly two zones, and 22 are specific to a single zone (i.e., exploiting the characteristics of the respective zone).
To estimate the risk imposed by a given threat, we consider the attacker capabilities for this threat in accordance with the attacker model introduced in Section 3.1. The attacker’s resource and knowledge capabilities are combined to form a preliminary security level (PSL) related to the given threat. The PSL is later used to calculate the final security level (SL) for the respective threat. An SL, ranging from 1 (low) to 4 (high), describes the level of protection a system provides against an attacker.
During the execution of the risk assessment process, each threat is assigned values for the attacker capabilities (R, K), from which its PSL is calculated, and values for the mitigation factors (LOC, TRA, EXT), which are used to adjust that PSL. Examples are shown in Table 1.
3.3. Calculation of Security Levels
As defined by DIN VDE V 0831-104 [2], the final SL is calculated from the PSL and the mitigation factors by
$$\mathrm{SL} = \mathrm{PSL} - \max\{\mathrm{LOC}, \mathrm{TRA}, \mathrm{EXT}\} \qquad (1)$$
According to the equation, the value of the PSL is reduced by one if any of the mitigation factors equals one (corresponding to a logical “or”). This means that for a threat that can only be executed locally (LOC), is traceable (TRA), or has only a critical extent (EXT; critical being the less severe outcome here), the PSL is reduced by one level to form the SL.
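The rule described above (the PSL is lowered by one level as soon as any mitigation factor equals one) can be written as a short Python sketch. The `Threat` class and its field names are illustrative assumptions; the values for T.SI.Attacker.Malware are those discussed in this section, while `T.Example.LocalOnly` is a hypothetical threat added only to show the reduction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threat:
    name: str
    psl: int  # preliminary security level
    loc: int  # 1 = physical access required, 0 = remotely executable
    tra: int  # 1 = traceable to the attacker, 0 = not traceable
    ext: int  # 1 = critical extent, 0 = serious extent


def security_level(threat: Threat) -> int:
    """SL = PSL - max(LOC, TRA, EXT)."""
    factors = (threat.loc, threat.tra, threat.ext)
    if any(f not in (0, 1) for f in factors):
        raise ValueError("mitigation factors must be 0 or 1")
    # max() over binary factors acts as a logical "or".
    return threat.psl - max(factors)


malware = Threat("T.SI.Attacker.Malware", psl=4, loc=0, tra=0, ext=0)
local_only = Threat("T.Example.LocalOnly", psl=3, loc=1, tra=0, ext=0)  # hypothetical

print(security_level(malware))     # 4
print(security_level(local_only))  # 2
```

Because the factors are combined with a maximum rather than a sum, a threat that is both local and traceable is still reduced by only one level.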
This calculation is done for each of the 67 identified threats in the seven FRs. To calculate the SL value for the three zones, each is assigned a vector of seven values corresponding to the respective FRs, as shown in Table 2. The seven entries are determined by the maximum SL value over the threats assigned to the respective zone and FR. For each zone, the FR with the greatest SL determines the security level of the zone (set italic in Table 2). For simplicity, we write SLx when we refer to an SL vector with maximum entry x. In this way, the SL vector of Z.OC yields SL4 for the zone. Likewise, the vectors of Z.ILS and Z.MDM yield SL4.
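The aggregation from per-threat SLs to a zone's SL vector can be sketched as follows. This is an illustration under stated assumptions: only the SL of 4 for T.SI.Attacker.Malware in all three zones comes from the text; the SLs and zone assignments of the other two threats are hypothetical, and an FR without any assigned threat is given the value 0 here, which the source does not specify.

```python
FOUNDATIONAL_REQUIREMENTS = ("IAC", "UC", "SI", "DC", "RDF", "TRE", "RA")

# (threat name, FRs, applicable zones, SL)
THREATS = [
    # SL4 in all three zones, as discussed in Section 3.3
    ("T.SI.Attacker.Malware", ("SI",), {"Z.ILS", "Z.MDM", "Z.OC"}, 4),
    # SL and zones below are hypothetical placeholders
    ("T.RA.Attacker.DoS", ("RA",), {"Z.ILS", "Z.MDM", "Z.OC"}, 3),
    ("T.DC.Attacker.TrafficAnalysis", ("DC",), {"Z.OC"}, 2),
]


def zone_sl_vector(threats, zone):
    """Per FR, take the maximum SL over the threats assigned to the zone."""
    vector = {fr: 0 for fr in FOUNDATIONAL_REQUIREMENTS}  # 0 = no threat (assumption)
    for _name, frs, zones, sl in threats:
        if zone in zones:
            for fr in frs:
                vector[fr] = max(vector[fr], sl)
    return [vector[fr] for fr in FOUNDATIONAL_REQUIREMENTS]


def zone_sl(vector):
    """The greatest entry of the vector determines the zone's security level."""
    return max(vector)


oc_vector = zone_sl_vector(THREATS, "Z.OC")
print(oc_vector)           # [0, 0, 4, 2, 0, 0, 3]
print(zone_sl(oc_vector))  # 4
```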
We identify three decisive threats that determine the security levels. We use them to exemplify how the attacker capabilities and mitigation factors lead to the SL. Two of the decisive threats are responsible for the value of 4 in all three zones. The first decisive threat is the remote execution of malware on the systems in our reference architecture (T.SI.Attacker.Malware, see Table 1). We assigned an attacker with moderate resources (R = 3) and extended knowledge (K = 4) to it, resulting in PSL = 4 for the threat. The assessment of the mitigation factors resulted in the following values: the threat description implies that it is remotely executable (LOC = 0). A skilled attacker is assumed to be able to hide the traces, such that we consider the threat as not traceable (TRA = 0). Also, an attacker with deep knowledge (K = 4) is capable of performing a carefully targeted attack with potentially serious extent (EXT = 0). Thus, none of the mitigation factors apply to reduce the final SL. Subsequently, plugging the mitigation factors and the PSL into (1) yields SL4.
The second decisive threat describes the manipulation of patches such that legitimate processes execute malicious code chosen by the attacker when the patches are rolled out to devices through update mechanisms. Considerations for attacker capabilities and mitigation factors analogous to those in the previously discussed case apply to this threat, leading to SL4.
Only for zone Z.MDM is there a third decisive threat that results in SL4. It covers the manipulation of data on the MDM, where the firmware of the ILS and OCs is stored and from which it is distributed. This threat poses a high risk, because the firmware can be manipulated remotely at a central point from which it is distributed to unsuspecting network entities. Without further checks, the ILS and OCs of an entire station’s signalling network can be compromised. Again, we consider an attacker with R = 3 and K = 4 to perform this attack and could not identify an applicable mitigation factor (LOC = 0, TRA = 0, and EXT = 0). Hence, the analysis of this threat also leads to SL4 for zone Z.MDM.
4. Security Requirements Elicitation
After having derived the SL of each zone, the specific system requirements can be retrieved from IEC 62443-3-3. The standard contains a list of 100 system requirements that are applicable to each zone depending on the identified SL. We evaluated the SL for each FR to select the security requirements from the IEC 62443-3-3 standard. As a result of our risk analysis, we found 69 system requirements that are relevant for our system model.
The requirements of IEC 62443 are on a generic level, as it is a standard for industrial automation and control systems (IACSs). In order to reduce the complexity of handling 69 relevant requirements and to conserve the railway-specific knowledge gathered while specifying the threats, we choose to explore an additional approach to derive security requirements. For this approach, we elicit requirements with a methodology suggested by Myagmar et al. [8]. The authors propose to take the outcome of the conducted threat analysis as a basis and transform each identified threat into a requirement. Deriving requirements in addition to IEC 62443 opens the possibility of responding to railway-specific factors that cannot be reflected in an IACS standard like IEC 62443. Examples of such railway-specific requirements are R1, R11, and R13, which are shown and discussed later in this section.
In order to derive requirements, we investigate corresponding threats and formulate a statement indicating which behaviour the system must or must not exhibit in order to prevent the respective threat [8]. We mark the threat as covered by the respective requirement. Alternatively, we declare that the required behaviour was already contained in a previously found statement and mark the threat as covered by that existing requirement. We repeat the addition of requirements until every threat has at least one corresponding requirement. Subsequently, we derive the following security requirements, which are explained in more detail afterwards:
R1. The system shall detect unauthorized physical access to its subsystems and/or prevent relevant exploitations of physical access.
R2. The system shall not allow the compromise of a communication key.
R3. The system shall not disclose classified or confidential data (such as access credentials) to any illegitimate user.
R4. The system shall exclude compromised endpoints from communication.
R5. The system shall not use insecure transfer methods.
R6. The system shall not allow any unauthorized user to access an endpoint (e.g., MDM, ILS, and OC).
R7. The system shall not allow unauthorized and unauthenticated communication between endpoints.
R8. The system shall not violate the runtime behaviour requirements.
R9. The system shall allow for the updating of security mechanisms, credentials, and configurations in order to patch known vulnerabilities.
R10. The system shall not allow the execution of unauthorized software instances.
R11. The system shall maintain the transmission system requirements defined in EN 50159.
R12. The system shall provide mechanisms to detect an undesirable system state change and anomalies.
R13. The system shall impede that an unauthorized user can force it into one of the fall-back levels defined by the railway safety process.
R14. The system shall maintain the integrity of software, firmware, configuration, and hardware.
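The stopping criterion of this elicitation step (every threat has at least one corresponding requirement) can be checked mechanically with a small traceability map. The sketch below is illustrative only: the threat names are taken from Section 3.2, but the assignment of threats to requirements is a hypothetical example, not the mapping actually produced in the analysis.

```python
def uncovered_threats(threats, coverage):
    """Return the threats that no requirement covers yet.

    threats:  iterable of threat identifiers
    coverage: dict mapping a requirement id to the set of threats it covers
    """
    covered = set().union(*coverage.values())
    return sorted(set(threats) - covered)


threats = [
    "T.SI.Attacker.Malware",
    "T.RA.Attacker.DoS",
    "T.DC.Attacker.TrafficAnalysis",
]

# Hypothetical threat-to-requirement assignment, for illustration only.
coverage = {
    "R10": {"T.SI.Attacker.Malware"},
    "R14": {"T.SI.Attacker.Malware"},
    "R8": {"T.RA.Attacker.DoS"},
}

print(uncovered_threats(threats, coverage))  # ['T.DC.Attacker.TrafficAnalysis']

# Adding (or reusing) a requirement for the remaining threat ends the loop.
coverage["R3"] = {"T.DC.Attacker.TrafficAnalysis"}
print(uncovered_threats(threats, coverage))  # []
```

Keeping the map in this direction (requirement to threats) also makes it visible when one requirement covers several threats, which is the "already contained in a previously found statement" case.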
In the following, we discuss those requirements that are specific to railways as they could violate the safety constraints posed by the domain standards.
The physical access detection required by R1 is especially relevant for the railway domain, because OCs are spatially distributed over large areas next to railway tracks and cannot be as well protected as, for example, within factory premises. Therefore, unauthorized physical access to the junction box of the OC has at least to be detected such that further actions can be triggered by, e.g., a security operations center to avoid or mitigate consequences.
In order to keep a railway station operational, R4 requires that compromised endpoints (OCs) can be excluded from the network, such that benign OCs are not affected. An endpoint is considered compromised, if an attacker can remotely control the safety functionality, because the OC accepts the attacker’s commands or the attacker controls the safety-critical software on the endpoint. The disclosure of cryptographic keys belonging to an endpoint renders it compromised as well. Physical access to the OC’s hardware constitutes a compromise that can be detected if the junction box is opened or the OC is removed from the rack. Physical access to the steered field element is not a compromise as this attack is already possible in current railway infrastructures without digital components. However, physical attacks of this kind do not scale to multiple OCs, if no shared secrets can be gained by the attacker.
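The compromise criteria in the paragraph above can be expressed as a simple classification step in front of the exclusion required by R4. The following Python sketch is an assumption-laden illustration: the event names, the `StationNetwork` class, and the endpoint identifiers are hypothetical, and how events are detected and how exclusion is enforced in the network are out of scope.

```python
from enum import Enum, auto


class Event(Enum):
    REMOTE_CONTROL_OF_SAFETY_FUNCTION = auto()
    KEY_DISCLOSED = auto()
    JUNCTION_BOX_OPENED = auto()
    OC_REMOVED_FROM_RACK = auto()
    FIELD_ELEMENT_TAMPERED = auto()


# Events that render an endpoint compromised, following the text above.
# Tampering with the steered field element is deliberately not included.
COMPROMISING = {
    Event.REMOTE_CONTROL_OF_SAFETY_FUNCTION,
    Event.KEY_DISCLOSED,
    Event.JUNCTION_BOX_OPENED,
    Event.OC_REMOVED_FROM_RACK,
}


class StationNetwork:
    """Keeps track of which endpoints may still take part in communication."""

    def __init__(self, endpoints):
        self.active = set(endpoints)
        self.excluded = set()

    def report(self, endpoint, event):
        """Exclude the endpoint if the event is a compromise; return True if excluded."""
        if event in COMPROMISING and endpoint in self.active:
            self.active.remove(endpoint)
            self.excluded.add(endpoint)
        return endpoint in self.excluded


net = StationNetwork(["OC-1", "OC-2", "OC-3"])
net.report("OC-2", Event.KEY_DISCLOSED)
net.report("OC-3", Event.FIELD_ELEMENT_TAMPERED)
print(sorted(net.active))    # ['OC-1', 'OC-3']
print(sorted(net.excluded))  # ['OC-2']
```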
Requirement R5 excludes transfer methods such as network protocols that involve cryptographic functions whose use is discouraged by institutions like NIST or national agencies for information security. The requirement enforces the usage of communication protocols that do not employ cryptography that is considered broken.
Due to safety reasons, railway signalling networks require a failure disclosure time, which is addressed by R8. Any security measure that influences the network traffic (e.g., message encryption) must not exhaust the network resources such that the network latency exceeds a threshold of 50 ms as specified by railway safety standards and railway operator specifications.
A common requirement from both security and safety perspectives is the robustness of a transmission system against repetition, deletion, insertion, resequencing, corruption, delay, and masquerade of messages. This is specified by EN 50159, a European standard for safety-related communication in transmission systems required to receive admission to operate a railway system. R11 ensures that these requirements are fulfilled and also considered in the design of a security architecture to avoid fulfilling a requirement twice. Some security functionalities might already be available in the system due to fulfilling EN 50159 or can be established by adding only minimal features.
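To make the listed message threats more concrete, the following Python sketch shows how a receiver could detect them with a sender identifier, a sequence number, a timestamp, and a keyed message authentication code. It is a simplified illustration and neither the normative content of EN 50159 nor an existing railway protocol: the message layout, the `seal` function, the `Receiver` class, the shared key, and the 50 ms freshness window are all assumptions made for this example.

```python
import hashlib
import hmac
import json

FIELDS = ("sender", "seq", "sent_at", "payload")


def _tag(key: bytes, body: dict) -> str:
    encoded = json.dumps(body, sort_keys=True).encode()
    return hmac.new(key, encoded, hashlib.sha256).hexdigest()


def seal(key: bytes, sender: str, seq: int, sent_at: float, payload: str) -> dict:
    """Build a message with sender id, sequence number, timestamp, and MAC."""
    body = {"sender": sender, "seq": seq, "sent_at": sent_at, "payload": payload}
    return {**body, "tag": _tag(key, body)}


class Receiver:
    def __init__(self, key: bytes, expected_sender: str, max_delay_s: float):
        self.key = key
        self.expected_sender = expected_sender
        self.max_delay_s = max_delay_s
        self.next_seq = 0

    def check(self, msg: dict, now: float) -> str:
        body = {name: msg[name] for name in FIELDS}
        # The MAC covers all fields: detects corruption, insertion, masquerade.
        if not hmac.compare_digest(_tag(self.key, body), msg.get("tag", "")):
            return "reject: corruption or masquerade"
        if msg["sender"] != self.expected_sender:
            return "reject: unexpected sender"
        # Timestamp: detects delayed messages.
        if now - msg["sent_at"] > self.max_delay_s:
            return "reject: delay"
        # Sequence number: detects repetition, resequencing, and deletion.
        if msg["seq"] < self.next_seq:
            return "reject: repetition or resequencing"
        if msg["seq"] > self.next_seq:
            return "reject: deletion (gap in sequence)"
        self.next_seq += 1
        return "accept"


key = b"demo-key-not-for-production"
rx = Receiver(key, expected_sender="ILS", max_delay_s=0.05)
first = seal(key, "ILS", 0, 100.0, "point-1:left")
print(rx.check(first, now=100.01))  # accept
print(rx.check(first, now=100.02))  # reject: repetition or resequencing
```

In this sketch a gap in the sequence is simply rejected; a real transmission system would additionally need a recovery procedure such as retransmission or a transition to a safe state.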
During the design of the security architecture, it must be considered that railway signalling has processes that take effect in case of technical failure in order to maintain operation of the railway system. These fall-back processes involve human interaction and do not provide the full extent of safety and capacity compared to a fully functional, automated interlocking in terms of failure rate. This risk is covered by R13, which requires the security architecture to be designed in such a way that it does not allow an attacker to force the interlocking system into a fall-back state. The attack surface should not be increased by the security architecture compared to attacks with the same effect already possible today, e.g., physically destroying a cable. A security architecture that can force the safety system into a fall-back state enables the attacker to remotely implement a large-scale DoS attack causing major disruption in the railway transportation system.