Clients Choosing Not to Implement Fault Tolerance
iLink 2.X does not mandate that client systems use the fault tolerance feature; however, CME Group strongly recommends using this functionality. Clients who do not implement fault tolerance cannot dynamically recover from process or network failures.
CME Group does not recommend running redundant processes on the same machine because if a machine fails, all the processes running on it fail simultaneously.
iLink 2.X designates one host as primary and another as backup. Customers must successfully log on to the primary before attempting to log on to the backup. If a customer logs on to the backup gateway without already being logged on to the primary gateway, the client system receives a Logout (tag 35-MsgType=5) message with tag 58 set to "Invalid logon. Must be logged on to Primary. Logout forced."
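The rejection described above can be recognized on the client side with a small check. The following is a minimal sketch, not a definitive implementation: the helper names `parse_fix` and `is_primary_required_logout` are hypothetical, and matching on a substring of tag 58 is an assumption about how a client might choose to detect this case.

```python
SOH = "\x01"  # standard FIX field delimiter


def parse_fix(raw: str) -> dict:
    """Split a raw FIX string into a tag -> value mapping (ignores repeating groups)."""
    fields = {}
    for part in raw.split(SOH):
        if not part:
            continue  # skip the empty piece after a trailing delimiter
        tag, _, value = part.partition("=")
        fields[tag] = value
    return fields


def is_primary_required_logout(raw: str) -> bool:
    """True if the message is a Logout caused by logging on to the backup first.

    Assumption: a substring match on tag 58 is sufficient to identify the case.
    """
    fields = parse_fix(raw)
    return fields.get("35") == "5" and "Must be logged on to Primary" in fields.get("58", "")


# Illustrative message containing only the two tags discussed in the text.
sample = "35=5" + SOH + "58=Invalid logon. Must be logged on to Primary. Logout forced." + SOH
needs_primary_first = is_primary_required_logout(sample)
```

A client that sees this Logout would typically log on to the primary gateway first and only then retry the backup connection.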
Logon Procedure with Fault Tolerance
When the client sends a Logon message, the Fault Tolerance Indicator (FTI) in tag 49-SenderCompID must be set to 'U' for undefined. Tag 49-SenderCompID and tag 56-TargetCompID are 7 characters long and are composed of 3 sub-fields:
Beginning of Week Logon and Mid-Week Logon (tag 35-MsgType=A) messages must be sent with the FTI in tag 49-SenderCompID set to 'U'. If the client application submits a Logon (tag 35-MsgType=A) message and the FTI is not set to 'U', a Logout (tag 35-MsgType=5) message is issued and the connection is dropped. Because In-Session Logon messages may be sent only on the primary channel, the FTI must be set to 'P'.
Based on the value of the FTI contained in tag 56-TargetCompID, the client application must populate the FTI in tag 49-SenderCompID with the same value for all outgoing messages.
The client application must acknowledge that it has successfully received and processed the FTI instruction from iLink 2.X by sending the FTI in tag 49-SenderCompID for each message to CME Globex.
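The mirroring rule above can be sketched as follows. This is an illustration under stated assumptions: the FTI is assumed to occupy the last character of the 7-character ID (the sub-field layout is not listed in this section), and the function names and the sample ID `ABC123U` are hypothetical.

```python
VALID_FTI = ("U", "P", "B")  # undefined, primary, backup


def fti_of(comp_id: str) -> str:
    """Return the FTI of a 7-character CompID (assumed to be its last character)."""
    if len(comp_id) != 7:
        raise ValueError("CompID must be 7 characters long")
    fti = comp_id[-1]
    if fti not in VALID_FTI:
        raise ValueError(f"unexpected FTI {fti!r}")
    return fti


def with_fti(comp_id: str, fti: str) -> str:
    """Return comp_id with its FTI position replaced by fti."""
    if len(comp_id) != 7:
        raise ValueError("CompID must be 7 characters long")
    if fti not in VALID_FTI:
        raise ValueError(f"unexpected FTI {fti!r}")
    return comp_id[:-1] + fti


def next_sender_comp_id(current_sender: str, incoming_target: str) -> str:
    """Copy the FTI from an incoming tag 56 into the outgoing tag 49."""
    return with_fti(current_sender, fti_of(incoming_target))


# Logon was sent with 'U'; the Logon Confirmation assigns 'P' in tag 56.
sender = next_sender_comp_id("ABC123U", "ABC123P")
```

Applying this on every incoming message keeps the outgoing SenderCompID in step with whatever status iLink 2.X last assigned.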
Application messages (e.g., New Order - Single, Order Cancel/Replace Request) must be sent only through the primary content stream, where sequencing is enforced per the FIX 4.2 protocol.
Communication over the backup is solely for link maintenance. Only administrative messages (Logon, Logout, Heartbeat and Test Request) are sent through the backup. Sequencing on the backup is not enforced; message sequence numbers in the administrative messages are zero.
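The split between the primary and backup streams can be expressed as a pre-send check. The sketch below is illustrative only; `may_send` and `outgoing_seq_num` are hypothetical helper names, and the MsgType values are the standard FIX 4.2 codes for the four administrative messages listed above.

```python
# FIX 4.2 MsgType (tag 35) values: Logon, Logout, Heartbeat, Test Request.
BACKUP_ALLOWED = {"A", "5", "0", "1"}


def may_send(fti: str, msg_type: str) -> bool:
    """Return True if msg_type may be sent on a connection with the given FTI."""
    if fti == "P":
        return True  # primary carries application and administrative messages
    if fti == "B":
        return msg_type in BACKUP_ALLOWED  # backup is for link maintenance only
    raise ValueError("expected an assigned FTI of 'P' or 'B'")


def outgoing_seq_num(fti: str, next_primary_seq: int) -> int:
    """Sequence number to place on an outgoing message."""
    if fti == "P":
        return next_primary_seq  # sequencing is enforced on the primary
    if fti == "B":
        return 0  # administrative messages on the backup carry zero
    raise ValueError("expected an assigned FTI of 'P' or 'B'")


order_allowed_on_backup = may_send("B", "D")  # 'D' is New Order - Single
```

Running such a check before each send helps avoid putting an application message on the backup connection.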
Examples of Fault Tolerance Scenarios
Client System Sends FTI Status of 'U' for Beginning of Week or Mid-Week Logon
The following diagram illustrates how member processes of a client application fault-tolerant group connect to CME Globex. In this example, both client member processes send Logon messages with the FTI set to 'U' in tag 49-SenderCompID.
Application Message Sent Over a Backup Connection
In the following diagram:
Backup Client System Sends Incorrect FTI
In the following diagram, the client application is logged on successfully and is designated as a backup by iLink 2.X:
Client System FTI Status Assigned as Primary or Backup
The following messaging scenario shows Client System 1 as Primary.
This messaging scenario shows Client System 2 as Backup.
Client System Process Complies with FTI Instruction
In the following diagram, a client application acknowledges that it has successfully processed the FTI instruction by populating the FTI in the SenderCompID for each outgoing message:
iLink 2.X Assigns FTI Status
The following diagram illustrates how iLink 2.X assigns fault tolerance status:
- As a client application is authenticated, iLink 2.X dynamically assigns the fault tolerance status and populates the FTI with a 'P' or 'B' in the TargetCompID of the Logon Confirmation message.
- As all the client member processes receive and process the FTI, the fault tolerance status of the client application fault tolerant group members is fully determined.
In the following diagram, the client system is logged on successfully and is designated as the primary by iLink 2.X:
- The primary client system sends an iLink 2 Heartbeat message with an incorrect FTI in tag 49-SenderCompID. It sets its FTI to 'B' instead of 'P'.
- As a result, iLink 2.X logs the client system out.
- When the primary client system is logged out, all the backup systems are also logged out.
Fault Tolerance Error Conditions
iLink 2.X detects seven categories of error conditions, described as follows.
Client Primary Process Failure
If iLink 2.X does not receive any messages from the primary client process within the defined heartbeat interval:
- CME Globex sends an iLink 2 Test Request message to invoke an iLink 2 Heartbeat message from the client.
- If the primary client process does not respond with a Heartbeat message to the Test Request message within the defined heartbeat interval (or if the client does not send any message during the entire interval), iLink 2.X designates the primary client process as failed and initiates failover.
- The primary client application is disconnected from iLink 2.X.
- One of the backup client applications is chosen to communicate over a new primary channel.
- The backup client application learns of this fault tolerance status change by examining the FTI in the TargetCompID of the next incoming message.
- If the primary client process fails without closing the TCP connection, iLink 2.X takes two heartbeat intervals to detect the primary process failure. The backup client application should check the FTI on every message to determine its status. Clients who want to avoid this delay should ensure that the TCP connection is closed whenever their application fails.
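The two-interval detection window described above can be modeled with a small state machine. This is a sketch of the described behavior, not CME's implementation; `LivenessMonitor` and its return strings are hypothetical, and time is passed in explicitly so the logic can be followed without real clocks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LivenessMonitor:
    """Tracks silence on a connection: one interval, then a Test Request, then failure."""

    heartbeat_interval: float
    last_received: float = 0.0
    test_request_sent_at: Optional[float] = None

    def on_message(self, now: float) -> None:
        """Any message from the peer resets the silence timer."""
        self.last_received = now
        self.test_request_sent_at = None

    def poll(self, now: float) -> str:
        """Return 'ok', 'send_test_request', 'waiting', or 'failed'."""
        if self.test_request_sent_at is not None:
            # A Test Request is outstanding; allow one more interval for a reply.
            if now - self.test_request_sent_at >= self.heartbeat_interval:
                return "failed"
            return "waiting"
        if now - self.last_received >= self.heartbeat_interval:
            self.test_request_sent_at = now
            return "send_test_request"
        return "ok"


monitor = LivenessMonitor(heartbeat_interval=30.0)
monitor.on_message(0.0)
timeline = [monitor.poll(t) for t in (10.0, 30.0, 45.0, 60.0)]
```

With a 30-second interval, silence starting at time 0 leads to a Test Request at 30 and a failure verdict at 60, which is the two-interval delay a closed TCP connection avoids.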
Before failover, the backup client application was receiving sequence numbers set to zero. During and after the failover process, the backup client application is responsible for ensuring that its inbound and outbound sequence numbers are synchronized with the primary application that just failed. This is critical since the newly elected primary member must know exactly where the failed member left off. If the sequence number of the message sent by the new primary client application is lower than that of the original client application, iLink 2.X logs the client application out per the FIX 4.2 protocol.
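One way to picture the synchronization requirement is a checkpoint that the group members share. The sketch below is an assumption-laden illustration: how the checkpoint is replicated between members is not specified in this section, and `SequenceCheckpoint` and `resume_from` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SequenceCheckpoint:
    """Last sequence numbers handled by the primary before it failed.

    Assumption: the fault-tolerant group replicates this state between its
    members by some mechanism outside the scope of this sketch.
    """

    last_outbound: int  # last sequence number the failed primary sent
    last_inbound: int   # last sequence number the failed primary processed


def resume_from(checkpoint: SequenceCheckpoint) -> Tuple[int, int]:
    """Return (next outbound, next expected inbound) for the new primary."""
    return checkpoint.last_outbound + 1, checkpoint.last_inbound + 1


next_out, next_in = resume_from(SequenceCheckpoint(last_outbound=41, last_inbound=97))
```

Starting from the number after the failed primary's last outbound message keeps the new primary from sending a lower sequence number than the original, which is the condition that triggers a logout.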
Client Backup Process Failure
If iLink 2.X does not receive any message from a backup client application within a defined interval:
- CME Group sends a Test Request message to invoke a Heartbeat message from the client.
- If there is no response to the Test Request message within the defined heartbeat interval (or if the client does not send any administrative message during the entire interval), the backup client application is disconnected from iLink 2.X.
- The status of the primary client application connectivity remains intact.
- CME Globex initiates failover by electing the ranking inactive iLink FIX Gateway to assume the primary role.
- The client application that is connected to this newly chosen iLink FIX Gateway must act as the primary for the client application FT Group.
- The client application learns that it must become primary by examining the FTI in tag 56-TargetCompID of the next incoming message.
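Because promotion is signaled only through the FTI of incoming messages, a client can watch for the change on every message it receives. The following is a minimal sketch: `RoleTracker` is a hypothetical name, the FTI is assumed to be the last character of the 7-character TargetCompID, and `ABC123B` is an illustrative ID.

```python
class RoleTracker:
    """Remembers the last FTI seen in tag 56 and flags backup-to-primary changes."""

    def __init__(self) -> None:
        self.role = "U"  # undefined until iLink assigns a status

    def observe(self, target_comp_id: str) -> bool:
        """Record the FTI of an incoming message; return True on promotion."""
        if len(target_comp_id) != 7:
            raise ValueError("TargetCompID must be 7 characters long")
        new_role = target_comp_id[-1]  # assumed FTI position
        promoted = self.role == "B" and new_role == "P"
        self.role = new_role
        return promoted


tracker = RoleTracker()
events = [tracker.observe(t) for t in ("ABC123B", "ABC123B", "ABC123P", "ABC123P")]
```

When `observe` returns True, the client would switch to primary behavior, including sending application messages and using enforced sequence numbers.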
Backup CGW Failure
iLink 2.X maintains a predefined number of processes running for each iLink 2.X component. If a backup iLink Gateway fails:
In the event of network failure, iLink 2.X handles socket exceptions that are thrown for network error conditions (i.e., loss of TCP/IP connectivity between the client application and the iLink Gateway). When this happens, iLink 2.X designates the primary content stream as failed and initiates the failover.