Globex Advisory Notice

To: All GLOBEX Users
From: John M. Restivo, Director GLOBEX Control Center
Subject: System Update #25 Duplicate Fill Messages (HIP/HOP failover) &
Effective Date: 06/18/02
Notice Number: SU#0225
Duplicate Fill Messages Problem: This past Friday, 6-14-02, and to a much lesser extent today, Tuesday, 6-18-02, several components within the GLOBEX architecture incorrectly activated their failover systems. As a result, duplicate fill messages, broker accept messages, and order reject messages were transmitted to customers.

Background: The HIP/HOP (Host Input Process/Host Output Process), OES (Order Entry Server) and OMA (Order Management) components of the GLOBEX architecture are implemented as "high availability" pairs. Each back-up process constantly awaits "heartbeat" messages from the primary process to ensure availability. If the back-up fails to receive heartbeats within an allotted timeout period, it becomes the primary process and assumes the duties of the original. In these incidents, however, the primary process had not failed. Heartbeat messages travel over the HIP/HOP bus, which is a specific network segment using TIBCO middleware.
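The timeout rule described above can be illustrated with a minimal sketch. This is not the GLOBEX implementation; the class and function names (StandbyMonitor, simulate) are hypothetical, and the model assumes the back-up checks for heartbeats once per second. It shows why a back-up cannot tell a failed primary from a healthy primary whose heartbeats were lost on the network.

```python
from dataclasses import dataclass


@dataclass
class StandbyMonitor:
    """Back-up side of a high-availability pair (hypothetical model)."""
    timeout_s: float               # allotted timeout before the back-up takes over
    last_heartbeat_s: float = 0.0  # arrival time of the most recent heartbeat
    is_primary: bool = False

    def on_heartbeat(self, now_s: float) -> None:
        # Record the arrival time of a heartbeat from the primary.
        self.last_heartbeat_s = now_s

    def check(self, now_s: float) -> bool:
        # Promote the back-up once no heartbeat has arrived within the timeout.
        if not self.is_primary and now_s - self.last_heartbeat_s > self.timeout_s:
            self.is_primary = True
        return self.is_primary


def simulate(arrivals, timeout_s, end_s, step_s=1.0):
    """Return the time at which the back-up promotes itself, or None."""
    monitor = StandbyMonitor(timeout_s=timeout_s)
    pending = sorted(arrivals)  # heartbeat arrival times seen by the back-up
    t = 0.0
    while t <= end_s:
        # Deliver every heartbeat that has arrived by time t.
        while pending and pending[0] <= t:
            monitor.on_heartbeat(pending.pop(0))
        if monitor.check(t):
            return t
        t += step_s
    return None


# Healthy primary, all heartbeats delivered: no failover.
all_delivered = simulate(range(0, 21), timeout_s=5, end_s=20)

# Healthy primary, but heartbeats after t=3 are lost on the network:
# the back-up still promotes itself, leaving two active processes.
lost_on_network = simulate([0, 1, 2, 3], timeout_s=5, end_s=20)
```

In the second run the back-up has no information other than the silence on the bus, so it reacts exactly as it would to a real failure of the primary.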

Implementation of COOL Phase 1 on 5-31-02 resulted in an increase in the number of messages. These additional messages, along with some as yet unexplained event, eventually caused some servers and network interface cards on the network to become overloaded, resulting in missed messages. Some of the crucial heartbeats from the above components did not reach their failover systems within the allotted timeout period. Even though the primary systems were still active, the failover systems reacted to the missing heartbeats and also became active. With two HIP systems processing the same messages, duplicate orders were submitted to the GLOBEX host, and a number of users received unnecessary order reject messages as a result. With two HOP processes active, duplicate broker accept and/or fill messages were sent.

Resolution: Several steps are being taken to ensure that this type of failure does not occur again. The following short term fixes have already been implemented:

1. Certain administrative processes have been consolidated, reducing the number of messages sent to the TIBCO bus by more than 50%.

2. Heartbeat intervals were increased from one heartbeat per second with a 5-second timeout to one heartbeat every 3 seconds with a 7-second failover timeout.
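The effect of the new heartbeat settings can be worked through with simple arithmetic. The sketch below is illustrative only: the function name heartbeat_profile is hypothetical, and it assumes evenly spaced heartbeats with no jitter and a back-up that fails over only once the timeout has been exceeded. It does not model the reduced network load, which is the purpose of the change.

```python
def heartbeat_profile(interval_s: int, timeout_s: int) -> dict:
    """Summarize one heartbeat configuration (idealized, no jitter)."""
    return {
        # Heartbeats each primary process places on the bus per minute.
        "heartbeats_per_minute": 60 // interval_s,
        # Consecutive heartbeats that must be missed before the timeout expires.
        "missed_before_failover": timeout_s // interval_s,
    }


before = heartbeat_profile(interval_s=1, timeout_s=5)
after = heartbeat_profile(interval_s=3, timeout_s=7)

print("before:", before)
print("after: ", after)
```

Under these assumptions, each primary process sends 20 heartbeats per minute instead of 60, and the back-up waits 7 seconds rather than 5 before taking over.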

In addition, we are taking several other steps to deal with this issue in the longer term. These include:

3. Testing a variety of failover scenarios, including false failovers, to eliminate negative customer impact.

4. Actively investigating moving other administrative processes to a separate network to further reduce the traffic on the primary HIP/HOP network.

5. Baselining network traffic in the production environment.

6. Investigating re-segmenting TIBCO traffic onto two networks.

7. Investigating upgrading from our current version of TIBCO "rvd" (v5.3) to version 6.9 to alleviate known TIBCO bugs that exist in the current version.

Market Data Latencies

As initially reported last week, numerous customers have experienced delays in receiving market data ranging from several seconds to several minutes. The latencies are sporadic and affect both MD API users and GLOBEX Trader customers. Different users appear to be affected at different times. The problem is not confined to a particular geographic area; issues have been reported by both domestic and European customers.

Immediate Response: Development and networking specialists are actively investigating these latencies to determine their exact cause, and will then recommend a permanent solution. In the interim, the CME has eliminated full market depth in the interest rate and currency markets (the same practice previously used in the ES and NQ markets). The 5-deep instrument summary book will still be available in ALL markets.

We are still actively investigating this problem but have not yet identified a root cause. Market data latencies may continue to occur until the root cause is identified and a permanent solution is implemented. We will continue to notify you through the GCC Notification System of both future latencies and our ongoing remediation efforts.

If you have any questions, please call the GCC at 312-456-2391 option 2.