High Frequency Trading Infrastructure and Quality Assurance

Exactpro Systems Company Overview

  • A specialist firm focused on functional and non-functional testing of securities data distribution, trading systems, risk management and post-trade infrastructures

  • An independent company incorporated in 2009 with 10 people, now employing over 210 specialists

  • A US company registered and headquartered in San Rafael, California, with four QA & development centres in Russia and sales support in the UK

  • We build software to verify trading and back-office systems for exchanges, brokers and other companies in the securities industry

Achieving High Availability

  • Minimize the number of faults and the effect/recovery time of faults in a system

  • Avoid a single point of failure by utilizing redundant parts and rerouting (failover)

  • Have a comprehensive monitoring system in place

  • Reduce the impact of environmental faults by using UPS and off-site data mirroring and/or replication, and provide for "hot" repair of failed components
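
The rerouting (failover) bullet above can be sketched in a few lines. This is a minimal illustration, not a production pattern: the replica names and the `call_with_failover` helper are hypothetical, and a real system would add health checks, timeouts and monitoring hooks.

```python
from typing import Any, Callable, List, Tuple

Replica = Tuple[str, Callable[[Any], Any]]

def call_with_failover(replicas: List[Replica], request: Any) -> Tuple[str, Any]:
    """Try each replica in order and return (replica_name, response).

    A failed replica is skipped and the request is rerouted to the next one.
    Only when every replica fails is an error raised.
    """
    errors = []
    for name, handler in replicas:
        try:
            return name, handler(request)
        except Exception as exc:  # hypothetical: any failure triggers rerouting
            errors.append((name, repr(exc)))
    raise RuntimeError(f"all replicas failed: {errors}")
```

Note that redundancy alone does not help if both replicas share the same defect, which is the point of the Ariane 5 case that follows.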

Ariane 5 First Launch
4 June 1996

  • Maiden flight of the unmanned Ariane 5 launch vehicle

  • Loss of guidance and attitude information 37 seconds after ignition

  • Explosion at 3,700 meters 3 seconds later

  • Conversion of a 64-bit floating-point number into a 16-bit signed integer

  • Operand Error exception raised in the Ada code

  • The same version of software was used for Ariane 4

  • Horizontal velocity was much higher than on Ariane 4

  • Testing used simulators, without the SRI (inertial reference system) itself

  • Both primary and secondary systems failed
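
The narrowing conversion in the bullets above can be illustrated with a short sketch. It is written in Python rather than Ada, the function names are hypothetical, and the value 40000.0 is purely illustrative and not flight data. It contrasts a checked conversion, a saturating one, and an unprotected one that silently keeps the low 16 bits.

```python
import struct

INT16_MIN, INT16_MAX = -32768, 32767

def to_int16_checked(value: float) -> int:
    """Convert to a 16-bit signed integer, raising if the value does not fit."""
    truncated = int(value)
    if not INT16_MIN <= truncated <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in 16 bits")
    return truncated

def to_int16_saturating(value: float) -> int:
    """Clamp to the representable range instead of raising."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))

def to_int16_wrapping(value: float) -> int:
    """Unprotected conversion: keep only the low 16 bits of the integer part."""
    low_bits = int(value) & 0xFFFF
    return struct.unpack("<h", struct.pack("<H", low_bits))[0]
```

Which behaviour is right depends on the system. The lesson from the slide is that an unhandled exception in identical primary and secondary units removes the benefit of redundancy.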

High Frequency Trading System

  • Hundreds of millions of orders per day

  • Micro-bursts with thousands of transactions within milliseconds

  • Latencies 3,000 times faster than the blink of an eye, equal to the time it takes a flying passenger jet to cover 2 cm or light to travel from here to Frasne

Mizuho Securities
12 October 2005

  • Attempted to sell a single J-Com share for 610,000 yen ($5,041)

  • Price and quantity were mistakenly swapped

  • Risk systems failure:

    • Mizuho Securities

    • Tokyo Stock Exchange

  • Estimated loss of $225 million

  • This type of error is called a "fat finger"
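
A pre-trade check of the kind that failed in this case can be sketched as follows. This is an illustration under stated assumptions: the `Order` type, the `pre_trade_violations` helper, the 10% price band, the quantity limit of 1,000 and the use of 610,000 yen as the reference price are all hypothetical and are not taken from the Mizuho or Tokyo Stock Exchange systems.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Order:
    side: str
    quantity: int
    price: float

def pre_trade_violations(order: Order, reference_price: float,
                         max_quantity: int, price_band: float = 0.10) -> List[str]:
    """Return a list of reasons to reject the order (empty list means accept)."""
    violations = []
    if order.quantity <= 0:
        violations.append("quantity must be positive")
    elif order.quantity > max_quantity:
        violations.append("quantity exceeds limit")
    if abs(order.price - reference_price) > price_band * reference_price:
        violations.append("price outside band")
    return violations

# Intended order versus the same order with price and quantity swapped.
intended = Order("SELL", 1, 610_000.0)
swapped = Order("SELL", 610_000, 1.0)
```

With these assumed limits the intended order passes and the swapped order fails both checks, so either control on its own would have stopped it.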

USS Yorktown
21 September 1997

  • CG-47 Aegis pilot version for "Smart Ship" program

  • Outage of all systems, including propulsion, for 2.5 hours

  • Incorrect data entry into Remote Data Base Manager caused overflow in the database, LAN shutdown and disconnection of all controlling terminals

  • Software defect - division by zero

Fat Finger Order on NASDAQ from ABN AMRO Client
18 September 2012, Stockholm

  • A trader had the intention of posting a sell order for 5,000 SKF B shares. Due to an input error with the Client, the order volume field was populated with a negative value (-5,000)

  • Instead of returning an error, the system converted the value into an apparently random 9-digit figure: 294,962,296

  • The Sell Order corresponded to approximately 71% of the total outstanding volume in the SKF B share. The Sell Order resulted in execution of 813,442 shares
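
The 9-digit figure is arithmetically consistent with one plausible mechanism: -5,000 reinterpreted as an unsigned 32-bit integer gives 4,294,962,296, and keeping only the last nine digits gives 294,962,296. The sketch below shows that arithmetic next to the validation that would have rejected the input. The mechanism is an inference from the numbers, not something stated in the source, and all function names are hypothetical.

```python
def reinterpret_as_uint32(value: int) -> int:
    """Two's-complement reinterpretation of a signed value as unsigned 32-bit."""
    return value & 0xFFFFFFFF

def truncate_to_digits(value: int, digits: int) -> int:
    """Keep only the lowest `digits` decimal digits, as a fixed-width field might."""
    return value % (10 ** digits)

def parse_volume(raw: int) -> int:
    """Reject non-positive volumes instead of converting them."""
    if raw <= 0:
        raise ValueError(f"order volume must be positive, got {raw}")
    return raw

wrapped = reinterpret_as_uint32(-5000)        # 4294962296
displayed = truncate_to_digits(wrapped, 9)    # 294962296
```

Validating at the boundary, as `parse_volume` does, turns a silent conversion into an explicit error returned to the client.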

Flash Crash
6 May 2010

  • Waddell & Reed hedges exposure in equities

  • Algo to sell 75,000 E-mini contracts (~$4.1b) with 9% participation target

  • No price or time constraints in the algo, only volume traded during the previous minute

  • Initial selling was absorbed by HFT firms and arbitrageurs: buy E-mini, sell SPY or a basket of equities. Lack of liquidity and hot-potato trading between HFT firms increased volumes, and with them the selling pressure from the algo

  • Sharp decline in prices within 5 minutes, triggering across-the-board volatility interruptions

  • Participants leave the market, causing a liquidity crisis in equities and executions against stub quotes

  • Market recovers within minutes
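
The feedback loop described above comes from an algorithm that sizes each child order from recent volume alone. A minimal sketch follows, assuming a per-minute schedule and an optional price floor. The function name, the per-minute volumes and the floor are hypothetical; only the 75,000 contracts and the 9% target come from the slide.

```python
from typing import List, Optional, Tuple

def participation_schedule(total_contracts: int, participation_pct: int,
                           minutes: List[Tuple[int, float]],
                           min_price: Optional[float] = None) -> List[int]:
    """Return contracts sold in each minute.

    `minutes` holds (previous-minute market volume, current price) pairs.
    Each child order is participation_pct percent of the previous minute's
    volume. If `min_price` is set, selling pauses while the price is below it.
    """
    remaining = total_contracts
    schedule = []
    for prev_volume, price in minutes:
        if remaining == 0:
            break
        if min_price is not None and price < min_price:
            schedule.append(0)  # price constraint: stop feeding a falling market
            continue
        child = min(remaining, prev_volume * participation_pct // 100)
        schedule.append(child)
        remaining -= child
    return schedule

# Hypothetical data: volume rises as the price falls.
market = [(100_000, 1100.0), (200_000, 1080.0), (400_000, 1050.0)]
```

Without a floor the algorithm sells more as volume rises, even though the rising volume is itself a symptom of the decline. With a floor it pauses.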

Facebook IPO
18 May 2012, NASDAQ

  • One of the largest IPOs in history

  • Secondary trading is preceded by a designated Display Only Period (DOP)

  • Multi-component architecture that included Matching Engine, IPO Cross Application and Execution Application

  • At the end of the DOP, NASDAQ's "IPO Cross Application" analyzes all of the buy and sell orders to determine the price at which the largest number of shares will trade, and then NASDAQ's matching engine matches buy and sell orders at that price. This usually takes 1-2 ms

  • NASDAQ allowed orders to be cancelled at any time up until the end of the DOP, including the very brief interval during which the IPO cross price is calculated. After the calculation was completed, the system performed an order validation check between the Matching Engine (ME) and the "IPO Cross Application". If any of the orders had been cancelled after the start of the cross, the system had to repeat the calculation

  • Over 496k orders participated in the cross and its duration exceeded 20 ms

  • An order cancellation arrived during this period and the application had to repeat the calculation. Two more cancellations arrived during the second iteration and four more during the third

  • The IPO Cross Application went into an infinite loop at 11:05

  • The NASDAQ team switched off the validation check on the secondary system and performed a failover 25 minutes after the start of the loop

  • Unknown to the team at that moment, 38k orders submitted between 11:11 and 11:30 were stuck and had not participated in the uncross. This created another discrepancy, this time with the Execution Application, and Members were not able to receive confirmations for orders executed in the cross until 13:50
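
The cross-and-revalidate loop described above can be sketched as follows. This is a simplified model, not NASDAQ's implementation: `cross_price`, `run_cross`, the order book layout and the `max_passes` bound are all hypothetical, and price ties are resolved by simply keeping the lowest price.

```python
from typing import Callable, Dict, List, Optional, Tuple

BookEntry = Tuple[str, float, int]  # (side "B" or "S", limit price, quantity)

def cross_price(buys: List[Tuple[float, int]],
                sells: List[Tuple[float, int]]) -> Tuple[Optional[float], int]:
    """Return (price, volume) at which the largest number of shares trades."""
    prices = sorted({p for p, _ in buys} | {p for p, _ in sells})
    best_price, best_volume = None, 0
    for price in prices:
        demand = sum(q for p, q in buys if p >= price)
        supply = sum(q for p, q in sells if p <= price)
        volume = min(demand, supply)
        if volume > best_volume:
            best_price, best_volume = price, volume
    return best_price, best_volume

def run_cross(book: Dict[int, BookEntry],
              cancels_during_pass: Callable[[int], List[int]],
              max_passes: int):
    """Repeat the calculation while cancellations arrive, up to max_passes."""
    for pass_number in range(1, max_passes + 1):
        buys = [(p, q) for side, p, q in book.values() if side == "B"]
        sells = [(p, q) for side, p, q in book.values() if side == "S"]
        result = cross_price(buys, sells)
        cancelled = cancels_during_pass(pass_number)  # validation check
        if not cancelled:
            return result, pass_number
        for order_id in cancelled:
            book.pop(order_id, None)
    raise RuntimeError("cross did not converge: freeze cancellations and escalate")

def sample_book() -> Dict[int, BookEntry]:
    return {1: ("B", 42.0, 100), 2: ("B", 41.0, 50), 3: ("S", 40.0, 80),
            4: ("S", 42.0, 60), 5: ("B", 43.0, 10)}
```

Without the `max_passes` bound, the loop terminates only if some pass sees no cancellations. When each pass takes longer than the typical gap between cancellations, that may never happen, which matches the behaviour the slide describes.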

ABN and ATG Auction Uncross Problem on NASDAQ
28 August 2013

  • The SEB A share opened significantly lower than on the previous day. The opening price was 51.80, which was around 24% lower than the closing price

  • A contributing factor was trading performed during the opening auction by Algorithmic Trading Group (ATG), using an algorithm that it had developed, through Sponsored Access arrangements with ABN Amro Clearing Bank

  • Because of a shortcoming in this algorithm, it registered, amended and cancelled orders as soon as the limit price equalled or crossed the equilibrium price of the order book

Knight Capital Events, 1 August 2012, USA

  • Knight Capital - one of the most successful HFT firms

  • Implemented changes related to the NYSE Retail Liquidity Program

  • SMARS - ultra-fast order router

  • Source code responsible for the legacy Power Peg functionality

  • 212 parent orders, millions of child orders

  • Accumulated loss - $460m or $170k/sec

  • Incorrectly configured risk systems

  • Deployment on 7 servers instead of 8
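
A partial deployment like the one above can be caught by a simple consistency check before the new functionality is enabled. The sketch below assumes each server can report its deployed version. The host names, version strings and `stale_servers` helper are hypothetical and do not describe Knight's tooling.

```python
from typing import Dict, List

def stale_servers(deployed_versions: Dict[str, str], expected: str) -> List[str]:
    """Return the hosts whose deployed version differs from the expected one."""
    return sorted(host for host, version in deployed_versions.items()
                  if version != expected)

# Hypothetical fleet: seven servers updated, one left on the old build.
fleet = {f"smars-{i}": "2012.08" for i in range(1, 8)}
fleet["smars-8"] = "2012.07"
```

A release gate would refuse to enable the new order flow while `stale_servers` returns a non-empty list.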

North American Blackout, 14 August 2003, USA and Canada

  • Cascade power outage

  • Race conditions resulted in a buffer overflow in the alerting system

  • Had operators disconnected 4% of the overall load, losses estimated at $10b could have been avoided

Market Surveillance and Monitoring

  • Process all events

  • Aggregate them

  • Look for patterns using flexible rules

  • Replay for Investigation

  • Store everything as evidence
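
The five steps above can be sketched as one small pipeline. This is an illustration only: the `Surveillance` class, the event fields (`member`, `type`) and the cancel-ratio rule are hypothetical, and a real system would persist the event log rather than hold it in memory.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Rule = Callable[[Dict[str, int]], bool]

class Surveillance:
    def __init__(self, rules: Dict[str, Rule]):
        self.rules = rules                 # flexible rules: name -> predicate
        self.log: List[dict] = []          # store everything as evidence
        self.stats = defaultdict(lambda: {"orders": 0, "cancels": 0})
        self.alerts: List[Tuple[str, str, int]] = []

    def process(self, event: dict) -> None:
        """Store the event, update per-member aggregates, then apply each rule."""
        self.log.append(event)
        stats = self.stats[event["member"]]
        if event["type"] == "order":
            stats["orders"] += 1
        elif event["type"] == "cancel":
            stats["cancels"] += 1
        for name, rule in self.rules.items():
            if rule(stats):
                self.alerts.append((name, event["member"], len(self.log) - 1))

    def replay(self, rules: Dict[str, Rule]) -> List[Tuple[str, str, int]]:
        """Re-run the stored events against a different rule set."""
        fresh = Surveillance(rules)
        for event in self.log:
            fresh.process(event)
        return fresh.alerts

def cancel_ratio_above(threshold: float) -> Rule:
    return lambda s: s["orders"] >= 3 and s["cancels"] / s["orders"] > threshold

events = [{"member": "A", "type": t}
          for t in ("order", "order", "order", "cancel", "cancel")]
```

Because every event is kept, an investigator can replay the same history with a stricter rule and see exactly where the new alerts would have fired.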