AI-Driven Synthetic Data Engineering for Improved Digital Resilience

This article was first published in the June 2024 issue of the World Federation of Exchanges Focus Magazine.

Daria Degtiarenko, Senior Marketing Communications Manager, Exactpro

Introduction

In an environment dominated by pervasive digitalisation of processes and services, financial institutions and firms find themselves heavily reliant on technology transformations affecting both their peripheral and core functions. In just three years, cloud migration rose from 37% (in August 2020) to 91% (in August 2023)[1] and, within just the past year, North America has moved to the T+1 settlement cycle, spurring unprecedented levels of optimisation in clearing and settlement processes. General-purpose and narrow AI models are being widely deployed in anticipation of “significant” long-term cost savings.[2] The whirlwind of innovation spans operations, asset classes and the supporting infrastructure.

The high concentration of at-scale transformation puts operational resilience among the top-of-mind priorities for incumbents, new industry entrants, standard-setting bodies, and, of course, regulators across the globe. We have seen the number of reported exchange outages fall from sixteen in 2018 to three in 2022,[3] which points to the robustness of the risk management mechanisms put in place, at least for the technologies and development pace characteristic of that period. But, as the rapidly expanding fintech space ventures into less familiar waters, what can set it up for long-term success?

Seeing Outages as a "Data" Problem

Every transformation has a major impact on various steps of transaction processing across the financial infrastructure, and, at the core of each such transformation, lies thorough testing. It is paramount in helping ensure that the data flows, related functions, and the system’s performance characteristics are not adversely affected by the new features.

Data is evolving from the lifeblood of the financial industry into its very fabric, the resource on which digital evolution hinges. However, for real client and transactional data to function smoothly within systems and across infrastructures, carefully curated data “stuntmen” have to go further and wider, taking hits from all the bumps in the road first and testing the way for real data to run smoothly in a production environment. These “stuntmen” are none other than synthetically generated datasets, which have been helping shape a high level of digital operational resilience in exchanges, CCPs, CSDs and other financial organisations for over a decade.

The majority of outages reported in 2018-2022 were caused by “software issues” or a combination of issues that included a software issue. A software defect can arise for various reasons: errors in specification, code or configuration, a rare race condition surfacing, or a latency- or throughput-related halt, among others. While the root causes differ, the overarching reason remains the same: an outage is the consequence of a scenario occurring that was never tested.

From the perspective of long-term operational resilience, an outage is much less about the event that caused it than about why that specific event was never injected into the system during the testing phase. Could the overall approach have been too targeted, too narrow? Did the test tools have limitations of their own? Was the selection of techniques and subsequent scenarios lacking in diversity?

A test scenario is only as good as the data points, system actions, conditions, and combinations thereof that it verifies. Even though all possible permutations can, hypothetically, be calculated, that does not make it feasible to run every conceivable positive and negative scenario. In practice, budget and production-system access constraints force testers to compile test execution libraries in a highly selective way. Quality validation for complex distributed infrastructures therefore requires a comprehensive, yet highly precise and efficient, software testing approach.
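To make the scale of the problem concrete, the minimal Python sketch below enumerates the full Cartesian product of a handful of order attributes and then samples a budget-sized subset, the kind of selective compilation described above. The parameter names and values are purely illustrative, not a real test model.

```python
import itertools
import random

# Illustrative order attributes; real trading systems have many more dimensions.
parameters = {
    "order_type": ["limit", "market", "stop", "iceberg"],
    "side": ["buy", "sell"],
    "time_in_force": ["DAY", "IOC", "FOK", "GTC"],
    "quantity": [1, 100, 10_000, 1_000_000],
    "price_band": ["inside", "at_limit", "outside"],
    "session": ["pre_open", "continuous", "auction", "close"],
}

# Full Cartesian product: 4*2*4*4*3*4 = 1,536 scenarios, before negative cases,
# sequencing and timing variations are even considered.
all_combinations = list(itertools.product(*parameters.values()))
print(f"Exhaustive scenarios: {len(all_combinations)}")

# A budget-constrained library keeps only a small, deliberately chosen fraction.
budget = 50
selected = random.sample(all_combinations, budget)
print(f"Scenarios in the executable library: {len(selected)}")
```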

The success of testing is measured by how well the test library accounts for all “extreme but plausible” scenarios.[4] An approach rooted in system modelling and combined with synthetically generated data is a way to achieve comprehensive coverage: it provides enough freedom to explore all relevant functional and performance characteristics of the system and, more importantly, the intersection of the two. A synthetic-data-driven approach also helps overcome access limitations for third-party software testing service providers, makes testing less dependent on system availability windows and allows for enhanced edge-case testing.

Higher-Quality Innovation with AI-Driven Testing

The development of a resource-efficient test library that comprehensively covers a system under test can be further enhanced by combining system modelling with narrow AI methods. Generative AI algorithms are famous for their creative capabilities, but notorious for not being deterministic. When used to assist human testers in finding all relevant test scenarios, they have a better chance of pinpointing unique, random combinations of parameters that lie outside human logic. This makes them a highly valuable exploratory resource.
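As a rough illustration of that exploratory role, the sketch below uses a plain random sampler as a stand-in for a generative model: it proposes parameter combinations and surfaces only those absent from a hypothetical human-curated library. All names and values here are assumptions for illustration.

```python
import random

# A hypothetical human-curated library of (order_type, side, time_in_force) scenarios.
human_library = {
    ("limit", "buy", "DAY"),
    ("market", "sell", "IOC"),
    ("limit", "sell", "GTC"),
}

domains = {
    "order_type": ["limit", "market", "stop", "iceberg"],
    "side": ["buy", "sell"],
    "time_in_force": ["DAY", "IOC", "FOK", "GTC"],
}

def explore(proposals):
    """Randomly propose combinations and return those missing from the curated library."""
    candidates = {
        tuple(random.choice(values) for values in domains.values())
        for _ in range(proposals)
    }
    return candidates - human_library

novel = explore(200)
print(f"{len(novel)} candidate scenarios not present in the curated library")
```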

Actually executing a test library produced with the help of Generative AI requires a reliable way to significantly optimise that library. This can be done with more restrictive AI-driven methods such as symbolic AI. After careful assessment and iterative improvement, the approach yields a compact, performant set of test scenarios with the same high level of coverage as the original, massive test dataset.
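One simple way to picture such library optimisation, assuming each test is annotated with the conditions it covers, is a greedy set-cover pass that keeps only the tests needed to preserve the original library’s coverage. The sketch below is an illustrative stand-in, not Exactpro’s actual method.

```python
def reduce_library(library):
    """library maps test id -> set of requirements/conditions it covers."""
    target = set().union(*library.values())        # coverage of the full library
    remaining, reduced = set(target), []
    while remaining:
        # Keep the test that covers the most still-uncovered conditions.
        best = max(library, key=lambda t: len(library[t] & remaining))
        if not library[best] & remaining:
            break                                  # no further progress possible
        reduced.append(best)
        remaining -= library[best]
    return reduced

# Toy example: four tests, four conditions; two tests suffice for full coverage.
tests = {
    "T1": {"new_order", "partial_fill"},
    "T2": {"new_order", "cancel"},
    "T3": {"cancel", "amend", "partial_fill"},
    "T4": {"amend"},
}
print(reduce_library(tests))  # ['T3', 'T1'] covers all four conditions
```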

Needless to say, synthetic data production requires a high level of skill and deep domain expertise. In the regulated space, synthetic data has to comply with laws specific to the use case and be attuned to recommendations such as those formulated by supervisory bodies.[5] It needs to be business-pertinent, reliable and consistent in its quality; it should reflect real-world values and their statistical distributions. The mechanisms that ensure its provenance need to be clear and easy to explain to all stakeholders.
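As a toy example of distribution-aware generation, the sketch below produces synthetic order records with heavy-tailed sizes, reference-anchored prices and bursty arrival times, and fixes a random seed so that the data’s provenance is reproducible. Field names and distribution parameters are assumptions for illustration only.

```python
import random
from datetime import datetime, timedelta

random.seed(42)  # reproducibility makes provenance easier to explain to stakeholders

def synthetic_orders(n, ref_price=100.0):
    """Yield n synthetic orders with plausible, non-uniform value distributions."""
    t = datetime(2024, 6, 3, 8, 0, 0)
    for order_id in range(1, n + 1):
        t += timedelta(seconds=random.expovariate(2.0))          # bursty arrivals
        yield {
            "order_id": order_id,
            "timestamp": t.isoformat(timespec="milliseconds"),
            "side": random.choices(["buy", "sell"], weights=[0.52, 0.48])[0],
            "qty": max(1, round(random.lognormvariate(4.0, 1.0))),  # heavy-tailed sizes
            "price": round(random.gauss(ref_price, ref_price * 0.002), 2),
        }

for order in synthetic_orders(3):
    print(order)
```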

An AI-driven approach to software testing should be a “system-first” approach: it should be closely tailored to the system and data it serves. At the same time, it allows models and datasets to be applied across projects, which is especially beneficial for new platforms lacking historical data or an in-house testing practice. Having an approach grounded in system-specific complexity helps firms build a foundation for operational resilience going forward. Incumbents planning a technology transformation can use it to assess the current state of their systems and consolidate their testing practice for better tracking of regression issues.

Exactpro has been a trusted industry partner in helping financial organisations develop and temper their digital resilience via independent, resource-conscious, AI-enabled software testing. To find out how our AI Testing approach can facilitate your digital transformation, reach out to us via info@exactpro.com.

References: