Research Papers

An approach to create a synthetic financial transactions dataset based on NDA-protected dataset

This paper outlines an experiment in building an obfuscated version of a proprietary financial transactions dataset. Per industry requirements, no data from the original dataset may find its way to third parties, so all fields were generated artificially, including banks, customers (with their geographic locations), and individual transactions. However, we set ourselves the goal of preserving as many distributions and correlations from the original dataset as possible, with adjustable levels of noise introduced into each of the fields: geography, bank-to-bank transaction flows, and the distributions of transaction volumes/counts across various subsections of the dataset. The article could be of use to anyone wanting to produce a publishable dataset, e.g., for the alternative data market, where it is essential to preserve the structure and correlations of the proprietary, non-disclosed original dataset.
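To illustrate the general idea of resampling with an adjustable noise level, here is a minimal sketch for a single field (transaction amounts). This is not the paper's actual generator; the function name `synthesize_amounts` and its parameters are hypothetical:

```python
import random

def synthesize_amounts(original, noise_level=0.1, n=None, seed=0):
    """Resample transaction amounts from the original empirical
    distribution, perturbing each draw with multiplicative Gaussian
    noise. noise_level=0 reproduces the original distribution exactly;
    larger values trade statistical fidelity for obfuscation."""
    rng = random.Random(seed)
    n = n or len(original)
    synthetic = []
    for _ in range(n):
        base = rng.choice(original)           # draw from the empirical distribution
        jitter = rng.gauss(0.0, noise_level)  # adjustable noise term
        synthetic.append(round(base * (1.0 + jitter), 2))
    return synthetic

amounts = [120.50, 98.10, 455.00, 77.25, 310.40]
print(synthesize_amounts(amounts, noise_level=0.05))
```

With `noise_level=0` the synthetic values are drawn verbatim from the originals, so the same knob spans the whole range from exact replication to heavy obfuscation.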

Read more

Model-based Testing Approach for Financial Technology Platforms: An Industrial Implementation

This paper examines the industrial experience of using an automated model-based approach for testing trading systems. The approach, used by Exactpro, is described using two existing taxonomies, and the main future applications of the models in the test automation paradigm are outlined. In our approach, a model is a kind of virtual replica of the system under test, built from the specifications, that generates expected results from the input. Models are created in Python, which provides the flexibility to describe the behavior of complex financial systems. Models are integrated into the Exactpro th2 test automation framework, and the expected results from the system under test and from the model are compared automatically.
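The model-as-oracle comparison can be sketched as follows. The order fields and the acknowledgment logic below are invented for illustration and do not represent th2's API or any real trading system's rules:

```python
def order_model(order):
    """Toy specification-derived model of a matching engine's
    acknowledgment logic: predicts the expected response for an
    inbound new-order message."""
    if order["qty"] <= 0:
        return {"status": "REJECTED", "reason": "invalid qty"}
    return {"status": "ACKNOWLEDGED", "qty": order["qty"]}

def check(order, actual_response):
    """Compare the model's expected result with the response
    actually produced by the system under test."""
    return order_model(order) == actual_response

print(check({"qty": 100}, {"status": "ACKNOWLEDGED", "qty": 100}))  # True
```

The value of the pattern is that the model, not a hand-written list of expected values, generates the oracle, so one model covers arbitrarily many generated inputs.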

Read more

Early Detection of Tasks With Uncommonly Long Run Duration in Post-Trade Systems

The paper describes the authors' experience of applying machine learning techniques to predict deviations in service workflow durations long before the post-trade system reports them as not completed on time. The prediction is based on analyzing a large set of performance metrics, collected every second from the system's modules, and on using regression models to detect running workflows that are likely to hang. The article covers raw data preprocessing, dataset dimensionality reduction, the applied regression models, and their performance. Outstanding problems and the project roadmap are also described.

Read more

Data Stream Processing in Reconciliation Testing: Industrial Experience

This research focuses on the tools and methods of reconciliation testing, an approach to software testing that relies on the concept of data reconciliation. The importance of this test approach is steadily increasing across different knowledge domains, driven by growing data volumes and the overall complexity of present-day software systems. The paper describes the software implementation created as part of the authors' industrial experience with data stream analysis for the task of reconciliation testing of complex financial technology systems. The described solution is a Python-based component of an open-source test automation framework built as a Kubernetes-based microservices platform. The paper outlines the advantages and disadvantages of the approach and compares it to existing state-of-the-art solutions that support data stream analysis and reconciliation checks.
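The reconciliation check itself reduces to matching records from two data sources by key and reporting the differences. The following is a minimal batch-mode sketch of that idea, not the paper's streaming implementation; the record shape and key name are assumptions:

```python
def reconcile(stream_a, stream_b, key="id"):
    """Match records from two event sources by key and report
    mismatched pairs and records left unmatched on either side."""
    index = {r[key]: r for r in stream_b}
    mismatched, unmatched = [], []
    for rec in stream_a:
        other = index.pop(rec[key], None)
        if other is None:
            unmatched.append(rec)       # present only in stream_a
        elif other != rec:
            mismatched.append((rec, other))
    unmatched.extend(index.values())    # leftovers present only in stream_b
    return mismatched, unmatched

a = [{"id": 1, "amt": 5}, {"id": 2, "amt": 7}]
b = [{"id": 1, "amt": 5}, {"id": 3, "amt": 9}]
print(reconcile(a, b))  # no mismatches; ids 2 and 3 unmatched
```

In a genuine streaming setting the index cannot hold all history, so windowing and eviction policies replace the simple dictionary used here.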

Read more

Black-Box Testing of Financial Virtual Assistants

We propose a hybrid technique for black-box testing of virtual assistants (VAs) in the financial sector. The specifics of this highly regulated industry impose numerous limitations on the testing process: GDPR and other data protection requirements, the absence of interaction logs with real users, restricted access to internal data, etc. These limitations also reduce the applicability of several VA testing methods widely described in the research literature. The approach suggested in this paper consists of semi-controlled interaction logging from trained testers and subsequent augmentation of the collected data for automated testing.
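One simple form such augmentation can take is paraphrasing logged tester utterances, e.g. by synonym substitution. The sketch below is illustrative only; the synonym table is invented, and real augmentation would be richer (paraphrasing models, entity substitution, typo injection):

```python
import random

# A tiny, hypothetical synonym table for illustration.
SYNONYMS = {
    "show": ["display", "tell me"],
    "balance": ["account balance", "remaining funds"],
}

def augment(utterance, rng):
    """Produce a paraphrased variant of a logged tester utterance by
    randomly swapping words for listed synonyms; words without
    synonyms pass through unchanged."""
    words = utterance.split()
    return " ".join(rng.choice([w] + SYNONYMS.get(w, [])) for w in words)

rng = random.Random(42)
print(augment("show my balance", rng))
```

Each logged utterance can thus be expanded into many test inputs without collecting any additional data from real users, which is the point of the approach under the stated regulatory constraints.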

Read more