Creating Test Data for Market Surveillance Systems with Embedded Machine Learning Algorithms

Abstract — Market surveillance systems, used for monitoring and analysis of all transactions in the financial market, have gained importance since the latest financial crisis. Such systems are designed to detect market abuse behavior and prevent it. The latest approach to the development of such systems is to use machine learning methods. The approach presents a challenge from the standpoint of quality assurance and the standard testing methods. We propose several types of test cases which are based on the equivalence classes methodology. The division into equivalence classes is performed after the analysis of the real data used by real surveillance systems. This paper describes our findings from using this method to test a market surveillance system that is based on machine learning techniques.

Keywords—test data, equivalence classes, market surveillance systems, machine learning.

I. Introduction

A. Market surveillance systems

Electronic trading platforms have become an increasingly important part of the financial market in recent years. They bear legal responsibilities [1], [2] and must comply with the law and the regulatory requirements. Therefore, all market events in contemporary electronic trading platforms are monitored and analysed by market surveillance systems.

Such systems are designed to detect and prevent market abuse, including insider trading, intentional and aggressive price positioning, creation of fictitious liquidity, money laundering, marking the close, etc. [3]. Various data mining methods are used to improve the quality of surveillance systems’ work [4], [5], [6], [7], [8], [9].

B. Quality assurance for market surveillance systems

Standard quality assurance (QA) methods and technologies appear to be largely ineffective for machine learning (ML) applications. C. Murphy, G. Kaiser and M. Arias even introduced the concept of “non-testable” applications [10]. From the QA perspective, the task is not to test whether an ML algorithm learns well, but to ensure that the application uses the algorithm correctly, implements the specification and meets the users’ expectations. In this paper, we use the term testing in accordance with QA theory.

It is clear that a sufficient input data set is needed for high-quality test coverage. Furthermore, the test data set should be as close as possible to the real data, or should even be real. Which approach, then, should be used to create data that verifies the implementation of an ML algorithm as fully as possible?

We can test a market surveillance system in the following ways:

  1. By creating test cases based on knowledge of the business rules from the specification. Such test data resemble real users’ behaviour.
  2. By generating various datasets which contain different combinations of variables.

Both variants are suitable for surveillance systems that use standard control flow, such as loops or conditional branches. For such standard systems, there is a set of rules that yields a clear output for specific input data. With intelligent systems, it is normally not obvious what a given input will produce, because an ML algorithm builds its own dependencies between the variables, and human interpretation of such dependencies is impossible. It is therefore important to be able to create a set of test cases that generates obvious and predictable output. The second approach to generating test data allows for the creation of output that is easily interpretable.
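The second approach can be illustrated with a minimal sketch, assuming each input variable has already been divided into equivalence classes and that one representative value is chosen per class. The variable names, class names and representative values below are hypothetical and chosen only for illustration; they are not the classes derived from real surveillance data in this paper.

```python
from itertools import product

# Hypothetical variables and equivalence classes (illustrative only).
# Each class is mapped to one representative value.
EQUIVALENCE_CLASSES = {
    "side": {"buy": "BUY", "sell": "SELL"},
    "volume": {"small": 10, "typical": 1_000, "large": 1_000_000},
    "price_offset_pct": {"below_market": -5.0, "at_market": 0.0, "above_market": 5.0},
}


def generate_test_cases(classes):
    """Yield one test case per combination of equivalence classes.

    Each case holds the representative value of every variable and the
    name of the class it was drawn from, so that the system's output can
    be traced back to a known combination of classes.
    """
    names = list(classes)
    for combo in product(*(classes[name].items() for name in names)):
        values = {name: value for name, (_, value) in zip(names, combo)}
        labels = {name: label for name, (label, _) in zip(names, combo)}
        yield {"values": values, "classes": labels}


cases = list(generate_test_cases(EQUIVALENCE_CLASSES))
print(len(cases))  # 2 * 3 * 3 = 18 combinations
```

Because every generated case records the classes it came from, an unexpected classification can be attributed to a specific combination of classes rather than to an opaque input record.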

C. Contribution

This paper makes the following contributions:

  1. A model for classifying transactions, which can be used to detect market manipulation.
  2. Generation of test cases, based on equivalence classes, that reveal the weaknesses of the created model.
  3. Testing of a prototype based on the created model, and analysis of the results obtained.


The article was first published in the Preliminary Proceedings of the 11th Spring/Summer Young Researchers’ Colloquium on Software Engineering (SYRCoSE 2017).