Introducing the latest evolution in test data generation

By Jeffrey Hughes, BlazeMeter

Mar 10, 2022

Perforce

Developing reliable code depends on thorough testing, but with teams under increasing pressure to create and deliver code faster, testing is often seen as essential yet slow. Within testing, the biggest bottleneck of all has been obtaining test data: testers report that they spend 60% of their time waiting or searching for it. The good news is that with the next generation of test data solutions, teams do not need to trade test agility for code quality.

All tests need data, and many companies run into issues when they do not have the correct data, or have it only in quantities too small for more complete testing. Often, testers build that data manually in spreadsheets, a process that is cumbersome, error-prone, and time-consuming. As a result, teams often rely on 'happy path testing', whereby testing is done at a superficial level with very clean and simple data.

To robustly test code, it is essential to create some negative scenarios. For example, what if someone enters a negative number into a banking application for a deposit they are making? Has that input ever been tested? The challenge is that testing for all real-world scenarios adds complication and considerable time. The risk, however, is that an application tested without varied data will not perform as expected once it meets such inputs. Another example is stress testing applications that need to stay responsive during a Black Friday offer or a company's promotional campaigns. Applying large amounts of relevant test data is also vital in these scenarios.
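The negative-deposit scenario can be sketched in a few lines of Python. This is a minimal illustration rather than code from any real banking system: the `deposit` function, its rule that only strictly positive amounts are accepted, and the sample amounts are all assumptions made for the example.

```python
from decimal import Decimal


def deposit(balance: Decimal, amount: Decimal) -> Decimal:
    """Hypothetical deposit rule: only strictly positive amounts are accepted."""
    if amount <= 0:
        raise ValueError("deposit amount must be positive")
    return balance + amount


def is_rejected(amount: Decimal) -> bool:
    """Return True when deposit() refuses the amount."""
    try:
        deposit(Decimal("100.00"), amount)
    except ValueError:
        return True
    return False


# One happy-path value plus two negative scenarios.
# Each amount maps to whether we expect it to be rejected.
expected = {
    Decimal("25.00"): False,   # ordinary deposit
    Decimal("0"): True,        # zero deposit
    Decimal("-25.00"): True,   # negative deposit
}

results = {amount: is_rejected(amount) for amount in expected}
```

Comparing `results` with `expected` shows whether the application handles the awkward inputs as well as the clean ones. A happy-path-only suite would contain just the first entry.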

Subsetting and Masking Data

Tools to extract and manage test data already exist and are widely used in financial services, which has arguably led the way in test data. Such tools must protect production data that is used for testing, and many regulations exist to enforce protection of sensitive user information. Many organisations use subsetting and masking to satisfy regulatory requirements. This technique takes a section of the production database (subsetting) and then scrambles (masks) the data so that it is unrecognisable from the original. For example, first and last names, credit card numbers and other confidential customer data could be pulled from actual production data and then masked to obfuscate the information.
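As a rough sketch of the idea, the Python below takes every second row of a small in-memory table (subsetting) and replaces names and card numbers with values derived from a hash (masking). The sample rows, field names, salt and helper functions are all hypothetical, and hash-based replacement is only one of several masking techniques. It is shown for illustration and is not a compliance-grade implementation.

```python
import hashlib

# Stand-in for production data; every value here is invented for the example.
production_rows = [
    {"id": 1, "first_name": "Ana", "last_name": "Silva", "card": "4111111111111111", "balance": 120.50},
    {"id": 2, "first_name": "Ben", "last_name": "Okoro", "card": "5500000000000004", "balance": 75.00},
    {"id": 3, "first_name": "Chen", "last_name": "Li", "card": "340000000000009", "balance": 9.99},
    {"id": 4, "first_name": "Dana", "last_name": "Novak", "card": "6011000000000004", "balance": 0.00},
]


def subset(rows, step):
    """Subsetting: keep every `step`-th row."""
    return rows[::step]


def mask_text(value, salt="demo-salt"):
    """Replace a string with a short, repeatable token derived from a salted hash."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


def mask_card(card_number, salt="demo-salt"):
    """Replace card digits with hash-derived digits of the same length.

    The result is not a valid card number (it will not generally pass a Luhn check).
    """
    digest = hashlib.sha256((salt + card_number).encode("utf-8")).hexdigest()
    digits = "".join(str(int(ch, 16) % 10) for ch in digest)
    return digits[:len(card_number)]


def mask_row(row):
    """Masking: scramble the sensitive fields, leave the rest untouched."""
    masked = dict(row)
    masked["first_name"] = mask_text(row["first_name"])
    masked["last_name"] = mask_text(row["last_name"])
    masked["card"] = mask_card(row["card"])
    return masked


test_rows = [mask_row(row) for row in subset(production_rows, 2)]
```

Because the masking is deterministic, the same input always maps to the same masked value, which keeps relationships between records intact across runs. Non-sensitive fields such as `balance` pass through unchanged so the data still behaves realistically in tests.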

A better way to generate test data

Many organisations are moving to synthetic data creation because it can be a faster way to generate test data, without the compliance headaches. Furthermore, the latest evolution of agile testing tools provides a way to produce synthetic data on the fly, linking data immediately to a test and reusing the data multiple times across multiple test scenarios. This means that development sprints no longer need to be held up by a lack of test data.
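To make the idea concrete, here is a minimal Python sketch of generating synthetic customer records on demand. It does not represent how any particular tool works: the field names, name lists, value ranges and the `generate_customers` function are assumptions chosen for illustration. A fixed seed makes the output repeatable, so the same data set can be reused across several test scenarios.

```python
import random

# Invented sample values; no real customer data is involved.
FIRST_NAMES = ["Ana", "Ben", "Chen", "Dana", "Emeka", "Freya"]
LAST_NAMES = ["Silva", "Okoro", "Li", "Novak", "Haddad", "Jensen"]


def generate_customers(count, seed=42):
    """Build `count` synthetic customer rows; the same seed yields the same rows."""
    rng = random.Random(seed)  # private generator, so global random state is untouched
    rows = []
    for i in range(count):
        rows.append({
            "customer_id": f"CUST-{i:05d}",
            "first_name": rng.choice(FIRST_NAMES),
            "last_name": rng.choice(LAST_NAMES),
            # 16 random digits; not a valid card number
            "card": "".join(str(rng.randint(0, 9)) for _ in range(16)),
            # Range deliberately dips below zero so negative deposits can appear
            "deposit": round(rng.uniform(-50.0, 5000.0), 2),
        })
    return rows


functional_data = generate_customers(10)        # small set for a functional test
load_data = generate_customers(10_000, seed=7)  # larger set for a load test
```

Since no production records are involved, there is nothing to mask, and changing `count` is all it takes to scale from a handful of rows to a load-test volume.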

These new testing tools also have other benefits. First, because of an easy user interface they are accessible and usable by non-technical testers, meaning more people can become involved in this essential part of software quality control. Second, because these tools are cloud-based, tests can be spun up fast, resources can be scaled up or down as required, and tests can be made available to team members in any location. This is also a less labour-intensive way of carrying out different types of tests quickly.

Test data is often used for functional, performance and regression testing, but the latest synthetic data generation techniques also support data for mock services (also referred to as virtual services). A mock service mimics the response of a real service. For example, imagine a developer is testing a banking application and needs to test a log-in sequence with a back-end mainframe. Rather than querying the mainframe (which can incur financial costs), the developer will build a simple mock service to simulate a mainframe response.

However, a simple request/response pair does not really test the actual service in the background: it just responds affirmatively so that the test can proceed. A better approach is to provide dynamic test data for the mock so that it more accurately simulates the variety of responses that could be received from the back-end service. Having a way to generate test data for all types of tests, including mocks, will result in more complete testing.
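The difference between the two kinds of mock can be sketched in Python. Everything here is hypothetical: the user table, the response codes and messages, and both function names are invented for the example, and a real mock service would normally sit behind a network interface rather than a plain function call.

```python
def static_mock_login(username, password):
    """Simple request/response pair: always answers affirmatively."""
    return {"code": 200, "message": "login ok"}


# Test data driving the dynamic mock; the values are invented for illustration.
MOCK_USERS = {
    "alice": {"password": "correct-horse", "status": "active"},
    "bob": {"password": "hunter2", "status": "locked"},
}


def dynamic_mock_login(username, password, users=MOCK_USERS):
    """Data-driven mock: the response depends on the test data supplied."""
    user = users.get(username)
    if user is None:
        return {"code": 404, "message": "unknown user"}
    if user["status"] == "locked":
        return {"code": 423, "message": "account locked"}
    if user["password"] != password:
        return {"code": 401, "message": "invalid credentials"}
    return {"code": 200, "message": "login ok"}


# The static mock cannot tell a good login from a bad one...
static_codes = [
    static_mock_login("alice", "correct-horse")["code"],
    static_mock_login("mallory", "guess")["code"],
]

# ...while the dynamic mock exercises four distinct branches of the caller.
dynamic_codes = [
    dynamic_mock_login("alice", "correct-horse")["code"],
    dynamic_mock_login("alice", "wrong")["code"],
    dynamic_mock_login("bob", "hunter2")["code"],
    dynamic_mock_login("mallory", "guess")["code"],
]
```

Swapping in a different `users` table changes the mock's behaviour without touching its code, which is what allows generated test data to drive it.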

Traditional test data tools still matter

Large-scale testing is still required for many organisations' applications. Some banks (or other types of businesses) need to generate millions, or even billions, of rows of data, so there is still a need for on-premises test data management tools. However, in situations where a nimbler approach is needed to meet a sprint deadline or test a new feature, a cloud-based testing tool can often work faster and better. This is where the latest in synthetic test data generation, such as Test Data, a new feature within BlazeMeter by Perforce, can add complementary value. Indeed, traditional and new test data approaches can sit side by side to give a comprehensive choice of testing options for your different scenarios.

The scale and complexity of software will only grow, so having this greater flexibility around test data, with the ability to test at speed and simultaneously produce even higher quality code, is the new world of testing.

https://www.blazemeter.com/product/test-data