Datafaker: An Alternative to Using Production Data

As developers or testers, we frequently have the need to test our systems. But getting access to realistic or useful data isn't always easy.

By Erik Pragt · May 22, 2022 · Tutorial

As developers or testers, we frequently need to test our systems. Whether it's unit testing, integration testing, or any other form of testing, the data is often the leading and deciding factor. But getting access to good data isn't always easy. Sometimes the data is sensitive, like medical or financial data. At other times there's not enough data (for example, when running a load test), or the data you're looking for is hard to find. For cases like these, there's a solution: Datafaker.

Datafaker is a JVM library for generating production-like fake data. The data can be generated as part of your unit tests or written to external files, such as CSV or JSON files, so it can serve as input to other systems. This article shows what Datafaker is, what it can do, and how you can use it effectively to improve your testing strategy.

What Is Datafaker?

Datafaker is a library written in Java that can be used from popular JVM languages such as Java, Kotlin, or Groovy. It started as a fork of the no-longer-maintained Javafaker, but it has seen many improvements since its inception. Datafaker consists of a core that handles data generation, with a wide variety of domain-specific data providers on top. These providers are very useful for generating realistic addresses, names, phone numbers, credit card numbers, and other data. Some are more on the light side, such as the ones that generate characters from the TV shows Friends or The IT Crowd. Whatever your use case, there's a good chance that Datafaker can provide data for your application. And when no suitable provider is available, Datafaker offers a pluggable system for creating your own providers!

How to Use Datafaker

Datafaker is published to Maven Central on a regular basis, so the easiest way to get started with Datafaker is to use a dependency management tool like Maven or Gradle. To get started with Datafaker using Maven, you can include the dependency as follows:

XML
 
<dependency>
    <groupId>net.datafaker</groupId>
    <artifactId>datafaker</artifactId>
    <version>1.4.0</version>
</dependency>


Above, we're using version 1.4.0, the latest version at the time of writing this article. To make sure you're using the latest version, please check Maven Central.

Once the library has been included in your project, the easiest way to generate data is as follows:

Java
 
import net.datafaker.Faker;

public class Main {
    public static void main(String[] args) {
        Faker faker = new Faker();
        System.out.println(faker.name().fullName()); // prints a random name, e.g. "Vicky Nolan"
    }
}


If you need more information, the Datafaker documentation contains an excellent getting-started guide.

A few things are going on here that may not be immediately obvious. Whenever you run the code above, it prints a random full name consisting of a first name and a last name. The example uses the default locale (English) and a random seed, so a different name is generated on every run. If we want something more predictable, perhaps in a different language, we can pass a locale and a seeded `Random` to the constructor:

Java
 
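import java.util.Locale;
import java.util.Random;
import net.datafaker.Faker;
long seed = 1;
Faker faker = new Faker(new Locale("nl"), new Random(seed)); // Dutch locale, fixed seed
System.out.println(faker.name().fullName());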


In the above example, we generate a random Dutch full name. Because we're now using a fixed seed, the program produces the same "random" values no matter how often we run it. This helps a great deal when we want our test data to be repeatable, for example when doing a regression test.
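
The repeatability comes from `java.util.Random`: two instances created with the same seed produce the same sequence of values. The following JDK-only sketch shows that effect in isolation. The `SeededNames` class and its name arrays are hypothetical stand-ins and do not reflect how Datafaker stores its locale data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in for a name provider; Datafaker itself draws names from locale-specific data.
class SeededNames {
    private static final String[] FIRST = {"Anna", "Bram", "Daan", "Emma", "Sanne"};
    private static final String[] LAST = {"de Vries", "Jansen", "Bakker", "Visser"};

    // Generates `count` full names using a Random seeded with `seed`.
    static List<String> fullNames(long seed, int count) {
        Random random = new Random(seed);
        List<String> names = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            names.add(FIRST[random.nextInt(FIRST.length)] + " " + LAST[random.nextInt(LAST.length)]);
        }
        return names;
    }

    public static void main(String[] args) {
        // Same seed, same sequence: both lines print the same three names.
        System.out.println(fullNames(1, 3));
        System.out.println(fullNames(1, 3));
    }
}
```

The seed fixes only the sequence of random numbers. If a test calls the generators in a different order, it gets different values, so keep the call order stable when you rely on repeatability.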

While the above example shows how to generate names, Datafaker can generate a very wide range of random data, such as addresses, phone numbers, credit card numbers, colors, codes, and more. A full list can be found in the documentation (https://www.datafaker.net/documentation/providers/). Datafaker also provides more technical options, such as random enums and lists, to make it easier to generate your test data.
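
Picking a random enum constant is a small example of such a helper. The JDK-only sketch below shows the idea with a hypothetical `randomEnum` method and a hypothetical `Color` enum. For Datafaker's own helpers, check the providers documentation linked above.

```java
import java.util.Random;

final class RandomEnums {
    // Hypothetical enum used only for this example.
    enum Color { RED, GREEN, BLUE }

    // Returns a uniformly chosen constant of the given enum type.
    static <E extends Enum<E>> E randomEnum(Class<E> type, Random random) {
        E[] constants = type.getEnumConstants();
        return constants[random.nextInt(constants.length)];
    }

    public static void main(String[] args) {
        System.out.println(randomEnum(Color.class, new Random()));
    }
}
```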

Custom Collections

If you need to generate a larger set of test data, Datafaker provides several options. One of these is Fake Collections, which create large sets of data in memory from a set of data suppliers passed to the `collection` method. This is best demonstrated with an example:

Java
 
List<String> names = faker.<String>collection()
    .suppliers(
        () -> faker.name().firstName(),
        () -> faker.name().lastName())
    .minLen(5)
    .maxLen(10)
    .build().get();


The above creates a collection of Strings with at least 5 and at most 10 elements, each of which is either a first name or a last name. Many variations are possible, including collections whose elements have different data types:

Java
 
import java.util.concurrent.TimeUnit;

List<Object> data = faker.collection()
    .suppliers(
        () -> faker.date().future(10, TimeUnit.DAYS),
        () -> faker.medical().hospitalName(),
        () -> faker.number().numberBetween(10, 50))
    .minLen(5)
    .maxLen(10)
    .build().get();

System.out.println(data);


This will generate a list of Objects, since the `future`, `hospitalName` and `numberBetween` generators all have different return types.
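
Conceptually, a fake collection chooses a length between `minLen` and `maxLen` and fills each slot from one of the given suppliers. The JDK-only sketch below illustrates that idea. The `FakeList` class is hypothetical and is not Datafaker's implementation, and the uniform choice of length and supplier is our own assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.Supplier;

// Hypothetical sketch of the fake-collection idea; not Datafaker's FakeCollection.
class FakeList {
    @SafeVarargs
    static <T> List<T> generate(Random random, int minLen, int maxLen, Supplier<? extends T>... suppliers) {
        if (minLen < 0 || maxLen < minLen || suppliers.length == 0) {
            throw new IllegalArgumentException("need 0 <= minLen <= maxLen and at least one supplier");
        }
        // Pick a length in [minLen, maxLen], both ends inclusive.
        int size = minLen + random.nextInt(maxLen - minLen + 1);
        List<T> result = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            // Each element comes from a randomly chosen supplier.
            result.add(suppliers[random.nextInt(suppliers.length)].get());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = generate(new Random(42), 5, 10, () -> "first", () -> "last");
        System.out.println(names.size() + " " + names);
    }
}
```

Because the suppliers only need a common supertype, mixing return types (as in the `List<Object>` example) works the same way. The element type simply widens to `Object`.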

Custom Providers

While Datafaker provides a lot of generators out of the box, some generators may be missing, or some may work slightly differently from what your use case needs. To support cases like these, you can create your own data provider, either by providing a YML configuration file or by hardcoding the possible values in your code.

Creating a data provider involves two steps: creating the provider itself and registering it in your custom Faker. The example below creates a provider for generating turtle names:

Java
 
class Turtle {
    private static final String[] TURTLE_NAMES = new String[]{"Leonardo", "Raphael", "Donatello", "Michelangelo"};
    private final Faker faker;

    public Turtle(Faker faker) {
        this.faker = faker;
    }

    public String name() {
        return TURTLE_NAMES[faker.random().nextInt(TURTLE_NAMES.length)];
    }
}


Since the provider accessor methods (such as `name()` or `address()`) are defined on the Faker class itself, we need to create our own custom Faker class that extends the original. That way we can use all the existing data providers, plus our own:

Java
 
class MyCustomFaker extends Faker {
    public Turtle turtle() {
        return getProvider(Turtle.class, () -> new Turtle(this));
    }
}


Using the custom faker is similar to what we've seen before:

Java
 
MyCustomFaker faker = new MyCustomFaker();
System.out.println(faker.turtle().name());


If you want to know more about creating your own provider, or using YML files to provide the data, the Datafaker custom provider documentation provides more information on this subject.
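
The `getProvider` call in `MyCustomFaker` keeps the registration step cheap. As far as we can tell, it creates the provider lazily on first use and reuses the same instance afterwards. The sketch below shows that lazy, per-class caching pattern using only the JDK. `ProviderCache` is a hypothetical class, not Datafaker's source.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of lazy, per-class provider caching; not Datafaker's actual code.
class ProviderCache {
    private final Map<Class<?>, Object> providers = new HashMap<>();

    // Creates the provider on first request and returns the cached instance afterwards.
    <T> T getProvider(Class<T> type, Supplier<T> factory) {
        return type.cast(providers.computeIfAbsent(type, key -> factory.get()));
    }

    public static void main(String[] args) {
        ProviderCache cache = new ProviderCache();
        StringBuilder first = cache.getProvider(StringBuilder.class, StringBuilder::new);
        StringBuilder second = cache.getProvider(StringBuilder.class, StringBuilder::new);
        System.out.println(first == second); // true: the factory ran only once
    }
}
```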

Exporting Data

Sometimes you want to do more than generate data in memory, because you need to provide data to an external program. A common approach is to provide the data as CSV files. Datafaker supports this out of the box. Besides CSV, it can also generate JSON, YML, or XML files without the need for external libraries. Creating such data is similar to creating the collections we've seen above.

Files can be generated in several ways. The simplest case is a document in which every column is filled with independent random data. To generate such a CSV file, use the `toCsv` method of the `Format` class, as shown below:

Java
 
System.out.println(
    Format.toCsv(
            Csv.Column.of("first_name", () -> faker.name().firstName()),
            Csv.Column.of("last_name", () -> faker.name().lastName()),
            Csv.Column.of("address", () -> faker.address().streetAddress()))
        .header(true)
        .separator(",")
        .limit(5).build().get());


In the example above, 5 rows of data are generated, each consisting of a first name, last name, and street address. The CSV output can be customized, for example by including or excluding the header or by using a different separator character. More options, along with examples of generating XML, YML, or JSON files, can be found in the Datafaker fileformats documentation.
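
The following JDK-only sketch makes the header, separator, and row-limit options concrete. It shows what such a generator does: named columns backed by suppliers, an optional header row, and values quoted as described in RFC 4180. The `CsvSketch` and `Column` names are hypothetical and do not mirror Datafaker's `Format` or `Csv` classes.

```java
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;

// Hypothetical sketch; not Datafaker's Format/Csv API.
final class CsvSketch {
    // A named column whose values come from a supplier.
    static final class Column {
        final String name;
        final Supplier<String> values;

        Column(String name, Supplier<String> values) {
            this.name = name;
            this.values = values;
        }
    }

    // Quotes a value RFC 4180-style: wrap it in double quotes and double any embedded quotes.
    static String quote(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Builds `limit` rows, optionally preceded by a header row, joined with `separator`.
    static String toCsv(List<Column> columns, boolean header, String separator, int limit) {
        StringBuilder sb = new StringBuilder();
        if (header) {
            sb.append(columns.stream().map(c -> quote(c.name)).collect(Collectors.joining(separator))).append('\n');
        }
        for (int row = 0; row < limit; row++) {
            sb.append(columns.stream().map(c -> quote(c.values.get())).collect(Collectors.joining(separator))).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Column> columns = List.of(
            new Column("first_name", () -> "Vicky"),
            new Column("last_name", () -> "Nolan"));
        System.out.print(toCsv(columns, true, ",", 2));
    }
}
```

Quoting every value keeps the output valid even when a generated value happens to contain the separator or a quote character.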

Exporting Data With Some Constraints

There is another way to generate CSV, called conditional generation, which is useful when there are constraints between the columns. Imagine we want to generate a document containing a person's name, their field of interest, and a sample from that field. For simplicity, we'll consider two fields of interest: Music and Food. For "Music" we want a sample music genre, and for "Food" we want a sample dish, for example:

Plain Text
 
"name";"field";"sample"
"Le Ferry";"Music";"Funk"
"Mrs. Florentino Schuster";"Food";"Scotch Eggs"


To do that, we need to generate a collection of such records.

First, let's define how each record is generated, for instance:

Java
 
import net.datafaker.Faker;

class Data {
    // One shared Faker is enough; there's no need to create a new one for every record
    private static final Faker faker = new Faker();

    private String name;
    private String field;
    private String interestSample;

    public Data() {
        name = faker.name().name();
        field = faker.options().option("Music", "Food");
        // The sample depends on the chosen field: this is the constraint between the columns
        switch (field) {
            case "Music": interestSample = faker.music().genre(); break;
            case "Food": interestSample = faker.food().dish(); break;
        }
    }

    public String getName() {
        return name;
    }

    public String getField() {
        return field;
    }

    public String getInterestSample() {
        return interestSample;
    }
}


Now we can use the Data class to generate CSV data, as demonstrated below:

Java
 
String csv = Format.toCsv(
        new Faker().<Data>collection()
            .suppliers(Data::new)
            .maxLen(10)
            .build())
    .headers(() -> "name", () -> "field", () -> "sample")
    .columns(Data::getName, Data::getField, Data::getInterestSample)
    .separator(";")
    .header(true)
    .build().get();


This generates a CSV string with a header row and random data, while respecting the constraint we specified between the field and sample columns.
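
The `switch` in `Data` hard-codes two fields. If you expect more fields, one option is to map each field to its allowed samples. Then pick the field first, and pick the sample only from that field's list. The JDK-only sketch below shows this approach. The `ConditionalRows` class and its sample lists are hypothetical placeholders for values that, in the version above, come from `faker.music().genre()` and `faker.food().dish()`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch: each field maps to the values allowed in its "sample" column.
final class ConditionalRows {
    static final Map<String, List<String>> SAMPLES = new LinkedHashMap<>();

    static {
        SAMPLES.put("Music", List.of("Funk", "Jazz", "Reggae"));
        SAMPLES.put("Food", List.of("Scotch Eggs", "Pasta", "Sushi"));
    }

    // Builds one row: pick the field first, then a sample constrained to that field.
    static String[] row(Random random) {
        List<String> fields = List.copyOf(SAMPLES.keySet());
        String field = fields.get(random.nextInt(fields.size()));
        List<String> options = SAMPLES.get(field);
        String sample = options.get(random.nextInt(options.size()));
        return new String[]{field, sample};
    }

    public static void main(String[] args) {
        Random random = new Random(3);
        for (int i = 0; i < 3; i++) {
            System.out.println(String.join(";", row(random)));
        }
    }
}
```

Adding a new field then means adding one map entry rather than another `case` branch.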

Conclusion

This article gave an overview of some of the options provided by Datafaker and how Datafaker can help in addressing your testing needs.

For suggestions, bug reports, or other feedback, head over to the Datafaker project site.

Opinions expressed by DZone contributors are their own.
