Tests bring value by preventing bugs, but they come with a maintenance burden. As code evolves, over-asserting in tests can bake in bad assumptions or create future busywork. Moving assertions from test code into application code and randomizing test inputs can help you get more value and less pain from fewer tests.


Imagine this scenario. You’re developing a new feature or a bug fix. You finish up, run the full test suite, and see several test failures. How do you feel? Are you...

  • Happy that the tests caught the issue before your customers found out?
  • Grumpy that you have to deal with some irrelevant test failure?

Ask any modern software developer, and they'll tell you that testing is important for keeping a project stable. However, I’m sure they’ll also have plenty of stories about bugs that weren't caught by tests, tests that didn't catch what they were meant to catch, tests that were too cumbersome to ever write, and existing tests that were annoying to maintain. How do we get to the world where developers are overjoyed to see test failures? How do we get the most out of our tests?

Writing useful tests is an art.

This idea is described well by Convex’s very own James Cowling in his talk at the CMU Database Group.

A test suite is good if it adds value.

  • It succeeds when the functionality it tests is working.
  • It fails upon the introduction of a real issue.

A test suite is bad when it...

  • continues to succeed upon introduction of a regression (bad coverage),
  • fails randomly (flaking), or
  • slows down development by asserting too tightly.

This post is going to focus on that final bullet. If you’ve ever made a seemingly innocuous change only to watch 100 tests fail in some mundane way, then you know that cost all too well. You feel like you’ve been subjected to a barrage of busywork. Worst of all, you might not be empowered to do anything about it. After all, they’re not your tests. You feel that you’re supposed to be working on your feature, not wrangling preexisting tests.

Useful failures

One way to think about the long-term regression-testing value of a test is its useful failure rate, which we’ll define as the fraction of test failures that are caused by issues you actually care about. These are the kinds of test failures that make you happy, not grumpy.

Here’s an example: a simplified document store.

struct Documents {
    documents: Vec<String>,
    total_size: usize,
}

impl Documents {
    fn new() -> Self {
        Self { documents: vec![], total_size: 0 }
    }

    fn add(&mut self, document: String) {
        self.total_size += document.len();
        self.documents.push(document);
    }
}

#[test]
fn test_add() {
    let mut docs = Documents::new();
    docs.add("convex".into());
    docs.add("rocks".into());
    assert_eq!(docs.total_size, 11);
}

Let’s add some functionality: initializing the document store with some starter documents.

impl Documents {
    fn new() -> Self {
        let mut docs = Self { documents: vec![], total_size: 0 };
        docs.add("starter_doc".into());
        docs
    }
    // ...
}

Oh no, the test fails. The code is actually correct; it’s just the assertion that’s wrong. This is an annoying failure.

---- test_add stdout ----
thread 'test_add' panicked at 'assertion failed: `(left == right)`
 left: `22`,
right: `11`', src/main.rs:28:5

The simple route would be to update the test’s assertion and move on. However, this creates a maintenance burden. Any future change to the set of starter documents will cause the test to fail again, creating dreaded busywork. Worst of all, the assertion could get copy-pasted into many tests as the code evolves.

For this test, we can reduce the maintenance cost by moving the assertion into the application code:

impl Documents {
    // ...
    fn add(&mut self, document: String) {
        self.total_size += document.len();
        self.documents.push(document);

        assert_eq!(
            self.total_size,
            self.documents.iter().map(|d| d.len()).sum(),
        );
    }
}

#[test]
fn test_add() {
    let mut docs = Documents::new();
    docs.add("convex".into());
    docs.add("rocks".into());
}

If we want to avoid being accidentally quadratic in production builds, we can limit the check to debug builds (with debug_assert!) or to tests only. Thankfully, Rust makes the latter easy with #[cfg(test)].

impl Documents {
    // ...
    fn add(&mut self, document: String) {
        self.total_size += document.len();
        self.documents.push(document);

        #[cfg(test)]
        assert_eq!(
            self.total_size,
            self.documents.iter().map(|d| d.len()).sum(),
        );
    }
}

#[test]
fn test_add() {
    let mut docs = Documents::new();
    docs.add("convex".into());
    docs.add("rocks".into());
}
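
If we’d rather keep the check in every debug build rather than only under test, a debug_assert! variant works too. Here’s a minimal sketch of that alternative:

impl Documents {
    // ...
    fn add(&mut self, document: String) {
        self.total_size += document.len();
        self.documents.push(document);

        // debug_assert_eq! is compiled out of release builds, so production
        // adds stay O(1) while debug builds still check the invariant.
        debug_assert_eq!(
            self.total_size,
            self.documents.iter().map(|d| d.len()).sum::<usize>(),
        );
    }
}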

With these small changes, we’ve simplified the test and validated invariants within the application. Now we can change the implementation of add with confidence. Rather than baking assumptions into the test, we are testing the core invariants of our code.

Notice that the useful failure rate is not just a function of test quality. Application code and tests must be designed together and must co-evolve. In our example, the new functionality required rethinking existing tests.

The lesson? Re-evaluate your tests and testing infrastructure just as you do your application code. Their value can change as the kind of development happening around them changes (e.g. feature development, project maturity, company evolution, ratio of new to experienced engineers).

Test evolution is an art. Tests aren’t inherently good or bad. They provide value by preventing bugs. It’s good practice to periodically look at tests and reason about whether they are more likely to cause you to be happy or grumpy when they fail. Then, evolve your test suite accordingly.

This raises the question: how do we know that we’re effectively exploring our code?

Randomized Testing

Randomized testing (e.g. QuickCheck) is a technique that generates random test cases in order to poke at the corner cases of your code.

#[quickcheck_macros::quickcheck]
fn qc_add(adds: Vec<String>) {
    let mut docs = Documents::new();
    for add in adds {
        docs.add(add);
    }
}

Well-designed randomized testing allows us to generate test cases that continue to provide value over time. We've found great value in randomized tests: you get a lot of bang for your buck, writing a relatively small amount of test code for a lot of coverage.

There are a few key design guidelines for getting value out of randomized testing:

  • Well-defined component boundaries create natural units to randomize inputs.
  • Deterministic code execution (which we love) makes rare errors reproducible.
  • Debug assertions on logic errors within the app code validate internal consistency while testing explores strange inputs.
  • Intelligent test case input distribution helps explore low-probability paths (see the sketch after this list).
      • In QuickCheck, this involves implementing the Arbitrary trait.
      • For example, if our Documents::add is more likely to fail on strings of exactly length 32, we should generate those strings more often.
  • Fewer assertions within test code, to avoid baking bad assumptions or bugs into the tests. When assertions are needed, they should stay simple.
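
As an illustration of the input-distribution point, here’s a minimal sketch of a custom Arbitrary implementation with QuickCheck. The DocumentInput newtype and the bias toward 32-byte strings are made up for this example:

use quickcheck::{Arbitrary, Gen};

// Hypothetical newtype so we can control how document strings are generated.
#[derive(Clone, Debug)]
struct DocumentInput(String);

impl Arbitrary for DocumentInput {
    fn arbitrary(g: &mut Gen) -> Self {
        // Half the time, generate a string of exactly 32 bytes, since (in this
        // made-up scenario) that's where we suspect Documents::add is most
        // likely to fail; otherwise fall back to the default String generator.
        if bool::arbitrary(g) {
            let s: String = (0..32)
                .map(|_| char::from(*g.choose(b"abcdefgh").unwrap()))
                .collect();
            DocumentInput(s)
        } else {
            DocumentInput(String::arbitrary(g))
        }
    }
}

#[quickcheck_macros::quickcheck]
fn qc_add_biased(adds: Vec<DocumentInput>) {
    let mut docs = Documents::new();
    for DocumentInput(add) in adds {
        docs.add(add);
    }
}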

Randomized tests written in this way stand the test of time. They probe the behavior of the code without needing to understand the implementation. They are unlikely to false-positive during normal development and avoid the pitfall of baking assumptions into the test. Even as the implementation changes, the asserts within the application will flag true failures. The ratio of test code to application code can stay small while holding test value.

When developing features, randomized testing can help gain validation while mitigating the frustrations of updating the test suite. When designed well, randomized test failures will result in reproducible, minimized assertion failures within your application code. Most failures will be useful.

Randomized Testing trophies

One great practice is to encode "trophies": strange, powerful test cases that randomized testing caught in the past. Add a test which iterates through your trophy case (a Vec<TestInput>) to make sure they never regress (a minimal sketch follows below). It also feels great to build up your trophy case over time. It's something to be proud of!
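
Here’s a minimal sketch of what that could look like for the Documents example from earlier (the trophy inputs themselves are made up for illustration):

// A hypothetical trophy case for the Documents example; each entry is a
// sequence of adds that (we imagine) once triggered a failure under
// randomized testing.
#[test]
fn test_trophies() {
    let trophies: Vec<Vec<String>> = vec![
        vec!["".into()],                 // the empty document
        vec!["a".repeat(32), "".into()], // a 32-byte string followed by an empty one
    ];
    for adds in trophies {
        let mut docs = Documents::new();
        for add in adds {
            docs.add(add);
        }
    }
}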

Shape Analysis at Convex

Shape Analysis is the feature we provide to analyze the contents of your table columns in order to understand and expose their types. As you add values to and delete values from a column, we track and analyze its “shape”: its current inferred type.
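
To make the idea concrete, here’s a rough sketch of what a column shape could look like; these variants are illustrative assumptions, not Convex’s actual Shape type:

use std::collections::BTreeMap;

// Illustrative only: a shape is the inferred type of everything in a column.
#[allow(dead_code)]
enum Shape {
    Never,                           // no values observed yet
    Float64,                         // every value is a float
    String,                          // every value is a string
    Object(BTreeMap<String, Shape>), // an object, with one shape per field
    Union(Vec<Shape>),               // the column holds several kinds of values
}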

At Convex, we've used randomized testing to validate our shape analysis. We use cheap asserts as well as expensive inline debug consistency checkers to get a lot of value out of our short tests.

Check out this actual test sample from our shape reduction code, validating that an insert followed by a remove preserves the shape. These tests also validate that Shape’s methods don’t panic, even with weird inputs.

mod insert_remove_inverses {
    use quickcheck_macros::quickcheck;

    use super::parse_json;
    use crate::{
        shapes::Shape,
        value::Value,
    };

    fn test(start_value: Value, value: Value) -> bool {
        let shape = Shape::shape_of_value(&start_value);
        let inserted = shape.insert_value(&value);
        let removed = inserted.remove_value(&value).unwrap();
        removed == shape
    }

    #[quickcheck]
    fn quickcheck_test(start_value: Value, value: Value) -> bool {
        test(start_value, value)
    }

    #[test]
    fn qc_trophies() -> anyhow::Result<()> {
        let trophies = [
            ("0", r#"{"a": null}"#),
            ("{}", r#"[{"a": [[], [0.5, {}], [{"a": []}]]}]"#),
            ("false", r#"[[["a", {}], ["b"], ["c", false]]]"#),
        ];

        for (start_value, value) in trophies.iter() {
            let start_value = parse_json(start_value)?;
            let value = parse_json(value)?;
            assert!(test(start_value, value));
        }
        Ok(())
    }
}

Get on board

We love writing, pruning, refocusing, and improving our test suite to get the most out of our tests. Quality over quantity. We want our developers to get the most bang for their effort, and we want that to show through in our product for our customers. We want to take the bummer out of backends. If this is your jam, give us a try or come join us.