Semilattice predictions come with built-in measurement. Not a claim of accuracy, but a methodology you can inspect and extend with your own data.

How we measure

Every user model is automatically tested using leave-one-out cross-validation. For each question in the seed data, the model predicts the answer using all the other questions, then compares the prediction to the real response distribution. This gives you a baseline accuracy score before you’ve asked a single new question. The metrics:
  • Accuracy: how close the predicted distribution is to reality, expressed as a percentage. Higher is better.
  • Squared error: the average squared difference between predicted and actual percentages. Lower is better.
  • Normalised information loss: measures how much information is lost between the real distribution and the prediction, normalised for question complexity. Lower is better. This is our preferred single metric because it handles questions with many answer options more fairly.
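The three metrics can be sketched in a few lines of Python. The exact formulas are not given in this page, so the ones below are plausible interpretations, not Semilattice's published definitions: accuracy as 100% minus total variation distance, squared error as the mean squared gap in percentage points, and normalised information loss as KL divergence divided by the entropy of a uniform distribution over the answer options (which is what normalising for question complexity suggests).

```python
import math

def accuracy(predicted, actual):
    """Accuracy as 100% minus total variation distance.
    An assumed formula; Semilattice's exact definition is not shown here."""
    tvd = 0.5 * sum(abs(p - a) for p, a in zip(predicted, actual))
    return 100.0 * (1.0 - tvd)

def squared_error(predicted, actual):
    """Average squared difference between predicted and actual percentages."""
    return sum((100 * p - 100 * a) ** 2
               for p, a in zip(predicted, actual)) / len(predicted)

def normalised_information_loss(predicted, actual, eps=1e-9):
    """KL divergence from the actual to the predicted distribution,
    normalised by log(K) -- the entropy of a uniform distribution over
    K answer options -- so many-option questions are treated fairly.
    Again an assumption about the exact normalisation."""
    kl = sum(a * math.log((a + eps) / (p + eps))
             for p, a in zip(predicted, actual) if a > 0)
    return kl / math.log(len(predicted))

# A three-option question: predicted vs. observed answer shares.
predicted = [0.50, 0.30, 0.20]
actual = [0.45, 0.35, 0.20]
print(accuracy(predicted, actual))       # -> 95.0
```

Note how the metrics disagree in scale: a 5-point miss on two options costs 5 accuracy points but produces a squared error of about 16.7, which is why a single normalised metric is easier to compare across questions.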

How you measure

Cross-validation tells you how the model performs on its own data. Test batches tell you how it performs on yours. Upload ground truth data (real survey responses from your domain) and Semilattice tests the model against questions it has never seen. This is the most direct measure of whether a user model is useful for your specific problem. Create a test batch →
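A test batch boils down to a loop over held-out questions. The sketch below is hypothetical: `model_predict` stands in for whatever call returns the model's predicted answer distribution (it is not a real Semilattice API), and the accuracy metric is assumed to be 100% minus total variation distance, both questions and distributions represented as plain dicts.

```python
def evaluate_test_batch(model_predict, ground_truth):
    """Average accuracy over questions the model has never seen.

    ground_truth maps each held-out question to its real answer
    distribution, e.g. {"Would you switch banks?": {"yes": 0.6, "no": 0.4}}.
    Assumes predicted and actual distributions share the same option set.
    """
    scores = []
    for question, actual in ground_truth.items():
        predicted = model_predict(question)  # hypothetical prediction call
        tvd = 0.5 * sum(abs(predicted.get(opt, 0.0) - share)
                        for opt, share in actual.items())
        scores.append(100.0 * (1.0 - tvd))
    return sum(scores) / len(scores)

# A stub model that always predicts 70/30, scored against one real question.
truth = {"Would you switch banks for a better rate?": {"yes": 0.6, "no": 0.4}}
score = evaluate_test_batch(lambda q: {"yes": 0.7, "no": 0.3}, truth)
print(score)  # -> 90.0
```

Because the questions never appear in the seed data, this score reflects generalisation to your domain rather than internal consistency.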
Our UK Consumer Finance user models score 90% on built-in cross-validation. The score tells you how well the model predicts within its own domain. You can see the accuracy score for every user model in the dashboard. A model that scores highly on cross-validation has strong internal consistency — the seed data is coherent enough for the model to generalise within it.
The model struggles when a question requires knowledge its seed data doesn't cover. Financial literacy questions are a known weakness: the model predicts what people should know, not what they do know. Test batches help you find these blind spots before they matter.
The simulation engine produces stable results. The same question asked multiple times returns similar distributions. Small variations exist because the underlying language model is stochastic, but the distributions are stable enough for decision-making.
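You can quantify that stability yourself. One simple check (an illustration, not a Semilattice feature) is to ask the same question several times and report the largest total variation distance between any two of the resulting distributions; values near zero mean the runs agree.

```python
import itertools

def max_pairwise_tvd(runs):
    """Largest total variation distance between any two repeated runs
    of the same question. Each run is a list of answer-option shares
    in the same option order; small values indicate stable output."""
    worst = 0.0
    for a, b in itertools.combinations(runs, 2):
        tvd = 0.5 * sum(abs(x - y) for x, y in zip(a, b))
        worst = max(worst, tvd)
    return worst

# Three runs of a two-option question that differ by a point or two.
runs = [[0.50, 0.50], [0.52, 0.48], [0.49, 0.51]]
print(max_pairwise_tvd(runs))  # -> 0.03 (i.e. a 3-point worst-case drift)
```

If the worst-case drift is well below the margin that would change your decision, the stochasticity is noise you can live with.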
Accuracy is a number: 91%, 85%, 78%. Measurement is the practice of generating that number, understanding what it means, and knowing when to trust it. We publish our methodology, provide tools to test it yourself, and flag the cases where the model struggles. The goal is enough signal to make better decisions than you would without it.
Traditional surveys have their own accuracy problems: sampling bias, question framing effects, low response rates, social desirability bias. They’re considered ground truth but they aren’t perfect either. Semilattice is a complement: fast signal when you can’t wait for a survey, and a way to generate hypotheses that a survey can then validate.