Population model creation is a paid feature on the Play and Launch pricing plans. Visit the account page in the dashboard to upgrade.

Custom populations allow you to make predictions for your specific audience segments, customers, or user groups. This guide shows how to create populations programmatically using the Semilattice API.

Prerequisites

Before creating a population, ensure you have:

  • Seed data from a target group of users or customers in the required CSV format
  • A valid Semilattice API key
  • A paid subscription on the Play or Launch pricing plans
  • The Semilattice SDK installed
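
Some of these prerequisites can be checked before any API call is made. The following is a minimal sketch, not part of the SDK: the helper name `preflight` is hypothetical, and it assumes the API key is supplied through an environment variable named `SEMILATTICE_API_KEY` (confirm the name your SDK version actually reads). It checks only the key and the seed file, not your plan or the CSV contents.

```python
import os
from pathlib import Path


def preflight(seed_path, env=None, key_var="SEMILATTICE_API_KEY"):
    """Return a list of problems; an empty list means the basics are in place.

    Hypothetical helper. `key_var` is an ASSUMED environment variable name.
    """
    env = os.environ if env is None else env
    problems = []
    # The API key must be available to the client.
    if not env.get(key_var):
        problems.append(f"environment variable {key_var} is not set")
    # The seed data must be an existing CSV file.
    path = Path(seed_path)
    if path.suffix.lower() != ".csv":
        problems.append(f"{path.name} does not have a .csv extension")
    elif not path.is_file():
        problems.append(f"{path} does not exist")
    return problems
```

Calling `preflight("developer_survey_data.csv")` before `populations.create` surfaces a missing key or file early, instead of as an API error.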

Creating a population

from pathlib import Path
from semilattice import Semilattice

semilattice = Semilattice()

# See simulation options section below
simulation_options = {
    "question_options": [
        {
            "question_number": 1,
            "question_type": "single-choice"
        },
        {
            "question_number": 2,
            "question_type": "multiple-choice",
            "limit": 3
        }
    ]
}

population = semilattice.populations.create(
    name="Developer Survey Q4 2024",
    description="Survey of software developers conducted in Q4 2024",
    seed_data=Path("developer_survey_data.csv"),
    reality_target="Software Developers",
    simulation_engine="answers-1",
    simulation_options=simulation_options,  # see below
    run_test=False
)

Parameters

Parameter            Type        Description
name                 string      Display name for your population
description          string      Detailed description or documentation link
seed_data            file/path   CSV file containing QA pair data
reality_target       string      Description of the target user profile
simulation_engine    string      Engine version (typically “answers-1”)
simulation_options   object      Question type configuration for proper simulation (see below)
run_test             boolean     Whether to automatically test accuracy after creation

Simulation options

The simulation_options parameter configures how questions are interpreted during simulation:

simulation_options = {
    "question_options": [
        {
            "question_number": 1,
            "question_type": "single-choice"
        },
        {
            "question_number": 2,
            "question_type": "multiple-choice",
            "limit": 2
        }
    ]
}

Field Descriptions:

  • question_number: A 1-based index reference to the column in the seed data file. For example, if the first question column in your CSV is a single-choice question, you would set "question_number": 1.

  • question_type: An enum describing the data type for that column in the seed data file:

    • single-choice: Column cells contain strings representing individual answers (e.g., “Python”, “JavaScript”)
    • multiple-choice: Column cells contain valid lists of strings representing sets of answers (e.g., [“Python”, “JavaScript”, “Go”])
    • open-ended: Column cells contain any string representing free text answers

  • limit (optional): For multiple-choice questions only, sets the maximum number of choices a respondent can select during simulation.

Requirements:

  • There must be one object for each question column in the seed data file in the “question_options” list field
  • The question_number values should correspond to the actual column positions in your CSV file
  • Question types must match the actual data format in your seed data

This configuration ensures test simulations run correctly when evaluating population accuracy. For more details on data format requirements, see the Seed Data Requirements guide.
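
The requirements above can be checked locally before uploading. This is a minimal sketch under stated assumptions: `validate_simulation_options` is a hypothetical helper, not an SDK function, and you supply the number of question columns yourself, because how question columns are laid out in the CSV is defined in the Seed Data Requirements guide rather than here. It does not inspect the CSV cells themselves.

```python
from collections import Counter

# Question types listed in this guide.
VALID_TYPES = {"single-choice", "multiple-choice", "open-ended"}


def validate_simulation_options(options, num_question_columns):
    """Return a list of problems in `options`; an empty list means consistent.

    Hypothetical helper. `num_question_columns` is the number of question
    columns in your seed CSV, counted by you.
    """
    problems = []
    entries = options.get("question_options", [])
    counts = Counter(entry.get("question_number") for entry in entries)

    # Exactly one entry per question column.
    for number in range(1, num_question_columns + 1):
        if counts[number] == 0:
            problems.append(f"question {number}: no entry")
        elif counts[number] > 1:
            problems.append(f"question {number}: {counts[number]} entries")

    for entry in entries:
        number = entry.get("question_number")
        qtype = entry.get("question_type")
        # question_number is 1-based and must point at a real column.
        if not isinstance(number, int) or not 1 <= number <= num_question_columns:
            problems.append(f"question_number {number!r} is out of range")
        if qtype not in VALID_TYPES:
            problems.append(f"question {number}: unknown question_type {qtype!r}")
        # limit is documented for multiple-choice questions only.
        if "limit" in entry and qtype != "multiple-choice":
            problems.append(f"question {number}: limit only applies to multiple-choice")
    return problems
```

With the two-question example above, `validate_simulation_options(simulation_options, 2)` returns an empty list, while passing `3` reports that question 3 has no entry.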

Population status flow

When you create a population, it goes through several status stages:

1. Processing

Initially, your population will have a status of Processing while Semilattice processes and stores your data.

print(f"Population status: {population.data.status}")
# Output: Population status: Processing

2. Untested

Once processing completes, the status changes to Untested. At this point:

  • The population is ready to make predictions
  • We don’t yet know its estimated accuracy

# Check status periodically
updated_population = semilattice.populations.get(population.data.id)
print(f"Status: {updated_population.data.status}")
# Output: Status: Untested

3. Testing

When you trigger accuracy testing, either automatically or manually (see below), the status changes to Testing while Semilattice evaluates the population’s prediction accuracy.

4. Tested

Once testing completes, the status becomes Tested and the population will have accuracy metrics available for review.
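
Because these transitions happen asynchronously, a script usually has to poll until the status it needs appears. The following is a minimal sketch, not an SDK feature: `wait_for_status` is a hypothetical helper, the interval and timeout defaults are illustrative, and it assumes the status strings are exactly those shown in this guide. It does not handle any failure statuses the API may report.

```python
import time


def wait_for_status(fetch_status, targets, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Call `fetch_status()` repeatedly until it returns a status in `targets`.

    Returns the status reached. Raises TimeoutError once `timeout` seconds
    of waiting have accumulated without reaching a target status.
    """
    waited = 0.0
    while True:
        status = fetch_status()
        if status in targets:
            return status
        if waited >= timeout:
            raise TimeoutError(f"still {status!r} after {waited:.0f}s")
        sleep(interval)
        waited += interval


# Usage with the SDK calls shown in this guide (not run here):
# status = wait_for_status(
#     lambda: semilattice.populations.get(population.data.id).data.status,
#     targets={"Untested", "Tested"},
# )
```

Passing the status lookup as a callable keeps the loop independent of the client, so it can be exercised with a stub before being pointed at the real API.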

Testing population accuracy

To understand how accurate your population will be, you need to run a population test.

Option 1: Automatic testing

Set run_test=True when creating the population:

population = semilattice.populations.create(
    name="Developer Survey Q4 2024",
    description="Survey of software developers conducted in Q4 2024",
    seed_data=Path("developer_survey_data.csv"),
    reality_target="Software Developers",
    simulation_engine="answers-1",
    simulation_options=simulation_options,  # as defined in "Simulation options" above
    run_test=True  # Automatically test after processing
)

With automatic testing, your population will progress through these statuses: Processing → Untested → Testing → Tested

Option 2: Manual testing

Alternatively, you can trigger testing manually on an Untested population:

# Trigger testing manually
population = semilattice.populations.test(
    population_id=population.data.id
)
# population.data.status will be "Testing"
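
Since manual testing applies to an Untested population, a script may want to guard the call on the current status. This is a small sketch of that guard: `trigger_test_if_untested` is a hypothetical helper, and what the API does when a test is requested in another status is not covered in this guide.

```python
def trigger_test_if_untested(status, trigger):
    """Call `trigger()` only when `status` is "Untested".

    Returns True if the test was triggered, False otherwise.
    """
    if status != "Untested":
        return False
    trigger()
    return True


# Usage with the SDK calls shown in this guide (not run here):
# current = semilattice.populations.get(population.data.id)
# trigger_test_if_untested(
#     current.data.status,
#     lambda: semilattice.populations.test(population_id=current.data.id),
# )
```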

Understanding test results

Once testing is complete, your population will have:

  • Status: Tested
  • Accuracy metrics: Estimated accuracy scores

Learn more about interpreting these metrics and performance thresholds in the Accuracy Evaluation section.