Managing Testsets

This guide covers how to create, upsert, list, and retrieve testsets with the Agenta SDK, and how to read their testcases for use in evaluations.


Async examples

Agenta's SDK uses async APIs. In Jupyter/Colab you can use top-level await. In a regular Python script, wrap async code like this:

import asyncio

async def main():
    ...

asyncio.run(main())
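As a runnable illustration of the wrapper above, the sketch below replaces the `...` with a stand-in coroutine. `fetch_testset_name` is a hypothetical placeholder, not part of the Agenta SDK; in a real script you would await an SDK call at that point.

```python
import asyncio


async def fetch_testset_name() -> str:
    # Hypothetical stand-in for an awaited SDK call.
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return "Country Capitals"


async def main() -> str:
    # Inside a coroutine, `await` behaves the same as in a notebook cell.
    name = await fetch_testset_name()
    print(f"Testset name: {name}")
    return name


# asyncio.run() starts an event loop, runs main() to completion, and closes the loop.
result = asyncio.run(main())
```

Returning a value from `main()` is optional; it is done here only so the result is available after the event loop has closed.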

Creating a Testset

Use ag.testsets.acreate() to create a new testset with data:

Python SDK

import agenta as ag

# Create a testset with simple data
testset = await ag.testsets.acreate(
    data=[
        {"country": "Germany", "capital": "Berlin"},
        {"country": "France", "capital": "Paris"},
        {"country": "Spain", "capital": "Madrid"}
    ],
    name="Country Capitals",
)

testset_id = testset.testset_id or testset.id
print(f"Testset ID: {testset_id}")
print(f"Revision ID: {testset.id}")
print(f"Name: {testset.name}")
print(f"Slug: {testset.slug}")

Parameters:

data: A list of dictionaries containing your test data. Each dictionary represents one testcase.
name: The name of your testset.
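If your test data lives in a CSV file, one way to get it into the list-of-dictionaries shape described above is the standard library's `csv.DictReader`. This is a minimal sketch using only the standard library; the inline CSV text and the `rows_from_csv` helper are illustrative and not part of the SDK.

```python
import csv
import io

# Inline CSV so the example is self-contained; in practice the text
# would usually be read from a file of your own.
CSV_TEXT = """country,capital
Germany,Berlin
France,Paris
Spain,Madrid
"""


def rows_from_csv(text: str) -> list[dict[str, str]]:
    """Parse CSV text into a list of dictionaries, one per testcase."""
    reader = csv.DictReader(io.StringIO(text))
    return [dict(row) for row in reader]


data = rows_from_csv(CSV_TEXT)
print(data)
```

The resulting `data` list has the same shape as the literal passed to `ag.testsets.acreate()` in the example above. Note that `csv.DictReader` yields every value as a string.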

Returns: A TestsetRevision object containing:

id: The UUID of the created testset revision
testset_id: The parent testset UUID (stable across revisions)
name: The testset name
slug: The revision slug
version: The revision version string (e.g. "1")
data: The test data (with testcases structure)

Sample Output:

{
    "id": "01963413-3d39-7650-80ce-3ad5d688da6c",
    "name": "Country Capitals",
    "slug": "3ad5d688da6c",
    "data": {
        "testcases": [
            {"data": {"country": "Germany", "capital": "Berlin"}},
            {"data": {"country": "France", "capital": "Paris"}},
            {"data": {"country": "Spain", "capital": "Madrid"}}
        ]
    }
}

Tip

The data parameter accepts a simple list of dictionaries. The SDK automatically converts this to the structured TestsetRevisionData format internally.
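To make the relationship between the two shapes concrete, the sketch below wraps each input row the way the Sample Output shows it. It is an illustration of the resulting structure only, not the SDK's actual conversion code, and `to_testcases_structure` is a hypothetical name.

```python
from typing import Any


def to_testcases_structure(rows: list[dict[str, Any]]) -> dict[str, Any]:
    """Wrap each row in {"data": row}, mirroring the Sample Output shape.

    Illustrative only: this is not the SDK's implementation.
    """
    return {"testcases": [{"data": dict(row)} for row in rows]}


rows = [
    {"country": "Germany", "capital": "Berlin"},
    {"country": "France", "capital": "Paris"},
]
structured = to_testcases_structure(rows)
print(structured["testcases"][0])
```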

Upserting a Testset

Use ag.testsets.aupsert() to create a new testset or update an existing one if it already exists:

Python SDK

import agenta as ag

# Create or update a testset
testset = await ag.testsets.aupsert(
    name="Country Capitals",
    data=[
        {"country": "Germany", "capital": "Berlin"},
        {"country": "France", "capital": "Paris"},
        {"country": "Spain", "capital": "Madrid"},
        {"country": "Italy", "capital": "Rome"},
    ],
)

testset_id = testset.testset_id or testset.id
print(f"Testset ID: {testset_id}")
print(f"Revision ID: {testset.id}")

Parameters:

name (required): The name of your testset. Used to look up an existing testset.
data (required): A list of dictionaries containing your test data.
testset_id (optional): If provided, updates the testset with this specific ID.

Returns: A TestsetRevision object with the created or updated testset.

When to use upsert vs create

Use aupsert() when you want to update an existing testset with the same name, or create it if it doesn't exist. This is useful in CI/CD pipelines where you want to keep testsets synchronized. Use acreate() when you explicitly want to create a new testset every time.
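The difference can be modelled with a small in-memory store. This is a toy sketch of the semantics described above, not the Agenta backend or SDK: `ToyStore`, its integer IDs, and its matching rule (exact name equality) are all assumptions made for illustration.

```python
import itertools


class ToyStore:
    """In-memory stand-in for a project's testsets (not the Agenta backend)."""

    def __init__(self) -> None:
        self.testsets: dict[int, dict] = {}
        self._ids = itertools.count(1)

    def create(self, name: str, data: list[dict]) -> int:
        # Always adds a new testset, even if the name is already in use.
        testset_id = next(self._ids)
        self.testsets[testset_id] = {"name": name, "data": data}
        return testset_id

    def upsert(self, name: str, data: list[dict]) -> int:
        # Reuses the testset with a matching name, otherwise creates one.
        for testset_id, testset in self.testsets.items():
            if testset["name"] == name:
                testset["data"] = data
                return testset_id
        return self.create(name, data)


store = ToyStore()
first = store.upsert("Country Capitals", [{"country": "Germany", "capital": "Berlin"}])
second = store.upsert("Country Capitals", [{"country": "Italy", "capital": "Rome"}])
third = store.create("Country Capitals", [{"country": "Spain", "capital": "Madrid"}])
print(first == second, first == third, len(store.testsets))
```

In this model, running the upsert twice leaves a single testset whose data has been replaced, while the extra create adds a second testset with the same name. That is why upsert suits repeated pipeline runs.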

Listing Testsets

To list all testsets in your project, use ag.testsets.alist():

Python SDK

import agenta as ag

# List all testsets
testsets = await ag.testsets.alist()

print(f"Found {len(testsets)} testsets:")
for testset in testsets:
    testset_id = testset.testset_id or testset.id
    print(f"  - {testset.name} (testset_id: {testset_id})")

Parameters: None required.

Returns: A list of TestsetRevision objects. For each item:

id: The latest revision UUID
testset_id: The parent testset UUID
name: The testset name
slug: The revision slug
Additional metadata fields

Sample Output:

[
    {
        "id": "01963413-3d39-7650-80ce-3ad5d688da6c",
        "name": "Country Capitals",
        "slug": "country-capitals"
    },
    {
        "id": "01963520-4e4a-8761-91df-4be6e799eb7d",
        "name": "Math Problems",
        "slug": "math-problems"
    }
]
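A common follow-up is to build a lookup table from testset name to ID. The sketch below assumes only the `name`, `testset_id`, and `id` attributes described above. It uses `SimpleNamespace` stand-ins with placeholder IDs instead of real `TestsetRevision` objects so that it runs without a backend, and `index_by_name` is a hypothetical helper.

```python
from types import SimpleNamespace

# Stand-ins for the objects returned by ag.testsets.alist(); only the
# attributes used below are modelled, and the IDs are placeholders.
testsets = [
    SimpleNamespace(id="rev-1", testset_id="ts-1", name="Country Capitals"),
    SimpleNamespace(id="rev-2", testset_id=None, name="Math Problems"),
]


def index_by_name(items) -> dict[str, str]:
    """Map each testset name to its parent testset ID, falling back to the revision ID."""
    return {item.name: (item.testset_id or item.id) for item in items}


ids = index_by_name(testsets)
print(ids)
```

If two testsets share a name, the later one overwrites the earlier one in the dictionary, so this index is only reliable when names are unique within the project.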

Retrieving a Testset by ID

To retrieve a specific testset by its ID, use ag.testsets.aretrieve():

Python SDK

import agenta as ag

# Retrieve a specific testset (using the testset_id from creation)
testset = await ag.testsets.aretrieve(testset_id=testset_id)

if testset:
    print(f"Retrieved testset: {testset.id}")
    # testset.data or its testcases may be missing, so fall back to an empty list
    testcases = testset.data.testcases if testset.data else None
    print(f"Testcases count: {len(testcases or [])}")
else:
    print("Testset not found")

Parameters:

testset_id: The UUID of the testset to retrieve

Returns: A TestsetRevision object (or None if not found) containing:

id: The testset revision UUID
testset_id: The parent testset UUID
slug: The revision slug
version: The revision version string (e.g. "1")
data: The TestsetRevisionData with all testcases

Sample Output:

{
    "id": "01963413-3d39-7650-80ce-3ad5d688da6c",
    "testset_id": "01963413-3d39-7650-80ce-3ad5d688da6c",
    "slug": "3ad5d688da6c",
    "version": "1",
    "data": {
        "testcases": [
            {"data": {"country": "Germany", "capital": "Berlin"}},
            {"data": {"country": "France", "capital": "Paris"}},
            {"data": {"country": "Spain", "capital": "Madrid"}}
        ]
    }
}

Note

Testsets are versioned. Each update via ag.testsets.aedit() or ag.testsets.aupsert() creates a new TestsetRevision, while the parent testset_id stays the same.

Retrieving a Testset by Name

There is no dedicated function for looking up a testset by name, but you can filter the results of ag.testsets.alist():

Python SDK

import agenta as ag

async def get_testset_by_name(name: str):
    """Helper function to find a testset by name."""
    testsets = await ag.testsets.alist()

    if not testsets:
        return None

    for testset in testsets:
        if testset.name == name:
            return testset

    return None

# Usage
testset = await get_testset_by_name("Country Capitals")

if testset:
    testset_id = testset.testset_id or testset.id
    print(f"Found testset: {testset.name} (testset_id: {testset_id}, revision_id: {testset.id})")
else:
    print("Testset not found")

Helper Pattern

This pattern shows how you can implement your own helper functions to filter and find testsets based on custom criteria. You can extend this to search by tags or other metadata fields.
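One way to generalize the helper is to accept an arbitrary predicate instead of hard-coding the name comparison. The sketch below works on any iterable of objects. `find_first` is a hypothetical helper, and the `SimpleNamespace` items stand in for the results of `ag.testsets.alist()`.

```python
from types import SimpleNamespace
from typing import Callable, Iterable, Optional


def find_first(items: Iterable, predicate: Callable[[object], bool]) -> Optional[object]:
    """Return the first item for which predicate(item) is true, or None."""
    return next((item for item in items if predicate(item)), None)


# Stand-ins for listed testsets; only name and slug are modelled.
testsets = [
    SimpleNamespace(name="Country Capitals", slug="country-capitals"),
    SimpleNamespace(name="Math Problems", slug="math-problems"),
]

# Case-insensitive name match.
match = find_first(testsets, lambda t: t.name.lower() == "math problems")

# Match on another field; nothing starts with "geo", so this is None.
missing = find_first(testsets, lambda t: t.slug.startswith("geo"))

print(match.name if match else None, missing)
```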

Working with Test Data

Once you have a testset, you can access the testcases within it:

Python SDK

import agenta as ag

# Retrieve a testset
testset = await ag.testsets.aretrieve(testset_id=testset_id)

# Access testcases
if testset and testset.data and testset.data.testcases:
    for testcase in testset.data.testcases:
        print(f"Testcase: {testcase.data}")
        # Use testcase.data in your evaluation

Each testcase contains a data field with the dictionary you provided during creation. You can use these testcases directly in your evaluations.
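As a rough illustration of using testcases in an evaluation, the sketch below scores a function against expected values with exact matching. It is plain Python rather than Agenta's evaluation API. `predict_capital` is a hypothetical application under test, and the `SimpleNamespace` objects stand in for the items of `testset.data.testcases`.

```python
from types import SimpleNamespace

# Stand-ins for testset.data.testcases; each has a .data dictionary.
testcases = [
    SimpleNamespace(data={"country": "Germany", "capital": "Berlin"}),
    SimpleNamespace(data={"country": "France", "capital": "Paris"}),
    SimpleNamespace(data={"country": "Spain", "capital": "Madrid"}),
]


def predict_capital(country: str) -> str:
    """Hypothetical application under test; wrong for Spain on purpose."""
    known = {"Germany": "Berlin", "France": "Paris", "Spain": "Barcelona"}
    return known.get(country, "")


def exact_match_accuracy(cases) -> float:
    """Fraction of testcases whose predicted capital equals the expected one."""
    if not cases:
        return 0.0
    hits = sum(predict_capital(c.data["country"]) == c.data["capital"] for c in cases)
    return hits / len(cases)


accuracy = exact_match_accuracy(testcases)
print(f"Accuracy: {accuracy:.2f}")
```

Here the `country` field is treated as the input and `capital` as the expected output. Which fields play which role is a convention of your own testset, not something the data structure enforces.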
