Managing Testsets
This guide covers how to create, list, and retrieve testsets using the Agenta SDK for evaluation purposes.
Agenta's SDK uses async APIs. In Jupyter/Colab you can use top-level await. In a regular Python script, wrap async code like this:
import asyncio
async def main():
    ...  # put your await calls here
asyncio.run(main())
Creating a Testset
Use ag.testsets.acreate() to create a new testset with data:
import agenta as ag
# Create a testset with simple data
testset = await ag.testsets.acreate(
data=[
{"country": "Germany", "capital": "Berlin"},
{"country": "France", "capital": "Paris"},
{"country": "Spain", "capital": "Madrid"}
],
name="Country Capitals",
)
testset_id = testset.testset_id or testset.id
print(f"Testset ID: {testset_id}")
print(f"Revision ID: {testset.id}")
print(f"Name: {testset.name}")
print(f"Slug: {testset.slug}")
Parameters:
- data: A list of dictionaries containing your test data. Each dictionary represents one testcase.
- name: The name of your testset.
Returns: A TestsetRevision object containing:
- id: The UUID of the created testset revision
- testset_id: The parent testset UUID (stable across revisions)
- name: The testset name
- slug: The revision slug
- version: The revision version string (e.g. "1")
- data: The test data (with a testcases structure)
Sample Output:
{
  "id": "01963413-3d39-7650-80ce-3ad5d688da6c",
  "name": "Country Capitals",
  "slug": "3ad5d688da6c",
  "data": {
    "testcases": [
      {"data": {"country": "Germany", "capital": "Berlin"}},
      {"data": {"country": "France", "capital": "Paris"}},
      {"data": {"country": "Spain", "capital": "Madrid"}}
    ]
  }
}
The data parameter accepts a simple list of dictionaries. The SDK automatically converts this to the structured TestsetRevisionData format internally.
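In the sample output above, each row is wrapped in a `{"data": ...}` object under a `testcases` key. The sketch below reproduces only that shape. `to_testcases_payload` is a hypothetical name, and this is not the SDK's internal code.

```python
from typing import Any, Dict, List


def to_testcases_payload(rows: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Hypothetical helper: wrap plain rows in the testcases structure
    shown in the sample output (not the SDK's actual implementation)."""
    # Copy each row so later edits to the input don't leak into the payload
    return {"testcases": [{"data": dict(row)} for row in rows]}


payload = to_testcases_payload([
    {"country": "Germany", "capital": "Berlin"},
    {"country": "France", "capital": "Paris"},
])
print(payload["testcases"][0])
# {'data': {'country': 'Germany', 'capital': 'Berlin'}}
```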
Upserting a Testset
Use ag.testsets.aupsert() to create a new testset or update an existing one if it already exists:
import agenta as ag
# Create or update a testset
testset = await ag.testsets.aupsert(
name="Country Capitals",
data=[
{"country": "Germany", "capital": "Berlin"},
{"country": "France", "capital": "Paris"},
{"country": "Spain", "capital": "Madrid"},
{"country": "Italy", "capital": "Rome"},
],
)
testset_id = testset.testset_id or testset.id
print(f"Testset ID: {testset_id}")
print(f"Revision ID: {testset.id}")
Parameters:
- name (required): The name of your testset. Used to look up an existing testset.
- data (required): A list of dictionaries containing your test data.
- testset_id (optional): If provided, updates the testset with this specific ID.
Returns: A TestsetRevision object with the created or updated testset.
Use aupsert() when you want to update an existing testset with the same name, or create it if it doesn't exist. This is useful in CI/CD pipelines where you want to keep testsets synchronized. Use acreate() when you explicitly want to create a new testset every time.
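Here is a minimal sketch of the CI/CD use case: it keeps a testset in sync with a CSV file checked into your repository. The CSV layout and the helper names `parse_rows` and `sync_testset` are assumptions made for illustration. Only `ag.testsets.aupsert(name=..., data=...)` comes from this guide, and the sketch assumes your environment is already set up to call the SDK as in the other examples.

```python
import csv
from pathlib import Path
from typing import Dict, Iterable, List


def parse_rows(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse CSV lines (first line = column names) into a list of dicts.

    Note: csv yields every value as a string.
    """
    return [dict(row) for row in csv.DictReader(lines)]


async def sync_testset(name: str, csv_path: Path):
    """Create or update a testset from a CSV file via ag.testsets.aupsert()."""
    # Imported here so parse_rows() can be used without agenta installed
    import agenta as ag

    with csv_path.open(newline="", encoding="utf-8") as f:
        rows = parse_rows(f)
    return await ag.testsets.aupsert(name=name, data=rows)
```

In a CI job you might run `asyncio.run(sync_testset("Country Capitals", Path("testsets/country_capitals.csv")))`, where the file path is hypothetical. Because aupsert() looks up the testset by name, rerunning the job updates the existing testset instead of creating a new one.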
Listing Testsets
To list all testsets in your project, use ag.testsets.alist():
import agenta as ag
# List all testsets
testsets = await ag.testsets.alist()
print(f"Found {len(testsets)} testsets:")
for testset in testsets:
    testset_id = testset.testset_id or testset.id
    print(f" - {testset.name} (testset_id: {testset_id})")
Parameters: None required.
Returns: A list of TestsetRevision objects. For each item:
- id: The latest revision UUID
- testset_id: The parent testset UUID
- name: The testset name
- slug: The revision slug
- Additional metadata fields
Sample Output:
[
  {
    "id": "01963413-3d39-7650-80ce-3ad5d688da6c",
    "name": "Country Capitals",
    "slug": "country-capitals"
  },
  {
    "id": "01963520-4e4a-8761-91df-4be6e799eb7d",
    "name": "Math Problems",
    "slug": "math-problems"
  }
]
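The snippets on this page use `testset.testset_id or testset.id`, which falls back to the revision ID when `testset_id` is empty. If you look testsets up often, you could build a name-to-ID index once. This is a minimal sketch. `parent_testset_id` and `index_by_name` are hypothetical helpers, and `SimpleNamespace` objects stand in for the `TestsetRevision` objects returned by `ag.testsets.alist()`.

```python
from types import SimpleNamespace
from typing import Dict, Iterable, Optional


def parent_testset_id(revision) -> Optional[str]:
    """Return testset_id, falling back to id (same pattern as the examples)."""
    return getattr(revision, "testset_id", None) or getattr(revision, "id", None)


def index_by_name(revisions: Iterable) -> Dict[str, Optional[str]]:
    """Map testset name -> parent testset ID. Later duplicates overwrite earlier ones."""
    return {rev.name: parent_testset_id(rev) for rev in revisions}


# Stand-ins for the objects returned by ag.testsets.alist()
fake = [
    SimpleNamespace(id="rev-1", testset_id="ts-1", name="Country Capitals"),
    SimpleNamespace(id="rev-2", testset_id=None, name="Math Problems"),
]
print(index_by_name(fake))
# {'Country Capitals': 'ts-1', 'Math Problems': 'rev-2'}
```

With the SDK you would call `index_by_name(await ag.testsets.alist())`. If two testsets share a name, the later one in the list wins, so prefer IDs when names are not unique.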
Retrieving a Testset by ID
To retrieve a specific testset by its ID, use ag.testsets.aretrieve():
import agenta as ag
# Retrieve a specific testset (using the testset_id from creation)
testset = await ag.testsets.aretrieve(testset_id=testset_id)
if testset:
    print(f"Retrieved revision: {testset.id}")
    testcases = testset.data.testcases if testset.data and testset.data.testcases else []
    print(f"Testcases count: {len(testcases)}")
else:
    print("Testset not found")
Parameters:
- testset_id: The UUID of the testset to retrieve
Returns: A TestsetRevision object (or None if not found) containing:
- id: The testset revision UUID
- testset_id: The parent testset UUID
- slug: The revision slug
- version: The revision version number
- data: The TestsetRevisionData with all testcases
Sample Output:
{
  "id": "01963413-3d39-7650-80ce-3ad5d688da6c",
  "testset_id": "01963413-3d39-7650-80ce-3ad5d688da6c",
  "slug": "3ad5d688da6c",
  "version": "1",
  "data": {
    "testcases": [
      {"data": {"country": "Germany", "capital": "Berlin"}},
      {"data": {"country": "France", "capital": "Paris"}},
      {"data": {"country": "Spain", "capital": "Madrid"}}
    ]
  }
}
Testsets are versioned. Each update via ag.testsets.aedit() or ag.testsets.aupsert() creates a new TestsetRevision, while the parent testset_id stays the same.
Retrieving a Testset by Name
There is no dedicated function for looking up a testset by name, but you can filter the results of ag.testsets.alist():
import agenta as ag
async def get_testset_by_name(name: str):
    """Helper function to find a testset by name."""
    testsets = await ag.testsets.alist()
    if not testsets:
        return None
    for testset in testsets:
        if testset.name == name:
            return testset
    return None
# Usage
testset = await get_testset_by_name("Country Capitals")
if testset:
    testset_id = testset.testset_id or testset.id
    print(f"Found testset: {testset.name} (testset_id: {testset_id}, revision_id: {testset.id})")
else:
    print("Testset not found")
This pattern shows how you can implement your own helper functions to filter and find testsets based on custom criteria. You can extend this to search by tags or other metadata fields.
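One way to extend this is the hypothetical `find_testsets` helper sketched below. It takes an arbitrary predicate instead of a fixed name, so the same loop can match on a name substring or on a metadata field. The `tags` attribute is an assumption made for illustration, so check which metadata fields your `TestsetRevision` objects actually carry. `SimpleNamespace` objects stand in for `ag.testsets.alist()` results.

```python
from types import SimpleNamespace
from typing import Callable, Iterable, List


def find_testsets(testsets: Iterable, predicate: Callable[[object], bool]) -> List:
    """Return every testset for which predicate(testset) is True."""
    return [t for t in testsets if predicate(t)]


# Stand-ins for ag.testsets.alist() results; the `tags` attribute is hypothetical
fake = [
    SimpleNamespace(name="Country Capitals", tags={"domain": "geography"}),
    SimpleNamespace(name="Math Problems", tags=None),
]

# Case-insensitive substring match on the name
by_name = find_testsets(fake, lambda t: "capital" in t.name.lower())
# Match on a metadata field, tolerating objects without tags
by_tag = find_testsets(fake, lambda t: (getattr(t, "tags", None) or {}).get("domain") == "geography")
print([t.name for t in by_name], [t.name for t in by_tag])
# ['Country Capitals'] ['Country Capitals']
```

Unlike `get_testset_by_name`, this returns every match, which makes duplicate names visible.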
Working with Test Data
Once you have a testset, you can access the testcases within it:
import agenta as ag
# Retrieve a testset
testset = await ag.testsets.aretrieve(testset_id=testset_id)
# Access testcases
if testset and testset.data and testset.data.testcases:
    for testcase in testset.data.testcases:
        print(f"Testcase: {testcase.data}")
        # Use testcase.data in your evaluation
Each testcase contains a data field with the dictionary you provided during creation. You can use these testcases directly in your evaluations.
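As an illustration of using testcase data directly, the sketch below scores a stand-in application by exact match against the Country Capitals rows. It is a plain-Python loop, not Agenta's evaluation API. `exact_match_accuracy` and `fake_app` are hypothetical, and the `country`/`capital` keys come from the example testset.

```python
from typing import Any, Callable, Dict, List


def exact_match_accuracy(
    rows: List[Dict[str, Any]],
    app: Callable[[str], str],
    input_key: str = "country",
    expected_key: str = "capital",
) -> float:
    """Fraction of rows where app(row[input_key]) equals row[expected_key]."""
    if not rows:
        return 0.0
    hits = sum(app(row[input_key]) == row[expected_key] for row in rows)
    return hits / len(rows)


# Hypothetical stand-in for the application under test
def fake_app(country: str) -> str:
    return {"Germany": "Berlin", "France": "Paris"}.get(country, "unknown")


rows = [  # shape of testcase.data in the Country Capitals testset
    {"country": "Germany", "capital": "Berlin"},
    {"country": "France", "capital": "Paris"},
    {"country": "Spain", "capital": "Madrid"},
]
print(round(exact_match_accuracy(rows, fake_app), 3))  # 0.667
```

With a retrieved testset, `rows` would be `[tc.data for tc in testset.data.testcases]`, and `fake_app` would be replaced by a call into your own application. Returning 0.0 for an empty testset is a design choice; you may prefer to raise an error instead.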
