Windows Server Posted January 24

In the rapidly evolving field of large language models (LLMs) and small language models (SLMs), fine-tuning and evaluation often present unique challenges. Whether the objective is to optimize models for function-calling use cases or to validate multi-agent workflows, one requirement stays constant: high-quality, diverse, and contextually relevant data. But what happens when real-world data is unavailable, incomplete, or too sensitive to use? Enter synthetic data, a powerful tool for accelerating the journey from experimentation to deployment.

In this blog, we'll explore how synthetic data can address these challenges, why it's indispensable in certain scenarios, and how Azure AI's Evaluator Simulator Package generates synthetic interaction data that simulates user personas and scenarios.

The Growing Need for Synthetic Data in LLM Development

Fine-tuning or evaluating an LLM/SLM for a specific use case often requires large amounts of labeled data tailored to the task at hand. However, sourcing such data comes with hurdles:

- Data Scarcity: Real-world interaction data for niche use cases may not exist in sufficient quantity.
- Privacy Concerns: User interactions may contain sensitive information, making direct use of this data problematic.
- Scenario Testing: Real-world data rarely covers the edge cases or extreme scenarios that models must handle gracefully.

Synthetic data addresses these problems by creating controlled, customizable datasets that reflect real-world conditions without the privacy risks or availability constraints.

Synthetic Data for Function-Calling Use Cases

In function calling, the model selects a function and generates its arguments from a natural-language request, and the application then executes the resulting API call. For example, a user might ask a travel app to "find flights to Paris under $500." Fine-tuning models for such use cases requires training them on structured, intent-rich inputs paired with the corresponding API call structures.
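As an illustration of pairing intent-rich inputs with API call structures, here is a minimal sketch that expands a few phrasing templates into training records, each pairing a user query with the matching tool call. The `search_flights` tool, its schema, and the templates are hypothetical. The record layout is modeled on the OpenAI chat fine-tuning format with `tool_calls`, so check the Azure OpenAI function-calling fine-tuning documentation for the exact requirements of your model.

```python
import json
import random

# Hypothetical tool schema for the travel example (not from the original post).
SEARCH_FLIGHTS_TOOL = {
    "type": "function",
    "function": {
        "name": "search_flights",
        "description": "Search for flights to a destination under a price cap.",
        "parameters": {
            "type": "object",
            "properties": {
                "destination": {"type": "string"},
                "max_price": {"type": "number"},
            },
            "required": ["destination", "max_price"],
        },
    },
}

# Different surface forms of the same intent.
TEMPLATES = [
    "find flights to {city} under ${price}",
    "Any flights to {city} for less than {price} dollars?",
    "I need a cheap ticket to {city}, max ${price}.",
]


def make_examples(cities, prices, seed=0):
    """Build one query/tool-call record per (city, price) combination."""
    rng = random.Random(seed)
    examples = []
    pairs = [(c, p) for c in cities for p in prices]
    for i, (city, price) in enumerate(pairs):
        query = rng.choice(TEMPLATES).format(city=city, price=price)
        # Arguments come from the same values that filled the template,
        # so the query and the expected call always agree.
        args = {"destination": city, "max_price": price}
        examples.append({
            "messages": [
                {"role": "user", "content": query},
                {"role": "assistant", "tool_calls": [{
                    "id": f"call_{i}",
                    "type": "function",
                    "function": {"name": "search_flights", "arguments": json.dumps(args)},
                }]},
            ],
            "tools": [SEARCH_FLIGHTS_TOOL],
        })
    return examples


def to_jsonl(examples):
    """Serialize records as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(ex) for ex in examples)


if __name__ == "__main__":
    print(to_jsonl(make_examples(["Paris", "Tokyo"], [500, 800])))
```

Fixed templates keep the sketch deterministic. In practice a stronger LLM would generate the phrasings to get the diversity of languages and styles described below, while the arguments still come from the known values.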
Synthetic data can:

- Simulate diverse intents: Generate variations of user queries across languages, styles, and preferences.
- Provide structured outputs: Automatically align these queries with the required API call schema for training or evaluation.
- Include edge cases: Test how models respond to ambiguous or incomplete queries.

Evaluating a model after fine-tuning presents another challenge: we need trusted data to measure its performance. Synthetic data generated by a stronger model, then screened by humans to filter out noise, can provide a rich and diverse dataset for comparing the fine-tuned model against the base model.

Synthetic Data in Multi-Agent Workflow Evaluation

Multi-agent workflows involve multiple models (or agents) collaborating toward a shared goal. A restaurant recommendation system, for example, might have one agent parse user preferences, another query a knowledge graph, and a third craft human-like responses. Synthetic data can:

- Simulate complex user personas: From foodies to budget-conscious travelers, generate interactions that test the robustness of multi-agent collaboration.
- Recreate realistic workflows: Model intricate agent-to-agent interactions, complete with asynchronous communication and fallback mechanisms.
- Stress-test failure scenarios: Ensure agents recover gracefully from errors, misunderstandings, or timeouts.

Multi-agent workflows often rely on hybrid architectures that combine SLMs, LLMs, domain-specific models, and fine-tuned systems to balance cost, latency, and accuracy. Synthetic data generated by a stronger model can serve as a baseline for evaluating nuances such as agent orchestration and error recovery.

Azure AI Evaluator Simulator: A Game-Changer

Azure AI's Evaluator Simulator Package offers a robust framework for generating synthetic interaction data tailored to your application's needs.
By simulating diverse user personas and scenarios, it provides:

- Realistic Simulations: Emulate a wide range of user behaviors, preferences, and intents, making it well suited to creating datasets for function-calling and multi-agent workflows.
- Customizability: Tailor simulations to reflect domain-specific nuances, keeping the data relevant.
- Efficiency: Automate data generation at scale, saving time and resources compared with manual annotation.

How It Works

The Azure AI Evaluation SDK's Simulator class generates synthetic conversations and simulates task-based interactions. It lets you configure different personas, such as tech-savvy users, college graduates, enterprise professionals, customers, supply chain managers, procurement managers, and finance admins, each interacting with your application in its own way. You can also define the tasks each of these users is trying to accomplish, such as shopping for a family event, managing inventory, or preparing financial reports.

Here's how it operates:

1. Model Configuration: Initialize the simulator with your model's parameters (e.g., temperature, top_p, presence_penalty).
2. Input Preparation: Provide input data (e.g., text blobs) for context, such as text extracted from a Wikipedia page.
3. Prompt Optimization: Use the query_response_generating_prompty_override to customize how query-response pairs are generated.
4. User Prompt Specification: Define user behavior using the user_simulating_prompty_override to align simulations with specific personas.
5. Target Callback Specification: Implement a callback function that connects the simulator with your application.
6. Simulation Execution: Run the simulator to generate synthetic conversations based on your configurations.

By following these steps, developers can create robust test datasets that enable thorough evaluation and fine-tuning of their AI applications.
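The target callback (step 5) is where the simulator hands control to your code. Below is a minimal sketch of that contract, using the same signature as the callback in the example that follows. It reads the latest simulated-user message from `messages["messages"]`, gets a reply, appends it in the OpenAI chat message shape, and returns the updated conversation. `fake_application` is a hypothetical stand-in for a real app.

```python
import asyncio
from typing import Any, Dict, Optional


def fake_application(query: str) -> str:
    # Hypothetical stand-in: a real callback would call a model or service here.
    return f"You asked: {query}"


async def callback(
    messages: Dict[str, Any],
    stream: bool = False,
    session_state: Any = None,
    context: Optional[Dict[str, Any]] = None,
) -> dict:
    # The simulator passes the conversation so far under the "messages" key.
    latest_message = messages["messages"][-1]
    reply = fake_application(latest_message["content"])
    # Append the app's reply as an assistant message and return the whole conversation.
    messages["messages"].append({"role": "assistant", "content": reply})
    return {
        "messages": messages["messages"],
        "stream": stream,
        "session_state": session_state,
        "context": context,
    }


if __name__ == "__main__":
    conversation = {"messages": [{"role": "user", "content": "Is the blue kettle in stock?"}]}
    result = asyncio.run(callback(conversation))
    print(result["messages"][-1])
```

Everything application-specific lives behind `fake_application`. The e-commerce example below keeps this same shape but replaces the stub with an Azure OpenAI call that can invoke tools.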
Example: Synthetic Data for an E-Commerce Assistant Bot

Let's walk through an example of generating synthetic data for an e-commerce assistant bot. This bot can act as a shopping assistant, manage inventory, and create promo codes. Before getting started, install the azure-ai-evaluation package (pip install azure-ai-evaluation) to follow along.

Step 1: Define Functions and APIs

Start by defining the core functions the bot can invoke, such as search_products, fetch_product_details, and add_to_cart. These functions simulate real-world operations. Refer to functions.py and function_list.py for the complete list of functions and their definitions.

Step 2: Configure the Simulator

```python
from azure.ai.evaluation.simulator import Simulator

# Connection details for the Azure OpenAI deployment that drives the simulated user.
model_config = {
    "azure_endpoint": azure_endpoint,
    "azure_deployment": azure_deployment,
    "api_key": azure_api_key,
}

simulator = Simulator(model_config=model_config)
```

Next, connect the simulator to the application. To do this, create the client and implement a callback function that invokes the application and relays messages between the simulator and the app.

```python
import json
from typing import Any, Dict, Optional

from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import AzureOpenAI

from functions import *  # the bot's tool implementations (Step 1)
from function_list import function_list  # tool schemas passed to the model

# Entra ID authentication for the application's client.
# The api_version value is an assumption; use one your deployment supports.
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)
client = AzureOpenAI(
    azure_endpoint=azure_endpoint,
    azure_ad_token_provider=token_provider,
    api_version="2024-06-01",
)


def call_to_ai_application(query: str):
    # Logic to call your application; wrap the call in try/except to surface errors.
    system_message = "Assume the role of e-commerce assistant designed for multiple roles. You can help with creating promo codes, tracking their usage, checking stock levels, helping customers make shopping decisions and more. You have access to a bunch of tools that you can use to help you with your tasks. You can also ask the user for more information if needed."
    completion = client.chat.completions.create(
        model=azure_deployment,
        messages=[
            {"role": "system", "content": system_message},
            {"role": "user", "content": query},
        ],
        max_tokens=800,
        temperature=0.1,
        top_p=0.2,
        frequency_penalty=0,
        presence_penalty=0,
        stop=None,
        stream=False,
        tools=function_list,
        tool_choice="auto",
    )
    # Return the assistant message object; it may contain tool_calls.
    return completion.choices[0].message


async def callback(
    messages: Dict[str, Any],
    stream: bool = False,
    session_state: Any = None,  # noqa: ANN401
    context: Optional[Dict[str, Any]] = None,
) -> dict:
    messages_list = messages["messages"]
    # Get the latest message from the simulated user.
    latest_message = messages_list[-1]
    query = latest_message["content"]
    context = None
    # Call your endpoint or AI application here.
    response = call_to_ai_application(query)

    # Format the response to follow the OpenAI chat protocol.
    if response.tool_calls:
        prev_messages = messages["messages"]
        func_call_messages = []
        tool_calls = response.tool_calls

        # Add the tool calls to the messages.
        for tool_call in tool_calls:
            formatted_response = {"role": "assistant", "function_call": tool_call.function.to_dict()}
            func_call_messages.append(formatted_response)

        # Execute the APIs and add their responses to the messages.
        for tool_call in tool_calls:
            function_name = tool_call.function.name
            function_args = tool_call.function.arguments
            func = globals().get(function_name)
            if callable(func):
                result = json.dumps(func(**json.loads(function_args)))
                formatted_response = {"role": "function", "content": result, "name": function_name}
                func_call_messages.append(formatted_response)
            else:
                print("Function {} not found".format(function_name))

        # Second API call: get the final response from the model.
        final_response = client.chat.completions.create(
            model=azure_deployment,
            messages=prev_messages + func_call_messages,
        )
        final_response = {"content": final_response.choices[0].message.content, "role": "assistant"}
        func_call_messages.append(final_response)

        # Stringify the function-call messages so they survive in the conversation history.
        # create_content_from_func_calls is a helper from the full sample code (described below).
        func_call_messages = create_content_from_func_calls(func_call_messages)
        func_call_messages = {"role": "assistant", "content": func_call_messages}
        messages["messages"].append(func_call_messages)
        return {"messages": messages["messages"], "stream": stream, "session_state": session_state}
    else:
        formatted_response = {
            "content": response.content,
            "role": "assistant",
        }
        messages["messages"].append(formatted_response)
        return {"messages": messages["messages"], "stream": stream, "session_state": session_state, "context": context}
```

We use two helper functions here:

- create_content_from_func_calls: Creates a single string from a list of function-call dictionaries, merging all the internal messages that invoke function calls. This is needed because the simulator ignores internal context and retains only the latest response.
- split_content: Splits a string into a list of dictionaries based on specified separators. This is used in a post-processing step to split the string of function calls and function responses back into separate messages, each with its own role and content.

Step 3: Define the Tasks

Use the Azure AI Evaluation SDK to configure the simulator with user personas and tasks, such as:

- A marketing manager creating a promo code and tracking its usage.
- A customer making a purchase using the promo code.
- An inventory manager checking stock levels.

Step 4: Customize the User Persona

Internally, the SDK has a prompty file that defines how the LLM simulating the user should behave. The SDK also lets you override this file with your own prompty file.
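Steps 3 and 4 feed two plain-Python inputs into the simulator call in Step 5: the `tasks` list and `user_prompty_kwargs`, which supplies values for extra placeholders such as `{{ mood }}` in the override prompty. The post does not show how these are built, so here is a minimal sketch. The persona wording, the mood string, and the helpers `build_tasks` and `check_task_count` are hypothetical. The count check reflects an assumption that the simulator pairs one task with each generated query, and the Step 5 call uses `num_queries=3`.

```python
from typing import Dict, List, Tuple

# Hypothetical (persona, goal) pairs mirroring the three Step 3 scenarios.
PERSONA_GOALS: List[Tuple[str, str]] = [
    ("a marketing manager", "create a promo code for a summer sale and then track how often it has been used"),
    ("a customer", "buy a pair of running shoes using a promo code"),
    ("an inventory manager", "check stock levels and find any items that are running low"),
]


def build_tasks(persona_goals: List[Tuple[str, str]]) -> List[str]:
    # One free-text task per simulated user; these fill the prompty's {{ task }} slot.
    return [f"You are {persona}. You want to {goal}." for persona, goal in persona_goals]


def check_task_count(tasks: List[str], num_queries: int) -> None:
    # Assumption: one task per generated query, so mismatched counts are flagged early.
    if len(tasks) != num_queries:
        raise ValueError(f"expected {num_queries} tasks, got {len(tasks)}")


tasks = build_tasks(PERSONA_GOALS)
check_task_count(tasks, num_queries=3)

# Values for extra placeholders in the override prompty, such as {{ mood }}.
user_prompty_kwargs: Dict[str, str] = {"mood": "curious and slightly impatient"}
```

Writing tasks as persona plus goal keeps each simulated user's role explicit in the text the user-simulating LLM receives, which is what drives the persona-specific follow-up questions.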
Let's override this file to build a user persona who engages in an interactive conversation with the bot, asking follow-up questions and responding to the bot's replies based on their persona and requirements:

```
system:
You must behave as a user who wants to accomplish this task: {{ task }} and you continue to interact with a system that responds to your queries.
If there is a message in the conversation history from the assistant, make sure you read the content of the message and include it in your first response.
Your mood is {{ mood }}
Make sure your conversation is engaging and interactive.
Output must be in JSON format
Here's a sample output:
{
  "content": "Here is my follow-up question.",
  "role": "user"
}
```

Step 5: Generate and Store Outputs

Run the simulator to generate synthetic data. The max_conversation_turns parameter caps the number of conversation turns to simulate, and num_queries sets how many queries are generated from the input text.

```python
# Run inside an async context, such as a Jupyter notebook cell.
# tasks, user_override_prompty, and user_prompty_kwargs come from Steps 3 and 4.
outputs = await simulator(
    target=callback,
    text="Assume the role of e-commerce assistant designed for multiple roles. You can help with creating promo codes, tracking their usage, checking stock levels, helping customers make shopping decisions and more. You have access to a bunch of tools that you can use to help you with your tasks. You can also ask the user for more information if needed.",
    num_queries=3,
    max_conversation_turns=5,
    tasks=tasks,
    user_simulator_prompty=user_override_prompty,
    user_simulator_prompty_kwargs=user_prompty_kwargs,
)
```

Step 6: Review and Save the Outputs

Let's look at the output for one of the tasks. The simulator engages in an interactive conversation with the application to accomplish the desired task, and every interaction between the app and the simulator is captured in the final output.

Let's store the output in a file:

```python
import json

# final_outputs: the simulator outputs after the post-processing step (split_content) described above.
with open("output.json", "w") as f:
    json.dump(final_outputs, f)
```

Conclusion

Synthetic data is more than a substitute for real-world data: it is a strategic asset for fine-tuning and evaluating LLMs.
By enabling precise control over data generation, synthetic datasets let developers simulate user behaviors, test edge cases, and optimize models for specific workflows. With tools like Azure AI's Evaluator Simulator, generating this data has never been more accessible. Whether you're building models for function calling, orchestrating multi-agent systems, or tackling niche use cases, synthetic data helps you deliver reliable, high-performing solutions, regardless of complexity.

Start leveraging synthetic data today and unlock the full potential of your LLM projects! You can access the full code here.

References

- Azure-Samples/azureai-samples: scenarios/evaluate/Simulators/Simulate_Context-Relevant_Data/Simulate_From_Input_Text (main branch)
- How to generate synthetic and simulated data for evaluation - Azure AI Foundry | Microsoft Learn
- Generate Synthetic QnAs from Real-world Data on Azure | Microsoft Community Hub
- How to use function calling with Azure OpenAI Service - Azure OpenAI Service | Microsoft Learn
- Fine-tuning function calls with Azure OpenAI Service - Azure AI services | Microsoft Learn

View the full article