Using XML Schema in AI System Prompts

2024-04-01T14:12:50
Status: #blog
Tags: #altova #ai #mobiletogether #apps #development #softwaredevelopment #mobile #tasks #xml-schema #schema #system-prompt #xsd
Links: [[Altova]] | [[AI]] | [[MobileTogether]] | [[Mobile App Development]] | [[Software Development]] | [[XML Schema]] | [[XSD]] | [[System Prompt]] | [[Prompt Engineering]]

# Using XML Schema in AI System Prompts

This past weekend I was experimenting with crafting system prompts to generate output from an [[AI]] that is immediately usable as structured input for an application. In particular, I wanted to see if I could provide an [[XML Schema]] as part of the system prompt and then immediately utilize the [[XML]] output from the AI to generate a dynamic user interface.

## XML Schema Design

I decided to build a little task list generation and tracking app to test this approach, and designed a simple XSD to model a list of tasks, with a description, completion state, possible website link, and a list of necessary components as well as individual steps to complete the task:

![[TaskListSchema.svg]]

## UI Design

I then proceeded to [build a user interface](https://www.altova.com/mobiletogether/app-development#designer) for my example app using [[MobileTogether]]. On top of the design area I positioned an Edit Field for the user to enter their verbal prompt, followed by a button "Create Task List" that will send the query to the OpenAI API and retrieve the results. The rest of the UI is a set of nested repeating tables and included UI elements that are all connected to the XML data structure that is being returned from the API, and which follows the above schema.
So, for each `<Task>` in the `<TaskList>` I have a checkbox to mark it as done, a label area for the `@TaskDescription`, another label area with an associated action for the `<WebsiteLink>` child element, and then nested tables for the `<StepList>` and `<ComponentList>`:

![[MobileTogetherTaskListScreenshot.png]]

## System Prompt Engineering

For the purpose of this experiment, I decided to work with the OpenAI API for my app, so my next step was to experiment for a while in the [OpenAI Platform Playground](https://platform.openai.com/playground) until I got the right kind of results. For the actual task list itself I found various prompt suggestions floating around online, so I liberally borrowed from those sources. But the key to success was to **both** describe the intended structure of the output verbally **and** explicitly include the [[XML Schema]] for how I wanted the output [[XML]] to be constructed.

This is ultimately the [[System Prompt]] I came up with:

> [!note] System Prompt
> Your job is to create a task list as an XML document based on the user's prompt. The XML document will include the following elements: TaskList, Task, ComponentList, Component, StepList, Step. Each Task can have multiple Components and multiple Steps necessary to complete the task. Each Task element has a TaskDescription attribute, and may have a TaskType attribute. The TaskType attribute should be omitted, unless the user's prompt hints that this might be a repeating task, in which case the TaskType can be daily, weekly, monthly, or annually. Each Task may also contain an optional WebsiteLink child element to link to a website for more information, when helpful e.g. when referencing a learning resource or online tool. Be sure to break up the Tasks and Steps in a way that is simple and straightforward. Each Task and each Step within each Task needs to be an actionable item that the user can ultimately check-off as they are completing them.
> Do not return a single Task whose Description is the user's prompt. Always break it up into multiple tasks and return them as a TaskList. If the prompt is something generic like 'make Wienerschnitzel' or something that has a clear beginning and end, return all the steps necessary to get to the end, including all of the components that make up that task. If any of those tasks have inner steps, be sure to include those. Your response should be an XML document that follows this XML Schema, and should not contain any introductory paragraphs or explanation before the XML:

```xml schema
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning" elementFormDefault="qualified" attributeFormDefault="unqualified" vc:minVersion="1.1">
  <xs:element name="TaskList">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Task" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="WebsiteLink" type="xs:string" minOccurs="0"/>
              <xs:element name="ComponentList" minOccurs="0">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="Component" maxOccurs="unbounded">
                      <xs:complexType>
                        <xs:simpleContent>
                          <xs:extension base="xs:string">
                            <xs:attribute ref="done"/>
                          </xs:extension>
                        </xs:simpleContent>
                      </xs:complexType>
                    </xs:element>
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
              <xs:element name="StepList" minOccurs="0">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="Step" maxOccurs="unbounded">
                      <xs:complexType>
                        <xs:simpleContent>
                          <xs:extension base="xs:string">
                            <xs:attribute ref="done"/>
                          </xs:extension>
                        </xs:simpleContent>
                      </xs:complexType>
                    </xs:element>
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
            <xs:attribute name="TaskDescription" type="xs:string"/>
            <xs:attribute name="TaskType">
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:enumeration value="today"/>
                  <xs:enumeration value="daily"/>
                  <xs:enumeration value="weekly"/>
                  <xs:enumeration value="monthly"/>
                  <xs:enumeration value="annually"/>
                </xs:restriction>
              </xs:simpleType>
            </xs:attribute>
            <xs:attribute ref="done"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:attribute name="done" type="xs:boolean" default="0"/>
</xs:schema>
```

For the purpose of this blog post, the above XSD is shown in pretty-printed form, but inside the System Prompt I actually removed all the whitespace and line feeds so it would take up less network bandwidth and consume fewer tokens (you can see that in the JSON query further below). The minified version in that query is also slightly simpler: `Component` and `Step` are declared as plain `xs:string` elements, and the `done` attribute is left out.

## ActionTree low-code application logic

With the UI design complete and the System Prompt working nicely in the Playground, it was time to put it all together. I built the [ActionTree](https://www.altova.com/mobiletogether/app-development#action-trees) behind the "Create Task List" button to assemble the actual [[JSON]] query for the API from the System Prompt and the user-entered verbal query, call the REST API, retrieve the response, and finally update the XML tree with the result returned by the API:

![[ActionTreeTaskList.png]]

The complete JSON query that is created by the above low-code ActionTree, and subsequently sent to the API, looks like this. The System Prompt is clearly much longer than the actual user query, because it contains all the relevant instructions for how the response is to be structured:

```json
{
  "model": "gpt-4-0613",
  "messages": [
    {
      "role": "system",
      "content": "Your job is to create a task list as an XML document based on the user's prompt. The XML document will include the following elements: TaskList, Task, ComponentList, Component, StepList, Step. Each Task can have multiple Components and multiple Steps necessary to complete the task. Each Task element has a TaskDescription attribute, and may have a TaskType attribute. The TaskType attribute should be omitted, unless the user's prompt hints that this might be a repeating task, in which case the TaskType can be daily, weekly, monthly, or annually. Each Task may also contain an optional WebsiteLink child element to link to a website for more information, when helpful e.g. when referencing a learning resource or online tool. Be sure to break up the Tasks and Steps in a way that is simple and straightforward. Each Task and each Step within each Task needs to be an actionable item that the user can ultimately check-off as they are completing them. Do not return a single Task whose Description is the user's prompt. Always break it up into multiple tasks and return them as a TaskList. If the prompt is something generic like 'make Wienerschnitzel' or something that has a clear beginning and end, return all the steps necessary to get to the end, including all of the components that make up that task. If any of those tasks have inner steps, be sure to include those. Your response should be an XML document that follows this XML Schema, and should not contain any introductory paragraphs or explanation before the XML: <?xml version='1.0' encoding='UTF-8'?><xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' xmlns:vc='http://www.w3.org/2007/XMLSchema-versioning' elementFormDefault='qualified' attributeFormDefault='unqualified' vc:minVersion='1.1'><xs:element name='TaskList'><xs:complexType><xs:sequence><xs:element name='Task' maxOccurs='unbounded'><xs:complexType><xs:sequence><xs:element name='WebsiteLink' type='xs:string' minOccurs='0'/><xs:element name='ComponentList' minOccurs='0'><xs:complexType><xs:sequence><xs:element name='Component' type='xs:string' maxOccurs='unbounded'/></xs:sequence></xs:complexType></xs:element><xs:element name='StepList' minOccurs='0'><xs:complexType><xs:sequence><xs:element name='Step' type='xs:string' maxOccurs='unbounded'/></xs:sequence></xs:complexType></xs:element></xs:sequence><xs:attribute name='TaskDescription' type='xs:string'/><xs:attribute name='TaskType'><xs:simpleType><xs:restriction base='xs:string'><xs:enumeration value='today'/><xs:enumeration value='daily'/><xs:enumeration value='weekly'/><xs:enumeration value='monthly'/><xs:enumeration value='annually'/></xs:restriction></xs:simpleType></xs:attribute></xs:complexType></xs:element></xs:sequence></xs:complexType></xs:element></xs:schema>"
    },
    {
      "role": "user",
      "content": "Learn Japanese"
    }
  ],
  "temperature": 0,
  "max_tokens": 2048,
  "top_p": 1,
  "frequency_penalty": 0,
  "presence_penalty": 0
}
```

## Testing the solution

Next let's take a look at some example prompts that a user might enter, and look at both the resulting XML generated by the AI and how it looks in the user interface we've designed.

### Example 1: Learn Japanese

And here is an example of the kind of task list we can now easily produce.
I fired up the MobileTogether Simulator to run my app, entered "Learn Japanese" as the query, clicked on the Create Task List button, and this is what I got back:

![[LearnJapanese.png]]

In this simple app we can now click on any of the links to look at the suggested resources and then we can check off the items on the task list as we complete them.

And just for the purpose of easier understanding, here is the corresponding XML response that the AI provided, which MobileTogether automatically used to render the `<TaskList>` using the UI design we had previously created:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<TaskList>
  <Task TaskDescription="Set up learning resources" TaskType="today">
    <WebsiteLink>https://www.duolingo.com/course/ja/en/Learn-Japanese-Online</WebsiteLink>
    <ComponentList>
      <Component>Duolingo account</Component>
      <Component>Japanese language course</Component>
    </ComponentList>
    <StepList>
      <Step>Create a Duolingo account</Step>
      <Step>Enroll in the Japanese language course</Step>
    </StepList>
  </Task>
  <Task TaskDescription="Learn Hiragana" TaskType="today">
    <WebsiteLink>https://www.tofugu.com/japanese/learn-hiragana/</WebsiteLink>
    <ComponentList>
      <Component>Hiragana chart</Component>
    </ComponentList>
    <StepList>
      <Step>Study the Hiragana chart</Step>
      <Step>Learn how to draw Hiragana glyphs</Step>
    </StepList>
  </Task>
</TaskList>
```

### Example 2: Cook Wienerschnitzel

Looking at the above screenshot as well as the XML result, you may be wondering what the initial intent behind the separation between steps and components was, as the AI obviously gave rather redundant responses in the example of what is required to "Learn Japanese".
The answer becomes clear if we look at a task list for a cooking recipe instead. This is the XML returned for the verbal user query "Cook [Wienerschnitzel](https://en.wikipedia.org/wiki/Wiener_schnitzel)":

```xml
<?xml version="1.0" encoding="UTF-8"?>
<TaskList>
  <Task TaskDescription="Prepare Ingredients">
    <ComponentList>
      <Component>4 veal cutlets</Component>
      <Component>1 cup all-purpose flour</Component>
      <Component>2 eggs</Component>
      <Component>2 cups bread crumbs</Component>
      <Component>Salt</Component>
      <Component>Pepper</Component>
      <Component>1 cup vegetable oil</Component>
      <Component>Lemon wedges</Component>
    </ComponentList>
    <StepList>
      <Step>Buy all necessary ingredients from the grocery store</Step>
      <Step>Wash and dry all fresh produce</Step>
      <Step>Beat the eggs in a bowl</Step>
    </StepList>
  </Task>
  <Task TaskDescription="Prepare the Veal Cutlets">
    <StepList>
      <Step>Season the veal cutlets with salt and pepper</Step>
      <Step>Dredge the cutlets in flour, shaking off any excess</Step>
      <Step>Dip the floured cutlets into the beaten eggs</Step>
      <Step>Coat the cutlets in bread crumbs, pressing firmly to adhere</Step>
    </StepList>
  </Task>
  <Task TaskDescription="Cook the Wienerschnitzel">
    <StepList>
      <Step>Heat the vegetable oil in a large skillet over medium-high heat</Step>
      <Step>Add the breaded cutlets to the skillet</Step>
      <Step>Cook until golden brown, about 3 minutes per side</Step>
      <Step>Drain the cooked cutlets on paper towels</Step>
    </StepList>
  </Task>
  <Task TaskDescription="Serve the Wienerschnitzel">
    <StepList>
      <Step>Serve the wienerschnitzel hot with lemon wedges on the side</Step>
    </StepList>
  </Task>
</TaskList>
```

This also looks a lot more useful in the app right away, as we can now see the purpose of separating steps from components and why we are using different background colors to highlight those:

![[WienerschnitzelTaskList.png]]

For this screenshot I also expanded the page source tree on the right side of the simulator
to demonstrate that you can look at the underlying data sources while you are testing the generated app user interface in the simulation. This can be extremely helpful for [testing and debugging applications during development](https://www.altova.com/mobiletogether/app-development#test-app).

## Conclusion

This article illustrates how you can incorporate an XML Schema in your System Prompt to get a valid XML response from the AI that conforms to a pre-defined structure and thus can directly be used in your application for further processing. A full-blown real-world solution would, obviously, include some error-checking and actual validation steps before we would ever trust the response from the AI directly, but those can be added easily as needed.

You can [download MobileTogether Designer](https://www.altova.com/mobiletogether/download), which was used to build this example app, and use it for free.

---

# References

* https://www.altova.com/mobiletogether
* https://platform.openai.com/
* https://www.duolingo.com/course/ja/en/Learn-Japanese-Online
* https://www.tofugu.com/japanese/learn-hiragana/
* https://en.wikipedia.org/wiki/Wiener_schnitzel
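
The error-checking mentioned in the conclusion could start with something like the following Python sketch. It is not part of the MobileTogether app, and `check_task_list` is a hypothetical helper name. It uses only the standard-library `xml.etree.ElementTree` parser to confirm that a response is well-formed and follows the basic shape of the schema (root element, at least one `Task`, allowed `TaskType` values, non-empty lists). It does not check element order or the `done` attribute, so treat it as a quick sanity check under those assumptions rather than real XSD validation, which needs a schema-aware validator.

```python
import xml.etree.ElementTree as ET

# Allowed TaskType values, copied from the enumeration in the XSD.
ALLOWED_TASK_TYPES = {"today", "daily", "weekly", "monthly", "annually"}


def check_task_list(xml_text: str) -> list[str]:
    """Return a list of problems found; an empty list means all checks passed.

    Hypothetical helper: a hand-written structural check, not XSD validation.
    """
    try:
        # strip() because an XML declaration must be the very first thing
        # in the document, and the AI may add leading whitespace.
        root = ET.fromstring(xml_text.strip())
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    if root.tag != "TaskList":
        return [f"unexpected root element <{root.tag}>"]

    problems = []
    tasks = root.findall("Task")
    if not tasks:
        problems.append("TaskList contains no Task elements")

    for index, task in enumerate(tasks, start=1):
        task_type = task.get("TaskType")
        if task_type is not None and task_type not in ALLOWED_TASK_TYPES:
            problems.append(f"Task {index}: invalid TaskType '{task_type}'")
        # ComponentList and StepList are optional, but if present they
        # must contain at least one item.
        for list_name, item_name in (("ComponentList", "Component"),
                                     ("StepList", "Step")):
            for item_list in task.findall(list_name):
                if not item_list.findall(item_name):
                    problems.append(f"Task {index}: empty {list_name}")
    return problems
```

Calling `check_task_list` on the Wienerschnitzel response shown earlier should return an empty list, while a reply that starts with an introductory sentence instead of XML is reported as not well-formed.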