AI
2023-08-01

Controlling Output Format at the Prompt - Comparing Natural Language/TypeScript/Zod

This article examines how to control output formatting via prompts when using ChatGPT's API for advanced natural language processing. In particular, it focuses on whether the model interprets each notation correctly, comparing natural language, TypeScript type expressions, and zod schema expressions.

Motivation

With the advent of ChatGPT's API, it has become easy to create applications that use advanced NLP. However, to build something usable at the product level, you have to control the non-deterministic behavior of the LLM, which turns out to be quite difficult.

One of these challenges is getting the LLM's output into a form that downstream programs can parse. More concretely, since JSON is the usual output format of choice, the question is: what prompt gives the highest probability that the LLM outputs JSON with the correct format and the specified structure?

The simple answer is to describe the desired format and structure to the LLM in natural language. This is sufficient as long as the structure of the output data is simple, but as the data grows more complex, natural-language specifications tend to become lengthy or ambiguous. An alternative is to express the structure in a widely used data-structure notation, such as TypeScript types or zod schema definitions.
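For illustration, here is the same simple structure expressed both ways (a sketch of our own; the actual prompt used in the experiments appears later in this article):

```
// In natural language (inside a prompt), the structure might be described as:
//   "Output a JSON object with a key `questions`: an array of objects,
//    each having string fields `question` and `answer`."

// The same structure as a TypeScript type expression:
type Question = {
  question: string;
  answer: string;
};

type QuestionSheet = {
  questions: Question[];
};
```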

In "choosing an appropriate notation for conveying the expected output structure in a prompt," the questions are whether the model receiving the prompt will interpret the notation correctly, and whether the notation makes it easy to direct the LLM's attention to the individual constraints within the definition.

Now, in our (SparkleAI) product we usually adopt TypeScript type definitions, simply as the method engineers are most used to, without any particular verification, and this has caused no notable problems. However, some constraints are difficult to express in a type (e.g., limits on string length), so an evaluation is needed of whether schema definitions such as zod are more appropriate in this respect.
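To make that difference concrete, here is an illustrative sketch of our own (not code from the experiments): a character-count limit cannot be expressed in a plain TypeScript type, while zod states it directly.

```
import { z } from "zod";

// A 30-character limit has no direct counterpart in a plain TypeScript type;
// at best it can be hinted at in a comment, which nothing enforces:
type Question = {
  question: string; // at most 30 characters (not enforced by the type)
  answer: string;   // at most 30 characters (not enforced by the type)
};

// zod states the same constraints declaratively as part of the schema:
const QuestionSchema = z.object({
  question: z.string().max(30),
  answer: z.string().max(30),
});
```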

In this post, I investigate and summarize how often ChatGPT succeeds in producing well-formed output when the output format is specified in natural language, TypeScript, and zod.

Subject of evaluation

| Notation | Original purpose | Notes on the method |
| --- | --- | --- |
| Natural language | Communication | Ambiguous; the model may not return the specified type |
| TypeScript type expression | Type checking of program source | Many programs are written in TS, so it is likely to be interpreted correctly |
| zod schema expression | Schema validation at program runtime | More specific than type expressions, but does that affect the result? |

Prompt for evaluation

When a prompt specifies the number of characters or the number of items to output, ChatGPT often fails to follow the constraint. A common prompt-engineering workaround is therefore to generate once and fix up character counts and the like in a later pass. Here, however, we also want to see how count and length constraints such as zod's affect the results, so we use a question-generation prompt that includes character-count constraints as the test subject.

We use the following prompt, swapping the output format definition between natural language, a TypeScript type expression, and a zod schema expression.


# Instructions
For the passage given below, create junior high school Japanese-language questions that test reading comprehension,
and output them as a JSON object satisfying the QuestionSheet defined in zod below.
JSON strings must not contain line breaks or double quotes.
```
const Question = z.object({
  question: z.string().max(30),
  answer: z.string().max(30),
});
const QuestionSheet = z.object({
  questions: z.array(Question).min(7),
});
```

# Passage
```
{document}
```

# Output

Evaluation Method

We gave the model the question-generation prompt with the output format specified in natural language (NL), TypeScript (TS), and zod (ZOD), ran 30 iterations on each of 10 different documents (300 iterations per notation), and tabulated the following metrics.

  • parse: percentage of outputs that can be parsed as JSON after simple preprocessing
  • schema: percentage of parsed outputs whose structure matches the specified one
  • count: percentage of structurally correct outputs that satisfy the item-count constraint
  • length: percentage of strings that satisfy the length constraint, among structurally correct outputs

Assuming that differences between prompts would show up more clearly in settings with more diverse generation, we fixed the temperature at 1.2, a realistic upper limit used for tasks that require diverse results.
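As a sketch of how the four metrics might be tabulated per output, the following uses zod on the evaluation side as well. The preprocessing regex and all function names here are our own illustration, not the code used in the experiments:

```
import { z } from "zod";

// Structure-only schema (count/length constraints are checked separately),
// matching the QuestionSheet shape used in the prompt.
const LooseQuestionSheet = z.object({
  questions: z.array(
    z.object({ question: z.string(), answer: z.string() })
  ),
});

// "Simple preprocessing": extract the outermost {...} block and parse it.
function tryParse(raw: string): unknown | null {
  const match = raw.match(/\{[\s\S]*\}/);
  if (!match) return null;
  try {
    return JSON.parse(match[0]);
  } catch {
    return null;
  }
}

// Classify one model output against the four metrics.
function evaluate(raw: string) {
  const parsed = tryParse(raw);
  if (parsed === null) return { parse: false };          // -> "parse"

  const result = LooseQuestionSheet.safeParse(parsed);
  if (!result.success) return { parse: true, schema: false }; // -> "schema"

  const strings = result.data.questions.flatMap((q) => [q.question, q.answer]);
  return {
    parse: true,
    schema: true,
    count: result.data.questions.length >= 7,                // -> "count"
    lengthOk: strings.filter((s) => s.length <= 30).length,  // -> "length"
    lengthTotal: strings.length,
  };
}
```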

Result

| | parse | schema | count | length |
| --- | --- | --- | --- | --- |
| NL | 0.84 (251/300) | 0.57 (143/251) | 0.80 (114/143) | 0.80 (1556/1950) |
| TS | 0.95 (284/300) | 1.00 (283/284) | 0.73 (206/283) | 0.76 (2861/3788) |
| ZOD | 0.94 (278/296) | 1.00 (278/278) | 0.82 (228/278) | 0.79 (2935/3736) |

The large gap between NL and both TS and ZOD on the parse and schema metrics indicates that an artificial language is still more controllable than natural language for specifying the structure of the output.

Beyond that, there is no significant difference between TypeScript and zod on parse and schema. On the other hand, TypeScript's count and length scores are worse than those of both zod and natural language, suggesting that constraints on item counts or string lengths embedded in a TypeScript type expression may attract less of the model's attention than zod's explicit schema specification.

From the above, schema expressions have the best numerical performance as the easiest way to produce the targeted structure. However, besides zod there are other schema libraries such as Yup, io-ts, and joi, whose notations are similar but differ in detail, and we cannot be sure the model can be controlled as intended with each of them. Also, since schema libraries are less universal than TypeScript, we should keep in mind that their notation may change in the future.

Conclusion

This article compared natural language, TypeScript, and zod as ways to control ChatGPT's output format, focusing on how reliably each notation yields the intended data structure and how well it directs the model's attention.

As a result, we found that specifying the output structure in a programming-language notation is more reliable than using natural language. Specifically, both TypeScript and zod beat natural language on the parse success rate and the schema match rate, with no significant difference between the two.

However, TypeScript was inferior to both zod and natural language at constraining item counts and string lengths. This may be because such constraints are difficult to express in TypeScript's type notation and therefore attract less attention than zod's explicit specifications.

From this point of view, schema expressions are the most reliable at producing the targeted structure. However, schema expressions are less universal than TypeScript and their notation is subject to change, so care is needed in this respect.

Based on the above, we consider TypeScript type expressions the most practical for specifying JSON output and controlling its structure. Since they are slightly weaker than schema expressions at constraining item counts and string lengths, we recommend reinforcing those constraints with plain-text instructions in the prompt.
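As a concrete illustration of this recommendation, a prompt can combine a TypeScript type for the structure with plain-text instructions for the constraints the type cannot express. The sketch below is our own illustration of that pattern, not the exact prompt from the experiments:

```
// A minimal sketch of the recommended pattern: a TypeScript type for the
// structure, plus plain-text instructions for count/length constraints.
// The wording and the buildPrompt helper are illustrative, not the
// experiment's actual prompt.
function buildPrompt(document: string): string {
  return `# Instructions
For the passage below, create reading comprehension questions and output
them as a JSON object matching the QuestionSheet type:

\`\`\`
type Question = {
  question: string;
  answer: string;
};
type QuestionSheet = {
  questions: Question[];
};
\`\`\`

Constraints (not expressible in the type):
- Output at least 7 questions.
- Keep each question and answer within 30 characters.
- Do not include line breaks or double quotes inside JSON strings.

# Passage
\`\`\`
${document}
\`\`\`

# Output
`;
}
```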