It's a great language.
Its dependent-types, theorem-proving-oriented type system, combined with AI assistants, makes it the language of the future IMO.
Yes, it is true that the model has undergone SFT, RLHF, and other alignment procedures, so the logprobs no longer reflect the next-token probabilities of the pre-training corpus.
Nevertheless, in concrete applications such as our main internal use case, structured data extraction from PDF documents, it has proved very valuable.
When a value was clearly well extracted, the logprob was high; when the information was very hard to find or absent, the model would still output, or hallucinate, some value, but with a much lower logprob.
We need to build a syntax tree and be able to map each value (number, boolean, string) to a range of characters, and then to the GPT tokens (for which OpenAI produces logprobs).
This is the reason we use Lark.
Same token usage.
Actually, OpenAI returns the logprob of each token, conditional on the previous ones, when the request sets the option `logprobs=true`.
This lib simply parses the output JSON string with `lark` into an AST with value nodes. The value nodes are mapped back to character ranges in the JSON string. The characters are then mapped back to the GPT tokens overlapping those ranges, and the tokens' logprobs are summed.
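Concretely, the last step, summing the logprobs of the tokens that overlap a value's character range, can be sketched like this (the token boundaries and logprob values are made up for illustration and are not what the API would actually return):

```python
def value_logprob(token_logprobs, start, end):
    """Sum logprobs of tokens overlapping chars [start, end) of the output."""
    total, offset = 0.0, 0
    for token, logprob in token_logprobs:
        tok_start, tok_end = offset, offset + len(token)
        if tok_start < end and tok_end > start:  # ranges overlap
            total += logprob
        offset = tok_end
    return total

# Hypothetical tokenization of the JSON output '{"price": 12.5}'.
tokens = [('{"', -0.1), ('price', -0.2), ('":', -0.1),
          (' 12', -0.9), ('.5', -0.7), ('}', -0.1)]
text = "".join(t for t, _ in tokens)
start = text.index("12")  # value "12.5" spans chars [10, 14)
# Sums the logprobs of ' 12' and '.5': -0.9 + -0.7 = -1.6
print(value_logprob(tokens, start, start + 4))
```

Summing works because the logprobs are conditional on the preceding tokens, so the sum is the log-probability of the whole value span given its prefix.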