I would be extremely surprised if that's the case. There are "open-source" multimodal LLMs can extract text from images as a proof that the idea works.
Probably the model is hallucinating and adding "Hungarian language is not installed for Tesseract" to the response.
Probably the model is hallucinating and adding "Hungarian language is not installed for Tesseract" to the response.