Did you try it? For example, "Internationalization" becomes "Intèrnátìonàlïzâtiòn". If your code did anything bad to the Unicode (e.g. assumed code points are one byte, or stripped the high bit), it would be very obvious.
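If you want to generate strings like that yourself, here is a rough sketch in Python. The accent table and the names `ACCENTED`, `pseudo_localize` and `strip_accents` are my own invention for illustration, not what any particular tool does, and it will not reproduce the exact string above. It cycles each lowercase vowel through a few accented variants, and the helper strips the accents again so you can check that the text survived a round trip through your code.

```python
import itertools
import unicodedata

# Hypothetical accent table: each plain vowel maps to a few accented variants.
ACCENTED = {
    "a": "áàâ",
    "e": "èéê",
    "i": "ìïî",
    "o": "òóô",
    "u": "ùúû",
}

def pseudo_localize(text):
    """Replace lowercase vowels with accented variants, cycling per vowel."""
    # Normalize to NFC so every variant is a single precomposed code point.
    cyclers = {
        base: itertools.cycle(unicodedata.normalize("NFC", variants))
        for base, variants in ACCENTED.items()
    }
    return "".join(next(cyclers[ch]) if ch in cyclers else ch for ch in text)

def strip_accents(text):
    """Undo the accents by decomposing and dropping combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

if __name__ == "__main__":
    original = "Internationalization"
    mangled = pseudo_localize(original)
    print(mangled)
    # Same number of code points, but more bytes once encoded as UTF-8.
    print(len(original), len(mangled), len(mangled.encode("utf-8")))
```

Feed the mangled string through whatever you are testing, then compare `strip_accents(result)` with the original. A mismatch or a changed length points at the place where the text got damaged.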
For my testing I also go to the Wikipedia home page and copy the text in the middle of the page that lists how many articles there are in various languages. This is great because it uses a wide variety of code points, including ones greater than 0xffff.
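To confirm that a pasted sample really does contain code points above 0xffff, something like this Python sketch works. The function name and the sample string are made up for illustration; the last character of the sample is U+10330 from the Gothic block.

```python
def astral_code_points(text):
    """Return the code points above 0xFFFF found in text, as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in text if ord(ch) > 0xFFFF]

# Made-up sample: Latin, an accented letter, two CJK characters, one Gothic letter.
sample = "English \u00e9 \u4e2d\u6587 \U00010330"

if __name__ == "__main__":
    print(astral_code_points(sample))  # ['U+10330']
    # In UTF-16 each of these takes two code units (a surrogate pair),
    # so the code unit count is larger than the code point count.
    print(len(sample), len(sample.encode("utf-16-le")) // 2)
```

If the list comes back empty, the sample will not exercise surrogate pair handling, so it is worth pasting in more text until it does.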
I also see that "Intèrnátìonàlïzâtiòn" breaks in the middle, instead of being treated as a single word. Is that correct? It may be important for checking your layout. The sample also has no Asian scripts, nor taller / shorter characters that might overlap if your line spacing is too small.