I came across the CJK Dictionary Institute, which offers Chinese, Japanese, and Korean as well as Arabic dictionary data. What I found interesting is their database of Arabic names, which apparently has over 7 million entries. For example, check out the variations of “Abd Al Raheem” in their data sample (it has over 1000 variations).
It reminds me of the work I did early in my career as a co-op student with IBM Canada in the early days of Java. As part of the National Language Technical Center, I worked on putting together the Unicode and character sets to be used in Java 1.1. That brings back memories: I wrote some REXX programs on OS/2!