Configuring Per-Locale Font Selection with FontConfig

As I gain experience programming, I find myself becoming more and more mindful about implicit assumptions I might be making while solving problems. Spending hours debugging timezone issues or unexpectedly time-sensitive date calculations has revealed that everything I thought I knew about time (and many other subjects) is wrong. Such experiences have led me to take more cautious approaches to new problems.

One area where this has paid off handsomely is the field of text rendering. During a recent project, one of my primary goals was to enable good localization across the globe. I immediately started reading about various writing systems and how support for them is managed in software. It turns out that the general answer is, “It’s complicated.” As soon as I started looking at Indic text shaping, my head began to spin—but my investigations eventually turned up a simpler solution.

Ideograph Idiosyncrasies

One particular surprise that turned up in my reading was Han Unification. In an effort to save Unicode codepoints, the ideographs of Traditional Chinese, Simplified Chinese, Japanese, and Korean have been “deduplicated.” In essence, drafters of the standard made an effort to map the common ideographs of each language down to a single set of codepoints. This sounds like a good solution, but the languages differ noticeably in how many of the shared ideographs are drawn, so a glyph that looks right to a Japanese reader can look wrong to a Chinese one.
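As a rough illustration (a sketch, not code from the project described here), the following Python snippet prints the codepoint and Unicode name of two ideographs that are often cited as being drawn differently from region to region. The sample characters are chosen purely for illustration. The point is that the codepoint carries no language information, so the choice of regional form has to come from somewhere else, such as the locale.

```python
import unicodedata

# Two ideographs commonly cited as having regional glyph differences.
# (Sample characters chosen for illustration only.)
SAMPLES = ["\u76f4", "\u9aa8"]


def describe(char: str) -> str:
    """Return the codepoint and Unicode name of a single character."""
    return f"U+{ord(char):04X} {unicodedata.name(char)}"


if __name__ == "__main__":
    for char in SAMPLES:
        # The same codepoint is used whether the surrounding text is
        # Chinese, Japanese, or Korean; the language must be supplied
        # separately for a renderer to pick the right glyph.
        print(describe(char))
```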

Thankfully, many people much smarter than I have devoted a great deal of time to this subject, and there are fonts which simplify the problem significantly. For my project, I selected a collaboration between Adobe and Google called either Source Han Sans or Noto Sans CJK. This font is distributed under a very liberal license, and it includes glyphs for Japanese, Korean, Simplified Chinese, and Traditional Chinese. All you need to do to ensure proper ideograph use is to choose the correct language variant.
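Choosing the variant boils down to a lookup from language to font family. Here is a minimal Python sketch of that lookup, assuming POSIX-style locale names such as `ja_JP.UTF-8`. The function name and the normalisation rules are hypothetical, and the Simplified Chinese default is an assumption that mirrors the catch-all rule in the configuration later in this post.

```python
# Family names follow the Noto Sans CJK naming used in this post.
FAMILY_BY_LANG = {
    "zh-cn": "Noto Sans CJK SC",
    "zh-tw": "Noto Sans CJK TC",
    "ja": "Noto Sans CJK JP",
    "ko": "Noto Sans CJK KR",
}

# Assumed default when no rule matches.
DEFAULT_FAMILY = "Noto Sans CJK SC"


def family_for_locale(locale_name: str) -> str:
    """Map a POSIX-style locale such as 'ja_JP.UTF-8' to a font family."""
    # Drop the encoding and modifier, then normalise to 'll-cc' form.
    tag = locale_name.split(".")[0].split("@")[0].replace("_", "-").lower()
    if tag in FAMILY_BY_LANG:
        return FAMILY_BY_LANG[tag]
    # Fall back to the bare language code ('ja-jp' -> 'ja').
    return FAMILY_BY_LANG.get(tag.split("-")[0], DEFAULT_FAMILY)


if __name__ == "__main__":
    for name in ["zh_CN.UTF-8", "zh_TW.UTF-8", "ja_JP.UTF-8", "ko_KR.UTF-8"]:
        print(name, "->", family_for_locale(name))
```

Doing this in application code works, but it has to be repeated in every program that draws text, which is why pushing the decision down into the font selection engine is attractive.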

Simplified Font Selection

My project is shipping its own Linux-based operating system, so I figured it would be handy to get the underlying font selection engine to do this for me. After a few hours of FontConfig research, I figured out this method:

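First, install the all-in-one Noto Sans CJK super OTC font. Next, you’ll have to tell your Linux system’s font selection engine (FontConfig) what rules it should apply.

<?xml version="1.0"?>
<!DOCTYPE fontconfig SYSTEM "fonts.dtd">
<fontconfig>
<match>
  <test name="lang">
    <string>zh-CN</string>
  </test>
  <test name="family">
    <string>sans-serif</string>
  </test>
  <edit name="family" mode="prepend">
    <string>Noto Sans CJK SC</string>
  </edit>
</match>
<match>
  <test name="lang">
    <string>zh-TW</string>
  </test>
  <test name="family">
    <string>sans-serif</string>
  </test>
  <edit name="family" mode="prepend">
    <string>Noto Sans CJK TC</string>
  </edit>
</match>
<match>
  <test name="lang">
    <string>ja</string>
  </test>
  <test name="family">
    <string>sans-serif</string>
  </test>
  <edit name="family" mode="prepend">
    <string>Noto Sans CJK JP</string>
  </edit>
</match>
<match>
  <test name="lang">
    <string>ko</string>
  </test>
  <test name="family">
    <string>sans-serif</string>
  </test>
  <edit name="family" mode="prepend">
    <string>Noto Sans CJK KR</string>
  </edit>
</match>
<match>
  <test name="family">
    <string>sans-serif</string>
  </test>
  <edit name="family" mode="prepend">
    <string>Noto Sans CJK SC</string>
  </edit>
</match>
<!--
  One further block followed here, but only its values survived, not the
  element or attribute names. In order, they were: true; the families
  Noto Sans CJK KR, SC, JP, TC and Noto Sans Mono CJK KR, SC, JP, TC;
  false; then true, true, false, hintfull, true, none, false. These look
  like per-font rendering settings (hintfull is a hintstyle value), but
  the properties they belong to cannot be recovered from the text.
  The attribute names in the rules above (name="lang", name="family",
  mode="prepend") are likewise reconstructed from the surviving values
  and should be treated as an assumption.
-->
</fontconfig>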
When you save this configuration as /etc/fonts/local.conf, FontConfig uses the user’s current locale to choose the correct glyph set automatically from the Noto Sans CJK OTC you installed earlier.
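One way to check that the rules took effect is to ask FontConfig which family it would pick for each language. The sketch below is a hedged example rather than part of the original setup: it shells out to the `fc-match` utility that ships with FontConfig, passing a pattern of the form `sans-serif:lang=ja`. The helper names are hypothetical, and the exact family string printed depends on the fonts installed on your system.

```python
import shutil
import subprocess

# Languages covered by the rules in this post.
LANGS = ["zh-cn", "zh-tw", "ja", "ko"]


def fc_match_command(lang: str, family: str = "sans-serif") -> list:
    """Build an fc-match invocation that prints the best-matching family."""
    pattern = f"{family}:lang={lang}"
    # -f sets the output format; %{family} expands to the matched family.
    return ["fc-match", "-f", "%{family}\n", pattern]


def matched_family(lang: str) -> str:
    """Run fc-match and return the family it reports for the language."""
    result = subprocess.run(
        fc_match_command(lang), capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    if shutil.which("fc-match") is None:
        raise SystemExit("fc-match not found; is FontConfig installed?")
    for lang in LANGS:
        print(f"{lang}: {matched_family(lang)}")
```

If the configuration is working, each language should report the matching Noto Sans CJK variant rather than the same family four times.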

My colleague Jesse has written about some more generally applicable translation pitfalls you can avoid here. What interesting problems have you fixed or avoided while localizing a program?

Conversation
  • Haoxian Zeng says:

    Hello Mr Johnson,
    Thanks for this great post sharing the configuration of locale fonts in Linux. There is a block in your config file, line 48-55. It looks similar to the first match-block starting at line 5. May I ask what’s the difference of their purposes?
    Best wishes,
    Haoxian
