The special features of Indic scripts

In this section, I will attempt to explain what makes Indic scripts "different", and why they are so difficult for computers to handle, especially in text processing.

When a computer deals with the Latin script, things are simple and linear: if A follows B at the storage level, the computer displays the glyphs in the same order, i.e., A after B. In Indic scripts, however, the storage order and the display order can differ. For example, if the "i" matra (vowel sign) follows the consonant "ka" at the storage level, the computer has to render the vowel sign before the consonant.

Figure 1.5. Reordering of ekaar on ka (Bengali)

Visual reordering of ka ekaar to ekaar ka (Bengali)
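The reordering shown in the figure can be sketched in a few lines of Python. This is only an illustration, not how a real shaping engine works: the helper name `visual_order` and the set `PRE_BASE_SIGNS` are made up for this example, and the sketch assumes each vowel sign follows a single consonant (no conjuncts).

```python
# Bengali vowel signs that are stored after the consonant
# but drawn to its left: I (U+09BF), E (U+09C7), AI (U+09C8).
PRE_BASE_SIGNS = {"\u09bf", "\u09c7", "\u09c8"}


def visual_order(text):
    """Return the code points in the order their glyphs appear on screen.

    Simplification (assumption): a pre-base sign is swapped only with the
    single character stored immediately before it.
    """
    chars = list(text)
    for i in range(1, len(chars)):
        if chars[i] in PRE_BASE_SIGNS:
            chars[i - 1], chars[i] = chars[i], chars[i - 1]
    return "".join(chars)


stored = "\u0995\u09c7"          # ka, then ekaar (storage order)
shown = visual_order(stored)     # ekaar, then ka (display order)
```

The result of `visual_order` is useful only for reasoning about glyph order; text should always be stored and exchanged in the original logical order.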

An even more complicated case is that of "split vowel signs", where one part of the vowel sign has to be rendered before the base consonant, while the other part is rendered after the consonant. An example of this would be the Bengali vowel sign "O".

Figure 1.6. Ka Okaar (Bengali)

Example of split vowel signs - Ka Okaar (Bengali)
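Unicode itself records the two halves of the Bengali split vowels as a canonical decomposition, so they can be inspected with Python's standard `unicodedata` module. The helper name `split_vowel_parts` is an invention for this sketch; a rendering engine may obtain the two parts by other means (for example, from font tables).

```python
import unicodedata


def split_vowel_parts(sign):
    """Return the canonical (NFD) components of a vowel sign as a list."""
    return list(unicodedata.normalize("NFD", sign))


O_KAR = "\u09cb"  # BENGALI VOWEL SIGN O

parts = split_vowel_parts(O_KAR)
# The first part is drawn before the base consonant, the second after it.
part_names = [unicodedata.name(p) for p in parts]
```

Here `part_names` holds `BENGALI VOWEL SIGN E` (the left-hand piece) and `BENGALI VOWEL SIGN AA` (the right-hand piece).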

In scripts like Latin, there is an almost one-to-one relation between code points and glyphs. In Indic scripts, however, this is not the case. For example, consider the sequence "ka halant sha". Though this sequence is composed of three code points, it has to be represented on screen by a single glyph.

Figure 1.7. The Devanagari conjunct kshha

Formation of the Devanagari conjunct kshha
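The many-to-one relation between code points and glyphs is easy to check from Python. The sketch below assumes the conjunct is encoded as U+0915, U+094D, U+0937; note that Unicode calls the halant "VIRAMA" and names the third letter "SSA". The helper `code_point_names` is made up for this example.

```python
import unicodedata

# ka + halant + sha: three code points in storage, one glyph on screen
# (given a font and rendering engine that support the conjunct).
KSHHA = "\u0915\u094d\u0937"


def code_point_names(text):
    """List the Unicode character name of every code point in text."""
    return [unicodedata.name(ch) for ch in text]


count = len(KSHHA)
names = code_point_names(KSHHA)
```

The string has a length of three even though a reader sees a single shape, which is why operations such as cursor movement or deletion cannot simply step through code points.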

Linguists describe these types of writing systems as "abugidas" (or "alphasyllabaries"), which means that Indic scripts are a mixture of phonemic forms (i.e., where a basic character represents a single phoneme, the basic unit of word-distinguishing sound) and syllabic forms. When a rendering engine works on an Indic script, it usually processes the text one syllable at a time. A syllabic unit is a visual unit as well. A syllable is formed around a "central" character (usually a consonant), which is known as the "base" character - for example:

Figure 1.8. Components of a Syllable

Various components of a syllable
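To give a feel for syllable-level processing, here is a deliberately simplified sketch that cuts Devanagari text into syllable-like clusters with a regular expression. The pattern and the name `split_syllables` are assumptions made for this illustration; real rendering engines use far more detailed rules (nukta, reph, ZWJ/ZWNJ and so on are ignored here).

```python
import re

CONSONANT = "[\u0915-\u0939]"   # KA .. HA
HALANT = "\u094d"               # VIRAMA
MATRA = "[\u093e-\u094c]"       # dependent vowel signs AA .. AU
MODIFIER = "[\u0901-\u0903]"    # candrabindu, anusvara, visarga
VOWEL = "[\u0904-\u0914]"       # independent vowels

# Zero or more "dead" consonants, then the base consonant, then an
# optional halant or vowel sign and an optional modifier;
# or an independent vowel with an optional modifier.
SYLLABLE = re.compile(
    f"(?:{CONSONANT}{HALANT})*{CONSONANT}(?:{HALANT}|{MATRA})?{MODIFIER}?"
    f"|{VOWEL}{MODIFIER}?"
)


def split_syllables(text):
    """Return the syllable-like clusters found in text, in order."""
    return SYLLABLE.findall(text)


hindi = "\u0939\u093f\u0928\u094d\u0926\u0940"  # the word "Hindi"
clusters = split_syllables(hindi)
```

For the word "Hindi" this yields two clusters: ha with the i matra, and the na-da conjunct with the ii matra.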

Linguists usually treat the consonants inside a syllable as "dead consonants", i.e., consonants without their inherent vowel (this effect is achieved by adding a "halant" to the consonant). This allows us to reduce each component in a combining sequence to its most basic form, which can then combine with the other components to generate the final syllable.

However, Unicode treats syllables slightly differently; the figure below illustrates the difference.

Figure 1.9. Linguistic interpretation of a syllable vs the Unicode interpretation

While working on fonts, keep in mind the way Unicode handles the syllables - it should make your job easier.
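As a rough illustration of the two views, the sketch below builds the same syllable both ways. It rests on one assumption about the encoding: a Unicode consonant letter carries its inherent vowel, and a halant is written only between consonants, so the last consonant stays "live" unless a vowel sign follows. The function names `dead_forms` and `unicode_syllable` are invented for this example.

```python
VIRAMA = "\u094d"  # the halant


def dead_forms(consonants):
    """Linguistic view: every consonant reduced to its dead form."""
    return [c + VIRAMA for c in consonants]


def unicode_syllable(consonants, vowel_sign=""):
    """Unicode view: halant only between consonants; the last consonant
    keeps its inherent vowel unless a vowel sign is appended."""
    return VIRAMA.join(consonants) + vowel_sign


KA, SSA, SIGN_I = "\u0915", "\u0937", "\u093f"

linguistic = dead_forms([KA, SSA])            # two dead consonants
stored = unicode_syllable([KA, SSA])          # ka, halant, ssa
stored_with_i = unicode_syllable([KA, SSA], SIGN_I)
```

In the linguistic view both consonants carry a halant and the vowel is a separate component; in the stored Unicode sequence only the first consonant is followed by a halant.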