Chillu + Consonant => Conjunct?

 

 

 

 

Author: Cibu C Johny

Email: Cibu (at) yahoo.com

Date: July 12, 2005

Version: 2

 

 

Abstract

 

This document discusses the pitfalls of allowing Malayalam Chillu letters to form a conjunct with a following consonant. It also suggests the specific scenarios in which this should be allowed.

 

 

Conventions used in this document

 

DDHA   – U+0D22

NNA    – U+0D23

RRA    – U+0D31

Virama – U+0D4D

 

Introduction

 

We look at the issues that arise when text written with an old orthography font is read with a new orthography font, and vice versa. Specifically, we look at whether any Chillu-C1 + C2 sequence can form a conjunct in this mixed context.

 

 

There is no general rule for conjunct formation

 

The argument is made through examples:

 

Unique Encoding Rule

 

I agree that there could be a Chillu-conjunct formation rule for a proper subset of Chillu-C1 + C2 permutations. Even then, we cannot have the same rendering for two different encodings (joiners not considered). Let me illustrate how that applies here. Assume there is a conjunct formation rule for a subset of Chillu-C1 + C2 permutations and that, as per that rule, Chillu-NNA + DDHA can form a conjunct in an old orthography font. Of course, NNA + VIRAMA + DDHA will also form the same conjunct. Therefore, a document written by multiple people (e.g., a wiktionary.org document) can quite possibly contain both spellings of this conjunct without any reader or writer being aware of it. This can cause ineffective searches and inconsistently sorted word lists, and ultimately confuse users.

So we cannot allow Chillu-C1 + C2 and C1 + VIRAMA + C2 to form the same conjunct. I would call this the unique encoding rule.
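The search problem described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the variable and function names are hypothetical, and Chillu-NNA is assumed to be written with the atomic code point U+0D7A, which this document does not specify. The sketch shows that the two spellings are different strings, that NFC normalization does not unify them, and that a plain substring search for one spelling does not find the other.

```python
import unicodedata

# Code points listed in the conventions section.
DDHA = "\u0D22"
NNA = "\u0D23"
VIRAMA = "\u0D4D"
# Assumption: Chillu-NNA is written with the atomic code point U+0D7A
# (MALAYALAM LETTER CHILLU NN); the text above does not fix an encoding.
CHILLU_NNA = "\u0D7A"

# The two encodings that would render identically under the assumed rule.
spelling_chillu = CHILLU_NNA + DDHA
spelling_virama = NNA + VIRAMA + DDHA


def same_after_nfc(a, b):
    """Return True if the two strings are equal after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def find_matches(query, words):
    """Plain substring search, as a simple text search would perform it."""
    return [w for w in words if query in w]


words = [spelling_chillu, spelling_virama]
print(same_after_nfc(spelling_chillu, spelling_virama))
print(len(find_matches(spelling_virama, words)))
```

If both sequences rendered as the same conjunct, a reader would see two identical words while the search reports only one of them.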

 

This rule has a side effect: many words, such as /alpam/, can have two spellings, one with chillu-LA and the other with the /lpa/ conjunct. Both spellings are used interchangeably in contemporary Malayalam text. This is very similar to the two spellings 'colour' and 'color' (the latter being the American spelling). A British English font should not try to convert 'color' to 'colour'; the word should remain as the author intended. The same should hold for the two spellings of /alpam/ in Malayalam: each should be displayed as intended by the author(s) of the text.

 

Except for the two cases described below, C1 + VIRAMA + C2 forms the conjunct.

 

Exception 1: Malayalam version of eyelash repha

 

In the same way as in Exception 1, we can find a way to produce the eyelash repha conjunct in an old orthography font while producing an explicit Chillu-RA in a new orthography font. That is,

Chillu-RA + C2 => eyelash-repha over C2, if available in the font.

 

Example:

 

[Chillu-RA glyph] + [C2 glyph] => [C2 glyph with eyelash repha]

 

We can use the joiners, ZWJ and ZWNJ, in their usual meaning: forcing and preventing conjunct formation, respectively.

 

As per the unique encoding rule, RA + VIRAMA + C2 should not form the eyelash repha conjunct.
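The eyelash repha rule and the role of the joiners can be summarized as a small decision function. This is a toy model, not a shaping engine: the names `Joiner` and `render_ra_cluster`, the string labels, and the `font_has_repha` flag are all hypothetical. What RA + VIRAMA + C2 renders as instead of the repha is not stated in the text, so the sketch only records that it is not the eyelash repha.

```python
from enum import Enum


class Joiner(Enum):
    NONE = "none"
    ZWJ = "ZWJ"
    ZWNJ = "ZWNJ"


def render_ra_cluster(first, joiner, font_has_repha):
    """Describe how <first> + C2 is rendered in this toy model.

    first is either "CHILLU-RA" or "RA+VIRAMA" (hypothetical labels).
    """
    if first == "CHILLU-RA":
        # ZWNJ prevents the conjunct; a font without the glyph cannot form it.
        if joiner is Joiner.ZWNJ or not font_has_repha:
            return "explicit Chillu-RA followed by C2"
        # Joining is already the default here, so ZWJ gives the same result.
        return "eyelash repha over C2"
    if first == "RA+VIRAMA":
        # Unique encoding rule: this sequence must not yield the repha form.
        return "not eyelash repha (rendering unspecified here)"
    raise ValueError("unknown first element: " + repr(first))


print(render_ra_cluster("CHILLU-RA", Joiner.NONE, font_has_repha=True))
print(render_ra_cluster("CHILLU-RA", Joiner.NONE, font_has_repha=False))
```

Keeping the font capability as an explicit parameter mirrors the "if available in the font" condition: the same encoded text yields the repha in an old orthography font and an explicit Chillu-RA in a new one.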

 

 

Confusion on /nta/ encoding

 

The representation of /nta/ is closely tied to what its glyph stands for. Malayalam’s practice of writing /ta/ and /rra/ with the same letter has certainly contributed to the confusion over whether * is /nta/ or /nrra/. Here are the details of the two ways in which it is used:

 

  1. * is used to represent /nrra/ when writing English words like ‘Henry’ or ‘Enroll’ in Malayalam, while the conjunct form represents /nta/. The syllable /nrra/ is not native to Malayalam and is used only for foreign words. A casual reader typically overlooks this difference and considers * and the conjunct form to be different renderings of the same syllable. The difference in pronunciation, if required for a foreign word, is inferred from the context.
  2. * is used in many new orthography fonts to represent /nta/. Typically, a font positions itself somewhere between the new and old orthographies, choosing only a convenient subset of conjuncts from the old orthography. For example, Mathrubhumi is the ASCII Malayalam font closest to the old orthography, and even it renders /nta/ as *. So this usage is very common.

 

These facts give rise to two quite reasonable input scenarios:

 

  1. There can be Malayalam keyboard layouts with specific keys for Chillu-NA and RRA, and nothing for /nta/.

 

  2. Even if there is a specific method for inputting /nta/, a writer may choose to input /nta/ as Chillu-NA followed immediately by RRA.

 

Along with the above input scenarios, the following not-so-obvious facts should also be considered:

 

  1. Since Malayalam has just one letter to represent both /ta/ and /rra/, the conjunct form and * are essentially the same in the graphemic deep structure: Chillu-NA + RRA.

 

  2. When used as /nta/, * is considered a single unit, while in the /nrra/ context it is treated as having two separate components. This is evident from the usage of left vowel signs.

 

Thus, we end up with three mutually exclusive choices:

 

  1. NA + VIRAMA + RRA generates the conjunct form and Chillu-NA + RRA forms *. This choice invalidates the use of * for /nta/.
  2. Chillu-NA + RRA forms the conjunct to represent /nta/. This conjunct is rendered only if the font has it. As per the usual meaning of the Zero-Width-Non-Joiner, Chillu-NA + RRA + ZWNJ should produce * in every font. Then, as per the unique encoding rule, NA + VIRAMA + RRA should not form the /nta/ conjunct. This will make either  or  impossible to write.
  3. A new Unicode code point for the conjunct form. This has the same problems as option 1, plus more work and complexity.
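One way to compare such choices is to write each as a table from encoding to rendering and check it against the unique encoding rule. The sketch below is illustrative only: the table names, the tuple notation for encodings, and the labels "conjunct" and "*" are assumptions, and `BOTH_JOIN` is a hypothetical rejected design included purely for contrast.

```python
from collections import defaultdict


def shared_renderings(table):
    """Return the renderings that more than one encoding maps to."""
    by_rendering = defaultdict(list)
    for encoding, rendering in table.items():
        by_rendering[rendering].append(encoding)
    return {r: encs for r, encs in by_rendering.items() if len(encs) > 1}


# Option 1 as described above.
OPTION_1 = {
    ("NA", "VIRAMA", "RRA"): "conjunct",
    ("CHILLU-NA", "RRA"): "*",
}

# Option 2, in a font that has the conjunct glyph.
OPTION_2 = {
    ("CHILLU-NA", "RRA"): "conjunct",
    ("CHILLU-NA", "RRA", "ZWNJ"): "*",
}

# Hypothetical design in which both sequences form the conjunct.
BOTH_JOIN = {
    ("NA", "VIRAMA", "RRA"): "conjunct",
    ("CHILLU-NA", "RRA"): "conjunct",
}

for name, table in [("option 1", OPTION_1), ("option 2", OPTION_2),
                    ("both join", BOTH_JOIN)]:
    print(name, shared_renderings(table))
```

An empty result means no rendering is reachable from two different encodings, which is what the unique encoding rule requires. The check says nothing about which usages a choice gives up, such as * for /nta/ under option 1.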

 

Due to the lack of a perfect solution, I feel option 1 should be the pragmatic choice.