localeSupport(2)

LOCALE SUPPORT

Locale support within MicroEmacs handles the hardware and software configuration with respect to location, including:-

Displayed Character Set
Keyboard Support
Word characters
Spell Support

There are many other locale problems which are not addressed in this help page. Supporting different locale configurations often requires specific hardware (a locale-specific keyboard) and knowledge of the language and customs of the region. This makes it a very difficult area for one localized development team to support; as such, JASSPA relies heavily on the user base to report locale issues.

Note on Names and IDs

The language name is not sufficient to identify a locale (Mexican Spanish is different from Spanish Spanish), and neither is the country name (two languages are commonly used in Belgium). So before we have really started, the first problem of what to call the locale has no standard answer! Call it what you like, but please try to call it something meaningful so that others may understand and benefit from your work.

In addition, the internal id and data file names have a length limit of just four characters due to the "8.3" naming convention of MS-DOS. The standard adopted by JASSPA MicroEmacs for the internal locale id is to combine the 2-letter ISO language code (ISO 639-1) with the 2-letter ISO country code (ISO 3166-1). Should the locale encompass more than one country, the most appropriate country id is selected.
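As a rough illustration of the naming scheme above, the following Python sketch builds a four-character id from a language code and a country code. The function name make_locale_id is hypothetical, and folding the result to lower case is an assumption made for the sketch; the text above only fixes the two-plus-two letter layout.

```python
def make_locale_id(language: str, country: str) -> str:
    """Combine a 2-letter ISO 639-1 language code with a 2-letter
    ISO 3166-1 country code into a 4-character id (hypothetical helper)."""
    if len(language) != 2 or len(country) != 2:
        raise ValueError("both codes must be exactly two letters")
    if not (language.isalpha() and country.isalpha()):
        raise ValueError("codes must be alphabetic")
    # Lower-casing is an assumption of this sketch, not a documented rule.
    return (language + country).lower()


# Mexican Spanish and Spanish Spanish get distinct ids.
mexican_spanish = make_locale_id("es", "MX")
spanish_spanish = make_locale_id("es", "ES")
```

Because the id is always four characters, a file name prefix of up to four characters still fits within the eight-character base name allowed by "8.3".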

Displayed Character Set

A character set is the mapping of an integer number to a display symbol (i.e. a character). The ASCII standard defines a mapping of numbers to the standard English characters. This standard is well defined and widely accepted; as a result, the character set rarely causes a problem for plain English.

Problems occur when displaying characters found outside the ASCII standard, such as letters with accents, letters which are not Latin based (e.g. the Greek alphabet) and graphical characters (used for drawing dialog boxes etc.). There are many different character sets to choose between, and if the wrong character set is selected then the wrong character translation is performed, resulting in an incorrect character display. If the character display looks incorrect, first try changing the font and character-set setting; these can be configured using the platform page of user-setup(3).

If the problem persists (i.e. because the character set used to write the text is not supported on your current system) use the charset-change(3) command to convert the text to the current character set.
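The effect of choosing the wrong character set, and the general idea of converting text between sets, can be sketched with Python's standard codecs. This is only an illustration of the concept; it does not show how charset-change(3) is implemented, and the helper name convert_charset is hypothetical.

```python
# One byte, two character sets: the symbol shown depends on the mapping in use.
raw = bytes([0xE9])

as_latin1 = raw.decode("latin-1")  # ISO 8859-1 maps 0xE9 to e-acute
as_cp437 = raw.decode("cp437")     # PC-437 maps 0xE9 to Greek capital theta


def convert_charset(data: bytes, source: str, target: str) -> bytes:
    """Re-encode text written in one character set for another
    (hypothetical helper; raises UnicodeEncodeError if the target
    set has no equivalent character)."""
    return data.decode(source).encode(target)


# e-acute lives at 0x82 in PC-437, so the byte value changes on conversion.
converted = convert_charset(raw, "latin-1", "cp437")
```

The same number therefore displays as a different symbol under each set, and a conversion has to change the number so that the symbol stays the same.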

If your character set is not supported then first make sure that MicroEmacs will draw all of the characters to be used. By default MicroEmacs does not draw some characters directly as the symbol may not be defined. When a character is not defined there will typically be a gap or space in the text at the unknown character; in some cases there may be no space at all, which makes the text very hard to use. The insert-symbol(3) command (Edit->Insert Symbol) is a good way of looking at which characters can be used with the current character set.

Whether a character can be rendered (when in main text) or poked (drawn by screen-poke(2) or osd(2)) is defined by the set-char-mask(2) command. The characters used when drawing MicroEmacs's window borders or osd dialogs are set via the $box-chars(5) and $window-chars(5) variables.

MicroEmacs attempts to improve the availability of useful graphics characters on Windows and UNIX X-Term interfaces. The characters between 0 and 31 are typically control characters with no graphical representation (e.g. new-line, backspace, tab etc.). If bit 0x10000 of the $system(5) variable is set then MicroEmacs renders its own set of characters in their place. These characters are typically used for drawing boxes and scroll-bars.

With so many character sets, each with its own character mappings, the problem of spelling dictionary support is also tied to the locale. MicroEmacs uses the ISO standard character sets (ISO 8859) internally for word and spelling support and therefore a mapping between the ISO standard and the user character set is required. This mapping is defined by using the 'M' flag of the set-char-mask(2) command.

The user may declare the current character set in the platform page of user-setup(3). All the settings required for supporting each character set may be found in the charset.emf macro file, so if your character set is not supported, this is the file to edit.

Keyboard Support

The keyboard to character mapping is defined in the Start-Up page of user-setup(3), where the keyboard may be selected from a list of known keyboards. If your keyboard is not present, or is not working correctly, then this section should allow you to fix the problem (please send JASSPA the fix).

Most operating systems seem to handle keyboard mappings, with the exception of MS-Windows, which requires a helping hand. The root of the problems with MS-Windows is its own locale character mappings, which change the visibility status of the keyboard messages and so conflict with Emacs keystroke bindings. To support key-bindings like 'C-tab' or 'S-return' a low level keyboard interface is required, but this can lead to strange problems with the more obscure keys, particularly with the 'Alt Gr' accented letter keys. For example, on American keyboards pressing 'C-#' results in two 'C-#' key events being generated; this peculiarity only occurs with this one key. On a British keyboard the same key generates a 'C-#' followed by a 'C-\'.

This problem can be diagnosed using the $recent-keys(5) variable. Simply type an obvious character (e.g. 'A'), then the offending key, followed by another obvious key ('B'), then look for this key sequence in the $recent-keys variable (use the list-variables(2) or describe-variable(2) command). So for the above British keyboard problem $recent-keys would be:

    B C-\\ C-# A

($recent-keys lists the keys backwards). Once you have found the key sequence generated by the key, the problem may be fixed using the translate-key(2) command to automatically convert the incorrect key sequence into the required key. For the problem above the following line is required:

    translate-key "C-# C-\\" "C-#"
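The diagnosis step can be sketched in Python as follows. The helper keys_between is hypothetical and works on the text of $recent-keys as copied from the variable listing; it assumes the marker keys 'A' and 'B' were typed around the offending key as described above.

```python
def keys_between(recent_keys: str, first: str = "A", last: str = "B") -> str:
    """Return the keys typed between two marker keys, in typing order.

    recent_keys is the text shown for $recent-keys, newest key first.
    """
    keys = recent_keys.split()
    newest_marker = keys.index(last)                      # most recent 'B'
    oldest_marker = keys.index(first, newest_marker + 1)  # the 'A' typed before it
    between = keys[newest_marker + 1:oldest_marker]
    return " ".join(reversed(between))                    # back into typing order


# Text copied verbatim from the British keyboard example above.
sequence = keys_between(r"B C-\\ C-# A")
```

For the example text, the result is the same sequence that appears as the first argument of the translate-key line above.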

Note that once a key sequence has been translated, everything, including $recent-keys, receives only the translated key. So if you suspect a problem with the existing definition, change the keyboard type in user-setup to Default so that no translations are performed, then quit and restart MicroEmacs before attempting to re-diagnose the problem.

All the settings required for supporting each keyboard may be found in the keyboard.emf macro file, so if your keyboard is not supported, this is the file you need to edit.

Word characters

Word characters are those characters which are deemed to be part of a word; numbers are usually included. Many MicroEmacs commands, such as forward-word(2) and upper-case-word(2), use the 'Word' character set. The characters that form the word class are determined by the language being used, and this can be set in the Start-Up page of user-setup(3).

If your language is not supported you will need to add it to the list and define the word characters; these settings may be found in the language.emf macro file. The 'a' flag of command set-char-mask(2) is used to specify whether a character is part of a word. You must specify the uppercase letter and then the lowercase equivalent so that the case conversion functions work correctly.

A list of characters to be removed from the word character set is stored in the .set-char-mask.rm-chars variable. This is done so that the language may be changed many times in the same session of MicroEmacs without any side effects (such as the expansion of the word character set to include all letters of all languages). This makes MicroEmacs ideal for writing multi-language documents.
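The bookkeeping described above can be pictured with a small Python model. This is a toy sketch, not MicroEmacs code: the class WordChars is hypothetical, its rm_chars attribute merely plays the role of .set-char-mask.rm-chars, and the base set of ASCII letters and digits is an assumption.

```python
import string


class WordChars:
    """Toy model of a word character set that can switch language cleanly."""

    def __init__(self) -> None:
        # Assumed base set: ASCII letters and digits.
        self.chars = set(string.ascii_letters + string.digits)
        self.rm_chars = ""  # additions made by the current language

    def set_language(self, extra_letters: str) -> None:
        # Undo the previous language's additions, then apply the new ones.
        self.chars -= set(self.rm_chars)
        self.chars |= set(extra_letters)
        self.rm_chars = extra_letters


words = WordChars()
words.set_language("\u00c8\u00e8\u00c9\u00e9")  # a few French letters
words.set_language("\u00d1\u00f1")              # switch to Spanish letters
```

After the second call the French letters are no longer word characters, so the set does not grow with every language change.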

This may unfortunately be made a little more tricky by the requirement that this list must be specified in the most appropriate ISO standard character set (see Displayed Character Set section). When extending the word character set the characters have to be mapped to the current character set which may not support all the required characters. For example in the PC-437 DOS character set there is an e-grave (`e) but no E-grave so the E-grave is mapped to the normal E. As a result, if trying to write French text the case changing commands will behave oddly, for example:

    r`egle -> REGLE -> r`egl`e

The conversion of all 'E's to '`e' is an undesirable side effect of '`E' being mapped to E. This can be avoided by redefining the base letter again at the end of the word character list, for example:

    set-char-mask "a" "`E`eEe"
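The side effect and its workaround can be reproduced with a toy model in Python. The model assumes that upper/lower pairs are processed in order and that a later pair overwrites an earlier one; this is an assumption made to match the behaviour described above, not a statement about how MicroEmacs stores its case tables. All names in the sketch are hypothetical.

```python
E_GRAVE_UPPER = "\u00c8"
E_GRAVE_LOWER = "\u00e8"

# Assumed fallback for a PC-437 style set: no E-grave, so it becomes plain E.
CHARSET_FALLBACK = {E_GRAVE_UPPER: "E"}


def build_case_maps(pairs):
    """Build (to_upper, to_lower) tables from (upper, lower) pairs.
    Later pairs overwrite earlier ones (assumption of this model)."""
    to_upper, to_lower = {}, {}
    for upper, lower in pairs:
        upper = CHARSET_FALLBACK.get(upper, upper)
        to_upper[lower] = upper
        to_lower[upper] = lower
    return to_upper, to_lower


def convert(word, table):
    return "".join(table.get(ch, ch) for ch in word)


base = [(c.upper(), c) for c in "abcdefghijklmnopqrstuvwxyz"]
accented = [(E_GRAVE_UPPER, E_GRAVE_LOWER)]

# Accented pair last: lowering 'E' now yields e-grave everywhere.
up, low = build_case_maps(base + accented)
broken = convert(convert("r\u00e8gle", up), low)

# Base letter redefined at the end: 'E' lowers to plain 'e' again.
up, low = build_case_maps(base + accented + [("E", "e")])
repaired = convert(convert("r\u00e8gle", up), low)
```

In this model the repaired round trip loses the accent on the way up, since the character set has no E-grave, but it no longer adds accents to every 'e'.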

Spell Support

The current language is set using the Language setting on the General page of user-setup(3). If your required language is not listed you must first create the basic language support by following the guidelines in the Word characters section above. If your language is listed, select it and enable it by either pressing Current or saving and restarting MicroEmacs. In a suitable test buffer run the spelling checker; one of three things will happen:

The Spelling Checker dialog opens and spelling is checked

The spelling checker is supported by the current language and can be used (the rules and dictionaries have been downloaded and installed).

Dialog opens with the following error message:

Rules and dictionaries for language "XXXX" 
   are not available, please download.

The spelling checker is supported by the current language but the required rules and dictionaries have not been downloaded. You should be able to download them from the JASSPA website, see Contact Information. Once downloaded they must be placed in the MicroEmacs search path, i.e. where the other macro files (like me.emf) are located.

Dialog opens with the following error message:

Language "XXXX" not supported!

The spelling checker is not supported by the current language, see the following Adding Spell Support section.

Adding Spell Support

To support a language MicroEmacs's spelling checker requires a base word dictionary and a set of rules which define what words can be derived from each base word in the dictionary. The concept and format of the word list and rules are compatible with the Free Software Foundation GNU ispell(1) package.
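The idea of deriving words from a base word plus rule flags can be sketched in Python. The rule layout below (flag, ending condition, characters to strip, characters to append) loosely follows the ispell suffix concept, but it is a simplified illustration with made-up example rules; it is not the add-spell-rule(2) format, and the names SuffixRule and derive are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class SuffixRule:
    flag: str       # rule letter attached to base words in the word list
    condition: str  # regular expression the end of the base word must match
    strip: str      # characters removed from the end of the base word
    append: str     # characters added after stripping


def derive(base, flags, rules):
    """Return the base word plus every word its flags allow."""
    words = {base}
    for rule in rules:
        if rule.flag in flags and re.search(rule.condition + "$", base):
            stem = base[:len(base) - len(rule.strip)] if rule.strip else base
            words.add(stem + rule.append)
    return sorted(words)


# Made-up example rules for English, for illustration only.
RULES = [
    SuffixRule("G", "e", "e", "ing"),      # create -> creating
    SuffixRule("G", "[^e]", "", "ing"),    # walk -> walking
    SuffixRule("S", "[^sxzhy]", "", "s"),  # create -> creates
]

derived = derive("create", "GS", RULES)
```

A dictionary built this way stores only "create" with the flags "GS", and the rules supply the remaining forms.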

The best starting point is to obtain ispell rules and word lists in plain text form; the web can usually yield these. Once these have been obtained, the rules file (or affix file) must be converted to a MicroEmacs macro file which calls the add-spell-rule(2) command to define the rules. The rule file should be named "lsr<lang-id>.emf" where "<lang-id>" is the spelling language id, determined by the .spell.language variable set in the language.emf macro file.

The spellutl.emf macro file contains the command spell-conv-aff-buffer, which will attempt to convert the whole buffer. Due to formatting anomalies this process often goes wrong, so using the command spell-conv-aff-line (also contained in spellutl.emf) to convert a single line at a time is often quicker. See existing spelling rule files (lsr*.emf) for examples, and the help on command add-spell-rule(2).

Note: the character set used by the rules should be the most appropriate ISO standard (see Displayed Character Set section). This can make the process much more difficult if the current character set is not compatible; if you are having difficulty with this please e-mail JASSPA Support.

Once the rules have been created, create a dictionary for the language from the word lists; see help on command add-dictionary(2). The dictionary file name should be "lsdm<lang-id>.edf". If the dictionary is large and can be split into two sections, a set of common words and a set of more obscure ones, create two dictionaries, naming the one containing the obscure words "lsdx<lang-id>.edf" and the other as above.
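The file naming conventions above can be summarised in a short Python sketch, which also checks that each name fits the "8.3" limit. The function spell_file_names is hypothetical, and the id "enus" used in the example is only an illustrative value for "<lang-id>".

```python
def spell_file_names(lang_id, split_dictionary=False):
    """List the rule and dictionary file names expected for a language id."""
    names = ["lsr%s.emf" % lang_id, "lsdm%s.edf" % lang_id]
    if split_dictionary:
        # Second dictionary holding the more obscure words.
        names.append("lsdx%s.edf" % lang_id)
    for name in names:
        stem, extension = name.split(".")
        if len(stem) > 8 or len(extension) > 3:
            raise ValueError("%s does not fit the 8.3 naming limit" % name)
    return names


# "enus" is an illustrative id, not a value taken from this page.
files = spell_file_names("enus", split_dictionary=True)
```

With a four-character prefix such as "lsdm", a four-character id is the longest that still fits in eight characters, which is why the id length is limited.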

Once the generated rules and dictionary files have been placed in the MicroEmacs search path, the spelling checker should find and use them. Please submit your generated support to MicroEmacs for others to benefit.