Home: Henrik Theiling

Ukaliq Script

Ukaliq
The Ukaliq script is a universal alphabetic script written left-to-right.

The Ukaliq alphabet is meant to be used phonemically or morpho-phonemically, for any human language. Exact phonetic representation is not the goal; the goal is to be able to write the vast majority of languages sensibly, appropriately, and well. The Latin script, which is arguably the most widely used, does not work well: it has too few vowels and consonants for most languages, even for those most closely related to Latin.

Generally, in Ukaliq, one character equals one phoneme, either vowel or consonant. Common affricates have separate characters, because in many languages, they act specially, e.g., more like regular plosives.

A single diacritic (a dot below right) can be used for purposes defined by each language -- usage is proposed for many typical phenomena.

Even with many characters, it is impossible to encode every phoneme in one letter when a limited set of characters is available, which is 128 for Ukaliq. These 128 characters also comprise numbers and symbols. A useful selection was, therefore, attempted so that most languages will hopefully find better representation than in Latin script, without needing many digraphs or other compromises.

The Ukaliq script provides 17 vowel symbols, which should be enough for the phoneme inventory of most languages. And with the dot diacritic, the number doubles.

With respect to consonants, 8 points of articulation are distinguished, and two sets of plosives plus two sets of fricatives are available, or, by a different grouping, four sets of plosives. For languages that have many plosives but few fricatives, the typical fricatives that still occur are kept separate (e.g., 's').

Also, more nasals, liquids, and approximants are available than in Latin script. Modifier letters for clicks and ingressives are available. In total, there are 60 consonant letters, two of which are mainly meant as modifier letters (for implosives/ingressives and for clicks). And again, with the dot diacritic, the number doubles.

The Ukaliq script cannot generally represent morpho-phonological phenomena directly -- languages just differ too much. Ukaliq is meant to only represent sounds of a language, i.e., reflect what is pronounced. There is a single diacritic (the dot) that can be used for special purposes.

It is expected that some languages need digraphs, especially if morphology 'composes' phonemes (e.g., if a plosive + h may become an aspirated plosive etc.).

If you want to use the Ukaliq script and you have questions, need support, or maybe want a different font or font style, or if you found a bug, then please don't hesitate to contact me: ukaliq@theiling.de

1.1 Name

Aingai! The name 'Ukaliq' is an appreciation of the Ukaliq character from Anaanaup Tupinga (ᐊᓈᓇᐅᑉ ᑐᐱᖕᒐ), which I liked a lot (Kalla may be the next font name, because he catches more fish ('iqaluit')). 'Ukaliq (ᐅᑲᓕᖅ) /ukalɜq/' is Inuktitut for 'hare' (probably 'arctic hare' by default). Studies of Greenlandic and Inuktitut grammar and phonology for the preparation of my S33 conlang led me to this wonderful children's program from Iqaluit in Canada presented by Riit (Ittuanga) and Qimmiq. Thank you for this!

1.2 History

The Ukaliq script was originally planned for the S33 conlang (which is not yet published), based on an old idea that needed realisation. It then became a universal script, simply because the opportunity presented itself, and the letter shapes were actually not so bad.

The original old idea was to exploit a digital 7-segment display to its fullest potential. Not because of technological nostalgia, but because it felt like a good exercise in restriction to simple shapes.

The shapes possible with a 7-segment display, smoothed out a bit for a nice font, seemed to have just the right amount of complexity to be sufficient for a good script. Also, shapes in many scripts felt similarly complex to what a 7-segment display can do, which further encouraged the experiment.

Earlier attempts exist, but in those, the 7-segment shapes were merely an inspiration for letter shapes, which were then modified or extended from those bases. No restriction to a plain 7-segment display was followed. The Ukaliq script, in contrast, strictly follows that restriction, so a 7-segment display (plus a dot, which is usually available for a comma) can fully display the Ukaliq script.

1.3 Symbol Chart

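Columns (vertical segments): -, C, F, CF, E, CE, EF, CEF, B, BC, BF, BCF, BE, BCE, BEF, BCEF
Rows (horizontal segments; H is the dot):
-: #’ »:.,)!‘ «("¿;¡?=
D: _z>dʒ dʑdz<s×pfkx qχtʃ tɕts
A: ɤʉɯøœyɑæoɨueɛiɔ
AD: +¤@ɬɭɫ ʟʎl°˂ ɓ ɗ ʄ ɠ ʛɴmɳŋɲn
G: awɻ ɹʒ ʑɾ¬÷jǂ ǀ ʘ ǁ ǃʃ ɕr
DG: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
AG: əʕ ɦʁvʐɣ ɰʝðħhχfʂx ɧçθ
ADG: %ɢbɖgɟdʡʔqpʈkct
H:  ̱
DH:
AH:
ADH:
GH: ʷʲ
DGH:
AGH:  ̤ˠˤʰ
ADGH: ʼ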

2 Font and Script Design

This section explains how the Ukaliq script and the Ukaliq font were constructed. The script construction is explained first and the font construction later, although the final symbol shapes are already used while explaining the script construction. Starting with the font construction would require going into technical detail right at the beginning, and I think it is better to start with the higher level design ideas, and then go deeper into the technical details.

2.1 Script Design

The Ukaliq script was originally thought to be a script for a specific constructed language, but then I realised that with 128 characters, it is feasible to construct a universal script that may be suitable for most human languages. Several design goals were then stated.

A 7-segment display normally approximates shapes that were originally nice. The Ukaliq script goes the other way: it retrofits nice shapes to all 7-segment patterns and gives them a meaning.

Here is the 7-segment display, with all 7 line segments and the dot switched on:

[Figure: 7-segment display with the segments labelled A to G and the dot labelled H]

To start, all the possible shapes were put into a large table with 16 columns (in two groups) and 8 rows, simply by using the segments A,B,C,D,E,F,G of a 7 segment display in that order as bits for the table coordinates. (A..F are the outer segments, clock-wise, with A the top-most segment, and G is the center segment).

[Table: the 128 patterns in 8 rows (0 to 7) and 16 columns (Group 0 and Group 1, columns 0 to 7 each)]

I then stared at this table to find patterns that would make good rows and columns of similar symbol and letter properties. Properties would be linguistic, like the point of articulation, or numeric, like the sequence of digits (0..15 for hexadecimal), or graphical, somehow.

After staring for a while, it became clear to me that one row will be the 16 digits. I also decided that I want eight points of articulation so that one row could have, for the same manner of articulation (e.g., fricative), two different sets, e.g., voiced and unvoiced fricatives.

After staring more, I decided that I would prefer non-connected symbols for punctuation marks so that they would stand out visually. For this, however, the table as sorted above provided little support as the unconnected symbols are all over the place, so grouping them into rows or columns would be difficult. So the bits used for the X and Y coordinates were shuffled for a while, until the Y coordinate was for the three horizontal segments: D, A, G. I also shuffled the bits in the X coordinate further to make the digit row somehow 'consistent'. The result was the final order of the table:

[Table: the 128 patterns in the final order, again in 8 rows (0 to 7) and 16 columns (Group 0 and Group 1, columns 0 to 7 each)]

This table felt right: the first column and first row are suitable symbols for punctuation marks. Row 5 now looks nice for a row of digits. And several dash-like shapes are available for dash-like punctuation marks. The other characters can be letters.
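The final table order can be read off the row and column labels of the symbol chart in section 1.3. The following is a minimal Python sketch of that mapping, not the generator script itself. The bit weights (D=1, A=2, G=4 for rows; C=1, F=2, E=4, B=8 for columns) are an assumption inferred from those labels, and the function name `table_position` is made up for illustration.

```python
# Bit weights inferred from the symbol chart labels (an assumption):
# rows run -, D, A, AD, G, DG, AG, ADG; columns run -, C, F, CF, E, CE, ...
ROW_BITS = {"D": 1, "A": 2, "G": 4}
COL_BITS = {"C": 1, "F": 2, "E": 4, "B": 8}


def table_position(segments):
    """Return (row, group, column) for a string of lit segments, e.g. 'DG'."""
    row = sum(weight for seg, weight in ROW_BITS.items() if seg in segments)
    x = sum(weight for seg, weight in COL_BITS.items() if seg in segments)
    return row, x // 8, x % 8


print(table_position("DG"))       # only the bars D and G are lit
print(table_position("ABCDEFG"))  # all seven segments are lit
```

With these weights, the pattern with only D and G lit lands in row 5, the digit row mentioned above.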

The table was then filled from bottom to top with letters: first the fricatives, as this is where languages distinguish points of articulation most finely. It turned out that 16 fricatives are too few, while for other consonant groups like nasals or liquids, not nearly that many are needed. This is why the final table has 15 fricatives in two groups in one row (‘ʕ ʁ v ʐ ɣ ʝ ð’ and ‘ħ h χ f ʂ x ç θ’), plus a few extra ones (‘s z ʃ ʒ’). For plosives, 14 seemed enough and filled a row almost completely. The other groups of consonants have far fewer members, so they were placed in smaller groups.

The point of articulation (POA) is roughly encoded in the column of each group of eight columns. I started with an order based on front to back position in the mouth and throat, but it did not really work well, so I swapped column assignment of POA until it felt good.

For vowels, again, a representative selection was attempted with eight distinguished positions and a two-way contrast in roundedness or another typical distinction, plus two neutral vowels ‘a’ and ‘ə’. The order within the group is roughly based on consonant POA, but largely pragmatic (based on symbol shape). The more frequent variant for a vowel of a given position is in one group: ‘æ o e u ɨ ɛ i ɔ’, the rarer variants are in the other: ‘ɤ ø ɯ ʉ œ y ɑ’.

Some compromise was needed: eight points of articulation is a lot to distinguish in a single language, but the languages of the world together know more than that. Therefore, columns merge POAs, based on whether the distinction is typically phonemic or not. Also, since fricatives have the largest number of distinguished POAs, and several extra ones were added, there are enough distinctions for the phonemes (but maybe not the phones) of most individual languages.

Rare phonemes, or phonemes that are rarely distinguished, were merged with others, like [kx qχ] and [tʃ tɕ]. Some rows still remained with empty letter assignments, and those were filled with punctuation marks. Occasionally, to get a nice punctuation mark shape, column assignments were swapped.

What bothers me about the decimal digits of a 7-segment display is that the '1' is always ugly, because it is not centered and cannot fill the space -- it is not a nice shape for a monospaced font:

For the Ukaliq font, I wanted to rectify this, and also exploit it for punctuation: it may actually be beneficial for some shapes to use only one side of the display, e.g., for a full-stop or quotation marks. That is, for all letters and digits, the unevenness can be avoided, and for some punctuation marks, it can be exploited. This is why single vertical line characters are punctuation marks, and they usually come in pairs to function like brackets, as in ‘t’ and (t). Also, the full-stop is usually followed by a space, so it is assigned such that there is space on the right, as in ‘t.’.

For punctuation marks, I also added some mathematical and logical symbols because I am a programmer and these things are necessary in our modern society. I also tried to eliminate ambiguous usage, e.g., for dash and ellipsis, if it felt feasible with the available symbols.

One achievement is that all commutative mathematical and logical operators are symmetric: ‘+ = × ∧ ∨ ⊕’, while the others are not symmetric: ‘− ÷ < > % ¬’. The numbers and vowels and generally all letters also group nicely -- this is described in respective dedicated chapters.

2.2 Glyph Shapes

The 7-segment display design fascinates me: it is designed to use the smallest number of light bulbs, or LEDs, to represent all 10 decimal digits. I am not aware of any other light-only display that manages to do that with fewer than 7 LEDs. Several variations of the segment shapes have been used, but the basic straight-line design (with little tweaks at the corners) works very well and is still in use in many, many applications. Typically, the digits look as follows:

I also noticed that the complexity of the shapes that can be displayed with a 7-segment display is very similar not only to decimal digits, but also to many letters of the Latin alphabet, as well as Cyrillic, Greek, Armenian and many other scripts. There is something about this that is a good match to alphabetic scripts. It is not a surprise, of course, because digits have a similar shape complexity to letters in the mentioned alphabets. However, no script can be fully represented, because there are always some shapes that cannot be displayed or distinguished. Here are some shapes that are very similar to letters in Latin, Cyrillic, Greek, Armenian, and/or Hebrew (there are many, many more):

A
H
h
P
d
Б
Ч
П
п
Ω
Ր
Ը
Ք
Ւ
ר
ה

One approach to a new script would be to try to fit the letters of Latin, Greek, Cyrillic, Armenian, and maybe other scripts to 7-segment shapes. For most letters, this will result in a bad approximation, because the shape really does not fit well. There are contrastive appendices, or diagonals, or half-long strokes, or more than two vertical lines, or just too many details.

Q
R
Я
Ա
Ц
Д
Л
W
X
Ф
Ю
Ш
Щ
Ћ

Furthermore, many shapes would map to the same 7-segment pattern, even within the same original script.

B
8
D
O
0
U
V
Ս
Ա
Մ
ר
ד
ך
H
א

Finally, many available shapes would not be mapped at all, because there are no good letter equivalents.

But principally, the complexity of what can be displayed on a 7-segment display is very fitting for an alphabet. Therefore, in the past, I have used the basic shapes of a 7-segment LED in script design, but only as an inspirational starting point. For the Ukaliq script, I wanted to fully restrict myself to only the possible 7-segment shapes. There are 128 patterns: each of the 7 segments can be on or off. With a dot, which is typically present, there are 256 shapes. For the design of the major font shapes, however, the dot was ignored.

Enumerating the 128 shapes is easy. Here they are:

To create stroke-based glyph shapes for the font, the first step is to separate all the patterns that are contiguous, i.e., all patterns of adjacent segments. That's the following ones:
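The enumeration and the contiguity check can be sketched in a few lines of Python. This is only an illustration, not the Perl script that generates the font. The adjacency table is an assumption based on the usual display layout (A on top, B to F clockwise around the outside, G in the middle, segments touching where they share a corner), and the names `ADJACENT` and `is_contiguous` are hypothetical.

```python
from itertools import combinations

# Which segments touch at a shared corner (assumed standard layout).
ADJACENT = {
    "A": "BF", "B": "ACG", "C": "BDG", "D": "CE",
    "E": "DFG", "F": "AEG", "G": "BCEF",
}


def is_contiguous(pattern):
    """True if the lit segments form a single connected shape."""
    lit = set(pattern)
    if not lit:
        return False
    seen, todo = set(), [next(iter(lit))]
    while todo:  # simple flood fill over touching segments
        seg = todo.pop()
        if seg not in seen:
            seen.add(seg)
            todo.extend(n for n in ADJACENT[seg] if n in lit)
    return seen == lit


# All 2**7 = 128 on/off combinations of the seven segments.
all_patterns = ["".join(c) for n in range(8) for c in combinations("ABCDEFG", n)]
contiguous = [p for p in all_patterns if is_contiguous(p)]
```

Every pattern that is not in `contiguous` (other than the empty one) can then be split into its connected parts with the same flood fill.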

All other shapes are combinations of these contiguous ones. Due to the symmetric display, shapes that are mirror images or rotated versions of each other can be grouped further, and we can focus on one representative.
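Grouping by symmetry can be sketched as follows, again in Python and only as an illustration. The sketch assumes the usual segment layout, treats the left-right mirror, the top-bottom mirror, and their combination (the 180 degree rotation) as the relevant symmetries, and picks the alphabetically smallest variant as the representative. The helper names are hypothetical.

```python
from itertools import combinations

# Segment swaps for the two mirror operations (assumed standard layout:
# A top, then B, C, D, E, F clockwise, G in the middle).
MIRROR_LR = str.maketrans("BCEF", "FECB")      # B<->F, C<->E
MIRROR_UD = str.maketrans("ABCDEF", "DCBAFE")  # A<->D, B<->C, E<->F


def normal(pattern):
    """Canonical spelling of a pattern: segment letters in sorted order."""
    return "".join(sorted(pattern))


def variants(pattern):
    """The pattern, both mirror images, and the 180 degree rotation."""
    lr = pattern.translate(MIRROR_LR)
    ud = pattern.translate(MIRROR_UD)
    rot = lr.translate(MIRROR_UD)
    return {normal(p) for p in (pattern, lr, ud, rot)}


def representative(pattern):
    """One fixed member of the symmetry class (here: the smallest string)."""
    return min(variants(pattern))


patterns = ["".join(c) for n in range(8) for c in combinations("ABCDEFG", n)]
classes = {representative(p) for p in patterns}
```

For example, the four single vertical segments B, C, E, and F all end up in the same class, so only one of them needs a hand-checked stroke.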

In the next step, these patterns are converted to non-intersecting strokes, where no segment is drawn twice. If there are junctions, multiple strokes may be necessary. The typical slight tilt of the display is removed for now so we get a non-italic font at first. Simple straight lines are trivial, but for many of the others, this step is where the main magic happens: a nice shape appears once the strokes are rounded out a bit.

The open-ended single stroke shapes are the ones where I think the magic of the retrofitted smooth stroke is the most remarkable. It is fascinating how the shape changes from clunky and digital to appealing and actually font-like by just tracing the stroke and rounding it a bit:

Then there are more single-stroke shapes: cyclic ones, and those with a self-junction. The latter ones also have a multi-stroke decomposition.   

  

For shapes that require multiple strokes, there are often alternative stroke decompositions, a few of which are shown here. Notice how the 'H' shape is found to have a Hebrew 'א' (Aleph) decomposition, too. My algorithm is definitely on the right track.

For each 7-segment pattern, one of those glyphs is then selected to be the primary shape in the Ukaliq script. This is done either manually or semi-automatically, by checking for features like junctions or cycles or straight lines and weighing the features of the glyph. The other shapes remain as variants -- maybe they are suited for handwriting or other styles of fonts.

The curves are generated algorithmically from the 7-segment pattern, so curve parameters can be adjusted easily. The initial versions of the symbols felt too round, particularly S-shaped symbols. By defining corner tension non-symmetrically, the results are more pleasing, and also, symmetric symbols become different, not mere mirrored or rotated versions of other symbols:

For a font file, the strokes need to be converted to outlines. This opens up the possibility to change the line thickness of a single stroke. I kept it simple and just changed the width based on horizontal vs. vertical orientation.

At this point, a font designer would start working: there are endless possibilities to create detailed letter shapes from here, by adding serifs, by adjusting X height, etc. But my work ends here (for now): this is the initial Ukaliq Sans font for the Ukaliq script. Note that 'Mono' is implicit, because Ukaliq is an inherently monospaced script -- 7-segment displays cannot shrink or expand horizontally, and neither will the script.

2.2.1 The Dot

The dot is the only diacritic. In LED displays, it is usually placed to the right of the digit on the base line. For the nicer looking Ukaliq font, I want tight letter spacing, so reserving space for an optional dot is not ideal. I also want the Ukaliq script to be naturally monospaced (just like multi-digit 7-segment LED displays), so adding width when the dot is drawn is also out of the question. So instead, the diacritic is drawn as a dot or a vertical line on the right side below the glyph. Fonts may play with other solutions. The following also shows the combination of contiguous shapes within a glyph: the current font simply merges them.

2.3 Paths and Outlines

This section explains how the Ukaliq Sans font gets from the square-shaped 7-segment patterns to rounded glyph outlines.

For the first font, I wanted to keep things simple. The goal was not to become a font designer, so the Ukaliq fonts are not artistically sophisticated fonts. The main goal was to create an initial font that is pleasing enough, for a new Ukaliq script.

The original shapes are all rectangular, although this does not matter for the algorithms used in creating the strokes and outlines: arbitrarily angular (without the 'rect-') would work, too.

2.3.1 Paths

The main idea is to use quadratic bezier curves, as they are natively supported by SVG (the 'Q' command) and by many font formats. My favorite documentation of bezier curves (and other things) is by Bartosz Ciechanowski. For this font, I also experimented with cubic beziers, but since some font formats are restricted to quadratic ones, and quadratic ones are also algorithmically simpler, I stuck with those. If a font format needs cubic beziers, they can be easily generated from quadratic ones.

To convert an angular stroke to a curved one, start and end points are used as is, and intermediate ones are used as control points for the quadratic bezier curve:
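For a single corner, this amounts to one SVG 'Q' command. The following Python sketch shows the path data and the standard quadratic bezier formula for a point on the curve. The function names `svg_corner` and `quad_point` are made up for illustration and are not taken from the generator script.

```python
def svg_corner(p0, p1, p2):
    """SVG path data: p0 and p2 lie on the curve, the corner p1 is the control point."""
    return f"M {p0[0]} {p0[1]} Q {p1[0]} {p1[1]} {p2[0]} {p2[1]}"


def quad_point(p0, p1, p2, t):
    """Point at parameter t (0..1) on the quadratic bezier p0, p1, p2."""
    u = 1 - t
    return (u * u * p0[0] + 2 * u * t * p1[0] + t * t * p2[0],
            u * u * p0[1] + 2 * u * t * p1[1] + t * t * p2[1])


# An angular stroke going up and then right, turned into one curve.
print(svg_corner((0, 0), (0, 10), (10, 10)))      # M 0 0 Q 0 10 10 10
print(quad_point((0, 0), (0, 10), (10, 10), 0.5))  # (2.5, 7.5)
```

The curve starts and ends exactly at the stroke's end points and only bends towards the corner without touching it.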

A notion of tension would be nice to make the corner configurable from fully angular (tension=1) to fully round (tension=0). I defined tension as the relative length of a straight line before using a curve. I.e., instead of a bezier stroke, a straight line, a bezier stroke, and again a straight line are drawn, and the straight line length is the tension. E.g., with tensions 1, 0.7, 0.5, 0.3, and 0, the curve looks as follows:

1 0.7 0.5 0.3 0

The tension can be different on the left or the right of a point, allowing for fine-grained adjustment of the shape using only very simple parameters.

[Figure: a corner drawn with tensions 0.4 and 0.8 on its two sides]
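The tension idea can be sketched as SVG path data in Python. This is a sketch under one assumption: the tension is read as the fraction of each leg (from the end point towards the corner) that is drawn as a straight line. The function names and the parameter names `tension_in` and `tension_out` are hypothetical.

```python
def lerp(p, q, t):
    """Point at fraction t of the way from p to q."""
    return (p[0] + (q[0] - p[0]) * t, p[1] + (q[1] - p[1]) * t)


def corner_path(p0, p1, p2, tension_in, tension_out=None):
    """Straight line, quadratic curve around the corner p1, straight line."""
    if tension_out is None:
        tension_out = tension_in
    a = lerp(p0, p1, tension_in)   # end of the first straight piece
    b = lerp(p2, p1, tension_out)  # start of the last straight piece

    def fmt(p):
        return f"{p[0]:g} {p[1]:g}"

    return f"M {fmt(p0)} L {fmt(a)} Q {fmt(p1)} {fmt(b)} L {fmt(p2)}"


print(corner_path((0, 0), (0, 10), (10, 10), 0.4, 0.8))
# M 0 0 L 0 4 Q 0 10 2 10 L 10 10
```

With tension 1 both straight pieces reach the corner and the curve degenerates to a point (fully angular); with tension 0 the straight pieces vanish and the whole corner is one curve.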

The same principle is applied for multiple internal points on a stroke, and the auxiliary points introduced by the tension then become the actual points that the curve passes through. Again, tension can be adjusted.

If needed, for cubic bezier curves, the two cubic auxiliary points can be constructed from the one quadratic auxiliary point. To get the same shape, we can start at the quadratic auxiliary point, and shift it towards each end point 1/3 of the respective way. (Just using the same point twice for both cubic auxiliary points would produce a tighter curve.)
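The construction can be checked numerically. The Python sketch below builds the two cubic auxiliary points by shifting the quadratic one a third of the way towards each end point, and compares both curves at a few parameter values. All function names are made up for illustration.

```python
def quad_to_cubic(p0, q, p2):
    """Cubic control points that reproduce the quadratic bezier p0, q, p2."""
    c1 = (q[0] + (p0[0] - q[0]) / 3, q[1] + (p0[1] - q[1]) / 3)
    c2 = (q[0] + (p2[0] - q[0]) / 3, q[1] + (p2[1] - q[1]) / 3)
    return p0, c1, c2, p2


def quad_point(p0, q, p2, t):
    """Point at parameter t on the quadratic bezier."""
    u = 1 - t
    return tuple(u * u * a + 2 * u * t * b + t * t * c
                 for a, b, c in zip(p0, q, p2))


def cubic_point(p0, c1, c2, p3, t):
    """Point at parameter t on the cubic bezier."""
    u = 1 - t
    return tuple(u ** 3 * a + 3 * u * u * t * b + 3 * u * t * t * c + t ** 3 * d
                 for a, b, c, d in zip(p0, c1, c2, p3))


quad = ((0, 0), (0, 10), (10, 10))
cubic = quad_to_cubic(*quad)
for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    qx, qy = quad_point(*quad, t)
    cx, cy = cubic_point(*cubic, t)
    assert abs(qx - cx) < 1e-9 and abs(qy - cy) < 1e-9
```

Shifting by a third from the auxiliary point is the same as the more common formulation of going two thirds of the way from each end point towards the auxiliary point.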

The strokes are drawn with their ends cut off exactly at the stroke's end points. There are other ways to do this, e.g., to use circular ends, but we will use cut-off ends here, to be able to generate crisp corners.

With cut-off ends, a vertical stroke extends half the line width beyond its stroke points on either side, while a horizontal stroke ends exactly at its end points. Vertical strokes therefore make a glyph wider in total (by the line width) than horizontal strokes do, and the same holds in the vertical direction. To make it look nice, horizontal strokes should be a bit longer. This means we need to shift the end points of horizontal strokes outward by half the line thickness. Or maybe a bit less -- it's often good not to use exactly the edge coordinate, because it may look like overshooting.

Line thickness needs even more care. This is because a bold font should work together with a light font from the same family. But simply making the line thicker in the bold font will also make the character larger: taller and wider, because the line thickness will expand in all directions.

As a basic rule, the same character should remain in the same bounding box regardless of line thickness. Sure, many bold fonts in a proportional family (but usually not in a monospaced family) are also wider -- but generally not taller, because height is actually a font's size parameter. The general width of a font is often a different parameter: the stretch. It can be applied separately. First, to keep the bounding box, the stroke points need to be shifted inward for larger line thickness.

There are more details that make a glyph more 'right' to a human eye. For this, corrections are usually applied when the shape of a stroke changes. This is ignored here for the most part, because it quickly gets artistic and this is not mainly a font design project. One noteworthy detail that this font does not care about is to convince the human eye that a round shape is the same height as a straight shape. In the following figure, the first round glyph is exactly the same height as its neighbour, but it looks less tall. It can be made more pleasing by making the round shape overshoot a bit.

2.3.2 Outlines

Computing outlines of a straight line means to compute a new path around the line, which means to expand the stroke on both sides perpendicularly to the direction of the stroke, and then connecting the four new points. This new path is the outline of the stroke.
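For a single straight piece, this is a small computation. The Python sketch below offsets both end points by half the line width along the perpendicular and returns the four outline points in order. The function name `line_outline` is hypothetical, and the sketch uses the cut-off ends described above.

```python
import math


def line_outline(p, q, width):
    """Four corner points of the outline around the straight stroke p -> q."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    length = math.hypot(dx, dy)
    # Perpendicular to the stroke direction, scaled to half the line width.
    nx, ny = -dy / length * width / 2, dx / length * width / 2
    return [
        (p[0] + nx, p[1] + ny),
        (q[0] + nx, q[1] + ny),
        (q[0] - nx, q[1] - ny),
        (p[0] - nx, p[1] - ny),
    ]


print(line_outline((0, 0), (10, 0), 2))
```

For the horizontal stroke in the example, the outline is the rectangle from y = -1 to y = 1 over the full length of the stroke.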

The same can be done with multiple straight pieces of a stroke, and then the resulting outlines can be connected into one outline. However, inner points at a bend need to be adjusted to avoid a self-intersecting outline. This can be done by finding the intersection point and using it instead. At the outside of corners, we could just connect the outline points if a clipped corner is OK. But we can also intersect those outer lines to find a better point to avoid the clipped corner.
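The corner adjustment is a plain line-line intersection. In the Python sketch below, each leg of the bend is shifted sideways by the same distance, and the two shifted lines are intersected; a positive distance gives the point on the left of the direction of travel, a negative one the point on the right. The names `offset_line`, `intersect`, and `corner_point` are hypothetical.

```python
import math


def offset_line(p, q, dist):
    """Line p -> q shifted by dist to the left; returns (point, direction)."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    length = math.hypot(dx, dy)
    return (p[0] - dy / length * dist, p[1] + dx / length * dist), (dx, dy)


def intersect(p1, d1, p2, d2):
    """Intersection of the lines p1 + s*d1 and p2 + t*d2, or None if parallel."""
    det = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(det) < 1e-12:
        return None
    s = ((p2[0] - p1[0]) * d2[1] - (p2[1] - p1[1]) * d2[0]) / det
    return (p1[0] + s * d1[0], p1[1] + s * d1[1])


def corner_point(a, b, c, dist):
    """Outline point at the bend b of the stroke a -> b -> c."""
    p1, d1 = offset_line(a, b, dist)
    p2, d2 = offset_line(b, c, dist)
    hit = intersect(p1, d1, p2, d2)
    return p2 if hit is None else hit  # straight-through: just the offset point


inner = corner_point((0, 0), (10, 0), (10, 10), 1)   # close to (9, 1)
outer = corner_point((0, 0), (10, 0), (10, 10), -1)  # close to (11, -1)
```

The same function serves both sides of the bend, so the inner point avoids the self-intersection and the outer point avoids the clipped corner.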

Computing exact outlines for the curved strokes is difficult. So we won't try, but live with approximations. Luckily, computing good approximations is easy, particularly for angles that are not too pointy. And 90° is still good. Approximations can simply be computed by following exactly the same approach as described before, and just handling the auxiliary points like normal outline points for straight lines.

With this method, we can scale the offset from the stroke points to adjust the line width in X and Y direction. E.g. to make horizontal lines thinner and vertical ones thicker, we just scale the Y offsets down a bit (and maybe the X offsets up).
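As a small illustration in Python, the perpendicular offset can be scaled per axis before it is added to the stroke points. The scale factors `sx` and `sy` and the function name are hypothetical parameters for this sketch, not the names used by the generator.

```python
import math


def scaled_offset(p, q, width, sx=1.0, sy=1.0):
    """Half-width offset vector for the stroke p -> q, scaled separately in X and Y."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    length = math.hypot(dx, dy)
    return (-dy / length * width / 2 * sx, dx / length * width / 2 * sy)


# A horizontal stroke is offset in Y only, so sy = 0.5 makes it thinner ...
print(scaled_offset((0, 0), (10, 0), 2, sy=0.5))
# ... while a vertical stroke is offset in X only and keeps its thickness.
print(scaled_offset((0, 0), (0, 10), 2, sy=0.5))
```

Diagonal or curved parts get a mix of both factors, which gives a gradual change of thickness along a rounded corner.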

So here are a few enlarged glyphs that are constructed in the described way, with different rendering parameters for weight and also stretch. For more fun, weight can be adjusted separately for the X and Y axes.

2.3.3 Serifs

Serifs are fun, so despite promising I wouldn't touch that topic, here's a section about serifs.

Serifs are for decorating letters, and because decoration is a matter of taste, and there are many different tastes, even simple kinds of serifs can be parameterized in many ways. The mechanism to add serifs to Ukaliq glyphs has about as many parameters as, or even more than, the computation of the basic letter shape.

There are many kinds of serifs, but I'd like to concentrate on 'transitional serifs', which is a style used for many Latin, Greek, Cyrillic, and Armenian fonts (LGCA). It is called 'transitional', because it is the transition from old style to modern. It is used in Times New Roman, Baskerville, Palatino, and many other fonts. Old style serifs are more fancy and rounded, and modern serifs are simpler and based on straight lines. There are many more styles, and other writing systems have even more styles, but I feel like the shapes of Ukaliq are most similar to LGCA, so those scripts' serifs may fit well. The old style serifs require more artistic attention and the modern ones look too much like poster style to me. The transitional ones are clean and nice and can be applied by a script. Transitional serifs are based on the following decorative element, which replaces the straight end of a line.

Modern serifs can easily be derived from these by removing the curve.

The left and right part are the same, so it is easy to apply only half a serif. Also, the serifs can be applied in all directions with the same shape: bottom, top, left, right. When only one half is applied, it is often allowed to overshoot a bit so that the end is not straight, particularly for left and right ends on top or bottom horizontal lines.

The final correction step shows how the auxiliary point of the curve needs to be placed to look good: here, the intersection point of the stroke line and of the shifted foot line is used, so that the curve's auxiliary point is shifted a bit to the left. Similar correction of the auxiliary point needs to be applied when the stroke does not come in perpendicularly to the foot line, e.g., in an 'A' or 'Λ'.

Similar to overshoot, undershoot is also often applied, particularly to the top of vertical lines of lower case letters with ascenders, like 'h'.

In sharp corners, half serifs can also be applied, e.g., in 'B', 'E', 'Γ' or 'Π'.

Usually in LGCA fonts, full serifs are used at vertical bottom ends, at upper case vertical top ends, and at horizontal left and right mid ends. I.e., the letters 'H' and 'Θ' have full serifs. Half serifs are used in other ends or corners, so 'B' has half serifs on the left, 'E' at all top and bottom ends and corners, and 'n' and 'u' on the vertical top mid ends.

A lot of things can be tweaked with serifs, like making the serif foot a bit concave, and traditional serifs will also round out the shape of the serif itself. But Ukaliq does not do that, because it is too artistic. Everything that is shown here is applied automatically to the Ukaliq font outlines computed before. Here are a few serif test glyphs:

Ukaliq Serif applies transitional serifs based on upper case LGCA letters, but the top ends are treated like those of lower case letters, i.e., only half serifs are used, towards the left side only. This is because Ukaliq has no upper/lower case distinction, so Ukaliq letters should be more like lower case in usage, because lower case is the default, unmarked, frequent letter shape. They must not be overdecorated, otherwise everything would look like shouting in upper case.

For the same reason, Ukaliq has no corner serifs, i.e., for the lower left corner, the 'b' style is preferred over the 'B' style. Latin usually does have some corner serifs on small letters, but Ukaliq does not, also because only the lower left corner is really angular in the style used here.

Symbols and punctuation marks have no serifs in Ukaliq, but digits do.

2.3.4 Font File Formats

SVG allows filled paths to overlap, but font formats usually do not. With the construction described so far, overlap can still happen for self-junction strokes. And for cyclic strokes, it would be necessary to disconnect the inner from the outer outline for most font formats, in order to 'subtract' the inner part of the loop instead of drawing the stroke. Doing all this is more complicated than what this section showed, and my script does not bother either, because FontForge can import files in SVG format and do all the corrections that we need. So a FontForge script is used to rectify these things and generate the final font files.

Here's a systematic weight vs. stretch table of the same glyph.

OpenType fonts could, theoretically, encode such stretch and weight settings parametrically, i.e., a single font file could be used to print with any stretch and weight setting. The Ukaliq fonts, however, do not support that, because encoding such OpenType fonts is a major additional effort compared with just rendering glyphs and making a font with FontForge.

2.4 Technical Realisation

Everything here is generated by a script written in Perl. It started as a Perl script, and then I never needed anything else for generating the font and documentation. It is a large Perl script now.

First, the script analyses the 7-segment patterns, following the algorithm described above: decompose, find strokes, compute outlines. The SVG graphics that illustrate the parameters etc. are all generated by the same script, by setting the parameters and then invoking the algorithm to generate SVG images.

Also, all the Unicode files are generated by the script. The script needed the Unicode properties internally anyway, so I just wrote them out in the proper format. As an input, the original Unicode data files are read. This is done to make characters behave like corresponding characters that already exist in Unicode. Unicode has so many properties that copying them from a template and then adjusting them is much easier than starting from a blank page.
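The template idea can be illustrated with one line of UnicodeData.txt, which has 15 semicolon-separated fields per character. The Python sketch below copies all properties of an existing character and only replaces the code point, the name, and the case mappings (Ukaliq has no case). The code point `F1D00`, the name `UKALIQ LETTER B`, and the function name `derive_entry` are assumptions made up for this example, not the values used by the actual script.

```python
FIELDS = 15  # semicolon-separated fields in one UnicodeData.txt line


def derive_entry(template_line, code_point, name):
    """Copy all properties from a template character, then adjust a few."""
    fields = template_line.strip().split(";")
    if len(fields) != FIELDS:
        raise ValueError("not a UnicodeData.txt line")
    fields[0] = f"{code_point:04X}"  # code point in upper case hex
    fields[1] = name                 # character name
    fields[12] = fields[13] = fields[14] = ""  # drop upper/lower/title mappings
    return ";".join(fields)


# Template: the UnicodeData.txt entry for Latin small letter b.
template = "0062;LATIN SMALL LETTER B;Ll;0;L;;;;;N;;;0042;;0042"
print(derive_entry(template, 0xF1D00, "UKALIQ LETTER B"))
# F1D00;UKALIQ LETTER B;Ll;0;L;;;;;N;;;;;
```

Other properties, such as the general category, could be adjusted in the same way where a new character should differ from its template.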

All the HTML output is also generated, so that I can write HTML marked-up text intermixed with generated SVGs. And the table-of-contents can then be generated automatically. Also, transliterated text samples were converted on-the-fly to the proper Ukaliq Unicode (or SVG) output into the HTML document. Unfortunately, the resulting HTML is quite large, due to all the embedded SVGs.

The script also generates the font. Because I chose SVG as the image format, for simplicity, an SVG font is generated. However, the script cannot do all the magic that is needed, so a FontForge script is used to post-process the raw SVG font to produce a usable font in .ttf and also .svg format. FontForge fixes overlapping outlines, gets the outline vertex order right, does the final rounding, and generates the TTF format.

3 Suggested Script Usage

The Ukaliq script is meant to be a feasible script for any human language. There will be, as there will always be, exceptions that won't work very well, but that's the problem with anything that uses the 'universal' label for a highly diverse problem. Ukaliq provides a lot of consonant and vowel letters, and in this section, an overview is provided to show how I could imagine these letters could be applied to make a writing system for a human language.

3.1 Letters

For consonants, eight points of articulation are distinguished, which should be enough for most languages. Bilabial/labiodental are merged, dental/alveolar are merged, post-alveolar/palatal/alveolo-palatal are merged, and pharyngeal/epiglottal are merged. If languages do need a phonemic contrast, the dot diacritic may be used (the dot should then preferably mark: bilabial, dental, post-alveolar or alveolo-palatal, epiglottal). Typical sibilants and affricates are additionally available, because they commonly are distinguished despite fewer contrasts in plosives. (If velar and uvular affricates are distinguished phonemically, the dot should mark the uvular one.)

3.1.1 Obstruents

The following table lists the plosives, fricatives, affricates, and sibilants. The letters often use common stroke patterns for the groups and for the weak/strong contrast:

Columns: plosive (weak, strong), fricative (weak, strong), affricate (weak, strong), sibilant (weak, strong)
Column patterns: 󱵰 󱵸 󱵠 󱵨 󱴐 󱴘 󱴀 󱴈
Row patterns:
labial: 󱴃
dental/alveolar: 󱴇
retroflex: 󱴄
palatal/postalveolar: 󱴆
velar: 󱴅
uvular: 󱴂
pharyngeal/epiglottal: 󱴀
glottal: 󱴁

Entries that are missing are rare, very rare, unpronounceable, or usually not phonemically distinct from another entry. The following consonants share a character:

Ukaliq
Designation | w ɰ | ʃ ɕ | tʃ tɕ | ʒ ʑ | dʒ dʑ | kx qχ | ʕ ɦ

Depending on the language, b’ vs. p’, labelled 'weak' vs. 'strong', may be voiced vs. unvoiced, lax vs. tense, or express a contrast in aspiration or glottalisation. The same holds for the two fricative series. If only one phonemic distinction is needed, then the weak series should be preferred. If a language has more than two plosive series, then the fricatives may be used instead, e.g., for aspirated plosives, and the 'weak' vs. 'strong' of b’, v’ vs. p’, f’ may encode a different distinction. E.g., in Korean, b’ may be unaspirated non-glottalised, p’ may be unvoiced glottalised, and f’ may be unvoiced aspirated. Typical fricatives (sibilants) in languages with more plosives but fewer fricatives have been kept separate: ​​​z s ʒ ʃ’ are separate letters (the fricative series have ​​​ð θ ʝ ç’). The following is a proposed mapping for a few languages (the header row also gives the transliteration for reference -- this is not the exact IPA pronunciation in those languages). The 'old spelling' is often given in a help text pop-up.

Ukaliq
Designationbpvfdtðθgkɣx
Italianbpvfdtgk
Germanbvfdgx
Ancient Greekbpdtgk
Modern Greekbpvfdtðθgkɣx
Hindi/​Urdubpdtgk
Koreanb pd tg k

This proposes indeed to change spelling when sound shifts are complete, like from Ancient to Modern Greek: the letters ​b v’ and ​d ð’ and ​g ɣ’ in Ukaliq are similar, so this would even work, in my opinion, to handle dialects. One aspect of complexity of using the Latin script is historic spelling, so misfit letters should only be used if there is very good reason, e.g., to unify dialects into a common spelling that would otherwise be difficult to teach, learn, or use (e.g., like in Faroese).

Many languages have separate sibilants, additional to a large number of plosives. Therefore, Ukaliq has additional sibilants: ​​​s z ʃ ʒ’ that can be used.

There are also separate affricates for all sibilants and for a few others: ​​​​​​​​ts dz tʃ dʒ dʐ tʂ kx pf tɬ’. Often, these can also stand in for another series of 'plosives'. In many languages, the postalveolar or alveolo-palatal affricates are related to, or derived from, palatal plosives. If many affricates are needed, but no palatal plosives, these may be used, e.g., in Hindi.

In Korean, there is the complication that it has an additional point of articulation that is an affricate, which acts pretty much like the plosive series for other points of articulation. The plain 'plosive' of this series is /tɕ/. It is differentiated in three manners of articulation, just like the other plosive series. But the Ukaliq script has only two affricate letters for that point of articulation. In this case, the palatal series of plosives/fricatives could be used, but it may be better, because it is closer to the actual pronunciation, to use ​​dz dʒ tʃ’ for /tɕ tɕ͈ tɕʰ/. This is also because there is a variant /ts ts͈/ pronunciation of /tɕ tɕ͈/.
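
The Korean proposal can be written down as a small lookup table. This is only a sketch: the labial row and the affricate row are taken from the text above, while extending the same weak/strong/fricative pattern to the dental and velar stops is an assumption. The names `KOREAN_TO_UKALIQ` and `to_ukaliq` and the input format (a list of IPA phoneme strings) are made up for illustration, and the output is in the transliteration used on this page, not in Ukaliq code points.

```python
# Korean consonants (IPA) -> Ukaliq transliteration.
# "\u0348" is the IPA diacritic used here for the tense (glottalised) series.
KOREAN_TO_UKALIQ = {
    # labials, as proposed in the text
    "p": "b", "p\u0348": "p", "pʰ": "f",
    # dental and velar stops: ASSUMED to follow the same pattern
    "t": "d", "t\u0348": "t", "tʰ": "θ",
    "k": "g", "k\u0348": "k", "kʰ": "x",
    # affricate series, as proposed in the text
    "tɕ": "dz", "tɕ\u0348": "dʒ", "tɕʰ": "tʃ",
}


def to_ukaliq(phonemes):
    """Transliterate a list of IPA phonemes; unknown phonemes pass through unchanged."""
    return "".join(KOREAN_TO_UKALIQ.get(p, p) for p in phonemes)
```

For example, `to_ukaliq(["tɕʰ", "a"])` gives `tʃa`. A real orthography would also need the vowels and the remaining consonants.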

The following table gives an overview of how languages could use the sibilants, affricates, and the palatal plosive/fricative series, and also the retroflex fricatives:

Ukaliq
Designationdztsɟ cʝ çzsʒʃjʐʂ
Hindi/​Urdudʒʱtʃʰzsʒʃjʂ
Koreants tɕts͈ tɕ͈tsʰ tɕʰsj
Germantszsʒʃj
Italiandztszsʒʃj
Polishdztszsʑɕjʐʂ

For Italian, it may be sensible not to distinguish /dz ts/ and use dz’ for both, and the same for /z s/ z’, because the phonemic difference is either in the gemination, which is marked by doubling, or in phonological context (beginning of word/stem vs. intervocalic), or dialectal variation. The pairs are not distinguished in Latin spelling either.

For Hindi/Urdu, /ʂ ʒ ʃ/ may not be phonemic, so they may not be needed as letters.

Languages may have, despite a large inventory of consonants, more fricatives. E.g., Hindi/Urdu has four series of plosives, but also /f ʋ s z/. For sibilants, there is no problem, as Ukaliq has separate letters. For /ʋ/, w’ may be used if there is no /w/ phoneme. For /f/, the dot diacritic should be used, i.e., the undotted letter should be the aspirated plosive, e.g., Hindi/Urdu /f ʋ pʰ/ should be mapped to ​​f̱ v f’.

3.1.2 Sonorants

For liquids, flaps, trills, approximants, nasals, and generally rhotics, there are the following letters. Again, most letters, but not all, follow a general stroke pattern for the point of articulation and type of consonant.

Patternnasalliquidapprox/​flaptrill
Pattern󱴸󱴰󱵀󱵈
labial󱴃
alveolar󱴇
retroflex󱴄
palatal󱴆
velar󱴅
uvular󱴂

There are usually fewer distinctions in a single language than for plosives, fricatives, or affricates. The distinction between ​ɾ r’ is frequent enough to have different letters.

If a language has a single phoneme for /l ɾ/ (Korean or North Greenlandic), the letter l’ or ɾ’ should be used for that phoneme, depending on what is the perceived default pronunciation. The default rhotic should be r’, particularly if there is dialectal variation on actual pronunciation. The letter ɾ’ should be used if there is a phonemic contrast with /r/, or if there is really no dialect that has a pronunciation like /r ɾ ɻ ɹ ʁ ʀ/.

Ukaliq's set of nasals should be enough for most languages. However, Dravidian often distinguishes dental vs. alveolar and also has retroflex nasals. Malayalam has six phonemic nasals, and the dental vs. alveolar are distinguished also for the plosives and fricatives. Here, the dot diacritic should be used to mark the dental consonants, e.g., using ​​ṉ ḏ ṯ’ for /n̪ d̪ t̪/.

3.1.3 Vowels

Vowels come in two main series, primary and secondary, which distinguishes the default roundedness, with primary the more common. There are three heights, front-center-back, and three additional common vowels.

frontcenterback
Pattern1st2ndPattern1st2ndPattern1st2nd
Pattern󱴨󱴠󱴨󱴠󱴨󱴠
high󱴆󱴂󱴃
mid󱴄󱴁
low󱴅󱴇
common

High back vowels use the same pattern as labial consonants (󱴃{CF}’), so that ​f u’ and ​w u’ relate. High front vowels use the same pattern as palatal consonants (󱴆{EF}’), so ​ç i’ relate, and although j’ does not use the standard palatal pattern (it is an exceptional letter), ​j i’ look similar in the same way as ​w u’.

Except for a’, all vowels are marked with either primary or secondary pattern (and a’ uses the primary pattern shifted down because it is lower than æ’). All vowels except the common ones use three segments to indicate their position.

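[Diagram: segment pattern with segments labelled 'back', 'front', and 'high'.] The 'high' and 'mid' rows use bit-wise pattern selection.
[Diagram: segment pattern with a segment labelled 'back'.] The 'low' row uses this pattern selection.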

Vowels should be taken from the primary series first, unless it is really inappropriate. Inappropriate would mean that rounding is clearly not correct, e.g., in Japanese, ​​​​a e i o ɯ’ could be used. However, for the sake of cross-linguistic consistency, ​​​​a e i o u’ could also be used for Japanese.

For a typical two-height three vowel system, ​​a i u’ could be used even if more allophones exist (Greenlandic, Inuktitut, Classical Arabic), and a three-height three vowels system could use ​​ɨ ə a’ (Adyghe).

For a typical three-height five vowel system, ​​​​a e i o u’ could be used (Spanish, Basque, (Japanese)), plus additions as necessary, e.g., ​ɨ ə’ (Romanian).

For a typical four-height seven vowel system, ​​​​​​a e ɛ i o ɔ u’ could be used (Italian, Portuguese, Western Catalan), plus additions as necessary, e.g., ə’ (Eastern Catalan).

Generally, if no phonemic distinction is needed, ​​i u y’ should be preferred over ​​e o ø’ (for a two height vowel system), i.e., the 'high' row should be preferred over the 'mid' row. And ​​e o ø’ should be preferred over ​​ɛ ɔ œ’ (for a three height vowel system), i.e., the 'mid' row should be preferred over the 'low' row. E.g., Nahuatl has four vowels: ​​i e a’ and then what could be either o’ or u’ -- generally u’ would be preferable, unless it is really misleading.

E.g., in German, the vowel system is generally three-height (/iy eø a/ and /u o a/), only for long front vowels, /ɛ:/ is additionally distinguished as ɛ’. So in general, e’ should be used also for short /ɛ/ in German.

For a four or five height vowel system, the dot diacritic may be necessary to distinguish even more vowels. Although it is likely that some vowels are triggered only in certain phonological context, so the dot may still not be necessary. (E.g., for Danish, I honestly don't know.)

3.1.4 Miscellaneous

Length (both vowel and consonant) should usually be marked by double letters. If it makes more sense morphophonemically, the diacritic dot could also be used.

The dot diacritic can also be used to distinguish a variant pronunciation, usually a different phone for the same phoneme, or the change of a phoneme by another phoneme that may have dropped, e.g., in Greenlandic, it can be used on ​​a i u’ before uvulars, to mark the vowel change, as the triggering uvular is not pronounced: ​ukali̱q u̱qa̱ppu̱q’ or u̱ssu̱q’. In Inuktitut, on the other hand, no diacritic is generally needed: ​ukaliq uqaqtuq’ and ursuq’.

The dot can also be used to distinguish points of articulation that have been merged, e.g., to distinguish dental vs. alveolar in Malayalam.

In the following suggested usages for the dot, usually a normal letter is converted to a modifier letter using the dot. In all cases, the dot should only be used if needed, i.e., only if there is a phonemic or morphological reason to distinguish the plain from the modifier letter. E.g., for aspiration or voicelessness, ’ may be used, or preferably just h’, as in nẖ’ or nh’ for /n̥/.

Also, the modifier letter approach should only be used if the pronunciation needs to be marked explicitly. If the phonological structure makes it clear, no additional marking is needed.

The modifier letter usage can be used to mark co-articulation (e.g. 'gb', 'kp', 'Nm'), by putting the dot on the second consonant: gḇa

Another usage of the modifier letter is to mark a diphthong. Generally, for distinguishing a joint phoneme where it is not immediately clear which part is the modifier (e.g., in diphthongs), the 'weaker' phoneme (e.g., the non-syllabic one) should carry the dot diacritic: i̱a’ for /i̯a/ and au̱’ for /au̯/.

Voice or breathy voice can be marked with a glottal voiced fricative ʕ’, e.g., fʕ̱’. Note that the same letter is also used for a pharyngeal voiced fricative, which is why pharyngealisation is not marked with this letter.

Pharyngealisation can be marked with a pharyngeal voiceless fricative ħ’, e.g., dħ̱’.

Creaky voice or strident voice can be marked with an epiglottal plosive ʡ’, e.g., aʡ̱’.

Ejectives or glottalised consonants can be marked with a glottal stop: tsʔ̱’.

Palatalisation can be marked as consonant + ’, e.g., rj̱’.

Velarisation can be marked with consonant + ɣ̱’, e.g., nɣ̱’.

Labialisation can be marked with consonant + ’ e.g., ʃw̱’.

Syllabic pronunciation can be marked with ə̱’ + consonant, e.g., ə̱n’.

Nasalisation can be marked with vowel + nasal, ’ by default, e.g., ɔṉ’. Other nasals may also be used here, if the language structure is that way (e.g., if a dialectal non-nasal pronunciation uses another nasal consonant).

Trills can be written by using a plosive base letter and adding r’ as a modifier letter, e.g., bṟ’.

Implosives (and other ingressives) can be marked with the ingressive modifier letter after the base letter: ’. This letter most likely needs no dot diacritic at all, because it is mainly meant to be a modifier letter.

The same holds for clicks, which can be marked with the click letter after the base letter that determines the point of articulation: ’ for voiced /ʘ/. Clicks will often need more letters to specify secondary articulation and/or release.
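
The modifier letter suggestions above can be collected into one table. The following is a minimal sketch that works on the transliteration only: it assumes that the dot diacritic is shown as a combining macron below (U+0331), as it appears in the examples on this page, and the feature names and the function `mark` are hypothetical. The ingressive and click modifier letters are left out.

```python
import unicodedata

DOT = "\u0331"  # ASSUMPTION: the dot diacritic, as rendered in the transliteration

# feature -> modifier letter that follows the base letter and carries the dot
SUFFIX_MODIFIERS = {
    "breathy": "ʕ",
    "pharyngealised": "ħ",
    "creaky": "ʡ",
    "ejective": "ʔ",
    "palatalised": "j",
    "velarised": "ɣ",
    "labialised": "w",
    "nasalised": "n",  # the default nasal; other nasals may be used
    "trilled": "r",
}


def mark(base, feature):
    """Return base letter(s) plus the dotted modifier letter for the given feature."""
    if feature == "syllabic":
        # the only modifier in this list that precedes the base letter
        marked = "ə" + DOT + base
    else:
        marked = base + SUFFIX_MODIFIERS[feature] + DOT
    return unicodedata.normalize("NFC", marked)
```

With this table, `mark("ts", "ejective")` gives `tsʔ̱` and `mark("n", "syllabic")` gives `ə̱n`. As the text says, such marking should only be applied where the phonological structure does not already make the pronunciation clear.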

Tone is marked with (undotted) digits after the vowel, preferably just enumerating the tones and ignoring the glyph shape, i.e., without trying to match the glyph shape with the tone contour. This is because there is usually dialectal variation of the tone contour anyway. E.g., in Mandarin, tones could be marked ​​​1 2 3 4’, like it is often done in ASCII when no diacritic marks are available (dialects with unstressed tone ('no tone' or 'fifth tone') may leave out the digit modifier), e.g., ʃwei3’.

Stress can be marked with a dot on a vowel, if there is only one phonemic kind of accent. If there are multiple stress types, numbers are used after the vowel, like for a tone.

Some languages may use the dot for a combination of stress and length (e.g. German), particularly if the length is not obvious in unstressed syllables in all cases.

There is a dedicated chapter on how to use the Ukaliq script for selected languages.

Numbers are written in a special way, see section 'Numbers', where for the base, a dotted digit is used. The dotted digits are reserved for bases, so the dot should not be used for anything else on digits.

3.2 Numbers

Numbers are a way to represent a numeric value in a script. Numbers are distinguished from sequences of digits: numbers primarily represent a numeric value, while sequences of digits do not primarily represent a numeric value, but sequences are usually used for identification or comparison (of lexicographic equality). Sequences of digits may be preceded by #’, just like in 'room #1234': rum#1234’. The rest of this section is about writing numbers.

One goal of Ukaliq number notation is to avoid confusion among Western (based on 1000^n: thousand, million, billion, ...), Chinese (based on 10000^n), and Indian (mixed) and other base-10 systems, where telling large numbers may require shifting zeros to find the right 'number word'. The following characters are used to represent positive numbers in Ukaliq script. It is a sequence of these characters that constitutes a number token in a programming language.

Unicode | Name | Translit. | Description
U+EE50 | DIGIT ZERO | 0 |
U+EE51 | DIGIT ONE | 1 |
U+EE52 | DIGIT TWO | 2 |
U+EE53 | DIGIT THREE | 3 |
U+EE54 | DIGIT FOUR | 4 |
U+EE55 | DIGIT FIVE | 5 |
U+EE56 | DIGIT SIX | 6 |
U+EE57 | DIGIT SEVEN | 7 |
U+EE58 | DIGIT EIGHT | 8 |
U+EE59 | DIGIT NINE | 9 |
U+EE5A | DIGIT TEN | |
U+EE5B | DIGIT ELEVEN | |
U+EE5C | DIGIT TWELVE | |
U+EE5D | DIGIT THIRTEEN | |
U+EE5E | DIGIT FOURTEEN | |
U+EE5F | DIGIT FIFTEEN | |
U+EED0 | BASE SIXTEEN | |
U+EED1 | BASE NEGATIVE TWO | |
U+EED2 | BASE TWO | |
U+EED3 | BASE THREE | |
U+EED4 | BASE FOUR | |
U+EED5 | BASE FIVE | |
U+EED6 | BASE SIX | |
U+EED7 | BASE NEGATIVE EIGHT | |
U+EED8 | BASE EIGHT | |
U+EED9 | BASE NEGATIVE TEN | |
U+EEDA | BASE TEN | |
U+EEDB | BASE NEGATIVE THREE | |
U+EEDC | BASE NEGATIVE FOUR | |
U+EEDD | BASE NEGATIVE FIVE | |
U+EEDE | BASE NEGATIVE SIX | |
U+EEDF | BASE NEGATIVE SIXTEEN | |
U+EE70 | ITERATION MARK | | marks digit repetition
U+EE05 | COMMA | , | separates integer from fractional part
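
For software that has to recognise number tokens, the table above can be turned into a lookup. The following is a minimal sketch using only the code points listed in the table; the names `DIGITS`, `BASES`, and `classify` are hypothetical, and representing a 'BASE NEGATIVE ...' entry by a negative integer is a convention of this sketch, not something the script defines.

```python
# Code points copied from the table above (private use area).
DIGITS = {0xEE50 + value: value for value in range(16)}

# Base numeral -> base. A negative value stands for a "BASE NEGATIVE ..." entry
# (ASSUMED convention of this sketch).
BASES = {
    0xEED0: 16, 0xEED1: -2, 0xEED2: 2, 0xEED3: 3,
    0xEED4: 4, 0xEED5: 5, 0xEED6: 6, 0xEED7: -8,
    0xEED8: 8, 0xEED9: -10, 0xEEDA: 10, 0xEEDB: -3,
    0xEEDC: -4, 0xEEDD: -5, 0xEEDE: -6, 0xEEDF: -16,
}

ITERATION_MARK = 0xEE70
COMMA = 0xEE05


def classify(ch):
    """Return (kind, value) for a single character of a number token."""
    cp = ord(ch)
    if cp in DIGITS:
        return ("digit", DIGITS[cp])
    if cp in BASES:
        return ("base", BASES[cp])
    if cp == ITERATION_MARK:
        return ("iteration", None)
    if cp == COMMA:
        return ("comma", None)
    return ("other", None)
```

In this table, every base numeral sits exactly 0x80 above a digit (e.g., U+EE52 DIGIT TWO and U+EED2 BASE TWO), which seems to match the idea of base numerals being dotted digits.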

The following are supportive symbols related to numbers, i.e., these are arithmetic and comparison operators.

Unicode | Name | Translit. | Description
U+EE42 | MINUS | | marks negative numbers and is used as an infix subtraction operator
U+EE30 | PLUS | + | redundant, but can mark positive numbers, and is used as an infix operator for addition
U+EE49 | DIVISION | ÷ | for writing fractions and for normal division
U+EE1A | MULTIPLICATION | × | multiplication operator
U+EE71 | PERCENT | % | modulo operator
U+EE0F | EQUALS | = | equality comparison operator
U+EE18 | LESS-THAN | < | less-than comparison operator
U+EE12 | GREATER-THAN | > | greater-than comparison operator

Numbers are constructed in the following way, using the above number constituent characters:

Number | Translit. | Comment
1024 | 1024 | Smaller numbers (maybe up to 5 digits) can be written by just listing the digits, just like in Latin. Confusion is avoided by forbidding the use of digit separators: if numbers get so large that there may be a wish for digit separators, then a different notation is used instead.
3 | ②11 | The Ukaliq number notation is based on base + exponent notation. There are base numerals for bases ​​​​​​​② ③ ④ ⑤ ⑥ ⑧ ⑩ ⑯’, derived by adding a dot diacritic to a digit. Without a base numeral, digits are interpreted as decimal.
20000 | 4⑩2 | The exponent is prefixed to the base numeral. Numerals for blocks of digits, e.g., a thousand, a million, do not exist.
1000 | 3⑩ | The number of digits for such digit group words is different in different languages, so Ukaliq numbers avoid the confusion. Speakers just associate 'thousand' with 3,
1.000.000 | 6⑩ | ... and a 'million' with 6,
10000 | 4⑩ | ... and '万' (wàn, 'ten thousand', in Chinese) with 4, etc., and the numbers are still comprehensible without shifting digits.
100000 | 5⑩ | So a 5⑩’ is a 'hundred thousand' in one language and 'ten 万' in another language.
2300000 | 6⑩23 | The digits (also called 'mantissa') follow the base numeral from highest to lowest unit. Trailing zero digits can be omitted.
10000 | 4⑩ | When using a base numeral, the mantissa may be empty and then defaults to '1'. I.e., 1000 is 3⑩’, which is equivalent to 3⑩1’.
138 | ⑯10Ā | Ukaliq provides the digits 0..15 so that even hexadecimal notation can be used: ​​​​​​​​​​​​​​​0 1 2 3 4 5 6 7 8 9 Ā B̄ C̄ D̄ Ē F̄’.
10 | | Single digit numbers without a base numeral may exceed 10, because the interpretation is clear.
error | 1Ā | For sequences longer than 1 digit and with digits larger than 9, a base numeral is needed, because these are neither decimal digit sequences, nor single digits with a trivial interpretation.
3 | ②11 | After a given base numeral, no digit larger than or equal to the base may be used.
2048 | 11② | Before a base digit, there is essentially an Ukaliq number again, so the default base for the exponent of a base numeral is again 10. In effect, a number parser never needs to look backwards for interpreting a sequence of digits: the base is always specified before a digit.
8 | ②11② | Another base numeral may be prefixed for the exponent, but it probably confuses readers, because exponents are seldom specified in anything but decimal notation.
100 | ⑩100 | When using a base numeral, the prefixed exponent is optional and defaults to the number of digits in the mantissa minus one, but the exponent is at least 1.
20 | ⑩2 | An alternative minimum for the exponent would be 0, but then ’ and ’ and all bases on their own would all have value 1, which is not useful. Also, with a minimum exponent of 1, values like 20, 30, 40 are shorter and need no initial '1', e.g., ⑩2’ instead of the equivalent, but longer, 1⑩2’.
512 | ⑯200 | Note that the point of the Ukaliq number notation is to actually specify a number for larger exponents so that no counting of digits is necessary. However, another use of the base numeral is to specify the numeric base of the number, so, e.g., small hexadecimal numbers can be given by just prefixing a base numeral ’, without an exponent: ⑯10’. This is the main reason for the default value for the exponent.
32 | ⑯20 |
32 | ⑯2 | This is not 2, because the minimum implied exponent is 1.
2 | 2 | For single digit hexadecimal numbers, just use no base numeral.
30.2 | ⑩30,2 | For fractions of 1, an UKALIQ COMMA ,’ is inserted, and the fractional digits follow. The comma is mandatory -- digits must not be given past the unit 1 digit, i.e., 30.2 is ⑩30,2’ (and NOT ⑩302’).
error | 1⑩302 | The comma must be used after the unit 1 digit, and it must be exactly after the unit 1 digit, not before, otherwise, the number is ill-formed.
30 | 1⑩30, | This can be used as a safety check when writing numbers: if the exponent is specified and the comma is given, then the number is only well-formed if the mantissa has exactly the right length. I.e., a missing digit or a wrong exponent would be noticed.
70200 | 4⑩70200, | This form is meant to be given on accounts and checks and bills, so that the numbers can be aligned at the comma, just like when listing only the digits.
350 | 2⑩350, | This way, larger amounts (with more digits) stand out, and the number specification is absolutely clear: visually, all digits are given, so alignment works well, and there is an additional safeguard that specifies the magnitude (=exponent).
3.000.000.000.070 | 12⑩300000000007 | Zero digits may also appear in the mantissa.
3.000.000.000.070 | 12⑩30⋮70 | For large numbers with many zeros, sequences of zero may be replaced by a single zero and an iteration mark ’ to fill the mantissa with zeros.
2888888888 | 9⑩28⋮ | The iteration mark formally fills all remaining digits with the digit given directly before. It stops at the unit digit. I.e., without a comma, this is an integer, i.e., the digit iteration does not continue into the fraction.
28888.75 | 5⑩28⋮,75 | After the integer, a fraction may follow.
0.1 | 0,1 | Simple fractions, e.g., those not requiring a string of zeros after the comma, are written by putting the comma in the string of digits, just like in Latin script (in some languages).
0.22222.... | 0,2⋮ | In fractions, the iteration mark repeats the preceding digit indefinitely.
0.2343434343.... | 0,2⋮34⋮ | To repeat multiple digits in a fraction, the iteration mark is used before and after the repeated part. This usage is usually not used in the non-fractional part of a number.
3333333.333... | 6⑩3⋮,⋮ | If an iteration mark is used in the non-fractional part, it can be used again without a new digit after the comma and then continues repeating the same digit as before the comma.
error | ,25 | A number cannot start with a comma. This is because a comma is also used for sentences, so there may be confusion: ,25’ could be comma and then the number 25 instead of 0.25.
270 | 270, | A comma may end a number, but that does not cause confusion in sentences about the unit of any digit in that number: the value is the same, even if the comma is parsed as part of the surrounding sentence.
3000000000000 | 12⑩3 | If the exponent of the base is larger than one digit, it may be written with multiple digits if it is small (just like a normal small number can be written that way). Technically, the exponent is just another Ukaliq number prefixed to the base numeral.
3000000000000 | ⑯C̄⑩3 | It may be weird (and not helpful), but the exponent can be specified in a different base.
4*10^100 | 2⑩⑩4 | In any case, larger numbers can use base numeral notation in the exponent, too.
16^16 | ⑯⑯ | To avoid confusion, it is customary not to use non-decimal bases in exponents, because humans usually don't need that.
16^16 | 16⑯ | As mentioned before, decimal is the default also for the exponent.
-1000 | −3⑩ | Negative numbers are prefixed with a minus sign. The minus sign is otherwise also used as an infix operator, just like in Latin script.
0.0002 | 3❿2 | For smaller fractions, to avoid strings of zeroes following the comma, a negative exponent is available by means of the negative base numerals: ​​​​​​​❷ ❸ ❹ ❺ ❻ ❽ ❿ ⓰’. Negative base numerals are equivalent to the corresponding normal base numerals, but the exponent is taken to be negative. So 3⑩’ is 1000, but 3❿’ is 1/1000 = 10^-3.
0.0002 | (−3)⑩2 | The exponent of a number cannot simply be prefixed with a minus, because that would make the whole number negative. In scientific notation, however, this is exactly how negative exponents may be written (in parentheses). This is not part of the normal number notation, however.
0.571 | ❿571 | For fractional bases, the default exponent is always 1, and not dependent on the number of digits given. ’ without an exponent is, thus, an abbreviation of 0,’.
2/3 | ❸2 | The negative bases can also be used for writing simple fractions.
5/6 | ❻5 | This is mainly where all the weird base numerals are useful (a base 5 number is probably not very useful otherwise).
5/36 | 2❻5 | The number notation does allow larger weird fractional notations, though. However, the fractional base also sets the base number, so specifying '7/36' is not possible this way (with a decimal numerator).
7/36 | 7÷36 | For real fractions, the ÷’ is used just like the fraction slash '⁄' in Latin script.
-2/5 | −❺2 | A negative fraction
920^8000001 | (6⑩80⋮1)2⑩92 | In math notation, exponents are written in prefix notation in parentheses.
x^2 + 2xy + y^2 | (2)x+2xχ+(2)χ | The exponent can be written the same way with variables. The multiplication sign is mandatory in Ukaliq script after parentheses, because of the possible confusion with the exponent spelling. It is otherwise optional for math notation, however. Whitespace is optional, but advisable. Variable names need to be found, of course, from the set of Ukaliq letters -- I used x’ and χ’ in this case.
error | . | The full-stop is not used for numbers in Ukaliq script.
#20-3485-667 | #20_3485_667 | For sequences of digits, spaces or dashes may be inserted (wherever you want -- as this is not a numeric value, grouping may be part of a standard format for a given numeric ID). For numbers, however, no spaces or dashes or any other digit group separators are used.

3.3 Punctuation

The Ukaliq punctuation marks are similar to Latin and other scripts, but not equivalent. The chapter on numbers already explained how the comma is used in numbers (but not the full-stop/period). This section gives some more details.

Unicode | Name | Used For / Not Used For
U+EE04 | FULL STOP | The full-stop is used to end sentences. Just like in Latin, an ellipsis for indicating missing text also uses three full-stops. The full-stop is not used in numbers. Instead, the comma separates integer from fractional parts. And there is no digit separator in Ukaliq. The full-stop is also not used in ranges, neither numeric (for intervals, like in many programming languages: 1..10), nor in text, like 'Mon...Fri'. Instead, the range operator is used.
U+EE07 | EXCLAMATION MARK | The exclamation mark is used for terminating exclamatory sentences or phrases just like in Latin. In programming languages, the Ukaliq exclamation mark is not used for the 'NOT' operation. There is a dedicated symbol for that.
U+EE0E | QUESTION MARK | The question mark is used for terminating interrogative sentences or phrases just like in Latin.
U+EE0D | REVERSED EXCLAMATION MARK | The reversed exclamation mark is used for starting exclamatory sentences or phrases just like the inverted exclamation mark in Latin in some languages (e.g., Spanish). It is a reversed glyph (like the Arabic question mark), so it is a mirrored version of the normal exclamation mark. It is, therefore, marked as such in the Unicode mirrored glyph list.
U+EE0B | REVERSED QUESTION MARK | The reversed question mark is used for starting interrogative sentences or phrases just like the inverted question mark in Latin in some languages (e.g., Spanish). It is reversed, so it is a mirrored version of the normal question mark. It is, therefore, marked as such in the Unicode mirrored glyph list.
U+EE08 | LEFT SINGLE QUOTATION MARK | The left single quotation mark starts a quotation. There is no alternative here: this quotation mark cannot be used to end a quotation -- the script defines that this is the opening character, just like for parentheses. This is unlike the various top, bottom, left, right, reversed, inverted, normally oriented quotation marks in Unicode, which need language context to be used correctly.
U+EE02 | RIGHT SINGLE QUOTATION MARK | The right single quotation mark ends a quotation. There is no alternative here: this quotation mark cannot be used to start a quotation -- the script defines that this is the closing character, just like for parentheses.
U+EE0A | DOUBLE QUOTATION MARK | The double quotation mark starts or ends a quotation. This can also be used as an apostrophe. There is just no space for a separate symbol for that. There is no 'fancy' glyph for a left or right alternative: this glyph is symmetric and is used for both start and end quotation. This is not used as a ditto mark: use an isolated iteration mark (i.e., without a number around it) for that.
U+EE10 | HYPHEN | The hyphen is used inside words to separate parts of words. It is also used in programming languages in identifiers to separate words, much like an underscore. This is not used as a word separating dash. Use the dash for that instead. This is not used as a minus sign. Use the minus sign for that.
U+EE40 | DASH | The dash is used to separate words in sentences, e.g., for embedded comment phrases, etc. It may be used in programming languages for a line comment symbol, like Haskell's '--'. This may also be used as a bullet symbol in bullet lists. This is not used inside words as a hyphen. Use the hyphen for that. This is not used as a minus sign. Use the minus sign for that.
U+EE42 | MINUS | The minus sign is used for numeric purposes to indicate subtraction or negative numbers. It is also used inside of Ukaliq numbers after a base numeral to indicate a base fraction. This is not used inside words as a hyphen. Use the hyphen for that. This is not used as a word separating dash. Use the dash for that instead.
U+EE30 | PLUS | The plus sign is used for numeric purposes to indicate addition or make explicit positive numbers. It may also be used in listing multiple words in (informal or abbreviated) texts, instead of the word 'and'. This usage is only appropriate if there is no way to confuse this with mathematical addition. This character's usage is basically just like in Latin script. There is also the 'and' sign, which may be more appropriate in texts to replace the word 'and', particularly if two propositional sentences are connected. E.g., for 'Apples and oranges are delicious', the '+' may be used (but the 'and' sign may also be used), but in 'Cucumbers are green and apples are delicious.', the 'and' sign may be more appropriate than the '+'.
U+EE1A | MULTIPLICATION | The multiplication sign is used for numeric purposes to indicate multiplication.
U+EE49 | DIVISION | The division sign is used for numeric purposes to indicate division. This should not be used as a slash when listing alternatives. Use the or sign instead.
U+EE71 | PERCENT | The percent sign is used as a suffix operator for marking a percentage. In programming languages, it may also be used for the 'modulo' operation. Its usage is mostly equivalent to that of the Latin percent sign.
U+EE45 | AND | The and sign is used for logical and. It is used roughly like the '&' character in Latin script. As explained for the '+' sign, it may also replace the word 'and' in (informal or abbreviated) texts.
U+EE15 | OR | The or sign is used for logical or. It is sometimes used roughly like the slash character in Latin script for alternatives like 'Apples/Orange'. It may also replace the word 'or' in (informal or abbreviated) texts.
U+EE4A | XOR | The xor sign is used for logical xor. It may also replace the phrase 'or ... but not both' in (informal or abbreviated) texts.
U+EE48 | NOT | The not sign is used for logical negation. It may also replace the word 'not' in (informal or abbreviated) texts. This can also be used as an infix operator for 'and not', much like a minus can function as a prefix or an infix arithmetic operator.
U+EE0F | EQUALS | This is used just like a normal equals sign.
U+EE18 | LESS-THAN | This is used just like a less-than sign. It can also be used as a left arrow replacement if the display does not allow anything but Ukaliq.
U+EE12 | GREATER-THAN | This is used just like a greater-than sign. It can also be used as a right arrow replacement if the display does not allow anything but Ukaliq.
U+EE38 | DEGREE | The degree sign is used like in Latin to denote temperatures (but not for Kelvin) and angles.
U+EE31 | CURRENCY | This is prefixed to a number for specifying units of money, for any currency (for the nominal or largest unit, i.e., for dollars/euros/pounds/yuan, not for cents/pence/fen/...). Obviously, if context does not make it clear, then more information is needed to define exactly which currency this is. This can be an alphabetic abbreviation before the currency symbol, just like with 'AUS$400'.
U+EE01 | NUMBER | Prefixed to numbers, usually sequences of digits used as identification numbers, not as numeric values, like in 'room #1234'.

3.4 Bidirectional Text

The Ukaliq script is written left-to-right.

However, the Unicode properties for bi-directional text are filled in correctly in the given table files. This means that the Ukaliq script can be rendered right-to-left, even though it is not supposed to be.

Symbols of the Ukaliq script are not generally mirrored when written right-to-left, but just displayed in the opposite order. That is, except for those that define a bidi mirroring glyph via the Unicode table: those glyphs need to be swapped.

For right-to-left text to be rendered correctly, mirrorable characters must be encoded in the code point stream based on semantics, i.e., the 'right' (or non-reversed) characters (parenthesis, question mark, exclamation mark, and single quotation mark) are the closing characters even if they end up rendered on the left of the right-to-left text, and the 'left' characters are the opening marks. E.g., in code point order, the exclamation mark must still be at the end of a sentence.
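
The relation between code point order and right-to-left display can be sketched as follows. This is only an illustration, not a bidi implementation: it handles a single, purely right-to-left run, the function name is hypothetical, and the mirroring pairs are an assumption derived from the character names in this document (the authoritative pairs are in 'ukaliq_bidimirroring.txt').

```python
# Assumed mirroring pairs (opening/'left' <-> closing/'right'), derived from
# the character names; the real data lives in ukaliq_bidimirroring.txt.
MIRROR_PAIRS = {
    0xEE09: 0xEE06,  # LEFT PARENTHESIS <-> RIGHT PARENTHESIS
    0xEE0D: 0xEE07,  # REVERSED EXCLAMATION MARK <-> EXCLAMATION MARK
    0xEE0B: 0xEE0E,  # REVERSED QUESTION MARK <-> QUESTION MARK
    0xEE08: 0xEE02,  # LEFT SINGLE QUOTATION MARK <-> RIGHT SINGLE QUOTATION MARK
}
# Make the lookup symmetric so that it works in both directions.
MIRROR = {**MIRROR_PAIRS, **{v: k for k, v in MIRROR_PAIRS.items()}}


def visual_glyphs_rtl(logical: str) -> str:
    """Return the glyphs in left-to-right screen order for one RTL run.

    The logical (code point) order is reversed, and every mirrorable
    character is drawn with the glyph of its mirror partner.
    """
    return "".join(chr(MIRROR.get(ord(c), ord(c))) for c in reversed(logical))


# Logical order: ( a ) !   with the exclamation mark at the end of the sentence.
logical = "\uEE09\uEE41\uEE06\uEE07"
visual = visual_glyphs_rtl(logical)
```

In the result, the glyph for the sentence-final exclamation mark ends up leftmost on screen and is drawn with the reversed shape, while the text itself keeps the semantic (closing) character.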

3.5 Vertical Rendering

The Ukaliq script is natively rendered horizontally.

For vertical rendering, Ukaliq glyphs should be rotated 90°, because they are taller than wide, so unless rotated, they'd take up a lot of space vertically, and would leave space unused horizontally. However, this style has not been elaborated yet.

4 Unicode & Font

To easily extend an existing system that handles Unicode, Ukaliq provides many Unicode tables with script specific codepoint properties. Currently, Ukaliq is in the private use area.

ukaliq_codechart.html
ukaliq_unicodedata.txt
ukaliq_blocks.txt
ukaliq_scripts.txt
ukaliq_proplist.txt
ukaliq_derivedcoreproperties.txt
ukaliq_bidibrackets.txt
ukaliq_bidimirroring.txt
ukaliq_linebreak.txt
ukaliq_wordbreakproperty.txt
ukaliq_sentencebreakproperty.txt
ukaliq_propertyvaluealiases.txt
ukaliq_nameslist.txt

In addition to the standard Unicode files, some script-specific data files are provided. The 'ukaliq_comments.txt' file is used to generate the additional information in the 'ukaliq_nameslist.txt' file and also the character list in the standard documentation of the block. The others are described below.

ukaliq_comment.txt
ukaliq_collationorder.txt
ukaliq_letternames.txt
ukaliq_transliteration.txt
ukaliq_transliterationfull.txt
ukaliq_omniglot.html

4.1 Letter Names and Sort Order

The Ukaliq script defines a default order of sorting the letters (and also the symbols and digits), just like in Latin, where there is a letter order a,b,c,d,...,z. The Unicode Ukaliq block is not ordered using this default character order, but instead, an additional data file is provided to define the collation order.

The provided file is in Unicode standard file format and assigns an integer to each Ukaliq code point. By default, Ukaliq code points are sorted in ascending order of that value.

ukaliq_collationorder.txt
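
A minimal sketch of sorting with such a file follows. It assumes the usual layout of Unicode data files (semicolon-separated fields, '#' comments) with the code point in the first field and the integer in the second; the sample lines and their values are made up for illustration, and the function names are hypothetical.

```python
def parse_collation(text: str) -> dict[int, int]:
    """Parse 'CODEPOINT ; VALUE # comment' lines into {code point: value}."""
    order = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        code, value = (field.strip() for field in line.split(";")[:2])
        order[int(code, 16)] = int(value)
    return order


def sort_key(word: str, order: dict[int, int]) -> list[tuple[int, int]]:
    """Sort key: collation value per character; unknown characters sort last."""
    return [(0, order[ord(c)]) if ord(c) in order else (1, ord(c)) for c in word]


# Hypothetical excerpt; the integer values are NOT taken from the real file.
SAMPLE = """\
EE3B ; 1   # UKALIQ LETTER AMA
EE43 ; 2   # UKALIQ LETTER AWA
EE7B ; 6   # UKALIQ LETTER APA
EE41 ; 68  # UKALIQ LETTER A
"""

order = parse_collation(SAMPLE)
words = ["\uEE41", "\uEE7B\uEE41", "\uEE3B\uEE41"]
sorted_words = sorted(words, key=lambda w: sort_key(w, order))
```

With these sample values, the word starting with LETTER AMA sorts first and the bare LETTER A sorts last, which differs from plain code point order.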

It is recommended to use the default order of sorting if a language decides to use the Ukaliq script. However, languages may choose to use a different order if that makes more sense. E.g., it may be that the start of a word changes based on morphological or phonological processes, so that to ease dictionary lookup, letters that are related are sorted as equal. E.g., for Celtic languages, this may be sensible.

A file with the letter names in IPA is also provided (the Unicode character names need pure ASCII names, which often produce mainly gibberish names). Due to the number of letters, educational programs are advised to teach speakers of a given language only the letters that are actually used for that language. Otherwise, the alphabet's letter names may just be too difficult to pronounce correctly and distinguishably.

Also, the letter names should be pronounced with the sound that letter represents in the given language (e.g., /at͜ɕːa/ instead of /at͜ʃːa/ and /oɦːo/ instead of /oʕːo/), and if consonant length does not exist in a language, or not in that position or context, letter names should be pronounced without it (Greenlandic: /uxːu/, but /uɣu/). If a consonant is not pronounced in a language in that context (or has allophones (in dialects) so that it could cause confusion), the context may also be adjusted (Inuktitut: /aʁa/, but /qa/). The names listed here are just for guidance so that some starting point exists and so that languages do not invent completely different names.

ukaliq_letternames.txt

The following table lists the alphabet in default alphabetic order, with the letter names in IPA:

Code Point  Character Name  Letter Name
U+EE3B  UKALIQ LETTER AMA  /amːa/
U+EE43  UKALIQ LETTER AWA  /awːa/
U+EE63  UKALIQ LETTER AVA  /avːa/
U+EE6B  UKALIQ LETTER AFA  /afːa/
U+EE73  UKALIQ LETTER ABA  /abːa/
U+EE7B  UKALIQ LETTER APA  /apːa/
U+EE1B  UKALIQ LETTER APFA  /ap͡fa/
U+EE3F  UKALIQ LETTER ONO  /onːo/
U+EE37  UKALIQ LETTER OLO  /olːo/
U+EE47  UKALIQ LETTER ORO  /oɾo/
U+EE4F  UKALIQ LETTER ORRO  /orːo/
U+EE67  UKALIQ LETTER ODHO  /oðːo/
U+EE6F  UKALIQ LETTER OTHO  /oθːo/
U+EE11  UKALIQ LETTER OZO  /ozːo/
U+EE19  UKALIQ LETTER OSO  /osːo/
U+EE77  UKALIQ LETTER ODO  /odːo/
U+EE7F  UKALIQ LETTER OTO  /otːo/
U+EE17  UKALIQ LETTER ODZO  /od͡zo/
U+EE1F  UKALIQ LETTER OTSO  /ot͡so/
U+EE3C  UKALIQ LETTER ENRE  /əɳːə/
U+EE34  UKALIQ LETTER ELRE  /əɭːə/
U+EE44  UKALIQ LETTER ERRE  /əɻːə/
U+EE64  UKALIQ LETTER EZRE  /əʐːə/
U+EE6C  UKALIQ LETTER ESRE  /əʂːə/
U+EE74  UKALIQ LETTER EDRE  /əɖːə/
U+EE7C  UKALIQ LETTER ETRE  /əʈːə/
U+EE14  UKALIQ LETTER EDZRE  /əd͡ʐə/
U+EE1C  UKALIQ LETTER ETSRE  /ət͡ʂə/
U+EE3E  UKALIQ LETTER INJI  /iɲːi/
U+EE36  UKALIQ LETTER ILJI  /iʎːi/
U+EE4C  UKALIQ LETTER IJI  /ijːi/
U+EE66  UKALIQ LETTER IJJI  /iʝːi/
U+EE6E  UKALIQ LETTER ICJI  /içːi/
U+EE46  UKALIQ LETTER IZJI  /iʒːi/
U+EE4E  UKALIQ LETTER ISJI  /iʃːi/
U+EE76  UKALIQ LETTER IGJI  /iɟːi/
U+EE7E  UKALIQ LETTER ICI  /icːi/
U+EE16  UKALIQ LETTER IDZJI  /id͡ʒi/
U+EE1E  UKALIQ LETTER ITSJI  /it͡ʃi/
U+EE33  UKALIQ LETTER ALHA  /aɬːa/
U+EE13  UKALIQ LETTER ATLHA  /at͡ɬa/
U+EE3D  UKALIQ LETTER UNGU  /uŋːu/
U+EE35  UKALIQ LETTER ULGU  /uɫːu/
U+EE65  UKALIQ LETTER UGHU  /uɣːu/
U+EE6D  UKALIQ LETTER UKHU  /uxːu/
U+EE75  UKALIQ LETTER UGU  /ugːu/
U+EE7D  UKALIQ LETTER UKU  /ukːu/
U+EE1D  UKALIQ LETTER UKKHU  /uk͡xu/
U+EE3A  UKALIQ LETTER ANQA  /aɴːa/
U+EE62  UKALIQ LETTER ARHA  /aʁːa/
U+EE6A  UKALIQ LETTER AQHA  /aχːa/
U+EE72  UKALIQ LETTER AGQA  /aɢːa/
U+EE7A  UKALIQ LETTER AQA  /aqːa/
U+EE61  UKALIQ LETTER OHGO  /oʕːo/
U+EE68  UKALIQ LETTER OHHO  /oħːo/
U+EE78  UKALIQ LETTER OHHKO  /oʡːo/
U+EE69  UKALIQ LETTER AHA  /ahːa/
U+EE79  UKALIQ LETTER AHKA  /aʔːa/
U+EE39  UKALIQ LETTER HBAT  /ɓat/
U+EE4D  UKALIQ LETTER TKOT  /ǃot/
U+EE2E  UKALIQ LETTER I  /iː/
U+EE26  UKALIQ LETTER UI  /yː/
U+EE2C  UKALIQ LETTER E  /eː/
U+EE24  UKALIQ LETTER OE  /øː/
U+EE2D  UKALIQ LETTER EH  /ɛː/
U+EE25  UKALIQ LETTER OEH  /œː/
U+EE28  UKALIQ LETTER AE  /æː/
U+EE41  UKALIQ LETTER A  /aː/
U+EE27  UKALIQ LETTER AH  /ɑː/
U+EE2F  UKALIQ LETTER OH  /ɔː/
U+EE29  UKALIQ LETTER O  /oː/
U+EE21  UKALIQ LETTER EO  /ɤː/
U+EE2B  UKALIQ LETTER U  /uː/
U+EE23  UKALIQ LETTER EU  /ɯː/
U+EE2A  UKALIQ LETTER IH  /ɨː/
U+EE22  UKALIQ LETTER UH  /ʉː/
U+EE60  UKALIQ LETTER SCHWA  /ʃwəː/

4.2 Transliteration

To ease typing in Ukaliq, a transliteration is proposed. An additional Unicode data file is provided with a list of equivalent sequences per Ukaliq character, so that Ukaliq script can be typed using IPA letters and/or ASCII only.

The transliterations of the letters for consonants and vowels are based on CXS/X-Sampa, but a few changes were needed because in normal text, numbers should not simply be reinterpreted as phonetic symbols. Hence, all single-digit CXS/X-Sampa symbols were suffixed with a backslash. IPA is included as a transliteration, too, in case you can easily type that.

Also, to be more useful and to mix better with normal text, all multi-character transliterations are meant to be applied greedily, unless they are explicitly separated with a '|' character. E.g., 'ts' is mapped to the Ukaliq affricate letter, whereas 't|s' is mapped to the two separate letters. The '|' itself should map to an empty string.

The second transliteration file contains additional mappings to multiple code points that should be applied when applying the greedy transliteration rules. This file is not in Unicode format; it maps code point sequences to other code point sequences, e.g., '|' to the empty string ''. Column 0 contains the sequence of characters to match greedily, column 1 contains 'single' if the mapping is also in the first transliteration file, i.e., mapping to a single Ukaliq code point, and column 2 is what the sequence should be mapped to: a sequence of Ukaliq code points, which is possibly empty. This table also maps '...' to three Ukaliq full stops, because the main transliteration maps '..' to a Ukaliq character (range), but '...' should appear as three full stops.

ukaliq_transliteration.txt
ukaliq_transliterationfull.txt
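
The greedy rule described above amounts to longest-match replacement. The following is a minimal sketch under stated assumptions: the mapping is a small hand-picked excerpt (code points taken from the tables in this document) rather than the parsed data files, the function name is hypothetical, and characters without a mapping are simply passed through.

```python
# Small excerpt of the mapping; a real program would load the full table
# from ukaliq_transliterationfull.txt instead.
MAPPING = {
    "t": "\uEE7F",        # UKALIQ LETTER OTO
    "s": "\uEE19",        # UKALIQ LETTER OSO
    "ts": "\uEE1F",       # UKALIQ LETTER OTSO (affricate)
    "a": "\uEE41",        # UKALIQ LETTER A
    ".": "\uEE04",        # UKALIQ FULL STOP
    "..": "\uEE20",       # UKALIQ RANGE
    "...": "\uEE04" * 3,  # three full stops, not RANGE + FULL STOP
    "|": "",              # explicit separator, maps to nothing
}


def transliterate(text: str, mapping: dict[str, str] = MAPPING) -> str:
    """Replace the longest matching sequence at each position."""
    longest = max(map(len, mapping))
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in mapping:
                out.append(mapping[chunk])
                i += n
                break
        else:
            out.append(text[i])  # no mapping: keep the character as is
            i += 1
    return "".join(out)
```

For example, `transliterate("ts")` yields the single affricate letter, while `transliterate("t|s")` yields the two letters for 't' and 's'.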

The following table lists all transliterations for each Ukaliq code point. The list does not show trivial mappings that just add a dot, which is always done either with '~' or with the diacritic mark U+0331 MACRON BELOW.

Code Point  Character Name  Transliterations
U+EE00  UKALIQ SPACE  (space)
U+EE04  UKALIQ FULL STOP  .
U+EE0C  UKALIQ SEMICOLON  ;
U+EE03  UKALIQ COLON  :
U+EE05  UKALIQ COMMA  ,
U+EE0D  UKALIQ REVERSED EXCLAMATION MARK  ¡ !:
U+EE07  UKALIQ EXCLAMATION MARK  !
U+EE0B  UKALIQ REVERSED QUESTION MARK  ¿ ?:
U+EE0E  UKALIQ QUESTION MARK  ?
U+EE09  UKALIQ LEFT PARENTHESIS  (
U+EE06  UKALIQ RIGHT PARENTHESIS  )
U+EE08  UKALIQ LEFT SINGLE QUOTATION MARK  « `
U+EE02  UKALIQ RIGHT SINGLE QUOTATION MARK  » '
U+EE0A  UKALIQ DOUBLE QUOTATION MARK  "
U+EE10  UKALIQ HYPHEN  _
U+EE20  UKALIQ RANGE  ..
U+EE40  UKALIQ DASH  --
U+EE30  UKALIQ PLUS  +
U+EE42  UKALIQ MINUS  -
U+EE1A  UKALIQ MULTIPLICATION  × *
U+EE49  UKALIQ DIVISION  ÷ //
U+EE71  UKALIQ PERCENT  %
U+EE18  UKALIQ LESS-THAN  <
U+EE0F  UKALIQ EQUALS  =
U+EE12  UKALIQ GREATER-THAN  >
U+EE48  UKALIQ NOT  ¬ !!
U+EE45  UKALIQ AND  &&
U+EE15  UKALIQ OR  ||
U+EE4A  UKALIQ XOR  ^^
U+EE01  UKALIQ NUMBER  #
U+EE31  UKALIQ CURRENCY  ¤ $ £ ¥ ֏ ฿
U+EE38  UKALIQ DEGREE  °
U+EE32  UKALIQ AT  @
U+EE70  UKALIQ ITERATION MARK  ::
U+EE50  UKALIQ DIGIT ZERO  0 {0}
U+EE51  UKALIQ DIGIT ONE  1 {1}
U+EE52  UKALIQ DIGIT TWO  2 {2}
U+EE53  UKALIQ DIGIT THREE  3 {3}
U+EE54  UKALIQ DIGIT FOUR  4 {4}
U+EE55  UKALIQ DIGIT FIVE  5 {5}
U+EE56  UKALIQ DIGIT SIX  6 {6}
U+EE57  UKALIQ DIGIT SEVEN  7 {7}
U+EE58  UKALIQ DIGIT EIGHT  8 {8}
U+EE59  UKALIQ DIGIT NINE  9 {9}
U+EE5A  UKALIQ DIGIT TEN  #A {10}
U+EE5B  UKALIQ DIGIT ELEVEN  #B {11}
U+EE5C  UKALIQ DIGIT TWELVE  #C {12}
U+EE5D  UKALIQ DIGIT THIRTEEN  #D {13}
U+EE5E  UKALIQ DIGIT FOURTEEN  #E {14}
U+EE5F  UKALIQ DIGIT FIFTEEN  #F {15}
U+EED2  UKALIQ BASE TWO  [2]
U+EED3  UKALIQ BASE THREE  [3]
U+EED4  UKALIQ BASE FOUR  [4]
U+EED5  UKALIQ BASE FIVE  [5]
U+EED6  UKALIQ BASE SIX  [6]
U+EED8  UKALIQ BASE EIGHT  [8]
U+EEDA  UKALIQ BASE TEN  [10]
U+EED0  UKALIQ BASE SIXTEEN  [16]
U+EED1  UKALIQ BASE NEGATIVE TWO  [1/2]
U+EEDB  UKALIQ BASE NEGATIVE THREE  [1/3]
U+EEDC  UKALIQ BASE NEGATIVE FOUR  [1/4]
U+EEDD  UKALIQ BASE NEGATIVE FIVE  [1/5]
U+EEDE  UKALIQ BASE NEGATIVE SIX  [1/6]
U+EED7  UKALIQ BASE NEGATIVE EIGHT  [1/8]
U+EED9  UKALIQ BASE NEGATIVE TEN  [1/10]
U+EEDF  UKALIQ BASE NEGATIVE SIXTEEN  [1/16]
U+EE3B  UKALIQ LETTER AMA  m
U+EE43  UKALIQ LETTER AWA  w
U+EE63  UKALIQ LETTER AVA  v
U+EE6B  UKALIQ LETTER AFA  f
U+EE73  UKALIQ LETTER ABA  b
U+EE7B  UKALIQ LETTER APA  p
U+EE1B  UKALIQ LETTER APFA  pf
U+EE3F  UKALIQ LETTER ONO  n
U+EE37  UKALIQ LETTER OLO  l
U+EE47  UKALIQ LETTER ORO  ɾ 4\
U+EE4F  UKALIQ LETTER ORRO  r
U+EE67  UKALIQ LETTER ODHO  ð D
U+EE6F  UKALIQ LETTER OTHO  θ T
U+EE11  UKALIQ LETTER OZO  z
U+EE19  UKALIQ LETTER OSO  s
U+EE77  UKALIQ LETTER ODO  d
U+EE7F  UKALIQ LETTER OTO  t
U+EE17  UKALIQ LETTER ODZO  dz
U+EE1F  UKALIQ LETTER OTSO  ts
U+EE3C  UKALIQ LETTER ENRE  ɳ n`
U+EE34  UKALIQ LETTER ELRE  ɭ l`
U+EE44  UKALIQ LETTER ERRE  ɻ ɹ r\ r\` r`
U+EE64  UKALIQ LETTER EZRE  ʐ z`
U+EE6C  UKALIQ LETTER ESRE  ʂ s`
U+EE74  UKALIQ LETTER EDRE  ɖ d`
U+EE7C  UKALIQ LETTER ETRE  ʈ t`
U+EE14  UKALIQ LETTER EDZRE  dz`
U+EE1C  UKALIQ LETTER ETSRE  ts`
U+EE3E  UKALIQ LETTER INJI  ɲ J
U+EE36  UKALIQ LETTER ILJI  ʎ L
U+EE4C  UKALIQ LETTER IJI  j
U+EE66  UKALIQ LETTER IJJI  ʝ j\
U+EE6E  UKALIQ LETTER ICJI  ç C
U+EE46  UKALIQ LETTER IZJI  ʒ ʑ Z z\
U+EE4E  UKALIQ LETTER ISJI  ʃ ɕ S s\
U+EE76  UKALIQ LETTER IGJI  ɟ J\
U+EE7E  UKALIQ LETTER ICI  c
U+EE16  UKALIQ LETTER IDZJI  dZ dz\
U+EE1E  UKALIQ LETTER ITSJI  tS ts\
U+EE33  UKALIQ LETTER ALHA  ɬ K
U+EE13  UKALIQ LETTER ATLHA  tK
U+EE3D  UKALIQ LETTER UNGU  ŋ N
U+EE35  UKALIQ LETTER ULGU  ɫ ʟ L\
U+EE65  UKALIQ LETTER UGHU  ɣ ɰ G
U+EE6D  UKALIQ LETTER UKHU  x ɧ
U+EE75  UKALIQ LETTER UGU  g
U+EE7D  UKALIQ LETTER UKU  k
U+EE1D  UKALIQ LETTER UKKHU  kx qX
U+EE3A  UKALIQ LETTER ANQA  ɴ N\
U+EE62  UKALIQ LETTER ARHA  ʁ R
U+EE6A  UKALIQ LETTER AQHA  χ X
U+EE72  UKALIQ LETTER AGQA  ɢ G\
U+EE7A  UKALIQ LETTER AQA  q
U+EE61  UKALIQ LETTER OHGO  ʕ ɦ h\
U+EE68  UKALIQ LETTER OHHO  ħ X\
U+EE78  UKALIQ LETTER OHHKO  ʡ >\
U+EE69  UKALIQ LETTER AHA  h
U+EE79  UKALIQ LETTER AHKA  ʔ ?\
U+EE39  UKALIQ LETTER HBAT  ˂ <<
U+EE4D  UKALIQ LETTER TKOT  ǂ |\ ǀ ʘ ǁ ǃ
U+EE2E  UKALIQ LETTER I  i
U+EE26  UKALIQ LETTER UI  y
U+EE2C  UKALIQ LETTER E  e
U+EE24  UKALIQ LETTER OE  ø 2\
U+EE2D  UKALIQ LETTER EH  ɛ E
U+EE25  UKALIQ LETTER OEH  œ 9\
U+EE28  UKALIQ LETTER AE  æ &\
U+EE41  UKALIQ LETTER A  a
U+EE27  UKALIQ LETTER AH  ɑ A
U+EE2F  UKALIQ LETTER OH  ɔ O
U+EE29  UKALIQ LETTER O  o
U+EE21  UKALIQ LETTER EO  ɤ 7\
U+EE2B  UKALIQ LETTER U  u
U+EE23  UKALIQ LETTER EU  ɯ M
U+EE2A  UKALIQ LETTER IH  ɨ i\
U+EE22  UKALIQ LETTER UH  ʉ u\
U+EE60  UKALIQ LETTER SCHWA  ə e\
U+F1D00  UKALIQ PATTERN BLANK  {Z} {pharyngeal} {punctuation} {weak} {2nd}
U+F1D01  UKALIQ PATTERN SEGMENTS-C  {C} {glottal} {backmid}
U+F1D02  UKALIQ PATTERN SEGMENTS-F  {F} {uvular} {centerhigh}
U+F1D03  UKALIQ PATTERN SEGMENTS-CF  {CF} {labial} {backhigh}
U+F1D04  UKALIQ PATTERN SEGMENTS-E  {E} {retroflex} {frontmid}
U+F1D05  UKALIQ PATTERN SEGMENTS-CE  {CE} {velar} {frontlow}
U+F1D06  UKALIQ PATTERN SEGMENTS-EF  {EF} {palatal} {fronthigh}
U+F1D07  UKALIQ PATTERN SEGMENTS-CEF  {CEF} {dental} {alveolar} {backlow}
U+F1D08  UKALIQ PATTERN SEGMENTS-B  {B} {strong} {1st}
U+F1D09  UKALIQ PATTERN SEGMENTS-BC  {BC}
U+F1D0A  UKALIQ PATTERN SEGMENTS-BF  {BF}
U+F1D0B  UKALIQ PATTERN SEGMENTS-BCF  {BCF}
U+F1D0C  UKALIQ PATTERN SEGMENTS-BE  {BE}
U+F1D0D  UKALIQ PATTERN SEGMENTS-BCE  {BCE}
U+F1D0E  UKALIQ PATTERN SEGMENTS-BEF  {BEF}
U+F1D0F  UKALIQ PATTERN SEGMENTS-BCEF  {BCEF}
U+F1D10  UKALIQ PATTERN SEGMENTS-D  {D} {affricate} {weakaffricate}
U+F1D11  UKALIQ PATTERN SEGMENTS-CD  {CD}
U+F1D12  UKALIQ PATTERN SEGMENTS-DF  {DF}
U+F1D13  UKALIQ PATTERN SEGMENTS-CDF  {CDF}
U+F1D14  UKALIQ PATTERN SEGMENTS-DE  {DE}
U+F1D15  UKALIQ PATTERN SEGMENTS-CDE  {CDE}
U+F1D16  UKALIQ PATTERN SEGMENTS-DEF  {DEF}
U+F1D17  UKALIQ PATTERN SEGMENTS-CDEF  {CDEF}
U+F1D18  UKALIQ PATTERN SEGMENTS-BD  {BD} {strongaffricate}
U+F1D19  UKALIQ PATTERN SEGMENTS-BCD  {BCD}
U+F1D1A  UKALIQ PATTERN SEGMENTS-BDF  {BDF}
U+F1D1B  UKALIQ PATTERN SEGMENTS-BCDF  {BCDF}
U+F1D1C  UKALIQ PATTERN SEGMENTS-BDE  {BDE}
U+F1D1D  UKALIQ PATTERN SEGMENTS-BCDE  {BCDE}
U+F1D1E  UKALIQ PATTERN SEGMENTS-BDEF  {BDEF}
U+F1D1F  UKALIQ PATTERN SEGMENTS-BCDEF  {BCDEF}
U+F1D20  UKALIQ PATTERN SEGMENTS-A  {A} {vowel} {secondary}
U+F1D21  UKALIQ PATTERN SEGMENTS-AC  {AC}
U+F1D22  UKALIQ PATTERN SEGMENTS-AF  {AF}
U+F1D23  UKALIQ PATTERN SEGMENTS-ACF  {ACF}
U+F1D24  UKALIQ PATTERN SEGMENTS-AE  {AE}
U+F1D25  UKALIQ PATTERN SEGMENTS-ACE  {ACE}
U+F1D26  UKALIQ PATTERN SEGMENTS-AEF  {AEF}
U+F1D27  UKALIQ PATTERN SEGMENTS-ACEF  {ACEF}
U+F1D28  UKALIQ PATTERN SEGMENTS-AB  {AB} {primary}
U+F1D29  UKALIQ PATTERN SEGMENTS-ABC  {ABC}
U+F1D2A  UKALIQ PATTERN SEGMENTS-ABF  {ABF}
U+F1D2B  UKALIQ PATTERN SEGMENTS-ABCF  {ABCF}
U+F1D2C  UKALIQ PATTERN SEGMENTS-ABE  {ABE}
U+F1D2D  UKALIQ PATTERN SEGMENTS-ABCE  {ABCE}
U+F1D2E  UKALIQ PATTERN SEGMENTS-ABEF  {ABEF}
U+F1D2F  UKALIQ PATTERN SEGMENTS-ABCEF  {ABCEF}
U+F1D30  UKALIQ PATTERN SEGMENTS-AD  {AD} {liquid}
U+F1D31  UKALIQ PATTERN SEGMENTS-ACD  {ACD}
U+F1D32  UKALIQ PATTERN SEGMENTS-ADF  {ADF}
U+F1D33  UKALIQ PATTERN SEGMENTS-ACDF  {ACDF}
U+F1D34  UKALIQ PATTERN SEGMENTS-ADE  {ADE}
U+F1D35  UKALIQ PATTERN SEGMENTS-ACDE  {ACDE}
U+F1D36  UKALIQ PATTERN SEGMENTS-ADEF  {ADEF}
U+F1D37  UKALIQ PATTERN SEGMENTS-ACDEF  {ACDEF}
U+F1D38  UKALIQ PATTERN SEGMENTS-ABD  {ABD} {nasal}
U+F1D39  UKALIQ PATTERN SEGMENTS-ABCD  {ABCD}
U+F1D3A  UKALIQ PATTERN SEGMENTS-ABDF  {ABDF}
U+F1D3B  UKALIQ PATTERN SEGMENTS-ABCDF  {ABCDF}
U+F1D3C  UKALIQ PATTERN SEGMENTS-ABDE  {ABDE}
U+F1D3D  UKALIQ PATTERN SEGMENTS-ABCDE  {ABCDE}
U+F1D3E  UKALIQ PATTERN SEGMENTS-ABDEF  {ABDEF}
U+F1D3F  UKALIQ PATTERN SEGMENTS-ABCDEF  {ABCDEF}
U+F1D40  UKALIQ PATTERN SEGMENTS-G  {G} {approximant} {flap}
U+F1D41  UKALIQ PATTERN SEGMENTS-CG  {CG}
U+F1D42  UKALIQ PATTERN SEGMENTS-FG  {FG}
U+F1D43  UKALIQ PATTERN SEGMENTS-CFG  {CFG}
U+F1D44  UKALIQ PATTERN SEGMENTS-EG  {EG}
U+F1D45  UKALIQ PATTERN SEGMENTS-CEG  {CEG}
U+F1D46  UKALIQ PATTERN SEGMENTS-EFG  {EFG}
U+F1D47  UKALIQ PATTERN SEGMENTS-CEFG  {CEFG}
U+F1D48  UKALIQ PATTERN SEGMENTS-BG  {BG} {other}
U+F1D49  UKALIQ PATTERN SEGMENTS-BCG  {BCG}
U+F1D4A  UKALIQ PATTERN SEGMENTS-BFG  {BFG}
U+F1D4B  UKALIQ PATTERN SEGMENTS-BCFG  {BCFG}
U+F1D4C  UKALIQ PATTERN SEGMENTS-BEG  {BEG}
U+F1D4D  UKALIQ PATTERN SEGMENTS-BCEG  {BCEG}
U+F1D4E  UKALIQ PATTERN SEGMENTS-BEFG  {BEFG}
U+F1D4F  UKALIQ PATTERN SEGMENTS-BCEFG  {BCEFG}
U+F1D50  UKALIQ PATTERN SEGMENTS-DG  {DG} {numeral}
U+F1D51  UKALIQ PATTERN SEGMENTS-CDG  {CDG}
U+F1D52  UKALIQ PATTERN SEGMENTS-DFG  {DFG}
U+F1D53  UKALIQ PATTERN SEGMENTS-CDFG  {CDFG}
U+F1D54  UKALIQ PATTERN SEGMENTS-DEG  {DEG}
U+F1D55  UKALIQ PATTERN SEGMENTS-CDEG  {CDEG}
U+F1D56  UKALIQ PATTERN SEGMENTS-DEFG  {DEFG}
U+F1D57  UKALIQ PATTERN SEGMENTS-CDEFG  {CDEFG}
U+F1D58  UKALIQ PATTERN SEGMENTS-BDG  {BDG}
U+F1D59  UKALIQ PATTERN SEGMENTS-BCDG  {BCDG}
U+F1D5A  UKALIQ PATTERN SEGMENTS-BDFG  {BDFG}
U+F1D5B  UKALIQ PATTERN SEGMENTS-BCDFG  {BCDFG}
U+F1D5C  UKALIQ PATTERN SEGMENTS-BDEG  {BDEG}
U+F1D5D  UKALIQ PATTERN SEGMENTS-BCDEG  {BCDEG}
U+F1D5E  UKALIQ PATTERN SEGMENTS-BDEFG  {BDEFG}
U+F1D5F  UKALIQ PATTERN SEGMENTS-BCDEFG  {BCDEFG}
U+F1D60  UKALIQ PATTERN SEGMENTS-AG  {AG} {fricative} {weakfricative}
U+F1D61  UKALIQ PATTERN SEGMENTS-ACG  {ACG}
U+F1D62  UKALIQ PATTERN SEGMENTS-AFG  {AFG}
U+F1D63  UKALIQ PATTERN SEGMENTS-ACFG  {ACFG}
U+F1D64  UKALIQ PATTERN SEGMENTS-AEG  {AEG}
U+F1D65  UKALIQ PATTERN SEGMENTS-ACEG  {ACEG}
U+F1D66  UKALIQ PATTERN SEGMENTS-AEFG  {AEFG}
U+F1D67  UKALIQ PATTERN SEGMENTS-ACEFG  {ACEFG}
U+F1D68  UKALIQ PATTERN SEGMENTS-ABG  {ABG} {strongfricative}
U+F1D69  UKALIQ PATTERN SEGMENTS-ABCG  {ABCG}
U+F1D6A  UKALIQ PATTERN SEGMENTS-ABFG  {ABFG}
U+F1D6B  UKALIQ PATTERN SEGMENTS-ABCFG  {ABCFG}
U+F1D6C  UKALIQ PATTERN SEGMENTS-ABEG  {ABEG}
U+F1D6D  UKALIQ PATTERN SEGMENTS-ABCEG  {ABCEG}
U+F1D6E  UKALIQ PATTERN SEGMENTS-ABEFG  {ABEFG}
U+F1D6F  UKALIQ PATTERN SEGMENTS-ABCEFG  {ABCEFG}
U+F1D70  UKALIQ PATTERN SEGMENTS-ADG  {ADG} {plosive} {weakplosive}
U+F1D71  UKALIQ PATTERN SEGMENTS-ACDG  {ACDG}
U+F1D72  UKALIQ PATTERN SEGMENTS-ADFG  {ADFG}
U+F1D73  UKALIQ PATTERN SEGMENTS-ACDFG  {ACDFG}
U+F1D74  UKALIQ PATTERN SEGMENTS-ADEG  {ADEG}
U+F1D75  UKALIQ PATTERN SEGMENTS-ACDEG  {ACDEG}
U+F1D76  UKALIQ PATTERN SEGMENTS-ADEFG  {ADEFG}
U+F1D77  UKALIQ PATTERN SEGMENTS-ACDEFG  {ACDEFG}
U+F1D78  UKALIQ PATTERN SEGMENTS-ABDG  {ABDG} {strongplosive}
U+F1D79  UKALIQ PATTERN SEGMENTS-ABCDG  {ABCDG}
U+F1D7A  UKALIQ PATTERN SEGMENTS-ABDFG  {ABDFG}
U+F1D7B  UKALIQ PATTERN SEGMENTS-ABCDFG  {ABCDFG}
U+F1D7C  UKALIQ PATTERN SEGMENTS-ABDEG  {ABDEG}
U+F1D7D  UKALIQ PATTERN SEGMENTS-ABCDEG  {ABCDEG}
U+F1D7E  UKALIQ PATTERN SEGMENTS-ABDEFG  {ABDEFG}
U+F1D7F  UKALIQ PATTERN SEGMENTS-ABCDEFG  {ABCDEFG}

4.3 Font

The .ttf and .svg font files of the Ukaliq script provide two ranges for the 256 character shapes, one starting at U+EE00 for the semantic mapping, and one starting at U+F1D00 for the graphical mapping. The glyphs of the two ranges look the same, but the Unicode properties are different: in the semantic range, the letters and punctuation etc. are marked, while the graphical range is marked like box drawing and is meant to be used when the shape of a symbol is significant instead of its meaning. The graphical range has compatibility decomposition into the semantic range.

Within each range of 256 glyphs, the glyphs are mapped by shape: each of the 8 lower bits corresponds to a segment, with bits 0,1,2,3,4,5,6,7 corresponding to the 7-segment display segments C,F,E,B,D,A,G,H where segments A,B,C,D,E,F are clockwise with A top-most, G is the center segment and H is the dot.
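
The bit-to-segment mapping can be sketched in a few lines. The function names are hypothetical; the bit order is the one given above, and the result uses the alphabetical segment order found in the pattern character names (e.g., SEGMENTS-ABD).

```python
SEGMENT_BITS = "CFEBDAGH"  # segment assigned to bit 0, 1, ..., 7


def segments(code_point: int) -> str:
    """Return the lit segments of a glyph, derived from the lower 8 bits."""
    low = code_point & 0xFF
    lit = (seg for bit, seg in enumerate(SEGMENT_BITS) if (low >> bit) & 1)
    return "".join(sorted(lit))


def pattern_code_point(segs: str, base: int = 0xF1D00) -> int:
    """Return the code point in the range starting at `base` for a segment set."""
    value = 0
    for seg in segs:
        value |= 1 << SEGMENT_BITS.index(seg)
    return base + value
```

For instance, `segments(0xF1D38)` gives "ABD", matching the name UKALIQ PATTERN SEGMENTS-ABD, and `pattern_code_point("ABD")` gives 0xF1D38 again. Passing `base=0xEE00` addresses the glyph of the same shape in the semantic range.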

The following font variants are part of the Ukaliq package:

Ukaliq Serif

Ukaliq Serif Bold

Ukaliq Sans Light

Ukaliq Sans

Ukaliq Sans Bold

Ukaliq Sans Black

Ukaliq Pen Light

Ukaliq Pen

Ukaliq Pen Bold

Ukaliq Pen Black

Ukaliq Serif Oblique

Ukaliq Serif Bold Oblique

Ukaliq Sans Light Oblique

Ukaliq Sans Oblique

Ukaliq Sans Bold Oblique

Ukaliq Sans Black Oblique

Ukaliq Pen Light Oblique

Ukaliq Pen Oblique

Ukaliq Pen Bold Oblique

Ukaliq Pen Black Oblique


5 Spelling Suggestions Per Language

5.1 German

In Ukaliq script, German is spelled morphophonemically with standard German pronunciation, even for loans and names, i.e., no 'foreign' spelling is retained once a word is fully loaned. E.g. 'Orange' is spelled ora̱ŋʒə’ and the colour 'orange' is spelled ora̱ŋʒ’ (it is pronounced with ʃ’, but final devoicing is predictable and not written). 'Operation' is operatsjon’.

German has the five vowels a, e, i, o, u, plus two rounded front vowels y, ø. All of these vowels can be long or short. As a bit of a complication, there is an additional long front vowel ɛ: (which seems to be becoming less common) and a schwa ə. The German vowels are written as follows: a e i o u y ø ɛ ə’.

The short versions of ​​​​​e i o u y ø’ are reduced to /ɛ ɪ ɔ ʊ ʏ œ/. Since this is predictable, this is not expressed in spelling.

Stress is traditionally not written in German. But it does have a big influence on vowel length -- so much so that for some native speakers, it is not always clear whether a vowel is long or short in an unstressed syllable, and also dialects may differ, particularly for loans (most inherited Germanic unstressed vowels became ə anyway). Half-long vowels are sometimes proposed for German unstressed syllables. Because of this, long vs. short distinction is only written in stressed syllables in this script, and because long seems to be the default for unstressed syllables except those with ə, short stressed vowels are marked with a dot diacritic. Compound words have primary and secondary stress, but this is not distinguished, because it's predictable, so any stressed short vowel will be dotted, also secondarily stressed ones.

This method of marking only the stressed /ɛ/ has some drawbacks: in some words, there will be no marking anymore, despite it being clear whether the vowel is long or short, e.g., in 'Exemplar' eksemplar’ (stressed on the last syllable, hence no marking of short /ɛ/) or 'empfehlen' empfelən’. This is not perfect, but there probably is no perfect solution for a unified spelling that works for many people with slightly different dialects who would all say that they speak Standard German. For the sake of unification, we cannot spell in exact IPA.

Syllabic and vocalic endings -en, -em, -er are still written with the schwa, because there is some variation in pronunciation (some schwas are optional or may reemerge if another ending is added), and the traditional spelling with a schwa (spelled 'e' in Latin) seems to handle the phenomena well, e.g., 'lecker' [lɛkɐ] + -e [ə] becomes 'leckere' [lɛkəʁə] and may also be pronounced [lɛkʁə].

Diphthongs are roughly ai, au, oi and ei (in loans) and are written just like that, without dots: ai au oi ei’. For /ai/, no distinction between 'ei', 'ai', 'ay', 'ey' is made, unlike in German Latin script spelling. Dialectal words may have another diphthong: 'ui', which is then spelled ui’, as in 'Gewürzluike' gəvy̱rtsluikə’. There is generally no distinction between diphthongs vs. two vowels (except maybe very rarely in loans in careful or educated pronunciation). Non-syllabic 'i' before a vowel (mainly in loans) is written as j: operatsjon’.

Nasal vowels in loan words, which are usually not pronounced nasalised except in educated pronunciation, are written as pronounced: with ŋ’. There is some dialectal variation of whether the vowel is lowered. The non-lowered vowel is used in spelling, just as in Latin script: 'Orange' ora̱ŋʒə’, not oro̱ŋʒə’. Educated pronunciation of these originally nasal vowels may still be nasal -- this is just a proposal for spelling in Ukaliq script, not a proposal for pronunciation changes.

For consonants, the situation is easier in German: ​​​​​​​​​​​​​​​​​​b p d t g k pf ts tʃ dʒ f v s z ʃ ʒ m n l’ are written just like that (affricates are used because they are available, and because German does have them). Because it's available, ŋ is used if appropriate (e.g., ŋk’).

Latin script 'ng' is usually just 'ŋ', i.e., no plosive is usually pronounced if a vowel follows (though at ends of words, a 'k' may be pronounced in some words, but this is free variation and non-phonemic and, therefore, not written). If a plosive is pronounced (in loans), it is written: 'Bingo' = bi̱ŋgo’, 'Menge' = me̱ŋə’, 'Ring' = ri̱ŋ’, 'Schrank' = ʃra̱ŋk’.

x, χ and ç are all written using x’, because the distinction is non-phonemic in all but very few cases (e.g., 'Frauchen' = frauxən’).

If still visible/audible, then the underlying morphophonetic sound is written, even if it surfaces e.g., unvoiced. 'Hund' is written with d’ despite being pronounced with t’, because in plural, d’ reemerges. But 'und' is spelled unt’, because no 'd' is visible/audible anymore.

The only rhotic is spelled r’ because the exact pronunciation varies by dialect (ʁ, r, ɾ, or ɹ), so the difference is not phonemic.

Within morphemes, epenthesis, devoicing, and homorganic adjustment are represented, but across morphemes, they are not: 'Senf' ze̱mpf’, 'in Bonn' [im bɔn] ​in bo̱n’.

'h' is only written stem initially, because it is otherwise mute: 'Hase' hazə’, 'sehe' zeə’.

When a /ts/ arises from /t/ + ending /s/, the sequence is written as the two separate letters t + s instead of the single affricate letter ts’. This is just like in Latin script, where the affricate ts’ is 'z' or 'tz', but 't' + 's' is 'ts': 'Schweiz' ʃvaits’ but 'rätst' rɛtst’ from 'raten' + 'st'.

To see how Ukaliq looks on a webpage that is completely converted to the script, you can click here to see one of my blog pages.

5.1.1 Text in German (SVG)

5.1.2 Text in German

[Sample text in Ukaliq script; the glyphs require the Ukaliq font.]

5.2 Kalaallisut (West Greenlandic)

Modern Latin spelling is already close to the goal for this script, so not much is done differently.

This script has all necessary plosives, fricatives, nasals, liquids, so they are written as pronounced: ​​​​​​​​​​​​​​​​​p t k q j f v s x ɣ χ ʁ m n ŋ ɴ l ɬ’.

Long fricatives/liquids are devoiced, and although this may not be phonemic, it is actually important that a fricative, not a plosive, is the result of the gemination. Therefore, this is still written, just like in the Latin script (and consistently also for 'l' vs. 'ɬ'), so short fricatives/liquids are ​​​​v s ɣ ʁ l’ and long are ​​​​ff ss xx χχ ɬɬ’, e.g. 'qallunaaq' is qaɬɬuna̱a̱q’.
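
As a small illustration of this rule, the short-to-long correspondence can be written as a lookup table. The pairs are the ones listed above; the function name is hypothetical, and the fallback of simply doubling all other consonants is an assumption based on spellings such as ŋŋ and pp elsewhere in this section.

```python
# Short fricative/liquid -> spelling of its long, devoiced counterpart.
LONG_FORM = {"v": "ff", "s": "ss", "ɣ": "xx", "ʁ": "χχ", "l": "ɬɬ"}


def geminate(consonant: str) -> str:
    """Return the spelling of the long version of a consonant."""
    # Assumption: consonants not in the table are simply written twice.
    return LONG_FORM.get(consonant, consonant * 2)
```

So `geminate("l")` gives "ɬɬ", as in qaɬɬuna̱a̱q, while `geminate("ŋ")` gives "ŋŋ".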

Since all four nasals can be represented, ŋ ɴ ŋŋ ɴɴ’ can all be properly distinguished, and are written as pronounced.

One complication is the one remaining cluster: ts, which is a long affricate. It is a complication, because a t followed by i is also pronounced ts, and this can be short and long, so there are two origins for long ts. These two uses should be spelled the same to avoid complications. To make this work, the short [ts] in (Latin script) 'ti' is also spelled ts’ (affricate) to represent the sound overtly, and the long version of it as well as the long cluster is spelled tsts’, with a double affricate letter.

In some Greenlandic dialects, the intervocalic 'q' is pronounced as [ʁ], e.g. 'killeqanngitsoq' may be pronounced [kɪɬ:ɜʁɛŋ:ɪtsɔq]. For the sake of a common spelling, this is not represented in spelling, but q’ is used: kiɬɬi̱qaŋŋitsu̱q’.

There are three phonemic vowels: a i u’. They have a wide range of actual pronunciations, which are mostly non-phonemic. However, uvular consonants have a lasting impact on the preceding vowel, while the consonants themselves may not be pronounced anymore if they are first in a cluster. This is in contrast to several other Inuit languages. To express this in spelling, in Latin script, an 'r' is retained to represent an original uvular consonant, but since it is not really pronounced (although sometimes the vowel sounds diphthongized into a uvular approximant), in this script, the long consonant is spelled as pronounced, and instead the vowel is dotted to show uvular pronunciation. This seems closer to how Greenlandic works today. E.g., 'oqarpoq' u̱qa̱ppu̱q’.

5.2.1 Text in Kalaallisut

[Sample text in Ukaliq script; the glyphs require the Ukaliq font.]

5.2.2 Text in Kalaallisut, Font 2

[Sample text in Ukaliq script; the glyphs require the Ukaliq font.]

5.2.3 Text in Kalaallisut, Font 3

[Sample text in Ukaliq script; the glyphs require the Ukaliq font.]

5.2.4 Text in Kalaallisut, Font 4

5.3 Inuktitut and Inuktun (North Greenlandic)

Inuktitut is closely related to Kalaallisut. Since there are many Inuktitut dialects, the goal of the spelling rules needs to be clarified before starting to define a spelling. One way would be to spell each dialect just as it is pronounced. Another would be to try to unify dialects into a common spelling. The latter is more difficult, because tradeoffs need to be resolved. There will be a tradeoff between unified spelling and clarity of spelling for individual speakers, who may not know how other dialects work, so they may need to learn more to be able to spell their dialect according to the common spelling.

This text will present a proposal to spell each dialect as it is pronounced. A few notes on possible unifications of typical, systematic differences and variations are given.

A major difference between Kalaallisut and Inuktitut is in the vowels: as far as I know, there is no Inuktitut dialect (including Inuktun) that completely removes pronunciation of the uvulars that trigger the uvular vowel pronunciation. For this reason, the dot diacritic is not used for Inuktitut spelling: the vowels are just always spelled ​​a i u’. For dialects that retain a schwa, it can be written thus: ə’.

The consonants needed for a typical Inuktitut dialect are usually: ​​​​​​​​​​​​​p t k q j v s ɣ ʁ m n ŋ ɴ l’. Dialects may have more consonants, like ​ʃ ɻ’, which can then be written as pronounced. Also, Inuktun may need ʔ’ in clusters.

Intervocalic 'q' may be pronounced [χ] in some dialects and maybe [ʁ] in others. In a common spelling, this should not be represented, but q’ should be used. In a spelling for a single dialect, χ’ and ʁ’ may be used. Note, however, that this may cross word boundaries, so initial 'q' may change pronunciation in different contexts (e.g., 'una qimmiujuq' may be [una χɪm:ɪujoq]). For this reason, such a phonetic spelling rule may cause various complications (e.g., for lexical ordering of dictionaries), so morphological spelling with q’ may still be better suited.

Some dialects, including Inuktun (North Greenlandic), have an 'h' sound where other dialects have 's'. In a common or morphophonemic spelling, this should still be written s’, because the original 's' resurfaces in geminates and clusters. On the other hand, to be closer to the actual pronunciation, h’ could be used.

There is also variation in whether 'ti' is pronounced 'tsi', as it is in Kalaallisut. In Inuktun, it is, while in most other Inuktitut dialects, it is usually 'ti' as in 'ati' ati’. For a unified spelling, 'ti' should be spelled ti’; otherwise, tsi’ may be more appropriate for the same reasons as in Kalaallisut: the long version may collapse with the normal ts’ cluster, so spelling tti’ vs. tstsi’ may cause confusion if both are pronounced the same.

Example: ​​​​​​ullukkut! tuŋŋasuɣissi anaanaup tupiŋanut. ittuaŋaujuŋa, una qimmiq.’ And my favorite greeting: aiŋai!

(This needs more elaboration, particularly for the geminates, which are not trivial, e.g., in Inuktun.)

5.4 Catalan

Catalan has basically the typical Romance seven vowels inherited from Vulgar Latin: /a ɛ e i o ɔ u/. In unstressed syllables and as a variant pronunciation in some dialects (non-phonemic), there is also ə’ in Eastern Catalan. This should not be expressed in spelling. In unstressed syllables in general, vowels are reduced to either five /a e i o u/ or three /ə i u/ or something in between.

Since there is dialectal variation in exact pronunciation, and because of the reduction (also with dialectal variation) in unstressed syllables, the vowel symbols proposed to be used are the five a e i o u’, so that nothing much changes compared to Latin spelling. The dot diacritic can be used to mark stressed /ɛ ɔ/, so that the two are spelled ɛ̱ ɔ̱’. This way, the spelling does not change much when stress shifts: in the worst case, the dot is dropped. On the other hand, all phonemic differences are expressed in all dialects. For some dialects, there is still too much distinction in the unstressed vowels -- this can be handled by either ignoring the problem, or by allowing those dialects to use only a i u’ in unstressed syllables (some dialects do lower /ə/ to /a/, so not using ə’ is advised to avoid another vowel symbol).

Diphthongs are written using j w’ as second element, as there seems to be a phonemic difference from two-vowel sequences, and w’ and j’ both occur as consonants (but not contrasting with diphthong usage) in dialects and loanwords.

The consonants seem more straightforward, and the following letters are used: ​​​​​​​​​​​​​​​​​​​m n ɲ ŋ p t k b d g w f z dz dʒ j l ʎ ɾ r’. /b v/ may both be spelled b’ as there is no phonemic difference. And /z s/ may both be spelled z’ for the same reason. ​ɾ r’ are distinguished intervocalically. In initial position, r’ can be used, otherwise, ɾ’ can be used. Geminates are written in dialects where they occur. ​dz dʒ’ are used also for the unvoiced affricates because devoicing is predictable, and geminated ​dz dʒ’ is probably also predictable and does not need to be spelled in this case.

In the following, a mix of Central and Western Catalan pronunciation is used as a kind of idealised pronunciation template. This is probably what is best suited for defining a common spelling. Western is used for the vowels, because it retains five vowels in unstressed syllables, and Central for the consonants, because it reflects the majority pronunciation proposed for spelling here. If in doubt, the same principle could be followed for spelling if there are dialectal differences: Western for vowels, Central for consonants.

Latin     Pronunciation   Ukaliq
gel       ʒɛl             ʒe̱l
gelat     ʒe'lat          ʒelat
pera      pera            pera
perera    pe'rera         perera
peu       pɛu̯             pe̱w
raig      ratʃ            radʒ
dotze     'dodze          dodze

5.5 Tirkunan

Let's do a conlang that I know well, and that is closely related to Catalan: Tirkunan.

Tirkunan has very simple needs for vowels: ​​​​a e i o u’ will be used in stressed syllables and ​​a i u’ in unstressed. Diphthongs, if they occur, are non-phonemic and are not marked.

Consonants are also straightforward: b d g p t k m n ŋ f v s l r j’. The only difference from Latin spelling is that ŋ’ is used, which is phonemic at ends of stems, written 'ng' in Latin. There is no gemination. j’ is only used in monosyllabic words that start with /j/ (like ja’) and in names for writing disyllabic (and foreign) /iji/, like in kijiu’.

5.5.1 Text in Tirkunan

​​​​​​​​​​​​​​​​​​​​​​​

​​​​​​​​​​​​

​​​​​​​​​​​​​​​​​​​​​​​​​​

​​​​​​​​​​​​​

​​​​​​

6 Dot Matrix

Since Ukaliq letter shapes originate from what a 7-segment display can do, I wondered how well other digital displays would work, particularly dot matrix displays. The simplest one for Latin script is the 5x7 pixel display. It is used with LEDs, but more often with VFDs and LCDs. Here's a single 5x7 dot matrix display with all pixels switched on:

Single 5x7 dot matrix displays do exist, but more often, displays come with a few columns and rows on a single LCD, typically 20 columns in 4 rows. Different models of such displays may have different letter and row spacing, but the most common seems to be 1 pixel of letter spacing and 1 or 2 pixels of row spacing. Here's a typical display with 20 columns and 4 rows:

Displaying the main 7 segments well is the most important. The dot diacritic is a second-class citizen, so we will not optimise for its nice appearance, but just put it somewhere. To make the 7 segments nice and clear, the whole size of the display will be used: the segments will occupy the outer ring of pixels, plus the center line. Furthermore, to get close to the proposed smooth shape of the letters, the four corner pixels (top left, top right, bottom left, bottom right) will be switched off if there are adjacent segments switched on, to round the corner off a bit. The diacritic will go somewhere in the bottom right corner. This means that only the following pixels are used (of course, other font designers may use different pixels):
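As a rough illustration of this layout, here is a minimal Python sketch that maps a set of segments onto a 5x7 grid. Several things are assumptions and not taken from the font: the segment letters A to G follow the conventional 7-segment naming (A top, B top right, C bottom right, D bottom, E bottom left, F top left, G middle), a corner pixel is switched off only when both segments meeting at that corner are on, and the dot diacritic is left out because its exact pixel is not specified here.

```python
# Pixel coordinates are (row, column) on a grid of 7 rows and 5 columns.
# Segment names are an assumed conventional labelling, not the font's own.
SEGMENT_PIXELS = {
    "A": [(0, c) for c in range(5)],       # top bar
    "B": [(r, 4) for r in range(0, 4)],    # upper right
    "C": [(r, 4) for r in range(3, 7)],    # lower right
    "D": [(6, c) for c in range(5)],       # bottom bar
    "E": [(r, 0) for r in range(3, 7)],    # lower left
    "F": [(r, 0) for r in range(0, 4)],    # upper left
    "G": [(3, c) for c in range(5)],       # center line
}

# Each corner pixel and the two segments that meet there.
CORNERS = {
    (0, 0): ("A", "F"),
    (0, 4): ("A", "B"),
    (6, 0): ("D", "E"),
    (6, 4): ("C", "D"),
}


def render(segments: str) -> list[str]:
    """Return 7 strings of 5 characters, '#' for lit pixels, '.' for dark."""
    lit = set(segments)
    grid = [[False] * 5 for _ in range(7)]
    for name in lit:
        for row, col in SEGMENT_PIXELS[name]:
            grid[row][col] = True
    # Round the corner off where both adjacent segments are switched on.
    for (row, col), (first, second) in CORNERS.items():
        if first in lit and second in lit:
            grid[row][col] = False
    return ["".join("#" if pixel else "." for pixel in line) for line in grid]


if __name__ == "__main__":
    print("\n".join(render("ABCDEFG")))
```

With all seven segments on, the four corners are dark and the outline looks rounded; with a single segment on, its end pixels stay lit.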

Some text in Kalaallisut:

7 Glyph Variants

This section is mainly for debugging and very low level documentation. It is a table of all stroke variants for all 7-segment patterns (plus dot, just to check that it's working).

Z
A
B
AB
C
AC
BC
ABC
D
AD
BD
ABD
CD
ACD
BCD
ABCD
E
AE
BE
ABE
CE
ACE
BCE
ABCE
DE
ADE
BDE
ABDE
CDE
ACDE
BCDE
ABCDE
F
AF
BF
ABF
CF
ACF
BCF
ABCF
DF
ADF
BDF
ABDF
CDF
ACDF
BCDF
ABCDF
EF
AEF
BEF
ABEF
CEF
ACEF
BCEF
ABCEF
DEF
ADEF
BDEF
ABDEF
CDEF
ACDEF
BCDEF
ABCDEF
G
AG
BG
ABG
CG
ACG
BCG
ABCG
DG
ADG
BDG
ABDG
CDG
ACDG
BCDG
ABCDG
EG
AEG
BEG
ABEG
CEG
ACEG
BCEG
ABCEG
DEG
ADEG
BDEG
ABDEG
CDEG
ACDEG
BCDEG
ABCDEG
FG
AFG
BFG
ABFG
CFG
ACFG
BCFG
ABCFG
DFG
ADFG
BDFG
ABDFG
CDFG
ACDFG
BCDFG
ABCDFG
EFG
AEFG
BEFG
ABEFG
CEFG
ACEFG
BCEFG
ABCEFG
DEFG
ADEFG
BDEFG
ABDEFG
CDEFG
ACDEFG
BCDEFG
ABCDEFG
H
AH
BH
ABH
CH
ACH
BCH
ABCH
DH
ADH
BDH
ABDH
CDH
ACDH
BCDH
ABCDH
EH
AEH
BEH
ABEH
CEH
ACEH
BCEH
ABCEH
DEH
ADEH
BDEH
ABDEH
CDEH
ACDEH
BCDEH
ABCDEH
FH
AFH
BFH
ABFH
CFH
ACFH
BCFH
ABCFH
DFH
ADFH
BDFH
ABDFH
CDFH
ACDFH
BCDFH
ABCDFH
EFH
AEFH
BEFH
ABEFH
CEFH
ACEFH
BCEFH
ABCEFH
DEFH
ADEFH
BDEFH
ABDEFH
CDEFH
ACDEFH
BCDEFH
ABCDEFH
GH
AGH
BGH
ABGH
CGH
ACGH
BCGH
ABCGH
DGH
ADGH
BDGH
ABDGH
CDGH
ACDGH
BCDGH
ABCDGH
EGH
AEGH
BEGH
ABEGH
CEGH
ACEGH
BCEGH
ABCEGH
DEGH
ADEGH
BDEGH
ABDEGH
CDEGH
ACDEGH
BCDEGH
ABCDEGH
FGH
AFGH
BFGH
ABFGH
CFGH
ACFGH
BCFGH
ABCFGH
DFGH
ADFGH
BDFGH
ABDFGH
CDFGH
ACDFGH
BCDFGH
ABCDFGH
EFGH
AEFGH
BEFGH
ABEFGH
CEFGH
ACEFGH
BCEFGH
ABCEFGH
DEFGH
ADEFGH
BDEFGH
ABDEFGH
CDEFGH
ACDEFGH
BCDEFGH
ABCDEFGH
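The labels above follow a binary counting order: entry number n contains the letter A if bit 0 of n is set, B if bit 1 is set, and so on up to H for bit 7, with Z standing for the empty pattern. A small Python sketch that regenerates the same 256 labels (the function name is hypothetical, and reading H as the dot is an assumption based on the "plus dot" remark):

```python
LETTERS = "ABCDEFGH"  # H is presumably the dot; this is an assumption


def variant_label(index: int) -> str:
    """Build the label for pattern number `index` (0..255)."""
    label = "".join(
        letter for bit, letter in enumerate(LETTERS) if index & (1 << bit)
    )
    return label or "Z"  # the empty pattern is labelled Z


labels = [variant_label(i) for i in range(256)]

if __name__ == "__main__":
    print(labels[:8])
```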

8 Download & License

The files in this directory, particularly the Ukaliq font files, are licensed under the terms of the Creative Commons Attribution NonCommercial (CC-BY-NC) 4.0 license (local text file copy). This means you can use this work non-commercially, but not commercially. Do not hesitate to contact me if you want to do more, or if you have any questions or suggestions or remarks: ukaliq@theiling.de .

To download everything in a single file, including fonts and data tables, use the following link:

ukaliq_all.zip

The .zip contains the Unicode tables, the font files, this HTML file, logo images, and a Perl module for handling Ukaliq transliteration.