My current process

When I lost free webspace, my choice of contract gave me a lot more space and I have attempted to automate the process. As before, I use scripts to list the individual codepoints ('glyphs') in a font, and then to produce lists of what these cover.

The process has some errors of omission, but mostly it seems to work adequately. Please note that AFAICS all the fonts here have libre licenses, with two exceptions: the Luxi fonts first made available with XFree86, which cannot be modified, and (possibly) the IPA Fonts, which perhaps cannot.

Unfortunately, despite the partial automation, the process still requires a lot of judgement, particularly when deciding which languages are supported, and I now mention so many fonts that it is not possible for me to regularly look for, and review, newer releases.

Please be aware that I only care about current languages and scripts. I did waste some weeks looking at all of the ancient writing systems in Noto-hinted, but they were a distraction from rendering the text on current web pages. I have a rule of thumb - if I can find pastable HTML text of Article 1 of the UDHR, I generally treat the writing system as current.

What I now do is two-fold :

First I take the list of codepoints and process that to generate a text file containing the glyphs, ordered by their block in unicode and listed in blocks of 32 glyphs (with spaces for absent glyphs). I then open that file with libreoffice writer, change everything to the font I am reporting (provided it covers the English text and digits), add footers and ensure there is a block heading at the start of each new page. Not fully automated, but not too tedious. I then export this to a PDF to show what the font contains.
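A minimal sketch of the first step, assuming the codepoints are already available as a list of integers. The abbreviated block table, the row label format and the choice to skip completely empty rows are assumptions made for illustration, not a description of the actual scripts.

```python
# Abbreviated block table for illustration only: (first, last, name).
# A real run would need the full list of Unicode blocks.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
]

ROW = 32  # glyphs per row, as described above


def cell(cp, present):
    """Return the character for a present, printable codepoint, else a space."""
    ch = chr(cp)
    return ch if cp in present and ch.isprintable() else " "


def glyph_rows(codepoints):
    """Build text lines: a heading per block, then rows of 32 glyphs."""
    present = set(codepoints)
    lines = []
    for first, last, name in BLOCKS:
        if not any(first <= cp <= last for cp in present):
            continue  # the font has nothing in this block
        lines.append(f"{name} (U+{first:04X}-U+{last:04X})")
        for start in range(first, last + 1, ROW):
            row = range(start, min(start + ROW, last + 1))
            if not any(cp in present for cp in row):
                continue  # assumption: rows with no glyphs are omitted
            cells = "".join(cell(cp, present) for cp in row)
            lines.append(f"U+{start:04X} {cells}")
        lines.append("")  # blank line between blocks
    return lines


if __name__ == "__main__":
    # Hypothetical input: A, B and e-acute.
    print("\n".join(glyph_rows([0x41, 0x42, 0xE9])))
```

The resulting lines can be written to a .txt file and opened in a word processor for the font change, footers and page headings.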

For some of the fonts, particularly those from Noto, the font only contains glyphs for one script. For these I usually use Liberation Mono for the English text and the codepoint numbers, because it is my default in libreoffice writer when I open a formatted .txt file (which is what I initially create during this process).

Secondly I use xelatex (from texlive) to list some alphabets (chosen to contain a lot of variant accents and diacriticals, or extra letters, for latin and cyrillic alphabets), to show Article 1 of the UDHR in various languages, and also in other scripts, and to show a few other symbols which might be useful. Then I edit that to produce a smaller file without the many things the font does not contain: I may show some alphabets where some variant glyphs are missing (e.g. Catalan l-with-middle-dot, or Dutch IJ digraph).

Please be aware that when xelatex typesets a monospace font it can add micro-spacing to justify the text.

I try to trim the quotation mark variants to only what is supported (except I always show ASCII single and double quotes even if they are not supported), but for some other symbols I will show everything including spaces or other marks where the symbol is not present. However, if a font only covers one script and does not cover ASCII then I will create a separate smaller file without quotation marks and symbols.

I do not normally make any attempt to improve the layout of the Article 1 text in latin or cyrillic alphabets - in some fonts text may extend into the right margin. Nor do I normally reset my English text comments where the hyphenation looks odd.

Why I mention particular languages

If they are present, the European latin alphabets are used for the following purposes:

Azeri : schwa as well as dotted/dotless i and g with breve (and yes, some geographers consider that Azerbaijan is in Europe).
Catalan : precomposed l with middle dot, separate middle dot (the precomposed versions are uncommon and deprecated, so I show catalan even if those are missing)
Czech : carons, particularly on e, n, r
Danish : ae, o with stroke, a with ring
Dutch : the ij digraph (again, uncommon)
French : the usual accents, c-cedilla, ae ligature, oe ligature, also n-tilde
German : umlauts on a,o,u and sharp-s (ß)
Hungarian : double acutes on o and u
Icelandic : eth, thorn, ae
Italian : a few more accents
Latvian : cedillas on other letters, macrons
Lithuanian : dotted e, i and u with ogonek
Maltese : c and g with dot above, h-stroke
Northern Sami : d and t with stroke, eng
Polish : ogoneks on a,e, acute on c,n,o,s,z, l-stroke, dotted z
Portuguese : tilde on a,o
Romanian : a with breve, s and t with comma below / the incorrect s and t with cedilla
Serbo-Croat : d-stroke, digraphs for d-z-caron, lj, nj (the digraphs are uncommon so again I show the alphabet even where they are not present).
Slovenian : carons on (only) c,s,z - at the moment I don't think I have any fonts which can do Slovenian but not Czech nor Serbo-Croat.
Spanish : n tilde. I could have dropped this since I cover that in French and no fonts support Spanish but not French - too late!
Turkish : dotted/dotless i, g with breve [ only shown if azeri is not supported ]
Welsh : accents and diaeresis on w and y

NB - I use 'serbo-croat' to cover Bosnian, Croatian, Montenegrin, Serbian. For other writing systems I follow the Unicode naming, even where the names have now fallen into disfavour.

Non-European languages using variations of latin alphabets (various African alphabets, also Vietnamese) are covered separately at the end of the PDF languages files for those fonts which support them.
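The kind of check behind the latin list above can be sketched as follows. The letter sets are deliberately incomplete and hypothetical (a few distinguishing letters per language, not full alphabets), and the function names are made up for this example. The last function applies the rule that Turkish is only shown if Azeri is not supported.

```python
# Illustrative, incomplete requirements: only some distinguishing
# letters for a handful of the languages listed above.
REQUIRED = {
    "Azeri": "əƏıİğĞ",
    "Czech": "ěňřĚŇŘ",
    "Hungarian": "őűŐŰ",
    "Icelandic": "ðþæÐÞÆ",
    "Turkish": "ıİğĞ",
}


def missing_letters(codepoints, required=REQUIRED):
    """Map each language to the required letters the font lacks."""
    present = set(codepoints)
    return {
        language: [ch for ch in letters if ord(ch) not in present]
        for language, letters in required.items()
    }


def languages_to_show(codepoints, required=REQUIRED):
    """List fully covered languages, hiding Turkish when Azeri is covered."""
    missing = missing_letters(codepoints, required)
    shown = [language for language, gaps in missing.items() if not gaps]
    if "Azeri" in shown and "Turkish" in shown:
        shown.remove("Turkish")
    return shown


if __name__ == "__main__":
    # Hypothetical font covering only the Turkish and Hungarian letters.
    font = {ord(ch) for ch in "ıİğĞőűŐŰ"}
    print(languages_to_show(font))
```

A check like this only flags candidates. Cases such as the Catalan or Dutch variants, where the alphabet is shown even with glyphs missing, still need judgement.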

Similarly, the cyrillic alphabets are used for the following purposes:

Abkhazian : ghe, ka, pe, te, xa with descender (or ghe, pe with old middle hook), ka with stroke, abkhazian chei, lowercase schwa. Where a font does not include the ghe and pe versions with descender, I now consider that the font is not suitable - enough time has elapsed. The Article 1 text I copied included the old middle hook, but while checking for updates of UDHR translations in 2023 I updated that character.
Adyghe : palochka
Kazakh : straight u, ghe with stroke, barred o, straight u with stroke, en with descender. The Article 1 text I pasted has non-breaking hyphens, in some fonts those are not present. I have attempted to correct this by replacing them with dashes and attempting to retain the formatting from the default glyph with often overlong lines, but in one or two cases I have reformatted the text so that the word is at the start of a line to avoid xelatex breaking it.
Macedonian : gje, kje (ghe and ka with acute), lje, nje and ie and i with grave.
Serbo-Croat : dje, tshe, lje, nje.
Tatar : schwa, en and zhe with descenders. When I started doing this I apparently pasted an uppercase zhe with descender in place of the intended lowercase letter; it took me until October 2023 to notice that.
Ukrainian : ghe with upturn, ukrainian ie, i, yi.

About my classifications and choices of styles

I concentrate on the Regular style shown in fontconfig - many of these fonts have other weights such as Bold. I am now starting to also cover italic faces where available; fuller details of italic coverage will be added when I create the updated main lipsum files. A few fonts, particularly CJK fonts, do not have a font labelled as Regular. For those I take the most-regular weight if there is more than one.

For latin fonts, wikipedia indicates that Serif and Sans-Serif fonts are traditionally classified in several types. For serif the main classes for normal text are Old-Style, Transitional, and Modern (or 'Didone').

For sans-serif the common fonts are typically Neo-Grotesque or Humanist, but a few are Geometric. I have now found some 'Grotesque' fonts, but for simplicity I will batch these with the Neo-Grotesques when I eventually rework the Lorem Ipsum examples. Because I am not a typographer, where I could not find any web pages identifying the classification of a font, or of the font on which it is based, I have made a guess.

For monospace I have currently divided it into Sans and Serif - many 'Sans' monospace fonts include some serifs to distinguish certain things, e.g. 1,I,l (digit one, uppercase I, lowercase l).

For cyrillic and greek fonts I am unsure if the same distinctions are appropriate.

2023 Revisions to the languages files

When I started to look at updating, or adding, fonts in 2023 I discovered that the 'pilcrow' codepoint I had been showing (or, more commonly, indicating as not present) was in fact a 'reversed pilcrow' (unicode seems to enjoy adding obscurities). I decided to revise the languages files for current versions of all 'main' fonts (i.e. those covering at least several languages using latin alphabets), but not CJK fonts.

I took the opportunity to show all 26 letters in the italic font, where available.

I also aim to show the version (if known) or else the date of the font in the languages file, and also in the glyphs file where the font is either new to me, or a newer version.

Because I am using xelatex to render example text, it amuses me to show Small Capitals where they are available. Very occasionally a font has a separate SC file; in those cases I can look to see which codepoints are available, and compare those to the main font. More usually, the Small Caps are in a Private Use area, or unmapped, and I could not determine exactly which codepoints were available. In October 2023 I got some suggestions for identifying them and decided to use otfinfo. I'm now working through the Transitional Serif fonts doing this, and also reporting on the available font weights, coverage of italics, and whether small caps italics exist. I intend to work through the (latin,cyrillic) fonts one style at a time and then update the details for that batch. However, for a few fonts such as SourceSerif4 I am unable to find any glyph name information using otfinfo.
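One way the otfinfo step might be scripted is sketched below. It assumes that `otfinfo -g` prints one glyph name per line and that small-cap glyphs carry a suffix such as `.sc` or `.smcp`. That suffix convention is only an assumption, varies between fonts, and cannot help where a font carries no glyph names at all. The function names are hypothetical.

```python
import subprocess

# Assumed naming convention for small-cap glyphs; real fonts vary.
SMALL_CAP_SUFFIXES = (".sc", ".smcp", ".c2sc")


def list_glyph_names(font_path):
    """Return the glyph names printed by `otfinfo -g` for one font file."""
    result = subprocess.run(
        ["otfinfo", "-g", font_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.split()


def small_cap_glyphs(glyph_names):
    """Pick out names that end in one of the assumed small-cap suffixes."""
    return sorted(n for n in glyph_names if n.endswith(SMALL_CAP_SUFFIXES))


def small_cap_bases(glyph_names):
    """Return the base names (text before the first dot) with a small-cap form."""
    return sorted({n.split(".")[0] for n in small_cap_glyphs(glyph_names)})


if __name__ == "__main__":
    # Hypothetical glyph names standing in for real otfinfo output.
    names = ["a", "a.sc", "b.smcp", "one.tnum", "eth.sc"]
    print(small_cap_bases(names))
```

An empty result does not prove that a font has no small caps, only that none follow the assumed naming.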

I have also updated one or two of the texts for Article 1 of the Universal Declaration of Human Rights, and made some corrections to some of my example cyrillic alphabets.

With xelatex from TeXLive 2023 I notice that monospaced fonts get added spaces to justify Article 1, at least in latin languages; sometimes letters run closer together than I was expecting (e.g. precomposed catalan l with middle dot followed by l); and the cyrillic Article 1 text often overflows into the right margin for less-common languages. And the dot-below diacriticals in Yoruba seem to be a mix of '+' (I have seen that on the web in sites which cover the Yoruba language) and 'dot', and/or a dot to the side in some fonts.

Perhaps I should mention that I try not to break up the details for a subsection across multiple pages, so some of the pages might be quite short. I would like to keep both greek variants (monotonic, polytonic) on a page, but often the text is too long so I put polytonic on a separate page. And for the Pan-Nigerian alphabet I allow some languages to overflow onto a new page rather than trying to fit all the supported examples on one page. On some of the old files which I have not updated, the paging is rather more random with items split across pages.