Character set encoding and rendering — ASCII/Unicode and code page

Character encoding: Source: https://www.digital-detective.net/character-encoding-quick-primer/

Character encoding — Why ?

If you use anything other than the most basic English text, people may not be able to read the content you create unless you say what character encoding you used.

What is character encoding?

Words and sentences in text are created from characters. Examples of characters include the Latin letter á, the Chinese ideograph 請, and the Devanagari character ह. For a computer to refer to characters in an unambiguous way, each character is associated with a number, called a code point. The characters needed for a specific purpose (typically to represent a language) are grouped into a character set.
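As a minimal sketch of the character-to-number mapping described above, the following Python snippet looks up three sample code points (chosen here purely for illustration) and shows that chr() and ord() convert between a character and its code point. The variable name `table` is an assumption of this example, not something from the source.

```python
import unicodedata

# Three sample code points: a Latin letter, a Chinese ideograph, a Devanagari letter.
sample_code_points = (0x00E1, 0x8ACB, 0x0939)

table = {}
for cp in sample_code_points:
    ch = chr(cp)                 # code point -> character
    assert ord(ch) == cp         # character -> code point (round trip)
    table[cp] = (ch, unicodedata.name(ch))
    print(f"U+{cp:04X}  {ch}  {unicodedata.name(ch)}")
```

The code point is only an abstract number; how that number is stored as bytes is decided separately by the encoding.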


Font files and the mapping of glyphs to code points:

A font is a collection of glyph definitions, i.e., definitions of the shapes used to display characters. Once your browser or app has worked out which characters it is dealing with, it looks in the font for glyphs it can use to display or print those characters. (Of course, if the encoding information was wrong, it will look up glyphs for the wrong characters.)
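The effect of wrong encoding information can be sketched in a few lines of Python. This example assumes text that was saved as UTF-8 but is then read as Windows-1252 (both encodings are picked here only for illustration): the bytes are unchanged, yet the application ends up asking the font for glyphs of different characters.

```python
text = "\u00e1"                      # the Latin letter a with acute accent

raw = text.encode("utf-8")           # bytes as stored on disk: C3 A1
right = raw.decode("utf-8")          # decoded with the correct encoding
wrong = raw.decode("cp1252")         # decoded with the wrong encoding

print(raw.hex(" "))                  # the two stored bytes
print(repr(right), [f"U+{ord(c):04X}" for c in right])
print(repr(wrong), [f"U+{ord(c):04X}" for c in wrong])
```

With the wrong encoding, one character turns into two unrelated ones, and the font faithfully renders glyphs for those instead.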

ASCII

Most of us use ASCII by default without being aware of exactly what it means or how it relates to displaying encoded characters. The American Standard Code for Information Interchange (ASCII) is a 7-bit character encoding based on the English alphabet: its 128 code points cover the digits 0–9, the letters a-z and A-Z, some basic punctuation symbols, and a set of control codes.
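A short Python sketch of what this means in practice: plain English text encodes to byte values below 128, while a character outside the ASCII repertoire cannot be encoded at all. The sample strings and variable names are assumptions made for this illustration.

```python
# English text fits in ASCII: every byte value is below 128.
ascii_values = list("Hello".encode("ascii"))
print(ascii_values)

# A character outside the ASCII repertoire cannot be encoded.
try:
    "\u00e1".encode("ascii")         # Latin letter a with acute accent
    non_ascii_rejected = False
except UnicodeEncodeError as exc:
    non_ascii_rejected = True
    print("not representable in ASCII:", exc.reason)
```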

Code page

A ‘code page’ is a mapping of numeric values to the characters of a character set (typically for encoding a particular language). It started with IBM assigning unique numbers to the character mappings of its EBCDIC encoding scheme for mainframe systems; later, other system vendors defined their own schemes for character encoding.
A ‘code page’ can also be viewed as the graphical glyph set used for rendering an encoded character. These code pages were originally embedded directly in the text-mode hardware of the graphics adapters used with the IBM PC and its clones.
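To illustrate how the same stored value maps to different characters under different code pages, here is a small Python sketch. The byte 0xE9 and the three code pages are chosen only as examples; Python ships codecs for all of them, including the EBCDIC code page cp037 used in the last line.

```python
byte = bytes([0xE9])

# Decode the same byte value under three different code pages.
decoded = {cp: byte.decode(cp) for cp in ("cp437", "cp1252", "cp1251")}
for cp, ch in decoded.items():
    print(f"{cp}: 0x{byte[0]:02X} -> {ch} (U+{ord(ch):04X})")

# The reverse also holds: the same letter gets different byte values
# in an ASCII-based encoding and in an EBCDIC code page.
letter_a = {"ascii": "A".encode("ascii"), "cp037": "A".encode("cp037")}
print(letter_a)
```

Without knowing which code page was used to write the data, a reader has no way to tell which of these characters was intended.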

Unicode and Non-Unicode Encoding of Indic Languages

To view text in any language encoding scheme correctly in a browser, Notepad, or any other text-processing application, the following are mandatory:

  • Font files for the OS/application of choice

More details: