– Typee Documentation –

3 – Typee Language Description

3.3 – Literals and Constants

This is a more formal description of the construction of literals, i.e. how to form constant values in Typee.

3.3.1 Boolean Literals

Boolean literals are built-in constant names: True and False.
To ease typing and to accommodate common programming habits, these boolean literals may also be written true and false.

The EBNF (Extended Backus-Naur Form) expression for boolean literals is:

<boolean> ::= <TRUE>  | <FALSE>
<TRUE>    ::= 'True'  | 'true'
<FALSE>   ::= 'False' | 'false'

3.3.2 Character Literals

Whatever the type of a character literal (either char or char16), it is always written with the same syntax. A character literal is always enclosed within single quotes or double quotes. In between, the literal is defined this way:

<single char>        ::= "'" (<any escaped char> |
                              <any source character except '\', newline 
                                  or single-quote>) "'"  |
                         '"' (<any escaped char> |
                              <any source character except '\', newline 
                                  or double-quote>) '"'

<any escaped char>   ::= '\' (<any source character except newline or '0'> |
                              "0" <octal or hexa char>)

<octal or hexa char> ::= <octal char> <octal char> <octal char> |
                         ('x'|'X') <hexa char> <hexa char>
                             [<hexa char> <hexa char>]

<hexa char>          ::= '0'...'9'  |  'A'...'F'  |  'a'...'f'
<octal char>         ::= '0'...'7'
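
The rules above can be sketched as a regular expression. This is only an illustrative Python transcription of the grammar, not the Typee front-end itself; the names ESCAPE, CHAR_LITERAL and is_char_literal are hypothetical and chosen for this example.

```python
import re

# Escape sequence: a backslash followed either by any character other than
# newline or '0', or by '0' and an octal (ooo) or hexadecimal (xHH / xHHHH) code.
ESCAPE = r"\\(?:[^\n0]|0(?:[0-7]{3}|[xX][0-9A-Fa-f]{2}(?:[0-9A-Fa-f]{2})?))"

# One escaped char, or one plain char that is not '\', newline or the closing quote.
SINGLE_QUOTED = r"'(?:" + ESCAPE + r"|[^\\\n'])'"
DOUBLE_QUOTED = r'"(?:' + ESCAPE + r'|[^\\\n"])"'

CHAR_LITERAL = re.compile(SINGLE_QUOTED + "|" + DOUBLE_QUOTED)


def is_char_literal(text: str) -> bool:
    """Return True when text is exactly one character literal (hypothetical helper)."""
    return CHAR_LITERAL.fullmatch(text) is not None
```

With this sketch, 'a', "a", '\n' and '\0x10ff' are accepted, while 'ab', an empty pair of quotes, or a literal opened with one kind of quote and closed with the other are rejected.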

The type of a character literal is inferred from its content and its environment. If it is assigned to a typed variable, it has to conform to the type of this variable. For instance, if a character literal is assigned to a char variable, which holds 8-bit characters, the literal is considered to be of type char. No char16 literal is assignable to this variable (a type-checking error is detected at translation time). When the literal is assigned to a char16 variable, its inferred type is char16.

In other words, a character literal is considered to be either an 8-bit or a 16-bit character according to the variable it is assigned to, but its inferred type is char16 as soon as it contains a 16-bit character.

Any 8-bit character can be specified with the octal notation \0ooo, where o is any octal digit (from 0 to 7). 16-bit characters can be specified, in character literals, with the escape sequence \0xHHHH, where H is any hexadecimal digit (i.e. any character within “0123456789abcdefABCDEF”). Whenever a character literal is specified with a four-digit hexadecimal escape sequence, it is considered to be a char16 character literal and is assignable only to a char16 variable. A type-checking error is detected at translation time if the assigned variable is not of this type.
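
As an illustration of this rule, here is a minimal Python sketch of the check a translator might perform. It is an assumption-laden model, not Typee's actual type checker: the helper names forces_char16 and can_assign are hypothetical, and the sketch only looks at the four-digit hexadecimal escape, ignoring 16-bit characters typed directly in the source.

```python
import re

# \0xHHHH: the four-digit hexadecimal escape described above.
FOUR_DIGIT_HEX = re.compile(r"\\0[xX][0-9A-Fa-f]{4}")


def forces_char16(literal_body: str) -> bool:
    """True when the text between the quotes is a four-digit hex escape."""
    return FOUR_DIGIT_HEX.fullmatch(literal_body) is not None


def can_assign(literal_body: str, target_type: str) -> bool:
    """Model of the type check: a char16-only literal fits only a char16 variable."""
    if forces_char16(literal_body):
        return target_type == "char16"
    # Any other literal takes the type of the variable it is assigned to.
    return target_type in ("char", "char16")
```

Under these assumptions, can_assign(r"\0x10ff", "char") is false, which corresponds to the type-checking error mentioned above, while a plain T is assignable to both char and char16 variables.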

Casting is also a way to specify the type of a character literal. Casting transforms the type of an object into another type. The syntax is simple; for character literals it would be:

char16( 'T' );

Caution: do not attempt to cast a 16-bit character literal into an 8-bit one. A type-checking error would be detected at translation time.

char( '\0x10ff' )  // --> type error !

Finally, the built-in library String provides functions and methods to specify the type of a character or of a string literal. This way, the programmer can force the type of a character literal at translation time. See the documentation of this library.

The most commonly used escape sequences are:

\\  backslash             (i.e. "\")
\'  single quote          (i.e. "'", to be used in literal '\'')
\"  double quote          (i.e. '"', to be used in literal "\"")
\n  ASCII linefeed        (i.e. LF, often misnamed 'newline')
\r  ASCII carriage return (i.e. CR, which, combined with LF, forms the newline sequence on some operating systems)
\t  ASCII horizontal tab  (i.e. TAB)
\v  ASCII vertical tab    (i.e. VTAB, rarely used today)
\b  ASCII backspace       (i.e. BS, seldom used today)
\a  ASCII bell            (i.e. BEL, with a standing for alarm)

3.3.3 String Literals

String literals are the counterpart of character literals. A string is a series of characters, and a string literal is a literal that contains a series of characters.

As with character literals, a string literal starts and ends with either single quotes or double quotes, with no mixing of both types of quotes. String literals may also contain escape sequences.

<string>        ::= <single string> (<single string>)*

<single string> ::= "'" (<any escaped char> |
                              <any source character except '\', newline 
                                  or single-quote>)* "'"  |
                     '"' (<any escaped char> |
                          <any source character except '\', newline 
                              or double-quote>)* '"'

According to the previous specification, string literals may be split across several lines of code, written as a succession of single strings separated by spaces, tabs or newlines:

const str16 k_text = "This is a "
"multi-lines string literal.\n"
'This kind of notation is useful when ' "specifying very long string constants.";
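
The concatenation of successive single strings can be sketched in Python as follows. This is a minimal illustration of the grammar rule, not the Typee implementation: the names SINGLE_STRING and join_string_literal are hypothetical, and escape sequences are deliberately left undecoded.

```python
import re

# Same escape rule as for character literals.
ESCAPE = r"\\(?:[^\n0]|0(?:[0-7]{3}|[xX][0-9A-Fa-f]{2}(?:[0-9A-Fa-f]{2})?))"

# A single string: group 1 holds a single-quoted body, group 2 a double-quoted one.
SINGLE_STRING = re.compile(
    r"'((?:" + ESCAPE + r"|[^\\\n'])*)'"
    + "|"
    + r'"((?:' + ESCAPE + r'|[^\\\n"])*)"'
)


def join_string_literal(source: str) -> str:
    """Concatenate the bodies of adjacent single strings (escapes left undecoded)."""
    parts = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1  # spaces, tabs and newlines may separate single strings
            continue
        match = SINGLE_STRING.match(source, pos)
        if match is None:
            raise ValueError(f"not a string literal at offset {pos}")
        # Exactly one of the two groups is set, depending on the quote style.
        body = match.group(1) if match.group(1) is not None else match.group(2)
        parts.append(body)
        pos = match.end()
    if not parts:
        raise ValueError("no string literal found")
    return "".join(parts)
```

Applied to the three source lines of the example above (without the declaration and the final semicolon), this function would return one single text in which the four pieces follow each other directly.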

The type of a string literal is inferred from its environment as well as from its content. As with character literals, if a string literal is assigned to a typed variable, it has to conform to the type of this variable. For instance, if a string literal is assigned to a str variable, which holds 8-bit characters, the literal is considered to be of type str. No str16 literal is assignable to this variable (a type-checking error is detected at translation time). When the literal is assigned to a str16 variable, its inferred type is str16.

In other words, a string literal is considered to be either an 8-bit or a 16-bit string according to the variable it is assigned to, but its inferred type is str16 as soon as it contains any 16-bit character.

Casting is also a way to specify the type of a string literal. Casting transforms the type of an object into another type. The syntax is simple; for string literals it would be:

str16( 'This is now a 16-bit string literal!' );

Caution: do not attempt to cast a 16-bit string literal into an 8-bit one. A type-checking error would be detected at translation time.

The rule actually implemented for string literals is the following one, which is slightly more complex than the one shown above:

<string> ::= <single string> (<single string>)* 
                ('.' <identifier> <function arguments list>)*

This specification shows that, in Typee, strings are associated with built-in functions that can be applied directly to strings, as if strings were instances of a dedicated class. This is true for string variables as well as for string literals. We will study this later. For now, let’s just be aware of it and look at a short example:

print( 'Let\'s try something'.to_upper().replace('E','ee') );

Once translated to some programming language, compiled if needed and finally run, this should print

LeeT'S TRY SOMeeTHING
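
The same chain of calls can be checked in Python. This is only a cross-check of the expected output, assuming that Typee's to_upper() and replace() behave like Python's str.upper() and str.replace(); those Python methods are stand-ins, not the Typee built-ins.

```python
# Python stand-ins for the Typee calls: upper() for to_upper(), replace() for replace().
result = "Let's try something".upper().replace("E", "ee")
print(result)
```

The upper-casing is applied first, so the only remaining lowercase letters in the result are the ones introduced by the replacement.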

Strings vs. Characters

Strings are not characters: they are series of characters. Characters are not strings: they are parts of strings. Consequently, it is NOT allowed to assign a character to a string or a string to a character. However, any indexed string (indexed, not sliced) IS a character. The following code is legal:

str16 s = "abcd";
s[2] = 'E';
print( s );

and would print abEd. Moreover, s[i], for any i in 0, 1, 2 and 3, is a char16 in the above code, and the literal 'E' is considered to be of type char16.

Finally, the built-in library String provides functions and methods to specify the type of a character or of a string literal. This way, the programmer can force the type of a string literal at translation time. See the documentation of this library.

3.3.4 Integer Literals

Any programmer knows how to write integer literals: they start with a digit and contain only digits. But what about digits?
In Typee, integer numbers are accepted in bases 2, 8, 10 and 16, as in nearly every other programming language.

The formal specification of the related grammar rules is:

<integer number>     ::= '1'...'9' [['_'] <decimal part>]
                      |  '0' [<octal hexa binary>]

<decimal part>       ::= <num char> (['_'] <num char>)*

<octal hexa binary>  ::= <octal number>  |
                         ('b' | 'B') <binary number>  |
                         ('x' | 'X') <hexadecimal number>

<hexa char>          ::= <num char>  |  'A'...'F'  |  'a'...'f'

<binary char>        ::= '0'  |  '1'
<binary number>      ::= <binary char> (['_'] <binary char>)*
<hexadecimal number> ::= <hexa char> (['_'] <hexa char>)*
<num char>           ::= '0'...'9'
<octal char>         ::= '0'...'7'
<octal number>       ::= <octal char> (['_'] <octal char>)*

As in Python (at least), integer literals accept underscores between digits, but never as the first or the last character of the literal. Underscores may be used as a convenience to make literals easier to read.

Examples of legal integer literals are:

0  12  345  6_789
0b1010_0110  0777  0x0123_4567_89ab_CDEF
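
To make the rules concrete, here is a minimal Python sketch that recognizes such literals and computes their values. It is an illustration only, written so that every example above is accepted; it assumes that a lone 0 is legal, that an underscore may directly follow the leading decimal digit, and that binary digits are 0 and 1. The names INTEGER and integer_value are hypothetical.

```python
import re

INTEGER = re.compile(
    r"(?P<dec>[1-9](?:_?[0-9])*)"                     # decimal, no leading zero
    r"|0[bB](?P<bin>[01](?:_?[01])*)"                 # binary, 0b prefix
    r"|0[xX](?P<hex>[0-9A-Fa-f](?:_?[0-9A-Fa-f])*)"   # hexadecimal, 0x prefix
    r"|0(?P<oct>[0-7](?:_?[0-7])*)?"                  # octal, or the lone '0'
)


def integer_value(text: str) -> int:
    """Return the value of an integer literal, or raise ValueError."""
    match = INTEGER.fullmatch(text)
    if match is None:
        raise ValueError(f"not an integer literal: {text!r}")
    for name, base in (("dec", 10), ("bin", 2), ("hex", 16), ("oct", 8)):
        digits = match.group(name)
        if digits is not None:
            # Underscores are only visual separators: drop them before converting.
            return int(digits.replace("_", ""), base)
    return 0  # no digit group matched: this is the lone literal '0'
```

In this sketch, 0777 is read in base 8 and 0b1010_0110 in base 2, while texts such as _12, 12_ or 1__2 are rejected because an underscore may only appear between two digits.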

Please notice that integer literals are never signed. Signed integers may be used in expressions and assignments, but the minus sign preceding any integer literal is the unary operator - in Typee.

Integer literals always implicitly conform to the largest type used in the expression they appear in. When they appear alone on the right side of an assignment, they conform to the type of the variable they are assigned to. When they are assigned alone to an auto-typed variable, they are considered to be of the smallest integer size (8, 16, 32 or 64 bits), and the variable then gets this type as its current type.

3.3.5 Float Literals

In Typee, float literals follow the classic notation. Floats are coded on either 32 or 64 bits, and their types are named float32 and float64.

The formal description of their grammar rules is:

<float literal>     ::= <num char> [['_'] <decimal part>]
                            [<fraction part>] [<exponent part>]

<decimal part>      ::= <num char> (['_'] <num char>)*
<fraction part>     ::= '.' [<decimal part>]
<exponent part>     ::= ('e' | 'E') ['+' | '-'] <decimal part>
<num char>          ::= '0'...'9'

As in Python (at least), float literals accept underscores between digits, but never as the first or the last character of the part of the literal they belong to. Underscores may be used as a convenience to make literals easier to read.

Examples of legal float literals are:

0   0.  0.0   1.23  4.567_89
1_023.456_789
1.2e3   1.2e+3   1.2E-3
1E23    123e-2
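
A minimal Python sketch that accepts exactly the kind of literals listed above may help here. It is an illustration, not the Typee lexer: it assumes that the fraction digits may be omitted after the dot (as in 0.) and that an exponent may follow the integer digits directly (as in 1E23), since both forms appear among the examples. The names DIGITS, FLOAT and float_value are hypothetical.

```python
import re

# One or more digits, with at most one underscore between two digits.
DIGITS = r"[0-9](?:_?[0-9])*"

FLOAT = re.compile(
    DIGITS
    + r"(?:\.(?:" + DIGITS + r")?)?"       # optional fraction part
    + r"(?:[eE][+-]?" + DIGITS + r")?"     # optional exponent part
)


def float_value(text: str) -> float:
    """Return the value of a float literal, or raise ValueError."""
    if FLOAT.fullmatch(text) is None:
        raise ValueError(f"not a float literal: {text!r}")
    # Underscores are only visual separators: drop them before converting.
    return float(text.replace("_", ""))
```

Note that a leading sign is rejected by this sketch, in line with the fact that float literals are never signed.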

Please notice that float literals are never signed. Signed floats may be used in expressions and assignments, but the minus sign preceding any float literal is the unary operator - in Typee.

Float literals always implicitly conform to the largest float type used in the expression they appear in. When they appear alone on the right side of an assignment, they conform to the type of the variable they are assigned to. When they are assigned alone to an auto-typed variable, they are considered to be of the smallest float size (float32 or float64), and the variable then gets this type as its current type.

The next section formally explains identifier naming and the construction of expressions.
