– Typee Documentation –

3 – Typee Language Description

3.4 – Identifiers naming and Expressions

Let’s first specify the naming conventions for identifiers. We will then be able to fully specify how expressions are constructed in Typee.

3.4.1 Identifiers naming conventions

Identifiers in Typee are specified as in many other computer languages. An identifier begins with either a letter or an underscore, followed by any series of letters, decimal digits and underscores. Identifiers are case sensitive in Typee. Programmers should be cautious about this when the targeted programming language is case insensitive (this is not the case for the first targets, i.e. Python, C++ and Java). Finally, there is no limit on the number of characters in an identifier. Here again, programmers should be cautious since some programming languages do not accept very long identifiers.

The EBNF formal specification of identifiers in Typee is described here below:

<identifier>     ::= (<alpha char> | '_') (<alpha num char> | '_')*

<alpha char>     ::= 'A'...'Z'  |  'a'...'z'
<alpha num char> ::= <alpha char>   |  <num char>
<num char>       ::= '0'...'9'
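
As an illustration, the <identifier> rule above can be checked with a regular expression. The following is a minimal Python sketch and not part of any Typee tool; the names IDENTIFIER_RE and is_typee_identifier are hypothetical and only serve the example.

```python
import re

# Assumed translation of the <identifier> rule: a letter or underscore,
# followed by any number of letters, decimal digits and underscores.
IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_typee_identifier(name: str) -> bool:
    """Return True if the whole string matches the <identifier> rule."""
    return IDENTIFIER_RE.fullmatch(name) is not None


print(is_typee_identifier("_tmp1"))  # accepted: starts with an underscore
print(is_typee_identifier("1abc"))   # rejected: starts with a digit
```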

Should some targeted computer language not accept identifiers formed this way, programmers are encouraged to name their identifiers using only the characters that this language accepts. Currently, Typee translators do not modify the identifiers you use, but this could change in the future to help translation to programming languages that are not yet targeted.

What are identifiers for? In Typee, they are used to name constants, variables, functions, classes and methods. Once this and their naming conventions are understood, we can explain how expressions are constructed.

3.4.2 References

References are a convenient way to refer to the same entity or object with different identifiers. There are cases where this is really valuable.

In Typee, references are assigned with token @ and can only be applied to identifiers.

<reference>   ::= '@' <dotted name>
<dotted name> ::= <identifier> ('.' <identifier>)*

Examples:

str s = "abc";
? s_ref = @s;       // s_ref is now of type str
print( s, s_ref );  // prints: abc abc
s_ref[1] = 'B';
print( s, s_ref );  // prints: aBc aBc
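
For readers coming from Python, the behaviour of a reference is close to binding a second name to the same mutable object. The sketch below is only an analogy under that assumption: Python strings are immutable, so a bytearray stands in for the Typee str.

```python
# A mutable bytearray stands in for the Typee str of the example above.
s = bytearray(b"abc")
s_ref = s                # second name bound to the same object, like s_ref = @s
s_ref[1] = ord("B")      # modify the content through the second name
result = (s.decode(), s_ref.decode())
print(*result)           # both names show the modified content: aBc aBc
```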

3.4.3 Expressions – Introduction

There are many kinds of expressions in every computer language, and Typee is no different. The formal specification of expressions, even in EBNF, may be very complex. We therefore split the associated grammar rules into different kinds of expressions.

A generic specification of expressions in Typee, formally expressed in its EBNF form, is:

<expression> ::= <arithmetic expression> |
                 <boolean expression> |
                 <string expression> |
                 <special assignment expression>

Most types of Typee expressions are Python-like. In the next sub-section, we specify arithmetic expressions. We then specify boolean expressions, string expressions and special assignment expressions. Comprehensions are explained in the sub-section after these. Attribute reference, subscription, slicing and function (and method) call expressions are specified next. Finally, we specify unnamed functions. For each kind of expression, we also specify the built-in operators, their meaning and their precedence over others.

3.4.4 Arithmetic Expressions

Arithmetic expressions are expressions that finally evaluate to an arithmetic value, i.e. an integer or a float value. They mix entities of an integer or a float type, arithmetic operators and enclosing parentheses. Those entities may be integer literals, float literals, identifiers of variables or of constant values of an integer or a float type, and function or method calls that return an integer or a float value.
Furthermore, unary operators can be applied to identifiers of integer and float variables, to literals, to identifiers of constant values and to function and method calls. Unary operators -- and ++ are the exception: they can only be applied to identifiers of integer and float variables. Both operators are in-place modifiers and therefore cannot be applied to literals, to identifiers of constant values or to function and method calls.

Remember, built-in integer and float types in Typee are:

integer types     float types
-------------     -----------
int8 , uint8
int16, uint16
int32, uint32     float32
int64, uint64     float64

and are associated with the generic scalar types:

_int_, _uint_, _float_ and _numeric_

3.4.4.1 Operators precedence

Arithmetic operators have a precedence order between themselves. From highest to lowest precedence, the arithmetic operators in Typee are:

<op_unary>  ::= '++'  |  '--'  |  '+'  |  '-'  |  '~'  // higher precedence
<op_power>  ::= '**'  |  '^^'
<op_mul>    ::= '*'   |  '/'  |  '%'  |
                '@'   |  '><'  |  '!!'  |  '::'
<op_add>    ::= '+'   |  '-'
<op_shift>  ::= '<<'  |  '<<<'  | '>>'  |  '>>>'
<op_bitand> ::= '&'
<op_bitxor> ::= '^'
<op_bitor>  ::= '|'                                    // lower precedence

Enclosing parentheses, ( and ), finally take precedence over all the arithmetic and unary operators.

The EBNF formal specification of arithmetic expressions in Typee is this:

<arithmetic expr>       ::= <bitor expr>

<bitor expr>            ::= <bitxor expr> ( '|' <bitxor expr> )*
<bitxor expr>           ::= <bitand expr> ( '^' <bitand expr> )*
<bitand expr>           ::= <shift expr> ( '&' <shift expr> )*
<shift expr>            ::= <term expr>
                               ( ('<<' | '<<<' | '>>' | '>>>') <term expr> )*
<term expr>             ::= <term>  (<op_add>  <term>)*
<term>                  ::= <factor> (<op_mul> <factor>)*
<factor>                ::= <atom element>  (<op_power> <unary expr>)*

<atom element>          ::= <atom>  |  <dotted name> (<scalar type casting> |
                                                          <function call>)*
<atom>                  ::= ['++' | '--']  <dotted name>  ['++' | '--']  |
                          <parenthesis expr>  |  <scalar>
<dotted name>           ::= <identifier>  ('.' <identifier>)*
<ellipsis>              ::= '...'
<function call>         ::= '(' <function call args> ')'
<function call args>    ::= [ ( (<arithmetic expr> |
                                 <dotted name> (<function call>)* |
                                 <unnamed function> <function call>)
                                (',' (<arithmetic expr>  |
                                      <dotted name> (<function call>)*  |
                                      <unnamed function> <function call>) )*
                                [',' <ellipsis> <identifier>] )  |
                             <for comprehension> ]
<parenthesis expr>      ::= '(' <arithmetic expr> (',' <arithmetic expr>)* ')'
<scalar>                ::= <integer literal>  |  <float literal>
<scalar type casting>   ::= ['const'] <scalar type> '(' <arithmetic expr> ')'
<unary expr>            ::= ['+' | '-' | '~' | '#']  <factor>  |
                                  '#' <string expression>
<unnamed function>      ::= 'unnamed'  |  'lambda'

The attentive reader will notice an ambiguity in this specification of the Typee grammar when parsing a ‘(’. We have simplified the grammar specification here to make it easier to understand. Implementing Typee with this grammar description would give no a priori knowledge of which grammar rule should be processed on a ‘(’ following some identifier: would this be a type casting or a function (or method) call? The truly formal description of the Typee grammar removes this ambiguity.

For simplification purposes also, we have omitted all templated forms of operators and of function and method calls. Templating will be seen later in this documentation, together with its formal specification.
We do not provide here the specification of the ‘for comprehension’ grammar rule, which is described in sub-section 3.4.8.

String expressions are defined a little bit later in this page. The hash operator which can be applied to strings returns an integer value: the integer hashing value of the string. For simplification purpose, the described rules about the hash operator are a priori ambiguous, here again. According to previous rules, when parsing a hash what should be the next processed rule: the factor or the string expression one? This is not addressed here but the truly formal specification of Typee grammar alleviates this a priori ambiguity.
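
The nesting of the grammar rules in this sub-section encodes the relative precedence of the binary operators: power binds tightest, then multiplicative operators, additive operators, shifts, bitwise and, bitwise xor and finally bitwise or. Python uses the same relative order for the binary operators it shares with Typee, so the following Python sketch can serve as a quick sanity check. Unary operators and the Typee-specific operators are deliberately left out, and the variable names are only illustrative.

```python
# Each pair holds an expression without parentheses and the fully
# parenthesized reading implied by the precedence levels.
checks = [
    (2 + 3 * 4 ** 2, 2 + (3 * (4 ** 2))),   # power, then multiplication, then addition
    (1 << 2 + 1,     1 << (2 + 1)),         # addition before shift
    (6 & 3 << 1,     6 & (3 << 1)),         # shift before bitwise and
    (1 | 2 ^ 3 & 4,  1 | (2 ^ (3 & 4))),    # and, then xor, then or
]
all_match = all(plain == explicit for plain, explicit in checks)
print(all_match)
```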

3.4.4.2 Arithmetic Operators definitions

They are very usual, as in most programming languages.

Unary operators

+ is the unary plus. Literals prefixed with this sign are positive and other values keep their sign. This is the default when no unary operator is used.
- is the unary minus. Literals prefixed with this sign are negative and other values get their sign inverted.
~ is the unary bitwise negation. Every bit of the value that follows this sign is inverted: 0 becomes 1 and 1 becomes 0.
# is the hashing operator. When applied to a literal, it delivers the hash value of this literal. This operator is available also for other types of entities, such as strings and instantiations of classes.
++ is the increment operator. Its built-in form can only be applied to identifiers of integer variables. When put before the identifier, it pre-increments (i.e. it internally adds 1 to) the value of the variable and then returns the new value of this variable. When put after the identifier, it returns the current value of the variable and then post-increments this value (i.e. it internally adds 1 to it), so the variable is modified as well.
-- is the decrement operator. Its built-in form can only be applied to identifiers of integer variables. When put before the identifier, it pre-decrements (i.e. it internally subtracts 1 from) the value of the variable and then returns the new value of this variable. When put after the identifier, it returns the current value of the variable and then post-decrements this value (i.e. it internally subtracts 1 from it), so the variable is modified as well.

Bitwise binary operators

& is the bitwise and operator.
| is the bitwise or operator.
^ is the bitwise xor operator.

Examples:

0b0011 & 0b0101 --> 0b0001
0b0011 | 0b0101 --> 0b0111
0b0011 ^ 0b0101 --> 0b0110

Bit-shift binary operators

<< is the signed left bit-shift operator.
<<< is the unsigned left bit-shift operator.
>> is the signed right bit-shift operator.
>>> is the unsigned right bit-shift operator.

The bit-shift operators shift the bits of a value to the left or to the right, inserting ‘0’ in the newly emptied bits of the value. The signed right-shift and left-shift operators additionally preserve the sign bit (the leftmost bit of the value).

Examples:

0b1010_0011 << 3 --> 0b1001_1000
0b1010_0011 <<< 3 --> 0b0001_1000
0b1010_1100 >> 3 --> 0b1000_0101
0b1010_1100 >>> 3 --> 0b0001_0101
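
The four examples above can be reproduced with a small Python sketch. It assumes 8-bit values, as in the examples, and models the signed shifts as "keep the leftmost bit, shift the remaining bits". The function names are hypothetical and this is not how a Typee translator necessarily implements these operators.

```python
BITS = 8                      # width assumed for the examples above
MASK = (1 << BITS) - 1        # 0b1111_1111
SIGN = 1 << (BITS - 1)        # 0b1000_0000, the leftmost (sign) bit


def unsigned_shl(value, count):
    """<<< : shift left, drop the bits pushed out of the 8-bit value."""
    return (value << count) & MASK


def unsigned_shr(value, count):
    """>>> : shift right, insert 0 on the left."""
    return (value & MASK) >> count


def signed_shl(value, count):
    """<< : shift the 7 low bits left, keep the sign bit unchanged."""
    return (value & SIGN) | ((value << count) & (SIGN - 1))


def signed_shr(value, count):
    """>> : shift the 7 low bits right, keep the sign bit unchanged."""
    return (value & SIGN) | ((value & (SIGN - 1)) >> count)


print(format(signed_shl(0b1010_0011, 3), "08b"))    # 10011000
print(format(unsigned_shl(0b1010_0011, 3), "08b"))  # 00011000
print(format(signed_shr(0b1010_1100, 3), "08b"))    # 10000101
print(format(unsigned_shr(0b1010_1100, 3), "08b"))  # 00010101
```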

Other binary operators

+ is the addition operator.
- is the subtraction operator.
* is the multiplication operator.
/ is the division operator. Division between integer values evaluates to an integer value, truncated to the low integer value. Division between float values evaluates to a float value. When mixing integer and float values, the integer values are promoted to floats and the division evaluates to a float value.
% is the modulo operator.
** is the power operator.
^^ is the power operator also. Both notations are valid in Typee.

Typee specifies five new binary operators that are left undefined. They are specified in the language but, unlike built-in operators, they get no built-in definition. Programmers can define them for any of their specific needs (e.g. matrix multiplication, vector dot product, etc.).

These operators are: @, ><, !!, :: and ??.

Their precedence is specified in Typee language description and is the same as for multiplicative operators (i.e. *, / and %).

Programmers are free to overwrite the definition of built-in operators, just as they are free to define the undefined operators. Operator overwriting will be explained in a later section.

3.4.4.3 Type promotion

When using values or entities of different types in an arithmetic expression, type promotion takes place. If integer and float types are mixed, integer types are promoted to float and the finally evaluated value of the arithmetic expression is a float. Furthermore, type promotion always takes place when mixing types of different lengths (i.e. 8-, 16-, 32- and 64- bits encoded values). Every value used in the arithmetic expression gets its type promoted to the largest type used in the expression and the finally returned value is of this largest encountered type.
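
The promotion rules above can be sketched in Python as a function over type names. This is only an illustration under stated assumptions: the text does not say what happens when signed and unsigned integers are mixed (the sketch assumes a signed result), and it reads "largest type" as "largest bit length", so that int64 mixed with float32 gives float64. The name promoted_type is hypothetical.

```python
import re

_SCALAR = re.compile(r"(uint|int|float)(8|16|32|64)")


def promoted_type(*type_names):
    """Return the assumed result type for operands of the given scalar types."""
    if not type_names:
        raise ValueError("at least one type name is required")
    kinds, widths = set(), []
    for name in type_names:
        match = _SCALAR.fullmatch(name)
        # float8 and float16 are not built-in Typee types
        if match is None or (match.group(1) == "float" and int(match.group(2)) < 32):
            raise ValueError(f"not a Typee built-in scalar type: {name}")
        kinds.add(match.group(1))
        widths.append(int(match.group(2)))
    width = max(widths)              # promote to the largest bit length
    if "float" in kinds:             # mixing integers and floats gives a float
        return f"float{width}"
    if kinds == {"uint"}:            # only unsigned operands: stay unsigned
        return f"uint{width}"
    return f"int{width}"             # assumption: mixed signedness gives signed


print(promoted_type("int8", "int32"))     # int32
print(promoted_type("int16", "float32"))  # float32
```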

3.4.5 Boolean Expressions

Boolean expressions are used to evaluate conditions. Their results are booleans, i.e. either True or False. They may embed arithmetic expressions and string expressions and operate with specific operators: boolean operators and comparison operators.

The formal specification (EBNF one) is quite complex but examples make it simple to understand.

<boolean expression> ::= <and test>  [ 'or' <and test> ]

<and test>           ::= <not test>  [ 'and' <not test> ]
<not test>           ::= 'not' <not test>  |  <comparison>

<comparison>         ::= <arithmetic expression> 
                             (<op_comp> <arithmetic expression> |
                              <op_cont> <container>) |
                         <string expression> 
                             (<op_comp> <string expression> |
                              <op_cont> <container>)

<container>          ::= <dotted name> | <list> | <set> | <map> | <array>

Lists, Sets, Maps and Arrays are explained later in this documentation. They are containers and any literal of their types can be used for the definition of a container. The <dotted name> resulting from the resolution of rule <container> has to be the identifier of a container, otherwise it will lead to a Type Error.

The related operators are as usual in most programming languages:

<op_comp>     ::= '<=' | '==' | '!=' | '>=' |
                  '<' | '>' | '<=>' | 'is' [ 'not' ]
<op_cont>     ::= 'in' | 'not' 'in'

Notice: a not-so-usual operator is available in Typee, operator <=>. This operator is well known to PHP programmers. Its signature and explanation are:

int32 operator <=> ( const ? a, const ? b )

Returns a negative value if a < b.
Returns 0 if a == b.
Returns a positive value if a > b.
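
The behaviour of operator <=> can be sketched in Python, which has no such operator. The function name spaceship is hypothetical, and returning exactly -1, 0 or 1 is one possible choice among the negative, zero and positive values allowed by the description above.

```python
def spaceship(a, b):
    """Three-way comparison: negative if a < b, 0 if a == b, positive if a > b."""
    # Booleans are integers in Python, so the difference is -1, 0 or 1.
    return (a > b) - (a < b)


print(spaceship(1, 2), spaceship(2, 2), spaceship(3, 2))  # -1 0 1
```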

The strictly boolean operators and, or and not only operate on boolean values.

Operator and returns True if the left and right operands both evaluate to True, and False otherwise.
Operator or returns True if either the left operand or the right operand evaluates to True, and False if they both evaluate to False.
Operator not is a unary operator. It returns the inverse of the value of its operand: not True evaluates to False, and not False evaluates to True.

The comparison operators are the classic ones. In addition, Typee accepts chained series of comparison operators, as Python does.
Typee adds to these the Python operator is and its counterpart is not.
Operator is evaluates to True when the left and right operands reference the same entity. Operator is not evaluates to True when the operands do not refer to the same entity.

Examples:

int16 x = 20, y = 21;
print( 0 <= x <= y <= 21 ); // prints true
print( 35 > y > x > 20 );   // prints false - due to rightmost comparison
int16 xx = x;
int16 x_ref = @x;
print( xx is x );           // prints false
print( x_ref is x );        // prints true
print( x_ref is not xx );   // prints true
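
Since chained comparisons are borrowed from Python, the first two lines of the example can be checked directly in Python. A chain is evaluated as the conjunction of its pairwise comparisons, which the sketch below makes explicit. The is examples are not reproduced, because Python caches small integers and would not behave like the Typee references shown above.

```python
x, y = 20, 21

first = 0 <= x <= y <= 21
first_expanded = (0 <= x) and (x <= y) and (y <= 21)   # same meaning, written out

second = 35 > y > x > 20
second_expanded = (35 > y) and (y > x) and (x > 20)    # x > 20 is False

print(first, second)  # True False
```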

The containment operators operate on values and containers. Operator in evaluates to True if the left operand (the value) is contained in the right operand (the container). It evaluates to False otherwise.
Inserting keyword not before in inverts this evaluation.

Examples:

const char a = 'A';
const str  s = 'dDeEaAdDbBeEeEfF';
print( a in s );       // true
print( 'Z' not in s ); // true

Operators precedence

Operator not takes precedence over operator and, which takes precedence over operator or.

Those three boolean operators have a lower precedence than comparison and containment operators. The latter are processed from left to right and share the same precedence level, which is lower than the lowest precedence of arithmetic operators.

Types promotion

When comparing values, type promotion may happen. The rules for arithmetic expressions apply when comparing arithmetic expressions. The same is true when comparing string expressions. Note that string expressions are extended here to type character when evaluating the containment of a character in a string: char or char16 is the type of the left operand value and str or str16 is the type of the right operand container.

3.4.6 String Expressions

There are only two valid operators that can be applied to string expressions in Typee: the concatenation and the hashing operators.
The hash operator is a unary operator that has been described in the sub-section about arithmetic expressions. It returns an integer value, the hash value of the string expression. If the hash operator is to be applied to a concatenation of strings, this concatenation should be enclosed within parentheses, due to built-in operator precedence in Typee.

The concatenation operator concatenates strings together, from left to right. This operator is +.

The formal EBNF specification of string expressions is very simple:

<string expression> ::= <string> (<op_concat> <string>)*  |
                            '(' <string expression> ')'
<hash string>       ::= <op_hash> (<string>  |  '(' <string expression> ')')
<op_concat>         ::= '+'
<op_hash>           ::= '#'

Type promotion

When strings of different types are concatenated, the resulting string is of the largest mixed string type (i.e. str16 with current specification of Typee).

3.4.7 Special Assignment Expressions

There is only one valid operator that can be applied to special assignment expressions: operator ??. Its meaning is the same as in PHP and is somewhat equivalent to the use of keyword or in Python when applied in assignment statements.

a = b ?? c is exactly equivalent to a = b if b is not none else c.

Types of a, b and c have to be compatible of course.

Typee operator ?? is only somewhat equivalent to keyword or in Python because operator ?? in Typee only checks whether the identifier is none or not, while Python also checks the boolean value of the identifier.

Operator ?? takes precedence over all other arithmetic operators. It may be used multiple times in an assignment statement and is evaluated from left to right:

a = b ?? c ?? d ?? e

is equivalent to longer code

a = b if b is not none else c if c is not none else d if d is not none else e

This operator has an augmented form, ??=, whose specification is explained in a later section.
a ??= b is equivalent to a = a if a is not none else b.
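
The difference with Python keyword or can be made concrete with a short Python sketch. The helper coalesce is hypothetical and only mimics the described meaning of a chain of ?? operators: it returns the first operand that is not None, whatever its boolean value.

```python
def coalesce(*values):
    """Return the first value that is not None, or None if all of them are."""
    for value in values:
        if value is not None:
            return value
    return None


print(coalesce(None, None, 3, 4))  # 3, like: none ?? none ?? 3 ?? 4
print(coalesce(0, 5))              # 0, because 0 is not None
print(0 or 5)                      # 5, because Python's or tests the boolean value
```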

3.4.8 Comprehensions

Python programmers are used to comprehensions. A comprehension is a set of loops and tests that generates a series of values. Comprehensions can be used to initialize lists, sets or maps as well as to generate a list of arguments within a function (or method) call.

Their formal EBNF specification is:

<for comprehension>  ::= 'for' '(' <target list> 'in' <or test>
                             <iter comprehension> ')'

<if comprehension>   ::= 'if' '(' (<or test> | <unnamed func>) ')'
                             <iter comprehension>

<iter comprehension> ::= [ <for comprehension>  |  <if comprehension> ]

Rules <unnamed func> and <or test> are specified elsewhere in this page (see the sections on boolean expressions and on unnamed functions).
Rule <target list> will be specified later in this documentation.

Example:

list my_list = [1, 2, 3, 4, 5, 6];
print( n for n in my_list if n % 2 == 0 ); 

should print:

2 4 6

The above comprehension example prints every item contained in variable my_list if and only if it is an even value.
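
For comparison, here is the same filtering written as a Python comprehension. This only shows the Python counterpart of the concept; the variable name evens is introduced for the illustration.

```python
my_list = [1, 2, 3, 4, 5, 6]

# Keep only the even items; the comprehension combines a loop and a test.
evens = [n for n in my_list if n % 2 == 0]

# Unpacking passes the generated values as separate arguments to print.
print(*evens)  # prints: 2 4 6
```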

3.4.9 Subscription and Slicing

A subscription indexes an item that is contained in a container. Containers are, for instance, lists, sets, maps, arrays and strings (which are ordered lists of characters). A slice indexes a subpart of a container, this subpart being allowed to contain many of the container items.

They both can be applied to any kind of container. Their EBNF formal specification is:

<subscription> ::= '[' <integer expression> ( (',' <integer expression>)* |
                                               <if comprehension> ) ']'

<slicing>      ::= '[' <integer expression> ':' [<integer expression>]
                        [':' [<integer expression>]] <slice end>
<slice end>    ::= ']'  |  ')'

Here, <integer expression> is a rule to express arithmetic expressions that evaluate to an integer value. Any other type of evaluated value is a type error.

Examples:

str s = "abcdefghij";  // prints:
print( s[4] );         //   e        -- this is a char
print( s[4:6) );       //   ef       -- this is an str, final index is excluded
print( s[4:6] );       //   efg      -- this is an str, final index is included
print( s[4:] );        //   efghij   -- this is an str
print( s[4::2] );      //   egi      -- this is an str
print( s[-1] );        //   j        -- this is a char, first char from the end
print( s[-1:-8] );     //   jihgfedc -- this is an str, reverse indexing
print( s[-2:-8:-2) );  //   ige      -- this is an str, reverse indexing, last index excluded
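
Typee slices differ from Python slices: the final index is included with a closing ] and excluded with a closing ), and a reverse slice needs no explicit negative step. The Python sketch below mimics the string examples above. It is a rough model under assumptions: the helper typee_slice and its inclusive flag are hypothetical, and the direction of a slice without an explicit step is inferred from the order of the two indexes.

```python
def typee_slice(text, start, stop=None, step=None, inclusive=True):
    """Mimic the slicing examples above on a Python str.

    inclusive=True models a closing ']' (final index included),
    inclusive=False models a closing ')' (final index excluded).
    """
    size = len(text)
    first = start + size if start < 0 else start
    if stop is None:
        # No final index: run to the end (or to the start when stepping backward).
        step = 1 if step is None else step
        last = size - 1 if step > 0 else 0
        inclusive = True
    else:
        last = stop + size if stop < 0 else stop
        if step is None:
            # Assumption: direction is inferred from the order of the indexes.
            step = 1 if last >= first else -1
    if step == 0:
        raise ValueError("slice step cannot be zero")
    direction = 1 if step > 0 else -1
    if not inclusive:
        last -= direction          # drop the final index
    return "".join(text[i] for i in range(first, last + direction, step))


s = "abcdefghij"
print(typee_slice(s, 4, 6, inclusive=False))           # ef
print(typee_slice(s, 4, 6))                            # efg
print(typee_slice(s, -1, -8))                          # jihgfedc
print(typee_slice(s, -2, -8, -2, inclusive=False))     # ige
```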

3.4.10 Functions and Methods calls

Methods are functions that are defined in classes. They act exactly the same as functions but their names are associated with the identifiers of the instances of the class they are defined in.

Every function or method gets an identifier. For a method, this identifier is associated at call time with another identifier: the identifier of its owning entity. Both identifiers are separated with a dot: the identifier of the owning entity comes first, the identifier of the owned function next.

At call time, arguments are passed to the called function or method. These arguments are separated by commas and are surrounded by parentheses. The EBNF formal specification of this is here:

<function call>      ::= <dotted name> [<template args>] <function call args>

<dotted name>        ::= <identifier> ('.' <identifier>)*

<function call args> ::= '(' [ (<function call>)*
                                 [',' ((<function call>)*  |
                                        <ellipsis> <identifier>)  |
                                  <for comprehension>] ] ')'
<template args>      ::= '<' [<expression> (',' <expression>)*] '>'

C++ programmers know what templates are. Java programmers know them too, once they are told that this has to do with generics. Python programmers may not know what templates are. We will explain their role later in this documentation, when we introduce classes. For now, let’s just consider them a convenient way to specify the types or constant values that a method or a function may process while they might be known or evaluated at run time.

3.4.11 Unnamed Functions

Unnamed functions are heavily used in JavaScript. There, they are most often embedded in the registration of callbacks on user events. Unnamed functions are very often used in Java also and they are available in Python.

They are sometimes named lambdas. This naming seems to be quite confusing for many programmers who are not used to such functions. In Typee, we have decided to name these functions unnamed.

You will see in the specification of unnamed functions that keyword lambda is also specified. Well, this keyword is so common in other computer languages that we have decided to accept both namings for unnamed functions. This could lead in the future to the deprecation of one or the other of these keywords. This is why we strongly encourage newcomers to unnamed functions to systematically use keyword unnamed when they define an unnamed function.

Formal specification is this:

<unnamed func> ::= <unnamed> [<returned type>] 
                       <function args declaration> <statements block>
<unnamed>      ::= 'unnamed'  |  'lambda'

<function args declaration> has been specified above in present page.
<returned type> is any built-in or user-defined type. We will specify user-defined types later in this documentation.
<statements block> is a list of statements, separated with semi-colons and grouped into a block of statements. We are going to specify statements in the next section, and statement blocks after that.

 
The next section formally explains a first set of Typee statements, the compound statements.
