– Typee Documentation –

4 – Built-in Containers

4.4 – Sets

Sets are containers with a singular characteristic: any item in a set may be present in it only once at the same time. Adding more than once a same item will take no effect on the conctent of the set. Sets may contain items of any type and cannot be sorted. They are something like a bag with one and only one instance of each contained item, whatever the types of the contained items.

Furthermore, sets are a mathematical concept. Mathematical operators union, intersection and difference apply to them. These operators exist in Typee, as functions as well as operators for some of them. Inclusion of a set within another and superset characteristic can also be checked in Typee. The presence of an item in a set, its picking from a set, its adding and its removal are the final functions that can be operated on a set.

Caution: when adding scalar numbers in a set, their types are not distinguished. As long as two scalar values are considered to be equal, only one of them will be present in a set at a time. For instance, adding 1 then 1.0 in a set will lead to the containment of the sole value 1; adding 1.0 then 1 will lead to the containment of the sole value 1.0. Caution again: due to approximations on internal storage of float numbers, it may be that same integer and float values will both appear in a set at a time, should their internal coding be different on the CPU running the program.

4.4.1 Sets Declaration

The formal EBNF specification of the syntax of sets declaration in Typee is:

<set declaration>           ::= <set type> <identifier> [ '=' <set litteral> ]

<contained type>            ::= '<' (<TYPE> | <enclosed types list>) '>' 

<enclosed types list>       ::= '(' <types list> ')' 

<for comprehension>         ::= 'for' '(' <target list> 
                                   'in' <or test> <iter comprehension> ')' 

<if comprehension>          ::= 'if' '(' (<or test> | <unnamed func>) ')' 
                                <iter comprehension>

<iter comprehension>        ::= [ <for comprehension>  |  <if comprehension> ]

<set litteral>              ::= '[' <set items list> ']'  |
                                <set type> '(' <set items list> ')'

<set item>                  ::= <expression> [<for comprehension>]

<set items list>            ::= <set item> ( ',' <set item> )*

<set type>                  ::= ['const'] 'set' [<contained type>]

<slice end>                 ::= ']'  |  ')' 

<slice form>                ::= ':' [<expression>]  [ ':' [<expression>] ]

<subscription or slicing>   ::= <subscription or slicing'>
                                    (<subscription or slicing'>)*
<subscription or slicing'>  ::= '[' <expression>
                                    ( ']'                     |
                                      <if comprehension> ']'  |
                                      <slice form> <slice end> )

<target>                    ::= <dotted name> (<subscription or slicing>)*

<targets list>              ::= <target> ( ',' <target> )*

<types list>                ::= <TYPE> ( ',' <TYPE> )*

<TYPE>                      ::= [ 'const' ] <type>

So, there is no needs to specify any contained type for a set in Typee. This is the exact equivalent of sets in Python, for instance. Of course, if a type or a list of types are declared, the set may only contain items of these types.

Example:

set my_set = [ 1, 2, 3, 1.0, 'a string' ];
print( my_set );  // prints: [1, 2, 3, "a string"]
// next line contains two errors, on the two last items
set<int8 my_set = [ 1, 2, 3, 1.0, 'a string' ];

4.4.2 Sets Iterations

Sets are iterable. Statement for runs through sets and provides back each of the contained items one after the other. The order in which contained items are returned by statement for is not specified. Python for instance stipulates that items may be returned in any order, and that this order may vary from platforms to platforms. So, this is the case also in Typee since its translators will take benefit of any equivalent goodies in their translated programming languages.

Example:

set my_set = [ 1, 2, 3, 1.0, 'a string' ];
for( ? item in my_set )
    print( item );
// prints: (with no ensurance that it will be printed in this order)
// 1
// 2
// 3
// a string
// (notice: 1.0 is not printed since 1 is already present in set)

4.4.3 Sets and Lists Type Casting

Sets are a sub-set of Lists: they contain items of any type, each of them once and only once, as a one-dimensional container. This is the reason why sets may be cast to lists as well as lists may be cast to sets (but with one contraint: doublons will be removed in sets to get only one occurence of a same item in them).

4.4.3.1 Set casting to List

Sets can be type-casted to lists. The items of the source set are appended to the destination list in a random manner.
The formal EBNF syntax of the related grammar rules in Typee are:

<set to list casting> ::= 'list' '(' <set litteral> | <dotted name> ')' 

Example:

set my_set = [ 1, 2, 3, [4, 5, 6] ];
list my_list = set( my_set );
print( my_list );  // prints: [2, 3, [4, 5, 6], 1 ]
4.4.3.2 List casting to Set

Lists can be type-casted to sets. When casted this way, items from the source list are added to the destination set once and only once each. The formal EBNF syntax of the related grammar rules in Typee are:

<list to set casting> ::= 'set' '(' <list litteral> | <dotted name> ')'

Example:

list my_list = [ 1, 2.0, 3, [4, 5, 6], 1, 2, 'three' ];
set  my_set  = list;
my_set = set( my_list );
print( my_set );  // prints: [1, 2.0, 3, [4, 5, 6], 'three']

4.4.4 Sets Manipulations

Typee provides built-in functions and operators for the manipulation of sets. All of the functions are methods-like. They can be applied to a set as would be built-in methods:
my_set.set_function(...)

Functions and operators are of a few kinds and are all presented in next sub-sections.

4.4.4.1 Sets initialisation
set operator = ( set left, const ? in (array,list,map,set) right )

The assignment operator for sets is the usual assignment operator =. It assigns the content of argument right to set argument left. right may be of many types:

  • array: each contained item in the array is put (once and only once) in the set;
  • list: The lists can be named by their identifiers as well as provided as list litterals (i.e. [ ..., ... ]). Each contained item in the list is put (once and only once) in the set;
  • map: from the pairs (key, item) that are contained in the map, only items are put (once and only once each) in the set;
  • set: this is a true copy of sets – the copied set may be provided as a litteral set (i.e. as a set litteral, see right below.

About set litterals, remember the associated grammar rules presented above:

<set litteral>   ::= '[' <set items list> ']'  |
                     <set type> '(' <set items list> ')'

<contained type> ::= '<' (['const'] <type> | <enclosed types list>) '>' 

<set item>       ::= <expression> [<for comprehension>]

<set items list> ::= <set item> ( ',' <set item> )*

<set type>       ::= ['const'] 'set' [<contained type>]

These special rules are mandatory to distinguished set litterals from list litterals in Typee.
Example:

set<uint8> my_set = set<uint8>( uint8 10*n for n in [0,12),
                                1, 2, 3, 4, 5, 10, 20 );
                    // (notice doublons for 10 and 20)
print( my_set );
// prints: [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 1, 2, 3, 4, 5]
//  (well, or same list of numbers but maybe in any other ordering)
4.4.4.2 Set emptying and deleting
set empty()

Empties the set: every item contained in it is deleted. This may lead to a call to their destructors when they are instancies of classes that define a destructor. Programmers have to take care there because some items may have been previously referenced in the code. These references will then reference released resources (most of the time, memory space) and should not be later accessed in the code.
Example:

set my_set= [ 0, 1, 'c' ];
print( my_set.empty() );  // prints: []
delete
del

Typee instruction delete applied to a set deletes the set and its whole content. This may lead to a call to the item destructors when they are instances of classes that define a destructor. Programmers have to take care there because some items may have been previously referenced in code. These references will then reference released resources (most of the time, memory space) and should not be later accessed in the code.
Keyword del is a wrapper to keyword delete.
Example:

set my_set= [ 0, 1, 'c' ];
del my_set;
print( my_set);  // prints: none
4.4.4.3 Sets mathematical operations: Union, Intersection, Difference, Inclusion, Equality

Let’s first shortly explain what are those mathematical operations. Here is a graphical representation of two sets. They contain items (dark dots) and share three of them in their respective content.

We describe here below the union, difference and intersection operations on sets.

set union( const set other )
set operator +( const set left, const set right )
set operator +=( set left, const set right )

Here is the graphical representation of the union operator:

Here, every items contained in the two sets are merged into one set, removing any doublon in the merged set.
The function union() merges the two sets in-place, i.e. into the operated set. It returns a reference to the operated set, for other operations or functions applied to the same set to be cascadable.
The operator + creates an unnamed new set which contains the merged items from the two sets and which is returned to be used in the expression the operator appears in.
The operator += is an in-place equivalent of function union().
Examples:

set s1 = [1, 2, 3];
set s2 = [1, 12, 13];
print( s1 + s2 );  // prints: [1, 2, 3, 12, 13]
s1.union( s2 );
print( s1 );  // prints: [1, 2, 3, 12, 13]
s1 += [11, 21];
print( s1 );  // prints: [1, 2, 3, 12, 13, 11, 21]
set difference( const set other )
set operator –( const set left, const set right )
set operator -=( set left, const set right )

Here is the graphical representation of the difference operator:

Here, every items contained in both first and second set are removed from the first set.
The function difference() removes those items in-place, i.e. into the operated set. It returns a reference to the operated set, for other operations or functions applied to the same set to be cascadable.
The operator - creates an unnamed new set which contains all the items belonging to the ‘left’ set and not belonging to the ‘right’ set.
The operator -= is an in-place equivalent of function difference().
Examples:

set s1 = [1, 2, 3];
set s2 = [1, 12, 13];
print( s1 - s2 );  // prints: [2, 3]
s1.difference( s2 );
print( s1 );  // prints: [2, 3]
s1 -= [2, 21];
print( s1 );  // prints: [3]
set intersect( const set other )
set operator ^( const set left, const set right )
set operator ^=( set left, const set right )

Here is the graphical representation of the intersection operator:

Here, the intersection of the sets is a set which contains all items contained in both original sets.
The function intersect() operates in-place, i.e. into the operated set. It returns a reference to the operated set, for other operations or functions appplied to the same set to be cascadable.
The operator ^ creates an unnamed new set which contains all the items belonging to both the ‘left’ and the ‘right’ sets.
The operator ^= is an in-place equivalent of function intersect().
Examples:

set s1 = [1, 2, 3];
set s2 = [1, 12, 13];
print( s1 ^ s2 );  // prints: [1]
s1.intersect( s2 );
print( s1 );  // prints: [1]
s1  = [1, 2, 3];
s1 ^= [2, 3, 4];
print( s1 );  // prints: [2, 3]
const bool includes( const set other )
const bool > ( const set left, const set right )
const bool >= ( const set left, const set right )

Inclusion of a set s1 in another set s2 is verified when all items contained in set s1 are also contained in set s2.

Here, every items contained in red set (s1) are also contained in blue set (s2). So, s2 is said to include s1 (as well as s1 is said to be included in s2).

For instance, a corresponding Typee code would be:

set s1 = [1, 2, 3];
set s2 = [1, 2, 3, 12, 13];
print( s2.includes( s1 ) );  // prints true
print( s2 > s1 );            // prints true
print( s2 >= s1 );           // prints true
const bool < ( const set left, const set right )
const bool <= ( const set left, const set right )

These operators are the opposite of the includes function. They implement the concept of being included in:

set s1 = [1, 2, 3];
set s2 = [1, 2, 3, 12, 13];
print( s1 < s2 );   // prints true
print( s1 <= s2 );  // prints true
const bool == ( const set left, const set right )
const bool != ( const set left, const set right )

Equality and inequality of sets are evaluated with the usual operators == and !=.

Here, every items contained in red set (s1) are also contained in blue set (s2) and every item contained in blue set are also contained in red set.
In this case, s1 <= s2 and s2 <= s1.
So, s1 == s2.
Meanwhile, should there is at least one item that makes a difference between two sets, they would be evaluated as not equal.

For instance, a corresponding Typee code would be:

set s1 = [1, 2, 3];
set s2 = [1.0, 2, 3];  // remember: 1 == 1.0
print( s2 == s1 );     // prints true
print( s1 == s2 );     // prints true
set s3 = [1, 2];
print( s1 == s3 );     // prints false
print( s1 != s3 );     // prints true

Note: operator !=, when returning value true, validates the disjoint status of compared sets. Disjoint is also a concept applied to sets in Mathematics – see section 0.1 Concepts of sets theory of chapter 0 Mathematical preliminaries in old book “The Theory of Parsing, Translation and Compiling, Vol. 1: Parsing“, Alfred V. Aho, Jeffrey D. Ullman, Prentice-Hall inc., Series in Automatic Computation, ed. George Forsythe, 1972, ISBN: 0-13-914556-7.

4.4.4.4 Items Adding, Deleting and Picking

Two functions allow the insertion of an item in a set and the deletion of an item in a set. They both return a reference to the operated set, just to allow for the cascading of calls.

set add( const ? item )
set operator << ( set left, const ? right )
set operator <<= ( set left, const ? right )

Adds the item into the set. If item is already present in set, no adding takes place. Returns a reference to the operated set.
The streaming operator << adds item argument right to set argument left and returns a reference to the operated set left to be cascadable.
The in-place streaming operator <<= adds item argument right to set argument left.
Example:

set my_set;
print( my_set );  // prints: []
my_set.add( 1 ).add( 11 ).add( 2 ).add( 11.0 );
print( my_set );  // prints: [1, 11, 2]
                  // (may be with any other ordering)
my_set <<= 22;
print( my_set );  // prints: [1, 11, 2, 22]
                  // (may be with any other ordering)
my_set = my_set << 3 << 33 << 3.0 << 'd';
print( my_set );  // prints: [1, 11, 2, 3, 33, "d"]
                  // (may be with any other ordering)
set del( const ? item )

Removes the item from the set. If item is not present in set, raises a NotFoundException. Returns a reference to the operated set. Cascaded deletions are operated in left to right order.
Example:

set my_set = set( 1.0, 2, 11 );
my_set.del( 11 ).del( 1 );
print( my_set );  // prints: [2]
const ? pick()
const ItemT pick<ItemT>()
set operator >> ( set left, ? right )

Removes an item from a set and returns it. Returns none if no item is to be picked from the set (either because no more item is available or because the set was declared to be const). The templated version of this function picks an item of type ItemT from a set. Picking from sets is a random operation.
Operator >> assigns item argument right with a randomly picked item from set argument left. It returns a reference to the operated set for operations to be cascadable.
Example:

set my_set = set( 1.0, 2, 11 );
print( my_set.pick(), my_set.pick() );  // prints: 11 1.0
print( my_set );                        // prints: [2]
my_set.add( 22, 33, 44 );
uint32 item1, item2;
my_set >> item1 >> item2;
print( item1, item2, my_set );          // prints: 33 22 [44]
4.4.4.5 Items counting

How many items in a set?

const uint64 count()
const uint64 count< ItemT >()

Returns the number of items contained in a set. The templated version of this function returns the count of items of type ItemT that are contained in a set.
Example:

set my_set = ['b', 2, 'a', 1, 10, 'a string'];
print( my_set.count() );  // prints: 6
4.4.4.6 Items Searching

Keyword in and its counterpart not in check for the presence or not of a specified item in a set.

const bool operator in ( const ? item, const set container )

Keyword in checks for the presence of an item into a set. Result is either true if item has been found in the set or false otherwise.
Examples:

set my_set = ['b', 2, 'a', 1, 10, 'a string'];
print( 2 in my_set );     // prints true
print( 10.0 in my_set );  // prints true
print( 'c' in my_set );   // prints false
const bool operator not in ( const ? item, const set container )

The association of these two keywords is the counterpart of keyword in.
Examples:

set my_set = ['b', 2, 'a', 1, 10, 'a string'];
print( 2 not in my_set );     // prints false
print( 10.0 not in my_set );  // prints false
print( 'c' in my_set );       // prints true
4.4.4.7 Set copying and referencing

Copying a set means creating a new set with same content in heap or stack memory. Any transformation done then on one of the sets will not modify the other one.

Referencing a set is to be understood as creating an alias or defining a pointer to the original set. Any transformation applied then to the set will be reflected through its reference. Any transformation applied also to its reference will be applied to the original set.

Syntax is straightforward. Let’s illustrate it with simple examples:

set src_set = ['c', 1, 'b', 2, 'a', 1, 10, 'a string']
set<uint8> dst_set;

dst_set = src_set ;  // set copy, notice that sole uint8 values are copied
dst_set <<= 20;
print( src_set, dst_set);
  // prints: ['c', 1, 'b', 2, 'a', 10, 'a string']
  //         [1, 2, 10, 20]

dst_set = &src_set; // set reference
dst_set.add( 100 );
print( src_set, dst_set);
  // prints: ['c', 1, 'b', 2, 'a', 10, 'a string', 100]
  //         [1, 2, 10, 20, 100]
4.4.4.8 Set sorting

There are two main classes of sorting algorithms: unstable and stable. Stable sorting means that equal items will stay ordered between each others the same way they where before sorting. Such sorts are most of the time a little bit slower than unstable sorts, but they are much appreciated because of this particular main characteristic.

It appears that some other programming languages offer by default a stable sorting function, based on the Timsort algorithm. This is a hybrid algorithm, derived from merge sort and insertion sort, designed to perform well on many kinds of real-world data. (source: Wikipedia). This algorithm had been first introduced in [1] by Peter McIlroy in 1993 and had been implemented by Tim Peters (Timsort…) in 2002 for the Python sort() built-in function. Since then, this algorithm has been enhanced (a bug has been found) and has also been implemented for Java.
[1]: “Optimistic Sorting and Information Theoretic Complexity“, Peter McIlroy, in Proceedings of the Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 467–474, January 1993.

So, it may be that the unstable sorting function built in Typee will finally be translated into a stable version of a sorting algorithm, depending on the targeted programming language. This will enventually be the case when translating Typee code to Python or Java.

Meanwhile, sets are never sorted in place in Typee, on the contrary of arrays or lists for instance. There is no “stable” version of the sort also in current version of Typee.

It has to be noticed that sorting a set results in an unnamed sorted list that can then be accessed, for instance, in a for loop. We provide examples also about this in this section of the documentation.

list sort()

This is the basic sort function in Typee. Sets are sorted in the ascending order of their items values. Its result is a sorted list containing the items of the set in their ascending order. Of course, contained items have to be comparable. In Python for instance, there is no way to compare strings and integers. In Typee, objects and scalar are comparable as soon as they get a hash code. All scalar types in Typee are hashable when they are contained in a set, so strings and integers (or floats) will be comparable when sorting a set. If you use instances of classes (i.e. objects), these classes must define at least operator # to still be comparable. Hash value will always be used for sorting sets if no other means is available to compare two objects.
If some item is not comparable, a TypeException will be raised at run-time (unless this will be possible to check at translation time, in which case en error will be set by Typee translators).
Example:

set my_set = ['b', 2, 'c', 1, 'a', 1, 10, 'a string'];
print( my_set.sort() );  // prints [1, 2, 10, "a", "a string", "b", "c"]
list sort( const int32 comp_func(const ?, const ?) )
list sort( const int32 comp_func< ItemT >(const ItemT, const ItemT) )

This signature of built-in function sort() specifies the comparison function to be used for the sorting operation on the set. Comparison functions here have to accept two arguments of any type with a constraint: these arguments have to be comparable together. If they are not, either a type error will be set at translation time or (most often) a TypeException will be raised at run-time. Be cautious there. These comparison functions are expected to return a signed integer value:

  • less than 0 if first argument is less than second one;
  • equal to 0 if both arguments are equal;
  • greater than 0 if first argument is greater than second one.

The result of this sort is a list containing the items of the set in their ascending order.

list reverse_sort()

This is the counter part of function sort(). It sorts sets in the descending order ot their items. Its result is a sorted list containing the set items in their descending order. Of course, items have to be comparable. In Python for instance, there is no way to compare strings and integers. In Typee, objects and scalar are comparable as soon as they get a hash code. All scalar types in Typee are hashable when they are contained in a set, so strings and integers (or floats) will be comparable. If you use instances of classes (i.e. objects), these classes must define at least operator # to still be comparable. Hash value will always be used for sorting when no other comparing means is available.
If some item is not comparable, a TypeException will be raised at run-time (unless this will be possible to check at translation time, in which case en error will be set by Typee translators).

list reverse_sort( const int32 comp_func(const ?, const ?) )
list reverse_sort( const int32 comp_func< ItemT >(const ItemT, const ItemT) )

This is the counter part of function sort() when passed a comparison function as argument. It sorts sets in the reverse order that the comparison function results in on their items. Comparison functions here have to accept two arguments of any type with a constraint: these arguments have to be comparable together. If they are not, either a type error will be set at translation time or (most often) a TypeException will be raised at run-time. Be cautious there. These comparison functions are expected to return a signed integer value:

  • less than 0 if first argument is less than second one;
  • equal to 0 if both arguments are equal;
  • greater than 0 if first argument is greater than second one.

The result of this sort is a list containing the items of the set in their descending order.

4.4.5 Sets as arguments

Functions and methods signatures are constituted of the types of their arguments and the type of value they return. Sets may be passed as arguments to functions and methods and may be passed back to caller on return. Keyword set is a generic type to specify any kind of set, whatever its size and the maybe declared types for its content.

 
Next section formerly explains built-in files and the serialization concept.

< previous (4.3 built-in maps) | (4.5 built-in files & Serialization) next >