– Typee Documentation –

4 – Built-in Containers

4.5 – Files and Serialization

Files are singular containers. Their content is read from and written to permanent storage (e.g. hard disks), and in Typee they are both streamed and indexed containers.

The associated grammar syntax is very simple: only file declarations are specified. However, many built-in functions and operators are associated with files. Furthermore, a special built-in library, File, is provided with Typee; it is described with the many other Typee built-in libraries in the next section of this documentation. The File library defines file classes that ease the creation and the valid manipulation of files.

Serialization is the way instances of classes or complex objects (such as arrays or lists) get their attribute values, or the values they contain, saved into files and later loaded back from files. Typee specifies a dedicated serialization protocol for this, TSP (for Typee Serialization Protocol), which we explain at the end of this page.

4.5.1 Files declaration

The formal EBNF specification of the syntax of files declaration in Typee is:

<file declaration>    ::= <file type> <identifier>

<contained type>      ::= '<' (<TYPE> | <enclosed types list>) '>' 

<enclosed types list> ::= '(' <types list> ')' 

<file type>           ::= 'file' [<contained type>]

<TYPE>                ::= [ 'const' ] <type>

<types list>          ::= <TYPE> ( ',' <TYPE> )*

Files may contain any type of content, but as soon as types are specified for their content, every item put in them has to conform to at least one of the declared types. Specifying types at file declaration may help access stored items via indexes. It is also a way to benefit from type checking at translation time, as far as this is possible (and most of the time, it is).

Examples:

// next file may contain any type of data
file a_file;

// next file may only contain 16-bit chars
file<char16> file_chars16;

// next file contains 4-byte integer records,
// either signed or unsigned
file<(int32, uint32)> file_4bytes;

// let's see a slightly more complex example
// for which next file contains 8-byte records
class KV {
  public uint32  key;
  public float32 value;
}
file< KV > file_with_records;
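
The record sizes quoted in the comments above (2-byte chars, 4-byte integers, 8-byte KV records) can be checked with a short Python sketch using the standard struct module. It assumes, as a hypothesis and not as Typee's documented storage format, that records are stored packed in little-endian order with no padding; the TYPE_FORMATS table and the record_size() helper are illustrative names only.

```python
import struct

# Hypothetical mapping from Typee scalar types to struct format codes;
# '<' selects little-endian byte order with no alignment padding.
TYPE_FORMATS = {
    "char16": "<H",    # 16-bit char stored as one unsigned 16-bit code unit
    "int32": "<i",
    "uint32": "<I",
    "float32": "<f",
    "float64": "<d",
}

# Hypothetical layout of a KV record: uint32 key followed by float32 value.
KV_FORMAT = "<If"


def record_size(fmt: str) -> int:
    """Return the number of bytes one record of struct format fmt occupies."""
    return struct.calcsize(fmt)


if __name__ == "__main__":
    for name, fmt in TYPE_FORMATS.items():
        print(name, record_size(fmt))
    print("KV", record_size(KV_FORMAT))                   # KV 8
    packed = struct.pack(KV_FORMAT, 42, 0.5)
    print(len(packed), struct.unpack(KV_FORMAT, packed))  # 8 (42, 0.5)
```

Under this packed-layout assumption, file<(int32, uint32)> holds records of one single constant size, which is what makes index-based access possible.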

4.5.2 Files Opening and Closing

Once a file has been declared, it can be opened in one of three modes: read-only, write-only and read-write. In almost all Operating Systems (OS), opening a file allocates an OS resource, and these resources are limited in number. Any allocated resource should be released as soon as it is no longer needed. Closing a file releases any resource previously allocated for it.
Whatever the access mode specified at opening time, the file cursor is always positioned at the start of the file (i.e. at position 0). To append items at the end of a writable file, either call function end_pos() before writing anything into it, or use function append() and its related operators (see the related sub-sections further below).

none open( const ? in(str,str16) file_path, const ? in(str,str16) rw_mode )

Opens a previously declared file according to the specified file path and with the specified read/write access mode.

By default, file paths are relative to the directory from which the program is run. This directory may also be named explicitly with "./".
The directory separator in file paths is exclusively "/". Windows programmers should mind this.
To go "up" one or more levels in a directory tree, file paths may be prefixed with as many "../" as necessary.
Finally, absolute paths are specified with a leading "/". Unfortunately, absolute path syntax differs from one Operating System to another: when translating a Typee program for both Windows and Linux, for instance, the absolute paths to the same file will be different. To date, there is no way in Typee to distinguish between targeted Operating Systems in code, so programmers are strongly encouraged to use relative file paths for maximum portability. This is an identified issue that will be fixed with high priority.
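
To illustrate how relative paths such as "./some_dir/..." and "../../log/..." resolve against the working directory, here is a Python sketch based on the standard posixpath module, which always uses "/" as separator, as Typee does. The resolve() helper and the working directory value are hypothetical; this is an analogy of the rules above, not Typee's implementation.

```python
import posixpath


def resolve(working_dir: str, file_path: str) -> str:
    """Resolve a Typee-style file path against a working directory.

    Paths starting with "/" are treated as absolute; anything else,
    including "./x" and "../x", is relative to working_dir.
    """
    if file_path.startswith("/"):
        return posixpath.normpath(file_path)
    return posixpath.normpath(posixpath.join(working_dir, file_path))


if __name__ == "__main__":
    wd = "/home/user/project/bin"   # hypothetical working directory
    print(resolve(wd, "./some_dir/some_file.ext"))  # /home/user/project/bin/some_dir/some_file.ext
    print(resolve(wd, "../../log/logging.txt"))     # /home/user/log/logging.txt
    print(resolve(wd, "this_file.ext"))             # /home/user/project/bin/this_file.ext
```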

Argument rw_mode is a string. Currently, the accepted strings are:

  • "ro" for Read-Only access;
  • "wo" for Write-Only access;
  • "rw" for Read-Write access.

Function open() raises:

  • FileNotFoundException if the file is not found, or if file_path is incorrect, while 'r' is specified in rw_mode;
  • FileAccessException if access to the file cannot be granted;
  • TypeException if any of the arguments is not a string;
  • FileExistsException when trying to create (with 'w' in rw_mode) an already existing file;
  • ValueException if argument rw_mode is not one of the legal mode strings.

Examples:

file my_file;
// next, file is Read-Only and one directory below "working" directory
my_file.open( "./some_dir/some_file.ext", "ro" );
my_file.close();

// next, file is Write-Only, in the log sub-directory two levels up from "working" directory
my_file.open( "../../log/logging.txt", "wo" );
my_file.close();

// next, file is Read-Write and directly in "working" directory
my_file.open( "this_file.ext", "rw" );

Notice: for coding convenience, strings "r" and "w" are also accepted for rw_mode, as aliases of "ro" and "wo" respectively. Programmers are nevertheless encouraged to use "ro" and "wo", since the ending 'o' makes the intended mode explicit and guards against ambiguities such as a typo in which the 'w' or the 'r' of "rw" is missing.
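
The accepted rw_mode strings and their aliases can be modeled as a small validation table. The Python sketch below is only an analogy: it maps each Typee mode onto a Python binary open() mode chosen to mimic the behaviors described above ("xb" fails on an already existing file, much like FileExistsException; "rb" and "r+b" fail on a missing file, much like FileNotFoundException). The names normalize_rw_mode() and open_typee_style() and the mapping itself are hypothetical.

```python
import os
import tempfile

# Hypothetical mapping of Typee rw_mode strings onto Python binary modes.
_ALIASES = {"r": "ro", "w": "wo"}
_PY_MODES = {
    "ro": "rb",    # read-only: the file must already exist
    "wo": "xb",    # write-only creation: fails if the file already exists
    "rw": "r+b",   # read-write: the file must already exist
}


def normalize_rw_mode(rw_mode):
    """Return the canonical two-character mode, or raise on bad input."""
    if not isinstance(rw_mode, str):
        raise TypeError("rw_mode must be a string")
    mode = _ALIASES.get(rw_mode, rw_mode)
    if mode not in _PY_MODES:
        raise ValueError(f"illegal rw_mode: {rw_mode!r}")
    return mode


def open_typee_style(file_path, rw_mode):
    """Open file_path with the Python mode matching a Typee rw_mode."""
    return open(file_path, _PY_MODES[normalize_rw_mode(rw_mode)])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "this_file.ext")
        with open_typee_style(path, "wo") as f:   # creates the file
            f.write(b"data")
        try:
            open_typee_style(path, "w")           # alias of "wo": file exists
        except FileExistsError:
            print("already exists")
        with open_typee_style(path, "rw") as f:
            print(f.read())                       # b'data'
```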

none reopen( const ? in(str,str16) file_path, const ? in(str,str16) rw_mode )

First closes an already opened file, then opens it again with the specified file path and read/write access mode.

File paths and argument rw_mode follow exactly the same rules as for function open() above, and reopen() raises the same exceptions as open(), under the same conditions.

Examples:

file my_file;
my_file.open( "this_file.ext", "rw" );  // reopen() applies to an already opened file

// next, file is Read-Only and one directory below "working" directory
my_file.reopen( "./some_dir/some_file.ext", "ro" );

// next, file is Write-Only, in the log sub-directory two levels up from "working" directory
my_file.reopen( "../../log/logging.txt", "wo" );

// next, file is Read-Write and directly in "working" directory
my_file.reopen( "this_file.ext", "rw" );

Notice: as with open(), strings "r" and "w" are also accepted for rw_mode, as aliases of "ro" and "wo" respectively; "ro" and "wo" remain the recommended forms.

none close()

Closes a previously opened file. If the file had been opened with write access, this function first writes any pending data onto the physical permanent storage media. It then releases any resource allocated to the file by the Operating System. Once a file is closed, it can no longer access the physical permanent storage media: any attempt to do so raises a FileClosedException.
Attempting to close a file that has not been opened raises a FileNotOpenException.
On a closing failure for any other reason, a FileException is raised.
Example:

file my_file;
my_file.open( "./some_dir/some_file.ext", "ro" );
// ...
my_file.close();

4.5.3 Files Cursor Positioning

Once a file has been opened, its cursor can be moved anywhere within the bounds of its physical storage on the permanent media. It is also possible to get back the current position of the file cursor.

4.5.3.1 File cursor current position

Two things have to be distinguished about files in Typee: the file cursor and the file index.
The file cursor is always positioned on a byte boundary, while the file index always designates the starting byte of a record or an item.
When a file is not typed, index and cursor have the same value, counting bytes.
When a file is typed with a constant-size type (strings, for instance, are not) or with several types that all share the same constant size, the cursor is a bytes count and the index gets a value equal to:

  cursor_position / type_size

Cursor position and file index both start at 0.
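
The cursor/index relationship above can be written out as a tiny Python function. It is a sketch of the stated rule only; the name index_from_cursor() is hypothetical, and passing type_size=None is our own convention for an untyped file or a file whose items vary in size.

```python
from typing import Optional


def index_from_cursor(cursor_pos: int, type_size: Optional[int]) -> int:
    """Return the file index matching a cursor position in bytes.

    type_size is the constant item size in bytes, or None when the file
    is untyped or holds items of varying size (index == cursor then).
    """
    if cursor_pos < 0:
        raise ValueError("cursor position cannot be negative")
    if type_size is None:
        return cursor_pos
    # Integer division: in a typed file the cursor sits on item boundaries.
    return cursor_pos // type_size


if __name__ == "__main__":
    print(index_from_cursor(48, 8))     # file<float64>: 6
    print(index_from_cursor(48, None))  # untyped file: 48
```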

const uint64 cursor_pos()

Returns the current position of the cursor, in bytes count.
Raises FileNotOpenException if file has not been previously opened.

const uint64 index()

Returns the current index value, evaluated as cursor_pos / type_size if a constant size has been specified for the items stored in the file, and evaluated as cursor_pos (i.e. as a bytes count from start) if no constant type size has been specified.
Raises FileNotOpenException if file has not been previously opened.

4.5.3.2 File cursor positioning at position 0
file rewind()

Puts the file cursor and the file index at the very beginning of the file, at position 0. Returns a reference to the operated file for file operations to be cascadable.
Raises FilePosException, FileNotOpenedException, FileClosedException, FileAccessException or FileException in case of related errors.
Example:

file my_file;
my_file.open( "file_path.txt", "rw" );
my_file.rewind();
print( my_file.cursor_pos(), my_file.index() );  // prints: 0 0

4.5.3.3 File cursor positioning at end of file
const uint64 end_pos()

Puts the file cursor at the very end of the file, i.e. one byte past its last byte. At that position no read operation is allowed, only write operations are. When a constant type size has been specified at declaration of the file, the index gets the corresponding value, i.e. one past the index of the last item, which equals the current number of items (or records) present in the file.
Returns the cursor position (in bytes) if no constant-size type has been specified at declaration time, and the index value otherwise.
Raises FilePosException, FileNotOpenedException, FileClosedException, FileAccessException or FileException in case of related errors.
Examples:

file< float64 > my_file;
my_file.open( "file_path.txt", "ro" );
// let's say that this file currently contains
// ten 64-bits floats (i.e. 8-bytes floats)
print( my_file.end_pos(), my_file.cursor_pos(), my_file.index() );
  // prints: 10 80 10
my_file.close();

file my_file_b;
my_file_b.open( "file_path.txt", "ro" );
// this is the same file on hard disk drive but with no type specified
print( my_file_b.end_pos(), my_file_b.cursor_pos(), my_file_b.index() );
  // prints: 80 80 80
my_file_b.close();
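
The end_pos() example above can be reproduced in Python on a real file holding ten 8-byte floats. In this sketch the hypothetical helper end_pos() moves to the end with the standard seek(0, io.SEEK_END) call, which returns the new byte offset, and converts it into an index when a constant item size is given. The little-endian float64 layout is an assumption.

```python
import io
import os
import struct
import tempfile


def end_pos(fh, type_size=None):
    """Move the cursor one byte past the last byte of the file.

    Returns the byte position for an untyped file (type_size is None),
    or the index, i.e. the number of items, for a constant-size type.
    """
    cursor = fh.seek(0, io.SEEK_END)   # seek() returns the new byte offset
    return cursor if type_size is None else cursor // type_size


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as out:
        # ten float64 values, 8 bytes each, packed little-endian
        out.write(struct.pack("<10d", *[float(i) for i in range(10)]))
    with open(path, "rb") as fh:
        print(end_pos(fh, 8), fh.tell())   # typed view: 10 80
    with open(path, "rb") as fh:
        print(end_pos(fh), fh.tell())      # untyped view: 80 80
    os.remove(path)
```
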
4.5.3.4 File cursor relative positioning
const uint64 skip( const int64 n )

Caution: this function acts according to the type declared for the content of the file.
The new position of the file cursor (and of the file index) is evaluated as relative to its current position.
If a constant-size type has been declared for the file content, argument n is a number of records (or items) to skip. If no constant-size types have been declared for the file content, argument n is a number of bytes.
n is either a positive number, in which case the cursor skips forward, or a negative number, in which case it skips backward.
In case of an out-of-bounds skip, function skip() raises a FilePosException. It also raises FileNotOpenedException, FileClosedException, FileAccessException or FileException in case of related errors.
skip() returns the new cursor position, or the new index position, depending on whether a constant-size type was specified at declaration time.
Examples:

file< float64 > my_file;
my_file.open( "file_path.txt", "ro" );
// let's say that this file currently contains
// ten 64-bits floats (i.e. 8-bytes floats)
print( my_file.skip(6), my_file.cursor_pos(), my_file.index() );
  // prints: 6 48 6
print( my_file.skip(-4), my_file.cursor_pos(), my_file.index() );
  // prints: 2 16 2
my_file.close();

file my_file_b;
my_file_b.open( "file_path.txt", "ro" );
// this is the same file on hard disk drive but with no type specified
print( my_file_b.skip(50), my_file_b.cursor_pos(), my_file_b.index() );
  // prints: 50 50 50
print( my_file_b.skip(-20), my_file_b.cursor_pos(), my_file_b.index() );
  // prints: 30 30 30
my_file_b.close();
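
The relative positioning and bounds check performed by skip() can be sketched in Python. The skip() helper and the FilePosError class below are hypothetical stand-ins for Typee's function and FilePosException; the sketch assumes that any position from 0 up to the end-of-file position is valid, and it leaves the cursor untouched on error.

```python
import io
import os
import struct
import tempfile


class FilePosError(Exception):
    """Hypothetical stand-in for Typee's FilePosException."""


def skip(fh, n, type_size=None):
    """Move the cursor n items (typed file) or n bytes (untyped file).

    A positive n skips forward, a negative n skips backward. Returns the
    new index for a typed file, or the new byte position otherwise.
    """
    step = n if type_size is None else n * type_size
    current = fh.tell()
    size = fh.seek(0, io.SEEK_END)       # total byte size of the file
    target = current + step
    if not 0 <= target <= size:
        fh.seek(current)                 # leave the cursor where it was
        raise FilePosError(f"out-of-bounds skip to byte {target}")
    fh.seek(target)
    return target if type_size is None else target // type_size


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as out:
        out.write(struct.pack("<10d", *[float(i) for i in range(10)]))
    with open(path, "rb") as fh:
        print(skip(fh, 6, 8), fh.tell())    # 6 48
        print(skip(fh, -4, 8), fh.tell())   # 2 16
    os.remove(path)
```
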
const uint64 from_end()
const uint64 from_end( const uint64 n )

Caution: this function acts according to the type declared for the content of the file.
The new position of the file cursor (and of the file index) is evaluated as relative to the very end position of the file.
If a constant-size type has been declared for the file content, argument n is a number of records (or items) counted back from the end of the file. If no constant-size types have been declared for the file content, argument n is a number of bytes.
If n is not specified, from_end() just positions the file cursor one byte after the very last byte the file contains.
In case of out-of-bounds positioning, function from_end() raises a FilePosException. It also raises FileNotOpenedException, FileClosedException, FileAccessException or FileException in case of related errors.
from_end() returns the new cursor position, or the new index position, depending on whether a constant-size type was specified at declaration time.
Examples:

file< float64 > my_file;
my_file.open( "file_path.txt", "ro" );
// let's say that this file currently contains
// ten 64-bits floats (i.e. 8-bytes floats)
print( my_file.from_end(6), my_file.cursor_pos(), my_file.index() );
  // prints: 4 32 4
print( my_file.from_end(0), my_file.cursor_pos(), my_file.index() );
  // prints: 10 80 10
my_file.close();

file my_file_b;
my_file_b.open( "file_path.txt", "ro" );
// this is the same file on hard disk drive but with no type specified
print( my_file_b.from_end(50), my_file_b.cursor_pos(), my_file_b.index() );
  // prints: 30 30 30
print( my_file_b.from_end(0), my_file_b.cursor_pos(), my_file_b.index() );
  // prints: 80 80 80
my_file_b.close();
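
In the same spirit, from_end() amounts to counting back from the file size. The Python sketch below uses a hypothetical from_end() helper and FilePosError class; it rejects a target before byte 0 and restores the cursor in that case, which is an assumption about error behavior rather than documented Typee semantics.

```python
import io
import os
import struct
import tempfile


class FilePosError(Exception):
    """Hypothetical stand-in for Typee's FilePosException."""


def from_end(fh, n=0, type_size=None):
    """Position the cursor n items (or bytes) before the end of the file."""
    if n < 0:
        raise ValueError("n must be non-negative (uint64 in Typee)")
    step = n if type_size is None else n * type_size
    current = fh.tell()
    size = fh.seek(0, io.SEEK_END)       # total byte size of the file
    target = size - step
    if target < 0:
        fh.seek(current)                 # leave the cursor where it was
        raise FilePosError(f"cannot go {step} bytes back from byte {size}")
    fh.seek(target)
    return target if type_size is None else target // type_size


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as out:
        out.write(struct.pack("<10d", *[float(i) for i in range(10)]))
    with open(path, "rb") as fh:
        print(from_end(fh, 6, 8), fh.tell())   # 4 32
        print(from_end(fh, 0, 8), fh.tell())   # 10 80
    with open(path, "rb") as fh:
        print(from_end(fh, 50), fh.tell())     # 30 30
    os.remove(path)
```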

4.5.4 Files Writing

Writing to a file is performed by two functions, each with a few different signatures, and by streaming and indexing operators. All of them raise FilePosException in case of out-of-bounds indexing, and also FileNotOpenedException, FileClosedException, FileAccessException, FileRWModeException, FilePermissionException, TypeException and FileException in case of related errors.

Typee specifies a protocol for writing and reading built-in types: the Typee Serialization Protocol (TSP). Arrays, lists, maps and sets may therefore be written without specifying either the types of their content or their number and sizes of dimensions (for arrays and lists).
TSP is described in the last subsection of this section of the Typee documentation.

Objects in Typee are serializable as long as the classes they are instances of define operator << and operator >>. Should some attribute be an instance of another class, this other class must also define these operators; otherwise, an error is raised at translation time. Once every attribute of an object is serializable, the object itself can be written into a file with any of the functions and operators described next. Examples of overriding operator << and operator >> are provided in the chapter dedicated to learning Typee.

Notice: writing data into a file does not ensure that the written data is really, physically, stored on the permanent storage media. This is due to the way disk transfers are implemented in Operating Systems (OS). To ensure that a file transfer has been effective, call function flush() after the relevant writing operations. This function flushes all the content currently pending in OS buffers, i.e. writes it to disk.

4.5.4.1 Appending at end of file

The file cursor is first positioned at the end of the file; writing then takes place.

file append( const ? item )
file append( const ? item, const uint64 n )

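Writes the specified item at the end of the file. Notice that function append() has specialized signatures for array, list, map, set, str and str16 item types; these are specified in the next signatures of this function.
If n is specified, the item is appended n times to the file.
Returns a reference to the operated file, for read and write operations to be cascadable.
Raises any exception as specified in introduction of this section 4.5.4.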
Example:

my_file.append( 10 ).append( 11, 3 );
// appends 10, 11, 11 and 11 at end of file
file operator <<< ( const ? item )
none operator <<<= ( file f, const ? item )

Appends the item at end of file f, the same way as function append() with above signature.
Example:

my_file <<< 10 <<< 11;
my_file <<<= 12;
// appends 10, 11 and 12 at end of file
file append( const ? in (array,list,set) items )
file append( const ? in (array,list,set) items, const uint64 n )

Iterates over the specified container and writes picked items at end of file.
If n is specified, the first n picked items are appended to the file.
Returns a reference to the operated file, for read and write operations to be cascadable.
Raises any exception as specified in introduction of this section 4.5.4.
Example:

my_file.append( [0, 1, 2] ).append( [10, 11, 12], 2 );
// appends 0, 1, 2, 10 and 11 at end of file
// with interleaved protocol to specify lists,
// number of dimensions and sizes
file operator <<< ( const ? in (array,list,set) items )
none operator <<<= ( file f, const ? in (array,list,set) items )

Appends all the items contained in the container at end of file f, the same way as function append() with above signature.
Example:

my_file <<< [0,1,2] <<< [10,11,12];
my_file <<<= [20, 21];
// appends 0, 1, 2, 10, 11, 12, 20 and 21 at end of file
// with interleaved protocol to specify lists,
// number of dimensions and sizes
file append( const map items )
file append( const map items, const uint64 n )

Iterates over the specified map (key, value) pairs and writes them as picked, at end of file.
If n is specified, the first n picked items are appended to the file.
Returns a reference to the operated file, for read and write operations to be cascadable.
Raises any exception as specified in introduction of this section 4.5.4.
Example:

my_file.append( ['a':97, 'b':98] ).append( ['c':99, 'd':100], 1 );
// appends maps ['a':97, 'b':98] and either ['c':99] or ['d':100] at end of file
// with interleaved protocol to specify maps,
// number of dimensions and sizes
// and items types
file operator <<< ( const map items )
none operator <<<= ( file f, const map items )

Appends all the items contained in the map container at end of file f, the same way as function append() with above signature.
Example:

my_file <<< ['a':97, 'b':98] <<< ['c':99];
my_file <<<= ['d':100];
// appends the above three maps at end of file
// with interleaved protocol to specify maps,
// number of dimensions and sizes
// and items types
file append( const ? in (str,str16) s )
file append( const ? in (str,str16) s, const uint64 n )

Appends to the operated file the specified string s and appends a newline code '\n' at its end. If n is specified, the string is appended n times at the end of the file.
Returns a reference to the operated file, for read and write operations to be cascadable.
Raises any exception as specified in introduction of this section 4.5.4.
Example:

my_file.append( 'abc' ).append( 'def' );
// appends "abc\ndef\n" at end of file
file operator <<< ( const ? in (str,str16) s )
none operator <<<= ( file f, const ? in (str,str16) s )

Appends to file f the specified string and appends a newline code '\n' at its end, the same way as function append() with above signature.
Example:

my_file <<< 'abc' <<< 'def';
my_file <<<= 'g';
// appends "abc\ndef\ng\n" at end of file
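
In Python terms, appending strings with a trailing newline corresponds to opening the file in append mode, where every write lands at the end of the file whatever the current position. The sketch below, built around a hypothetical append_line() helper, reproduces the "abc\ndef\ng\n" result of the examples above. The UTF-8 encoding is an assumption: the encoding Typee uses to store str and str16 values is not specified here.

```python
import os
import tempfile


def append_line(path, s, n=1):
    """Append string s followed by a newline, n times, at the end of path."""
    # newline="\n" disables newline translation, so '\n' is written as is.
    with open(path, "a", encoding="utf-8", newline="\n") as f:
        for _ in range(n):
            f.write(s + "\n")


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    for s in ("abc", "def", "g"):
        append_line(path, s)
    with open(path, encoding="utf-8", newline="") as f:
        print(repr(f.read()))   # 'abc\ndef\ng\n'
    os.remove(path)
```
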
4.5.4.2 Writing at current position of cursor

The writing of items takes place in the file at the current position of the cursor. If the cursor is positioned at the end of the file, writing has the same result as appending.

file write( const ? item )
file write( const ? item, const uint64 n )

Writes the specified item at the current position of the file cursor. Notice that function write() has specialized signatures for array, list, map, set, str and str16 item types; these are specified in the next signatures of this function.
If n is specified, the item is written n times into the file.
Returns a reference to the operated file, for read and write operations to be cascadable.
Raises any exception as specified in introduction of this section 4.5.4.
Example:

my_file.write( 10 ).write( 11, 3 );
// writes 10, 11, 11 and 11 at current position of file cursor
file operator << ( const ? item )
none operator <<= ( file f, const ? item )

Writes the item at current position of cursor of file f, the same way as function write() with above signature.
Returns a reference to the operated file, for read and write operations to be cascadable.
Raises any exception as specified in introduction of this section 4.5.4.
Example:

my_file << 10 << 11;
my_file <<= 12;
// writes 10, 11 and 12 at current position of file cursor
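
Unlike appending, writing at the current position overwrites the bytes already stored there and advances the cursor. The Python sketch below shows this on a file of six 32-bit integers opened in "r+b" mode (assumed here as the Python counterpart of Typee's "rw"): writing 10, 11, 11 and 11 at item index 2 replaces items 2 to 5 in place. The helper write_int32s() and the little-endian layout are hypothetical.

```python
import os
import struct
import tempfile


def write_int32s(fh, *values):
    """Write signed 32-bit integers at the current cursor; return new cursor."""
    fh.write(struct.pack(f"<{len(values)}i", *values))
    return fh.tell()


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as out:
        out.write(struct.pack("<6i", 0, 1, 2, 3, 4, 5))
    with open(path, "r+b") as fh:
        fh.seek(2 * 4)                             # cursor on item index 2
        print(write_int32s(fh, 10, 11, 11, 11))    # 24: past the last item
        fh.seek(0)
        print(struct.unpack("<6i", fh.read()))     # (0, 1, 10, 11, 11, 11)
    os.remove(path)
```
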
file write( const ? in (array,list,set) items )
file write( const ? in (array,list,set) items, const uint64 n )

Iterates over the specified container and writes picked items at current position of cursor.
If n is specified, the first n picked items are written into the file.
Returns a reference to the operated file, for read and write operations to be cascadable.
Raises any exception as specified in introduction of this section 4.5.4.
Example:

my_file.write( [0, 1, 2] ).write( [10, 11, 12], 2 );
// writes lists [0, 1, 2] and [10, 11] in place into file
// with interleaved protocol to specify lists,
// number of dimensions and sizes
file operator << ( const ? in (array,list,set) items )
none operator <<= ( file f, const ? in (array,list,set) items )

Writes all the items contained in the container, at current position of cursor of file f the same way as does function write() with above signature.
Example:

my_file << [0, 1, 2] << [10, 11, 12];
my_file <<= [20, 21];
// writes the three above lists in place into file
// with interleaved protocol to specify lists,
// number of dimensions and sizes
file write( const map items )
file write( const map items, const uint64 n )

Iterates over the specified map (key, value) pairs and writes them at the current position of the cursor into the file.
If n is specified, the first n iterated items are written into the file.
Returns a reference to the operated file, for read and write operations to be cascadable.
Raises any exception as specified in introduction of this section 4.5.4.
Example:

my_file.write( ['a':97, 'b':98] ).write( ['c':99, 'd':100], 1 );
// writes maps ['a':97, 'b':98] and either map ['c':99] or map ['d':100]
// at current position of file cursor
// with interleaved protocol to specify maps,
// and the related types for keys and values
file operator << ( const map items )
none operator <<= ( file f, const map items )

Writes all the items contained in the map container at the current position of the cursor of file f, the same way as function write() with above signature.
Returns a reference to the operated file, for read and write operations to be cascadable.
Raises any exception as specified in introduction of this section 4.5.4.
Example:

my_file << ['a':97, 'b':98] << ['c':99];
my_file <<= ['d':100];
// writes the three above maps at current position of file cursor
// with interleaved protocol to specify maps,
// and the related types for keys and values
file write( const ? in (str,str16) s)
file write( const ? in (str,str16) s, const uint64 n)

Writes the specified string s plus an ending '\n' at the current position of the cursor into the file. If n is specified, the string is written n times at the current position of the cursor into the file.
Returns a reference to the operated file, for file operations to be cascadable.
Raises any exception as specified in introduction of this section 4.5.4.
Example:

my_file.write( 'abc' ).write( 'def' );
// writes  "abc\ndef\n" at current position of file cursor
file operator << ( const ? in (str,str16) s)
none operator <<= ( file f, const ? in (str,str16) s)

Writes the specified string s plus an ending '\n' at the current position of the cursor into file f, the same way as function write() with above signature.
Returns a reference to the operated file, for read and write operations to be cascadable.
Raises any exception as specified in introduction of this section 4.5.4.
Example:

my_file << 'abc' << 'def';
my_file <<= 'g';
// writes "abc\ndef\ng\n" at current position of file cursor
4.5.4.3 Writing at indexed position

This is a convenience offered by Typee. Files may be of two kinds: declared with a constant size of items, or not. When all contained items are of the same constant size, indexing is done on the file index position (index n addresses the item at position n, counting from 0). When items are not all of the same size, indexing is done on the file cursor position (i.e. on a byte basis).
Caution: under certain circumstances, indexed writing into files may defeat the buffering optimizations of the underlying Operating System. This happens when successive indexes are scattered, with values far from each other, and it can drastically slow down your running program. Only use indexed writing into files when this is really convenient and when gaps between successive indexes are not too big.

file operator [] ( file f, const uint64 index)

The formal EBNF description of writing on an indexed file is:

<file indexed write> ::= <dotted name> '[' <integer expression> ']'
                              '=' <expression> ';'

The expression value is written into the file at the integer expression index, which is either a byte position or an item position depending on whether the items contained in the file have a constant size.
The integer index has to be greater than or equal to 0 and less than or equal to the size of the file (bytes count) or to the number of already written items (items count). If the index equals this maximum, it addresses the end of the file: this appends a new item at the end of the file. Exception OutOfBoundsException is raised if the index value is negative or greater than what the current size of the file allows.
Returns a reference to the operated file for this writing to be cascadable with other files operations.
Raises any exception as specified in introduction of this section 4.5.4.
Example:

file< float64 > my_file;
my_file.open( "file_path.txt", "rw" );
// let's say that this file currently contains
// ten 64-bits floats (i.e. 8-bytes floats)

// modifies stored number at index 2 of file (cursor position 16)
my_file[2] = 3.14159_26535_89793_23846_26433_83; // do you know Mr Archimedes?
print( my_file.cursor_pos(), my_file.index() );  // prints: 24 3

// appends a new item at end of file
my_file[10] = 6.022_140_76e023;                  // do you know Mr Avogadro?
print( my_file.cursor_pos(), my_file.index() );  // prints: 88 11
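
The indexed write above boils down to a seek to index * item_size followed by a write, with index == number of items meaning "append". The Python sketch below emulates my_file[index] = value for a file of float64 items; set_item(), OutOfBoundsError and the little-endian layout are hypothetical, and the printed positions match those of the Typee example.

```python
import io
import os
import struct
import tempfile

ITEM = struct.Struct("<d")   # float64 items, 8 bytes each (assumed layout)


class OutOfBoundsError(Exception):
    """Hypothetical stand-in for the OutOfBoundsException described above."""


def set_item(fh, index, value):
    """Emulate `f[index] = value` on a file of float64 items.

    index == number of items appends a new item; any larger (or negative)
    index is rejected and the cursor is left unchanged.
    """
    current = fh.tell()
    count = fh.seek(0, io.SEEK_END) // ITEM.size
    if not 0 <= index <= count:
        fh.seek(current)
        raise OutOfBoundsError(f"index {index} outside [0, {count}]")
    fh.seek(index * ITEM.size)
    fh.write(ITEM.pack(value))   # cursor ends just after the written item
    return fh                    # returned so that calls can be chained


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as out:
        out.write(struct.pack("<10d", *[float(i) for i in range(10)]))
    with open(path, "r+b") as fh:
        set_item(fh, 2, 3.141592653589793)
        print(fh.tell(), fh.tell() // ITEM.size)   # 24 3
        set_item(fh, 10, 6.02214076e23)            # appends an 11th item
        print(fh.tell(), fh.tell() // ITEM.size)   # 88 11
    os.remove(path)
```
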
4.5.4.4 Flushing data buffers

Operating Systems (OS) optimize disk accesses, which are very slow. OSs manage intermediate buffers in which written data is first stored, waiting for its final writing to disk. This flushing is done when closing a file (i.e. when calling function close() on a file). Function flush(), when applied to a file, also flushes these buffers, but at the time of its call.
Caution: buffers should only be flushed when writing very precious data or when writing some synchronizing data into a file, since flushing may defeat the OS optimizations of disk accesses and slow down your running program, unless you use asynchronous file transfers. Fortunately, many OSs and programming languages allow such asynchronous accesses to files. See the documentation of built-in library File to get easy access to asynchronous transfers to and from files in Typee.

file flush()

This function flushes OS buffers associated with the operated file if any pending data was contained in there. It ensures that all recently written data on file has finally been stored on the physical media.
Returns a reference to the operated file, for this function to be cascadable with other file operations.
Raises any exception as specified in introduction of this section 4.5.4.
Example:

file my_file;
my_file.open( 'some_path', 'wo' );
my_file.write( 'some text' ).flush();
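
Python draws the same distinction between program buffers and OS buffers. In the sketch below, file.flush() pushes Python's own buffer to the OS, and os.fsync() then asks the OS to commit its buffers to the storage device; this pair is used here as the closest Python analogue of Typee's flush(), which is an assumption about semantics, not a statement about how Typee is translated. The helper name write_and_flush() is hypothetical.

```python
import os
import tempfile


def write_and_flush(path, text):
    """Write text to path and force it onto the storage device."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
        f.flush()              # empty Python's buffer into the OS buffers
        os.fsync(f.fileno())   # ask the OS to commit its buffers to disk


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    write_and_flush(path, "some text")
    with open(path, encoding="utf-8") as f:
        print(f.read())        # some text
    os.remove(path)
```

As the caution above says, calling os.fsync() after every small write would defeat OS buffering in Python just as in Typee.
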
none operator ! ( file f )

This unary operator flushes OS buffers associated with the operated file if any pending data was contained in there. It ensures that all recently written data on file has finally been stored on the physical media.
When it prefixes an instruction, this operator is applied only after all the other operations on the file in that same instruction have been performed. See the last lines of code in the example below.
Raises any exception as specified in introduction of this section 4.5.4.
Example:

file my_file;
my_file.open( 'some_path', 'wo' );
my_file.write( 'some text' );
! my_file;  //  flushes the file

// or, more concise coding:
! my_file.write( 'next line text' );  // flush takes place after writing

// and also:
! my_file[ 3 ] = 2.71828_18284_59045;  // do you know Mr John Napier?
// notice: this last instruction is legal since operator []
//         applied to a file returns a reference to this file!

4.5.5 Files Reading

Reading items from a file is performed by two functions, each with a few different signatures, and by streaming and indexing operators. All of them raise FilePosException in case of out-of-bounds indexing, and also FileNotOpenedException, FileClosedException, FileAccessException, FilePermissionException, FileRWModeException, FileEOFException, TypeException and FileException in case of related errors.

Typee specifies a protocol for writing and reading built-in types: arrays, lists, maps and sets may therefore be written and read back without specifying either the types of their content or their number and sizes of dimensions (for arrays and lists).
This protocol, named TSP for Typee Serialization Protocol, is described in the last subsection of this section of the Typee documentation.

Objects in Typee are serializable for reading as long as the classes they are instances of define operator >>. Should some attribute be an instance of another class, this other class must also define operator >>; otherwise, an error is raised at translation time. Once every attribute of an object is serializable for reading, the object itself can be read from a file with any of the functions and operators described next. Examples of overriding operator >> are provided in the chapter dedicated to learning Typee.

4.5.5.1 Reading at current position of cursor

The reading of items takes place in the file at the current position of the cursor. To read data from the start of a file, just call function rewind() on this file before reading any data. There is no way to read data at the end of a file: by definition, there is no data there.

file read( ? item )
file operator >> ( file f, ? item )
none operator >>= ( file f, ? item )

Reads the specified item. Notice that function read() and its related operators have specialized signatures for array, list, map, set, str and str16 item types; these are specified in the next signatures of this function.
The first and second signatures return a reference to the operated file, for file operations to be cascadable.
Raises any exception as specified in introduction of this section 4.5.5.
Example:

// let's say that my_file contains values 1, 2, 3 and 4, stored
// with the respective integer types of the next variables
uint8 val1, val2;
int16 val3;
int64 val4;
my_file.read( val1 ).read( val2 ) >> val3;
my_file >>= val4;
print( val1, val2, val3, val4 );  // prints: 1 2 3 4

? in (array,list,set) read()

Reads the content of a container as previously written in file and according to the Typee Serialization Protocol used to save this container.
Returns a reference to the container read from file.
Raises any exception as specified in introduction of this section 4.5.5.
Example:

// let's say file contains uint32 0, 1, 2, 10, 1 and 11
file<uint32> my_file;
my_file.open( 'path.bin', 'ro' );
array<uint32>[6] the_array = my_file.read();
// the expected returned type is set by the assigned array type
print( the_array );  // prints: 0 1 2 10 1 11

my_file.index( 1 );  // positions file cursor at position 4
the_array[0:2] = my_file.read();
print( the_array );  // prints: 1 2 10 10 1 11

? in (array,list,set) read( const uint64 n )

Reads the content of a container as previously written in file. Argument n specifies the number of items to read and to put in the resulting container.
Returns a reference to the container read from file.
Raises any exception as specified in introduction of this section 4.5.5.
Example:

// let's say file contains uint32 0, 1, 2, 10, 1 and 11
file<uint32> my_file;
my_file.open( 'path.bin', 'ro' );
list<uint32> the_list = my_file.read( 6 );
// the expected returned type is set by the assigned list type
print( the_list );  // prints: 0 1 2 10 1 11

my_file.index( 1 );  // positions file cursor at position 4
set<uint32> the_set = my_file.read( 4 );
print( the_set );  // prints: 1 2 10 (1 is inserted only once)

file operator >> ( file f, ? in (array,list,set) items )
none operator >>= ( file f, ? in (array,list,set) items )

Reads items from file and puts them in the passed container (items) according to the Typee Serialization Protocol used to previously write the container in file.
First signature returns a reference to the operated file, to be cascadable.
Second signature operates on file and assigns the passed container with the read items.
Raises any exception as specified in introduction of this section 4.5.5.
Examples:

// let's say file contains uint32 0, 1, 2, 10, 1 and 11
file<uint32> my_file;
my_file.open( 'path.bin', 'ro' );
array<uint32>[4] the_array1;
array<uint32>[2] the_array2;
my_file >> the_array1 >> the_array2;
// the expected returned types are set by the types of the assigned arrays
print( the_array1 );  // prints: [0, 1, 2, 10]
print( the_array2 );  // prints: [1, 11]

my_file.index( 1 );  // positions file cursor at position 4
my_file >>= the_array1[0:2];
print( the_array1 );  // prints: 1 2 10 10

map read()

Reads the content of a map as previously written in file and according to the Typee Serialization Protocol used to save this container.
Returns a reference to the read map.
Raises any exception as specified in introduction of this section 4.5.5.
Example:

// let's say that file contains map ['a':97, 'b':98, 'c':99, 'd':100]
map the_map = my_file.read();
print( the_map );  // prints ['a':97, 'b':98, 'c':99, 'd':100]

map read( const uint64 n )

Reads n pairs (key, value), i.e. 2 × n items, from file and uses them to assign a map.
Returns a reference to the read map.
Raises any exception as specified in introduction of this section 4.5.5.
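Judging from the example below, where read( 3 ) yields three pairs, n counts (key, value) pairs rather than single items. Assuming so, the pairing can be modeled in Python; read_map is a hypothetical helper working on an already decoded sequence of items, not a Typee function.

```python
from typing import Any, Dict, Sequence


def read_map(items: Sequence[Any], n: int) -> Dict[Any, Any]:
    """Build a map from the first n (key, value) pairs of a flat item sequence."""
    if 2 * n > len(items):
        raise EOFError("fewer than n pairs available")
    keys = items[0:2 * n:2]    # items at even positions are keys
    values = items[1:2 * n:2]  # items at odd positions are values
    return dict(zip(keys, values))


items = ["a", 97, "b", 98, "c", 99, "d", 100]
print(read_map(items, 3))  # {'a': 97, 'b': 98, 'c': 99}
```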
Example:

// let's say that file contains 'a', 97, 'b', 98, 'c', 99, 'd', 100
map<char,uint8> the_map = my_file.read( 3 );
print( the_map );  // prints ['a':97, 'b':98, 'c':99]

file operator >> ( file f, map items )
none operator >>= ( file f, map items )

Reads items from file and puts them in the passed map (items) according to the Typee Serialization Protocol used to previously write the map into file.
First signature returns a reference to the operated file, to be cascadable.
Second signature operates on file and assigns the passed map with the read pairs (key, value).
Raises any exception as specified in introduction of this section 4.5.5.
Examples:

// let's say file 'my_file' contains next three maps
//     ['a':97, 'b':98], ['c':99] and ['d':100]
// with the whole associated stuff according to the TSP protocol
map the_map1;
map<char,uint8> the_map2;
my_file >> the_map1 >> the_map2;
print( the_map1, the_map2 ); // prints: ['a': 97, 'b': 98] ['c': 99]
my_file >>= the_map1;
print( the_map1 ); // prints: ['d':100]

? in (str,str16) read()

Reads one string from the operated file, starting at the current position of the file cursor and until a newline character '\n' is read. Caution: the ending newline character is not appended to the string.
Returns a reference to the string read from file.
Raises any exception as specified in introduction of this section 4.5.5.
Example:

// let's say that file already contains text:
// "that's all, folks!\nthis is a 2nd string"
str s = my_file.read();
print( s );  // prints: that's all, folks!

list<(str,str16)> read( const uint64 n )

Reads the next n strings from the operated file, starting at the current position of the file cursor. Caution: the ending newline characters are not appended to the read strings.
Returns a reference to a list containing the successive strings read from file.
Raises any exception as specified in introduction of this section 4.5.5.
Example:

list my_list = my_file.read( 5 );  // reads next five lines

list<(str,str16)> readlines()

Reads all strings from the operated file, starting at the current position of the file cursor and until end of file.
Returns a reference to a list containing the successive strings read from file.
Raises any exception as specified in introduction of this section 4.5.5.
Example:

list my_lines = my_file.readlines();
  // whole text file is now split into my_lines

file operator >> ( file f, ? in (str,str16) s )
none operator >>= ( file f, ? in (str,str16) s )

Reads one string from file f, starting at the current position of the file cursor and until either a newline character '\n' is read or end of file is reached. Caution: the ending newline character is not appended to the string.
The first signature of this operator returns a reference to the operated file for this operator to be cascadable.
Raises any exception as specified in introduction of this section 4.5.5.
Example:

// let's say that my_file contains strings 'abc' then 'def' and 'g'
str text1, text2;
my_file >> text1 >> text2;
print( text1, text2 );  // prints: abc def

my_file >>= text1;
print( text1, text2 );  // prints: g def

file operator >>> ( file f, list<(str,str16)> lines )
none operator >>>= ( file f, list<(str,str16)> lines )

Reads all strings from the operated file, starting at the current position of the file cursor and until end of file, and puts them into a list of strings. Caution: the newline characters that separate strings are not appended to the read strings.
The first signature of this operator returns a reference to the operated file for this operator to be cascadable.
Raises any exception as specified in introduction of this section 4.5.5.
Example:

// let's say that my_file contains strings 'abc' then 'def' and 'g'
list<str> my_lines;
my_file.rewind() >>> my_lines;
print( my_lines );  // prints: [abc, def, g]

my_file.rewind();
my_file >>>= my_lines;
print( my_lines );  // prints again: [abc, def, g]

4.5.5.2 Reading at indexed position of cursor

This is a convenience offered by Typee. Files may be of two kinds: declared with a single, constant-size type for their items, or not. When all items are of the same constant size, indexing is item-based (index n designates the n-th item in the file, counting from 0). When items are not all of the same size, indexing is done on the file cursor position (i.e. on a byte basis).
Caution: under certain circumstances, indexed reading from files may defeat the buffering optimizations of the underlying Operating System. This is the case when successive indexes are scattered randomly, far apart from each other, and it may drastically slow down your running program. Only use indexed reading from files when it is really convenient and when the gaps between successive indexes are small.
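The two indexing kinds boil down to a mapping from index to byte offset. The Python sketch below is a hypothetical model, not Typee's implementation: byte_offset and read_at are illustrative names, storage is assumed big-endian, strings are assumed UTF-8 and ended by '\n', and the variable-size layout is the one of the example further below (uint8, uint8, int16, then a string).

```python
import struct
from typing import Optional


def byte_offset(index: int, item_size: Optional[int], file_size: int) -> int:
    """Map an index to a byte offset: item-based if item_size is given, else byte-based."""
    offset = index * item_size if item_size is not None else index
    if index < 0 or offset >= file_size:
        raise IndexError("index out of bounds")  # models OutOfBoundsException
    return offset


def read_at(data: bytes, offset: int, fmt: str):
    """Decode one value of the expected type at a byte offset (no cursor involved)."""
    if fmt == "str":
        end = data.index(b"\n", offset)  # strings end at the next newline
        return data[offset:end].decode("utf-8")
    return struct.unpack_from(fmt, data, offset)[0]


# Variable-size items: uint8 1, uint8 2, int16 3, then "four\n".
data = struct.pack(">BBh", 1, 2, 3) + b"four\n"
print([read_at(data, byte_offset(i, None, len(data)), f)
       for i, f in ((0, ">B"), (1, ">B"), (2, ">h"), (4, "str"))])
# [1, 2, 3, 'four']

# Fixed-size items: three uint32, so index 2 is at byte offset 8.
fixed = struct.pack(">3I", 10, 20, 30)
print(read_at(fixed, byte_offset(2, 4, len(fixed)), ">I"))  # 30
```

In the variable-size case the int16 occupies bytes 2 and 3, so the string starts at byte 4: the index is a byte position, not an item rank.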

const ? operator [] ( file f, const uint64 index )

The formal EBNF description of reading items from an indexed file is:

<file indexed read> ::= <dotted name> '[' <integer expression> ']'

This rule is part of the Typee expression rules; to be fully precise, it is an <atom>.
The integer index has to be greater than or equal to 0 and less than either the size of the file (bytes count) or, if all items have been declared with a same fixed-size type at file declaration time, the number of already written items (items count). Exception OutOfBoundsException is raised if the index value is out of these bounds.
Returns the item read at the specified index, decoded according to the expected type (see below).
Raises any exception as specified in introduction of this section 4.5.5.
Example:

// let's say that my_file contains values 1, 2, 3 and "four" with
// respective types according to next variables
uint8 val1 = my_file[0], val2 = my_file[1];
int16 val3 = my_file[2];
str val4 = my_file[4];
print( val1, val2, val3, val4 );  // prints: 1 2 3 four

The number of bytes read to return a value, as well as their decoding into this value, are determined by the expected returned type. If this type cannot be determined unambiguously in an expression, the indexed read has to be cast; otherwise, a TypeAmbiguityException error is raised at translation time. See examples below:

// reads 4 bytes and puts their decoding in my_val
uint32 my_val = my_file[0];

// reads a string from byte index 4 until an '\n' is read
// does not append this '\n' to the end of the string
str my_str = my_file[4];

// reads a string until an '\n' is read, due to the expected final type
str my_str2 = my_str + my_file[18];

// reads an int16 - i.e. 2 bytes - from file,
// even if a float is present in the expression
int16 my_res = 2 * my_file[24] - 3.0;

4.5.6 Little vs. Big Endian

Endianness characterizes the order in which the bytes of multi-byte values are stored in memory, and in files as well. The two most used modes for this storage are Little Endian and Big Endian.

Little Endian mode stores the least significant bytes at the lowest byte addresses. Big Endian mode stores the most significant bytes at the lowest byte addresses.

When storing multi-byte values on permanent storage, the default mode in Typee is Big Endian. This default behavior can be modified and restored via two built-in functions and two unary operators.
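Python's standard struct module makes the two byte orders visible. The short sketch below packs the value 0x12_34_56_78, used in the examples of the next sub-sections, in both modes, and shows that decoding big-endian bytes in little-endian mode yields 0x78_56_34_12, as in those examples.

```python
import struct

value = 0x12345678
big = struct.pack(">I", value)     # big endian: most significant byte first
little = struct.pack("<I", value)  # little endian: least significant byte first
print(big.hex(), little.hex())     # 12345678 78563412

# Reading big-endian bytes back in little-endian mode reverses the byte order.
(misread,) = struct.unpack("<I", big)
print(hex(misread))                # 0x78563412
```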

4.5.6.1 Little Endianness

file little_endian()
file < ( file f )

Sets the writing and reading mode to little endian. Subsequent read and write operations get and put the least significant byte first and the most significant byte last (i.e. little-at-start).
Built-in function little_endian() returns a reference to the operated file, to be cascadable with other operations.
The unary operator < also returns a reference to the operated file and has higher precedence than any other operator or function. This way, other operations can be applied to the file itself with the little endian mode in operation.
This function and this operator may be applied to a file before any other operation, for instance before the file has been opened. They may also be applied after a call to built-in function close() on a file, which lets the endianness of the file be modified before it is opened again.
Examples:

uint32 val = 0x12_34_56_78;
file my_file;
my_file.open( 'filepath.data', 'rw' );
my_file.write( val ).flush();

my_file.rewind();
uint32 read_val = my_file.read();     // read_val contains 0x12_34_56_78

my_file.rewind();
my_file.little_endian() >> read_val;  // read_val contains 0x78_56_34_12

my_file.big_endian();                 // to be seen at next sub-section
my_file.rewind();
my_file >> read_val;                  // read_val contains 0x12_34_56_78

my_file.rewind();
< my_file >> read_val;                // read_val contains 0x78_56_34_12
my_file.close();

4.5.6.2 Big Endianness

file big_endian()
file > ( file f )

Sets the writing and reading mode to big endian. Subsequent read and write operations get and put the most significant byte first and the least significant byte last (i.e. big-at-start).
Built-in function big_endian() returns a reference to the operated file, to be cascadable with other file operations.
The unary operator > also returns a reference to the operated file and has higher precedence than any other operator or function. This way, other operations can be applied to the file itself with the big endian mode in operation.
This function and this operator may be applied to a file before any other operation, for instance before the file has been opened. They may also be applied after a call to built-in function close() on a file, which lets the endianness of the file be modified before it is opened again.
Notice: big endian is the default behavior for read and write files in Typee.
Examples:

uint32 val = 0x12_34_56_78;
file my_file;
my_file.open( 'filepath.data', 'rw' );
my_file.little_endian().write( val ).flush();

my_file.rewind();
uint32 read_val = my_file.read();     // read_val contains 0x12_34_56_78

my_file.rewind();
my_file.big_endian() >> read_val;     // read_val contains 0x78_56_34_12

my_file.little_endian();
my_file.rewind();
my_file >> read_val;                  // read_val contains 0x12_34_56_78

my_file.rewind();
> my_file >> read_val;                // read_val contains 0x78_56_34_12
my_file.close();

4.5.7 Files Iterating

Files are iterable. Statement for runs through a file and provides each of its items, one after the other.
For this purpose, the types of the serialized items have to be known at read time. This condition is satisfied in only two cases:

  1. The file content type has been specified at declaration time of the file, with a single specified type for content;
  2. or the Typee Serialization Protocol has been involved when items have been stored in the file.

Let’s have a look at both cases.

4.5.7.1 File declared with a single type for content

In this case, every item contained in the file has a fixed size that is known at read time. We call such items fixed-size records. The for statement used for iterating through the file returns the records one after the other, one record per iteration.
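Iterating over fixed-size records amounts to cutting the file content into equal-size chunks. As a Python sketch, assuming a file<float64> whose content is a plain sequence of big-endian 8-byte floats with no header, struct.iter_unpack does exactly this; iter_float64_records is a hypothetical name.

```python
import struct
from typing import Iterator


def iter_float64_records(data: bytes) -> Iterator[float]:
    """Yield each 8-byte big-endian float64 record, in file order."""
    if len(data) % 8 != 0:
        raise ValueError("content is not a whole number of float64 records")
    for (value,) in struct.iter_unpack(">d", data):
        yield value


content = struct.pack(">3d", 1.5, -2.0, 3.25)
for value in iter_float64_records(content):
    print(value)  # prints 1.5, then -2.0, then 3.25
```

Because the record size is known in advance, no per-item control information is needed; this is what distinguishes this case from the TSP one.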

Examples:

file<float64> my_file;
my_file.open( "filename.dat", "ro" );
for( float64 value in my_file ) {
    print( value );  // prints every float number stored in file.
}
my_file.close();

class MyDataRecord {
    uint32  val_i;
    float32 val_f;
}
file<MyDataRecord> my_records_file;
my_records_file.open( "records.dat", "ro" );
for( MyDataRecord record in my_records_file ) {
    print( record.val_i, record.val_f );  // prints each record stored in file.
}
my_records_file.close();

4.5.7.2 File created using TSP

The Typee Serialization Protocol is involved as soon as containers and objects (i.e. instances of classes) are written in and read from files. TSP is explained in the last sub-section of this section of Typee documentation.
Once TSP has been used at write time (most of the time, silently), it is available for free (and silently) at read time. Containers and objects are then read from the file one after the other when iterating through the file with statement for.

A special case is iterating on a text file. Text files contain strings of variable lengths, separated by newline characters ('\n'). Iterating through a text file returns one line of text at a time, in the exact order in which the lines were written.

Examples:

file<(int32, uint16, str)> my_tsp_file;
my_tsp_file.open( "tsp_file.dat", "ro" );
for( ? in (int32, uint16, str) item in my_tsp_file ) {
    print( item );  // prints every item contained in the file
}
my_tsp_file.close();

file<str> my_txt_file;
my_txt_file.open( "filename.txt", "ro" );
for( str line in my_txt_file.read() ) {
    print( line );  // prints every line of the text file
}
my_txt_file.close();

4.5.8 Files Manipulations

Typee provides built-in functions and operators for the manipulation of files. All of the functions are method-like: they are applied to a file as built-in methods would be, via the file identifier:
my_file.file_function(...)

Built-in functions and operators on files are of a few kinds and are all presented in the next sub-sections. They all deal with the file entity itself in the program. They do not deal with the Operating System abstraction layer, which Typee handles through a dedicated built-in library described in a later section of this documentation: built-in library File.

4.5.8.1 File initialisation

There are three ways in Typee to open a file:

  1. built-in function open(), explained in a previous sub-section; this is the one we have used in all the examples up to now;
  2. a built-in constructor, file(), that we explain here;
  3. and statement with, which also uses the built-in file constructor and is a convenient way to manipulate files. We also explain here how to use this statement with files.

file file( const ? in(str,str16) file_path, const ? in(str,str16) rw_mode )
file<<template types list> > file( const ? in(str,str16) file_path, const ? in(str,str16) rw_mode )

This constructor is to be used at declaration time of a file. It returns a reference to the opened file.
As for built-in function open() on files, it opens a declared file according to the specified file path and with the specified read/write access mode.

By default, file paths are relative to the directory from which the program is run. This directory may also be named explicitly with “./”.
The directories separator in file paths is exclusively “/”. Windows programmers should mind this.
To go “up a level” in a directory tree, file paths may be prefixed with as many “../” as necessary.
Finally, absolute paths are specified with a leading “/”. Unfortunately, absolute paths have different syntaxes on different Operating Systems. When translating a Typee program for both Windows and Linux, for instance, absolute paths for a same file will be different. To date, there is no way in Typee to distinguish between targeted Operating Systems in code. So, programmers are strongly encouraged to use relative file paths to get maximal portability. This is an identified issue that will be fixed with high priority.

Argument rw_mode is a string. Currently, the accepted strings are:

  • "ro" for Read-Only access;
  • "wo" for Write-Only access;
  • "rw" for Read-Write access.

Built-in constructor file() raises FileNotFoundException if the file is not found or if file_path is incorrect, FileAccessException if access to the file cannot be granted, TypeException if any of the arguments is not a string and ValueException if argument rw_mode is not one of the three legal 2-char strings specified right above.

Examples:

// next, file is Read-Only and one directory below "working" directory
file my_file = file( "./some_dir/some_file.ext", "ro" );

// next, file is Write-Only, two levels above the "working" directory
// and then in its sub-directory log
file my_log_file = file( "../../log/logging.txt", "wo" );

// next, file is Read-Write and directly in "working" directory
file<int32> my_rw_file = file( "this_file.ext", "rw" );

Notice: for convenience of coding, strings "r" and "w" are also accepted for rw_mode, as aliases for "ro" and "wo" respectively. Programmers are nevertheless encouraged to use strings "ro" and "wo", since the ending 'o' removes any ambiguity, such as a typo where the 'r' or the 'w' of "rw" is missing.
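The rules on rw_mode can be summarized by a small validation function. This Python sketch only models the behavior described above: normalize_rw_mode is a hypothetical name, and Python's TypeError and ValueError stand in for Typee's TypeException and ValueException.

```python
def normalize_rw_mode(rw_mode: object) -> str:
    """Return the canonical 2-char access mode, accepting "r"/"w" as aliases."""
    if not isinstance(rw_mode, str):
        raise TypeError("rw_mode must be a string")
    aliases = {"r": "ro", "w": "wo"}
    mode = aliases.get(rw_mode, rw_mode)
    if mode not in ("ro", "wo", "rw"):
        raise ValueError(f"illegal rw_mode: {rw_mode!r}")
    return mode


print(normalize_rw_mode("r"), normalize_rw_mode("rw"))  # ro rw
```

Normalizing the aliases first keeps a single list of legal modes to check against.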

with file( const ? in(str,str16) file_path, const ? in(str,str16) rw_mode ) as
with file<template types list>( const ? in(str,str16) file_path, const ? in(str,str16) rw_mode ) as

This is the third way to initialize a file. Statement with ... as ... creates something like an instance of a file, named after keyword as. The file is opened and its internal control structure is assigned as a reference to the named instance. This identifier then becomes a reference to the file and can be used anywhere in the block of statements of the with statement, so that any built-in operator and function for files can be applied to it.

Example 1:

with file( "data.bin", "rw" ) as file f_ref {
    f_ref << uint32(0x12345678) << "a-string";
    f_ref.flush();

    uint16 val_1, val_2;
    str val_s;
    f_ref.rewind();
    f_ref >> val_1 >> val_2 >> val_s;
    print( val_1, val_2, val_s );  // prints: 4660 22136 a-string
                                   // (i.e. 0x1234 and 0x5678)
}
// file is automatically closed (and flushed if needed) here
// and can no longer be accessed via its reference identifier

Notice: the type declaration for identifier f_ref after keyword as is not mandatory. We add it here only to show that specifying its type is legal; the statement with ... as can be understood as a kind of declaration in which the type of the reference is declared in the with clause. See example 2 below.

Example 2:

with file<str>( "typee_MIT_licence.txt", "ro" ) as f_ref {
    for( str line in f_ref.read() )
        print( line );
}
// prints the content of some copyright, line after line
// and finally closes the file that contains the licence notice

4.5.8.2 File reference deleting

File reference deletion is not permanent storage deletion. It only releases the logical resources allocated at run time with the file reference in your program. A dedicated built-in library is available to physically delete a file on disk, as well as on any other kind of mutable permanent storage. Typee provides two keywords for deleting file references in a program: delete and its alias del.

delete
del

Closes the operated file. Flushes any pending content into the file before closing. Releases from memory any allocated resources associated with the file. The file reference is no longer available after its deletion. This operation does not delete the file on disk.
Raises exceptions in case of any type of error: flushing error, closing error, not opened file error, unavailable resource, etc. The list of the corresponding exceptions is still to be specified in this documentation.
Example:

// let's open a file
file my_file = file( "path.ext", "rw" );
// first four bytes are set to 0
my_file << uint32(0);
// then, we let Typee automatically:
//   - flush the last written data
//   - close the file
//   - release any allocated resource in memory
delete my_file;

4.5.8.3 File emptying and truncating

Files are containers. As such, they can be emptied. They can also be truncated: everything stored from a specified offset onward is removed from the file, whose size then equals this offset. Neither of these built-in functions deletes the file on disk, but both act on the storage media as soon as they are called.

file empty()

Empties the file. CAUTION: this operation physically empties the file on disk; its content cannot be recovered afterwards. Any content is removed from the operated file, whose size is then 0. This operation also clears any data still pending in Operating System write buffers, so that no such data is later saved in the file.
Returns a reference to the operated file, for this function to be cascadable with other file functions.
Raises FileNotOpenedException if file is not opened and FileAccessException if access has not been granted to file.
Example:

// let's say that file contains string "abcdef\n"
file<str> my_file = file<str>( "letters.txt", "rw" );
str text = my_file.read();
print( text, my_file.cursor_pos() );  // prints: abcdef
my_file.empty();
my_file >> text;
print( text, my_file.cursor_pos() );  // prints: nothing but a newline
my_file.close();

file trunc()
file trunc( const uint64 pos)

Removes the end of the file from disk:

  • first signature: from the current index position (if the file has been declared with a single, fixed-size type) or from the current cursor position otherwise;
  • second signature: from the specified index position (if the file has been declared with a single, fixed-size type) or from the specified cursor position otherwise.

In both cases, the item or byte at the chosen position is also removed from the file on disk. The size of the file is then set accordingly.
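The position rules of empty() and trunc() can be modeled in Python with an in-memory io.BytesIO buffer, whose truncate() method cuts the content at a byte offset. The sketch assumes that, for a file declared with a single fixed-size type, position pos maps to byte offset pos × item size; trunc and empty here are hypothetical helpers, not the Typee built-ins.

```python
import io
from typing import Optional


def trunc(buf: io.BytesIO, pos: int, item_size: Optional[int] = None) -> io.BytesIO:
    """Remove everything from pos onward; pos counts items if item_size is given."""
    offset = pos * item_size if item_size is not None else pos
    size = buf.seek(0, io.SEEK_END)  # current content size in bytes
    if offset > size:
        raise IndexError("pos is beyond the end of the file")  # OutOfBoundsException
    buf.truncate(offset)  # bytes at offset and after are removed
    return buf  # returned so that calls can be chained


def empty(buf: io.BytesIO) -> io.BytesIO:
    """Remove the whole content: the size becomes 0."""
    return trunc(buf, 0)


text = io.BytesIO(b"abcdef\n")
print(trunc(text, 3).getvalue())  # b'abc'
```

With a fixed-size type, trunc(buf, 1, 8) on a file of float64 keeps only the first 8 bytes, i.e. the first item.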
CAUTION: this operation acts directly on the file on disk. Removed content cannot be recovered after the operation.
Returns a reference to the operated file, for this function to be cascadable with other file functions.
Raises FileNotOpenedException if file is not opened, FileAccessException if access has not been granted to file, OutOfBoundsException if pos is greater than the current size of the file.
Example:

// let's say that file contains string "abcdef\n"
file my_file = file( "letters.txt", "rw" );
str text = my_file.read();
print( text, my_file.cursor_pos() );  // prints: "abcdef"
my_file.trunc( 3 );
my_file.rewind();
my_file >> text;
print( text, my_file.cursor_pos() );  // prints: "abc"
my_file.close();

4.5.8.4 Items counting, File size

How many items in a file? What is the size (in bytes, KBytes, MBytes, GBytes) of the content of a file?

const uint64 count()

Returns the number of items contained in a file, provided the file has been declared with a single, fixed-size type. Otherwise, count() returns the size in bytes of the content of the file.
Raises FileNotOpenedException if file is not opened and FileAccessException if access has not been granted to file.
Example:

// let's say next file contains eight float64
file<float64> my_file = file<float64>( "data.bin", "ro" );
print( my_file.count() );  // prints: 8
my_file.close();

// let's say next file contains eight strings of total length 128 characters
file<str> my_text_file = file<str>( "data.txt", "ro" );
print( my_text_file.count() );  // prints: 136 (128 characters plus 8 ending '\n')
my_text_file.close();

const uint64 size()
const float32 size_kb()
const float32 size_mb()
const float32 size_gb()

Returns the size of the content of the file. This is not the size of the file on disk, which most of the time will be greater due to Operating System optimizations of disk storage (file sizes on disk are multiples of blocks of a few KBytes each). To get the exact size of a file on disk, use the built-in library on Operating Systems, as explained in the section on built-in libraries later in this documentation.
First signature returns the exact count of bytes.
The next three signatures return a float value corresponding to respectively KBytes, MBytes and GBytes.
Raises FileNotOpenedException if file is not opened and FileAccessException if access has not been granted to file.
Example:

// let's say next file contains one million float64 values
file<float64> my_file = file<float64>( "data.bin", "ro" );
print( my_file.size() );     // prints: 8000000, i.e. 1,000,000 x 8
print( my_file.size_kb() );  // prints: 7812.5, i.e. 8,000,000 / 1,024
print( my_file.size_mb() );  // prints: 7.62939, i.e. 8,000,000 / 1,024²
print( my_file.size_gb() );  // prints: 0.00745, i.e. 8,000,000 / 1,024³
my_file.close();
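The conversions above use binary multiples (1 KByte = 1,024 bytes). The Python sketch below reproduces the arithmetic of size(), size_kb(), size_mb(), size_gb() and of count() for a file of fixed-size items; size_units and item_count are illustrative names, not Typee functions.

```python
def size_units(size_bytes: int) -> dict:
    """Express a byte count in KBytes, MBytes and GBytes (powers of 1,024)."""
    return {
        "kb": size_bytes / 1024,
        "mb": size_bytes / 1024**2,
        "gb": size_bytes / 1024**3,
    }


def item_count(size_bytes: int, item_size: int) -> int:
    """Number of fixed-size items held in size_bytes of content."""
    return size_bytes // item_size


size = 1_000_000 * 8  # one million float64 values
print(size, item_count(size, 8))  # 8000000 1000000
print({k: round(v, 5) for k, v in size_units(size).items()})
# {'kb': 7812.5, 'mb': 7.62939, 'gb': 0.00745}
```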

4.5.9 Files as arguments

Function and method signatures consist of the types of their arguments and the type of the value they return. Files may be passed as arguments to functions and methods, and may be returned to the caller. Keyword file is a generic type that specifies any kind of file, whatever its size and the types possibly declared for its content.

4.5.10 Typee Serialization Protocol

Serialization is the way items are stored in and then read from files. Typee specifies a protocol for the storage of complex objects such as instances of classes or containers, all of which may contain items of different types, in varying numbers and with one to many associated dimensions. We explain the Typee Serialization Protocol (TSP) in this subsection.

4.5.10.1 Introduction

The Typee Serialization Protocol, or TSP, specifies the storage of complex data in memory, be it permanent (e.g. on disk) or not (e.g. in volatile memory), for its later retrieval and loading from this storage.

Two kinds of complex data are defined. The first one deals with strings, which are entities of variable length (i.e. a kind of container for characters only). The second one deals with all the Typee container types and with any type of object (i.e. instances of classes).
The storage of strings is simple. We describe it in the next sub-section. The storage of containers and objects is a little bit more complex and specifies headers and data payloads. Headers specify the type of data and some other control information (e.g. dimensions of an array). Payloads are the content data.

Headers and payloads are encoded when complex data is stored and decoded when it is retrieved from storage. Both the encoding and the decoding are specified by TSP and explained in the next sub-sections.

The Typee Serialization Protocol is directly implemented in the Typee translators, so that it is totally invisible to programmers. Writing and reading complex data on files, for instance, involves TSP transparently. Programmers just have to write and read lists or arrays (for instance) as a whole, without writing any specific code. For writing and reading objects, two dedicated operators have to be implemented by programmers in the definition of serializable classes. We provide programming examples for this in the last sub-section of this section: very easy to use, as you will see.

4.5.10.2 Serialization of Strings

TSP specification for strings is very simple. It has already been shown in previous examples.

When writing a string into a file, a newline character ('\n') is appended after the end of the string content.
When reading a string from a file, characters are appended to the read string up to the first newline character '\n' found in the file.

Caution: when a string contains newline characters, all of them are written in the file and an additional one is appended after the end of the string. At read time, this string is therefore split into as many strings as there are '\n' characters written for it.
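This rule, and the caution above, can be checked with a small Python model. write_str and read_strs are hypothetical helpers that only mimic the TSP string rule on a byte buffer, assuming UTF-8 encoding for str.

```python
def write_str(buffer: bytearray, s: str) -> None:
    """Append a string followed by its terminating newline, as TSP specifies."""
    buffer.extend(s.encode("utf-8") + b"\n")


def read_strs(data: bytes) -> list:
    """Split stored content back into strings, dropping each terminating newline."""
    text = data.decode("utf-8")
    # Every stored string ends with '\n', so the last split element is always empty.
    return text.split("\n")[:-1]


buf = bytearray()
write_str(buf, "abc")
write_str(buf, "line 1\nline 2")  # string with an embedded newline
print(read_strs(bytes(buf)))      # ['abc', 'line 1', 'line 2']
```

Two strings were written but three are read back: the embedded newline cannot be told apart from a terminator.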

4.5.10.3 Serialization of Arrays

Arrays are written in files according to TSP: a header is first written into file, then the content of the array is written in the file.

array header

bytes offsets           | meaning                     | value
------------------------+-----------------------------+-------------------------------------
0 [bit 7]               | endianness of header        | 0b1: big endian, 0b0: little endian
0 [bits 6…0]            | array serialization         | uint8( 0b001_0000 ), i.e. 0x10
1 … 3                   | contained types             | see [1] below
4 … 7                   | first dimension size        | uint32, see [2]
4×k … 4×k+3             | k-th dimension size         | uint32, for 1 ≤ k ≤ N, see [2]
4×N … 4×N+3             | last (N-th) dimension size  | uint32, see [2]
4×(N+1) … 4×(N+1)+3     | end of dimensions sizes     | uint32( 0x0000_0000 )

[1]: Each of the 24 bits in these 3 bytes represents a built-in type or a class of instances. See the table in the dedicated sub-section at the end of this section.
[2]: the maximum size of any dimension is then 2^32 - 1 items (about 4 giga-items), which we expect to be enough for most programs.
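Following the table above, an array header can be assembled with a short Python sketch. It is a model, not the translators' code, and it rests on assumptions: the bit positions for content types are only those visible in the examples below (bit 8 for uint16, bit 9 for uint32, bit 11 for str), the 24-bit types field and the dimension sizes are written in the header's byte order, and dimension sizes must be non-zero since a zero uint32 marks the end of the list. array_header is a hypothetical name.

```python
import struct
from typing import Iterable, Sequence

ARRAY_SERIALIZATION = 0x10  # value of bits 6..0 of byte 0
# Bit positions observed in the examples of this section (partial, assumed).
TYPE_BITS = {"uint16": 8, "uint32": 9, "str": 11}


def array_header(types: Iterable[str], dims: Sequence[int], big_endian: bool = True) -> bytes:
    """Build a TSP array header: control byte, types mask, dimension sizes, 0 terminator."""
    order = ">" if big_endian else "<"
    control = (0x80 if big_endian else 0x00) | ARRAY_SERIALIZATION
    mask = 0
    for t in types:
        mask |= 1 << TYPE_BITS[t]
    header = bytes([control]) + mask.to_bytes(3, "big" if big_endian else "little")
    for size in dims:
        if not 0 < size < 2**32:
            # a zero size would be read back as the end-of-dimensions marker
            raise ValueError("each dimension size must fit in a non-zero uint32")
        header += struct.pack(order + "I", size)
    return header + struct.pack(order + "I", 0)  # end of dimensions sizes


h = array_header(["uint32"], [2, 3])
print(h.hex("_"))  # 90_00_02_00_00_00_00_02_00_00_00_03_00_00_00_00
```

For N dimensions the header is 4 + 4 × N + 4 bytes long, so the payload of a 2-dimension array starts at byte 16.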

array payload

There are two cases here:

a. either the array contains a single type of content
b. or the array contains values of multiple types.

a. single type of content
In this first case, the payload is constituted of successive items in row-major order: the index of the innermost (last) dimension varies fastest and that of the outermost (first) dimension slowest. These items are all written (and then read back) according to the endianness specified in the header of the array in file.

b. multiple types for content
In this second case, each item is preceded by a control byte in the file. This control byte specifies the type of the item. The values for this control-type byte are listed in sub-section Built-in types control specification, at the end of the current section of Typee documentation.

The control byte declares both the type of the next item and its endianness. In this first version of TSP, the endianness bit of each item always has the same value as the endianness bit of the whole array, as set in the array header. The endianness is nevertheless recorded per item: this may be useful in future versions, and it may simplify code that processes contained items individually.
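Building and decoding a control byte is a matter of bit masking, as in this Python sketch. The function names are hypothetical, and only a subset of the control values of sub-section 4.5.10.9 is included.

```python
# Control values taken from sub-section 4.5.10.9 (subset).
CONTROL_CODES = {"bool": 0x00, "uint16": 0x08, "uint32": 0x09,
                 "uint64": 0x0a, "str": 0x0b, "str16": 0x0c}
CODE_NAMES = {code: name for name, code in CONTROL_CODES.items()}


def control_byte(type_name: str, big_endian: bool) -> int:
    """Bit 7 carries the endianness, bits 0..6 the type code."""
    return (0x80 if big_endian else 0x00) | CONTROL_CODES[type_name]


def parse_control_byte(value: int):
    """Return (type name, big_endian) for a control byte read from file."""
    return CODE_NAMES[value & 0x7F], bool(value & 0x80)


print(hex(control_byte("uint16", big_endian=True)))  # 0x88
print(parse_control_byte(0x8B))                      # ('str', True)
```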

Example 1:

array<uint32>[2][3] my_matrix = [ [0, 1, 2], [10, 11, 12] ];
with file( "matrix.bin", "wo") as f_mat {
    f_mat.write( my_matrix );
}
// file contains:
//  header
//   byte 0: 0x90  - bit 7 set to 1 for big endian + 0x10 for array
//   bytes 1... 3: 0b0000_0000_0000_0010_0000_0000  - bit 9 set to 1 for uint32
//   bytes 4... 7: 0x00_00_00_02  - first dimension size
//   bytes 8...11: 0x00_00_00_03  - second dimension size
//   bytes 12...15: 0x00_00_00_00  - end of dimensions sizes
//  payload
//   bytes 16...19: 0x00_00_00_00  - 0
//   bytes 20...23: 0x00_00_00_01  - 1
//   bytes 24...27: 0x00_00_00_02  - 2
//   bytes 28...31: 0x00_00_00_0a  - 10
//   bytes 32...35: 0x00_00_00_0b  - 11
//   bytes 36...39: 0x00_00_00_0c  - 12

Example 2:

const array<(uint16,str)>[4] my_data = [ 1024, "ABCDE", 1025, "fgh" ];
with file( "data.bin", "wo" ) as f_bin {
    f_bin << my_data;
}
// file contains:
//  header
//   byte 0: 0x90  - bit 7 set to 1 for big endian + 0x10 for array
//   bytes 1... 3: 0b0000_0000_0000_1001_0000_0000  - bit 11 and 8 set to 1 for str and uint16
//   bytes 4... 7: 0x00_00_00_04  - first dimension size
//   bytes 8...11: 0x00_00_00_00  - end of dimensions sizes
//  payload
//   byte  12: 0x88  - bit 7 set to 1 for big endian + 0x08 for uint16
//   bytes 13...14: 0x04_00              - 1024
//   byte  15: 0x8b  - bit 7 set to 1 for big endian + 0x0b for str
//   bytes 16...21: 0x41_42_43_44_45_0a  - "ABCDE\n"
//   byte  22: 0x88  - bit 7 set to 1 for big endian + 0x08 for uint16
//   bytes 23...24: 0x04_01              - 1025
//   byte  25: 0x8b  - bit 7 set to 1 for big endian + 0x0b for str
//   bytes 26...29: 0x66_67_68_0a        - "fgh\n"

4.5.10.4 Serialization of Lists

Lists are written to files according to TSP: a header is written first, followed by the content of the list.

list header
bytes offsets     signification            value
0 [bit 7]         endianness of header     0b1: big endian, 0b0: little endian
0 [bits 6…0]      list serialization       uint8( 0b001_0001, i.e. 0x11 )
1 … 3             contained types          see [1] below
4 … 7             items count              uint32, see [2]

[1]: Each of the 24 bits in these 3 bytes represents a built-in type or an instance class. See the table in sub-section 4.5.10.8 at the end of this section. If no contained type is specified at declaration time, these three bytes are set to 0x00_00_00.
[2]: The maximum items count for serialized lists is then 2^32 - 1 (about 4 G-items), which we expect to be enough for most programs.

list payload

There are two cases here:

a. either the list contains a single type of content
b. or the list contains values of multiple types.

a. single type of content
In this first case, the payload consists of the successive item values, all written (and read back) with the endianness specified in the list header.

b. multiple types for content
In this second case, each item is preceded by a control byte that specifies its type. The values of this control byte are listed in sub-section 4.5.10.9 at the end of this section.

The control byte declares both the type of the next item and its endianness. In this first version of TSP, the endianness bit of each item always has the same value as the endianness bit of the whole list, as set in the list header. The endianness is nevertheless recorded per item: this may be useful in future versions, and it may simplify code that processes contained items individually.
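The following Python sketch writes an untyped list made of uint16, uint32 and str items, following the list header table and case b above. write_mixed_list is a hypothetical helper; since Python integers have no fixed width, the caller passes each item as a (type name, value) pair, and str characters are assumed to be coded on one byte (Latin-1).

```python
import struct

LIST_CODE = 0x11
STR_CODE = 0x0B
# control code and struct format for the scalar types used here
SCALARS = {"uint16": (0x08, "H"), "uint32": (0x09, "I")}


def write_mixed_list(items, big_endian=True) -> bytes:
    """Hypothetical writer for an untyped list (case b).

    items is a list of (type name, value) pairs, because Python ints
    carry no width of their own.
    """
    order = ">" if big_endian else "<"
    endian_bit = 0x80 if big_endian else 0x00
    out = bytearray([endian_bit | LIST_CODE])
    out += bytes(3)                              # no declared contained type
    out += struct.pack(order + "I", len(items))  # items count
    for type_name, value in items:
        if type_name == "str":
            out.append(endian_bit | STR_CODE)    # control byte
            out += value.encode("latin-1") + b"\n"
        else:
            code, fmt = SCALARS[type_name]
            out.append(endian_bit | code)        # control byte
            out += struct.pack(order + fmt, value)
    return bytes(out)


data = write_mixed_list([("uint16", 1024), ("str", "ABCDE"),
                         ("uint32", 1025), ("str", "fgh")])
print(len(data))  # 28
```

The 28 bytes are the 8-byte header followed by four control bytes, each directly preceding its item.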

Example 1:

list<uint32> my_ints = [ 0, 1, 2, 10, 11, 12 ];
with file( "ints.bin", "wo") as f_ints {
    f_ints.write( my_ints );
}
// file contains:
//  header
//   byte 0: 0x91  - bit 7 set to 1 for big endian + 0x11 for list
//   bytes 1... 3: 0b0000_0000_0000_0010_0000_0000 - bit 9 set to 1 for uint32
//   bytes 4... 7: 0x00_00_00_06  - items count
//  payload
//   bytes  8...11: 0x00_00_00_00  - 0
//   bytes 12...15: 0x00_00_00_01  - 1
//   bytes 16...19: 0x00_00_00_02  - 2
//   bytes 20...23: 0x00_00_00_0a  - 10
//   bytes 24...27: 0x00_00_00_0b  - 11
//   bytes 28...31: 0x00_00_00_0c  - 12

Example 2:

const list my_data = [ uint16(1024), "ABCDE", uint32(1025), "fgh" ];
with file( "data.bin", "wo" ) as f_bin {
    f_bin << my_data;
}
// file contains:
//  header
//   byte 0: 0x91  - bit 7 set to 1 for big endian + 0x11 for list
//   bytes 1... 3: 0b0000_0000_0000_0000_0000_0000  - no predefined type
//   bytes 4... 7: 0x00_00_00_04  - items count
//  payload
//   bytes  8: 0x88  - bit 7 set to 1 for big endian + 0x08 for uint16
//   bytes  9...10: 0x04_00              - 1024
//   bytes 11: 0x8b  - bit 7 set to 1 for big endian + 0x0b for str
//   bytes 12...17: 0x41_42_43_44_45_0a  - "ABCDE\n"
//   bytes 18: 0x89  - bit 7 set to 1 for big endian + 0x09 for uint32
//   bytes 19...22: 0x00_00_04_01        - 1025
//   bytes 23: 0x8b  - bit 7 set to 1 for big endian + 0x0b for str
//   bytes 24...27: 0x66_67_68_0a        - "fgh\n"

4.5.10.5 Serialization of Maps

Maps are written to files according to TSP: a header is written first, followed by the content of the map.

map header
bytes offsets     signification                   value
0 [bit 7]         endianness of header            0b1: big endian, 0b0: little endian
0 [bits 6…0]      map serialization               uint8( 0b001_0010, i.e. 0x12 )
1 … 3             contained key types             see [1] below
4                 padding                         set to 0x00
5 … 7             contained item value types      see [1] below
8 … 11            items count                     uint32, see [2]

[1]: Each of the 24 bits in these 3 bytes represents a built-in type or an instance class. See the table in sub-section 4.5.10.8 at the end of this section. If no contained type is specified at declaration time, these three bytes are set to 0x00_00_00.
[2]: The maximum items count for serialized maps is then 2^32 - 1 (about 4 G-items), which we expect to be enough for most programs.
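The 12-byte map header can be sketched in Python as follows. map_header and type_mask are hypothetical helpers, only a few type bits are included, and the byte order of the two 24-bit type fields is assumed to follow the header endianness.

```python
import struct

# Type bits from sub-section 4.5.10.8 (subset used here).
TYPE_BITS = {"uint16": 8, "uint32": 9, "str": 11}
MAP_CODE = 0x12


def type_mask(types) -> int:
    mask = 0
    for name in types:
        mask |= 1 << TYPE_BITS[name]
    return mask


def map_header(key_types, value_types, count, big_endian=True) -> bytes:
    """Hypothetical helper building a 12-byte TSP map header."""
    order = ">" if big_endian else "<"
    byteorder = "big" if big_endian else "little"
    return (bytes([(0x80 if big_endian else 0x00) | MAP_CODE])
            + type_mask(key_types).to_bytes(3, byteorder)    # bytes 1..3
            + b"\x00"                                        # byte 4: padding
            + type_mask(value_types).to_bytes(3, byteorder)  # bytes 5..7
            + struct.pack(order + "I", count))               # bytes 8..11


print(map_header([], ["uint32"], 3).hex("_"))
# 92_00_00_00_00_00_02_00_00_00_00_03
```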

map payload

There are three cases here:

a. the map contains a single type for the keys and a single type for the values of its contained items;
b. the map contains keys of multiple types and values of a single type for its contained items;
c. keys and values are both of multiple types for its contained items.

a. single type for keys and for values
In this first case, the payload consists of the successive (key, item value) pairs, all written (and read back) with the endianness specified in the map header. The type of the keys and the type of the values are both specified in the map header.

b. multiple types for keys, single type for values
In this second case, each pair is written (and read back) as a control byte specifying the type of the key, then the key content, and finally the value content, whose type is specified in the map header. See sub-section 4.5.10.9 at the end of this section for the content of the key control byte.

c. multiple types for keys, multiple types for values
In this third case, each (key, item value) pair is written with a control byte before the key and a second control byte before the item value. These control bytes specify the type of the key (first) and the type of the item value (second). Their values are listed in sub-section 4.5.10.9 at the end of this section.

The control bytes declare the type of the related value (key or item value) and its endianness. In this first version of TSP, the endianness bits of each map entry always have the same value as the endianness bit of the whole map, as set in the map header. The endianness is nevertheless recorded per value: this may be useful in future versions, and it may simplify code that processes contained items individually.
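Case b above can be sketched in Python for a map whose item values are all uint32. The helper names are hypothetical; untyped integer keys are assumed to default to uint32, following the rule given for untyped scalars in sub-section 4.5.10.9, and only keys in the uint32 range are handled.

```python
import struct


def encode_key(key, order: str, endian_bit: int) -> bytes:
    """Control byte plus key content; untyped ints default to uint32."""
    if isinstance(key, str):
        return bytes([endian_bit | 0x0B]) + key.encode("latin-1") + b"\n"
    if not 0 <= key < 2**32:
        raise ValueError("this sketch only handles uint32 integer keys")
    return bytes([endian_bit | 0x09]) + struct.pack(order + "I", key)


def map_payload_case_b(entries, big_endian=True) -> bytes:
    """Hypothetical writer: keys of several types, uint32 values.

    Each entry is written as control byte + key, then the value, whose
    single type is declared in the map header (hence no control byte).
    """
    order = ">" if big_endian else "<"
    endian_bit = 0x80 if big_endian else 0x00
    out = b""
    for key, value in entries:
        out += encode_key(key, order, endian_bit)
        out += struct.pack(order + "I", value)
    return out


payload = map_payload_case_b([(0, 1), (2, 10), ("11", 12)])
print(len(payload))  # 26
```

Only keys carry a control byte here; in case c, each value would be preceded by its own control byte as well.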

Example 1:

map<uint32> my_map = [ 0:1, 2:10, "11":12 ];
with file( "map.bin", "wo") as f_map {
    f_map.write( my_map );
}
// file contains:
//  header
//   byte 0: 0x92  - bit 7 set to 1 for big endian + 0x12 for map
//   bytes 1... 3: 0b0000_0000_0000_0000_0000_0000  - no keys specified type
//   byte  4: 0x00  - padding
//   bytes 5... 7: 0b0000_0000_0000_0010_0000_0000  - bit 9 set to 1 for uint32
//   bytes 8...11: 0x00_00_00_03   - items count
//  payload
//   byte  12: 0x89  - bit 7 set to 1 for big endian + 0x09 for uint32 (default for ints)
//   bytes 13...16: 0x00_00_00_00  - 0, the key
//   bytes 17...20: 0x00_00_00_01  - 1, the item value
//   byte  21: 0x89  - bit 7 set to 1 for big endian + 0x09 for uint32 (default for ints)
//   bytes 22...25: 0x00_00_00_02  - 2, the key
//   bytes 26...29: 0x00_00_00_0a  - 10, the item value
//   byte  30: 0x8b  - bit 7 set to 1 for big endian + 0x0b for str (default for strings)
//   bytes 31...33: 0x31_31_0a     - "11\n", the key
//   bytes 34...37: 0x00_00_00_0c  - 12, the item value
// Notice that the storage order of the items may differ,
// since maps are not sorted

Example 2:

const map my_map = [ uint16(1024): "ABCDE", uint32(1025): "fgh" ];
with file( "map.bin", "wo" ) as f_map {
    f_map << my_map;
}
// file contains:
//  header
//   byte 0: 0x92  - bit 7 set to 1 for big endian + 0x12 for map
//   bytes 1... 3: 0x00_00_00     - no key type specified
//   byte  4: 0x00                - padding
//   bytes 5... 7: 0x00_00_00     - no item value type specified
//   bytes 8...11: 0x00_00_00_02  - items count
//  payload
//   byte  12: 0x88  - bit 7 set to 1 for big endian + 0x08 for uint16
//   bytes 13...14: 0x04_00              - 1024
//   bytes 15: 0x8b  - bit 7 set to 1 for big endian + 0x0b for str
//   bytes 16...21: 0x41_42_43_44_45_0a  - "ABCDE\n"
//   byte  22: 0x89  - bit 7 set to 1 for big endian + 0x09 for uint32
//   bytes 23...26: 0x00_00_04_01        - 1025
//   byte  27: 0x8b  - bit 7 set to 1 for big endian + 0x0b for str
//   bytes 28...31: 0x66_67_68_0a        - "fgh\n"
// Notice that the storage order of the items may differ,
// since maps are not sorted

4.5.10.6 Serialization of Sets

Sets are written to files according to TSP: a header is written first, followed by the content of the set.

set header
bytes offsets     signification            value
0 [bit 7]         endianness of header     0b1: big endian, 0b0: little endian
0 [bits 6…0]      set serialization        uint8( 0b001_0011, i.e. 0x13 )
1 … 3             contained types          see [1] below
4 … 7             items count              uint32, see [2]

[1]: Each of the 24 bits in these 3 bytes represents a built-in type or an instance class. See the table in sub-section 4.5.10.8 at the end of this section. If no contained type is specified at declaration time, these three bytes are set to 0x00_00_00.
[2]: The maximum items count for serialized sets is then 2^32 - 1 (about 4 G-items), which we expect to be enough for most programs.

set payload

There are two cases here:

a. either the set contains a single type of content for the values of its items
b. or the set contains values of multiple types for the values of its items.

a. single type of content
In this first case, the payload consists of the successive item values, all written (and read back) with the endianness specified in the set header.

b. multiple types for content
In this second case, each item is preceded by a control byte that specifies its type. The values of this control byte are listed in sub-section 4.5.10.9 at the end of this section.

The control bytes declare the type of their related item values and their endianness. In this first version of TSP, the endianness bit of each set item always has the same value as the endianness bit of the whole set, as set in the set header. The endianness is nevertheless recorded per item: this may be useful in future versions, and it may simplify code that processes contained items individually.

Example 1:

set<uint32> my_set = [ 0, 1, 2, 10, 11, 12, 1 ];
with file( "set.bin", "wo") as f_set {
    f_set.write( my_set );
}
// file contains:
//  header
//   byte 0: 0x93  - bit 7 set to 1 for big endian + 0x13 for set
//   bytes 1... 3: 0b0000_0000_0000_0010_0000_0000  - bit 9 set to 1 for uint32
//   bytes 4... 7: 0x00_00_00_06   - items count (the duplicate 1 is stored once)
//  payload
//   bytes  8...11: 0x00_00_00_00  - 0
//   bytes 12...15: 0x00_00_00_01  - 1
//   bytes 16...19: 0x00_00_00_02  - 2
//   bytes 20...23: 0x00_00_00_0a  - 10
//   bytes 24...27: 0x00_00_00_0b  - 11
//   bytes 28...31: 0x00_00_00_0c  - 12
// Notice that the order of the items storage may differ since
// set items are randomly picked from their containing set

Example 2:

const set my_set = [ uint16(1024), "ABCDE", uint32(1025), "fgh" ];
with file( "set.bin", "wo" ) as f_set {
    f_set << my_set;
}
// file contains:
//   header
//      byte 0: 0x93  - bit 7 set to 1 for big endian + 0x13 for set
//      bytes 1... 3: 0b0000_0000_0000_0000_0000_0000  - no predefined type
//      bytes 4... 7: 0x00_00_00_04  - items count
//   payload
//      byte   8: 0x88  - bit 7 set to 1 for big endian + 0x08 for uint16
//      bytes  9...10: 0x04_00              - 1024
//      bytes 11: 0x8b  - bit 7 set to 1 for big endian + 0x0b for str
//      bytes 12...17: 0x41_42_43_44_45_0a  - "ABCDE\n"
//      bytes 18: 0x89  - bit 7 set to 1 for big endian + 0x09 for uint32
//      bytes 19...22: 0x00_00_04_01        - 1025
//      bytes 23: 0x8b  - bit 7 set to 1 for big endian + 0x0b for str
//      bytes 24...27: 0x66_67_68_0a        - "fgh\n"
// Notice that the order of the items storage may differ since
// set items are randomly picked from their containing set
4.5.10.7 Serialization of Objects

Objects, in Typee as in any other OOP (Object Oriented Programming) language, are instances of classes. Classes define methods that apply to their instances and attributes that characterize them. Instances are created by calling any of the constructors a class defines. Attributes can be of any type: scalar, container or class. Writing an object to a file therefore involves writing attributes of many different types.

TSP is quite simple for this purpose: it is up to the programmer to specify how attributes are written to (and read back from) files. This is done via two operators: << for writing to a file and >> for reading from a file. A class that defines both operators is said to be serializable. If instances of other classes are attributes of a class, all of those other classes must be serializable as well. Let's look at a few examples.

Example 1:

class MyClassA {

    public uint8 val;
 
    MyClassA(){
       me.val = 0;
    }

    file operator << ( file f ) {
       // writes current instance in file
       f <<= me.val;
       return f;
    }

    file operator >> ( file f ) {
       // reads current instance from file
       f >>= me.val;
       return f;
    }

}

In this example, the two operators << and >> are defined with a file as their sole argument. Remember: file is a generic type for any kind of file, including files declared with typed content. These operators are therefore generic and are applied whenever an instance of class MyClassA is written to or read from any type of file.
Notice: both operators must return a reference to the file they operate on. This is mandatory for the operators to be cascadable, as in f << a << b.
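The cascading behaviour can be mimicked in Python, where f << a calls the file's __lshift__ method. This is an analogy, not Typee's mechanism: TspFile, write_to and read_from are hypothetical names, and here the file delegates to the object's methods, whereas in Typee the serialized class defines the operators itself.

```python
import io


class TspFile:
    """Hypothetical stand-in for a Typee file wrapping a byte stream."""

    def __init__(self, stream):
        self.stream = stream

    def __lshift__(self, obj):
        obj.write_to(self)   # the object's class decides what is written
        return self          # returning the file makes f << a << b work

    def __rshift__(self, obj):
        obj.read_from(self)
        return self


class MyClassA:
    def __init__(self, val=0):
        self.val = val       # uint8 attribute, as in Example 1

    def write_to(self, f):
        f.stream.write(bytes([self.val]))
        return f

    def read_from(self, f):
        self.val = f.stream.read(1)[0]
        return f


buffer = io.BytesIO()
f = TspFile(buffer)
f << MyClassA(7) << MyClassA(42)   # chained writes
buffer.seek(0)
a, b = MyClassA(), MyClassA()
f >> a >> b                        # chained reads
print(a.val, b.val)  # 7 42
```

If __lshift__ returned None instead of the file, the second << in the chain would fail, which is why returning the file is required.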

Furthermore, in code such as the following, the operators << and >> defined in MyClassA are called automatically:

MyClassA a = MyClassA();
with file( "somepath", "rw" ) as f {
    f.write( a );  // same as f << a;
    f.read( a );   // same as f >> a;
}

This was a simple example. Let’s see another one, a little bit more complex, when composition enters the game.

Example 2:

class MyClassB {
 
    MyClassB(){
       me.text = "some default text";
       me.a    = MyClassA(); // as defined in Example 1
    }

    file operator << ( file f ) {
       // writes current instance in file
       f << me.a << me.text;
       return f;
    }
     
    file operator >> ( file f ) {
       // reads current instance from file
       f >> me.a >> me.text;
       return f;
    }

    public MyClassA  a;
    protected str  text;
}

Here also, the two operators << and >> are defined with a file as their sole argument. They also write and read the instance of class MyClassA that MyClassB holds as an attribute. It is the responsibility of the programmer to write and read every attribute of a class that has to be saved; some attributes may not need to be written (and read back).
In the case of composition, the operator applied to file f for a composed attribute is the one defined in that attribute's class (here, MyClassA).

Ok, this was about composition, but what’s up with inheritance?

Example 3:

class MyClassC : MyClassB {

    MyClassC(){
        MyClassB(); // as defined in Example 2
        me.val_f = 0.0;
    }

    file operator << ( file f ) {
       // writes current instance in file
       f << me.val_f;
       MyClassB.operator << ( f, me );
       return f;
    }
     
    file operator >> ( file f ) {
       // reads current instance from file
       f >> me.val_f;
       MyClassB.operator >> ( f, me );
       return f;
    }

    public float32 val_f;
}

Here again, the two operators << and >> are defined with a file as their sole argument. They also write and read the inherited attributes, which do not need to be named individually as long as the base class is itself serializable. The syntax is then the one shown in the example:
MyClassB.operator << ( f, me ); and MyClassB.operator >> ( f, me );
and it might evolve, in a future version of Typee, into something like:
f << me.MyClassB and f >> me.MyClassB

In the case of inheritance, the operator applied to file f for the inherited attributes is the one defined in the base class (here, MyClassB).
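Composition and inheritance can be mirrored in Python to see in which order bytes land in the file. Class names follow Examples 1 to 3; write_to and read_from are hypothetical method names standing in for the Typee operators, float32 is assumed to be stored as an IEEE 754 single-precision value (its type bit and control value are not listed in the tables of this page), and text is assumed newline-terminated as for str.

```python
import io
import struct


class MyClassA:
    def __init__(self):
        self.val = 0                                  # uint8

    def write_to(self, stream):
        stream.write(struct.pack(">B", self.val))

    def read_from(self, stream):
        (self.val,) = struct.unpack(">B", stream.read(1))


class MyClassB:
    def __init__(self):
        self.a = MyClassA()                           # composition
        self.text = "some default text"

    def write_to(self, stream):
        self.a.write_to(stream)                       # component's own writer
        stream.write(self.text.encode("latin-1") + b"\n")

    def read_from(self, stream):
        self.a.read_from(stream)
        self.text = stream.readline()[:-1].decode("latin-1")


class MyClassC(MyClassB):
    def __init__(self):
        super().__init__()
        self.val_f = 0.0                              # float32

    def write_to(self, stream):
        stream.write(struct.pack(">f", self.val_f))   # own attribute first
        super().write_to(stream)                      # then the base class part

    def read_from(self, stream):
        (self.val_f,) = struct.unpack(">f", stream.read(4))
        super().read_from(stream)


original = MyClassC()
original.val_f, original.a.val, original.text = 1.5, 9, "hello"
buffer = io.BytesIO()
original.write_to(buffer)
buffer.seek(0)
restored = MyClassC()
restored.read_from(buffer)
print(restored.val_f, restored.a.val, restored.text)  # 1.5 9 hello
```

Reading must follow exactly the same order as writing, which is why each read_from mirrors its write_to line by line.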

4.5.10.8 Headers – Type bits specification

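Bytes at offsets 1 to 3 of container headers contain 24 bits, each representing a different type. A bit is 0 by default and is set to 1 when the corresponding type is a valid type for the content of the container (i.e. it has been declared as a contained type at declaration time). Map headers use the same encoding in bytes 5 to 7 for the types of item values. As the examples above show, bit 0 is the least significant bit of this 24-bit field.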

bit number    contained type
0             bool
1             char
2             char16
3             int8
4             int16
5             int32
6             int64
7             uint8
8             uint16
9             uint32
10            uint64
11            str
12            str16
13            array
14            list
15            map
16            set
17            object
18 … 23       padding, set to 0 (0b0000_00)
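The 24-bit type field can be built and decoded with a few lines of Python. types_to_mask and mask_to_types are hypothetical helpers; bit numbers follow the table of this sub-section.

```python
# Bit numbers 0..17 from the table; bits 18..23 are padding.
TYPE_BITS = ["bool", "char", "char16", "int8", "int16", "int32", "int64",
             "uint8", "uint16", "uint32", "uint64", "str", "str16",
             "array", "list", "map", "set", "object"]


def types_to_mask(types) -> int:
    """Set one bit per declared contained type."""
    mask = 0
    for name in types:
        mask |= 1 << TYPE_BITS.index(name)
    return mask


def mask_to_types(mask: int) -> list:
    """List the types whose bit is set in a 24-bit header field."""
    return [name for bit, name in enumerate(TYPE_BITS) if (mask >> bit) & 1]


mask = types_to_mask(["uint16", "str"])
print(f"{mask:024b}")        # 000000000000100100000000
print(mask_to_types(mask))   # ['uint16', 'str']
```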
4.5.10.9 Payloads – Built-in types control specification

When a payload holds values of multiple types, a control byte precedes every contained item value. The content of this control byte is specified in the following tables.

control byte value (bits 0 … 6)    contained type
0x00                               bool
0x01                               char
0x02                               char16
0x03                               int8
0x04                               int16
0x05                               int32
0x06                               int64
0x07                               uint8
0x08                               uint16
0x09                               uint32
0x0a                               uint64
0x0b                               str
0x0c                               str16
0x10                               array
0x11                               list
0x12                               map
0x13                               set
0x20                               object

control byte value (bit 7)         endianness
0b0                                little endian
0b1                                big endian

Important notice: the types of values may not be specified in the program; this is the case for scalar values that are not typed. In such cases, the following algorithm determines the type of a scalar value at write time.

if the scalar value is an integer:

  • uint32 is used if the value is non-negative (positive or zero) and can be coded in at most 4 bytes
  • int32 is used if the value is negative and can be coded in at most 4 bytes
  • uint64 is used if the value is non-negative and cannot be coded in at most 4 bytes
  • int64 is used if the value is negative and cannot be coded in at most 4 bytes

if the scalar value is a float:

  • float32 is used if its decimal exponent lies in the interval [-37, +38]
  • float64 is used otherwise

if the scalar value is a string:

  • str is used by default
  • str16 is used as soon as a contained character cannot be coded into one byte
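These typing rules translate directly into Python. The function names are hypothetical; "coded in at most 4 bytes" is read as fitting the 32-bit unsigned or signed range, the float exponent is taken from the value's scientific notation, non-finite floats are rejected because the rules do not cover them, and "one byte" per character is assumed to mean a code point below 256.

```python
import math


def infer_int_type(value: int) -> str:
    """'Coded in at most 4 bytes' read as fitting the 32-bit range."""
    if value >= 0:
        return "uint32" if value < 2**32 else "uint64"
    return "int32" if value >= -2**31 else "int64"


def decimal_exponent(value: float) -> int:
    # scientific notation, e.g. 1.5e-12 -> '1.500000e-12' -> -12
    return int(f"{value:e}".split("e")[1])


def infer_float_type(value: float) -> str:
    if not math.isfinite(value):
        raise ValueError("non-finite floats are not covered by these rules")
    return "float32" if -37 <= decimal_exponent(value) <= 38 else "float64"


def infer_str_type(value: str) -> str:
    """Assumes 'one byte' means a code point below 256 (Latin-1)."""
    return "str" if all(ord(c) < 256 for c in value) else "str16"


print(infer_int_type(1025), infer_int_type(-5), infer_int_type(2**40))
# uint32 int32 uint64
print(infer_float_type(1.5), infer_float_type(1e-300))  # float32 float64
print(infer_str_type("fgh"), infer_str_type("Ωmega"))    # str str16
```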
4.5.11 Conclusion

This ends the Typee documentation on files. It was a long journey. Have a look at Learn Typee for examples of file programming in Typee, and at the Files built-in library for the many conveniences it provides when using files in Typee.

Notice: currently, Typee provides no built-in asynchronous access to files, so every call to file functions, methods and operators is blocking. Asynchronous access is provided by the built-in library Files, described in a dedicated sub-section of the next section.

The next section documents the built-in libraries provided with Typee. It will evolve over time, as new built-in libraries are delivered.
