

5 Character Sets (cset.hhf)


The HLA Standard Library contains several routines that provide the power of the HLA compile-time character set facilities at run-time (i.e., within your programs).

HLA uses a 128-bit bitmap (16 consecutive bytes) to implement sets of seven-bit ASCII characters. This has a very important implication: you cannot pass byte values greater than $7F to a character set function. Currently, the HLA Standard Library routines do not check for out-of-range values (for performance reasons). In the future, this checking may be added as a compile-time option. For the time being, however, it is your responsibility to verify that all character values are in the range #$0..#$7F (and, in general, #$0 is an exceedingly bad value to specify in many cases, since the null character terminates strings).

The bitmap consists of 128 consecutive bits numbered 0..127. If a bit in a character set is one, then the corresponding character (whose ASCII code matches the bit number) is a member of the character set. Conversely, if a bit is zero, the corresponding character is not a member of the set.
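The bit numbering can be modeled in a few lines of C. This is only a sketch, not the library's source: the type cset_model and the function names are hypothetical, and the byte/bit layout (bit n stored in byte n/8, bit n mod 8) is an assumption based on the usual x86 bit-string convention. Unlike the library routines, the sketch rejects character codes above $7F.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the 16-byte bitmap (assumed layout: bit n is
   bit n%8 of byte n/8). */
typedef struct { uint8_t bytes[16]; } cset_model;

/* Add character code c to the set; codes above 0x7F are ignored. */
static void model_add(cset_model *s, unsigned c)
{
    if (c <= 0x7F)
        s->bytes[c >> 3] |= (uint8_t)(1u << (c & 7));
}

/* Return true if character code c is a member of the set.
   Codes above 0x7F can never be members of a 128-bit map. */
static bool model_member(const cset_model *s, unsigned c)
{
    if (c > 0x7F)
        return false;
    return (s->bytes[c >> 3] >> (c & 7)) & 1u;
}
```

For example, adding 'A' (ASCII 65) sets bit 1 of byte 8 in this model.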

Note that many routines pass character sets by value. This means you can pass HLA character set constants as parameters to these procedures/functions. HLA emits four MOV (doubleword) instructions to copy a character set by value, so passing character sets by value is not horribly inefficient (though not quite as fast as a 32-bit integer!).

Warning: All of the character set routines are members of the cs namespace. This means you cannot use the name cs within your programs. (cs is a common character set name that lazy programmers use; sorry, it's already been taken!)

The following sections describe each of the character set routines in the HLA Standard Library.

5.1 Predicates (tests)

Although the "returns" value for each of the following functions is "AL", these tests always set EAX to zero or one. Therefore, you may refer to the AL or EAX register after these tests, whichever is more convenient for you. If you use instruction composition and bury one of these function calls in another statement, that statement will use the AL register as the operand.

Note that these functions generally pass their character set parameters by value. This involves pushing 16 bytes on the stack for each cset parameter (typically four push instructions). Keep this in mind if efficiency is your utmost concern.

cs.IsEmpty( src: cset ); @returns( "AL" ); 
 

This function returns true (1) in the AL register if the specified character set is empty (has no members). It returns false (0) in AL/EAX otherwise.

cs.member( c:char; theSet:cset ); @returns( "AL" ); 
 

This function returns true (1) in AL/EAX if the specified character is a member of the specified character set. It returns false (0) otherwise.

cs.subset( src1:cset; src2:cset ); @returns( "AL" ); 
 
cs.superset( src1:cset; src2:cset ); @returns( "AL" );
 
cs.psubset( src1:cset; src2:cset ); @returns( "AL" ); 
 
cs.psuperset( src1:cset; src2:cset ); @returns( "AL" ); 
 
cs.eq( src1:cset; src2:cset ); @returns( "AL" ); 
 
cs.ne( src1:cset; src2:cset ); @returns( "AL" );
 

These functions determine if one set is equal to another, or if one set is a subset or superset of another. Each returns true (1) in AL/EAX if the relationship holds and false (0) otherwise.

The cs.subset function returns true if src1 <= src2 (that is, all of src1's members are members of src2).

The cs.psubset (proper subset) function returns true if src1 < src2 (that is, all of src1's members are members of src2 but src1 <> src2).

The cs.superset function returns true if src1 >= src2 (that is, all of src2's members are members of src1).

The cs.psuperset (proper superset) function returns true if src1 > src2 (that is, all of src2's members are members of src1 but src2 <> src1).

The cs.eq function returns true if the two sets have exactly the same members; cs.ne returns true if they differ in at least one member.
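The relations above reduce to simple bitwise tests. The following C sketch models them under the assumption that a set is a plain 16-byte bitmap; cset_model and the model_* names are hypothetical and are not part of the HLA Standard Library.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model type: 128 bits stored in 16 bytes. */
typedef struct { uint8_t bytes[16]; } cset_model;

/* src1 <= src2: no bit is set in a that is clear in b. */
static bool model_subset(cset_model a, cset_model b)
{
    for (int i = 0; i < 16; i++)
        if (a.bytes[i] & ~b.bytes[i])
            return false;
    return true;
}

/* Equality: every byte of the two bitmaps matches. */
static bool model_eq(cset_model a, cset_model b)
{
    for (int i = 0; i < 16; i++)
        if (a.bytes[i] != b.bytes[i])
            return false;
    return true;
}

/* Proper subset: a subset that is not equal. */
static bool model_psubset(cset_model a, cset_model b)
{
    return model_subset(a, b) && !model_eq(a, b);
}

/* The superset relations are the subset relations with operands swapped. */
static bool model_superset(cset_model a, cset_model b)
{
    return model_subset(b, a);
}

static bool model_psuperset(cset_model a, cset_model b)
{
    return model_psubset(b, a);
}
```

Note that a set is always a subset and a superset of itself, but never a proper subset or proper superset of itself.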

5.2 Character Set Construction and Manipulation

cs.empty( var dest:cset );
 

This function clears all the bits in a character set to create the empty set. Note that the single character set parameter is passed by reference.

cs.cpy( src:cset; var dest:cset );
 

This routine copies the data from the source character set (src) to the destination character set (dest). Note that the dest set is passed by reference. Although this routine is convenient, you should consider writing a macro that performs the same operation (copying 16 bytes from src to dest) if you call this function in time-critical sections of your code.

procedure cs.charToCset( c:char; var dest:cset ); 
 

The cs.charToCset procedure takes the character passed as a parameter and creates a singleton set containing that character (a singleton is a set with exactly one member). The resulting set is stored into the destination parameter (which is passed by reference).

procedure cs.rangeChar( first:char; last:char; var dest:cset ); 
 

This function creates a set whose members range from the first character specified through the last character specified. For example, cs.rangeChar( 'A', 'Z', UpperCaseSet ) will create a character set whose members are the upper case alphabetic characters. Any previous members in the destination set are lost.
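The effect of building a range can be sketched in C as follows. This is a model under stated assumptions, not the library's code: cset_model and model_range_char are hypothetical names, the range is taken to be inclusive at both ends (as the 'A'..'Z' example suggests), and the behavior when first is greater than last is not documented here, so the sketch simply produces the empty set in that case.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model type: 128 bits stored in 16 bytes. */
typedef struct { uint8_t bytes[16]; } cset_model;

/* Overwrite *dest with the set {first..last}, inclusive.
   Codes above 0x7F are skipped because the bitmap has only 128 bits. */
static void model_range_char(unsigned char first, unsigned char last,
                             cset_model *dest)
{
    memset(dest, 0, sizeof *dest);          /* previous members are lost */
    for (unsigned c = first; c <= last && c <= 0x7F; c++)
        dest->bytes[c >> 3] |= (uint8_t)(1u << (c & 7));
}
```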

procedure cs.strToCset( s:string; var dest:cset );
 

This function first sets the destination character set to the empty set. Then it "unions in" all the characters found in the string parameter to the destination set.

procedure cs.strToCset2( s:string; offs:uns32; var dest:cset );
 

This function first sets the destination character set to the empty set. Then it "unions in" all the characters found in the string parameter starting at offset offs to the destination set.
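The "clear, then union in" behavior of these two routines can be modeled in C like this. The sketch is illustrative only: cset_model and model_str_to_cset2 are hypothetical names, a zero-terminated C string stands in for an HLA string, and treating an offset beyond the end of the string as "no characters" is an assumption rather than documented library behavior.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model type: 128 bits stored in 16 bytes. */
typedef struct { uint8_t bytes[16]; } cset_model;

/* Clear *dest, then add every 7-bit character of s from index offs onward.
   Passing offs == 0 models the behavior described for cs.strToCset. */
static void model_str_to_cset2(const char *s, size_t offs, cset_model *dest)
{
    memset(dest, 0, sizeof *dest);          /* start from the empty set */
    if (offs > strlen(s))
        return;                             /* assumption: nothing to add */
    for (const unsigned char *p = (const unsigned char *)s + offs; *p; p++)
        if (*p <= 0x7F)
            dest->bytes[*p >> 3] |= (uint8_t)(1u << (*p & 7));
}
```

Duplicate characters in the string have no additional effect, so "hello" yields a set with four members.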

procedure cs.extract( var dest:cset ); @returns( "EAX" );
 

This function removes a single character from the character set and returns that character in the AL register. Currently, this function removes characters by order of their ASCII character codes (that is, each call returns the character in the set with the lowest ASCII code). However, you should not make this assumption. You should assume that this function could return the characters in an arbitrary order. If the specified character set is empty, this routine returns -1 ($FFFF_FFFF) in the EAX register; in all other cases the H.O. three bytes of EAX contain zero upon return.

Note: unlike the HLA compile-time function "@extract", this function actually removes the character from the character set ("@extract" leaves the character in the set). Keep this in mind. (In the future, the name of the HLA @extract function will probably be changed to something else to clean up this conflict.)
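A C model of the destructive extract operation might look like the sketch below. The names cset_model and model_extract are hypothetical. The sketch returns the lowest-numbered member first, which matches the current behavior described above, but as already noted you should not depend on that order.

```c
#include <stdint.h>

/* Hypothetical model type: 128 bits stored in 16 bytes. */
typedef struct { uint8_t bytes[16]; } cset_model;

/* Remove one member from *s and return its character code (0..127).
   Returns -1 if the set is empty, mirroring the $FFFF_FFFF result. */
static int32_t model_extract(cset_model *s)
{
    for (int c = 0; c < 128; c++) {
        uint8_t mask = (uint8_t)(1u << (c & 7));
        if (s->bytes[c >> 3] & mask) {
            s->bytes[c >> 3] &= (uint8_t)~mask;   /* the member is removed */
            return c;
        }
    }
    return -1;
}
```

Because each call removes the character it returns, calling the function repeatedly until it returns -1 visits every member exactly once and leaves the set empty.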

5.3 Set Operations

cs.setunion( src:cset; var dest:cset ); 
 

This function computes the union of two sets, storing the result back into the destination set. Note that the destination set parameter is passed by reference.

Note: The name "setunion" was used rather than the more obvious choice of "union" because "union" is an HLA reserved word.

cs.intersection( src:cset; var dest:cset ); 
 

This function computes the set intersection of the two sets passed as parameters and stores the result back into the destination set. Note that the dest parameter is passed by reference.

cs.difference( src:cset; var dest:cset ); 
 

This function computes the set difference of two sets (i.e., the members in the destination set that are not also members of the source set). It stores the result back into the dest set (which is passed by reference).

cs.complement( src:cset; var dest:cset ); 
 

This function computes the set complement of a set (i.e., the members of the destination set are those elements that are not in the source set). It stores the complemented version of the set in the destination operand (which is passed by reference).
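On a bitmap representation, the four operations above correspond to OR, AND, AND-NOT, and NOT. The C sketch below models them; cset_model and the model_* names are hypothetical, and the assumption that the complement flips all 128 bits (including bit 0, the NUL character) is not confirmed by the text above.

```c
#include <stdint.h>

/* Hypothetical model type: 128 bits stored in 16 bytes. */
typedef struct { uint8_t bytes[16]; } cset_model;

/* dest := dest + src (union) */
static void model_setunion(cset_model src, cset_model *dest)
{
    for (int i = 0; i < 16; i++)
        dest->bytes[i] |= src.bytes[i];
}

/* dest := dest * src (intersection) */
static void model_intersection(cset_model src, cset_model *dest)
{
    for (int i = 0; i < 16; i++)
        dest->bytes[i] &= src.bytes[i];
}

/* dest := dest - src (members of dest that are not in src) */
static void model_difference(cset_model src, cset_model *dest)
{
    for (int i = 0; i < 16; i++)
        dest->bytes[i] &= (uint8_t)~src.bytes[i];
}

/* dest := everything not in src; the old contents of dest are ignored */
static void model_complement(cset_model src, cset_model *dest)
{
    for (int i = 0; i < 16; i++)
        dest->bytes[i] = (uint8_t)~src.bytes[i];
}
```

Note the asymmetry: the first three operations combine src with the existing contents of dest, while the complement overwrites dest entirely.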

procedure cs.unionChar( c:char; var dest:cset ); 
 

The cs.unionChar function adds the character (supplied as a parameter) to the specified destination character set (passed by reference). If the character was already a member of the set, this function does not affect the character set.

procedure cs.removeChar( c:char; var dest:cset ); 
 

This function removes a single character from the specified destination set (passed by reference). If the character was not previously a member of the destination set, this function does not affect that set.

procedure cs.unionStr( s:string; var dest:cset );
 

This function will union in all the characters in a string to the destination set. Unlike the cs.strToCset function, this function does not clear the destination character set before processing the characters in the string.

procedure cs.unionStr2( s:string; offs:uns32; var dest:cset );
 

This function will union in all the characters in a string to the destination set. Unlike the cs.unionStr function, this function starts at character position offs in s rather than at character position zero.

procedure cs.removeStr( s:string; var dest:cset );
 

This function removes characters found in the string from the specified character set. If a character in the string was not previously a member of the character set, that character has no effect on the destination set.

procedure cs.removeStr2( s:string; offs:uns32; var dest:cset );
 

This function removes characters found in the string at character position offs and beyond from the specified character set. If a character in the string was not previously a member of the character set, that character has no effect on the destination set.
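The string-based union and removal routines differ from cs.strToCset in that they modify the existing destination set rather than rebuilding it. The following C sketch models the offset variants; passing an offset of zero models cs.unionStr and cs.removeStr. As before, cset_model and the model_* names are hypothetical, a zero-terminated C string stands in for an HLA string, and ignoring an offset beyond the end of the string is an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model type: 128 bits stored in 16 bytes. */
typedef struct { uint8_t bytes[16]; } cset_model;

/* Add each 7-bit character of s, from index offs onward, to *dest.
   Existing members of *dest are kept. */
static void model_union_str2(const char *s, size_t offs, cset_model *dest)
{
    if (offs > strlen(s))
        return;                             /* assumption: nothing to do */
    for (const unsigned char *p = (const unsigned char *)s + offs; *p; p++)
        if (*p <= 0x7F)
            dest->bytes[*p >> 3] |= (uint8_t)(1u << (*p & 7));
}

/* Remove each 7-bit character of s, from index offs onward, from *dest.
   Characters that are not members are silently ignored. */
static void model_remove_str2(const char *s, size_t offs, cset_model *dest)
{
    if (offs > strlen(s))
        return;                             /* assumption: nothing to do */
    for (const unsigned char *p = (const unsigned char *)s + offs; *p; p++)
        if (*p <= 0x7F)
            dest->bytes[*p >> 3] &= (uint8_t)~(1u << (*p & 7));
}
```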


