%CHARCOUNT (Return the Number of Characters)
%CHARCOUNT(string)
%CHARCOUNT returns the number of natural characters in the alphanumeric, graphic, or UCS-2 expression. This may be different from the number of bytes or double bytes that is returned by %LEN if the operand is one of the following:
- UTF-16, data type UCS-2 with CCSID(*UTF16) (or CCSID(1200)).
- UTF-8, data type CHAR with CCSID(*UTF8) (or CCSID(1208)).
- EBCDIC with mixed SBCS and DBCS data.
- ASCII with mixed SBCS and DBCS data.
See Processing string data by the natural size of each character.
Note: The %CHARCOUNT built-in function always returns the number of natural characters, even if the data type and CCSID of the operand are not among those specified by the CHARCOUNTTYPES keyword.
Example of %CHARCOUNT with a UTF-8 value
UTF-8 data can have 1, 2, 3, or 4 bytes per character. Character ‘ç’ has two bytes. Characters ‘a’ and ‘b’ have one byte.
1. The string ‘abç’ has four bytes. %LEN(string) returns 4.
2. The string ‘abç’ has three characters. %CHARCOUNT(string) returns 3.
DCL-S string VARCHAR(20) CCSID(*UTF8);
DCL-S n INT(10);
string = 'abç';
n = %len(string); // 1
// n = 4
n = %charcount(string); // 2
// n = 3
Example of %CHARCOUNT with a mixed SBCS/DBCS EBCDIC value
The DBCS sections of the data are surrounded by shift characters. The DBCS data begins with the shift-out character x’0E’ and ends with the shift-in character x’0F’.
1. The comment below the hexadecimal literal shows the character represented by each byte of the literal: “o” indicates the shift-out character, “i” indicates the shift-in character, and each “D” indicates one byte of a DBCS character. The hexadecimal value x’4CB1’ represents a single DBCS character. The second comment below the literal marks the start of each natural character in the string.
2. The string x’81820E4CB10F8384’ has eight bytes. %LEN(string) returns 8.
3. The string has five characters: ‘a’, ‘b’, the DBCS character x’4CB1’, ‘c’, and ‘d’. The shift characters are not counted as characters. %CHARCOUNT(string) returns 5.
DCL-S string VARCHAR(20) CCSID(937);
DCL-S n INT(10);
string = x'81820E4CB10F8384'; // 1
//         a b o D D i c d
//         1 2   3     4 5
n = %len(string); // 2
// n = 8
n = %charcount(string); // 3
// n = 5
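Example of %CHARCOUNT with a UTF-16 value
The list of operand types also includes UTF-16 data (UCS-2 with CCSID(*UTF16)). In UTF-16, a character outside the Basic Multilingual Plane, such as U+1F600, is stored as a surrogate pair of two UCS-2 units, so %LEN counts it as 2 while %CHARCOUNT is expected to count it as 1. The following is an unverified sketch. It assumes that, as in the mixed EBCDIC example, a hexadecimal literal is assigned to the UTF-8 variable without conversion, and that assigning the UTF-8 variable to the UCS-2 variable converts the data to UTF-16. The variable names utf8 and utf16 are illustrative only.

```rpgle
DCL-S utf8 VARCHAR(20) CCSID(*UTF8);
DCL-S utf16 VARUCS2(20) CCSID(*UTF16);
DCL-S n INT(10);

// Assumption: the hex literal is copied as-is.
// x'61' is 'a'; x'F09F9880' is U+1F600 encoded in UTF-8 (4 bytes)
utf8 = x'61F09F9880';

// Assumption: the assignment converts UTF-8 to UTF-16.
// 'a' needs one UCS-2 unit; U+1F600 needs a surrogate pair (two units)
utf16 = utf8;

n = %len(utf16);
// n = 3 (UCS-2 units)
n = %charcount(utf16);
// n = 2 (natural characters)
```

If these assumptions hold, the surrogate pair accounts for the difference between the two results, in the same way that the multi-byte ‘ç’ does in the UTF-8 example.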