UNICODE Strings Functions

HMG Unicode versions 3.1.x related

Moderator: Rathinagiri

srvet_claudio
Posts: 1911
Joined: Thu Feb 25, 2010 8:43 pm
Location: Uruguay
Has thanked: 25 times
Been thanked: 92 times

Post by srvet_claudio » Mon Dec 10, 2012 10:25 pm

Hi all,
While researching the Harbour source files, I found string functions that work with both UNICODE and ANSI strings, and others that work only with ANSI strings.
This list may be incomplete.
Best regards,
Claudio Soto.

******************************************************
Functions that support UNICODE and ANSI strings
******************************************************

* harbour/src/rtl/chruni.c
/* Unicode(character) and Binary(byte) string functions: */
HB_UCHAR( <nCode> ) -> <cText> // return string with U+nCode character in HVM CP encoding
HB_BCHAR( <nCode> ) -> <cText> // return 1 byte string with <nCode> value

HB_UCODE( <cText> ) -> <nCode> // return unicode value of 1-st character (not byte) in given string
HB_BCODE( <cText> ) -> <nCode> // return value of 1-st byte in given string

HB_ULEN( <cText> ) -> <nChars> // return string length in characters
HB_BLEN( <cText> ) -> <nBytes> // return string length in bytes

HB_UPEEK( <cText>, <n> ) -> <nCode> // return unicode value of <n>-th character in given string
HB_BPEEK( <cText>, <n> ) -> <nCode> // return value of <n>-th byte in given string

HB_UPOKE( [@]<cText>, <n>, <nVal> ) -> <cText> // change <n>-th character in given string to unicode <nVal> one and return modified text
HB_BPOKE( [@]<cText>, <n>, <nVal> ) -> <cText> // change <n>-th byte in given string to <nVal> and return modified text

HB_USUBSTR( <cString>, <nStart>, <nCount> ) -> <cSubstring>
HB_BSUBSTR( <cString>, <nStart>, <nCount> ) -> <cSubstring>

HB_ULEFT( <cString>, <nCount> ) -> <cSubstring>
HB_BLEFT( <cString>, <nCount> ) -> <cSubstring>

HB_URIGHT( <cString>, <nCount> ) -> <cSubstring>
HB_BRIGHT( <cString>, <nCount> ) -> <cSubstring>

HB_UAT( <cSubString>, <cString>, [<nFrom>], [<nTo>] ) -> <nAt>
HB_BAT( <cSubString>, <cString>, [<nFrom>], [<nTo>] ) -> <nAt>
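
A small sketch may help show the difference between the U (character) and B (byte) variants listed above. It assumes that the HVM codepage is switched to "UTF8" with hb_cdpSelect(); that call and the sample text are my own choices for a console test, not something taken from the Harbour sources.

```harbour
PROCEDURE Main()

   LOCAL cText

   // Assumption: UTF-8 is selected as the HVM codepage for this test
   hb_cdpSelect( "UTF8" )

   // "España": the "ñ" is built from its code point (U+00F1), so the
   // result does not depend on how this source file is encoded
   cText := "Espa" + hb_UChar( 0x00F1 ) + "a"

   ? hb_ULen( cText )                        // 6 characters
   ? hb_BLen( cText )                        // 7 bytes, "ñ" takes two bytes in UTF-8
   ? hb_UPeek( cText, 5 )                    // 241, the code point of "ñ"
   ? hb_BPeek( cText, 5 )                    // 195, only the first byte of "ñ"
   ? hb_BLen( hb_USubStr( cText, 5, 1 ) )    // 2, one whole character
   ? hb_BLen( hb_BSubStr( cText, 5, 1 ) )    // 1, a single byte

   RETURN
```

The U functions keep multi-byte characters intact, while the B functions can cut a character in the middle, so the B variants are probably best kept for binary data.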



* harbour/src/rtl/hbtoken.c
HB_TOKENCOUNT()
HB_TOKENGET()
HB_TOKENPTR()
/* like HB_TOKENGET() but returns next token starting from passed position (0 based) inside string, f.e.: HB_TOKENPTR( cString, @nTokPos, Chr( 9 ) ) -> cToken */
HB_ATOKENS()

* harbour/src/rtl/memofile.c
MEMOREAD()
MEMOWRIT()
HB_MEMOREAD() // not limited to 64 KB, unlike MEMOREAD()
HB_MEMOWRIT() // not limited to 64 KB, unlike MEMOWRIT()

* harbour/src/rtl/mlcfunc.c
/* Warning: <nLineLength> is in bytes; it must be greater than the number of bytes in the longest line of the UTF-8 text */
MEMOLINE( <cString>, [ <nLineLength>=79 ], [ <nLineNumber>=1 ], [ <nTabSize>=4 ], [ <lWrap>=.T. ], [ <cEOL>|<acEOLs> ] ) -> <cLine>
MLCOUNT ( <cString>, [ <nLineLength>=79 ], [ <nTabSize>=4 ], [ <lWrap>=.T. ], [ <cEOL>|<acEOLs> ] ) -> <nLines>
MLPOS ( <cString>, [ <nLineLength>=79 ], [ <nLineNumber>=1 ], [ <nTabSize>=4 ], [ <lWrap>=.T. ], [ <cEOL>|<acEOLs> ] ) -> <nLinePos>
/*
MLCTOPOS() // does not support UTF-8
MPOSTOLC() // does not support UTF-8
*/

* harbour/src/rtl/mtran.c
MEMOTRAN()

* harbour/src/rtl/replic.c
REPLICATE() /* returns n copies of given string */

* harbour/src/rtl/strc.c
HB_STRDECODESCAPE ( <cEscSeqStr> ) -> <cStr> /* decode string with \ escape sequences */

HB_STRCDECODE ( <cStr> [, @<lCont> ] ) -> <cResult> | NIL /* decode string using C compiler rules */
/* If second parameter <lCont> is passed by reference then it allows to decode multiline strings.
In such case <lCont> is set to .T. if string ends with unclosed "" quoting.
Function returns decoded string or NIL on syntax error. */

* harbour/src/rtl/strmatch.c
HB_WILDMATCH (cPattern, cValue [, lExact] ) /* compares two strings */
/* Compares cValue with cPattern.
cPattern may contain wildcard characters (? and *).
When lExact is TRUE, it checks whether the whole of cValue is covered by cPattern;
otherwise it checks whether cPattern is a prefix of cValue */

HB_WILDMATCHI (cPattern, cValue) /* compares two strings, ignoring case */
/* Compares cValue with cPattern.
Checks whether the whole of cValue is covered by cPattern */

HB_FILEMATCH (cFileName, cPattern)
/* Compares the file name with the pattern; it does not check whether the file exists on disk */
/* e.g. HB_FILEMATCH ("picture.bmp", "*.bmp") ---> returns TRUE because the name matches the pattern */
/* e.g. HB_FILEMATCH ("c:\image\picture.bmp", "picture.bmp") ---> the result depends only on the two strings, not on the file system */
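
To make the matching rules above more concrete, here is a short sketch. The file name and patterns are only sample values chosen for illustration.

```harbour
PROCEDURE Main()

   ? hb_WildMatch( "*.bmp", "picture.bmp" )       // .T.
   ? hb_WildMatch( "pic", "picture.bmp" )         // .T., "pic" is a prefix of the value
   ? hb_WildMatch( "pic", "picture.bmp", .T. )    // .F., the whole value is not covered
   ? hb_WildMatchI( "*.BMP", "picture.bmp" )      // .T., case is ignored
   ? hb_FileMatch( "picture.bmp", "*.bmp" )       // .T., pure name matching

   RETURN
```

None of these calls touches the disk; they only compare the strings that are passed in.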

* harbour/src/rtl/strtoexp.c
HB_STRTOEXP() /* convert string to valid macrocompiler expression */

* harbour/src/rtl/strtran.c
STRTRAN()

* harbour/src/rtl/trim.c
LTRIM()
RTRIM()
TRIM() /* synonym for RTRIM */
ALLTRIM()

* harbour/src/vm/hvm.c
/* operator $ */
<cSubStr> $ <cStr> /* return TRUE if <cSubStr> is contained in <cStr> */

* harbour/src/rtl/cdpapihb.c
HB_STRTOUTF8 (<cStr> [, <cCPID> ] ) -> <cUTF8Str>
HB_UTF8TOSTR (<cUTF8Str> [, <cCPID> ] ) -> <cStr>

* <cCPID> is Harbour codepage id, f.e.: "EN", "ES", "ESWIN", "PLISO", "PLMAZ", "PL852", "PLWIN", ...
* When not given then default HVM codepage (set by HB_SETCODEPAGE()) is used.
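
As an illustration of the two conversion functions, the following sketch converts a string from a Windows codepage to UTF-8 and back. The "ESWIN" codepage and the sample text are my own choices for the example; the REQUEST line is needed so that this codepage is linked into the program.

```harbour
REQUEST HB_CODEPAGE_ESWIN

PROCEDURE Main()

   LOCAL cAnsi, cUtf8

   // Byte 241 is "ñ" in the Windows Spanish codepage ("ESWIN")
   cAnsi := "Espa" + hb_BChar( 241 ) + "a"

   // The codepage is passed explicitly, so the default HVM codepage is not used
   cUtf8 := hb_StrToUTF8( cAnsi, "ESWIN" )

   ? hb_BLen( cAnsi )                           // 6 bytes
   ? hb_BLen( cUtf8 )                           // 7 bytes, "ñ" now takes two bytes
   ? hb_UTF8ToStr( cUtf8, "ESWIN" ) == cAnsi    // .T., the round trip restores the text

   RETURN
```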

HB_TRANSLATE ( <cSrcText>, [<cPageFrom>], [<cPageTo>] ) --> cDstText /* is used usually to convert between the Dos and the Windows code pages of the same language */

HB_UTF8CHR ()
HB_UTF8ASC ()
HB_UTF8AT ()

HB_UTF8RAT () /* NOTE: In HB_UTF8RAT we are still traversing from left to right, as it would be required anyway to determine the real string length */
HB_UTF8SUBSTR ()
HB_UTF8LEFT ()
HB_UTF8RIGHT ()
HB_UTF8PEEK ()
HB_UTF8POKE ()
HB_UTF8STUFF ()
HB_UTF8LEN ()
HB_UTF8STRTRAN() /* equal to STRTRAN() */


/* Miscellaneous Functions */
* --------------------------

All string functions related to DATE and TIME
SPACE() /* returns n copies of a single space */
STR()
STRZERO ()
TYPE()
VAL()
HB_VALTOSTR() /* converts any data type to a string */
VALTYPE()
HB_ISSTRING()
HB_ISCHAR()
HB_ISMEMO()
Best regards.
Dr. Claudio Soto
(from Uruguay)
http://srvet.blogspot.com

esgici
Posts: 4268
Joined: Wed Jul 30, 2008 9:17 pm
DBs Used: DBF
Location: iskenderun / Turkiye
Has thanked: 145 times
Been thanked: 54 times

Post by esgici » Mon Dec 10, 2012 10:34 pm

Thanks Dr.

Regards
Viva INTERNATIONAL HMG :D

Rathinagiri
Posts: 5077
Joined: Tue Jul 29, 2008 6:30 pm
DBs Used: MariaDB, SQLite, SQLCipher and MySQL
Location: Sivakasi, India
Has thanked: 89 times
Been thanked: 103 times

Post by Rathinagiri » Tue Dec 11, 2012 12:09 am

Thanks a lot Claudio. It shows that there is a long way to go...
East or West HMG is the Best.
South or North HMG is worth.
...the possibilities are endless.

bpd2000
Posts: 908
Joined: Sat Sep 10, 2011 4:07 am
Location: India
Has thanked: 79 times
Been thanked: 14 times

Post by bpd2000 » Tue Dec 11, 2012 3:26 am

Thank you Claudio Soto for sharing
BPD
HMG Convert Dream into Reality

Post by srvet_claudio » Tue Dec 11, 2012 3:45 am

***********************************
Functions that support only ANSI strings
***********************************


/* All these functions work with code pages that use custom character indexes (ANSI code pages); they do not support UNICODE. */

* harbour/src/rtl/at.c
HB_AT()
AT()

* harbour/src/rtl/ati.c
HB_ATI()
* harbour/src/rtl/hbstrsh.c
HB_STRSHRINK()
* harbour/src/rtl/left.c
LEFT()
* harbour/src/rtl/len.c
LEN()
* harbour/src/rtl/pad.c
PAD() /* synonym for PADR */
* harbour/src/rtl/padc.c
PADC()
* harbour/src/rtl/padl.c
PADL()
* harbour/src/rtl/padr.c
PADR()
* harbour/src/rtl/rat.c
RAT()
HB_RAT()

* harbour/src/rtl/right.c
RIGHT()
* harbour/src/rtl/strcase.c
LOWER()
UPPER()

* harbour/src/rtl/stuff.c
STUFF()
* harbour/src/rtl/substr.c
SUBSTR()
* harbour/src/rtl/transfrm.c
TRANSFORM()

Post by srvet_claudio » Sun May 19, 2013 4:35 pm

Hi Friends.

Please, see this post: viewtopic.php?p=26698#p26698

Best regards,
Claudio.

Pablo César
Posts: 4036
Joined: Wed Sep 08, 2010 1:18 pm
Location: Curitiba - Brasil
Has thanked: 99 times
Been thanked: 169 times

Post by Pablo César » Fri Jul 26, 2013 2:16 am

Dear Claudio, functions like these:

- All string functions related to DATE and TIME (DTOC, DTOS...)
- Trim, AllTrim, LTrim
- Str

are they not included in your list of instructions?
srvet_claudio wrote: To develop applications that support ANSI and UNICODE, we must abandon those text functions that only support the ANSI character set. Here is a partial list of equivalences:

           ANSI/UNICODE               ANSI Only
 
-          HMG_LEN()             <=>   LEN()
-          HMG_LOWER()           <=>   LOWER()
-          HMG_UPPER()           <=>   UPPER()
-          HMG_PADC()            <=>   PADC()
-          HMG_PADL()            <=>   PADL()
-          HMG_PADR()            <=>   PADR()
-          HMG_ISALPHA()         <=>   ISALPHA()
-          HMG_ISDIGIT()         <=>   ISDIGIT()
-          HMG_ISLOWER()         <=>   ISLOWER()
-          HMG_ISUPPER()         <=>   ISUPPER()
-          HMG_ISALPHANUMERIC()  <=>   RETURN (ISALPHA(c) .OR. ISDIGIT(c))

Harbour native functions: 
-------------------------
HB_USUBSTR()      <=>   SUBSTR()
HB_ULEFT()        <=>   LEFT()
HB_URIGHT()       <=>   RIGHT()
HB_UAT()          <=>   AT()
HB_UTF8RAT()      <=>   RAT()
HB_UTF8STUFF()    <=>   STUFF()


In your source code, you have to replace every function that supports only the ANSI character set with its ANSI/UNICODE equivalent.
HMGing a better world
"Matter tells space how to curve, space tells matter how to move."
Albert Einstein

Post by srvet_claudio » Fri Jul 26, 2013 2:19 pm

Pablo César wrote:Dear Claudio, functions like these:

- All functions STRINGS related to DATE and TIME (DTOC,DTOS...)
- Trim, AllTrim, LTrim
- Str

Are not included in list of your instructions ?
Pablo,
this is only a partial list, included in the changelog, of the most commonly used functions; its purpose is to prevent errors when developing programs.
Functions such as the date and time functions, AllTrim, etc., behave the same for ANSI and for ANSI/Unicode, so it does not make sense to list them here.
You just have to replace the functions that support only ANSI with functions that support ANSI/Unicode.
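
For example, a minimal sketch of such a replacement could look like this. It assumes the "UTF8" codepage is selected with hb_cdpSelect() and uses only Harbour native functions; the sample name and the variable names are just for illustration.

```harbour
PROCEDURE Main()

   LOCAL cJose, cName

   // Assumption: UTF-8 is selected as the HVM codepage for this test
   hb_cdpSelect( "UTF8" )

   // "é" is built from its code point (U+00E9) to stay independent of
   // the encoding of this source file
   cJose := "Jos" + hb_UChar( 0x00E9 )                    // "José"
   cName := cJose + " P" + hb_UChar( 0x00E9 ) + "rez"     // "José Pérez"

   ? hb_ULeft( cName, 4 ) == cJose    // .T., four whole characters
   ? Left( cName, 4 ) == cJose        // .F., cut after four bytes, inside "é"
   ? hb_UAt( "P", cName )             // 6, position counted in characters
   ? At( "P", cName )                 // 7, position counted in bytes

   RETURN
```

With pure ASCII text both versions give the same results, which is why this kind of problem usually shows up only when accented or non-Latin characters appear.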

Post by Pablo César » Fri Jul 26, 2013 7:34 pm

srvet_claudio wrote:Functions such as date and time, alltrim, etc., are the same for ANSI and for ANSI/Unicode and therefore does not make sense to put them here.
OK, it is clear now. Thank you!

Post by Pablo César » Wed Oct 05, 2016 12:17 pm

Claudio and everyone,

Please note that the following functions should be included under "Harbour native functions" in our DOC/HMG UNICODE:

hb_UPadL(), hb_UPadR() and hb_UPadC()

Please read this

Keeping you informed
