Problem reading Unicode file
Moderator: Rathinagiri
- Clip2Mania
- Posts: 99
- Joined: Fri Jun 13, 2014 7:16 am
- Location: Belgium
Problem reading Unicode file
Can anyone suggest how to read the attached Unicode file?
I can open it in Windows Notepad without any problems.
I'm using HMG 3.3.1, 32 bits, Unicode.
Tried using memoread(), HB_Memoread() and FOpen(), FReadStr() combination,
both in ANSI & UNICODE versions of HMG, and apparently cannot open it.
Suggestions?
Thx,
Erik
- Attachments
-
- test_unicode.zip
- (999 Bytes) Downloaded 520 times
Re: Problem reading Unicode file
Refer to the attached demo.
You have to save the file using UTF-8 encoding.
Also refer to:
viewtopic.php?f=7&t=3689&p=34140&hilit= ... F+8#p34140
- Attachments
-
- DemoUni.rar
- (603 Bytes) Downloaded 565 times
BPD
Convert Dream into Reality through HMG
- Clip2Mania
Re: Problem reading Unicode file
Quote: "You have to save file using Encoding UTF-8"
Yes, I saw the demo and read that post previously. That is exactly the issue: I cannot save the file in UTF-8, because it comes from an external program (EAC). I have a lot of these files and need to read them, so manually opening and saving each file is far too much work for my customer. Furthermore, I want to spare him the complexity.
In the meantime, I found a command-line conversion tool on the web (http://www.autohotkey.com/board/topic/9 ... icode-cmd/) which can do this. I use the "execute file" command to convert each file first. It's not really beautiful, but it kinda works...
Last edited by Clip2Mania on Tue Jul 22, 2014 11:02 am, edited 1 time in total.
- srvet_claudio
- Posts: 2193
- Joined: Thu Feb 25, 2010 8:43 pm
- Location: Uruguay
Re: Problem reading Unicode file
Clip2Mania wrote:
Anyone can suggest how to read unicode file attached?
I can open it in Windows Notepad without any problems.
I'm using HMG 3.3.1, 32 bits, Unicode.
Tried using memoread(), HB_Memoread() and FOpen(), FReadStr() combination,
both in ANSI & UNICODE versions of HMG, and apparently cannot open it.
Suggestions?
Thx,
Erik
Hi Erik,
the problem is that your file is in Unicode UTF-16LE (the Unicode encoding used by Windows), while HMG works with UTF-8.
See this code:
Code:
#include "hmg.ch"

FUNCTION Main()
   cText := HMG_UTF16LE_TO_UTF8 ("test_unicodeUTF16LE.txt")
   MsgInfo (cText)
RETURN NIL

#pragma BEGINDUMP

#define UNICODE
#include "HMG_UNICODE.h"
#include <windows.h>
#include "hbapi.h"

HB_FUNC ( HMG_UTF16LE_TO_UTF8 )
{
   TCHAR *FileName = (TCHAR *) HMG_parc (1);
   HANDLE hFile;
   DWORD nFileSize;
   DWORD nReadByte;

   hFile = CreateFile (FileName, GENERIC_READ, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
   if (hFile == INVALID_HANDLE_VALUE)
      return;

   nFileSize = GetFileSize (hFile, NULL);
   if (nFileSize == INVALID_FILE_SIZE)
   {
      CloseHandle (hFile);
      return;
   }

   TCHAR cBuffer [ nFileSize ];
   ReadFile (hFile, cBuffer, nFileSize, &nReadByte, NULL);
   CloseHandle (hFile);
   HMG_retc (cBuffer);
}

#pragma ENDDUMP
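For readers who want to see what the UTF-16LE to UTF-8 step involves, here is a minimal portable C sketch. It is not the HMG implementation and does not use the Windows API; the function name `utf16le_to_utf8` and its parameters are made up for illustration. It assumes the input is the raw byte content of the file, skips a leading FF FE byte order mark if present, and replaces unpaired surrogates with U+FFFD.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of HMG): convert raw UTF-16LE bytes to a
   zero-terminated UTF-8 string. A dst_cap of 3 * (src_len / 2) + 1 bytes is
   always enough. Returns the number of UTF-8 bytes written, excluding the
   terminator. */
size_t utf16le_to_utf8(const unsigned char *src, size_t src_len,
                       char *dst, size_t dst_cap)
{
    size_t i = 0, o = 0;

    if (dst_cap == 0)
        return 0;

    /* Skip the UTF-16LE byte order mark (FF FE) if the file starts with it. */
    if (src_len >= 2 && src[0] == 0xFF && src[1] == 0xFE)
        i = 2;

    while (i + 1 < src_len) {
        /* Little-endian: low byte first, high byte second. */
        uint32_t cp = (uint32_t)src[i] | ((uint32_t)src[i + 1] << 8);
        i += 2;

        /* Combine a high surrogate with the following low surrogate. */
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < src_len) {
            uint32_t lo = (uint32_t)src[i] | ((uint32_t)src[i + 1] << 8);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                i += 2;
            }
        }
        if (cp >= 0xD800 && cp <= 0xDFFF)
            cp = 0xFFFD;                      /* unpaired surrogate */

        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (o + need >= dst_cap)
            break;                            /* keep room for the terminator */

        if (need == 1) {
            dst[o++] = (char)cp;
        } else if (need == 2) {
            dst[o++] = (char)(0xC0 | (cp >> 6));
            dst[o++] = (char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            dst[o++] = (char)(0xE0 | (cp >> 12));
            dst[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            dst[o++] = (char)(0x80 | (cp & 0x3F));
        } else {
            dst[o++] = (char)(0xF0 | (cp >> 18));
            dst[o++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            dst[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            dst[o++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    dst[o] = '\0';
    return o;
}
```

With this sketch, the UTF-16LE bytes E9 00 (the letter é) become the two UTF-8 bytes C3 A9, which is the form a UTF-8 based program expects.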
- esgici
- Posts: 4543
- Joined: Wed Jul 30, 2008 9:17 pm
- DBs Used: DBF
- Location: iskenderun / Turkiye
Re: Problem reading Unicode file
Simply another approach:
Code:
/*
   Convert big-endian Unicode string to ANSI
   CAUTION : Use only for big-endian Unicode string !
*/
#include <hmg.ch>

PROCEDURE Main
   MsgBox( UniBE2UT8( HB_MEMOREAD( "test_unicode.txt" ) ) )
RETURN

FUNCTION UniBE2UT8( cBigEndianStr ) // Convert big-endian Unicode string to ANSI
RETURN ( SUBSTR( STRTRAN( cBigEndianStr, CHR(0), '' ), 3 ) )
Viva INTERNATIONAL HMG
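Since the two replies describe the file differently (UTF-16LE in one, big-endian in the other), it can help to check the byte order mark before choosing a conversion. The following C sketch is only an illustration; the enum and the function name `detect_bom` are hypothetical. It assumes the file begins with a BOM, which is not guaranteed for every tool that writes Unicode text.

```c
#include <stddef.h>

/* Hypothetical names, used only for this illustration. */
typedef enum {
    ENC_UNKNOWN,     /* no recognised byte order mark */
    ENC_UTF8_BOM,    /* EF BB BF */
    ENC_UTF16LE,     /* FF FE */
    ENC_UTF16BE      /* FE FF */
} bom_encoding;

/* Inspect the first bytes of a buffer and report which BOM, if any, it
   starts with. Note: FF FE 00 00 is also the UTF-32LE mark; this sketch
   does not try to tell that case apart from UTF-16LE. */
bom_encoding detect_bom(const unsigned char *buf, size_t len)
{
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return ENC_UTF8_BOM;
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return ENC_UTF16LE;
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return ENC_UTF16BE;
    return ENC_UNKNOWN;
}
```

A file saved by Windows Notepad as "Unicode" normally starts with FF FE, so a check like this would report little-endian for it.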
- Clip2Mania
Re: Problem reading Unicode file
Fantastic, thanks gentlemen for the effort!
There is a problem with both pieces of code, however:
Mr. esgici's code does not read to the end of the file but stops somewhere.
Dr. Claudio's code reads too much (see the "garbage" characters at the end of the file).
Not beautiful in MsgBox, but I can filter that out in my code.
- Attachments
-
- dr. claudio's result
- unicode_claudio.jpg (95.1 KiB) Viewed 12837 times
-
- mr esgici's result
- unicode_esgici.jpg (22.82 KiB) Viewed 12837 times
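One plausible explanation for the trailing "garbage" (not confirmed anywhere in this thread) is that the buffer is sized in characters using the byte count of the file and is never zero-terminated, so anything that treats it as a zero-terminated wide string keeps reading past the file content. The portable C sketch below shows the idea of sizing the buffer in 16-bit units and terminating it explicitly. The function name `read_utf16le_units` is hypothetical, and the sketch uses standard C file I/O rather than the Windows calls from the posted code.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read a whole UTF-16LE file into a heap buffer of
   16-bit code units and append a zero terminator. The caller frees the
   result. *out_units receives the number of units read (BOM included). */
uint16_t *read_utf16le_units(const char *path, size_t *out_units)
{
    FILE *fp = fopen(path, "rb");
    if (!fp)
        return NULL;

    if (fseek(fp, 0, SEEK_END) != 0) { fclose(fp); return NULL; }
    long size = ftell(fp);
    if (size < 0) { fclose(fp); return NULL; }
    rewind(fp);

    /* The file size is in bytes; each UTF-16 code unit takes two of them. */
    size_t units = (size_t)size / 2;
    uint16_t *buf = malloc((units + 1) * sizeof *buf);   /* +1 for terminator */
    if (!buf) { fclose(fp); return NULL; }

    unsigned char pair[2];
    size_t n = 0;
    while (n < units && fread(pair, 1, 2, fp) == 2)
        buf[n++] = (uint16_t)(pair[0] | (pair[1] << 8)); /* little-endian */
    fclose(fp);

    buf[n] = 0;            /* terminate after the units actually read */
    if (out_units)
        *out_units = n;
    return buf;
}
```

The same two points (count units instead of bytes, terminate after the bytes actually read) would apply to a buffer filled with ReadFile, assuming the function that returns the string expects a zero-terminated buffer.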
- esgici
Re: Problem reading Unicode file
Clip2Mania wrote:
...
Mr. esgici's code does not read to the end of the file but stops somewhere
...
On my side there is no such truncation problem with my method, nor the extra characters at the end with Claudio's method. And physically there are no such extra characters (letters or otherwise) in your file.
If you ran this test on another file, please send it to me.
Regards
Viva INTERNATIONAL HMG
- Clip2Mania
Re: Problem reading Unicode file
Mr esgici,
the trouble is in the accents/special characters (it always is).
I tried adding 'SET CODEPAGE TO UNICODE' at the beginning of the program, but that does not change anything.
- Attachments
-
- test2.jpg (9.92 KiB) Viewed 12817 times
-
- Chanson_EAC.zip
- (1.12 KiB) Downloaded 460 times
- Clip2Mania
Re: Problem reading Unicode file
It is true, I added the éèçàôù characters to the file, because they are very common. In the example above,
even if you leave them out, you will see that the accented characters further in the file are not translated correctly.
- Attachments
-
- original.jpg (95.82 KiB) Viewed 12816 times
- esgici
Re: Problem reading Unicode file
You are right, my conversion method is not suitable for your needs.
Viva INTERNATIONAL HMG
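To illustrate why removing the CHR(0) bytes works for plain ASCII text but not for accented letters, here is a small C sketch. Both function names are hypothetical. `strip_nul_and_bom` mimics the STRTRAN / SUBSTR expression from the earlier post, and `looks_like_utf8` is a simplified structural check (it does not reject overlong forms or surrogates). The likely link to the truncated output, namely that a UTF-8 based program stops at or mangles a byte that is not valid UTF-8, is an assumption and was not verified in the thread.

```c
#include <stddef.h>

/* Mimics SUBSTR( STRTRAN( s, CHR(0), '' ), 3 ): drop every zero byte,
   then drop the first two remaining bytes (the BOM). dst must hold at
   least len bytes. Returns the number of bytes written. */
size_t strip_nul_and_bom(const unsigned char *src, size_t len,
                         unsigned char *dst)
{
    size_t o = 0, kept = 0;
    for (size_t i = 0; i < len; i++) {
        if (src[i] == 0)
            continue;             /* STRTRAN removes CHR(0) */
        if (kept++ < 2)
            continue;             /* SUBSTR(..., 3) skips two bytes */
        dst[o++] = src[i];
    }
    return o;
}

/* Simplified check: does every lead byte have the right number of
   10xxxxxx continuation bytes after it? */
int looks_like_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        size_t extra;
        if (s[i] < 0x80)                extra = 0;
        else if ((s[i] & 0xE0) == 0xC0) extra = 1;
        else if ((s[i] & 0xF0) == 0xE0) extra = 2;
        else if ((s[i] & 0xF8) == 0xF0) extra = 3;
        else return 0;            /* stray continuation or invalid lead byte */

        if (extra > len - i - 1)
            return 0;             /* sequence runs past the end */
        for (size_t k = 1; k <= extra; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;
        i += extra + 1;
    }
    return 1;
}
```

For the UTF-16LE text "cé" (FF FE 63 00 E9 00) the stripped result is the two bytes 63 E9. The lone E9 is a Latin-1 style byte, not the UTF-8 sequence C3 A9, so the result is no longer valid UTF-8, while an all-ASCII line survives the same treatment unchanged.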