Encoding between GB2312 and UTF-8

Author: Zhou Renjian, created 2004-11-12 09:10
If I transform GB2312 bytes to UTF-8 bytes and then transform those UTF-8 bytes back to GB2312 bytes, the result is not the same as the original GB2312 bytes. But if I transform GB2312 bytes to ISO-8859-1 bytes and then back to GB2312 bytes, the result is identical to the original. I don't know why. I am using Linux with JDK 1.4.2. So I had to use the following nasty code to transform GB2312 bytes into UTF-8 bytes. After this transformation, I can turn the UTF-8 bytes back into GB2312 bytes, and they are the same as the original GB2312 bytes.

    /**
     * In some tests under Linux, I found that
     * <code>new String(String.getBytes(), "utf-8")</code> did not work.
     * That is why this #nativeToUTF8 method exists.
     *
     * @param str gb2312/iso-8859-1 encoded String
     * @return utf-8 encoded String
     * @throws Exception an IOException or UnsupportedEncodingException may occur.
     * Exceptions are thrown rather than caught inside so that developers use
     * this method carefully.
     */
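    public static String nativeToUTF8(String str) throws Exception {
        // Needs java.io.File, FileInputStream, FileOutputStream and ByteArrayOutputStream.
        // Let the JDK pick a unique temp-file name instead of relying on Math.random().
        File f = File.createTempFile("nativeToUTF8", ".tmp");
        try {
            // Write the string out as UTF-8 bytes.
            FileOutputStream fos = new FileOutputStream(f);
            try {
                fos.write(str.getBytes("utf-8"));
            } finally {
                fos.close();
            }
            // Read the bytes back from the file into memory.
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            FileInputStream fis = new FileInputStream(f);
            try {
                byte[] buf = new byte[1024];
                int readLength;
                while ((readLength = fis.read(buf)) != -1) {
                    buffer.write(buf, 0, readLength);
                }
            } finally {
                fis.close();
            }
            // Decode the bytes as UTF-8.
            return new String(buffer.toByteArray(), "utf-8");
        } finally {
            // Remove the temp file even if an exception was thrown.
            f.delete();
        }
    }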
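
A likely explanation for the round-trip difference, offered here as an assumption about what "transform" meant in the tests: if GB2312 bytes are decoded as if they were UTF-8, most of them are not valid UTF-8 sequences, so the decoder substitutes the replacement character U+FFFD and the original bytes are lost. ISO-8859-1 maps each of the 256 byte values to exactly one character, so decoding and re-encoding with it returns the same bytes. The sketch below illustrates this with the GB2312 bytes for "中文" and also shows a direct conversion that decodes with the source charset before encoding with the target one. The class name `EncodingRoundTrip` and its methods are hypothetical, and the code assumes a modern JDK (Java 7 or later, for `StandardCharsets`) that ships the extended `GB2312` charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
final class EncodingRoundTrip {
    // Assumption: the running JDK provides the extended GB2312 charset.
    static final Charset GB2312 = Charset.forName("GB2312");

    /** Decodes the bytes with the given charset, then re-encodes with the same charset. */
    static byte[] roundTrip(byte[] original, Charset via) {
        String decoded = new String(original, via);
        return decoded.getBytes(via);
    }

    /** Converts GB2312 bytes to UTF-8 bytes by decoding with the source charset first. */
    static byte[] gb2312ToUtf8(byte[] gbBytes) {
        return new String(gbBytes, GB2312).getBytes(StandardCharsets.UTF_8);
    }

    /** Converts UTF-8 bytes back to GB2312 bytes. */
    static byte[] utf8ToGb2312(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.UTF_8).getBytes(GB2312);
    }

    public static void main(String[] args) {
        // "中文" encoded in GB2312.
        byte[] gb = { (byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4 };

        // Treating GB2312 bytes as UTF-8 is lossy; treating them as ISO-8859-1 is not.
        System.out.println("via UTF-8 unchanged:      "
                + Arrays.equals(gb, roundTrip(gb, StandardCharsets.UTF_8)));
        System.out.println("via ISO-8859-1 unchanged: "
                + Arrays.equals(gb, roundTrip(gb, StandardCharsets.ISO_8859_1)));

        // Decoding with the real source charset gives a reversible conversion.
        byte[] utf8 = gb2312ToUtf8(gb);
        System.out.println("GB2312 -> UTF-8 -> GB2312 unchanged: "
                + Arrays.equals(gb, utf8ToGb2312(utf8)));
    }
}
```

If this explanation matches the original tests, no temporary file is needed: naming the source charset when decoding and the target charset when encoding should be enough.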
