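Reposted from: www.jianshu.com/p/8828105802c4
Many text files downloaded from the internet are GBK-encoded and show up as garbled characters when opened on a Mac.
Besides installing a text editor that supports multiple encodings, you can also convert the encoding from the terminal: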
iconv -f GBK -t UTF-8 original_filename > any_new_filename
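One pitfall with the command above: the output name has to differ from the input name, because the shell truncates the redirect target before iconv reads it. The following is a minimal sketch of an in-place conversion that goes through a temporary file. The function name to_utf8 is hypothetical and is not part of iconv.

```shell
#!/bin/sh
# to_utf8 FROM_ENCODING FILE
# Hypothetical helper: converts FILE to UTF-8 in place.
to_utf8() {
    from=$1
    file=$2
    tmp=$(mktemp) || return 1
    # Convert into a temporary file first; "iconv ... f > f" would empty
    # the input before iconv gets a chance to read it.
    if iconv -f "$from" -t UTF-8 "$file" > "$tmp"; then
        # cat (rather than mv) keeps the original file's permissions.
        cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        rm -f "$tmp"
        echo "conversion failed, $file left untouched" >&2
        return 1
    fi
}
```

For example, to_utf8 GBK novel.txt would rewrite novel.txt as UTF-8, where novel.txt is a placeholder file name. If iconv reports an error, the original file is left as it was.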
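Text encoding: use the iconv command in Terminal to batch-convert text files to UTF-8. Steps:
1. Create a folder named encoding inside the Downloads directory, then drag the text files you want to convert into it.
2. Open Terminal and enter the following commands: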
AlbertdeAir:Downloads Albert$ pwd
/Users/Albert/Downloads
AlbertdeAir:Downloads Albert$ cd /Users/Albert/Downloads/encoding
AlbertdeAir:encoding Albert$ find *.txt -exec sh -c "iconv -f GB18030 -t UTF8 {} > {}.txt" \;
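The argument after -f above assumes the source files are GB18030. If your files are encoded as GB2312, replace GB18030 with GB2312.
That's it. The encoding folder now contains a batch of files with the .txt.txt extension. These are the converted copies, ready to be transferred to your iPhone.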
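The find command above pastes each file name directly into a shell string, which can misbehave when a name contains spaces or quotes. Below is a sketch of the same batch conversion as a plain loop with quoted variables. It writes the converted copies into a subfolder under their original names instead of adding a second .txt. The function name convert_dir and the utf8 output folder are assumptions made for this example.

```shell
#!/bin/sh
# convert_dir FROM_ENCODING DIR
# Hypothetical helper: converts every *.txt directly inside DIR to UTF-8
# and writes the results to DIR/utf8/ under the original file names.
convert_dir() {
    from=$1
    dir=$2
    mkdir -p "$dir/utf8" || return 1
    failed=0
    for f in "$dir"/*.txt; do
        # When nothing matches, the pattern stays literal; skip it.
        [ -f "$f" ] || continue
        out="$dir/utf8/$(basename "$f")"
        if ! iconv -f "$from" -t UTF-8 "$f" > "$out"; then
            echo "skipped (not valid $from?): $f" >&2
            rm -f "$out"
            failed=1
        fi
    done
    return "$failed"
}
```

With the folder from the steps above, this would be called as convert_dir GB18030 /Users/Albert/Downloads/encoding. The original files are not modified, so the conversion can be rerun with a different source encoding if the result still looks garbled.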
Detailed usage
Open Terminal:
AlbertdeAir:~ Albert$ iconv --help
Usage: iconv [OPTION...] [-f ENCODING] [-t ENCODING] [INPUTFILE...]
or:    iconv -l

Converts text from one encoding to another encoding.

Options controlling the input and output format:
  -f ENCODING, --from-code=ENCODING
                              the encoding of the input
  -t ENCODING, --to-code=ENCODING
                              the encoding of the output

Options controlling conversion problems:
  -c                          discard unconvertible characters
  --unicode-subst=FORMATSTRING
                              substitution for unconvertible Unicode characters
  --byte-subst=FORMATSTRING   substitution for unconvertible bytes
  --widechar-subst=FORMATSTRING
                              substitution for unconvertible wide characters

Options controlling error output:
  -s, --silent                suppress error messages about conversion problems

Informative output:
  -l, --list                  list the supported encodings
  --help                      display this help and exit
  --version                   output version information and exit

Report bugs to <bug-gnu-libiconv@gnu.org>.
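The most commonly used options
The important ones are:
-f gives the encoding to convert from, followed by the encoding name
-t gives the encoding to convert to, followed by the encoding name
> is the shell's output redirection rather than an iconv option; it saves the converted text to the file named after it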
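Less commonly used options:
-c drops characters that cannot be converted from the output
-o, --output=FILE writes the output to FILE (not listed in the --help output above, so this build of iconv may not accept it)
-s, --silent suppresses warnings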
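The -f and -t options can also serve as a quick validity check. This relies on iconv exiting with a non-zero status when the input is not valid in the encoding given to -f. The sketch below uses that to tell whether a file is already UTF-8, so it is not converted a second time. The function name is_utf8 is hypothetical.

```shell
#!/bin/sh
# is_utf8 FILE
# Hypothetical helper: succeeds (exit status 0) if FILE decodes cleanly as
# UTF-8. The converted text is discarded; only iconv's exit status is used.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}
```

A typical use would be is_utf8 book.txt || iconv -f GBK -t UTF-8 book.txt > book.utf8.txt, where both file names are placeholders. Pure ASCII files pass the check as well, since ASCII is a subset of UTF-8.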
Encodings supported by iconv (each line lists the alternative names of one encoding):
AlbertdeAir:~ Albert$ iconv --list
ANSI_X3.4-1968 ANSI_X3.4-1986 ASCII CP367 IBM367 ISO-IR-6 ISO646-US ISO_646.IRV:1991 US US-ASCII CSASCII
UTF-8 UTF8
UTF-8-MAC UTF8-MAC
ISO-10646-UCS-2 UCS-2 CSUNICODE
UCS-2BE UNICODE-1-1 UNICODEBIG CSUNICODE11
UCS-2LE UNICODELITTLE
ISO-10646-UCS-4 UCS-4 CSUCS4
UCS-4BE
UCS-4LE
UTF-16
UTF-16BE
UTF-16LE
UTF-32
UTF-32BE
UTF-32LE
UNICODE-1-1-UTF-7 UTF-7 CSUNICODE11UTF7
UCS-2-INTERNAL
UCS-2-SWAPPED
UCS-4-INTERNAL
UCS-4-SWAPPED
C99
JAVA
CP819 IBM819 ISO-8859-1 ISO-IR-100 ISO8859-1 ISO_8859-1 ISO_8859-1:1987 L1 LATIN1 CSISOLATIN1
ISO-8859-2 ISO-IR-101 ISO8859-2 ISO_8859-2 ISO_8859-2:1987 L2 LATIN2 CSISOLATIN2
ISO-8859-3 ISO-IR-109 ISO8859-3 ISO_8859-3 ISO_8859-3:1988 L3 LATIN3 CSISOLATIN3
ISO-8859-4 ISO-IR-110 ISO8859-4 ISO_8859-4 ISO_8859-4:1988 L4 LATIN4 CSISOLATIN4
CYRILLIC ISO-8859-5 ISO-IR-144 ISO8859-5 ISO_8859-5 ISO_8859-5:1988 CSISOLATINCYRILLIC
ARABIC ASMO-708 ECMA-114 ISO-8859-6 ISO-IR-127 ISO8859-6 ISO_8859-6 ISO_8859-6:1987 CSISOLATINARABIC
ECMA-118 ELOT_928 GREEK GREEK8 ISO-8859-7 ISO-IR-126 ISO8859-7 ISO_8859-7 ISO_8859-7:1987 ISO_8859-7:2003 CSISOLATINGREEK
HEBREW ISO-8859-8 ISO-IR-138 ISO8859-8 ISO_8859-8 ISO_8859-8:1988 CSISOLATINHEBREW
ISO-8859-9 ISO-IR-148 ISO8859-9 ISO_8859-9 ISO_8859-9:1989 L5 LATIN5 CSISOLATIN5
ISO-8859-10 ISO-IR-157 ISO8859-10 ISO_8859-10 ISO_8859-10:1992 L6 LATIN6 CSISOLATIN6
ISO-8859-11 ISO8859-11 ISO_8859-11
ISO-8859-13 ISO-IR-179 ISO8859-13 ISO_8859-13 L7 LATIN7
ISO-8859-14 ISO-CELTIC ISO-IR-199 ISO8859-14 ISO_8859-14 ISO_8859-14:1998 L8 LATIN8
ISO-8859-15 ISO-IR-203 ISO8859-15 ISO_8859-15 ISO_8859-15:1998 LATIN-9
ISO-8859-16 ISO-IR-226 ISO8859-16 ISO_8859-16 ISO_8859-16:2001 L10 LATIN10
KOI8-R CSKOI8R
KOI8-U
KOI8-RU
CP1250 MS-EE WINDOWS-1250
CP1251 MS-CYRL WINDOWS-1251
CP1252 MS-ANSI WINDOWS-1252
CP1253 MS-GREEK WINDOWS-1253
CP1254 MS-TURK WINDOWS-1254
CP1255 MS-HEBR WINDOWS-1255
CP1256 MS-ARAB WINDOWS-1256
CP1257 WINBALTRIM WINDOWS-1257
CP1258 WINDOWS-1258
850 CP850 IBM850 CSPC850MULTILINGUAL
862 CP862 IBM862 CSPC862LATINHEBREW
866 CP866 IBM866 CSIBM866
MAC MACINTOSH MACROMAN CSMACINTOSH
MACCENTRALEUROPE
MACICELAND
MACCROATIAN
MACROMANIA
MACCYRILLIC
MACUKRAINE
MACGREEK
MACTURKISH
MACHEBREW
MACARABIC
MACTHAI
HP-ROMAN8 R8 ROMAN8 CSHPROMAN8
NEXTSTEP
ARMSCII-8
GEORGIAN-ACADEMY
GEORGIAN-PS
KOI8-T
CP154 CYRILLIC-ASIAN PT154 PTCP154 CSPTCP154
MULELAO-1
CP1133 IBM-CP1133
ISO-IR-166 TIS-620 TIS620 TIS620-0 TIS620.2529-1 TIS620.2533-0 TIS620.2533-1
CP874 WINDOWS-874
VISCII VISCII1.1-1 CSVISCII
TCVN TCVN-5712 TCVN5712-1 TCVN5712-1:1993
ISO-IR-14 ISO646-JP JIS_C6220-1969-RO JP CSISO14JISC6220RO
JISX0201-1976 JIS_X0201 X0201 CSHALFWIDTHKATAKANA
ISO-IR-87 JIS0208 JIS_C6226-1983 JIS_X0208 JIS_X0208-1983 JIS_X0208-1990 X0208 CSISO87JISX0208
ISO-IR-159 JIS_X0212 JIS_X0212-1990 JIS_X0212.1990-0 X0212 CSISO159JISX02121990
CN GB_1988-80 ISO-IR-57 ISO646-CN CSISO57GB1988
CHINESE GB_2312-80 ISO-IR-58 CSISO58GB231280
CN-GB-ISOIR165 ISO-IR-165
ISO-IR-149 KOREAN KSC_5601 KS_C_5601-1987 KS_C_5601-1989 CSKSC56011987
EUC-JP EUCJP EXTENDED_UNIX_CODE_PACKED_FORMAT_FOR_JAPANESE CSEUCPKDFMTJAPANESE
MS_KANJI SHIFT-JIS SHIFT_JIS SJIS CSSHIFTJIS
CP932
ISO-2022-JP CSISO2022JP
ISO-2022-JP-1
ISO-2022-JP-2 CSISO2022JP2
CN-GB EUC-CN EUCCN GB2312 CSGB2312
GBK
CP936 MS936 WINDOWS-936
GB18030
ISO-2022-CN CSISO2022CN
ISO-2022-CN-EXT
HZ HZ-GB-2312
EUC-TW EUCTW CSEUCTW
BIG-5 BIG-FIVE BIG5 BIGFIVE CN-BIG5 CSBIG5
CP950
BIG5-HKSCS:1999
BIG5-HKSCS:2001
BIG5-HKSCS BIG5-HKSCS:2004 BIG5HKSCS
EUC-KR EUCKR CSEUCKR
CP949 UHC
CP1361 JOHAB
ISO-2022-KR CSISO2022KR
CP856
CP922
CP943
CP1046
CP1124
CP1129
CP1161 IBM-1161 IBM1161 CSIBM1161
CP1162 IBM-1162 IBM1162 CSIBM1162
CP1163 IBM-1163 IBM1163 CSIBM1163
DEC-KANJI
DEC-HANYU
437 CP437 IBM437 CSPC8CODEPAGE437
CP737
CP775 IBM775 CSPC775BALTIC
852 CP852 IBM852 CSPCP852
CP853
855 CP855 IBM855 CSIBM855
857 CP857 IBM857 CSIBM857
CP858
860 CP860 IBM860 CSIBM860
861 CP-IS CP861 IBM861 CSIBM861
863 CP863 IBM863 CSIBM863
CP864 IBM864 CSIBM864
865 CP865 IBM865 CSIBM865
869 CP-GR CP869 IBM869 CSIBM869
CP1125
EUC-JISX0213
SHIFT_JISX0213
ISO-2022-JP-3
BIG5-2003
ISO-IR-230 TDS565
ATARI ATARIST
RISCOS-LATIN1
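Reading through the whole list to find one name is tedious, so here is a small sketch that checks whether a given encoding name appears in the output of iconv -l. The function name supports_encoding is hypothetical. The trailing-slash stripping is there because some other iconv implementations print names with a // suffix. That is an assumption about builds other than the one shown in this post.

```shell
#!/bin/sh
# supports_encoding NAME
# Hypothetical helper: succeeds if NAME appears in the output of "iconv -l".
supports_encoding() {
    iconv -l |
        tr ' ,' '\n\n' |        # one name per line (names may be space- or comma-separated)
        sed 's|/*$||' |         # drop a trailing "//" if this build prints one
        grep -qixF -- "$1"      # whole-line, case-insensitive, literal match
}
```

For instance, supports_encoding GB18030 && echo "GB18030 is available" prints the message only when the local iconv knows that encoding.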
Checking the version
AlbertdeAir:~ Albert$ iconv --version
iconv (GNU libiconv 1.11)
Copyright © 2000-2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Bruno Haible.