Aku no Hikari
12-10-2009, 12:36 PM
Hey everybody. I want to write a JavaScript function that accepts a Japanese character and returns its type (Hiragana, Katakana, Kanji, Alphanumeric, or Symbol). Pretty simple: it's just a matter of character code ranges. All I need to know is the code range of each of those character types and we're done.
The problem is that when I Googled "shift-jis table", I got pages like the following one: a whole Shift-JIS table, but none of the values match what JavaScript gives me. None of them are even close.
http://www.rikai.com/library/kanjitables/kanji_codes.sjis.shtml
For example, according to this table, the Hiragana range should be 33438 (hex: 829E) to 33518 (hex: 82EE). But in JavaScript, Hiragana characters seem to have values between 12353 and 12435... Not even close! I've written a function to figure out the code ranges of the Hiragana and Katakana characters. See the following code:
<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="en-us" xml:lang="en-us">
<head>
<title>Code Range FFFUUUUUUUU</title>
<script type="text/javascript">
/*<![CDATA[*/
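function convert() {
  // charCodeAt returns a UTF-16 (Unicode) code unit, not a Shift-JIS code
  var input = document.getElementById("input").value;
  var output = "";
  for (var i = 0; i < input.length; i++) {
    var ch = input.charAt(i);   // charAt works in older IE versions, where input[i] may not
    var x = input.charCodeAt(i);
    if (x == 10) {
      output += "<br />";
    } else if (x != 13) {       // skip the CR of CRLF line endings (some browsers use them)
      // escape characters that innerHTML would otherwise parse as markup
      if (ch == "&") ch = "&amp;";
      else if (ch == "<") ch = "&lt;";
      else if (ch == ">") ch = "&gt;";
      output += ch + "(" + x + ") ";
    }
  }
  document.getElementById("container").innerHTML = output;
}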
/*]]>*/
</script>
</head>
<body>
<div id="container">Input Shift-JIS Kanji:<br />
<textarea id="input" rows="20" cols="80">
ABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz
`1234567890-=~!@#$%^&*()_+
[];',./{}:"<>?\|
あいうえおぁぃぅぇぉかきくけこがぎぐげご
さしすせそざじずぜぞたちつてとだぢづでど
なにぬねのはひふへほばびぶべぼぱぴぷぺぽ
まみむめもやゆよゃゅょらりるれろわをんっ
アイウエオァィゥェォカキクケコガギグゲゴ
サシスセソザジズゼゾタチツテトダヂヅデド
ナニヌネノハヒフヘホバビブベボパピプペポ
マミムメモヤユヨャュョラリルレロワヲンッ
アイウエオァィゥェォカキクケコガギグゲゴ
サシスセソザジズゼゾタチツテトダヂヅデド
ナニヌネノハヒフヘホバビブベボパピプペポ
マミムメモヤユヨャュョラリルレロワヲンッ
ABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz
‘1234567890-=
~!@#$%^&*()_+
</textarea>
<br /><button onclick="convert();">Convert</button>
</div>
</body>
</html>
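One thing worth knowing when reading the numbers this page prints: JavaScript strings are sequences of UTF-16 code units, so `charCodeAt` returns Unicode values, not Shift-JIS codes. That matches the values above: 12353 is 0x3041 (ぁ) and 12435 is 0x3093 (ん), which are Unicode code points, so a Unicode chart rather than a Shift-JIS table is what to compare against. A minimal sketch that shows a character's value as a `U+XXXX` label, assuming the character is in the Basic Multilingual Plane (anything beyond U+FFFF comes back as two surrogate code units); the name `codeInfo` is hypothetical:

```javascript
// Sketch: report a character's UTF-16 code unit in decimal and as a
// Unicode-style hex label, for comparison with Unicode code charts.
// codeInfo is a hypothetical helper name, not a built-in.
function codeInfo(ch) {
  var code = ch.charCodeAt(0); // UTF-16 code unit, not a Shift-JIS value
  var hex = code.toString(16).toUpperCase();
  while (hex.length < 4) {
    hex = "0" + hex; // pad to four hex digits, e.g. "41" -> "0041"
  }
  return { decimal: code, hex: "U+" + hex };
}

// Usage
console.log(codeInfo("ぁ")); // decimal 12353, hex U+3041
console.log(codeInfo("ん")); // decimal 12435, hex U+3093
```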
I could figure out the code ranges of the Hiragana and Katakana characters this way... but hell, I'd spend the rest of my life on this if I wanted to find the code range of the 50,000 Kanji characters!! :banghead:
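Since Unicode groups characters into contiguous blocks, the Kanji shouldn't need to be enumerated one by one: the common ones sit in the CJK Unified Ideographs block (U+4E00 to U+9FFF), with rarer ones in Extension A (U+3400 to U+4DBF). A rough sketch of the classifier described at the top, assuming those block boundaries plus the Hiragana (U+3040 to U+309F), Katakana (U+30A0 to U+30FF), half-width Katakana (U+FF65 to U+FF9F), and ASCII and full-width alphanumeric ranges. `charType` and `inRange` are hypothetical names, and the range choices are a judgment call: marks like 々 (U+3005) and supplementary-plane Kanji (beyond U+FFFF, which `charCodeAt` sees as surrogate halves) fall through to "Symbol" here.

```javascript
// Sketch: classify one character by Unicode block ranges.
// charType/inRange are hypothetical helpers; adjust ranges to taste.
function inRange(code, lo, hi) {
  return code >= lo && code <= hi;
}

function charType(ch) {
  var c = ch.charCodeAt(0); // UTF-16 code unit (BMP characters only)

  if (inRange(c, 0x3040, 0x309F)) return "Hiragana";      // Hiragana block
  if (inRange(c, 0x30A0, 0x30FF) ||                        // Katakana block
      inRange(c, 0xFF65, 0xFF9F)) return "Katakana";       // half-width Katakana
  if (inRange(c, 0x4E00, 0x9FFF) ||                        // CJK Unified Ideographs
      inRange(c, 0x3400, 0x4DBF)) return "Kanji";          // CJK Extension A
  if (inRange(c, 0x30, 0x39) || inRange(c, 0x41, 0x5A) ||  // ASCII 0-9, A-Z
      inRange(c, 0x61, 0x7A) ||                            // ASCII a-z
      inRange(c, 0xFF10, 0xFF19) ||                        // full-width 0-9
      inRange(c, 0xFF21, 0xFF3A) ||                        // full-width A-Z
      inRange(c, 0xFF41, 0xFF5A)) return "Alphanumeric";   // full-width a-z
  return "Symbol"; // everything else: punctuation, spaces, other scripts
}

// Usage
console.log(charType("あ")); // Hiragana
console.log(charType("ア")); // Katakana
console.log(charType("漢")); // Kanji
console.log(charType("A"));  // Alphanumeric
console.log(charType("!"));  // Symbol
```

A handful of range checks covers roughly 21,000 Kanji in the main block, which is why classification by block tends to be preferred over listing characters.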
I guess I need a formal answer on this. Kthxbye.
EDIT: You can try the above code in Firefox. I have no idea whether it works in Google Chrome or IE8.