[Help - JS] Japanese Characters' Code Ranges



Aku no Hikari
12-10-2009, 12:36 PM
Hey everybody. I want to write a JavaScript function that accepts a Japanese character and returns its type. (Hiragana, Katakana, Kanji, Alphanumeric, or Symbol.) Pretty simple. It's a matter of character code ranges. All I need to know is the code range of those characters and we're done.

The problem is that I Googled "shift-jis table" and I got a page like the following one. A whole Shift-JIS table, but none of the values are correct for JavaScript. None of them are even close.

http://www.rikai.com/library/kanjitables/kanji_codes.sjis.shtml

For example, according to this table, the Hiragana range should be 33438 (hex: 829E) to 33518 (hex: 82EE). But in JavaScript, Hiragana characters seem to have values between 12353 (hex: 3041) and 12435 (hex: 3093)... Not even close! I've written a small test page to figure out the code ranges of the Hiragana and Katakana characters. See the following code:



<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="en-us" xml:lang="en-us">
<head>
<title>Code Range FFFUUUUUUUU</title>

<script type="text/javascript">
/*<![CDATA[*/

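// Lists every character of the textarea followed by its character code.
function convert() {
  // Declare locals with var so they do not leak into the global scope.
  var input = document.getElementById("input").value;
  var output = "";

  for (var i = 0; i < input.length; i++) {
    // charCodeAt(i) and charAt(i) also work in browsers without string indexing.
    var code = input.charCodeAt(i);
    if (code === 10) {
      // Line feed: start a new line in the output.
      output += "<br />";
    } else {
      output += input.charAt(i) + "(" + code + ") ";
    }
  }

  // Note: this replaces the whole container, including the textarea and button.
  document.getElementById("container").innerHTML = output;
}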

/*]]>*/
</script>
</head>
<body>
<div id="container">Input Shift-JIS Kanji:<br />
<textarea id="input" rows="20" cols="80">
ABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz
`1234567890-=~!@#$%^&*()_+
[];',./{}:"<>?\|

あいうえおぁぃぅぇぉかきくけこがぎぐげご
さしすせそざじずぜぞたちつてとだぢづでど
なにぬねのはひふへほばびぶべぼぱぴぷぺぽ
まみむめもやゆよゃゅょらりるれろわをんっ

アイウエオァィゥェォカキクケコガギグゲゴ
サシスセソザジズゼゾタチツテトダヂヅデド
ナニヌネノハヒフヘホバビブベボパピプペポ
マミムメモヤユヨャュョラリルレロワヲンッ

アイウエオァィゥェォカキクケコガギグゲゴ
サシスセソザジズゼゾタチツテトダヂヅデド
ナニヌネノハヒフヘホバビブベボパピプペポ
マミムメモヤユヨャュョラリルレロワヲンッ

ABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz
‘1234567890-=
~!@#$%^&*()_+

</textarea>
<br /><button onclick="convert();">Convert</button>
</div>
</body>
</html>

I could figure out the code range of Hiragana and Katakana characters... but hell I'll spend the rest of my life on this if I want to know the code range of the 50,000 Kanji characters!! :banghead:

I guess I need a formal answer on this. Kthxbye.

EDIT: You can try the above code on Firefox. I have no idea if it works on Google Chrome or IE8.

AzureDark
12-11-2009, 12:14 AM
JS works in Unicode: strings are stored as UTF-16, so charCodeAt() gives you Unicode values, not Shift-JIS ones. This Unicode chart (http://macchiato.com/unicode/chart/) has saved my butt more times than I can count.

However if you're insisting on SJIS then I dunno...
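
To illustrate the Unicode point, here is a minimal sketch that prints a character's value in the U+XXXX form that Unicode charts use. The helper name toCodePointLabel is made up for this example, and it only looks at the first UTF-16 code unit of the string.

```javascript
// Hypothetical helper (not from the thread): formats the first UTF-16
// code unit of a string as a "U+XXXX" label.
function toCodePointLabel(ch) {
  var hex = ch.charCodeAt(0).toString(16).toUpperCase();
  // Pad to at least four hex digits, as Unicode charts do.
  while (hex.length < 4) {
    hex = "0" + hex;
  }
  return "U+" + hex;
}

// The two ends of the range measured in the first post:
var firstHiragana = toCodePointLabel(String.fromCharCode(12353)); // "U+3041"
var lastHiragana = toCodePointLabel(String.fromCharCode(12435));  // "U+3093"
```

So the 12353 to 12435 range from the first post is U+3041 to U+3093, which sits inside the Hiragana block of the Unicode chart.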

Aku no Hikari
12-11-2009, 04:34 PM
Yeah right... JavaScript works in Unicode... X_X

Anyways, thanks for the link. Totally saved my butt.

Actually, it's a good thing that JavaScript always works in Unicode. You don't have to worry about encoding issues.
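
For anyone who finds this thread later, here is a rough sketch of the kind of function asked for in the first post, based on Unicode block ranges rather than Shift-JIS. The name classifyChar and the exact grouping are assumptions made for this example: it treats the whole Hiragana and Katakana blocks (including marks such as the long vowel mark) as kana, counts only CJK Unified Ideographs and Extension A as Kanji, and lumps everything else into Symbol.

```javascript
// Hypothetical helper (not from the thread): classifies the first UTF-16
// code unit of a string by Unicode range.
function classifyChar(ch) {
  var code = ch.charCodeAt(0);

  // Hiragana block (includes the voicing and iteration marks).
  if (code >= 0x3040 && code <= 0x309F) return "Hiragana";

  // Katakana block (includes the middle dot and the long vowel mark),
  // plus halfwidth katakana.
  if ((code >= 0x30A0 && code <= 0x30FF) ||
      (code >= 0xFF66 && code <= 0xFF9F)) return "Katakana";

  // CJK Unified Ideographs and CJK Extension A.
  if ((code >= 0x4E00 && code <= 0x9FFF) ||
      (code >= 0x3400 && code <= 0x4DBF)) return "Kanji";

  // ASCII and fullwidth digits and Latin letters.
  if ((code >= 0x30 && code <= 0x39) || (code >= 0x41 && code <= 0x5A) ||
      (code >= 0x61 && code <= 0x7A) || (code >= 0xFF10 && code <= 0xFF19) ||
      (code >= 0xFF21 && code <= 0xFF3A) || (code >= 0xFF41 && code <= 0xFF5A)) {
    return "Alphanumeric";
  }

  // Everything else: punctuation, whitespace, and anything not listed above.
  return "Symbol";
}
```

With this sketch, classifyChar("あ") returns "Hiragana" and classifyChar("漢") returns "Kanji". Rare Kanji outside the Basic Multilingual Plane are stored as surrogate pairs, so a check on a single code unit like this one reports them as "Symbol".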