Java - Punycode for Unicode query parameter
I am trying to encode Unicode URLs with Punycode. These URLs have a query parameter that contains non-ASCII characters, for example:
https://en.wiktionary.org/w/index.php?title=clœlia&printable=yes
The problem is, when I try it in Java, the resulting URL is wrong:
String link = "https://en.wiktionary.org/w/index.php?title=clœlia&printable=yes";
link = IDN.toASCII(link);
// -> link = http://en.wiktionary.org/w/index.xn--php?title=cllia&printable=yes-hgf
If I do it this way, the resulting string is different (I don't know why), but still wrong:
String link = "http://en.wiktionary.org/w/index.php?title=" + IDN.toASCII("clœlia") + "&printable=yes";
// -> link = http://en.wiktionary.org/w/index.php?title=xn--cllia-ibb&printable=yes
If I copy the address from Chrome and paste it here, I get this URL, which is what I want:
https://en.wiktionary.org/w/index.php?title=cl%c5%93lia&printable=yes
What did I do wrong here?
What you did wrong is use Punycode. Punycode is used only for domain names, including the domain-name part of a URL.
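To illustrate the point about domain names, here is a minimal sketch that applies IDN.toASCII to a host name alone rather than to a whole URL. The class name HostOnlyPunycode, the helper toAsciiHost, and the host bücher.example are made up for illustration and do not come from the question.

```java
import java.net.IDN;

public class HostOnlyPunycode {
    // Hypothetical helper: converts only a host name to its ASCII (Punycode) form.
    public static String toAsciiHost(String host) {
        return IDN.toASCII(host);
    }

    public static void main(String[] args) {
        // Illustrative host with a non-ASCII label ("b\u00fccher" is "bücher").
        System.out.println(toAsciiHost("b\u00fccher.example"));
        // A host that is already ASCII, such as the one in the question, is left as it is.
        System.out.println(toAsciiHost("en.wiktionary.org"));
    }
}
```

The path and query string are never passed to IDN.toASCII here; they need a different encoding.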
Other parts of a URL, including the query-parameter part, use percent encoding, also known as URL encoding or URI encoding, which is what Chrome is doing. It encodes non-ASCII Unicode characters in UTF-8, and then writes every octet that isn't in a limited subset of ASCII as a percent sign (%) followed by 2 hex digits; the octets 80-FF that UTF-8 uses for non-ASCII characters are therefore always %-encoded. To be exact, the query-parameter part (but not the other parts) uses a slight variant defined for HTML form submission, application/x-www-form-urlencoded; this encodes a space as a plus sign '+' instead of %20, which is unambiguous because '+' is in the unsafe set and is itself encoded as %2B.
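The mechanism described above can be sketched by hand. This is only an illustration, not how any particular library implements it; the class name PercentEncodeSketch and the exact set of characters treated as safe (letters, digits, '-', '_', '.', '*') are assumptions chosen for the example.

```java
import java.nio.charset.StandardCharsets;

public class PercentEncodeSketch {
    // Hand-rolled form-style percent encoding, for illustration only.
    public static String encode(String s) {
        StringBuilder out = new StringBuilder();
        // Step 1: turn the text into UTF-8 octets.
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int octet = b & 0xFF;
            if (isSafe(octet)) {
                out.append((char) octet);   // safe ASCII is copied through
            } else if (octet == ' ') {
                out.append('+');            // form variant: space becomes '+'
            } else {
                // Step 2: everything else becomes '%' plus two hex digits.
                out.append('%').append(String.format("%02X", octet));
            }
        }
        return out.toString();
    }

    // Assumed safe set for this sketch.
    private static boolean isSafe(int c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9')
            || c == '-' || c == '_' || c == '.' || c == '*';
    }

    public static void main(String[] args) {
        System.out.println(encode("cl\u0153lia")); // "\u0153" is the letter œ
        System.out.println(encode("a b+c"));
    }
}
```

The letter œ (U+0153) is the two octets C5 93 in UTF-8, which is where the %C5%93 in the Chrome URL comes from.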
In Java, use java.net.URLEncoder.encode and java.net.URLDecoder.decode for this; for reliable results, use the newer 2-arg forms with the encoding name "UTF-8".
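As a rough sketch of that advice applied to the URL from the question: only the parameter value is passed to URLEncoder.encode, and the rest of the URL is left alone. The class name QueryParamEncoding and the helpers encodeValue and decodeValue are hypothetical wrappers added here so the checked exception is handled in one place.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class QueryParamEncoding {
    // Hypothetical helper: percent-encodes a single query-parameter value as UTF-8.
    public static String encodeValue(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical helper: reverses encodeValue.
    public static String decodeValue(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String title = "cl\u0153lia"; // "\u0153" is the letter œ
        String link = "https://en.wiktionary.org/w/index.php?title="
                + encodeValue(title) + "&printable=yes";
        System.out.println(link);
        // prints https://en.wiktionary.org/w/index.php?title=cl%C5%93lia&printable=yes
    }
}
```

Encode each parameter value separately; passing the whole URL to URLEncoder.encode would also escape the ':', '/', '?', '=' and '&' that give the URL its structure.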