java - Punycode for Unicode query parameter


I'm trying to encode Unicode URLs with Punycode. These URLs have a query parameter that contains non-ASCII characters, for example:

https://en.wiktionary.org/w/index.php?title=clœlia&printable=yes 

The problem is that when I try this in Java, the resulting URL is wrong:

    String link = "https://en.wiktionary.org/w/index.php?title=clœlia&printable=yes";
    link = IDN.toASCII(link);
    // -> link = http://en.wiktionary.org/w/index.xn--php?title=cllia&printable=yes-hgf

If I do it this way, the resulting string is different (I don't know why), but still wrong:

    String link = "http://en.wiktionary.org/w/index.php?title=" + IDN.toASCII("clœlia") + "&printable=yes";
    // -> link = http://en.wiktionary.org/w/index.php?title=xn--cllia-ibb&printable=yes

If I copy the address from Chrome and paste it here, I get the URL I want:

https://en.wiktionary.org/w/index.php?title=cl%c5%93lia&printable=yes 

What did I do wrong here?

What you did wrong was use Punycode. Punycode is used for domain names only, including the domain-name part of a URL.
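A minimal sketch of where Punycode does belong, assuming a hypothetical internationalized host name `bücher.example` (not from the question) and a hypothetical helper `asciiHost`: only the host goes through `java.net.IDN.toASCII`, never the path or query. The question's own host, `en.wiktionary.org`, is already ASCII and comes back unchanged.

```java
import java.net.IDN;

final class HostOnlyPunycode {
    // Hypothetical helper: Punycode (via IDN.toASCII) is applied to the host name only,
    // never to the path or the query string.
    static String asciiHost(String host) {
        return IDN.toASCII(host);
    }

    public static void main(String[] args) {
        // Hypothetical internationalized host "bücher.example", written with a Unicode
        // escape so the source file's encoding does not matter.
        String idnHost = "b\u00fccher.example";
        System.out.println("https://" + asciiHost(idnHost) + "/w/index.php");
        // prints https://xn--bcher-kva.example/w/index.php

        // The host from the question is plain ASCII, so IDN.toASCII leaves it as-is.
        System.out.println(asciiHost("en.wiktionary.org"));
        // prints en.wiktionary.org
    }
}
```

This also explains the odd results in the question: `IDN.toASCII` treats its whole argument as a dot-separated domain name, so any "label" containing a non-ASCII character gets an `xn--` prefix, whether or not it is really part of a host.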

Other parts of the URL, including the query-parameter part, use percent encoding, also known as URL encoding or URI encoding, which is what Chrome is doing: it encodes non-ASCII Unicode characters in UTF-8, and then encodes every octet that isn't in a limited subset of ASCII as a percent sign (%) followed by 2 hex digits. In particular, octets 80-FF, which UTF-8 uses for non-ASCII characters, are always %-encoded. To be exact, the query-parameter part, and sometimes other parts, use a slight variant defined for HTML form submission, application/x-www-form-urlencoded: it encodes a space as a plus sign '+' instead of %20, which is unambiguous because a literal '+' is itself encoded as %2B.
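To make the mechanism concrete, here is a hand-rolled sketch of percent encoding. `percentEncode` is a hypothetical helper, not a library API, and it assumes the RFC 3986 unreserved set (letters, digits, '-', '.', '_', '~') as the characters left untouched; `java.net.URLEncoder` uses a slightly different safe set (it leaves '*' alone and encodes '~'), so prefer the library in real code.

```java
import java.nio.charset.StandardCharsets;

final class PercentEncodingSketch {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Hypothetical helper: encodes s as UTF-8, passes RFC 3986 unreserved characters
    // through, and writes every other octet as '%' plus two uppercase hex digits.
    // With formStyle = true, a space becomes '+' (application/x-www-form-urlencoded).
    static String percentEncode(String s, boolean formStyle) {
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF; // treat the byte as an unsigned octet 0-255
            boolean unreserved = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '_' || c == '~';
            if (unreserved) {
                out.append((char) c);
            } else if (formStyle && c == ' ') {
                out.append('+');
            } else {
                out.append('%').append(HEX[c >> 4]).append(HEX[c & 0x0F]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "clœlia" written with an escape: U+0153 is LATIN SMALL LIGATURE OE.
        System.out.println(percentEncode("cl\u0153lia", false)); // cl%C5%93lia
        System.out.println(percentEncode("a b+c", false));       // a%20b%2Bc
        System.out.println(percentEncode("a b+c", true));        // a+b%2Bc
    }
}
```

The `œ` in `clœlia` is U+0153, which UTF-8 stores as the two octets C5 93; that is where Chrome's `cl%c5%93lia` comes from. Hex digits in a percent escape are case-insensitive, so `%C5%93` and `%c5%93` mean the same thing.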

In Java, use java.net.URLEncoder.encode and java.net.URLDecoder.decode for this (they implement the application/x-www-form-urlencoded variant); for reliable results, use the newer 2-argument forms with the encoding name "UTF-8".
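A sketch of the fix for the question's URL, using the 2-argument `URLEncoder.encode` and `URLDecoder.decode` with "UTF-8"; the class name and the helpers `encodeUtf8`, `decodeUtf8`, and `buildUrl` are hypothetical. Only the `title` value is encoded: the host is already ASCII, and the literal parts of the query need no encoding. On Java 10 and later, the overloads that take a `java.nio.charset.Charset` (for example `StandardCharsets.UTF_8`) avoid the checked `UnsupportedEncodingException`.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

final class WiktionaryUrl {
    // Hypothetical helper: form-encodes one query-parameter value as UTF-8.
    static String encodeUtf8(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical helper: reverses encodeUtf8.
    static String decodeUtf8(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds the URL from the question, encoding only the title parameter's value.
    static String buildUrl(String title) {
        return "https://en.wiktionary.org/w/index.php?title=" + encodeUtf8(title) + "&printable=yes";
    }

    public static void main(String[] args) {
        String title = "cl\u0153lia"; // "clœlia", written as an escape to avoid source-encoding issues
        System.out.println(buildUrl(title));
        // prints https://en.wiktionary.org/w/index.php?title=cl%C5%93lia&printable=yes

        // Decoding with the same charset gives back the original value.
        System.out.println(decodeUtf8(encodeUtf8(title)).equals(title)); // true
    }
}
```

Because `URLEncoder` follows the form-encoding rules, a title containing a space would come out with '+' rather than %20, and a literal '+' as %2B.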

