More about urlencode and urldecode
What you want is possible, but it is considerably more work than it is practical to put in. Just say decode and let PHP do the calculations.
Anyway, thanks for an interesting question. Researching it taught me about both how UTF-8 works and about URL encoding in general.
First, a link to an explanation of URL encoding:
http://www.blooberry.com/indexdot/html/topics/urlencoding.htm
(disclosure: it’s written by someone I know)
Secondly, here is how to find the character from a URL encoding – manually!
Your character above – “我” (according to babelfish.altavista.com it means “I” in Chinese, if you can’t see it in your browser try to copy this and paste in your address bar: javascript:’<html>我
The first tool we use is the Windows calculator: open it and change to Scientific mode in the View menu. Then choose “Hex” format and type the hex value from above (simply strip out the % signs): e68891.
Now click the “Bin” option to get the binary value of this hexadecimal number. Copy it and paste it in Notepad.
111001101000100010010001
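For anyone without the Windows calculator at hand, the same hex-to-binary step can be sketched in PHP. This is only a rough illustration of the manual steps above, using the value %E6%88%91 from the example; the variable names are my own.

```php
<?php
// Strip the % signs from the URL-encoded value, as done by hand above.
$encoded = '%E6%88%91';
$hex = strtolower(str_replace('%', '', $encoded));

// Convert each pair of hex digits (one byte) to 8 binary digits.
$binary = '';
foreach (str_split($hex, 2) as $byteHex) {
    $binary .= str_pad(decbin(hexdec($byteHex)), 8, '0', STR_PAD_LEFT);
}

echo $hex, "\n";    // e68891
echo $binary, "\n"; // 111001101000100010010001
```

Padding each byte to 8 digits matters: decbin() drops leading zeros, which would throw off the byte boundaries in the next step.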
This is the binary, UTF-8 encoded string. We want to un-UTF-8 it to find the Unicode value. Here is the technical documentation for UTF-8:
ftp://ftp.isi.edu/in-notes/rfc2279.txt
First, working from the end of the string, break it into lines of 8 digits each:
11100110
10001000
10010001
From the first line, remove all the initial 1 digits. From each of the following lines, remove the initial “10”. It will now look like this:
00110
001000
010001
Remove the line breaks and put it all on one line again:
00110001000010001
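The bit-stripping above can also be written as a few lines of PHP. Treat this as a sketch under assumptions: utf8BitsToCodePoint() is a hypothetical helper of my own (not a built-in), it expects the binary digits of exactly one well-formed UTF-8 character, it does no validation, and the type hints need PHP 7 or later.

```php
<?php
// Hypothetical helper: recover the Unicode code point from the binary
// digits of ONE UTF-8 encoded character (a multiple of 8 digits).
function utf8BitsToCodePoint(string $bits): int
{
    $bytes = str_split($bits, 8);

    // First byte: remove all the initial 1 digits.
    $payload = ltrim(array_shift($bytes), '1');

    // Each following byte: remove the initial "10".
    foreach ($bytes as $byte) {
        $payload .= substr($byte, 2);
    }

    // What is left is the code point, written in binary.
    return bindec($payload);
}

$codePoint = utf8BitsToCodePoint('111001101000100010010001');
echo $codePoint, "\n";         // 25105
echo dechex($codePoint), "\n"; // 6211
```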
Copy that whole string and go back to the calculator. It should still be in “Bin” format, so just paste this new string.
If you now click “Dec” (for decimal, or “normal”, format), you get 25105. This is the exact number given in your first post, because your browser translated a character not supported in the POST encoding into an HTML entity.
Next, click “Hex”. The calculator will say “6211”. Now open the Windows “Character Map” utility. Activate “Advanced view” if it doesn’t show the “Go to Unicode” box. Then, in the “Go to Unicode” box, type 6211. Voila, it shows the character you are looking for.
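If you would rather not dig through the character map, the lookup from code point to character can be sketched in PHP as well. This assumes a UTF-8 terminal or page for the output, and it borrows the same numeric HTML entity trick the browser used; the variable names are mine.

```php
<?php
// The Unicode value found with the calculator, as hex and as decimal.
$codePoint = hexdec('6211'); // 25105

// html_entity_decode() turns the numeric entity &#25105; back into
// the character itself, encoded as UTF-8.
$char = html_entity_decode('&#' . $codePoint . ';', ENT_NOQUOTES, 'UTF-8');

echo $char, "\n";               // 我
echo rawurlencode($char), "\n"; // %E6%88%91
```

The last line closes the loop: encoding the character again gives back the URL-encoded value we started from.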
I’m sure you agree it is simpler to just type <?php echo urldecode('%E6%88%91'); ?>
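To convince yourself that the one-liner really produces the same three bytes we took apart by hand, a quick check might look like this. It is a small sketch, with $char being my own variable name.

```php
<?php
// Let PHP do the decoding.
$char = urldecode('%E6%88%91');

echo $char, "\n";          // 我 (on a UTF-8 terminal or page)
echo bin2hex($char), "\n"; // e68891
echo strlen($char), "\n";  // 3
```

Note that strlen() counts bytes, not characters, which is why a single Chinese character reports a length of 3.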
Posted on November 14th, 2008 by Denie
Filed under: Linux, Scripting