#1 |
|
Graduate Poster
Join Date: Jun 2004
Posts: 1,912
|
What represents an "å" in an XML file?
I'm using an application that lets me make notes about stuff. Those notes are stored in an XML file. Suppose I type an "å" that gets saved in the file. How do I figure out what data represents that "å" in the file?
If I open the file in a text editor, the å appears as Ã¥ (Wordpad) or A¥ ("vi"). If I open it in a browser, it looks like an å. If I display the contents of the file using "cat -A" in a cygwin bash shell, the å is displayed as M-CM-%. The first line in the file is <?xml version="1.0" encoding="UTF-8"?>. (I also need to figure out what represents Å, ä, Ä, ö and Ö). |
|
|
|
|
#2 |
|
Penultimate Amazing
Join Date: Aug 2001
Posts: 42,804
|
Å is the last letter in the Danish alphabet.
|
|
__________________
SkepticReport.com |
|
|
|
|
|
#3 |
|
Graduate Poster
Join Date: Jun 2004
Posts: 1,912
|
It's also the third from last in the Swedish alphabet. What's your point?
|
|
|
|
|
#4 |
|
BOFH
Join Date: Jun 2003
Location: Sheffield
Posts: 8,243
|
Wordpad and vi don't know about UTF-8, just the standard ASCII character sets, AFAIK. Only open it with tools that do recognise UTF-8 (e.g. XML-aware tools) and you'll be okay.
|
|
__________________
Aphorism: Subjects most likely to be declared inappropriate for humor are the ones most in need of it. -epepke |
|
|
|
|
|
#5 |
|
Graduate Poster
Join Date: Jun 2004
Posts: 1,912
|
Thanks, but I don't want to display it "correctly". I just want to know what this application puts into the file to represent those characters.
|
|
|
|
|
#6 |
|
Critical Thinker
Join Date: Jul 2007
Location: Stuck in Old Europe and the 80s, where the music is better than today
Posts: 310
|
You need to find a better text editor ;)
The encoding of the file is UTF-8, which means (as a rough approximation) that the "low" characters 0-127 (plain ASCII) are each stored as a single byte with their usual value, while the "special" characters above 127 are stored as two or more bytes, starting with a lead byte that you see as "Ã" (or "A" in vi). That's why it works in the browser (which knows how to decode UTF-8 properly) and not in Wordpad (which, going by your output, is showing each byte as a separate character).
Since I'm on a Mac here, and using TextWrangler for bare-bones text file editing (TW is freeware and has a slew of encodings it understands), I'm not quite sure what would be the right software under Windows. I'd start looking into HTML editors, since they usually know how to handle and switch between various encodings. Good luck... This has more info about the rules of Unicode and UTF-8 than any sane person should ever want to know: http://www.cl.cam.ac.uk/~mgk25/unicode.html |
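For anyone who wants to check the "one byte for ASCII, several bytes above 127" rule themselves, here is a minimal Python sketch. The helper name `utf8_hex` and the three sample characters are just illustrative choices, not anything the note-taking application uses.

```python
def utf8_hex(ch):
    """Return the UTF-8 bytes of one character as two-digit hex strings."""
    return [f"{b:02X}" for b in ch.encode("utf-8")]

# An ASCII letter, the "å" from the question, and the euro sign as a
# three-byte example.
for ch in ["a", "å", "€"]:
    print(f"U+{ord(ch):04X}", utf8_hex(ch))
# U+0061 ['61']
# U+00E5 ['C3', 'A5']
# U+20AC ['E2', '82', 'AC']
```

The first byte of each multi-byte sequence is the lead byte; the bytes that follow all fall in the range 80 to BF.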
|
__________________
"I may not know what's right / but I know this can't be it. I'm never satisfied / when the answers could be real." Title: Unsatisfaction - by: Men Without Hats |
|
|
|
|
|
#7 |
|
Critical Thinker
Join Date: Oct 2005
Location: London
Posts: 421
|
To see how the application stores the character, first look it up here. This gives the Unicode value (code point) for the character.
The application you used stores data in UTF-8 format. Using the information here, you can convert the Unicode value into UTF-8. With regards to 'å', it has a Unicode value of E5 (hex), usually written U+00E5. Converting it into UTF-8 results in the two bytes C3 A5. This is what is stored in the file. If those two bytes are treated as 2 separate characters by applications that do not understand UTF-8, they will appear as 'Ã¥'. I hope this helps. |
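The E5 to C3 A5 conversion described above can be sketched by hand in Python. This is only an illustration that assumes the two-byte range (U+0080 to U+07FF), which covers all six letters in the question. The function name `encode_two_byte` is made up for this example, and cp1252 is assumed to be what Wordpad falls back to.

```python
def encode_two_byte(code_point):
    """Encode a code point in U+0080..U+07FF as two UTF-8 bytes."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("this sketch only handles the two-byte range")
    lead = 0xC0 | (code_point >> 6)      # 110xxxxx carries the top 5 bits
    trail = 0x80 | (code_point & 0x3F)   # 10xxxxxx carries the low 6 bits
    return bytes([lead, trail])

encoded = encode_two_byte(0xE5)
print(encoded.hex().upper())             # C3A5
print(encoded == "å".encode("utf-8"))    # True

# Decoding the same two bytes as Windows-1252 gives the two characters
# a non-UTF-8 editor shows.
print(encoded.decode("cp1252") == "Ã¥")  # True

# cat -A shows a byte >= 0x80 as "M-" plus the character for that byte
# with the high bit cleared, so "M-C" and "M-%" are these two bytes.
print(bytes(0x80 | ord(c) for c in "C%") == encoded)  # True
```

So Wordpad, vi and `cat -A` are all showing the same two bytes, just rendered three different ways.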
|
|
|
|
#8 |
|
Thinker
Join Date: Aug 2003
Location: Reading, UK
Posts: 223
|
|
|
|
|
|
#9 |
|
Mafia Penguin
Join Date: Dec 2007
Location: Netherlands
Posts: 10,323
|
If you use vim (instead of vi), it can recognize the UTF-8 encoding and display these characters correctly. Instead of entering the å character, you could also enter a numeric character reference for it (&#229; in decimal, or &#xE5; in hex). That will get stored as such in the XML file, but an application that just has to display the text will display it as å.
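A quick way to convince yourself that a numeric character reference and the literal UTF-8 character mean the same thing to an XML parser is a sketch like the one below. It uses Python's standard `xml.etree.ElementTree`; the `<note>` element is a made-up stand-in, since the application's real file layout isn't known.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-element documents, not the application's real schema.
decimal_ref = ET.fromstring("<note>&#229;</note>")
hex_ref = ET.fromstring("<note>&#xE5;</note>")
literal = ET.fromstring("<note>å</note>".encode("utf-8"))

# All three parse to the same single character, U+00E5.
print(decimal_ref.text == hex_ref.text == literal.text == "å")  # True
print(f"U+{ord(decimal_ref.text):04X}")                         # U+00E5
```

The difference is only in the bytes on disk: the reference is plain ASCII text, while the literal character is the two bytes C3 A5.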
|
|
__________________
Proud member of the Solipsistic Autosycophant's Group |
|
|
|
|
|
#10 |
|
Graduate Poster
Join Date: Jan 2008
Posts: 1,447
|
Hi
Get Microsoft's XML Notepad. It's a free download. I'm pretty sure that you will find that it's represented by "å". |
|
__________________
But it does me no injury for my neighbor to say there are twenty gods or no God. It neither picks my pocket nor breaks my leg. -----Thomas Jefferson, Notes on Virginia, 1782 Question with boldness even the existence of a god; because if there be one he must approve of the homage of reason more than that of blindfolded fear. -----Thomas Jefferson, Letter to Peter Carr, August 10, 1787 |
|
|
|
|
|
#11 |
|
Thinker
Join Date: Aug 2003
Location: Reading, UK
Posts: 223
|
Just to clarify: what I've written here is the sequences of hex numbers you'll see if you open the UTF-8 XML file with a hex editor.
According to the meta tags of this HTML page, the encoding used is ISO-8859-1, so you'd see a different set of hex numbers by putting this page through a hex editor. |
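To see the UTF-8 versus ISO-8859-1 difference for all six letters from the original question, something like this Python sketch will do. The helper name `hex_in` is illustrative only.

```python
def hex_in(ch, encoding):
    """Return the bytes of ch in the given encoding as an uppercase hex string."""
    return ch.encode(encoding).hex().upper()

for ch in "åÅäÄöÖ":
    print(f"U+{ord(ch):04X}", hex_in(ch, "utf-8"), hex_in(ch, "iso-8859-1"))
# U+00E5 C3A5 E5
# U+00C5 C385 C5
# U+00E4 C3A4 E4
# U+00C4 C384 C4
# U+00F6 C3B6 F6
# U+00D6 C396 D6
```

In ISO-8859-1 each of these letters is a single byte equal to its code point, which is why a hex dump of this page would differ from a hex dump of the UTF-8 XML file.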
|
|
|
|
#12 |
|
Graduate Poster
Join Date: Jun 2004
Posts: 1,912
|
Thanks for all the information, guys. My problem was actually solved by the next software update of that application before I had time to try the most interesting suggestions, but at least I learned something.
|
|
|