Data formats
XML
Electronic archiving means classifying and storing as much of a text's semantic content as possible. It does not mean simply producing a layout-based representation of the text, as happens, for example, when files are archived in PDF or HTML format. By contrast, saving a file in XML format exposes the inner structure of the document, i.e. part of the semantic content of the text, and thereby adds a new dimension to the document.
This additional level of information allows machine-based searches of the digital archive, so a person no longer has to look for an item by hand. A search for a specific source, author or title referred to in the content, for example, can be carried out in an XML-encoded archive within seconds. The XML format also allows automatic post-structuring, i.e. further machine-supported classification of the content.
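As a minimal sketch of such a machine-based search, the following Python example uses the standard library module xml.etree.ElementTree to find every dissertation that cites a given author. The archive content, the element names (archive, dissertation, reference, author, title) and the helper function find_citing are assumptions made purely for illustration; a real archive would follow its own document type.

```python
import xml.etree.ElementTree as ET

# Hypothetical archive: element names and contents are illustrative only.
ARCHIVE_XML = """\
<archive>
  <dissertation id="d1">
    <author>Ada Example</author>
    <title>Long-term preservation of electronic texts</title>
    <reference>
      <author>B. Sample</author>
      <title>Markup languages</title>
    </reference>
  </dissertation>
  <dissertation id="d2">
    <author>Carl Placeholder</author>
    <title>Structured documents</title>
    <reference>
      <author>Ada Example</author>
      <title>Long-term preservation of electronic texts</title>
    </reference>
  </dissertation>
</archive>
"""


def find_citing(root, cited_author):
    """Return the ids of dissertations whose references name cited_author."""
    hits = []
    for diss in root.iter("dissertation"):
        # Only look at authors inside <reference>, not the dissertation's own author.
        for ref_author in diss.findall("reference/author"):
            if (ref_author.text or "").strip() == cited_author:
                hits.append(diss.get("id"))
                break
    return hits


root = ET.fromstring(ARCHIVE_XML)
print(find_citing(root, "Ada Example"))
```

Because the markup distinguishes the author of a dissertation from the author of a cited source, the query can target exactly the right field. A plain full-text search over a layout-based format could not make that distinction.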
The XML format also fulfils further important conditions for a reliable and secure basis for the long-term preservation of electronic files in the face of future developments. In the long term, it is unlikely that the original layout of a dissertation will be retained (e.g. because support for certain fonts is discontinued). Because XML is plain text, any editor can display the content of an XML document regardless of platform (content fidelity rather than page fidelity).
The main advantages of XML files include:
- XML does not depend on a specific operating system or hardware
- it can be converted to other file formats without loss of data
- it is accessible in any presentation form, both for display on a screen and for printing on paper
- XML is issued and recommended as a standard by the W3C (World Wide Web Consortium) and is a restricted form of SGML, which is itself a standard of the ISO (International Organization for Standardization), ISO 8879.
Exploiting these properties gives online documents extended accessibility. XML is an open, non-proprietary format and a de facto standard. Its structural and formatting information is strictly separated, which makes it very well suited for ensuring reliable, long-term accessibility. The document can be adapted automatically to current screen and printing formats.
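The separation of structure and formatting can be illustrated with a short Python sketch that renders one XML source in two presentation forms, plain text and an HTML fragment. The sample document, its element names (dissertation, title, author, chapter, heading, para) and the functions to_text and to_html are hypothetical and serve only as an illustration. In practice such conversions are often done with XSLT stylesheets rather than hand-written code.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical document; the element names are illustrative assumptions.
DOC_XML = """\
<dissertation>
  <title>Structured documents &amp; archives</title>
  <author>Ada Example</author>
  <chapter>
    <heading>Introduction</heading>
    <para>Structure is stored separately from layout.</para>
  </chapter>
</dissertation>
"""


def to_text(doc):
    """Render the document as plain text, e.g. for printing."""
    lines = [doc.findtext("title").upper(), "by " + doc.findtext("author"), ""]
    for chapter in doc.findall("chapter"):
        lines.append(chapter.findtext("heading"))
        for para in chapter.findall("para"):
            lines.append("  " + para.text)
    return "\n".join(lines)


def to_html(doc):
    """Render the same document as an HTML fragment, e.g. for a screen."""
    parts = [
        "<h1>" + escape(doc.findtext("title")) + "</h1>",
        "<p>" + escape(doc.findtext("author")) + "</p>",
    ]
    for chapter in doc.findall("chapter"):
        parts.append("<h2>" + escape(chapter.findtext("heading")) + "</h2>")
        for para in chapter.findall("para"):
            parts.append("<p>" + escape(para.text) + "</p>")
    return "\n".join(parts)


doc = ET.fromstring(DOC_XML)
print(to_text(doc))
print(to_html(doc))
```

The XML source contains no layout instructions at all. Each output format decides for itself how a title or a heading should look, so a new presentation form can be added later without touching the archived file.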
XML Version 1.0 - W3C standard
XML Version 1.1 - W3C standard
XML standard of W3C in German
SELFHTML - Introduction to XML (available only in German)