Data formats

XML

Electronic archiving means classifying and storing as much of a text's semantic content as possible. It does not mean merely producing a layout-based representation of the text, as happens when files are archived in PDF or HTML format. By contrast, saving a file in XML format exposes the inner structure of the document, i.e. part of the text's semantic content, and so adds a new dimension to the document.

This additional level of information allows machine-based searches of the digital archive, so a person no longer has to search through it manually. A query for a specific source, author or title referred to in the content, for example, can be run against an XML-encoded archive within seconds. The XML format also allows automatic post-structuring, i.e. further machine-supported classification of the content.
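As a rough illustration of such a machine-based search, the following minimal sketch uses Python's standard `xml.etree.ElementTree` module. The archive content, the element names (`archive`, `dissertation`, `reference`, `author`, `title`) and the helper `find_citing` are hypothetical and chosen only for this example; a real archive would follow its own schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical archive: all names and entries below are illustrative only.
ARCHIVE_XML = """<archive>
  <dissertation id="d1">
    <author>Anna Example</author>
    <title>Long-Term Preservation of Digital Texts</title>
    <reference>
      <author>B. Sample</author>
      <title>Markup Languages</title>
    </reference>
  </dissertation>
  <dissertation id="d2">
    <author>Carl Placeholder</author>
    <title>Fonts and Layout</title>
    <reference>
      <author>Anna Example</author>
      <title>Archiving Basics</title>
    </reference>
  </dissertation>
</archive>"""


def find_citing(root, cited_author):
    """Return the ids of dissertations that cite the given author."""
    hits = []
    for diss in root.findall("dissertation"):
        # Only look at authors inside <reference>, not the dissertation's own author.
        for ref_author in diss.findall("reference/author"):
            if ref_author.text and ref_author.text.strip() == cited_author:
                hits.append(diss.get("id"))
                break
    return hits


root = ET.fromstring(ARCHIVE_XML)

# Dissertations in which "Anna Example" is referred to as a source.
citing = find_citing(root, "Anna Example")

# Dissertations written by "Anna Example" (author as a direct child element).
written = [
    d.get("id")
    for d in root.findall("dissertation")
    if d.findtext("author") == "Anna Example"
]

print("cited in:", citing)
print("written by:", written)
```

Because the markup distinguishes the author of a dissertation from an author named in a reference, the two queries return different documents. A plain full-text search over a layout-based format could not make that distinction.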

The XML format also fulfils further important conditions for providing a reliable and secure basis for the long-term preservation of electronic files in the face of future developments. In the long term, the original layout of a dissertation is unlikely to be retained (e.g. because support for certain fonts is discontinued). The content of an XML document, however, can be displayed in any editor regardless of the platform (content fidelity rather than page fidelity).

The main advantages of XML files include

Exploiting these properties gives online documents extended accessibility. XML is an openly accessible, non-proprietary format, a quasi-standard. Structural and formatting information are strictly separated, and the format is very well suited to ensuring reliable, long-term accessibility. A document can be adapted automatically to current screen and print formats.
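The separation of structure and formatting can be sketched as follows: one XML source is rendered once as plain text and once as HTML, with the layout decided only at output time. This is a minimal sketch using Python's standard library. The element names (`document`, `title`, `para`) and the functions `to_plain_text` and `to_html` are assumptions made for this example, not part of any particular archive schema.

```python
import textwrap
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical document: it carries structure only, no layout information.
DOC_XML = """<document>
  <title>Data formats</title>
  <para>XML separates structure from formatting.</para>
  <para>Layout is chosen at output time.</para>
</document>"""


def to_plain_text(root, width=40):
    """Render for a fixed-width medium: upper-case title, wrapped paragraphs."""
    lines = [root.findtext("title", default="").upper(), ""]
    for para in root.findall("para"):
        lines.append(textwrap.fill(para.text or "", width=width))
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


def to_html(root):
    """Render for the screen: the same structure mapped to HTML elements."""
    parts = ["<h1>%s</h1>" % escape(root.findtext("title", default=""))]
    for para in root.findall("para"):
        parts.append("<p>%s</p>" % escape(para.text or ""))
    return "\n".join(parts)


root = ET.fromstring(DOC_XML)
text_out = to_plain_text(root)
html_out = to_html(root)

print(text_out)
print(html_out)
```

If screen or print conventions change, only the rendering functions need to be replaced; the archived XML stays untouched.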

XML Version 1.0 - W3C standard
XML Version 1.1 - W3C standard
W3C XML standard in German
SELFHTML - Introduction to XML (available only in German)

 
Uta Ackermann