What does the 'standalone' directive mean in XML? – Dev

The best answers to the question “What does the 'standalone' directive mean in XML?” in the category Dev.

QUESTION:

What does the ‘standalone‘ directive mean in an XML document?

ANSWER:

  • The standalone directive is an optional attribute on the XML declaration.
  • Valid values are yes and no, where no is the default value.
  • The attribute is only relevant when a DTD is used. (The attribute is irrelevant when using a schema instead of a DTD.)
  • standalone="yes" means that the XML processor must use the DTD for validation only. In that case it will not be used for:
    • default values for attributes
    • entity declarations
    • normalization
  • Note that standalone="yes" may add validity constraints if the document uses an external DTD. When the document contains things that would require modification of the XML, such as default values for attributes, and standalone="yes" is used then the document is invalid.
  • standalone="yes" may help to optimize performance of document processing.

Source: The standalone pseudo-attribute is only relevant if a DTD is used

ANSWER:

The standalone declaration is a way of telling the parser to ignore any markup declarations in the DTD. The DTD is thereafter used for validation only.

As an example, consider the humble <img> tag. If you look at the XHTML 1.0 DTD, you see a markup declaration telling the parser that <img> tags must be EMPTY and possess src and alt attributes. When a browser is going through an XHTML 1.0 document and finds an <img> tag, it should notice that the DTD requires src and alt attributes and add them if they are not present. It will also self-close the <img> tag since it is supposed to be EMPTY. This is what the XML specification means by “markup declarations can affect the content of the document.” You can then use the standalone declaration to tell the parser to ignore these rules.

Whether or not your parser actually does this is another question, but a standards-compliant validating parser (like a browser) should.

Note that if you do not specify a DTD, then the standalone declaration “has no meaning,” so there’s no reason to use it unless you also specify a DTD.

ANSWER:

The intent of the standalone=yes declaration is to guarantee that the information inside the document can be faithfully retrieved based only on the internal DTD, i.e. the document can “stand alone” with no external references. Validating a standalone document ensures that non-validating processors will have all of the information available to correctly parse the document.

The standalone declaration serves no purpose if a document has no external DTD, and the internal DTD has no parameter entity references, as these documents are already implicitly standalone.

The following are the actual effects of using standalone=yes.

  • Forces processors to throw an error when parsing documents with an external DTD or parameter entity references, if the document contains references to entities not declared in the internal DTD (with the exception of replacement text of parameter entities as non-validating processors are not required to parse this); amp, lt, gt, apos, and quot are the only exceptions

  • When parsing a document not declared as standalone, a non-validating processor is free to stop parsing the internal DTD as soon as it encounters a parameter entity reference. Declaring a document as standalone forces non-validating processors to parse markup declarations in the internal DTD even after they ignore one or more parameter entity references.

  • Forces validating processors to throw an error if any of the following are found in the document, and their respective declarations are in the external DTD or in parameter entity replacement text:

    • attributes with default values, if they do not have their value explicitly provided
    • entity references (other than amp, lt, gt, apos, and quot)
    • attributes with tokenized types, if the value of the attribute would be modified by normalization
    • elements with element content, if any white space occurs in their content

A non-validating processor might consider retrieving the external DTD and expanding all parameter entity references for documents that are not standalone, even though it is under no obligation to do so, i.e. setting standalone=yes could theoretically improve performance for non-validating processors (spoiler alert: it probably won’t make a difference).


The other answers here are either incomplete or incorrect, the main misconception is that

The standalone declaration is a way of telling the parser to ignore any markup declarations in the DTD. The DTD is thereafter used for validation only.

standalone=”yes” means that the XML processor must use the DTD for validation only.

Quite the opposite, declaring a document as standalone will actually force a non-validating processor to parse internal declarations it must normally ignore (i.e. those after an ignored parameter entity reference). Non-validating processors must still use the info in the internal DTD to provide default attribute values and normalize tokenized attributes, as this is independent of validation.

ANSWER:

standalone describes if the current XML document depends on an external markup declaration.

W3C describes its purpose in “Extensible Markup Language (XML) 1.0 (Fifth Edition)”:

  • 2.9 Standalone Document Declaration