Do the following propositions hold: for every DTD there is an XSD that defines exactly the same language, and for every XSD there is a DTD that defines exactly the same language? Or, put another way: is the collection of languages definable by DTDs exactly the collection of languages definable by XSDs?


Expanding on the question a little: An XML document is basically a large string. A language is a collection of strings. For example, the (infinite) set of all MathML documents is a language, and so is the set of all RSS documents and so on. MathML (RSS, ...) is also a proper subset of the (infinite) set of all XML documents. You can use DTD or XSD to define such a subset of XML.

Now, every DTD defines exactly one language. But if you think of all possible DTDs, you get a set of languages. My question is, is this set exactly the same as the one you get from all possible XSDs? If so, then DTD and XSD are equivalent in the sense that the scope of XML languages defined by either is equal.


Why is this question important? If both DTD and XSD are equivalent then it is possible to write a program that takes a DTD as input and gives you an equivalent XSD, and another program that does the opposite. I know there are quite a few programs out there that claim to do exactly this, but I'm in doubt whether or not that's actually possible.


An interesting question; well asked!


The answer is "no", in both directions.


Here is a DTD which has no equivalent in XSD:


The set of character sequences accepted by this DTD includes both and &egbdf;, but not &beadgcf;.

Since XSD validation operates on an information set in which all entities have already been expanded, no XSD schema can distinguish the third case from the second.
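
A minimal sketch of the effect, using Python's standard-library parser as a stand-in for the parser that sits in front of an XSD validator. Only the entity names egbdf and beadgcf come from the text above; the DOCTYPE, the element name a, and the replacement text are illustrative assumptions. The parser used here does not validate, it only expands entities, which is enough to show that the reference has disappeared before any schema could look at the document.

```python
import xml.etree.ElementTree as ET

# Hypothetical internal DTD subset: declares element `a` and entity `egbdf`.
DOCTYPE = (
    "<!DOCTYPE a [\n"
    "  <!ELEMENT a (#PCDATA)>\n"
    '  <!ENTITY egbdf "Every good boy deserves favor.">\n'
    "]>\n"
)

with_reference = DOCTYPE + "<a>&egbdf;</a>"
with_literal = DOCTYPE + "<a>Every good boy deserves favor.</a>"
with_undeclared = DOCTYPE + "<a>&beadgcf;</a>"


def parsed_text(document: str) -> str:
    """Return the character content of the root element after parsing."""
    return ET.fromstring(document).text


def is_parseable(document: str) -> bool:
    """True if the parser accepts the document, False on a parse error."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# After parsing, the entity reference and the literal text are the same data.
same_after_expansion = parsed_text(with_reference) == parsed_text(with_literal)

# A reference to an entity the DTD does not declare is rejected at parse time.
undeclared_accepted = is_parseable(with_undeclared)
```

Here same_after_expansion is True and undeclared_accepted is False: whatever consumes the parsed tree cannot tell whether the text was typed out or produced by &egbdf;.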


A second area where DTDs can express constraints not expressible in XSD involves NOTATION types. I won't give an example; the details are too complicated for me to remember them correctly without looking them up, and not interesting enough to make me want to do so.


A third area: DTDs treat namespace attributes (aka namespace declarations) and general attributes in the same way; a DTD can therefore constrain the appearance of namespace declarations in documents. An XSD schema cannot. The same applies to attributes in the xsi namespace.

If we ignore all of those issues, and formulate the question with respect only to character sequences containing no references to named entities other than the pre-defined entities lt, gt, etc., then the answer changes: for every DTD not involving NOTATION declarations, there is an XSD schema that accepts precisely the same set of documents after entity expansion and with 'same' defined in a way that ignores namespace attributes and attributes in the xsi namespace.

In the other direction, the areas of difference include these:


  • XSD is namespace aware: the following XSD schema accepts any instance of element e in the specified target namespace, regardless of what prefix is bound to that namespace in the document instance.



    No DTD can successfully accept all and only the e elements in the given namespace.


  • XSD has a richer set of datatypes and can use datatypes to constrain elements as well as attributes. The following XSD schema has no equivalent DTD:



    This schema accepts the document 42 but not the document 42d Street. No DTD can make that distinction, because DTDs have no mechanism for constraining #PCDATA content. The closest DTD would be , which accepts both sample documents.

  • XSD's xsi:type attribute allows in-document modifications of content models. The XSD schema described by the following schema document has no equivalent DTD:



    This schema accepts the document and rejects the document . DTDs have no mechanism for making content models depend on an attribute value given in the document instance.

  • XSD wildcards allow the inclusion of arbitrary well-formed XML among the children of specified elements; the closest one can come to that with a DTD is to use an element declaration of the form <!ELEMENT e ANY>, which is not the same because it requires declarations for all the elements which in fact appear.

  • XSD 1.1 provides assertions and conditional type assignment, which have no analogues in DTDs.
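
The namespace and datatype points in the list above can be sketched with the Python standard library. This is an illustration under stated assumptions, not a validator: the namespace URI http://example.com/ns and the helper names are hypothetical, and the regular expression is only a rough approximation of the lexical space of xs:integer (optional sign, then decimal digits, with surrounding whitespace trimmed).

```python
import re
import xml.etree.ElementTree as ET

NS = "http://example.com/ns"  # hypothetical target namespace

# The same element in the same namespace, written with three prefixes.
variants = [
    f'<e xmlns="{NS}"/>',
    f'<p:e xmlns:p="{NS}"/>',
    f'<q:e xmlns:q="{NS}"/>',
]

# A namespace-aware processor sees a single expanded name for all three.
expanded_names = {ET.fromstring(doc).tag for doc in variants}

# A DTD matches the literal names in the markup, which all differ.
literal_names = {re.match(r"<([^\s/>]+)", doc).group(1) for doc in variants}

# Rough approximation of xs:integer: optional sign followed by digits.
XS_INTEGER = re.compile(r"[+-]?[0-9]+")


def looks_like_xs_integer(text: str) -> bool:
    """Check text against the approximate xs:integer pattern."""
    return XS_INTEGER.fullmatch(text.strip(" \t\r\n")) is not None


def element_text(document: str) -> str:
    """Return the character content of the root element ('' if empty)."""
    return ET.fromstring(document).text or ""


integer_ok = looks_like_xs_integer(element_text("<e>42</e>"))
street_ok = looks_like_xs_integer(element_text("<e>42d Street</e>"))
```

The three variants collapse to one expanded name but keep three distinct literal names, which is why a DTD would need one declaration per prefix and could still never cover every possible prefix. The content check accepts 42 and rejects 42d Street, a distinction that a #PCDATA content model has no way to draw.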

There are probably other ways in which the expressive power of XSD exceeds that of DTDs, but I think the point has been illustrated adequately.


I think a fair summary would be: XSD can express everything DTDs can express, with the exception of entity declarations and special cases like namespace declarations and xsi:* attributes, because XSD was designed to be able to do so. So the loss of information when translating a DTD to an XSD schema document is relatively modest, well understood, and mostly involves things most vocabulary designers regard as DTD artefacts not of substantive interest.


XSD can express more than DTDs can, again because XSD was designed to do so. In the general case, translation from XSD to DTD necessarily involves loss of information (the set of documents accepted may need to be larger, or smaller, or to be an overlapping set). Different choices can be made about how to manage the loss of information, which gives the question "How does one best translate an XSD into DTD form?" a certain theoretical interest. (Very few people, however, seem to find it an interesting question in practice.)

All of this focuses, as did your question, on documents as character sequences, on languages as sets of documents, and on schema languages as generators of languages in that sense. Issues of maintainability, and information present in the schema that does not turn into differences in the extension of document sets (e.g. the treatment of class hierarchies in the document model), are left out of account.




Without qualifiers, the answer is no.


You have to define what it is you call a "language". In my mind, the ones you refer to are languages meant to define document schemata. A schema defines constraints on document structure and content. The constraints expressible in XSD are far more powerful than those expressible in a DTD. So no, they wouldn't be the same.

A comparison of DTD vs. XSD might help you understand why not.


