XML to be validated against multiple xsd schemas

I'm writing the xsd and the code to validate, so I have great control here.

I would like to have an upload facility that adds stuff to my application based on an xml file. One part of the xml file should be validated against different schemas based on one of the values in the other part of it. Here's an example to illustrate:

<foo>
  <name>Harold</name>
  <bar>Alpha</bar>
  <baz>Mercury</baz>
  <!-- ... more general info that applies to all foos ... -->

  <bar-config>
    <!-- the content here is specific to the bar named "Alpha" -->
  </bar-config>
  <baz-config>
    <!-- the content here is specific to the baz named "Mercury" -->
  </baz-config>
</foo>

In this case, there is some controlled vocabulary for the content of <bar>, and I can handle that part just fine. Then, based on the bar value, the appropriate xml schema should be used to validate the content of bar-config. Similarly for baz and baz-config.

The code doing the parsing/validation is written in Java. Not sure how language-dependent the solution will be.

Ideally, the solution would permit the xml author to declare the appropriate schema locations and what-not so that s/he could get the xml validated on the fly in a sufficiently smart editor.

Also, the possible values for <bar> and <baz> are orthogonal, so I don't want to do this by extension for every possible bar/baz combo. What I mean is, if there are 24 possible bar values/schemas and 8 possible baz values/schemas, I want to be able to write 1 + 24 + 8 = 33 total schemas, instead of 1 * 24 * 8 = 192 total schemas.

Also, I'd prefer to NOT break out the bar-config and baz-config into separate xml files if possible. I realize that might make all the problems much easier, as each xml file would have a single schema, but I'm trying to see if there is a good single-xml-file solution.

I finally figured this out.

First of all, in the foo schema, the bar-config and baz-config elements have a type which includes an any element, like this:

<sequence>
    <any minOccurs="0" maxOccurs="1"
        processContents="lax" namespace="##any" />
</sequence>
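Step 1 of the procedure further down validates against foo.xsd alone, and the lax wildcard is what makes that possible. Here is a small self-contained sketch of that behaviour. It assumes a trimmed-down foo.xsd (no target namespace, only bar and bar-config) embedded as a string. The class, the validatesWithFooOnly helper and the <whatever/> element are hypothetical. A validator that has no declaration for the Alpha namespace skips the config subtree, but it still checks the rest of foo.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class LaxWildcardDemo {

    // Assumed, trimmed-down foo.xsd: no target namespace; bar-config may hold
    // one element from any namespace, processed laxly.
    static final String FOO_XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:complexType name='configHolder'><xs:sequence>"
        + "<xs:any minOccurs='0' maxOccurs='1' processContents='lax' namespace='##any'/>"
        + "</xs:sequence></xs:complexType>"
        + "<xs:element name='foo'><xs:complexType><xs:sequence>"
        + "<xs:element name='bar' type='xs:string'/>"
        + "<xs:element name='bar-config' type='configHolder'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    /** Validates xml against foo.xsd alone; returns false on any validation error. */
    public static boolean validatesWithFooOnly(String xml) {
        try {
            Schema fooOnly = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(FOO_XSD)));
            fooOnly.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // <config> is in a namespace foo.xsd knows nothing about, so "lax" skips it.
        System.out.println(validatesWithFooOnly("<foo><bar>Alpha</bar><bar-config>"
            + "<config xmlns='http://www.example.org/bar/Alpha'><whatever/></config>"
            + "</bar-config></foo>"));                                   // true
        // Everything outside the wildcard is still checked: <bar> is missing here.
        System.out.println(validatesWithFooOnly("<foo><bar-config/></foo>"));  // false
    }
}
```

If processContents were "strict" instead, the first document would be rejected, because no declaration for the Alpha config element is available.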

In the xml, then, you must specify the proper namespace using the xmlns attribute on the child element of bar-config or baz-config, like this:

<bar-config>
    <config xmlns="http://www.example.org/bar/Alpha">
        ... config xml here ...
    </config>
</bar-config>

Then, your XML schema file for bar Alpha will have a target namespace of http://www.example.org/bar/Alpha and will define the root element config.

If your XML file has namespace declarations and schema locations for both of the schema files, this is sufficient for the editor to do all of the validating (at least good enough for Eclipse).
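The location hints that let an editor validate can also be used from Java. Below is a sketch that makes several assumptions. It uses hypothetical foo.xsd and Alpha.xsd contents, in which Alpha's config holds a single int-typed speed element, and writes them to a temporary directory. The instance then points at those files through xsi:noNamespaceSchemaLocation and xsi:schemaLocation. Calling SchemaFactory.newSchema() with no arguments returns a Schema that, for W3C XML Schema, takes its grammars from those hints. The class and method names are illustrative only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class SchemaLocationHintsDemo {

    // Assumed foo.xsd: no target namespace, lax wildcard inside bar-config.
    static final String FOO_XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:complexType name='configHolder'><xs:sequence>"
        + "<xs:any minOccurs='0' maxOccurs='1' processContents='lax' namespace='##any'/>"
        + "</xs:sequence></xs:complexType>"
        + "<xs:element name='foo'><xs:complexType><xs:sequence>"
        + "<xs:element name='bar' type='xs:string'/>"
        + "<xs:element name='bar-config' type='configHolder'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Assumed Alpha.xsd: target namespace from the answer, root element config.
    static final String ALPHA_XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='http://www.example.org/bar/Alpha' elementFormDefault='qualified'>"
        + "<xs:element name='config'><xs:complexType><xs:sequence>"
        + "<xs:element name='speed' type='xs:int'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    /**
     * Writes both schemas to a temp directory (left in place), wraps barConfigBody
     * in a foo document whose xsi:* attributes point at those files, and validates
     * it with a Schema that loads its grammars from those location hints.
     */
    public static boolean validateWithHints(String barConfigBody) {
        try {
            Path dir = Files.createTempDirectory("xsd-hints");
            Path foo = Files.write(dir.resolve("foo.xsd"), FOO_XSD.getBytes(StandardCharsets.UTF_8));
            Path alpha = Files.write(dir.resolve("Alpha.xsd"), ALPHA_XSD.getBytes(StandardCharsets.UTF_8));
            String xml = "<foo xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'"
                + " xsi:noNamespaceSchemaLocation='" + foo.toUri() + "'"
                + " xsi:schemaLocation='http://www.example.org/bar/Alpha " + alpha.toUri() + "'>"
                + "<bar>Alpha</bar><bar-config>" + barConfigBody + "</bar-config></foo>";
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema()  // no sources: grammars come from the instance's hints
                .newValidator()
                .validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(validateWithHints(
            "<config xmlns='http://www.example.org/bar/Alpha'><speed>3</speed></config>"));
        System.out.println(validateWithHints(
            "<config xmlns='http://www.example.org/bar/Alpha'><speed>fast</speed></config>"));
    }
}
```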

So far, we have satisfied the requirement that the xml author may write the xml in such a way that it is validated in the editor.

Now, we need the consumer to be able to validate. In my case, I'm using Java.

If you happen to know ahead of time which schema files you will need, you can simply create a single Schema object from all of them and validate as usual, like this:

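// 'stream' opens an InputStream on the named file
Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
    .newSchema(new Source[] {
        new StreamSource(stream("foo.xsd")),
        new StreamSource(stream("Alpha.xsd")),
        new StreamSource(stream("Mercury.xsd")),
    });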
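Here is a runnable sketch of this known-ahead case. The three schemas are embedded as strings instead of being read from files, and their contents are assumptions. Each component schema declares a root config element with one int-typed child. The Mercury namespace http://www.example.org/baz/Mercury is made up by analogy with the Alpha one. Because the Alpha and Mercury config declarations are part of the same Schema, the lax wildcard finds them and validates the config content against them.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class CombinedSchemaDemo {

    // Assumed foo.xsd: the general part plus two lax "config holder" elements.
    static final String FOO_XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:complexType name='configHolder'><xs:sequence>"
        + "<xs:any minOccurs='0' maxOccurs='1' processContents='lax' namespace='##any'/>"
        + "</xs:sequence></xs:complexType>"
        + "<xs:element name='foo'><xs:complexType><xs:sequence>"
        + "<xs:element name='name' type='xs:string'/>"
        + "<xs:element name='bar' type='xs:string'/>"
        + "<xs:element name='baz' type='xs:string'/>"
        + "<xs:element name='bar-config' type='configHolder'/>"
        + "<xs:element name='baz-config' type='configHolder'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Assumed Alpha.xsd and Mercury.xsd: a root <config> in each one's own namespace.
    static final String ALPHA_XSD = configSchema("http://www.example.org/bar/Alpha", "speed");
    static final String MERCURY_XSD = configSchema("http://www.example.org/baz/Mercury", "period");

    static String configSchema(String targetNamespace, String intChild) {
        return "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
            + " targetNamespace='" + targetNamespace + "' elementFormDefault='qualified'>"
            + "<xs:element name='config'><xs:complexType><xs:sequence>"
            + "<xs:element name='" + intChild + "' type='xs:int'/>"
            + "</xs:sequence></xs:complexType></xs:element></xs:schema>";
    }

    // One Schema holding all three grammars (the system IDs are only labels here).
    static final Schema SCHEMA = compile();

    static Schema compile() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new Source[] {
                    new StreamSource(new StringReader(FOO_XSD), "foo.xsd"),
                    new StreamSource(new StringReader(ALPHA_XSD), "Alpha.xsd"),
                    new StreamSource(new StringReader(MERCURY_XSD), "Mercury.xsd"),
                });
        } catch (SAXException e) {
            throw new IllegalStateException("schemas failed to compile", e);
        }
    }

    public static boolean isValid(String xml) {
        try {
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a foo document with the given (hypothetical) Alpha speed and Mercury period.
    static String sample(String speed, String period) {
        return "<foo><name>Harold</name><bar>Alpha</bar><baz>Mercury</baz>"
            + "<bar-config><config xmlns='http://www.example.org/bar/Alpha'>"
            + "<speed>" + speed + "</speed></config></bar-config>"
            + "<baz-config><config xmlns='http://www.example.org/baz/Mercury'>"
            + "<period>" + period + "</period></config></baz-config></foo>";
    }

    public static void main(String[] args) {
        System.out.println(isValid(sample("3", "88")));     // true
        System.out.println(isValid(sample("fast", "88")));  // false: caught by Alpha.xsd
        System.out.println(isValid(sample("3", "long")));   // false: caught by Mercury.xsd
    }
}
```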

In this case, however, we don't know which xsd files to use until we have parsed the main document. So, the general procedure is to:

  1. Validate the xml using only the main (foo) schema
  2. Determine the schema to use to validate the portion of the document
  3. Find the node that is the root of the portion to validate using a separate schema
  4. Import that node into a brand new document
  5. Validate the brand new document using the other schema file

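Caveat: the document must be built with a namespace-aware DocumentBuilderFactory (setNamespaceAware(true), which is not the default). Otherwise the DOM elements carry no namespace URI, so they cannot match the target namespace of the sub-schema.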
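The following small demonstration shows why the caveat matters. It uses a hypothetical configNamespace helper that parses the same input with and without setNamespaceAware(true) and reports the namespace URI the DOM assigns to the config element. Without namespace awareness, the node has no namespace at all.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class NamespaceAwareDemo {

    static final String XML = "<foo><bar>Alpha</bar><bar-config>"
        + "<config xmlns='http://www.example.org/bar/Alpha'/>"
        + "</bar-config></foo>";

    /** Parses XML and returns the namespace URI of the <config> element. */
    public static String configNamespace(boolean namespaceAware) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(namespaceAware);  // JAXP default is false
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
            Element config = (Element) doc.getElementsByTagName("config").item(0);
            return config.getNamespaceURI();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(configNamespace(true));   // http://www.example.org/bar/Alpha
        System.out.println(configNamespace(false));  // null
    }
}
```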

Here's some code (this was ripped from various places of my code, so there might be some errors introduced by the copy-and-paste):

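import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Contains the filename of the xml file (assign it before use)
String filename;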

// Load the xml data using a namespace-aware builder (the method 
// 'stream' simply opens an input stream on a file)
Document document;
DocumentBuilderFactory docBuilderFactory =
    DocumentBuilderFactory.newInstance();
docBuilderFactory.setNamespaceAware(true);
document = docBuilderFactory.newDocumentBuilder().parse(stream(filename));

// Create the schema factory
SchemaFactory sFactory = SchemaFactory.newInstance(
    XMLConstants.W3C_XML_SCHEMA_NS_URI);

// Load the main schema
Schema schema = sFactory.newSchema(
    new StreamSource(stream("foo.xsd")));

// Validate using main schema
schema.newValidator().validate(new DOMSource(document));

// Get the node that is the root for the portion you want to validate
// using another schema (getSpecialNode is your own lookup logic)
Node node = getSpecialNode(document);

// Build a Document from that node
Document subDocument = docBuilderFactory.newDocumentBuilder().newDocument();
subDocument.appendChild(subDocument.importNode(node, true));

// Determine which schema to use (e.g. based on the <bar> value), using your own logic
Schema subSchema = parseAndDetermineSchema(document);

// Validate using other schema
subSchema.newValidator().validate(new DOMSource(subDocument));
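Below is a self-contained sketch of the five steps with the helpers filled in. getSpecialNode and parseAndDetermineSchema are hypothetical stand-ins for the author's own logic. Here getSpecialNode takes the first element child of bar-config, and parseAndDetermineSchema maps the bar value to that bar's schema. The schema texts are also assumptions. Step 1 passes even when the Alpha content is wrong, because the lax wildcard skips it. Only the second validation catches the error.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Collections;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class TwoPhaseValidation {

    // Assumed main schema: bar-config accepts any one element, laxly.
    static final String FOO_XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:complexType name='configHolder'><xs:sequence>"
        + "<xs:any minOccurs='0' maxOccurs='1' processContents='lax' namespace='##any'/>"
        + "</xs:sequence></xs:complexType>"
        + "<xs:element name='foo'><xs:complexType><xs:sequence>"
        + "<xs:element name='bar' type='xs:string'/>"
        + "<xs:element name='bar-config' type='configHolder'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Assumed Alpha.xsd: its config element holds one int-valued speed.
    static final String ALPHA_XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='http://www.example.org/bar/Alpha' elementFormDefault='qualified'>"
        + "<xs:element name='config'><xs:complexType><xs:sequence>"
        + "<xs:element name='speed' type='xs:int'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Hypothetical registry: <bar> value -> schema text for that bar.
    static final Map<String, String> BAR_SCHEMAS = Collections.singletonMap("Alpha", ALPHA_XSD);

    static Schema compile(String xsd) throws SAXException {
        return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new StringReader(xsd)));
    }

    // Hypothetical getSpecialNode: the first element child of <bar-config>.
    static Node getSpecialNode(Document document) {
        Node barConfig = document.getElementsByTagName("bar-config").item(0);
        for (Node n = barConfig.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return n;
            }
        }
        throw new IllegalArgumentException("<bar-config> has no child element");
    }

    // Hypothetical parseAndDetermineSchema: pick the schema from the <bar> value.
    static Schema parseAndDetermineSchema(Document document) throws SAXException {
        String bar = document.getElementsByTagName("bar").item(0).getTextContent().trim();
        String xsd = BAR_SCHEMAS.get(bar);
        if (xsd == null) {
            throw new IllegalArgumentException("No schema registered for bar " + bar);
        }
        return compile(xsd);
    }

    /** Returns true only if both validations pass. */
    public static boolean isValid(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);  // required, see the caveat above
            Document document = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

            // Step 1: main schema only; the lax wildcard skips the Alpha content.
            compile(FOO_XSD).newValidator().validate(new DOMSource(document));

            // Steps 2-5: copy the special node into a new document, validate it.
            Document sub = dbf.newDocumentBuilder().newDocument();
            sub.appendChild(sub.importNode(getSpecialNode(document), true));
            parseAndDetermineSchema(document).newValidator().validate(new DOMSource(sub));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String ok = "<foo><bar>Alpha</bar><bar-config>"
            + "<config xmlns='http://www.example.org/bar/Alpha'><speed>3</speed></config>"
            + "</bar-config></foo>";
        System.out.println(isValid(ok));                           // true
        System.out.println(isValid(ok.replace(">3<", ">fast<")));  // false
    }
}
```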


Take a look at NVDL (Namespace-based Validation Dispatching Language) - http://www.nvdl.org/

It is designed to do what you want to do (validate parts of an XML document that have their own namespaces and schemas).

There is a tutorial here - http://www.dpawson.co.uk/nvdl/ - and a Java implementation here - http://jnvdl.sourceforge.net/

Hope that helps! Kevin


You need to define a target namespace for each separately validated portion of the instance document. Then you define a master schema that uses <xsd:import> to reference the schema documents for these components (<xsd:include> only works for schema documents that share the master's target namespace or have none).

The limitation with this approach is that you can't let the individual components define the schemas that should be used to validate them. But it's a bad idea in general to let a document tell you how to validate it (i.e., validation should be something that your application controls).
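Here is a sketch of the master-schema approach. It assumes the same kind of hypothetical foo.xsd and Alpha.xsd as above, written to a temporary directory next to a master.xsd. foo.xsd has no target namespace, so the namespace-less master includes it. Alpha.xsd has its own target namespace, so the master has to import it. The class name and schema contents are illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class MasterSchemaDemo {

    // Assumed foo.xsd: no target namespace, lax wildcard inside bar-config.
    static final String FOO_XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:complexType name='configHolder'><xs:sequence>"
        + "<xs:any minOccurs='0' maxOccurs='1' processContents='lax' namespace='##any'/>"
        + "</xs:sequence></xs:complexType>"
        + "<xs:element name='foo'><xs:complexType><xs:sequence>"
        + "<xs:element name='bar' type='xs:string'/>"
        + "<xs:element name='bar-config' type='configHolder'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Assumed Alpha.xsd: its own target namespace, root element config.
    static final String ALPHA_XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='http://www.example.org/bar/Alpha' elementFormDefault='qualified'>"
        + "<xs:element name='config'><xs:complexType><xs:sequence>"
        + "<xs:element name='speed' type='xs:int'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Master schema (no target namespace): include for same-namespace foo.xsd,
    // import for the component that lives in a different namespace.
    static final String MASTER_XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:include schemaLocation='foo.xsd'/>"
        + "<xs:import namespace='http://www.example.org/bar/Alpha' schemaLocation='Alpha.xsd'/>"
        + "</xs:schema>";

    static void write(Path dir, String name, String text) throws IOException {
        Files.write(dir.resolve(name), text.getBytes(StandardCharsets.UTF_8));
    }

    public static boolean isValid(String xml) {
        try {
            Path dir = Files.createTempDirectory("xsd-master");
            write(dir, "foo.xsd", FOO_XSD);
            write(dir, "Alpha.xsd", ALPHA_XSD);
            write(dir, "master.xsd", MASTER_XSD);
            // Relative schemaLocation values resolve against master.xsd's location.
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(dir.resolve("master.xsd").toFile());
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String ok = "<foo><bar>Alpha</bar><bar-config>"
            + "<config xmlns='http://www.example.org/bar/Alpha'><speed>3</speed></config>"
            + "</bar-config></foo>";
        System.out.println(isValid(ok));                           // true
        System.out.println(isValid(ok.replace(">3<", ">fast<")));  // false
    }
}
```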


You can also use a "resource resolver" to let "xml authors" specify their own schema file, at least to some extent, e.g. https://stackoverflow.com/a/41225329/32453. At the end of the day, you want a fully compliant XML file that can be validated with normal tools anyway :)
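Here is a minimal sketch of a resource resolver. It serves the schema documents from an in-memory map instead of the filesystem. The map contents, file names and class name are hypothetical. SchemaFactory.setResourceResolver installs an LSResourceResolver. The LSInput it returns is created with DOMImplementationLS.createLSInput(), using the JDK's DOM implementation. Returning null falls back to normal resolution.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.xml.sax.SAXException;

public class InMemoryResolverDemo {

    // Hypothetical in-memory "directory" of schema documents, keyed by file name.
    static final Map<String, String> SCHEMAS = new HashMap<>();
    static {
        SCHEMAS.put("foo.xsd", "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
            + "<xs:complexType name='configHolder'><xs:sequence>"
            + "<xs:any minOccurs='0' maxOccurs='1' processContents='lax' namespace='##any'/>"
            + "</xs:sequence></xs:complexType>"
            + "<xs:element name='foo'><xs:complexType><xs:sequence>"
            + "<xs:element name='bar' type='xs:string'/>"
            + "<xs:element name='bar-config' type='configHolder'/>"
            + "</xs:sequence></xs:complexType></xs:element></xs:schema>");
        SCHEMAS.put("Alpha.xsd", "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
            + " targetNamespace='http://www.example.org/bar/Alpha' elementFormDefault='qualified'>"
            + "<xs:element name='config'><xs:complexType><xs:sequence>"
            + "<xs:element name='speed' type='xs:int'/>"
            + "</xs:sequence></xs:complexType></xs:element></xs:schema>");
    }

    static final String MASTER_XSD = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:include schemaLocation='foo.xsd'/>"
        + "<xs:import namespace='http://www.example.org/bar/Alpha' schemaLocation='Alpha.xsd'/>"
        + "</xs:schema>";

    static SchemaFactory factoryWithResolver() throws ParserConfigurationException {
        // The JDK's DOMImplementation also implements DOMImplementationLS.
        DOMImplementationLS ls = (DOMImplementationLS) DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().getDOMImplementation();
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        factory.setResourceResolver((type, namespaceURI, publicId, systemId, baseURI) -> {
            // Match on the last path segment, whether systemId is literal or expanded.
            String name = systemId == null ? null : systemId.substring(systemId.lastIndexOf('/') + 1);
            String text = name == null ? null : SCHEMAS.get(name);
            if (text == null) {
                return null;  // not ours: fall back to the default resolution
            }
            LSInput input = ls.createLSInput();
            input.setStringData(text);
            input.setSystemId(systemId);
            input.setBaseURI(baseURI);
            return input;
        });
        return factory;
    }

    public static boolean isValid(String xml) {
        try {
            factoryWithResolver()
                .newSchema(new StreamSource(new StringReader(MASTER_XSD), "master.xsd"))
                .newValidator()
                .validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String ok = "<foo><bar>Alpha</bar><bar-config>"
            + "<config xmlns='http://www.example.org/bar/Alpha'><speed>3</speed></config>"
            + "</bar-config></foo>";
        System.out.println(isValid(ok));
        System.out.println(isValid(ok.replace(">3<", ">fast<")));
    }
}
```

In this sketch the application still owns the map, so it decides which schema documents are allowed to resolve.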




...and then just validate the XML with configuration.xsd. I'll look into why it doesn't work with NVDL. Right now I think it's because <xs:any namespace="##other"/> makes the first schema accept an element from any namespace other than the targetNamespace, but it's expecting it to be declared in that same schema.

Comments
  • It does not work for me. SOLUTION -> stackoverflow.com/questions/61483586/…
  • Forgot to mention this is good too if your schemas are of different types (XSD, RNG, and DTD)