ReBuildAll Blog
Thoughts (mostly) on .NET development

Merging WSDL and XSD files (Tips & Tricks)
Updated! This tool has been updated since this article was written. Read more about the update here.

While working with BizTalk (version 2006 R2) some time ago, I ran into an interesting problem: in certain situations BizTalk would not accept web service description (WSDL) files whose XML schemas were stored in separate files (XSD). No matter what I tried, I had no success. I even tried the good old trick of putting the files on a web server and adding them over HTTP, but that did not work either. (This is how you can trick the Visual Studio proxy generator when WSDL and XSD files are spread across different directories on the hard drive, but that is another story :) )

In the end, I created a tool I called WSDLMerge. It takes a WSDL file, local or remote, and merges it with all the XSD files it references. The merging is recursive, so XSD files referenced by other XSD files are included as well, whether they are referenced by local paths or by remote URLs. The result is a single WSDL file that contains everything.

If you want to jump right into the code, the tool's source code is available on Google Code. You will need Visual Studio 2010 to compile it, but the tool itself runs on .NET 3.5 SP1 (and possibly on earlier versions).

The rest of this post explains how the tool works.

Just XML

WSDL files are just XML files, after all, so we can simply load one from disk or from a URL:

// filename may be a local path or a URL
XmlDocument wsdl = new XmlDocument ();
wsdl.Load ( filename );

And voilà, we have the entire WSDL loaded. Next, we need to create an XML namespace manager, because we are going to work with namespaces: XPath queries that use prefixes such as wsdl: and xsd: in particular require a namespace manager to resolve them. If you have downloaded the source code (see the link above), you can find the following code in a method named PrepareNamespaceManager().

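// WSDLNamespace and XSDNamespace are the tool's constants for the WSDL and XML Schema namespace URIs
XmlNamespaceManager manager = new XmlNamespaceManager ( wsdl.NameTable );
manager.AddNamespace ( "wsdl", WSDLNamespace );
manager.AddNamespace ( "xsd", XSDNamespace );
return manager;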
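Here is a minimal, self-contained sketch of why the manager is needed. The post does not show the values of WSDLNamespace and XSDNamespace, so this assumes the standard WSDL 1.1 and XML Schema namespace URIs; the inline WSDL is made up for the demo and deliberately uses no prefixes at all.

```csharp
using System;
using System.Xml;
using System.Xml.XPath;

// Assumption: standard WSDL 1.1 and XML Schema namespace URIs
const string WSDLNamespace = "http://schemas.xmlsoap.org/wsdl/";
const string XSDNamespace = "http://www.w3.org/2001/XMLSchema";

XmlNamespaceManager PrepareNamespaceManager ( XmlDocument document )
{
    XmlNamespaceManager result = new XmlNamespaceManager ( document.NameTable );
    result.AddNamespace ( "wsdl", WSDLNamespace );
    result.AddNamespace ( "xsd", XSDNamespace );
    return result;
}

// This WSDL uses default namespaces and no prefixes at all
XmlDocument wsdl = new XmlDocument ();
wsdl.LoadXml (
    "<definitions xmlns='http://schemas.xmlsoap.org/wsdl/'>" +
    "<types><schema xmlns='http://www.w3.org/2001/XMLSchema'/></types>" +
    "</definitions>" );

// The manager maps the query's prefixes to namespace URIs,
// independently of which prefixes (if any) the document uses
XmlNamespaceManager manager = PrepareNamespaceManager ( wsdl );
XmlNode schema = wsdl.SelectSingleNode ( "/wsdl:definitions/wsdl:types/xsd:schema", manager );
Console.WriteLine ( schema != null ); // True

// Without a manager, the prefixes cannot be resolved and the query fails
bool threw = false;
try
{
    wsdl.SelectSingleNode ( "/wsdl:definitions/wsdl:types" );
}
catch ( XPathException )
{
    threw = true;
}
Console.WriteLine ( threw ); // True
```

The prefixes in a query only need to agree with the manager, not with the document, so the same XPath expressions work no matter how a particular WSDL file names its prefixes.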

The tool then verifies that the loaded file is an actual WSDL file by checking that the root element is wsdl:definitions. There are probably better ways to do this, but it is good enough for our purposes.
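A check along those lines might look like the sketch below. IsWsdlDocument is a hypothetical name used only here, and the namespace URI is again assumed to be the standard WSDL 1.1 one. Comparing LocalName and NamespaceURI instead of the literal wsdl:definitions name keeps the check working when a file uses a different prefix.

```csharp
using System;
using System.Xml;

// Assumption: the standard WSDL 1.1 namespace URI
const string WSDLNamespace = "http://schemas.xmlsoap.org/wsdl/";

// Hypothetical helper: true if the root element is {WSDL namespace}definitions,
// whatever prefix the file happens to use for it
bool IsWsdlDocument ( XmlDocument document )
{
    XmlElement root = document.DocumentElement;
    return root != null
        && root.LocalName == "definitions"
        && root.NamespaceURI == WSDLNamespace;
}

XmlDocument wsdl = new XmlDocument ();
wsdl.LoadXml ( "<w:definitions xmlns:w='http://schemas.xmlsoap.org/wsdl/'/>" );

XmlDocument xsd = new XmlDocument ();
xsd.LoadXml ( "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'/>" );

bool wsdlOk = IsWsdlDocument ( wsdl );
bool xsdOk = IsWsdlDocument ( xsd );
Console.WriteLine ( wsdlOk + " " + xsdOk ); // True False
```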

Schemas, where art thou?

The next step is to find the schemas. They live under the XPath /wsdl:definitions/wsdl:types. We read the import definitions one by one and extract the schema location and namespace of each import. We also keep track of every namespace we have already loaded, so that the same schema is never merged twice.

So, first we locate the element that should contain the schema import statements:

XmlElement typesElement = wsdl.SelectSingleNode ( "/wsdl:definitions/wsdl:types", manager ) as XmlElement;

If such an element exists, we can start looking for schemas inside it:

XmlElement schemaElement = typesElement.SelectSingleNode ( "xsd:schema", manager ) as XmlElement;
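The two lookups above assume the elements exist, but wsdl:types is optional in a WSDL file. The following sketch adds the null check. It also uses SelectNodes instead of SelectSingleNode, because WSDL 1.1 allows more than one xsd:schema inside wsdl:types; that is a variation on the approach described here, not necessarily what WSDLMerge does. The sample WSDL and its target namespaces are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

XmlDocument wsdl = new XmlDocument ();
wsdl.LoadXml (
    "<wsdl:definitions xmlns:wsdl='http://schemas.xmlsoap.org/wsdl/' " +
    "xmlns:xsd='http://www.w3.org/2001/XMLSchema'>" +
    "<wsdl:types>" +
    "<xsd:schema targetNamespace='urn:example:a'/>" +
    "<xsd:schema targetNamespace='urn:example:b'/>" +
    "</wsdl:types>" +
    "</wsdl:definitions>" );

XmlNamespaceManager manager = new XmlNamespaceManager ( wsdl.NameTable );
manager.AddNamespace ( "wsdl", "http://schemas.xmlsoap.org/wsdl/" );
manager.AddNamespace ( "xsd", "http://www.w3.org/2001/XMLSchema" );

List<string> targetNamespaces = new List<string> ();

// wsdl:types is optional, so the lookup may return null
XmlElement typesElement = wsdl.SelectSingleNode ( "/wsdl:definitions/wsdl:types", manager ) as XmlElement;
if ( typesElement != null )
{
    // Relative path: only the direct xsd:schema children of wsdl:types
    foreach ( XmlElement schemaElement in typesElement.SelectNodes ( "xsd:schema", manager ) )
    {
        // Roughly where a ProcessSchema()-style method would take over
        targetNamespaces.Add ( schemaElement.GetAttribute ( "targetNamespace" ) );
    }
}
Console.WriteLine ( string.Join ( ", ", targetNamespaces ) ); // urn:example:a, urn:example:b
```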

This is where the process turns recursive. The work is done by a method called ProcessSchema(), which processes a single schema definition.

Inside this method, we need to know whether the schema is defined inline or imported, so we look for import elements:

XmlNodeList imports = rootElement.SelectNodes ( "xsd:import", manager );
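To make the import handling concrete, here is a sketch that reads the namespace and schemaLocation attributes of each xsd:import. It runs against a made-up inline schema, so the namespaces and file names are placeholders. Note that schemaLocation is optional on xsd:import: an import without it only declares a dependency on a namespace and gives us no file to load.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

XmlDocument schemaDocument = new XmlDocument ();
schemaDocument.LoadXml (
    "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>" +
    "<xsd:import namespace='urn:example:orders' schemaLocation='orders.xsd'/>" +
    "<xsd:import namespace='urn:example:common'/>" +
    "<xsd:element name='Note' type='xsd:string'/>" +
    "</xsd:schema>" );

XmlNamespaceManager manager = new XmlNamespaceManager ( schemaDocument.NameTable );
manager.AddNamespace ( "xsd", "http://www.w3.org/2001/XMLSchema" );

XmlElement rootElement = schemaDocument.DocumentElement;
XmlNodeList imports = rootElement.SelectNodes ( "xsd:import", manager );

// Only imports that carry a schemaLocation point at another file to merge
List<string> toLoad = new List<string> ();
foreach ( XmlElement import in imports )
{
    // GetAttribute returns an empty string when the attribute is missing
    string importNamespace = import.GetAttribute ( "namespace" );
    string importLocation = import.GetAttribute ( "schemaLocation" );
    if ( importLocation.Length > 0 )
    {
        toLoad.Add ( importNamespace + " <- " + importLocation );
    }
}

Console.WriteLine ( imports.Count ); // 2
foreach ( string item in toLoad )
{
    Console.WriteLine ( item ); // urn:example:orders <- orders.xsd
}
```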

If we find any, we read the namespace and the schema location of each import. If that namespace has not been loaded yet, we load the .XSD file (from either disk or a URL), attach it to the main document, and remove the import statement.
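The post does not say how relative schemaLocation values are resolved. One reasonable option, sketched here as an assumption, is System.Uri: resolving the location against the URI of the document that contains the import works the same way for http:// URLs and file:// URIs. The "already loaded" bookkeeping fits a HashSet<string>, whose Add method returns false for a namespace it has already seen. ResolveLocation and all paths and namespaces are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: resolve a schemaLocation against the URI of the
// document that contains the import (absolute locations pass through unchanged)
Uri ResolveLocation ( Uri containingDocument, string schemaLocation )
{
    return new Uri ( containingDocument, schemaLocation );
}

Uri wsdlUrl = new Uri ( "http://example.com/services/orders.wsdl" );
Uri remote = ResolveLocation ( wsdlUrl, "xsd/orders.xsd" );
Console.WriteLine ( remote.AbsoluteUri ); // http://example.com/services/xsd/orders.xsd

Uri wsdlFile = new Uri ( "file:///srv/schemas/main.wsdl" );
Uri local = ResolveLocation ( wsdlFile, "../common/types.xsd" );
Console.WriteLine ( local.AbsoluteUri );

// Assumption: loaded namespaces are tracked in a HashSet; Add returns false
// for a namespace that is already there, meaning "do not load it again"
HashSet<string> loadedNamespaces = new HashSet<string> ();
bool loadFirst = loadedNamespaces.Add ( "urn:example:orders" );
bool loadAgain = loadedNamespaces.Add ( "urn:example:orders" );
Console.WriteLine ( loadFirst + " " + loadAgain ); // True False
```

The resolved URI's AbsoluteUri can be passed straight to XmlDocument.Load, which accepts both file and URL locations.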

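XmlDocument schemaDocument = new XmlDocument ();
schemaDocument.Load ( importLocation ); // local path or URL

// Deep-copy the schema's root element into the WSDL document
XmlElement newSchema = wsdl.ImportNode ( schemaDocument.DocumentElement, true ) as XmlElement;

// Collect the import elements first: SelectNodes() results are not
// reliable once the document is modified
List<XmlNode> newImports = new List<XmlNode> ();
foreach ( XmlNode importNode in newSchema.SelectNodes ( "xsd:import", manager ) )
{
    newImports.Add ( importNode );
}

foreach ( XmlNode importNode in newImports )
{
    if ( level == 0 )
    {
        // Level 0: remove the import statement completely
        newSchema.RemoveChild ( importNode );
    }
    else if ( importNode.Attributes["schemaLocation"] != null )
    {
        // Other levels: keep the import, but drop its file reference
        importNode.Attributes.RemoveNamedItem ( "schemaLocation" );
    }
}
schemas.Add ( importNamespace, newSchema );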
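Here is a runnable version of the same steps against in-memory documents (the schema and namespaces are made up). It also shows why ImportNode() is needed at all: appending a node that belongs to a different XmlDocument throws an ArgumentException. For brevity, this sketch keeps every import and only strips schemaLocation, which corresponds to the else branch above.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

const string XSDNamespace = "http://www.w3.org/2001/XMLSchema";

XmlDocument wsdl = new XmlDocument ();
wsdl.LoadXml (
    "<wsdl:definitions xmlns:wsdl='http://schemas.xmlsoap.org/wsdl/'>" +
    "<wsdl:types/></wsdl:definitions>" );

// Stand-in for schemaDocument.Load ( importLocation )
XmlDocument schemaDocument = new XmlDocument ();
schemaDocument.LoadXml (
    "<xsd:schema xmlns:xsd='" + XSDNamespace + "' targetNamespace='urn:example:orders'>" +
    "<xsd:import namespace='urn:example:common' schemaLocation='common.xsd'/>" +
    "<xsd:element name='Order' type='xsd:string'/>" +
    "</xsd:schema>" );

XmlNamespaceManager manager = new XmlNamespaceManager ( wsdl.NameTable );
manager.AddNamespace ( "wsdl", "http://schemas.xmlsoap.org/wsdl/" );
manager.AddNamespace ( "xsd", XSDNamespace );
XmlNode types = wsdl.SelectSingleNode ( "/wsdl:definitions/wsdl:types", manager );

// Appending a node owned by another document is not allowed
bool directAppendFailed = false;
try
{
    types.AppendChild ( schemaDocument.DocumentElement );
}
catch ( ArgumentException )
{
    directAppendFailed = true;
}

// ImportNode makes a deep copy owned by the WSDL document
XmlElement newSchema = (XmlElement) wsdl.ImportNode ( schemaDocument.DocumentElement, true );

// Snapshot the imports first, then strip the file references from the copy
List<XmlElement> newImports = new List<XmlElement> ();
foreach ( XmlElement importElement in newSchema.SelectNodes ( "xsd:import", manager ) )
{
    newImports.Add ( importElement );
}
foreach ( XmlElement importElement in newImports )
{
    importElement.RemoveAttribute ( "schemaLocation" );
}
types.AppendChild ( newSchema );

Console.WriteLine ( directAppendFailed );                                                  // True
Console.WriteLine ( wsdl.SelectNodes ( "//xsd:import[@schemaLocation]", manager ).Count ); // 0
Console.WriteLine ( wsdl.SelectNodes ( "//xsd:element", manager ).Count );                 // 1
```

Because the copied schema element carries its own xmlns:xsd declaration, QName values such as type='xsd:string' keep resolving after the move.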

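The ImportNode() method duplicates the element from the schema document into our WSDL document; passing true as the second argument makes it a deep copy that includes all child nodes. We then strip the imports from the duplicated element: depending on the level, either the whole import element is removed or only its schemaLocation attribute. We do not want any embedded XSD to import anything from an external file.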

Of course, we also do not want any schemas to go missing. So while we remove the schema references from the document we are creating, we still follow them in the original documents. Once this processing is done, we scan the original XSD (the one we duplicated) for its import statements and call ProcessSchema() recursively to import any further XML namespaces.
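The recursion can be sketched end to end with an in-memory stand-in for the file system. ProcessImport is a hypothetical, simplified analogue of ProcessSchema() that treats every schema the same way (it ignores the level distinction), and the three "files" are made up; orders.xsd and common.xsd both import money.xsd to show that each namespace is merged only once.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

const string XSDNamespace = "http://www.w3.org/2001/XMLSchema";

// Hypothetical in-memory "files" standing in for XSDs on disk or behind URLs
Dictionary<string, string> files = new Dictionary<string, string>
{
    ["orders.xsd"] = "<xsd:schema xmlns:xsd='" + XSDNamespace + "' targetNamespace='urn:orders'>" +
        "<xsd:import namespace='urn:common' schemaLocation='common.xsd'/>" +
        "<xsd:import namespace='urn:money' schemaLocation='money.xsd'/></xsd:schema>",
    ["common.xsd"] = "<xsd:schema xmlns:xsd='" + XSDNamespace + "' targetNamespace='urn:common'>" +
        "<xsd:import namespace='urn:money' schemaLocation='money.xsd'/></xsd:schema>",
    ["money.xsd"] = "<xsd:schema xmlns:xsd='" + XSDNamespace + "' targetNamespace='urn:money'/>"
};

XmlDocument wsdl = new XmlDocument ();
wsdl.LoadXml ( "<definitions xmlns='http://schemas.xmlsoap.org/wsdl/'><types/></definitions>" );
XmlNode types = wsdl.DocumentElement.FirstChild;

// Merged copies keyed by namespace; also serves as the "already loaded" set
Dictionary<string, XmlElement> schemas = new Dictionary<string, XmlElement> ();

// Snapshot the xsd:import children of a schema element
List<XmlElement> ImportsOf ( XmlElement schemaRoot )
{
    XmlNamespaceManager ns = new XmlNamespaceManager ( schemaRoot.OwnerDocument.NameTable );
    ns.AddNamespace ( "xsd", XSDNamespace );
    List<XmlElement> result = new List<XmlElement> ();
    foreach ( XmlElement import in schemaRoot.SelectNodes ( "xsd:import", ns ) )
    {
        result.Add ( import );
    }
    return result;
}

void ProcessImport ( string importNamespace, string importLocation )
{
    if ( schemas.ContainsKey ( importNamespace ) )
    {
        return; // already merged via another path
    }
    XmlDocument original = new XmlDocument ();
    original.LoadXml ( files[importLocation] );

    // The copy that goes into the WSDL loses its file references...
    XmlElement copy = (XmlElement) wsdl.ImportNode ( original.DocumentElement, true );
    foreach ( XmlElement import in ImportsOf ( copy ) )
    {
        import.RemoveAttribute ( "schemaLocation" );
    }
    schemas.Add ( importNamespace, copy );
    types.AppendChild ( copy );

    // ...while the original still has them, so we follow those recursively
    foreach ( XmlElement import in ImportsOf ( original.DocumentElement ) )
    {
        ProcessImport ( import.GetAttribute ( "namespace" ), import.GetAttribute ( "schemaLocation" ) );
    }
}

ProcessImport ( "urn:orders", "orders.xsd" );

List<string> merged = new List<string> ();
foreach ( XmlElement mergedSchema in types.ChildNodes )
{
    merged.Add ( mergedSchema.GetAttribute ( "targetNamespace" ) );
}
Console.WriteLine ( string.Join ( ", ", merged ) ); // urn:orders, urn:common, urn:money
```

The copies that land in the WSDL have no schemaLocation left, while the recursion walks the untouched originals, which is exactly the split described above.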

When this process is complete, every namespace (that is, every XSD file) referenced by the directly referenced schemas, or anywhere further down the chain, is included one by one in the body of the WSDL document. This flattens the XSD structure, which was previously a tree of separate files. Any remaining import statements still refer to their namespaces, but because all the schemas are now in the body, the WSDL no longer has any external dependencies.

In the end, the whole process is just navigating and modifying XML documents: looking up references, loading them, and attaching duplicated elements and nodes to the master document. This master document becomes our merged WSDL document, which we simply write to disk at the end.
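The post does not show how the result is written out. A minimal sketch, assuming indented UTF-8 output is wanted, could use XmlWriterSettings; the output path (a file in the temp directory) is hypothetical, and the tiny WSDL stands in for the merged document.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

XmlDocument wsdl = new XmlDocument ();
wsdl.LoadXml (
    "<wsdl:definitions xmlns:wsdl='http://schemas.xmlsoap.org/wsdl/'>" +
    "<wsdl:types><xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'/></wsdl:types>" +
    "</wsdl:definitions>" );

// Indented output keeps the merged file readable
XmlWriterSettings settings = new XmlWriterSettings ();
settings.Indent = true;
settings.Encoding = new UTF8Encoding ( false ); // UTF-8 without a byte order mark

// Hypothetical output path
string outputPath = Path.Combine ( Path.GetTempPath (), "merged.wsdl" );
using ( XmlWriter writer = XmlWriter.Create ( outputPath, settings ) )
{
    wsdl.Save ( writer );
}

// Reload to confirm the written file is well-formed
XmlDocument check = new XmlDocument ();
check.Load ( outputPath );
Console.WriteLine ( check.DocumentElement.LocalName ); // definitions
```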