Libor Bešenyi (Solution Architect)
In this post I would like to focus on schemas - not their description but their practical use in .Net.
An XSD schema describes an XML format. SOAP is based on WSDL, in which XSD describes the service interface. Since Visual Studio can only import WSDL (generating Reference.cs), other tools are needed for plain XSD; in this case it is the Xsd.exe utility, whose practical use is explained in the following lines. Please note that this is only for inspiration - there must be a million ways to solve the problems described here!
Schemas are most commonly used for B2B communication. If we have a service that consumes data in a specific format, we have essentially two options. Either we publish the structure and let SOAP / WSDL take care of the XSD, or we generate the XSD manually and the entry into the service is, for example, a byte[] with serialized XML. The advantage of the first approach is that it is fast. The disadvantage is that if we want to shield the interface (so that a code change does not automatically affect the external interface), we have to keep two sets of structures and create the mapping between them.
Practical example: inappropriate design through WSDL
Why is it important to shield the interface? I recently dealt with a problem caused by a poorly implemented Web service.
There was a central SOAP node and a few separate applications. The system was accessed via SSO (the user logs in at one place and "jumps" across systems without knowing that he is being served by a completely different system). There was of course a system of access rights (managed from the central node). Individual applications asked the central node for information about the user after the first jump into the application. So far everything was perfect.
The problem occurred when a new type of application was added to the system, and with it a new "type" of access right. The type of access right was a simple enumeration, and the central node began to publish the new value (which had nothing to do with the existing ones). But the enumeration was shared, and the old systems did not have an updated WSDL on the client side. For users who had access to the new application, the central node sent the new right to the old applications as well, and they did not recognize it. The new type of access right thus broke all the old systems, because their SOAP layer was trying to map an enumeration value that it did not know.
The solution is of course a mere WSDL update (importing the extended enumeration). However, if there are many systems, management is not very pleased (if the rules are too tight).
In my case, I created a new service that publishes the same information, but as serialized data instead of through WSDL. I also forbade the old service to publish the new type of right (it is used only by old systems that do not use this right anyway). I marked the old service as obsolete, and as these systems come up for maintenance I will gradually switch them to the new service (before I finally remove the old one).
So the conclusion is that if we are not careful, WSDL can bring external systems down. It is worse when these systems are at the customer's site. Serialization can prevent similar problems, but it is not as comfortable as importing WSDL.
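The failure described above can be reproduced in a few lines. This is only a minimal sketch, assuming the old client maps the response with XmlSerializer; the names RightType, UserInfo and OldClient are hypothetical and are not taken from the real system.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// The OLD client knows only these two values (hypothetical names).
public enum RightType { Read, Write }

public class UserInfo
{
    public string Login;
    public RightType Right;
}

public static class OldClient
{
    // Returns true when the message can be mapped onto the old structure.
    public static bool CanRead(string xml)
    {
        var serializer = new XmlSerializer(typeof(UserInfo));
        try
        {
            using (var reader = new StringReader(xml))
                serializer.Deserialize(reader);
            return true;
        }
        catch (InvalidOperationException)
        {
            // XmlSerializer reports an unknown enumeration value this way,
            // so the whole message is lost, not just the one field.
            return false;
        }
    }
}
```

A message carrying a value the client was compiled with is read normally; a message carrying a value added later on the server makes the whole deserialization fail.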
Solution à la "RESTful"
What would a solution based on serialization look like? On the server side, much the same. We can still have a structure and an enumeration with the types of rights. However, the service will not publish the structure itself, but a byte array (or a string) into which the structure is serialized.
This solution closely follows the ideas of REST, where instead of a service there would only be a handler that returns an HTTP response, for example of XML type. The client must then load the XML manually and process it as needed. I like the philosophy of REST, but we should not build our systems on REST at any cost. Especially with existing infrastructure, it makes no sense to plunge into REST; it is enough to borrow the idea.
Hence, as we can see, the server is almost unchanged. It is worse for the client. The client will not receive a ready-made structure, but raw XML. Therefore, this solution is more laborious in terms of initial costs. Of course, the XML can be processed by deserializing it into a manually created "Reference.cs" (see below). Sometimes that is appropriate, for example when we talk about complex structures, JSON-to-XML conversion and the like.
Or, if it is just about data such as access rights, I think it is more appropriate to encapsulate this communication in a new class that takes the XML as input, parses it and returns only the data that the application really needs. For example, in the list of access rights it would look only for those related to the given application (the enumeration is text, so unknown text is simply skipped).
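Such an encapsulating class could look roughly like the sketch below. The message format (a list of Right elements), the AccessRight enumeration and the AccessRightsParser class are all hypothetical, invented only to illustrate the idea of skipping unknown text.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical rights known to ONE particular application.
public enum AccessRight { Read, Write, Admin }

public static class AccessRightsParser
{
    // Reads every <Right> element and keeps only the values this application
    // knows; rights added later for other systems are silently skipped.
    public static List<AccessRight> Parse(string xml)
    {
        var document = XDocument.Parse(xml);
        var result = new List<AccessRight>();

        foreach (var element in document.Descendants("Right"))
        {
            AccessRight right;
            // Enum.IsDefined also rejects numeric text such as "42",
            // which TryParse alone would accept.
            if (Enum.TryParse(element.Value.Trim(), out right)
                && Enum.IsDefined(typeof(AccessRight), right))
            {
                result.Add(right);
            }
        }
        return result;
    }
}
```

With this approach a new right published by the central node changes nothing for an old application: it simply never appears in the returned list.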
Thus, how to edit a client depends on the nature of the task.
Example of (de)serialization
We will therefore not deal with WSDL any further and will focus on serialization in .NET. I have not found many articles about it (unlike WSDL), so I decided to write about it myself. First, let us demonstrate how to turn a class instance into XML.
The examples below mark each published class with the Serializable attribute (strictly speaking, XmlSerializer only needs a public type with a parameterless constructor and does not check this attribute). For example, it might be:
public enum Enum1 { Unknown, Dog, Cat, Cow }

[Serializable] // <-----
public class Demo1
{
    public Enum1 EnumField;
    public string StringField;
}
How to create XML from this class and its instance?
var instance = new Interface.Demo1()
{
    EnumField = Interface.Enum1.Cow,
    StringField = "Malina"
};
Let’s write a method that uses .Net serializer:
// Needs: using System.Xml.Linq; using System.Xml.Serialization;
public static XDocument Save(object value)
{
    var document = new XDocument();
    var xmlSerializer = new XmlSerializer(value.GetType());
    // The serializer writes straight into the XDocument through its writer.
    using (var xmlWriter = document.CreateWriter())
        xmlSerializer.Serialize(xmlWriter, value);
    return document;
}
With this method we can serialize our object by calling Save(instance).ToString(), which yields the following XML:
<Demo1 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <EnumField>Cow</EnumField>
  <StringField>Malina</StringField>
</Demo1>
Of course, we also need to be able to build such XML back into the structure, which can be done by the following method, for example:
public static T Load<T>(XContainer element)
{
    var xmlSerializer = new XmlSerializer(typeof(T));
    // Read the XML tree directly, without converting it to text first.
    using (var xmlReader = element.CreateReader())
        return (T)xmlSerializer.Deserialize(xmlReader);
}
The use is then as follows: var instance = Load<Class>(xml). Of course, the deserialized instance has nothing to do with the instance that was serialized (only the data is the same). So now we can serialize the structure in one system, for example, and reassemble it in another.
Serialization is not used only in B2B. It can also be used for import / export, etc. After all, B2B is just an automated transfer of data between systems, and the principle is the same as exporting data from one system and importing it into another.
CAUTION: Save and Load using XDocument do not support (de)serialization of base 64. Thus, if a class contains binary data, these methods fail. The solution is to rewrite these methods using the older XmlDocument, which supports base 64.
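An XmlDocument-based variant of the two methods might look like the following sketch. It assumes that the writer obtained from the document's navigator and XmlNodeReader both handle base 64 content; the Attachment class and the XmlDocumentUtils name are hypothetical and serve only as an example of a class with binary data.

```csharp
using System;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical class with binary content.
public class Attachment
{
    public string Name;
    public byte[] Content;   // written to XML as base 64 text
}

public static class XmlDocumentUtils
{
    public static XmlDocument Save(object value)
    {
        var document = new XmlDocument();
        var xmlSerializer = new XmlSerializer(value.GetType());
        // The navigator's AppendChild() returns an XmlWriter that builds
        // the nodes directly inside the (still empty) document.
        using (var xmlWriter = document.CreateNavigator().AppendChild())
            xmlSerializer.Serialize(xmlWriter, value);
        return document;
    }

    public static T Load<T>(XmlNode node)
    {
        var xmlSerializer = new XmlSerializer(typeof(T));
        using (var xmlReader = new XmlNodeReader(node))
            return (T)xmlSerializer.Deserialize(xmlReader);
    }
}
```

Usage mirrors the XDocument version: var document = XmlDocumentUtils.Save(attachment), then XmlDocumentUtils.Load<Attachment>(document) on the receiving side.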
Generating schema from code
Okay, now we can transfer data between systems using XML. However, how do we describe our class format to a third party? This is what XSD is used for. XSD describes what the XML may look like. As I said, there are certainly many better ways to generate a schema from code; I use the Xsd.exe utility that is part of Visual Studio.
Just out of curiosity, let's extend our class with a nullable int field:
[Serializable]
public class Demo1
{
    public Enum1 EnumField;
    public string StringField;
    public int? IntField;
}
From this class, XSD.EXE generates the file schema0.xsd with the following CMD command:
XSD {Path to DLL / EXE} /type:[NameSpace.]{Class}
Let's see what was generated:
<?xml version="1.0" encoding="utf-8"?>
<xs:schema elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Demo1" nillable="true" type="Demo1" />
  <xs:complexType name="Demo1">
    <xs:sequence>
      <xs:element minOccurs="1" maxOccurs="1" name="EnumField" type="Enum1" />
      <xs:element minOccurs="0" maxOccurs="1" name="StringField" type="xs:string" />
      <xs:element minOccurs="1" maxOccurs="1" name="IntField" nillable="true" type="xs:int" />
    </xs:sequence>
  </xs:complexType>
  <xs:simpleType name="Enum1">
    <xs:restriction base="xs:string">
      <xs:enumeration value="Unknown" />
      <xs:enumeration value="Dog" />
      <xs:enumeration value="Cat" />
      <xs:enumeration value="Cow" />
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
As we can see, this XSD describes our class - the name of the fields and in the enumeration even the "allowed" values.
The generated XSD can then be sent to the programmers of the other company, who teach their software to work with our XML, so that the XML sent to our system arrives in the given format.
The published structure is commonly used directly in the code. While refactoring, a programmer can accidentally rename something, and if this change is not reported to third parties, or internal systems are not modified, the service may stop working FOR THEM (and the phone calls tend to be unpleasant then).
To guard against this, we can write a unit test that generates the XSD and compares it with the previous version of the interface. As soon as there is a change, the unit test reports an error and the programmer must evaluate whether the interface change was intended or whether it endangers other systems (adding a new field can be backward compatible), and the like. Then either the change is reverted or the remaining systems are modified as well. In this way the interface of services can be guarded fairly well!
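The core of such a guard can be sketched as follows. Instead of launching Xsd.exe, the sketch generates the schema in-process with XmlReflectionImporter and XmlSchemaExporter from System.Xml.Serialization. Whether this output is byte-identical to what Xsd.exe writes is an assumption that has not been verified here, so the stored baseline should be produced by the same code. SchemaGuard is a hypothetical name.

```csharp
using System;
using System.IO;
using System.Xml.Schema;
using System.Xml.Serialization;

public enum Enum1 { Unknown, Dog, Cat, Cow }

public class Demo1
{
    public Enum1 EnumField;
    public string StringField;
    public int? IntField;
}

public static class SchemaGuard
{
    // Builds the XSD text for a type via the reflection importer/exporter.
    public static string GenerateSchema(Type type)
    {
        var schemas = new XmlSchemas();
        var exporter = new XmlSchemaExporter(schemas);
        var mapping = new XmlReflectionImporter().ImportTypeMapping(type);
        exporter.ExportTypeMapping(mapping);

        using (var writer = new StringWriter())
        {
            // One schema per namespace; they are simply written one after another.
            foreach (XmlSchema schema in schemas)
                schema.Write(writer);
            return writer.ToString();
        }
    }

    // True when the freshly generated schema equals the approved baseline.
    public static bool MatchesBaseline(Type type, string baselineXsd)
    {
        return GenerateSchema(type) == baselineXsd;
    }
}
```

In a unit test the baseline would be read from a file kept in source control, and the test would fail whenever MatchesBaseline returns false. A plain text comparison is deliberately strict: any change to the published structure has to be reviewed by a person.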
Validation of XML against XSD
Situations commonly arise where the schema changes on the server side and people "forget" to notify the client [1]. The client still generates data in the old format, but our system expects the new one. Imagine that our enumeration once contained the item "horse". We have since thrown it out, but the client application still sends it (we can catch this precisely by comparing XSD schemas in unit tests - see the previous chapter).
[1] In this context, the client means an application that calls SOAP services; of course, in this case a server application can also be the client.
Some schemas are pretty complicated. I once dealt with communication with the state administration [2], which required sending some data to a central register. In this case, I wrote the client. Their server was not fully finished yet (certificates, legislation, etc.), but the format of the data was clear. So we did not have to wait for the state administration; we slowly began to generate XML from our data in the form their server would consume. To verify the format, we used their schemas. Once again, .NET offers a solution:
[2] It was in Britain; I am not sure whether similar communication with state authorities is possible anywhere in Slovakia.
// Needs: using System.Collections.Generic; using System.Xml.Linq; using System.Xml.Schema;
public static List<ValidationEventArgs> Validate(this XDocument document, string targetNamespace, string schemaUri)
{
    var result = new List<ValidationEventArgs>();
    var schemaSet = new XmlSchemaSet();
    schemaSet.Add(targetNamespace, schemaUri);
    // Collect every validation error or warning instead of throwing on the first one.
    document.Validate(schemaSet, (o, e) => result.Add(e));
    return result;
}
So now we can try to validate XML that is serialized. Of course it must be valid because the schema and the XML are generated from the same sources:
var instancia = new Interface.Demo1() { EnumField = Interface.Enum1.Cow, StringField = "Malina" };
var xml = XmlUtils.Save(instancia);
var validacneChyby = XmlUtils.Validate(xml, /*nameSpace*/null, /*schema*/@"SchemyDemo1.xsd");
MessageBox.Show("Chyb: " + validacneChyby.Count); // "Chyb" = number of errors
Let's try to validate a hand-made XML:
xml = XDocument.Parse("<Demo1><EnumField>Cow</EnumField><StringField /><IntField>1</IntField></Demo1>");
validacneChyby = XmlUtils.Validate(xml, /*nameSpace*/null, /*schema*/@"SchemyDemo1.xsd");
MessageBox.Show("Chyb: " + validacneChyby.Count);
Now an XML that contains a bug, e.g. one that sends the aforementioned "horse" in the enum field (the value used in the test was 'KON', Slovak for horse). We instantly receive an error message:
The 'EnumField' element is invalid - The value 'KON' is invalid according to its datatype 'Enum1' - The Enumeration constraint failed.
Nil vs. NULL
So far everything went smoothly. But as usual (especially with MS technology), at the point where we start to have higher demands, the technology betrays us. It is no coincidence that I included a nullable integer in the structure. Nullable fields are often added because they have a handy feature: the change is backward compatible, so the integrating applications do not have to apply the new schema (unless required by business logic, of course).
But there is a problem. Probably for historical reasons (.NET 1.1 did not have nullable types), once they were introduced the change had to remain backward compatible with v1.1. So instead of NULL values in XML we got the nil attribute. Or maybe MS just wanted to be able to handle mandatory fields that can be nullable.
If an element is missing from the XML or is closed with an empty value (<Field />), it is a NULL value. But let's try to validate such XML against our schema (it can be deserialized, but it does not pass verification against the schema):
var xml = XDocument.Parse("<Demo1><EnumField>Cow</EnumField></Demo1>");
var deserializovanaInstancia = XmlUtils.Load<Interface.Demo1>(xml);
var validacneChyby = XmlUtils.Validate(xml, /*nameSpace*/null, /*schema*/@"SchemyDemo1.xsd");
MessageBox.Show("Chyb: " + string.Join(" | ", validacneChyby.Select(item => item.Message)));
So despite the fact that it is possible to deserialize the XML (the missing values are NULL), such XML is not valid against the schema that we generated!
This means that although XSD.EXE generates the schema from the code, it does not do so in the way one would expect. Take a closer look at the nullable int in the schema:
<xs:element minOccurs="1" maxOccurs="1" name="IntField" nillable="true" type="xs:int" />
Here we see that even though the field is nullable, the schema expects it, since its minimum occurrence in the XML is 1. Fixing it is easy: I wrote a small open-source utility that can be downloaded from the link at the end (Nill Tool removal).
After generating the XSD, we run the utility on it (the only parameter is the file name) and get an XSD in which nullable fields are no longer considered mandatory.
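The kind of transformation involved can be sketched with LINQ to XML. This is not the source of the utility mentioned above, only a guess at its core step under the assumption that it is enough to set minOccurs to 0 on every element declared as nillable; NillableRelaxer is a hypothetical name.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class NillableRelaxer
{
    private static readonly XNamespace Xs = "http://www.w3.org/2001/XMLSchema";

    // Returns a copy of the schema in which every local xs:element that is
    // nillable="true" gets minOccurs="0", so a missing element validates.
    public static XDocument Relax(XDocument schema)
    {
        var result = new XDocument(schema);   // leave the input untouched

        var nillableElements = result
            .Descendants(Xs + "element")
            // Global element declarations carry no minOccurs and are skipped.
            .Where(e => (string)e.Attribute("nillable") == "true"
                     && e.Attribute("minOccurs") != null)
            .ToList();

        foreach (var element in nillableElements)
            element.SetAttributeValue("minOccurs", "0");

        return result;
    }
}
```

Applied to the schema generated for Demo1, only the IntField declaration changes; EnumField stays mandatory.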
Generating structure from XSD
As I said, sometimes it is more appropriate to map onto internal structures on the client side instead of working with the XML directly. If, for example, a structure is quite complex and we know that it must be the same in each system at any time, we do not have to bother with XML. We have two options:
- Publish the structure in a DLL and then use the DLL in the systems. Then we can simply pass the type of the structure to the deserializer; essentially, this is what our demo does. As the same application also includes the class in question, we can simply deserialize it.
- Generate the structure from XSD. If it is a structure of a third party, we will probably be sent only an XSD file (because their system may be written in Java). If we don't want to bother with XML, we can generate the class which we'll use during deserialization. This approach is virtually identical to WSDL, because every time the format changes, we have to generate a new CS file from the new XSD.
Again, we can use the XSD utility. This time, XSD creates a CS file with this command:
XSD [File] /classes
From Demo1.XSD this command creates a CS file Demo1.CS which can be included in the project and used for deserialization as our original class!
But it is more interesting to see what happens with an XSD that does not use nil. The XSD utility creates a CS file in which a property IntFieldSpecified (bool) has been added. However, IntField itself is not nullable! This can cause problems because such a schema, whether maintained by our system or generated from a Java server, is imported into CS differently than expected.
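How such a pair of members behaves can be seen in the sketch below. The class is a hand-written imitation of the pattern, not actual Xsd.exe output, and the names Demo1Generated and SpecifiedDemo are hypothetical. XmlSerializer treats a bool member named after another member with the suffix "Specified" as a flag saying whether that member was present in the XML.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hand-written imitation of the pattern; NOT actual Xsd.exe output.
public class Demo1Generated
{
    public string StringField;

    public int IntField;

    // Set to true by the deserializer only when <IntField> was present.
    [XmlIgnore]
    public bool IntFieldSpecified;
}

public static class SpecifiedDemo
{
    public static Demo1Generated Load(string xml)
    {
        var serializer = new XmlSerializer(typeof(Demo1Generated));
        using (var reader = new StringReader(xml))
            return (Demo1Generated)serializer.Deserialize(reader);
    }

    // Maps the pair (IntField, IntFieldSpecified) back to a nullable value.
    public static int? ToNullable(Demo1Generated value)
    {
        return value.IntFieldSpecified ? value.IntField : (int?)null;
    }
}
```

Code that expected an int? therefore has to check the flag explicitly, which is exactly the difference that tends to surprise after importing such a schema.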
So, just as we post-processed the generated schema in order to validate such XML, we now need to convert it back somehow. We can use the same utility, just with another parameter, -r:
NillToMinOccurs [XSD] -r
Namespaces
If our structures are more complex and we use different namespaces, the XSD utility can nicely tell them apart. A new XSD file is created for each namespace, and it is then up to us to bring order to this output. We can simulate it by forcing a different namespace onto an existing enumeration:
[Serializable]
public class DemoNS
{
    [XmlElement(Namespace = "nejmspejs")]
    public Enum1 Enum;
}
So we have two XSD files, which we cannot work with without further intervention. Let's see why. The "zero" XSD is the main one. It declares that it uses another namespace:
<xs:import namespace="nejmspejs" />
But as soon as we use the schema, validation fails with this:
The 'nejmspejs:Enum' element is not declared.
The schema can of course be corrected manually. We just have to say in which file the definition is, for example, as follows:
<xs:import namespace="nejmspejs" schemaLocation="DemoNS2.xsd" />
The XML then immediately passes validation. So that we do not have to do this every time we generate a schema, my utility can do it for us. It processes the Schem*.XSD files and renames them logically with this command:
NillToMinOccurs [XSD] -sl
Unit testing
For an example of how to work with XSD in unit tests, see this link: Xsd comparator