The other day, I needed to use a bit of external code to check and possibly update values in an InfoPath form stored in a MOSS 2007 document library. Specifically, I wanted to update a field in the InfoPath form from within a WF Code Activity.
In taking the direct path to success, I broke stuff. Isn't that always the way?
The key takeaway is that if you deserialize and then re-serialize a document using XmlSerializer, you will get a valid document that declares the same namespaces as the original, but the namespace prefixes (the "tags") may have changed. If anything looks for an element by its namespace tag (cough, cough, SharePoint promoted properties) instead of matching the namespace URI, such a change will break the connection.
Quick hit
For those of you simply looking for an answer to "how do I maintain my namespace tags so they are the same on Serialize as they were before Deserialize?", here's a quick cheat-sheet.
At the top level of the class you are deserializing to, perhaps created by using xsd.exe, include the following two lines:
public partial class Sample {
[System.Xml.Serialization.XmlNamespaceDeclarations]
public System.Xml.Serialization.XmlSerializerNamespaces xmlns;
...blah blah blah
The xmlns field will be used to save the namespace definitions "as is" on deserialization and reapply the identical definitions on serialization. You do not need to do any further manipulation in the calling code. (Although you will find xsi and xsd namespace declarations also added if they weren't there before. To avoid them, skip to the bottom of the post.)
Note that the field must be public. Either protected or private will not capture the information on deserialization and the process will end up working as if the field weren't there.
Ok, those in for a quickie may leave now. The rest of the tale continues...
Ignorance is bliss
So, there I was, wanting to modify the XML of an InfoPath form from within a workflow running in SharePoint. I started on my merry way, extracting myschema.xsd from the InfoPath form, using xsd.exe to create a class and creating code to use an XmlSerializer for my deserialization and serialization.
I did need help to find a way to make sure that correct XML processing instructions for InfoPath were produced, but otherwise it went fine. I checked for an empty field and updated it to "the right thing" if it needed updating, re-serialized and *bam* had an updated InfoPath document. It opened in InfoPath, it opened through the forms server and I was happy.
Until my workflow blew up because the promoted properties were not available anymore.
What's in a name?
A namespace in XML is a pairing of a tag and a URI that allows an XML document to disambiguate elements, thus allowing different pieces of schema to cohabit happily in one document without arguments about names. A conceptually valid piece of XML may have multiple ways of expressing the same thing with different names. That is:
<?xml version="1.0" encoding="utf-8" ?>
<a:Sample xmlns:a="http://example.com/namespace/test">
<a:child attribute="My Attribute">My Contents</a:child>
</a:Sample>
and
<?xml version="1.0" encoding="utf-8" ?>
<b:Sample xmlns:b="http://example.com/namespace/test">
<b:child attribute="My Attribute">My Contents</b:child>
</b:Sample>
and
<?xml version="1.0" encoding="utf-8" ?>
<Sample xmlns="http://example.com/namespace/test">
<child attribute="My Attribute">My Contents</child>
</Sample>
are essentially the same. The fact that the first uses a as the label, the second uses b, and the third uses the default empty tag makes no difference, since a, b, and the empty tag are all bound to the same URI, http://example.com/namespace/test. A well-behaved program will treat all three the same.
Unfortunately for me, properties promoted from InfoPath to SharePoint are identified by the equivalent of the literal name a:child, instead of by a:child together with the namespace info xmlns:a="http://example.com/namespace/test". The latter would allow a different document with b:child and namespace info xmlns:b="http://example.com/namespace/test" to correctly match the URI and identify b:child as a promoted property.
XmlSerializer, Deserialize, and Serialize
Once upon a time in the land of DOM, you would simply load an XML string into an XML document and play funny access games with XPath. This is not nearly as much excitement as it sounds once the document progresses beyond the complexity of those above. So instead, we create classes that expose the elements of the document as properties and deserialize the document into an instance. Documents can be saved by setting the properties and then using the same XmlSerializer to serialize them.
Let's take an example. First, I used Visual Studio to create an XML Schema file, Sample.xsd, from the first XML sample above. It created:
<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:a="http://example.com/namespace/test" attributeFormDefault="unqualified" elementFormDefault="qualified" targetNamespace="http://example.com/namespace/test" xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="Sample">
<xs:complexType>
<xs:sequence>
<xs:element name="child">
<xs:complexType>
<xs:simpleContent>
<xs:extension base="xs:string">
<xs:attribute name="attribute" type="xs:string" use="required" />
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
Running xsd.exe to generate a serializable class representing the schema gives a class, Sample, that starts with something like:
using System.Xml.Serialization;

[System.SerializableAttribute()]
[System.Xml.Serialization.XmlTypeAttribute(AnonymousType=true, Namespace="http://example.com/namespace/test")]
[System.Xml.Serialization.XmlRootAttribute(Namespace="http://example.com/namespace/test", IsNullable=false)]
public partial class Sample {
    private SampleChild childField;

    public SampleChild child {
        get {
            return this.childField;
        }
        set {
            this.childField = value;
        }
    }
}
...more definitions for SampleChild....
Finally, assuming that a text box txtSource holds some XML and a text box txtOutput is being used to display the re-serialized results, a sample bit of code might be:
// Requires: System.IO, System.Text, System.Xml, System.Xml.Serialization
XmlSerializer serializer = new XmlSerializer(typeof(Sample));
Sample what;
// Deserialize the source XML into the generated class.
using (StringReader reader = new StringReader(txtSource.Text))
{
    what = (Sample)serializer.Deserialize(reader);
}
using (MemoryStream memoryStream = new MemoryStream())
{
    XmlWriterSettings settings = new XmlWriterSettings();
    settings.Indent = true;
    settings.Encoding = new UTF8Encoding(); // UTF-8 without a byte order mark
    using (XmlWriter writer = XmlWriter.Create(memoryStream, settings))
    {
        serializer.Serialize(writer, what);
        // Close the writer so everything is flushed to the stream before reading it back.
        writer.Close();
        txtOutput.Text = Encoding.UTF8.GetString(memoryStream.ToArray());
    }
}
We should (and do) end up with a compatible document. For example,
<?xml version="1.0" encoding="utf-8" ?>
<a:Sample xmlns:a="http://example.com/namespace/test">
<a:child attribute="My Attribute">My Contents</a:child>
</a:Sample>
ends up yielding
<?xml version="1.0" encoding="utf-8"?>
<Sample xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns="http://example.com/namespace/test">
<child attribute="My Attribute">My Contents</child>
</Sample>
Note that the structure is the same, but the a tag for the namespace URI http://example.com/namespace/test has morphed into the default (blank) namespace tag. As mentioned above, this is fine for well-behaved XML document manipulation programs. But for anything that refers to a:child literally, it will be a problem. The process also added two unused namespace declarations, xsi and xsd, that don't affect things in this case. Still, the rest of the solution will remove them, too, to make sure that namespace-in becomes namespace-out and that's it.
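To make the distinction concrete, here is a minimal sketch of the two ways a consumer might look for child. It is only an illustration: the NamespaceMatchDemo class and its method names are made up for this example, and it does not claim to show how SharePoint actually resolves promoted properties.

```csharp
using System;
using System.Xml;

// Hypothetical demo class; the names here are not from any real library.
public static class NamespaceMatchDemo
{
    public const string TestUri = "http://example.com/namespace/test";

    // Fragile: matches on the qualified name, so the prefix must be exactly "a".
    public static int CountByPrefixedName(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.GetElementsByTagName("a:child").Count;
    }

    // Robust: matches on local name plus namespace URI; the prefix is irrelevant.
    public static int CountByUri(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.GetElementsByTagName("child", TestUri).Count;
    }

    public static void Main()
    {
        string prefixed =
            "<a:Sample xmlns:a=\"" + TestUri + "\">" +
            "<a:child attribute=\"My Attribute\">My Contents</a:child></a:Sample>";
        string defaulted =
            "<Sample xmlns=\"" + TestUri + "\">" +
            "<child attribute=\"My Attribute\">My Contents</child></Sample>";

        Console.WriteLine(CountByPrefixedName(prefixed));   // 1
        Console.WriteLine(CountByPrefixedName(defaulted));  // 0: the a: tag is gone
        Console.WriteLine(CountByUri(prefixed));            // 1
        Console.WriteLine(CountByUri(defaulted));           // 1: the URI still matches
    }
}
```

The second document is the one XmlSerializer handed back to me. A URI-based lookup never notices the difference, while the name-based lookup comes up empty.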
Maintaining the status quo
At this point I started fiddling around with things to make sure I ended up in the same namespace. Dan Miser pointed me to a post of his explaining how to remove namespace information using the XmlSerializerNamespaces class. Using it in the way Bill G intended, I could add explicit namespace declarations or, of course, I could return to the old reliable "make it an XmlDocument and go from there." The former required hard-coding the namespace information and the latter was a lot more code (or would be if I actually wanted to DO something with this document).
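For comparison, the hard-coded route might look roughly like this. It's a sketch under assumptions: HardCodedSample is a hypothetical, cut-down stand-in for the xsd.exe-generated class (child is simplified to a plain string), HardCodedPrefixDemo is a made-up helper, and the a prefix is baked into the code.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical, simplified stand-in for the generated Sample class.
[XmlType(AnonymousType = true, Namespace = "http://example.com/namespace/test")]
[XmlRoot("Sample", Namespace = "http://example.com/namespace/test")]
public class HardCodedSample
{
    [XmlElement("child")]
    public string child;
}

public static class HardCodedPrefixDemo
{
    public static string Serialize(HardCodedSample value)
    {
        // The prefix/URI pair is fixed here, regardless of what the source document used.
        var namespaces = new XmlSerializerNamespaces();
        namespaces.Add("a", "http://example.com/namespace/test");

        var serializer = new XmlSerializer(typeof(HardCodedSample));
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (var text = new StringWriter())
        using (var writer = XmlWriter.Create(text, settings))
        {
            // Passing explicit namespaces also keeps the default xsi/xsd declarations out.
            serializer.Serialize(writer, value, namespaces);
            writer.Flush();
            return text.ToString();
        }
    }

    public static void Main()
    {
        var sample = new HardCodedSample { child = "My Contents" };
        Console.WriteLine(Serialize(sample));
    }
}
```

This works, but the code has to know in advance that the form author chose a (or my, or anything else) as the tag.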
Each way seemed far too inelegant, so I kept looking and ran across the XmlNamespaceDeclarations attribute. Looking at samples, I decided to try adding
public partial class Sample {
[System.Xml.Serialization.XmlNamespaceDeclarations]
public System.Xml.Serialization.XmlSerializerNamespaces xmlns;
...blah blah blah...
}
to my generated class. Indeed, just that small change solved my problem. The xmlns field (which must be public to be correctly initialized by XmlSerializer.Deserialize) retained the tag/URI pairing from my original document, and XmlSerializer.Serialize gave
<?xml version="1.0" encoding="utf-8"?>
<a:Sample xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:a="http://example.com/namespace/test">
<a:child attribute="My Attribute">My Contents</a:child>
</a:Sample>
Now I was almost there. Doing this in the InfoPath world was perfectly acceptable: the my: tag for the namespace was retained and the references for the promoted SharePoint properties were just dandy.
Being a completist
The last niggle is the addition of the xsi and xsd namespace declarations. Since they are unused, they are unlikely to cause problems. But to make sure we get identical output, we can alter the calling code to grab the xmlns field (an XmlSerializerNamespaces) and feed it back into the Serialize method:
using (XmlWriter writer = XmlWriter.Create(memoryStream, settings))
{
    serializer.Serialize(writer, what, what.xmlns);
    writer.Close();
    txtOutput.Text = Encoding.UTF8.GetString(memoryStream.ToArray());
}
We end up with a final, matching document.
<?xml version="1.0" encoding="utf-8"?>
<a:Sample xmlns:a="http://example.com/namespace/test">
<a:child attribute="My Attribute">My Contents</a:child>
</a:Sample>
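For anyone who wants to try the whole round trip without a form and two text boxes, here is a self-contained console sketch. RoundTripSample and RoundTripChild are hand-written stand-ins for what xsd.exe generates, and Ns and RoundTripDemo are names I made up for this example, so treat it as an approximation of the setup above, not the generated code itself.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

public static class Ns
{
    public const string Test = "http://example.com/namespace/test";
}

// Hand-written stand-in for the xsd.exe-generated Sample class.
[XmlType(AnonymousType = true, Namespace = Ns.Test)]
[XmlRoot("Sample", Namespace = Ns.Test)]
public class RoundTripSample
{
    // Must be public: filled with the source document's prefix/URI pairs on Deserialize.
    [XmlNamespaceDeclarations]
    public XmlSerializerNamespaces xmlns;

    [XmlElement("child")]
    public RoundTripChild child;
}

[XmlType(AnonymousType = true, Namespace = Ns.Test)]
public class RoundTripChild
{
    [XmlAttribute("attribute")]
    public string attribute;

    [XmlText]
    public string Value;
}

public static class RoundTripDemo
{
    public static string RoundTrip(string xml)
    {
        var serializer = new XmlSerializer(typeof(RoundTripSample));
        RoundTripSample sample;
        using (var reader = new StringReader(xml))
        {
            sample = (RoundTripSample)serializer.Deserialize(reader);
        }

        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (var text = new StringWriter())
        using (var writer = XmlWriter.Create(text, settings))
        {
            // Feeding the captured declarations back in avoids the extra xsi/xsd entries.
            serializer.Serialize(writer, sample, sample.xmlns);
            writer.Flush();
            return text.ToString();
        }
    }

    public static void Main()
    {
        string source =
            "<a:Sample xmlns:a=\"" + Ns.Test + "\">" +
            "<a:child attribute=\"My Attribute\">My Contents</a:child></a:Sample>";
        Console.WriteLine(RoundTrip(source));
    }
}
```

Swapping a for b (or my) in the source string should carry through to the output unchanged, which is the whole point: the code never names the tag.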
Now that's probably too much typing to justify adding two lines of code and modifying one, but so it goes.