Gobbledygooks by @riha78

A random pile of .NET, BizTalk and integration focused scribbles by @riha78.


Using XSLT 1.0 to summarize a node-set with comma separated values

   

Pure XSLT is very powerful but it definitely has its weaknesses (I've written about how to extend XSLT mapping in BizTalk previously here) … One of them is handling numbers that use a different decimal separator than a point (".").

Take for example the XML below

<Prices>
  <Price>10,1</Price>
  <Price>10,2</Price>
  <Price>10,3</Price>
</Prices>

Just using the XSLT sum-function on these values will give us a "NaN" value. To solve it we'll have to use recursion and something like the sample below.

The sample selects the node-set to summarize and sends it to the "SummarizePrice" template. The template adds the value of the first Price tag after translating the comma to a point. It then checks if it's the last value and, if not, recursively calls itself with the next index. It keeps adding to the total amount until it reaches the last value of the node-set.

<xsl:template match="Prices">
  <xsl:call-template name="SummurizePrice">
    <xsl:with-param name="nodes" select="Price" />
  </xsl:call-template>
</xsl:template>

<xsl:template name="SummurizePrice">
  <xsl:param name="index" select="1" />
  <xsl:param name="nodes" />
  <xsl:param name="totalPrice" select="0" />

  <xsl:variable name="currentPrice" select="translate($nodes[$index], ',', '.')"/>

  <xsl:choose>
    <xsl:when test="$index=count($nodes)">
      <xsl:value-of select="$totalPrice + $currentPrice"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:call-template name="SummurizePrice">
        <xsl:with-param name="index" select="$index + 1" />
        <xsl:with-param name="totalPrice" select="$totalPrice + $currentPrice" />
        <xsl:with-param name="nodes" select="$nodes" />
      </xsl:call-template>
    </xsl:otherwise>
  </xsl:choose>

</xsl:template>

Simple but a bit messy and nice to have for future cut and paste ;)
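
If you want to sanity check the template outside BizTalk, a small .NET snippet like the one below does the job. It is just a sketch - Prices.xml and SumPrices.xslt are placeholder file names for the sample XML above and a stylesheet containing the two templates:

using System;
using System.Xml.XPath;
using System.Xml.Xsl;

class RunSumTransform
{
    static void Main()
    {
        // Load a stylesheet that contains the SummarizePrice templates above
        // (SumPrices.xslt is a placeholder file name).
        XslCompiledTransform transform = new XslCompiledTransform();
        transform.Load("SumPrices.xslt");

        // Transform the sample document and write the result to the console.
        // For the Prices sample this should print 30.6 (10.1 + 10.2 + 10.3).
        transform.Transform(new XPathDocument("Prices.xml"), null, Console.Out);
    }
}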

Streaming pipeline and using context ResourceTracker to avoid disposed streams

   

Recently a few really good resources on streaming pipeline handling have been published. You can find some of them here and here.

The Optimizing Pipeline Performance MSDN article has two great examples of how to use some of the Microsoft.BizTalk.Streaming.dll classes. The Execute method of the first example looks something like below.

public IBaseMessage Execute(IPipelineContext context, IBaseMessage message)
{
    try
    {
        ...
        IBaseMessageContext messageContext = message.Context;
        if (string.IsNullOrEmpty(xPath) && string.IsNullOrEmpty(propertyValue))
        {
            throw new ArgumentException(...);
        }
        IBaseMessagePart bodyPart = message.BodyPart;
        Stream inboundStream = bodyPart.GetOriginalDataStream();
        VirtualStream virtualStream = new VirtualStream(bufferSize, thresholdSize);
        ReadOnlySeekableStream readOnlySeekableStream = new ReadOnlySeekableStream(inboundStream, virtualStream, bufferSize);
        XmlTextReader xmlTextReader = new XmlTextReader(readOnlySeekableStream);
        XPathCollection xPathCollection = new XPathCollection();
        XPathReader xPathReader = new XPathReader(xmlTextReader, xPathCollection);
        xPathCollection.Add(xPath);
        bool ok = false;
        while (xPathReader.ReadUntilMatch())
        {
            if (xPathReader.Match(0) && !ok)
            {
                propertyValue = xPathReader.ReadString();
                messageContext.Promote(propertyName, propertyNamespace, propertyValue);
                ok = true;
            }
        }
        readOnlySeekableStream.Position = 0;
        bodyPart.Data = readOnlySeekableStream;
    }
    catch (Exception ex)
    {
        if (message != null)
        {
            message.SetErrorInfo(ex);
        }
        ...
        // Rethrow without resetting the stack trace.
        throw;
    }
    return message;
}

We used this example as a base when developing something very similar in a recent project. At first everything worked fine, but after a while we started getting an error saying:

Cannot access a disposed object. Object name: DataReader

It took us a while to figure out the real problem here: everything worked fine when sending in simple messages, but as soon as we used the code in a pipeline where we also debatched messages we got the "disposed object" problem.

It turns out that when we debatched messages the Execute method of the custom pipeline component ran multiple times, once for each sub-message. This forced the .NET Garbage Collector to run.

The GC found the XmlTextReader that we used to read the stream to be unreferenced and decided to destroy it.

The problem is that this also disposes the readOnlySeekableStream that we connected to our message data object!

It’s then the BizTalk End Point Manager (EPM) that throws the error as it hits a disposed stream object when trying to read the message body and save it to the BizTalkMsgBox!

ResourceTracker to the rescue!

It turns out that the BizTalk pipeline context object has a nice little class connected to it called the ResourceTracker. This object has an "AddResource" method that makes it possible to add an object so that the context holds a reference to it - and that keeps the GC from disposing it!

So by adding the line below before the end of the Execute method everything works fine - even when debatching messages!

context.ResourceTracker.AddResource(xmlTextReader);
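
In our case that meant adding it at the very end of Execute, right before returning the message - roughly like the trimmed-down sketch below (based on the MSDN example above):

// Rewind the stream and hand it back to the message body part.
readOnlySeekableStream.Position = 0;
bodyPart.Data = readOnlySeekableStream;

// Let the pipeline context keep a reference to the reader (and thereby the
// wrapped stream) so the GC doesn't collect it between debatched sub-messages.
context.ResourceTracker.AddResource(xmlTextReader);

return message;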

Checking if BizTalk binding file is up-to-date during deployment

   

As all of you know, the number one time-consuming task in BizTalk is deployment. How many times have you worked your way through the steps below (and even more interesting - how much time have you spent on them …)?

  • Build

  • Create application

  • Deploy schemas

  • Deploy transformations

  • Deploy orchestration

  • Deploy components

  • Deploy pipelines

  • Deploy web services

  • Create the external databases

  • Change config settings

  • GAC libraries

  • Apply bindings on applications

  • Bounce the host instances

  • Send test messages

  • Etc, etc …

Not only is this time consuming, it's also drop-dead boring and therefore very error-prone - small mistakes take ages to find and fix.

The good news is however that the steps are quite easy to script. We use a combination of a couple of different open-source MsBuild libraries (like this and this) and have created our own little build framework. There is also the BizTalk Deployment Framework by Scott Colescott and Thomas F. Abraham, which looks great and is very similar to what we have (ok, ok, it's a bit more polished …).

Binding files problem

Keeping the binding files in a source control system is of course a super-important part of the whole build concept. If something goes wrong and you need to roll back, or even rebuild the whole solution, having the right version of the binding file is critical.

A problem is however that if someone has made changes to the configuration via the administration console and missed exporting these bindings to source control, we'll deploy an old version of the bindings when redeploying. This can be a huge problem when, for example, addresses have changed on ports and we redeploy old configurations.

What!? I fixed that configuration issue in production last week and now it's back …

So how can we ensure that the binding file is up-to-date when deploying?

One solution is to do an export of the current binding file and compare that to the one we're about to deploy in a "pre-deploy" step using a custom MsBuild target.

Custom build task

Custom build tasks in MsBuild are easy to write; a good explanation of how to write one can be found here. The custom task below does the following.

  1. Requires a path to the binding file being deployed.

  2. Requires a path to the old, already deployed binding file to compare against.

  3. Uses a regular expression to strip out the timestamp in the files, as this is the time the file was exported and would otherwise differ between the files.

  4. Compares the content of the files and returns a boolean saying whether they are equal or not.

     using System.IO;
     using System.Text.RegularExpressions;
     using Microsoft.Build.Framework;
     using Microsoft.Build.Utilities;

     public class CompareBindingFiles : Task
     {
         string _bindingFileToInstallPath;
         string _bindingFileDeployedPath;
         bool _value = false; 
    
         [Required]
         public string BindingFileToInstallPath
         {
             get { return _bindingFileToInstallPath; }
             set { _bindingFileToInstallPath = value; }
         } 
    
         [Required]
         public string BindingFileDeployedPath
         {
             get { return _bindingFileDeployedPath; }
             set { _bindingFileDeployedPath = value; }
         } 
    
         [Output]
         public bool Value
         {
             get { return _value; }
             set { _value = value; } 
    
         } 
    
         public override bool Execute()
         {
             _value = GetStrippedXmlContent(_bindingFileDeployedPath).Equals(GetStrippedXmlContent(_bindingFileToInstallPath));
             return true; //successful
         } 
    
         private string GetStrippedXmlContent(string path)
         {
             // Read the binding file and strip the export timestamp so that two
             // exports of the same configuration compare as equal.
             using (StreamReader reader = new StreamReader(path))
             {
                 string content = reader.ReadToEnd();
                 Regex pattern = new Regex("<Timestamp>.*</Timestamp>");
                 return pattern.Replace(content, string.Empty);
             }
         }
    
     }
    

Using the build task in MsBuild

After compiling the task above we have to reference the dll in a UsingTask element like below.

<UsingTask AssemblyFile="My.Shared.MSBuildTasks.dll" TaskName="My.Shared.MSBuildTasks.CompareBindingFiles"/>

We can then do the following in our build script!

<!--
    This target will export the current binding file, save it as a temporary binding file and use a custom task to compare the exported file against the one we're about to deploy.
    A boolean value will be returned as IsValidBindingFile, telling us if they are equal or not.
-->
<Target Name="IsValidBindingFile" Condition="$(ApplicationExists)=='True'">
    <Message Text="Comparing binding file to the one deployed"/> 

    <Exec Command='BTSTask ExportBindings /ApplicationName:$(ApplicationName) "/Destination:Temp_$(BindingFile)"'/> 

    <CompareBindingFiles BindingFileToInstallPath="$(BindingFile)"
                         BindingFileDeployedPath="Temp_$(BindingFile)">
        <Output TaskParameter="Value" PropertyName="IsValidBindingFile" />
    </CompareBindingFiles> 

    <Message Text="Binding files is equal: $(IsValidBindingFile)" /> 

</Target>

<!--
    This pre-build step runs only if the application exists from before. If so, it will check if the binding file we are trying to deploy is equal to the one deployed. If not, this step will break the build.
-->
<Target Name="PreBuild" Condition="$(ApplicationExists)=='True'" DependsOnTargets="ApplicationExists;IsValidBindingFile">
    <!--We'll break the build if the deployed binding file doesn't match the one being deployed-->
    <Error Condition="$(IsValidBindingFile) == 'False'" Text="Binding file is not equal to the deployed one" />

<!--All other pre-build steps go here-->

</Target>

So we now break the build if the binding file being deployed isn't up-to-date!

This is far from rocket science but can potentially save you from making some stupid mistakes.
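
For reference, a check like this is typically just kicked off as part of the build with MSBuild from the command line, something like the call below (the project file name and property values are of course only placeholders for your own build script):

msbuild BizTalkDeploy.proj /t:PreBuild /p:ApplicationName=MyApplication /p:BindingFile=MyApplication.BindingInfo.xml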

BAM tracking data not moved to BAM Archive database

   

There are a few really good blog posts that explain BAM - like this one from Saravana Kumar and this one by Andy Morrison. They both do a great job explaining the complete BAM process in detail.

This post will however focus on some details in the last step of the process that has to do with archiving the data. Let’s start with a quick walk-through of the whole process.

BAM Tracking Data lifecycle


  1. The tracked data is intercepted in the BizTalk process and written to the “BizTalk MsgBox” database.

  2. The TDDS service reads the messages and moves them to the correct table in the “BAM Primary Import” database.

  3. The SSIS package for the current BAM activity has to be triggered (this is a manual job or something that one has to schedule). When executed, the job will do a couple of things.

    1. Create a new partitioned table with a name that is a combination of the active table name and a GUID.

    2. Move data from the active table to this new table. The whole point is of course to keep the active table as small and efficient as possible for writing new data to.

    3. Add the newly created table to a database view definition. It is this view we can then use to read all tracked data (including data from the active and partitioned tables) - see the sketch just after this list.

    4. Read from the “BAM Metadata Activities” table to find out the configured time to keep data in the BAM Primary Import database. This value is called the “online window”.

    5. Move data that is older than the online window to the “BAM Archive” database (or delete it if you have that option).
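
To make step 3 a bit more concrete: all tracked data for an activity ends up readable through a view in the BAM Primary Import database. As far as I know the view for an activity called "SimpleTracking" follows the naming pattern bam_SimpleTracking_AllInstances, and a quick way to peek at the data could look like the sketch below (connection string, view name and the LastModified column are assumptions - adjust them to your own environment):

using System;
using System.Data.SqlClient;

class PeekAtBamView
{
    static void Main()
    {
        // Assumed connection string and view name for the "SimpleTracking" activity.
        using (SqlConnection connection = new SqlConnection(
            "Data Source=.;Initial Catalog=BAMPrimaryImport;Integrated Security=SSPI"))
        {
            connection.Open();

            // The view unions the active table with the partitioned tables,
            // so this returns the tracked instances still in the online window.
            SqlCommand command = new SqlCommand(
                "SELECT * FROM dbo.bam_SimpleTracking_AllInstances", connection);

            using (SqlDataReader reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    // LastModified is an assumed column name.
                    Console.WriteLine(reader["LastModified"]);
                }
            }
        }
    }
}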

Sounds simple, doesn't it? I was however surprised to see that my data was not moved to the BAM Archive database, even though it was clearly outside of the configured online window.

So, what data is moved to the BAM Archive database then?

In my case there is a deployed tracking activity called "SimpleTracking" with an online window of 7 days. Ergo, all data that is older than 7 days should be moved to the BAM Archive database when we run the SSIS job for the activity.


If we then look at the “BAM Completed” table for this activity we see that all the data is much older than 7 days as today's date is “13-11-2009”.

So if we run the SSIS job these rows should be moved to the archive database, right? Let's run the SSIS job.


But when we execute the SSIS job the BAM Archive database is still empty! All we see are the partitioned tables that were created as part of the first steps of the SSIS job. All data from the active table is however moved to the new partitioned table but not moved to the Archive database.


It turns out that the SSIS job does not look at the "Last Modified" value of each row at all, but at the "Creation Time" of the partitioned table in the "BAM Metadata Partitions" table.


The idea behind this is of course to avoid having to read through potentially huge tables to find the rows that should be moved. But it also means that it will take another 7 days before the data in the partitioned tables is actually moved to the archive database.

This might actually be a problem if you have not scheduled the SSIS job to run from day one, your BAM Primary Import database is starting to get too big, and you quickly have to move data over to archiving. All you then have to do is of course to change the "Creation Time" value in the BAM Metadata Partitions table so it is outside of the online window for the activity.
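
If you end up in that situation, the tweak could look something like the sketch below. This is only a sketch: the table and column names (bam_Metadata_Partitions, CreationTime, ActivityName) are assumptions about the BAM metadata layout, so verify them against your own BAM Primary Import database before running anything like this.

using System;
using System.Data.SqlClient;

class PushPartitionsOutsideOnlineWindow
{
    static void Main()
    {
        // Back-date the partition creation time so that the next SSIS run
        // considers the partitions old enough to move to the BAM Archive database.
        using (SqlConnection connection = new SqlConnection(
            "Data Source=.;Initial Catalog=BAMPrimaryImport;Integrated Security=SSPI"))
        {
            connection.Open();

            // Push the partitions for the activity 8 days back, i.e. outside
            // the 7 day online window used in the example above.
            SqlCommand command = new SqlCommand(
                "UPDATE dbo.bam_Metadata_Partitions " +
                "SET CreationTime = DATEADD(day, -8, CreationTime) " +
                "WHERE ActivityName = @activity", connection);
            command.Parameters.AddWithValue("@activity", "SimpleTracking");

            Console.WriteLine("{0} partition row(s) back-dated.", command.ExecuteNonQuery());
        }
    }
}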

What kind of integration patterns do you implement in BizTalk?

   

I like to divide BizTalk based integrations into three different patterns.

  1. Point-to-point integration

  2. Broker-based integration

  3. ESB-based integration

I am sure someone could come up with fancier names, but in this post I will try to dig into each of them and explain what I mean by them. I will also try to highlight some issues with each of the patterns.

Point-to-point integration

This is where a lot of us started. The idea is that the sending system has information and knowledge about the receiving system.

This usually means that the messages are exported in a message format that is tailored to the particular receiving system's needs. It also means that we usually get one integration process for each receiving system.

Take for example a sending system that sends information about invoices to two different receiving systems ("System A" and "System B"). Using a point-to-point pattern in this scenario we end up with two different types of exported messages, each tailored to the format of its receiving system.

Issues

The main problem with this kind of point-to-point integration, where the sending system has detailed knowledge about the receiving systems, is that this knowledge translates into a coupling between the systems. When we tie the export message format from the sending system to the format of the receiving system, all changes in the receiving system will also cause changes in the sending system.

Suppose there is a sudden need to change the format of the receiving system - as we use that format as our export format we now also have to change it there.

Another problem is the lack of agility. In this invoice scenario every other system that also needs invoice-based information has to get it by going all the way back to the sending system and developing a whole new specific integration - separate from the existing ones.

Broker-based integration

In the broker-based scenario the integration platform is used as a broker of messages.

This means that only one canonical format is exported from the sending system. The broker then handles the routing to the different systems and the transformation to the message formats that the receiving systems expect.

The main advantage of this approach - where the sending system does not know anything about the receiving systems - over the point-to-point pattern is agility.

If there now is a need for invoice information in a third, new system, we do not have to change the export from the sending system (as long as we have all the information we need, that is) or develop a whole new integration. All we have to do is route the invoices so that they are also sent to the third system and transformed into the format that the system expects. A new BizTalk map and port and we are done!

Issues

From my point of view this approach to integration has a lot of advantages over the point-to-point integration previously discussed. And in a lot of simpler, stable scenarios it works just fine and is the way to go.

But in some scenarios it kind of breaks down and becomes hard to work with. The problems are related to configuration and how we define the routing information in BizTalk. In BizTalk we can either create an orchestration and "hardcode" the process that defines which systems the messages should be sent to, or we can create a "messaging" scenario where we configure this routing information in the different BizTalk port artifacts by setting up filters.

Regardless of whether we choose a "messaging" or an orchestration-based solution, the routing information becomes hard to update as the solution grows in size. We either get very complicated orchestrations or loads of ports to maintain.

Furthermore, the whole process is very coupled to the one canonical schema, which makes versioning and updates hard. If the canonical schema needs to be updated and we still need to be able to send information using the old schema (so we have a "version 1" and a "version 2" of the schema side by side), all artifacts need to be more or less duplicated and the complexity of the solution grows fast.

This makes business agility hard, and any change takes a long time to develop and deploy correctly.

I guess these issues have more to do with how BizTalk works than with the integration pattern itself - but this is a BizTalk blog!

ESB-based integration

ESB-based integration in BizTalk is today achieved using the ESB Toolkit from Microsoft.

One of the ideas of ESB-based integration is that the integration platform should not have hardcoded routing information. These routing rules should either be looked up at run time or travel with the actual payload of the message (called “dynamic routing”).

This, in combination with having generic on- and off-ramps instead of loads of separate ports to maintain, promises to create more agile and configurable solutions (called "Loosely coupled service composition" and "Endpoint run-time discovery and virtualization").

Consider again the invoice scenario used previously. In this case we export an "Invoice 1.0" format from the sending system. The format is not directly bound to a schema in the port (as we are used to in BizTalk); it is basically possible to send any message format to the on-ramp.

The on-ramp then identifies what kind of message it received (for example using the XML-namespace) and looks up the routing rules for that specific message (the message “itinerary”) in a repository.

In this scenario there could be rules to route the message to an orchestration or directly to a send port. All the configuration for this port is however defined as part of the itinerary and applied at run-time on the generic off-ramp port. And as itineraries are simple XML documents they are super light to update and deploy in the repository!

So if we now wanted to update the "Invoice" format to "version 2.0", all we would have to do is create a new XML itinerary for the "Invoice 2.0" message type and possibly a new orchestration to handle the new message type. No new ports, no new bindings etc, etc. The configuration and deployment would be a lot simpler than before! We would end up with a lot fewer artifacts to maintain.

And the itinerary for "Invoice 1.0" would still be applied to all incoming Invoice 1.0 messages. Thus we have achieved what used to be so hard with a lot less development effort!

Issues

With agility and dynamic capabilities comes complexity in debugging when something bad happens … I also feel that we give up a lot of the validation and control capabilities that we had in the more schema-coupled patterns previously discussed.

I am new to this ESB-based model and I would love to hear your experiences, problems, stories etc!

I am however sure that this is the way to go, and I would not be surprised if ESB-like patterns play an ever bigger role in future BizTalk versions!