BizTalk Server 2004 opens up a whole new world, full of new concepts, principles and opportunities, waiting to be discovered. Before we get started, I would like to call for your participation in the BizTalk Server community. If you feel like sharing your knowledge, your findings or your methodologies, please do so! It will not only encourage further activity in this community and help others; it also gives you an opportunity to validate your knowledge in any area you would like to open for discussion…
BizTalk Server 2004 has just been released and a lot of people are planning to hit the road with it. While BizTalk Server 2004 solves many business problems, one issue with the software is the learning curve one has to go through to use the product to its maximum extent. Because the new version is completely rearchitected and has many new features, it is difficult to compare previous versions of BizTalk Server to BizTalk Server 2004. So, before you read on, clear your mind and forget about BizTalk Server 2002! It’ll help you, believe me.
In this paper, my goal is to provide you with a (relatively) short overview of the core workings of this brand new server. More specifically, I’ll focus on the messaging features it implements. Along the way, I’ll touch on topics such as the MessageBox, pipelines, property promotion, and so on. With this in mind, don’t expect a complete feature overview or an in-depth discussion of BizTalk Server 2004 technologies such as human workflow, SSO, the mapper, flat files, or even orchestration.
To make my life a little easier (at least for writing this paper), I made the following assumptions:
You are familiar with .NET, as well as with XML and XML schemas
You work with Visual Studio .NET
You know what EAI (Enterprise Application Integration) and ‘orchestration’ are about
You are willing to forget about any previous versions of BizTalk Server ;-)
Read on and discover what makes up the core infrastructure of the server and how to use it.
BizTalk Server 2004 covers an enormously broad range of technologies and features. For those of you who are completely new to EAI and BPM (Business Process Management), some context may clarify the ideas behind messaging and orchestration. In addition, this section will attempt to situate the concept of “messaging”.
If you’re familiar with basic EAI principles like loose coupling, pub/sub and orchestration, you may safely skip the following sections.
When integrating applications, “loose coupling” is a tremendously cool word! Let’s look into this: what is loose coupling really about? If two applications need to establish some form of communication with each other, this can be done in a thousand ways.
An example: application A needs information from application B, which was developed prior to A. This means B isn’t even aware of A’s existence. Luckily, the developers of B have exposed an API to integrate with. Application A could thus call B’s API to gather the information it needs. No problem, works fine, life’s easy, isn’t it?
Not always! Integrating applications usually is about making assumptions. And in the case of our example: we’ve made a whole lot of assumptions!! A few things we assumed:
B is running while A makes the call
B lives on the same box A does
B runs on the same platform as A does
B was programmed in the same language as A is
A will never need another application to provide the information
Most of these seem obvious, don’t they? And still there’s a problem here! Imagine the following scenarios for a moment:
What if we ever want to replace B by another, way better, application?
What if we want to scale-out B and move it to another box or cluster it?
What if application B is down while A needs it?
Because A makes quite a few assumptions about B, we’re in serious trouble now… Loose coupling of applications is exactly about preventing these kinds of problems! The fewer assumptions an application makes when integrating, the more loosely coupled it is.
Messaging is by far the most popular (and obvious) way to reduce coupling between applications. Messaging relies on middleware (for example, Microsoft Message Queuing or IBM WebSphere MQ) to transmit chunks of data (“messages”) from one application to another. Usually this is done using “message queues”: a kind of “channel” that:
On its input side, receives messages
On its output side, delivers messages (or, to be more exact, enables the receiving application to pull messages from it)
The nice thing about queuing is that it works asynchronously. This means that the sending application can put messages in the queue while the receiving application pulls them out of the queue as soon as it has time to do so (for example, when it has finished processing the previous message, or when it has been down and is brought up again).
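To make the asynchronous behaviour concrete, here is a minimal in-memory sketch. The MessageQueue class is hypothetical and only mimics the two sides of a queue; real middleware such as MSMQ or WebSphere MQ also persists messages and works across processes and machines, which this toy does not.

```python
from collections import deque


class MessageQueue:
    """In-memory stand-in for a message queue (illustration only: real
    middleware such as MSMQ or WebSphere MQ also persists the messages)."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        # Input side: the sender drops the message and carries on immediately.
        self._messages.append(message)

    def receive(self):
        # Output side: the receiver pulls the oldest message, or None if empty.
        return self._messages.popleft() if self._messages else None


order_queue = MessageQueue()

# Application A sends three orders while application B is down.
for order_id in (1, 2, 3):
    order_queue.send({"order_id": order_id})

# Later, B comes back up and works through the backlog at its own pace.
processed = []
while (message := order_queue.receive()) is not None:
    processed.append(message["order_id"])

print(processed)  # [1, 2, 3]
```

Note that A never waits for B: the queue absorbs the difference in timing, which is exactly the decoupling described above.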
Obviously, not all applications know how to integrate with queues. In most scenarios there will be some “adapter” converting the application’s output or input, to or from messages.
So, how does messaging reduce coupling?
The two applications don’t need to be running at the same time
Messaging middleware is not coupled to a single box
Messaging middleware is (in some cases) not coupled to a particular OS (WebSphere MQ for example)
Messaging is unaware of the programming language used to develop an application
Queues can easily be rerouted when applications are moved; this is a matter of configuration
In addition to reducing coupling, messaging makes solutions very scalable. Because the queues act as a load buffer, each application can work at its own speed.
Point to point, heading to spaghetti
We have now seen that messaging can help us reduce coupling between applications. This is a good thing but… often not enough! Imagine your:
Website talks to your order application
Order application talks to your stock application
Order application talks to an application checking credits
Your stock application talks to your supplier’s infrastructure
See what I mean? Even if every single communication is established using messaging, this would not be a good situation! This “spaghetti” kind of integration is likely to become unmanageable very quickly. Having a lot of point-to-point connections acting separately from each other is not only unmanageable; it also makes “business overviews” impossible. What do I mean by “business overviews” (sorry, I couldn’t come up with a better term for this, except for the buzzwords BAS, BAM …)?
If you succeed in connecting each and every system in a whole ordering process, you’d expect to be able to make queries like: “How many red chairs, shipped to Europe, were shipped within the first week of ordering?” This requires some “business overview” that goes beyond point-to-point communications.
Conclusion: spaghetti is just a very bad thing; it should be your enemy number one when integrating!
To overcome the problems related to having point to point connections, orchestration was invented. To be honest, in my opinion BizTalk Orchestration is actually one of the most exciting features in the server! So, what is orchestration – which is the core part of the Business Process Management (BPM) – about?
Instead of connecting applications directly to each other in a spaghetti-like way, without considering the bigger picture, orchestration enables you to define the message flow between multiple applications. This message flow can consist of:
For example, it could look like this:
“Start a new orchestration when an order message comes in. Check the customer’s creditworthiness by sending a request/response message to the company’s Order Policy web service. If the customer is considered reliable, check whether each of the products is in stock by sending a request to the Stock Management web service. Then send an invoice message to the Invoice service and wait until an InvoicePaid message is returned, notifying that the customer has paid. At the very end, send the ShipProduct message to the Shipping service.”
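The flow above can be sketched as plain sequential code. This is only an analogy: a BizTalk orchestration is designed graphically and compiled, not written like this, and the service functions below (credit_check, in_stock) are hypothetical stand-ins for the web services named in the example. The invoice_paid flag replaces the real wait for an InvoicePaid message, which in practice could take days.

```python
def credit_check(customer):
    # Stand-in for the Order Policy web service (hypothetical rule).
    return customer != "bad-payer"


def in_stock(product):
    # Stand-in for the Stock Management web service (hypothetical stock list).
    return product in {"red chair", "blue table"}


def run_order_orchestration(order, invoice_paid, log):
    """Walk through the order flow step by step and return the final state."""
    log.append("order received")
    if not credit_check(order["customer"]):
        log.append("rejected: credit check failed")
        return "rejected"
    if not all(in_stock(product) for product in order["products"]):
        log.append("rejected: product out of stock")
        return "rejected"
    log.append("Invoice sent")
    if not invoice_paid:
        # A real orchestration would wait here until InvoicePaid arrives.
        log.append("waiting for InvoicePaid")
        return "waiting"
    log.append("InvoicePaid received")
    log.append("ShipProduct sent")
    return "shipped"


log = []
state = run_order_orchestration(
    {"customer": "acme", "products": ["red chair"]}, invoice_paid=True, log=log)
print(state, log)
```

The point of the sketch is that the whole conversation lives in one place, instead of being spread over point-to-point links between the individual applications.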
By defining the flow between multiple applications, the message’s life can be tracked, which enables building those “business overviews” I talked about, more formally called BAM (Business Activity Monitoring).
At this point, we have discussed how messaging and orchestration can make your life easier, but do they solve all problems? Assume an application A communicating with application B using messaging. This means that A and B both have to agree on the message queue to use. This is a piece of knowledge shared by app A and app B, and it causes trouble… What, for example, if we don’t know in advance where to route a message? (A may produce messages that are of interest to both app B and app C, for example, depending on the message’s context.)
When purely using queuing, we are basically hard-wiring connections between applications, which tends to grow into an unmanageable mess! So, somehow, we should try to remove the knowledge shared by app A and app B, in order to further reduce the coupling between them. Let’s finally get to the point: this is exactly what publish/subscribe is about!
In the publish/subscribe pattern, two parties communicate without any shared knowledge! (At least, no shared knowledge is required besides an agreement on the message format.) These two parties are often referred to as the producer and the consumer or – more appropriately in the context of publish/subscribe – the publisher and the subscriber.
The publisher publishes its output to the pub/sub engine using messaging, while the subscriber subscribes with the engine using a filter. The filter, based upon message content criteria, specifies which messages are of interest to the subscriber. But we’ll talk about that later :-)
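Here is a minimal in-memory sketch of a pub/sub engine in Python. The class and subscriber names (PubSubEngine, B, C) are hypothetical, and filters are plain Python predicates. This only illustrates the pattern; it is not how BizTalk Server stores or evaluates subscriptions.

```python
class PubSubEngine:
    """Toy publish/subscribe engine: publishers never name a receiver."""

    def __init__(self):
        self._subscriptions = []   # (subscriber name, filter predicate)
        self.delivered = {}        # subscriber name -> messages received

    def subscribe(self, name, message_filter):
        self._subscriptions.append((name, message_filter))
        self.delivered.setdefault(name, [])

    def publish(self, message):
        # The publisher does not know who (if anyone) will receive the message;
        # the engine checks every subscriber's filter instead.
        for name, message_filter in self._subscriptions:
            if message_filter(message):
                self.delivered[name].append(message)


engine = PubSubEngine()
engine.subscribe("B", lambda m: m["region"] == "Europe")
engine.subscribe("C", lambda m: m["amount"] > 1000)

engine.publish({"region": "Europe", "amount": 50})
engine.publish({"region": "US", "amount": 5000})
print(engine.delivered)
```

Adding a subscriber D later requires no change at all on the publisher’s side, which is the whole point of the pattern.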
Anyway, I hope this gave you a short introduction to the why and where of the messaging pattern. The following sections discuss how these principles are used within Microsoft BizTalk Server 2004.
A Message is Born
Let’s make things easy…
Let’s simplify things from now on, introducing and discussing some concepts step by step. BizTalk Server is about messages, right? To put it very simply: receiving and sending messages… (Which is really, really simplified, but hey, we have to start somewhere!) Most of them will probably be some form of XML. (However, it’s perfectly possible to work with binary message formats as well!)
Unless you’ve lived your life on Mars the past few years, you know that XML message formats nowadays are usually described using the well known, well documented, proven and standardised XML Schema language. Unlike its predecessor, BizTalk Server 2004 has full support for XML Schema.
To do some useful things in BizTalk Server 2004, you need to tell BizTalk Server what kind of data you are going to pass and what structure the data has. When you do this, BizTalk Server can, based upon that information:
Make routing decisions
Make flow decisions in orchestrations
Transform a message between two different data structures, and so on.
As you can see, almost every feature relies on some knowledge about the structure of data. So in some way you have to pass this info to BizTalk Server!
As stated before, XML Schema is used to describe message structures. When you install BizTalk Server on a box with Visual Studio .NET installed, the installation provides you with a second XML schema editor inside Visual Studio .NET. This may seem strange. I mean, “Why not leverage the existing Visual Studio .NET schema editor?” Good question! In addition to the features Visual Studio .NET has by default, the BizTalk Editor:
Enables you to define flat file structures
Prevents you from creating invalid schemas
Gives you a convenient, graphical, user interface (not text-based), reducing the required technical level to write valid schemas
Supports BizTalk Server 2004 specific concepts like:
Now, how do you create a schema and let BizTalk Server know it’s there?
This is very well documented in the product documentation, but in short: create a BizTalk Server 2004 project and add an item of type “Schema”. Then you can create the schema using BizTalk Editor. When you save it, this results in an “.xsd” file on your file system. However, this file is not what will be used by BizTalk Server.
To introduce the schema into BizTalk Server, the project has to be compiled and deployed. Compilation turns it into a .NET DLL, while deployment introduces the schema into the server. Upon deployment, two important things happen:
The schema DLL is placed in the global assembly cache
BizTalk Server takes a look at the schema, extracts the most important information and places this information in its configuration database (BizTalkMgmtDb). Among that info is:
The schema’s target namespace and root element name. (You’ll see later on why this is important.)
The assembly’s name and the namespace and type of the schema class
Whether it is a flat file, XML schema or property schema, and so on
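BizTalk Server combines the target namespace and the root element name into a message type of the form namespace#root, which is how an incoming XML message is matched to a deployed schema. The sketch below shows that idea in Python. The deployed_schemas registry and the class name MyApp.Schemas.Order are hypothetical, and the handling of messages without a namespace is a simplification of this sketch only.

```python
import xml.etree.ElementTree as ET


def message_type(xml_text):
    """Return 'targetNamespace#rootElement' for an XML message (sketch)."""
    root = ET.fromstring(xml_text)
    # ElementTree reports namespaced tags as '{namespace}local'.
    if root.tag.startswith("{"):
        namespace, local_name = root.tag[1:].split("}", 1)
    else:
        namespace, local_name = "", root.tag   # simplification for this sketch
    return f"{namespace}#{local_name}"


# Hypothetical registry filled at deployment time: message type -> schema class.
deployed_schemas = {
    "mycompany:myapp:mymsgschema.xsd#MyOrderElement": "MyApp.Schemas.Order",
}

msg = ('<MyOrderElement xmlns="mycompany:myapp:mymsgschema.xsd">'
       '<ProductIdentification>20</ProductIdentification>'
       '</MyOrderElement>')
mt = message_type(msg)
schema_class = deployed_schemas.get(mt)
print(mt, schema_class)
```

This also shows why two deployed schemas with the same target namespace and root element name cause ambiguity: they produce the same key.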
Ok, now we can make and introduce XML schemas into BizTalk Server. But we're not done yet. BizTalk Server still does not invent messages by itself, they have to originate somewhere. The following section takes a look at how to receive messages.
When you want BizTalk Server to receive data, you obviously have to tell it where to receive messages from. (So far, I am not aware of any paranormal features in the product.) Telling BizTalk Server where to receive messages is done by specifying a “receive port”. Each receive port can have multiple “receive locations” associated with it. Each receive location is basically a combination of:
A transport protocol - for example, FILE, BizTalk message queuing, MQSeries, FTP, SOAP, HTTP, SQL, and so on.
The transport protocol’s specific properties that specify from where exactly to get messages. For the FILE protocol, for instance, this is the directory from where to receive files. For the FTP protocol this would be a server name, a user name, a password, and so on.
(There’s more to configure but, as we are simplifying things, this is all you need to know.)
When you create receive ports and receive locations, give them names that make their purpose recognisable, because you will refer to them by name.
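As a mental model, a receive port is simply a named group of receive locations, each pairing a transport with transport-specific settings. The Python dataclasses below are a hypothetical sketch of that structure, not BizTalk’s configuration format. The port and location names, directory, FTP server and credentials are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ReceiveLocation:
    name: str
    transport: str      # e.g. "FILE", "FTP", "HTTP", "SOAP"
    settings: dict      # transport-specific: where exactly to get messages


@dataclass
class ReceivePort:
    name: str
    locations: list = field(default_factory=list)


# One receive port fed by two receive locations with recognisable names.
orders_in = ReceivePort("ReceiveOrders")
orders_in.locations.append(ReceiveLocation(
    "ReceiveOrders_File", "FILE",
    {"directory": r"C:\Drop\Orders", "file_mask": "*.xml"}))
orders_in.locations.append(ReceiveLocation(
    "ReceiveOrders_FTP", "FTP",
    {"server": "ftp.example.com", "user": "orders", "password": "secret"}))

for location in orders_in.locations:
    print(orders_in.name, location.name, location.transport)
```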
Ok, fine, but how do you do this? At development time (let’s keep deployment discussions out of scope) BizTalk Server 2004 relies on Visual Studio .NET for this kind of configuration. In particular, a new tool called “BizTalk Explorer” is designed to do this. After you create a receive port and an associated receive location, the tool gives you the option of “starting” and “stopping” the location. Check the documentation for more information.
There’s more: a word on pipelines and normalisation
Let’s take a short look at the pipeline processing model now. When messages come in or are sent out of BizTalk Server, some pre-processing can happen. That processing is done in so called “pipelines”. Pipelines are implemented as .NET components. (To create them, use a “pipeline” item in a VS.NET BizTalk Server project.)
Pre-processing messages is useful in many scenarios, for example in:
Signing and verifying messages
Putting additional context on the message
Splitting a bulk message into smaller parts
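The sketch below models a pipeline as an ordered list of stages, where each stage may turn one message into zero, one or many messages. It is deliberately not the BizTalk pipeline component API (real components are .NET classes placed in fixed pipeline stages). The message shape (a body plus a context dictionary) and the property names BatchIndex and OrderID are assumptions for illustration.

```python
def split_batch(message):
    """Disassembling stage: one batch in, one message per order out."""
    return [
        {"body": order, "context": dict(message["context"], BatchIndex=index)}
        for index, order in enumerate(message["body"]["orders"])
    ]


def add_order_id_to_context(message):
    """Stage that puts additional context on the message; the body is untouched."""
    message["context"]["OrderID"] = message["body"]["id"]
    return [message]


def run_pipeline(message, stages):
    messages = [message]
    for stage in stages:
        # Each stage may turn one message into zero, one or many messages.
        messages = [out for msg in messages for out in stage(msg)]
    return messages


batch = {"body": {"orders": [{"id": 101}, {"id": 102}, {"id": 103}]},
         "context": {"ReceivedFileName": "batch.xml"}}
results = run_pipeline(batch, [split_batch, add_order_id_to_context])
for result in results:
    print(result["context"])
```

Each split message gets its own copy of the context, so later stages can enrich one message without affecting its siblings.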
For now it suffices to say that BizTalk Server offers the option of associating pipelines with receive ports as well as send ports (which will be discussed later on). In addition to pipelines, both receive and send ports can be configured to do message normalisation as well.
Normalising a message is useful, for example, when you expect 5 different message formats to come in, while all of them are destined for the same business process. Normalisation in this case takes care of transforming each message to a generic internal format, understood by the business process. (On the outbound side, the opposite can be configured.)
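In BizTalk, this is done with maps configured on the port. The Python sketch below only shows the idea: one transformation per incoming format, all producing the same internal structure. Both external formats, their field names and the message-type keys are hypothetical.

```python
def from_webshop(message):
    # Hypothetical external format 1: amount as a decimal string.
    return {"customer": message["buyer"], "total": float(message["sum"])}


def from_legacy(message):
    # Hypothetical external format 2: upper-case fields, amount in cents.
    return {"customer": message["CUSTNAME"], "total": message["AMOUNT_CENTS"] / 100}


# Inbound maps keyed by (hypothetical) message type; all of them produce
# the same internal order format expected by the business process.
INBOUND_MAPS = {
    "urn:webshop#Order": from_webshop,
    "urn:legacy#PurchaseOrder": from_legacy,
}


def normalise(message_type, message):
    return INBOUND_MAPS[message_type](message)


a = normalise("urn:webshop#Order", {"buyer": "Acme", "sum": "120.50"})
b = normalise("urn:legacy#PurchaseOrder", {"CUSTNAME": "Contoso", "AMOUNT_CENTS": 9900})
print(a, b)
```

The business process only ever sees the internal format, so adding a sixth incoming format means adding one map, not changing the process.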
Where did it go?
Great! We can receive messages now! But hmm… where do they go to? Well, the most straightforward answer is very simple: “the MessageBox”. The effective and more complete answer is also the more complicated one, which I will give an overview of in this section.
In fact, BizTalk Server 2004 can only do one single trick with an incoming message: route it! And when routing it, it only has two options:
Route it directly to an external location using one of its supported protocols
Route it to an orchestration (which can be a new one, but just as well can be a running one)
BizTalk does not make that decision at random, so again, we’ll have to tell it how to make this choice. The keyword here is “subscription”! Some background info will help:
In everything it does, BizTalk builds upon a publish/subscribe architecture. This means that every message that passes the messaging infrastructure of the server (which obviously includes any incoming message), is routed based upon rules that evaluate:
The content of the message (which is any data that can be found inside the message itself)
The context of the message (which, as opposed to “content”, is all information that relates to a message but can’t be found as part of the actual message data; for example, the name of the protocol used to receive the message, the original filename of a message received by the FILE protocol, or the MQ message ID of a message that originated in an MQSeries queue)
Those “rules” are called “subscriptions”. An example of such a subscription could be:
“(ProductID = 20) AND (Price > 5000)”.
If BizTalk is supposed to evaluate subscriptions like “(ProductID = 20) AND (Price > 5000)”, how can it ever know where to find the “ProductID” and “Price” in the message it evaluates? The answer to this one is again quite simple: because you told it! Read on to discover how…
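Assuming the properties have already been extracted from the message, evaluating a subscription is straightforward. The sketch below represents a subscription as a list of (property, operator, value) clauses combined with AND. The tuple format and operator table are assumptions of this sketch (BizTalk stores subscriptions in its own tables), and OR groups are left out for brevity.

```python
import operator

# Comparison operators supported by this sketch.
OPERATORS = {"=": operator.eq, "!=": operator.ne, ">": operator.gt, "<": operator.lt}


def matches(subscription, properties):
    """True if every (property, operator, value) clause holds: AND semantics only."""
    return all(
        name in properties and OPERATORS[op](properties[name], value)
        for name, op, value in subscription
    )


# "(ProductID = 20) AND (Price > 5000)"
subscription = [("ProductID", "=", 20), ("Price", ">", 5000)]

# Properties already extracted from the message content (ProductID, Price)
# and from its context (ReceivedFileName).
properties = {"ProductID": 20, "Price": 7500.0, "ReceivedFileName": "order.xml"}
print(matches(subscription, properties))  # True
```

A message that lacks one of the properties simply does not match, rather than raising an error.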
For each message type you want to receive, you have to deploy a schema describing its structure. Because normal XML Schema schemas don’t offer the ability to ‘mark’ an element as “ProductID”, BizTalk Server 2004 introduces a concept called the “property schema”.
Property schemas, in fact, are nothing more than regular schemas. However:
They are marked as being a ‘property schema’ using an XML Schema “AppInfo” annotation element. (To put it simply, AppInfo annotation elements are used to attach comments to parts of a schema that are meant to be read by programs.)
Directly beneath the schema node, the only allowed construct is “Element”.
Since each “Element” in the property schema has a name and a type, in the end the schema is a list of names with corresponding types. BizTalk Server 2004 treats property schemas as a kind of template for “name-value” collections. (This is a dangerous statement, I’m aware of that. However, up to a point, this makes sense…)
For example, if you want to create subscriptions that contain the labels “OrderID”, “CustomerName” and “TotalAmount”, you have to create a property schema that contains 3 XML Schema elements: OrderID with type xs:int, CustomerName with type xs:string and TotalAmount with type xs:double.
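The “template for name-value collections” idea can be sketched by reading such a property schema and turning it into a name-to-type mapping. This is a simplified illustration: the AppInfo annotation that marks a real property schema is left out, the target namespace urn:myapp:properties is hypothetical, and the type lookup assumes the schema uses the xs prefix.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Simplified property schema; the AppInfo annotation marking it as a
# property schema is left out of this sketch.
PROPERTY_SCHEMA = f"""
<xs:schema xmlns:xs="{XS}" targetNamespace="urn:myapp:properties">
  <xs:element name="OrderID" type="xs:int"/>
  <xs:element name="CustomerName" type="xs:string"/>
  <xs:element name="TotalAmount" type="xs:double"/>
</xs:schema>
"""

# Assumes the schema uses the "xs" prefix for XML Schema types.
XS_TO_PYTHON = {"xs:int": int, "xs:string": str, "xs:double": float}


def load_property_template(xsd_text):
    """Turn a property schema into a {property name: value type} template."""
    schema = ET.fromstring(xsd_text.strip())
    return {
        element.get("name"): XS_TO_PYTHON[element.get("type")]
        for element in schema.findall(f"{{{XS}}}element")  # direct children only
    }


template = load_property_template(PROPERTY_SCHEMA)
print(template)
```

Given the template, a raw promoted value can be converted to the declared type, e.g. `template["TotalAmount"]("12.5")` yields the float 12.5.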
At this point, there’s still no link between the properties your message can contain (defined in the property schema) and the actual message schema. The process of defining this coupling is called “property promotion”. Visual Studio .NET enables you to “link” one or more elements of a message schema to a property in any of the referenced property schemas.
You can access the property promotion dialog by right-clicking the message schema’s schema tree (click “Show Promotions” there). Since it has nothing to do with publish/subscribe, ignore the distinguished field tab. The tab called “Property Fields” is what defines the promotion. Again, using AppInfo annotations (containing XPath expressions) stored in the message schema, its elements are linked to the properties.
In order to use properties, both the property schema and the message schema have to be deployed. Once both are deployed, BizTalk knows where to find the “ProductID” in a message, because the annotations on the message schema tell it that the “ProductID” in this kind of message can be found at the following XPath location: “*[local-name()='MyOrderElement' and namespace-uri()='mycompany:myapp:mymsgschema.xsd']/*[local-name()='ProductIdentification' and namespace-uri()='mycompany:myapp:mymsgschema.xsd']”
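The promotion step can be sketched in Python. Because ElementTree does not support local-name() or namespace-uri() predicates, the XPath above is expressed here as a list of (namespace, local-name) steps starting at the root. The promotion table, the UnitPrice element and the Price property are hypothetical additions for illustration. The result is the kind of property collection that subscriptions are evaluated against.

```python
import xml.etree.ElementTree as ET

NS = "mycompany:myapp:mymsgschema.xsd"

# Hypothetical promotion table: property name -> element path from the root,
# mirroring the XPath in the text as (namespace, local-name) steps.
PROMOTIONS = {
    "ProductID": [(NS, "MyOrderElement"), (NS, "ProductIdentification")],
    "Price": [(NS, "MyOrderElement"), (NS, "UnitPrice")],
}
PROPERTY_TYPES = {"ProductID": int, "Price": float}


def promote(xml_text):
    """Extract promoted properties from a message into a context dictionary."""
    root = ET.fromstring(xml_text)
    context = {}
    for prop, steps in PROMOTIONS.items():
        (root_ns, root_name), rest = steps[0], steps[1:]
        if root.tag != f"{{{root_ns}}}{root_name}":
            continue                      # path does not apply to this message
        node = root
        for ns, name in rest:
            node = node.find(f"{{{ns}}}{name}")
            if node is None:
                break                     # element absent: property not promoted
        if node is not None and node.text is not None:
            context[prop] = PROPERTY_TYPES[prop](node.text.strip())
    return context


MSG = (f'<MyOrderElement xmlns="{NS}">'
       '<ProductIdentification>20</ProductIdentification>'
       '<UnitPrice>7500</UnitPrice>'
       '</MyOrderElement>')
print(promote(MSG))
```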
Ok, nice but… we were discussing how BizTalk decides on where the message is supposed to go to right? Well, each orchestration or external destination location can be associated with a subscription. BizTalk evaluates every subscription the moment a message needs to be routed, makes a copy of the message for each owner of a subscription that matches and routes the copy to its destination.
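The copy-per-matching-subscriber behaviour can be sketched as follows. The subscriber names are hypothetical and filters are plain Python predicates over the message context. In BizTalk itself, a message that matches no subscription at all is treated as a routing failure; this sketch just returns an empty result in that case.

```python
import copy


def route(message, subscriptions):
    """Give every subscriber whose filter matches its own copy of the message."""
    deliveries = {}
    for owner, message_filter in subscriptions.items():
        if message_filter(message["context"]):
            deliveries[owner] = copy.deepcopy(message)
    return deliveries


subscriptions = {
    "SendPort_BigOrders": lambda ctx: ctx.get("Price", 0) > 5000,
    "Orchestration_Product20": lambda ctx: ctx.get("ProductID") == 20,
    "SendPort_Product99": lambda ctx: ctx.get("ProductID") == 99,
}
msg = {"context": {"ProductID": 20, "Price": 7500.0}, "body": "<MyOrderElement/>"}
out = route(msg, subscriptions)
print(sorted(out))
```

Because each owner receives its own copy, an orchestration modifying its message cannot affect what a send port transmits.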
As explained, both orchestrations and “external destination locations” can have subscriptions associated with them. At this point, the only gap in the story is: “How can something ‘own’ a subscription?”, right? Let’s fill that gap by explaining how messages are sent.
Let’s discuss two options when sending messages:
sending messages to an external location, using one of the supported protocols
sending messages to an orchestration
Sending messages to an external destination location
From now on, I’ll refer to an “external destination location” as a “send port”. Send ports are the concept used for all of BizTalk’s outgoing traffic. Remember receive ports, which were nothing more than a grouping of one or more receive locations? Send ports, as opposed to receive ports, are not ‘subdivided’ in any way. (There is no such thing as a “send location”.)
Like a receive location, a send port is a configurable item that gathers information on:
Which protocol to use for sending messages through that port
To which location, specific to the protocol, the message is destined (for example, for the HTTP protocol this would be the server name and URL, whereas for the FILE protocol a directory and filename have to be specified)
The subscription (Yes!) a message has to comply with in order to be taken by the send port for transmission
(Once again: there’s more to tell and more to configure but let’s take it step by step. For now, this is all you need to be aware of…)
Just as with receive ports and locations, send ports can be configured at development time using BizTalk Explorer inside Visual Studio .NET. When you try to do this, you’ll notice that next to the regular start/stop options you had on receive locations, there are additional “enlist” and “unenlist” options!
Enlisting a send port technically means adding its subscription to the subscriptions table in the MessageBox database (BizTalkMsgBoxDb). From the moment the send port is enlisted, all messages that come in and comply with the subscription are queued and will wait until:
the send port is started AND
the protocol’s corresponding adapter is ready to transmit them
Unenlisting does just the opposite: it removes the subscription from the BizTalk configuration. From the moment a send port is unenlisted, no message at all will be transferred or queued using that port, even if it complies with the subscription. (The subscription in this case is completely deactivated and removed!)
So, typically, when you want to create a working send port, you’ll both enlist it and start it. When you want to temporarily prevent the port from being used, you can stop it and start it again at the appropriate time. You don’t lose any messages, because while the port was stopped, messages were queued.
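The enlist/start life cycle can be summarised as a small state machine. The Python class below is a hypothetical sketch, not the BizTalk API. The `queue` and `sent` lists stand in for the MessageBox queue and the adapter, and `offer` plays the role of the MessageBox routing a message to the port.

```python
class SendPort:
    """Sketch of a send port's enlist/start life cycle (not the BizTalk API)."""

    def __init__(self, name, message_filter):
        self.name = name
        self.message_filter = message_filter
        self.enlisted = False   # does the port's subscription exist?
        self.started = False    # may the adapter transmit queued messages?
        self.queue = []         # messages waiting for transmission
        self.sent = []          # messages handed to the (simulated) adapter

    def enlist(self):
        self.enlisted = True

    def unenlist(self):
        # Removing the subscription: nothing new will be queued for this port.
        self.enlisted = False
        self.started = False

    def start(self):
        self.started = True
        self._transmit()

    def stop(self):
        self.started = False

    def offer(self, message):
        """Called by the (simulated) MessageBox for every message being routed."""
        if self.enlisted and self.message_filter(message):
            self.queue.append(message)
            self._transmit()

    def _transmit(self):
        while self.started and self.queue:
            self.sent.append(self.queue.pop(0))


port = SendPort("SendBigOrders", lambda m: m["Price"] > 5000)
port.offer({"Price": 9000})   # not enlisted yet: no subscription, message ignored
port.enlist()
port.offer({"Price": 9500})   # enlisted but stopped: queued, not lost
queued_while_stopped = list(port.queue)
port.start()                  # the queued message is transmitted
port.offer({"Price": 100})    # does not match the filter
print(port.sent)
```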
Sending messages to an orchestration
Just for the purpose of being complete, I’ll mention how orchestrations can subscribe to messages as well. Please note: I don’t have the intention of being complete in describing the workings of orchestrations at all!
Getting an orchestration to receive messages from somewhere can be done in two ways:
regular binding to a configured receive port
direct MessageBox binding
An orchestration can be started in two ways:
Another orchestration can cause it to start by ‘calling’ it.
A message is routed to it
Let’s leave the first option aside.
BizTalk orchestrations receive messages using two combined constructs: the receive shape and the orchestration port.
In short: the receive shape is the shape that will actually ‘contain’ your message; it is part of your message flow. When you want the orchestration to start a new instance as soon as a message arrives, you have to mark the receive shape (which will be the first shape in your flow) with “Activate = true”. (“Activate” is just a property on the shape that can be set using the Visual Studio .NET property grid.)
The orchestration port is your ‘connection with the outside world’ (“outside” meaning everything outside of the orchestration). The port specifies your messaging pattern (receive, send, request-response…) and abstracts away things like:
Where the message comes from
Over what protocol it will be sent
The information above is typically something you don’t want to specify at design time yet. The orchestration port lets you abstract it away.
When you configure your solution (for testing purposes or deployment), BizTalk Server requires you to “bind” every orchestration port to a configured receive port. (The receive port exactly fills in the information that was “missing” at design time, remember? Protocol, receive location ...)
As soon as the receive shape has its “Activate” property set to “true”, a new property becomes available on it: the “Filter Expression”. (Note: BizTalk Server’s terminology refers to such a shape as an “activate receive”.) The “Filter Expression” on the receive shape defines the subscription that will be created when the orchestration is deployed and started. The dialog used is exactly the same one used for the subscription configuration of send ports.
In the end, messages that meet all of the following criteria will be routed to the orchestration:
Messages that come in via any receive location that is part of the receive port bound to the particular orchestration port
Messages that fulfil the subscription criteria, specified in the receive shape’s filter expression.
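Put as code, the two criteria combine with a logical AND. In this hypothetical sketch the receive port is identified through a context property named ReceivePortName, and the filter is a plain Python function over the context. BizTalk builds the real subscription for you when the orchestration is bound and enlisted.

```python
def activates(message, bound_receive_port, filter_expression):
    """True if the message would start a new orchestration instance."""
    context = message["context"]
    # Criterion 1: it came in through the receive port bound to the orchestration port.
    # Criterion 2: it satisfies the filter expression on the activate receive.
    return (context.get("ReceivePortName") == bound_receive_port
            and filter_expression(context))


def order_filter(context):
    # Hypothetical filter expression: ProductID == 20
    return context.get("ProductID") == 20


via_orders = {"context": {"ReceivePortName": "ReceiveOrders", "ProductID": 20}}
via_other = {"context": {"ReceivePortName": "ReceiveInvoices", "ProductID": 20}}
print(activates(via_orders, "ReceiveOrders", order_filter))  # True
print(activates(via_other, "ReceiveOrders", order_filter))   # False
```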
Remark: For those of you who are familiar with property promotion already - the properties you can use in the receive shape’s subscription are restricted to Message Context properties (you can’t use Message Data properties for this purpose).
Direct MessageBox binding
First, let me make a note: I don’t encourage the use of this advanced method for receiving messages! However, I included some comments on it so that when you really need it, you know it’s there… (So far, I have encountered only a few viable patterns in which using it really makes sense! If you can get away without it, do so! Remember, it’s an advanced feature, only needed in some cases.)
Regular binding requires you to bind your orchestration ports to a configured receive port, thus, requiring that the message comes in via that port! In some cases however:
You don’t always know in advance which port the message will come from
Messages don’t always originate in a receive port! In some cases, orchestrations can drop messages directly into the MessageBox without using a send port (see: Storage and queuing: the MessageBox). (This is in fact an interesting discussion, which I will leave aside here.)
Direct MessageBox binding enables you to make subscriptions that apply to each and every message that needs routing.
An orchestration’s receive port, when being configured, offers three options for binding:
Specify now: this one is meant for development purposes (when the project is deployed, the necessary bindings and ports are created automatically)
Specify later: this option is what you’ll use in most of the cases
Direct: this one is what you need for MessageBox binding
Direct binding comes in a few flavours:
Bound to a port on a partner orchestration: enables you to receive messages from a known port on a partner orchestration
Self correlating: correlation happens based on a public port type
The remaining option, routing based on filter expressions, applies the filter on the activate receive to every message that arrives in the MessageBox! Please note that this can be dangerous: for example, when an orchestration’s outbound message matches the subscription applied to a direct-bound receive port, you’re stuck in a loop! Use this feature with care.
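The loop danger is easy to reproduce in a toy model. In the hypothetical sketch below, a direct-bound orchestration publishes its output straight back to a simulated MessageBox. The max_instances cap exists only so the demonstration terminates (BizTalk has no such safety net). A filter that also matches the orchestration’s own output keeps starting new instances, while a narrower filter stops after one.

```python
def run_direct_bound(initial, orchestration_filter, max_instances=5):
    """Count orchestration instances started when outputs go back to the MessageBox."""
    pending = [initial]
    instances = 0
    while pending and instances < max_instances:  # cap only so the demo terminates
        message = pending.pop()
        if orchestration_filter(message):
            instances += 1
            # The orchestration's output is published straight back to the MessageBox.
            pending.append({"MessageType": message["MessageType"], "Stage": "processed"})
    return instances


initial = {"MessageType": "Order", "Stage": "new"}


def too_broad(m):
    return m["MessageType"] == "Order"


def narrow(m):
    return m["MessageType"] == "Order" and m["Stage"] != "processed"


print(run_direct_bound(initial, too_broad))  # 5: only the cap stopped the loop
print(run_direct_bound(initial, narrow))     # 1
```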
Storage and queuing: the MessageBox
Let’s go back to where we started this whole discussion about routing: what happens to my incoming message? My first answer was: the MessageBox. The MessageBox basically is (like many other concepts) just a bunch of SQL Server tables (contained in the BizTalkMsgBoxDb database).
The most important concept in BizTalk Server 2004 is its MessageBox (also often abbreviated as “the MBox”). Every (really every) message that has something to do with BizTalk passes through the MessageBox.
It has several main functions:
Provide persistent and transactional storage for messages that are in some way being processed by the server. This includes:
Messages inside orchestrations
Messages outside an orchestration that are waiting to be picked up by one
Messages outside an orchestration that are waiting to be transferred to an external location (using one of the available protocol options)
Messages, waiting for their transfer to the tracking database (not discussed in this paper)
For orchestrations: when there are resources available to start a new orchestration instance, another message will be picked up from the appropriate orchestration’s queue. (Read: “some SQL Server construct”.)
For outgoing transport protocols: when a protocol’s “adapter” is ready to transfer new messages, it picks its next message up from its queue. (Neatly managed by BizTalk.) (I did mention “adapter” here: it is basically just the piece of software that handles a specific protocol’s details. Not going to discuss those details here.)
Some tables in the MessageBox contain all subscription-related data. As already explained, when a subscription is enlisted, its data is written to persistent storage. That “storage” is the MessageBox!
Routing: happens inside the MessageBox.
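Since the MessageBox is “just” a set of database tables, its role can be sketched with Python’s built-in sqlite3 module. The table layout below is entirely hypothetical and far simpler than BizTalkMsgBoxDb, and subscriptions are reduced to single equality clauses. It shows the three functions listed above: transactional storage, subscription tables, and routing plus queuing done inside the database.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE Subscriptions (subscriber TEXT, property TEXT, value TEXT);
    CREATE TABLE Messages (id INTEGER PRIMARY KEY, body TEXT);
    CREATE TABLE MessageProperties (message_id INTEGER, property TEXT, value TEXT);
    CREATE TABLE Queue (id INTEGER PRIMARY KEY AUTOINCREMENT,
                        subscriber TEXT, message_id INTEGER);
""")


def publish(body, properties):
    """Store a message and route it, all in one transaction."""
    with db:
        msg_id = db.execute("INSERT INTO Messages (body) VALUES (?)", (body,)).lastrowid
        db.executemany("INSERT INTO MessageProperties VALUES (?, ?, ?)",
                       [(msg_id, name, str(value)) for name, value in properties.items()])
        # Routing happens inside the database: one queue entry per matching subscriber.
        db.execute("""
            INSERT INTO Queue (subscriber, message_id)
            SELECT DISTINCT s.subscriber, p.message_id
            FROM Subscriptions s
            JOIN MessageProperties p ON p.property = s.property AND p.value = s.value
            WHERE p.message_id = ?""", (msg_id,))
    return msg_id


def dequeue(subscriber):
    """Pick up the oldest queued message for a subscriber, or None."""
    with db:
        row = db.execute("""
            SELECT q.id, m.body FROM Queue q JOIN Messages m ON m.id = q.message_id
            WHERE q.subscriber = ? ORDER BY q.id LIMIT 1""", (subscriber,)).fetchone()
        if row is None:
            return None
        db.execute("DELETE FROM Queue WHERE id = ?", (row[0],))
        return row[1]


# Single-clause equality subscriptions only, to keep the sketch short.
db.executemany("INSERT INTO Subscriptions VALUES (?, ?, ?)",
               [("SendPort_Product20", "ProductID", "20"),
                ("Orchestration_Europe", "Region", "Europe")])
db.commit()

publish("<Order id='1'/>", {"ProductID": 20, "Region": "Europe"})
publish("<Order id='2'/>", {"ProductID": 21, "Region": "US"})
first = dequeue("SendPort_Product20")
print(first)
```

Each subscriber (a send port’s adapter or an orchestration) pulls from its own queue whenever it is ready, and nothing is lost if it is busy or down in the meantime.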
I hope this has clarified some of the core principles of BizTalk Server 2004’s messaging architecture. What was explained here certainly does not make up a complete overview; there’s more! For example, the important concepts called “correlation” and “convoys” are not even discussed here. I’ll leave those subjects for another time, another paper or perhaps even another author.
I really hope to see some movement in the BizTalk Server community so that people, sharing their knowledge and experience, can together build stronger solutions using the right features and principles for the right purposes. I hope this paper is already a step in that direction.
Try what you read, start experimenting, do some POCs and decide for yourself which solutions could leverage which concepts.