This blog entry has been literally a year in the making!
While working at a client, I had a requirement to separate claims and decide which system each one should go to (QNXT or the existing mainframe system).
I decided that using the multiple 837 schema was the best approach. This means that the HIPAA accelerator takes the HIPAA file and submits the claims (in XML format) individually to the message box. I created a singleton orchestration to pick up each of those messages and, for each one, make some calls to find out which system it should go to.
Once the decision was made, I would concatenate the message to the rest of the messages that had already come in for this HIPAA transaction.
What I saw was that the concatenation step took longer and longer to append the current message as the accumulated set of messages grew.
Directions changed, and we moved away from having BizTalk be the routing application for various reasons, speed of processing being one of them.
I then worked at another client, and the same issue came up. We started off working with eligibility files (834). I took the same approach and immediately saw the concatenation step take longer to complete as it processed more messages. This time we were able to test with some significantly larger files, so I could get some real numbers to look at.
It started out taking 1 second to process the first subscriber in the 834, but by the time it got to subscriber 1,000, it was taking 10 seconds per subscriber. I needed to come up with a different approach, because these were relatively small files, and we were expecting files with 200,000 subscribers.
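To put those two measurements in perspective, here is a rough back-of-the-envelope model. It assumes that the per-subscriber time grows linearly between the 1-second and 10-second readings, which is only an assumption since just the two endpoints were measured. The function name and parameters are hypothetical.

```python
def total_seconds_linear_growth(count, first_seconds=1.0, last_seconds=10.0, measured_at=1000):
    """Estimate total time when the per-message cost grows linearly.

    Assumption: message 1 costs `first_seconds`, message `measured_at`
    costs `last_seconds`, and the cost rises by a constant step in between.
    """
    step = (last_seconds - first_seconds) / (measured_at - 1)
    # Arithmetic series: count * first + step * (0 + 1 + ... + (count - 1))
    return count * first_seconds + step * count * (count - 1) / 2


if __name__ == "__main__":
    seconds = total_seconds_linear_growth(1000)
    print(f"1,000 subscribers: {seconds:.0f} s ({seconds / 3600:.2f} h)")
```

Under that assumption, the first 1,000 subscribers alone come to about 5,500 seconds, roughly an hour and a half, and the total keeps growing with the square of the subscriber count.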
I thought: I need a way to store the data so that the time per message does not keep increasing as the dataset grows. What could I use? A database table came to mind. I could place the data in a table as each message is processed, and once everything has completed, extract the data from the table and send it off.
I implemented sending the data to a database table instead of concatenating the messages together. Once I started testing, I immediately saw an improvement in performance! Looking at the details, the first subscriber took 1 second to process, and so did subscriber 1,000!
I was not satisfied though: if each subscriber was going to take 1 second, then the table below shows the time to process a file.
| Subscribers | Minutes | Hours |
| 1,000 | 16.67 | 0.28 |
| 2,000 | 33.33 | 0.56 |
| 3,000 | 50.00 | 0.83 |
| 4,000 | 66.67 | 1.11 |
| 5,000 | 83.33 | 1.39 |
| 6,000 | 100.00 | 1.67 |
| 7,000 | 116.67 | 1.94 |
| 8,000 | 133.33 | 2.22 |
| 9,000 | 150.00 | 2.50 |
| 100,000 | 1,666.67 | 27.78 |
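The figures above are plain arithmetic at a constant rate, and can be reproduced with a small sketch like this one. The function name and the `per_second` parameter are hypothetical, added only so other rates can be tried.

```python
def processing_time(subscribers, per_second=1.0):
    """Return (minutes, hours) needed to process `subscribers` at a constant rate."""
    seconds = subscribers / per_second
    return seconds / 60, seconds / 3600


if __name__ == "__main__":
    for n in (1_000, 9_000, 100_000, 200_000):
        minutes, hours = processing_time(n)
        print(f"{n:>7,} subscribers: {minutes:>9,.2f} min  {hours:>6.2f} h")
```

At 1 subscriber per second, a 200,000-subscriber file like the ones mentioned earlier would take a little over 55 hours.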
The question then was, how do I process them faster? How could I send them to the database faster than the improvements I had already made allowed? I might be able to optimize the extraction and the sending to the database a little, but even if I cut the time in half, I would still be looking at 13-plus hours to complete a single 100,000-subscriber file.
What if I ran multiple occurrences of the extraction process at the same time? I would break my singleton orchestration, but I would essentially open up the flood gates, and it could process as many messages as possible at the same time. The next question then came to mind: what about the very distinct possibility of table locking issues, since I would be doing multiple inserts into the same table at once? I needed a highly optimized process for inserting data into the database, one that could handle many inserts happening at once. I am also not a database guru, so I needed something that someone else had already developed and that I could implement.
BAM - it hit me. BAM (Business Activity Monitoring) is optimized to accept many messages and insert them into a table, and it definitely has to be designed to capture many messages at the same time. There are two flavors of BAM event stream that can be invoked from BizTalk 2004: DirectEventStream and BufferedEventStream. I decided that because using DirectEventStream would cause performance issues, the BufferedEventStream route would probably be the best approach. So I have many messages being processed, and the BAM data is sent to the MessageBox, to be inserted into the BAMPrimaryImport database when BizTalk gets around to it.
I implemented this approach, and increased the processing speed from 1 subscriber per second to 10 per second!
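The buffering idea can be sketched outside of BizTalk. The following is not the BAM API. It is a generic Python illustration in which many producers drop rows on a queue and return immediately, while a single writer inserts them in batches. The `activity` table, the function names, and the batch size are all hypothetical.

```python
import queue
import sqlite3
import threading

_STOP = object()  # sentinel telling the writer to flush and finish


def buffered_writer(events, conn, batch_size=50):
    """Drain `events` and insert rows in batches from a single thread."""
    batch = []
    while True:
        item = events.get()
        if item is not _STOP:
            batch.append(item)
        # Flush when the batch is full, or when we are shutting down.
        if batch and (item is _STOP or len(batch) >= batch_size):
            conn.executemany(
                "INSERT INTO activity (sender_id, payload) VALUES (?, ?)", batch
            )
            conn.commit()
            batch = []
        if item is _STOP:
            return


def run_demo(workers=8, per_worker=25):
    """Run several producers against one buffered writer; return the row count."""
    # check_same_thread=False: only the writer touches conn while it is running.
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE activity (sender_id TEXT, payload TEXT)")
    events = queue.Queue()
    writer = threading.Thread(target=buffered_writer, args=(events, conn))
    writer.start()

    def produce(worker_id):
        # Producers never wait on the database; they only enqueue.
        for i in range(per_worker):
            events.put(("123456", f"worker {worker_id} subscriber {i}"))

    producers = [threading.Thread(target=produce, args=(w,)) for w in range(workers)]
    for t in producers:
        t.start()
    for t in producers:
        t.join()
    events.put(_STOP)
    writer.join()
    return conn.execute("SELECT COUNT(*) FROM activity").fetchone()[0]


if __name__ == "__main__":
    print(run_demo(), "rows inserted")
```

The trade-off is the same one the buffered stream brings: the producers finish before the last row is necessarily in the table, so something else has to confirm that all of the rows have arrived.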
The next issue was, how do I know when it is complete, and when can I extract the data from the BAM database? I needed a monitoring service to watch for when the inserts were done for this file and, once they had completed, extract the data and create the output.
What if each of the processes that sent data to BAM also sent a message to another orchestration that consumed those messages? As soon as the messages quit coming, it would check the database to make sure that the rows were there, and as soon as all of the rows were there, it would extract the data.
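The completion check described above could look something like this minimal sketch. `count_rows` stands in for whatever query counts the rows for the file in the BAM database. It and the other names are hypothetical, and the polling interval and timeout are arbitrary.

```python
import time


def wait_for_rows(expected, count_rows, poll_seconds=1.0, timeout_seconds=300.0,
                  sleep=time.sleep):
    """Poll `count_rows()` until it equals `expected`.

    Returns True once the counts match, False if the timeout expires first.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        if count_rows() == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_seconds)


if __name__ == "__main__":
    # Simulated row counts: the last rows show up a little after the last message.
    simulated = iter([90, 93, 95])
    done = wait_for_rows(95, lambda: next(simulated), poll_seconds=0.1)
    print("ready to extract" if done else "timed out")
```

The `expected` value is the number of status messages picked up, so the extract only starts once every buffered insert has landed.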
I thought this would be a very simple process. It ended up being simple (kinda), but I normally have to do things the hard way before finally getting them working successfully, and this did not stray too far from my past experiences.
This is the design that I had: many orchestrations would be running, and one orchestration would pick up all of the messages created by the HIPAA to BAM orchestrations. As soon as I quit receiving messages, I would make sure that the number of rows in BAM matched the number of messages that I had picked up. Once everything matched, I would extract the data. I have to check the number of rows against what I picked up because with BufferedEventStream, the data is sent to the MessageBox and inserted when resources are available, not directly as with DirectEventStream. So I could get the last message from the HIPAA to BAM orchestration before the last row is inserted. Below is the vision I had:
This is where it got fun!
After using Kevin Lam's blog as a guide, I implemented forward partner direct binding.
I have created a simple prototype on implementing the forward partner direct binding approach. The first orchestration consumes all messages from a particular port. The sample message looks like this:
<ns0:Root SenderId="123456" xmlns:ns0="http://PartnerPortExample.Input">
It would then create a message that contained just the SenderId and send it to the Singleton Orchestration, which would correlate on the SenderId and pick up all messages for it.
<ns0:Root SenderId="123456" xmlns:ns0="http://PartnerPortExample.Status" />
I promoted the SenderId in the http://PartnerPortExample.Status message. One key thing to take away is that the property needs to be a MessageDataPropertyBase. If it is a MessageContextPropertyBase, it will not work. If the promoted field is a context property, the subscription engine cannot match the message from the HIPAA to BAM orchestration to the Singleton Orchestration, and it will report that no matching subscription could be found.
I then set the outgoing port on the HIPAA to BAM orchestration to Direct and chose the Singleton orchestration as the partner port. In the Singleton orchestration, I set the partner port to itself.
I also set up the correlation set to drive off of the SenderId.
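As a rough mental model of what the correlation set does (not how the BizTalk subscription engine is implemented), the sketch below groups Status messages by their SenderId attribute, the way each singleton instance collects the messages for one sender. The function name is hypothetical. The namespace and attribute come from the sample messages above.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Fully qualified tag of the Status message root element.
STATUS_TAG = "{http://PartnerPortExample.Status}Root"


def count_by_sender(status_messages):
    """Count Status messages per SenderId, like one singleton instance per id."""
    counts = Counter()
    for raw in status_messages:
        root = ET.fromstring(raw)
        if root.tag != STATUS_TAG:
            continue  # not a Status message, so nothing "subscribes" to it
        counts[root.attrib["SenderId"]] += 1
    return counts


if __name__ == "__main__":
    status = '<ns0:Root SenderId="{}" xmlns:ns0="http://PartnerPortExample.Status" />'
    messages = [status.format("123456")] * 3 + [status.format("654321")] * 2
    for sender_id, count in count_by_sender(messages).items():
        print(sender_id, count)
```

Each distinct SenderId ends up with its own count, which is the number the Singleton orchestration would later compare against the rows in BAM.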
Below are some screen shots of the prototype:
Here is the Process orchestration that takes the original file and extracts the SenderId into the Status message. Things to notice: the binding on the InternalPort is set to Direct, and the Partner Orchestration Port is set to the Singleton orchestration.
The code in the Assign Promotion Message Assignment shape is the following:
Here is the Singleton Orchestration that loops through, capturing all of the messages that match the PartnerPortExample.id correlation set.
It then creates a message recording the number of files it processed and sends the following message with the SenderId as the filename:
<ns0:Root Count="95" xmlns:ns0="http://PartnerPortExample.Result" />
Here is the code in the message assignment shape:
TempXML.LoadXml("<ns0:Root Count=\""+System.Convert.ToString(Count)+"\" xmlns:ns0=\"http://PartnerPortExample.Result\" />");
I want to thank Jeff Davis, Keith Lim, Kevin Lam, and Adrian Hamza for helping me determine that you cannot use context properties as the correlation set on partner ports.
Through my 'contact me' page, let me know if you would like to get a copy of my prototype.