Tuesday, August 16, 2005

The SOA Triangle

Recently I worked on a SOA Readiness Assessment for a client, approaching it from a data-centric perspective. Some of the deficiencies I observed are probably representative of other enterprises whose heterogeneous IT landscape has grown organically and has been integrated in an ad-hoc, piecemeal fashion as needs arose:

Limited Flexibility
  • mainly batch-based file integration across a fragmented system landscape
  • difficult to implement new services and respond to customer needs in a timely manner
Data Duplication and Inconsistencies
  • Data assets are stored in a wide array of disparate systems and sources, ranging from mainframes to relational databases.
  • Data is replicated between individual systems, each using its own databases and directories. Replication takes place with a time delay, so there is a window during which conflicting changes can be made to replicated records. This makes it a challenge to maintain an adequate level of data quality, and manual intervention is frequently necessary.
  • Every system has its own master dataset. There’s no clear concept of what information is relevant where, and at what point in time. Data is fragmented and no unified view of enterprise data exists; in some instances this becomes obvious to customers (e.g., multiple call centers, letters going to two different addresses).
Unreliable Data Exchange
  • Risk of losing in-flight information during a crash (ad-hoc processes, often based on batch file transfer).
To address these issues, I came up with a graphical representation that I call the "SOA triangle" (not to be confused with the Bermuda triangle, although you can get lost in both):

Data Access Services (DAS) : A critical foundation and abstraction layer for accessing enterprise information. Because they sit between the two, they isolate both applications/services and the underlying datastores/directories from changes on the other side.
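To make the abstraction role more concrete, here is a minimal Python sketch, assuming a DAS that fronts two stores holding customer addresses. All class and method names (`AddressStore`, `MainframeStore`, `CrmDatabase`, `CustomerDataAccessService`, `change_address`) are hypothetical, and the "stores" are plain dictionaries standing in for a mainframe and a relational database. Applications only see the DAS; the DAS also fans a single update out to every backend, in the spirit of the update propagation mentioned under Data Ownership.

```python
from abc import ABC, abstractmethod
from typing import Iterable, Optional


class AddressStore(ABC):
    """Adapter for one physical datastore (hypothetical interface)."""

    @abstractmethod
    def read_address(self, customer_id: str) -> Optional[str]: ...

    @abstractmethod
    def write_address(self, customer_id: str, address: str) -> None: ...


class MainframeStore(AddressStore):
    """Stand-in for a legacy record store, backed by a dict for the sketch."""

    def __init__(self) -> None:
        self.records: dict = {}

    def read_address(self, customer_id):
        return self.records.get(customer_id)

    def write_address(self, customer_id, address):
        self.records[customer_id] = address


class CrmDatabase(AddressStore):
    """Stand-in for a relational CRM table that stores (id, address) rows."""

    def __init__(self) -> None:
        self.rows: dict = {}

    def read_address(self, customer_id):
        row = self.rows.get(customer_id)
        return row[1] if row else None

    def write_address(self, customer_id, address):
        self.rows[customer_id] = (customer_id, address)


class CustomerDataAccessService:
    """The DAS: applications talk to this class, never to the stores."""

    def __init__(self, primary: AddressStore, replicas: Iterable[AddressStore] = ()):
        self._primary = primary
        self._stores = [primary, *replicas]

    def get_address(self, customer_id: str) -> Optional[str]:
        # Reads are served from the designated primary store.
        return self._primary.read_address(customer_id)

    def change_address(self, customer_id: str, new_address: str) -> None:
        # One call updates every backend. Note: there is no distributed
        # transaction here; if a write fails midway the stores can diverge.
        for store in self._stores:
            store.write_address(customer_id, new_address)
```

Swapping `CrmDatabase` for a different adapter changes nothing for callers of `get_address` and `change_address`, which is the kind of isolation the DAS is meant to provide. A production DAS would additionally need transactional or compensating logic for the multi-store write.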

Data Ownership
  • Role- and area-based access concepts can be implemented based on a multidimensional CRUD matrix
  • Fine-grained access control, e.g., based on process-level authentication tokens
  • Tighter management/operational controls
  • Provide a data model that is specific to an operational unit (OU) or a process
  • Shield applications from the complexities, location, access protocols and consistency requirements of the underlying enterprise data sources
  • Useful for propagating updates simultaneously to multiple backend datastores (see Caching below). This improves the consistency, integrity and quality of enterprise data.
  • Basis for a step-wise transformation of the enterprise architecture: applications/services and the underlying datastores no longer depend directly on each other. Applications are therefore isolated from structural changes, merges, etc. of the underlying datastores, and, conversely, applications can be modified to use the new DAS interfaces without modifying the databases.
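A multidimensional CRUD matrix, as mentioned in the first Data Ownership bullet, can be sketched as a lookup from (role, data area, process) to the set of permitted operations. This is only an illustration under assumptions: the roles, areas and processes below are invented, and the process dimension stands in for whatever a process-level authentication token would identify. Access is denied by default.

```python
from enum import Flag, auto


class Op(Flag):
    CREATE = auto()
    READ = auto()
    UPDATE = auto()
    DELETE = auto()


CRUD = Op.CREATE | Op.READ | Op.UPDATE | Op.DELETE

# (role, data area, process) -> permitted operations; hypothetical entries.
CRUD_MATRIX = {
    ("call_center_agent", "customer_address", "address_change"): Op.READ | Op.UPDATE,
    ("call_center_agent", "billing", "inquiry"): Op.READ,
    ("billing_clerk", "billing", "invoicing"): Op.CREATE | Op.READ | Op.UPDATE,
    ("data_steward", "customer_address", "cleansing"): CRUD,
}


def is_allowed(role: str, area: str, process: str, op: Op) -> bool:
    """Deny by default: any combination missing from the matrix is refused."""
    granted = CRUD_MATRIX.get((role, area, process), Op(0))
    return op in granted
```

Adding another dimension (e.g., region or OU) only means extending the key tuple; the DAS would call a check like `is_allowed` before executing each request.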
Avoid impedance mismatch
  • The DAS can offer multiple process- or OU-specific interfaces to the same backend datastore, with high-level, business-event-oriented semantics.
  • This is more realistic than a unified, company-wide data model on which everybody must agree (or face a winner-loser situation, with some OUs forced to use a model that doesn’t fulfil their expectations).
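As a rough sketch of OU-specific interfaces over one backend, assume a single shared customer table and two hypothetical facades: one for a call-center OU that speaks in business events ("customer moved", "open ticket"), and one for a billing OU that only sees what it needs for invoicing. The schema and the rule that a move updates both postal and billing address are invented for illustration; the point is that neither OU has to adopt the other's model, yet both work on the same data (so letters don't end up at two different addresses).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CustomerRecord:
    """One row in the shared backend (hypothetical schema)."""
    customer_id: str
    name: str
    postal_address: str
    billing_address: str
    open_tickets: List[str] = field(default_factory=list)


class SharedCustomerStore:
    def __init__(self) -> None:
        self.rows: Dict[str, CustomerRecord] = {}


class CallCenterInterface:
    """Call-center view: business events rather than column updates."""

    def __init__(self, store: SharedCustomerStore) -> None:
        self._store = store

    def customer_moved(self, customer_id: str, new_address: str) -> None:
        # Assumed business rule: a move changes postal and billing address.
        row = self._store.rows[customer_id]
        row.postal_address = new_address
        row.billing_address = new_address

    def open_ticket(self, customer_id: str, text: str) -> None:
        self._store.rows[customer_id].open_tickets.append(text)


class BillingInterface:
    """Billing view: exposes only what is needed to address an invoice."""

    def __init__(self, store: SharedCustomerStore) -> None:
        self._store = store

    def invoice_recipient(self, customer_id: str) -> dict:
        row = self._store.rows[customer_id]
        return {"name": row.name, "address": row.billing_address}
```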

Messaging/ESB : Serves as the SOA Communication Backbone/Infrastructure

Abstract Communication Layer with Virtualised Service Endpoints
  • Avoids "SOA spaghetti": services are connected declaratively. Cross-cutting concerns such as delivery guarantees, security and messaging patterns should not be the responsibility of the individual services.
  • Reliable Information Exchange between Disparate Systems: once-and-only-once delivery semantics guarantee that messages are neither lost nor duplicated during machine or network failures.
  • Asynchronous Communication: Failure of a component in a business process delays the process rather than failing it.
  • Improved overall system reliability. This matters because in a complex IT landscape like Cablecom’s, the probability that at least one component is down at any given moment grows rapidly with the number of components: if each of n independent components is available 99% of the time, all of them are up simultaneously only 0.99^n of the time (roughly 60% for n = 50). Store-and-forward means a process can continue once a failed component comes back online.
  • Support for different message exchange patterns (point-to-point, publish-subscribe)
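The bus properties listed above can be illustrated with a toy, single-process Python sketch; it is not how any particular ESB product works. Assumptions: services are addressed by logical names resolved through a declarative routing table (one target means point-to-point, several mean publish-subscribe); each target has a queue that keeps messages while the target is down (store-and-forward); and each endpoint discards message IDs it has already processed, which gives once-and-only-once processing even if a message is redelivered. `Endpoint`, `Bus`, `routes` and `flush` are hypothetical names, and the queues live in memory, whereas a real ESB persists them before acknowledging the sender.

```python
import uuid
from collections import deque


class Endpoint:
    """A physical service endpoint that may be online or offline."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.seen_ids: set = set()   # IDs already processed (duplicate filter)
        self.processed: list = []    # payloads handled, each exactly once

    def receive(self, msg_id: str, payload) -> None:
        if not self.online:
            raise ConnectionError(f"{self.name} is down")
        if msg_id in self.seen_ids:  # redelivery, e.g. after a lost ack
            return
        self.seen_ids.add(msg_id)
        self.processed.append(payload)


class Bus:
    """Toy store-and-forward bus with declarative routing."""

    def __init__(self, routes: dict, endpoints: dict) -> None:
        self.routes = routes        # logical name -> list of endpoint names
        self.endpoints = endpoints  # endpoint name -> Endpoint
        self.queues = {name: deque() for name in endpoints}

    def send(self, logical_name: str, payload) -> str:
        msg_id = str(uuid.uuid4())
        for target in self.routes[logical_name]:
            self.queues[target].append((msg_id, payload))
        self.flush()
        return msg_id

    def flush(self) -> None:
        """Deliver queued messages in order; keep them while a target is down."""
        for name, queue in self.queues.items():
            endpoint = self.endpoints[name]
            while queue:
                msg_id, payload = queue[0]
                try:
                    endpoint.receive(msg_id, payload)
                except ConnectionError:
                    break           # leave it queued and retry on the next flush
                queue.popleft()     # delivered: remove from the store
```

In this sketch, a sender of `AddressChanged` does not know or care which physical services subscribe to it; if one subscriber is offline, its copy simply waits in the queue until `flush` succeeds, so the business process is delayed rather than failed.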
Straight-Through Processing (STP)
  • Improved agility by harnessing the real-time data flowing through the business
  • Ability to quickly acquire, analyze, and act on both opportunities and issues within the SOA implementation.

Caching : A caching layer can decrease the frequency of interaction with backend data stores.

Performance is a concern in a SOA
  • DAS require frequent interaction with backend datastores
  • In addition, services are ideally stateless; i.e., they keep their state in persistent storage rather than in memory
This issue can be addressed by software caching:
  • Products like TimesTen offer reliable, replicated in-memory caches with persistence guarantees comparable to those of a database.
  • Physical database access is reduced (since most requests can be served from the cache), greatly improving data access performance.
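The effect of a caching layer on backend traffic can be sketched with a generic read-through/write-through cache. This is the textbook pattern in a single process, not the API of TimesTen or any other product; it has no eviction, replication or cross-node invalidation, all of which a real deployment would need. `CountingBackend` and `ReadWriteThroughCache` are hypothetical names, and the backend just counts physical accesses so the reduction is visible.

```python
class CountingBackend:
    """Stand-in for a database that counts physical reads and writes."""

    def __init__(self, rows: dict) -> None:
        self.rows = dict(rows)
        self.reads = 0
        self.writes = 0

    def read(self, key):
        self.reads += 1
        return self.rows.get(key)

    def write(self, key, value) -> None:
        self.writes += 1
        self.rows[key] = value


class ReadWriteThroughCache:
    """Read-through / write-through cache in front of a backend."""

    def __init__(self, backend: CountingBackend) -> None:
        self._backend = backend
        self._cache: dict = {}

    def get(self, key):
        # On a miss, go to the database once; later reads hit memory.
        # Missing keys are cached as None too (negative caching).
        if key not in self._cache:
            self._cache[key] = self._backend.read(key)
        return self._cache[key]

    def put(self, key, value) -> None:
        # Persist first so state survives a crash, then refresh the cache.
        self._backend.write(key, value)
        self._cache[key] = value
```

With this layer, a stateless service that re-reads the same customer record a thousand times causes a single physical read, while every update still reaches persistent storage.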

1 comment:

Arnaud said...

Hi Thomas,

I like what you say, but I have some questions and comments:
Ideally, wouldn’t you want to merge the three vertices? It is not clear whether you are suggesting that a DAS should not cache data or be implemented as an ESB service.
Obviously, some advantages of Messaging/ESB also apply to a DAS if you use a WS fabric that implements WS-RM.