Technical Insights: 2010

Wednesday, April 7, 2010

Distributed Configuration in Documentum

Single Repository Distributed Configuration

A single repository with content stored at the primary site and accessed from remote sites using ACS servers and, optionally, BOCS servers.
A single repository with content stored in a distributed storage area and accessed from remote sites using content file servers, ACS servers, and, optionally, BOCS servers.

Multi Repository Distributed Configuration

Multiple Repositories that replicate objects among themselves

Object replication replicates objects, both content and metadata, between repositories. Object replication jobs are user‑defined. In object replication, there is a source and target repository. A replication job replicates objects from the source repository to the target repository. Which objects are replicated and how often the job runs is part of the job’s definition. In the target repository, the replicated objects are marked as replica objects.

Multiple Repositories organized as a federation

In a federation, one repository is the governing repository. The remaining repositories are member repositories. Changes to global users and groups and external ACLs in the governing repository are propagated automatically to all the member repositories.

Documentum Compliance Manager

Documentum Compliance Manager (DCM) controls documents through document classes. A document class, the heart of DCM, brings together the other configuration objects, such as business applications, lifecycles, relationships, and auto-naming schemes, that are needed to control the formatting and naming of documents checked into a DCM-managed repository.

A business application must be defined and assigned to each document class created. The business application assigned to a document class determines what actions can or cannot be taken against a particular document, identifies the members of the coordinator and contributor roles who can access the document, and specifies which templates and workflows can be used to manage it. The more items added to each of the Coordinators, Contributors, Template, and Workflow tabs, the longer the pick lists in the list boxes when a controlled document is created against a particular document class.

A document class is created from a selected document type and a selected business application. You can create a document class associated with a particular business application so that documents of the selected type are processed against the values specified for the attributes on the Info tab. The attribute values on each of the tabs match the values of the selected business application and are all inherited by default. Deselecting the Inherit option against an attribute allows you to make the necessary modifications.

Document classes provide the flexibility to create a controlled document type required for a particular business process, and to associate a set of default behavior and characteristics with that type. You can base a document class on any Documentum object type. Using the custom properties associated with a document class, you can define specific kinds of behavior, such as the document class’s versioning behavior, associated auto-naming scheme, and an associated lifecycle.

DCM provides the ability to route uncontrolled documents using standard workflows and the ability to route controlled documents using controlled workflows.

Documents managed by DCM are called controlled documents. Controlled documents advance through a series of lifecycle states and typically undergo some form of review and signoff before advancing to the Approved or Effective state in their lifecycle. The document class for a controlled document is configured by your administrator.

In DCM, a document is considered the main or active document if it is in a dcm_effective state.

Monday, March 22, 2010

Clustering vs Load Balancing

What's the difference, really?
There are actually quite a few differences, even if you ignore that clustering is generally used to refer to the capability of a software product to provide load-balancing services and load-balancing is often used to refer to a hardware-based (or at least third-party software) solution.

Clustering is most often used in conjunction with application servers such as BEA WebLogic, IBM WebSphere, and Oracle AS (10g). The load-balancing features found in Application Delivery Controllers (ADCs) like BIG-IP are used with these application servers as well.

In the world of hardware load balancers the term "pool" or "farm" is used to describe a grouping of servers across which application requests will be distributed. In the world of software load balancing the term used is "cluster".
I will try to forget the use of the term factotum for this concept as it still gives me nightmares.

Scalability

Clustering typically makes one instance of an application server into a master controller through which all requests are processed and distributed to a number of instances using industry standard algorithms like round robin, weighted round robin, and least connections. Clustering, like load balancing, enables horizontal scalability, that is, the ability to add more instances of an application server nearly transparently to increase the capacity or improve the response time of an application. Clustering features usually include the ability to ensure an instance is available through the use of ICMP ping checks and, in some cases, TCP or HTTP connection checks.
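The three algorithms named above can be sketched in a few lines of Python. This is only an illustration of the selection logic, not how any particular application server or ADC implements it; the `Server` class, its fields, and the server names are assumptions made up for the example.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Server:
    """Hypothetical pool member; not taken from any real product."""
    name: str
    weight: int = 1              # used only by weighted round robin
    active_connections: int = 0  # used only by least connections


def round_robin(servers):
    """Return an iterator that hands out servers in a fixed rotation."""
    return itertools.cycle(servers)


def weighted_round_robin(servers):
    """Return an iterator in which each server appears `weight` times per cycle."""
    expanded = [s for s in servers for _ in range(s.weight)]
    return itertools.cycle(expanded)


def least_connections(servers):
    """Pick the server that currently holds the fewest open connections."""
    return min(servers, key=lambda s: s.active_connections)


pool = [Server("app1", weight=2), Server("app2", weight=1)]

rr = round_robin(pool)
first_three = [next(rr).name for _ in range(3)]

wrr = weighted_round_robin(pool)
weighted_three = [next(wrr).name for _ in range(3)]
```

Adding capacity in this model is just a matter of appending another `Server` to the pool, which is the horizontal scalability the paragraph describes.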

ADCs typically support these same industry standard algorithms, but add more complex calculations and parameters that can include per-server CPU and memory resource utilization and fastest response times. ADCs also support health monitoring, but they generally go beyond the rudimentary capabilities found in application server clustering solutions. This includes the ability to verify content or to perform passive monitoring, which removes even the relatively low impact of health checking on application server instances.
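Passive monitoring with content verification might look roughly like the sketch below: instead of sending probe requests, the balancer inspects the responses to real traffic and marks a server down after repeated failures. The class, the function, and the threshold of three failures are all assumptions chosen for illustration, not the behavior of any specific ADC.

```python
from dataclasses import dataclass

FAILURE_THRESHOLD = 3  # hypothetical: consecutive failures before marking down


@dataclass
class MonitoredServer:
    """Hypothetical health record kept by the balancer for one server."""
    name: str
    consecutive_failures: int = 0
    healthy: bool = True


def record_response(server, status_code, body, expected_text=None):
    """Update a server's health from a response to real client traffic.

    A response counts as good when the status is below 400 and, if
    `expected_text` is given, that text appears in the body (content
    verification). Returns the server's health after the update.
    """
    ok = 200 <= status_code < 400
    if ok and expected_text is not None:
        ok = expected_text in body
    if ok:
        server.consecutive_failures = 0
        server.healthy = True
    else:
        server.consecutive_failures += 1
        if server.consecutive_failures >= FAILURE_THRESHOLD:
            server.healthy = False
    return server.healthy
```

Because the check piggybacks on requests the server is already handling, it adds no extra load, which is the point the paragraph makes about passive monitoring.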

Server Affinity

Clustering uses server affinity to ensure that, when an application requires the user to interact with the same server throughout a session, each request reaches the right server. This is most often needed in applications that execute a multi-step process, such as order entry, in which the session stores information between requests (pages) that will be used to conclude a transaction, for example a shopping cart.

ADCs use persistence to provide the same functionality. While clustering solutions are generally limited in the variables that can be used, ADCs can use traditional application variables as well as custom information from within the application data or network-based information.
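A minimal sketch of persistence, assuming the balancer keys on some session identifier such as a cookie value: the first request for a key is assigned a server, and every later request with the same key is sent to that same server. The class and function names are hypothetical, and hashing is just one simple way to make the initial assignment.

```python
import hashlib


def pick_by_hash(key, servers):
    """Map a key to a server deterministically using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


class PersistenceTable:
    """Hypothetical persistence table: remembers which server owns each session."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.table = {}

    def route(self, session_key):
        """Return the server bound to this session, binding one on first use."""
        if session_key not in self.table:
            self.table[session_key] = pick_by_hash(session_key, self.servers)
        return self.table[session_key]


lb = PersistenceTable(["app1", "app2", "app3"])
first = lb.route("JSESSIONID=abc123")
second = lb.route("JSESSIONID=abc123")
```

The key could just as well be a client IP address or a value extracted from the application data; only the string passed to `route` changes.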

High Availability (Failover)

Clustering solutions claim to provide HA/failover capabilities, but this failover applies at the application process level; it does not make the clustering controller itself highly available. This is an important distinction because, if the clustering controller instance fails, the entire system falls apart. While cluster-based load balancing provides high availability for members of the cluster, the controller instance becomes a single point of failure in the data path.

ADCs are built for redundancy and include sophisticated features that not only ensure applications are still available if one ADC fails, but also replicate session state between two ADCs so that application sessions are not lost if the primary fails. This replication capability is also available in most clustering application server solutions.
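The idea of a redundant pair with replicated session state can be modeled very roughly as follows. This is a toy in-memory sketch, assuming two balancers that mirror every session binding to each other; the class names, the `alive` flag, and the unit names are invented for the example and do not reflect any vendor's mechanism.

```python
class Balancer:
    """Hypothetical balancer unit holding a session-to-server table."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.sessions = {}


class RedundantPair:
    """Hypothetical active/standby pair that mirrors session state."""

    def __init__(self, primary, standby):
        self.primary = primary
        self.standby = standby

    def active(self):
        """Return the unit currently handling traffic."""
        if self.primary.alive:
            return self.primary
        if self.standby.alive:
            return self.standby
        raise RuntimeError("no balancer available")

    def bind_session(self, session_id, server_name):
        """Record a session binding on every unit that is still alive."""
        for unit in (self.primary, self.standby):
            if unit.alive:
                unit.sessions[session_id] = server_name

    def lookup(self, session_id):
        """Find the server for a session using the active unit's table."""
        return self.active().sessions.get(session_id)


pair = RedundantPair(Balancer("adc-a"), Balancer("adc-b"))
pair.bind_session("cart-42", "app2")
pair.primary.alive = False          # simulate failure of the primary
server_after_failover = pair.lookup("cart-42")
```

A single cluster controller corresponds to this model with no standby: once it fails, `active()` has nothing to fall back on.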

Transparency

Many clustering solutions require a node-agent be deployed on each instance of an application server being clustered by the controller. This agent is often already deployed, so it's often not a burden in terms of deployment and management, but it is another process running on each server that is consuming resources such as memory and CPU and which adds another point of failure into the data path.

ADCs require no server-side components; they are completely transparent.

Making A Choice
So which should you choose? That depends largely on why you are considering clustering or an ADC, and on whether you will have to make an additional purchase to enable clustering capabilities for your particular application server. There's also a broader question of whether you will need to provide this support for more than one application server brand. Clustering is proprietary to the application server, while ADCs can provide these services for any application or web server.

Clustering
The pros:
•Generally available as part of an enterprise package for an application server
•Solution doesn't require a lot of networking skills
•Generally less expensive than a redundant ADC deployment
The cons:
•High availability is not assured using clustering solutions
•Best practices dictate the cluster controller be deployed on separate hardware
•Requires node agents on managed application server instances
•Clustering is "proprietary" in that you can only cluster homogeneous servers.

ADCs
The pros:
•Can provide high availability and load balancing across heterogeneous environments
•Offers additional value such as optimization, security, and acceleration for applications
•Transparent - doesn't require changes to applications or the servers on which they are deployed
The cons:
•Adds another piece of infrastructure to the architecture
•Generally more expensive than clustering solutions
•May require a new set of skills to deploy and manage
