Oracle® Hyperion Data Relationship Management High Availability Strategies

Hyperion Data Relationship Management - Version: 11.1.2.1.103 and later [Release: 11.1 and later ]
Information in this document applies to any platform.

Abstract

This document describes high availability strategies for Data Relationship Management and explains how to configure them.

Release 11.1.2

1 Introduction

This document focuses on obtaining high availability for the Data Relationship Management system. The system does not have an automatic, built-in failover capability. To obtain high availability, it must therefore be configured so that a secondary system can be activated in a short period of time.

The Data Relationship Management system can be scaled across multiple servers. This, however, is not a failover mechanism and should not be mistaken for a means of obtaining high availability. The additional servers provide additional Long Read Only Engines, which allow long read operations such as exports, queries and compares to run without preventing short read and short write operations from completing.

2 Product Components

The Data Relationship Management system can be viewed as having the following major tiers for use in discussing high availability:

· Database Repository
· Application Server
· Web Server
· Communication Layer
· Client

The Application Server tier will be treated as a single component even though it can be scaled across multiple physical machines.

Note: This document only covers high availability considerations for core Data Relationship Management product components. Fusion Middleware components such as Hyperion Shared Services and Oracle SOA Suite that are offered with Data Relationship Management will not be addressed.

3 Architecture Diagram

4 Modes of Failure

The following modes of failure should be considered for each tier of a Data Relationship Management application:

· Database Repository

Mode 1 – Actual loss of the database and/or system
Mode 2 – Communications failure to the database
Mode 3 – Failure of database software (hangs, shutdowns…)
Mode 4 – Storage limitations, account issues, etc.

· Application Server

Mode 1 – Actual loss of the application server(s) (physical hardware)
Mode 2 – Failure of the application server software (hangs, shutdowns…)
Mode 3 – Data issue

· Web Server

Mode 1 – Actual loss of the web server(s) (physical hardware)
Mode 2 – Failure of the web server software (hangs, shutdowns…)

· Communication Layer

· Client

Note: Failure of the communication layer is outside the scope of this document because it relates to an organization’s overall network infrastructure and strategies. Many client failures are related to the other failure modes; when they are not, restarting the client should address the failure.

4.1 Database Repository Failure

Standard database procedures should be used to ensure the availability of the database repository for the Data Relationship Management system. Two main strategies can be used: periodic backups or replication.

If only periodic backups are performed, data may be lost, and the amount depends on the frequency of the backups. In addition, before a backup system can be activated after a failure, the database backup must be restored onto it. To make this quick, each backup can be restored to the secondary system immediately after it completes.

If replication is being used, the backup does not need to be restored because the secondary database system is already up to date.
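The relationship between backup frequency and potential data loss can be illustrated with a small calculation. This is only a sketch: the function name and the timestamps are hypothetical, and it assumes recovery is done from the most recent completed backup with no replication in place.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup_completed: datetime, failure_time: datetime) -> timedelta:
    """Return how much committed work is at risk when recovering from the last backup.

    Hypothetical helper: everything committed after the last completed backup
    is lost if the secondary database is restored from that backup.
    """
    if failure_time < last_backup_completed:
        raise ValueError("failure_time precedes the last completed backup")
    return failure_time - last_backup_completed


# Illustrative values only: a backup that finished at 02:00 and a failure at 14:30.
window = data_loss_window(datetime(2012, 5, 18, 2, 0), datetime(2012, 5, 18, 14, 30))
print(window)  # 12:30:00
```

With nightly backups the window can approach 24 hours, whereas with replication it shrinks to the replication lag.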

Depending on the failure mode and the reasons behind it, there are three main restart scenarios following a database layer failure.

1. Fix the database issue and restart the Data Relationship Management system.
2. Reconfigure the Data Relationship Management system to use the secondary database server (the database must be as up to date as possible through replication or restoring a backup).
3. Switch to the secondary Data Relationship Management system (as in option 2, the database must be up to date).

4.2 Application Server Failure

Depending on the failure mode and the reasons behind it, there are three main restart scenarios following an application server failure.

1. Restart the Data Relationship Management service on the primary system
2. Switch to the secondary system and configure it to use the primary database
3. Switch to the secondary system using the secondary database (the database must be as up to date as possible through replication or restoring a backup).

4.3 Web Server Failure

Data Relationship Management web applications can be clustered using Oracle HTTP Server to address high availability and failover requirements. If there is a failure on an IIS web server running a Data Relationship Management web application, Oracle HTTP Server can detect this situation and direct all user requests to other IIS web servers still in operation.

The Data Relationship Management web client is not pre-configured to run in a shared session state in IIS. Any user sessions that were active on the web server that failed will be terminated, while any sessions on other web servers will not be affected. Any new user sessions will be created on the remaining available web servers.

For more information on clustering web applications using Oracle HTTP Server, refer to the Load Balancing Data Relationship Management Web Applications section in the Oracle Hyperion Data Relationship Management Installation Guide.

5 Data Loss

Most of the transactions in the Data Relationship Management system are committed to the database repository immediately.

Operations performed on a detached version, however, are held only in memory. The import and copy version operations always work on detached versions. The blend operation can be configured to work on a detached version.

5.1 Database Layer Failure

If the database is lost, or the Data Relationship Management system is restarted using the secondary database, the potential loss of data depends on the backups or replication in place, plus any changes that existed only in detached (in memory) versions.

5.2 Application Server Failure

In the event of an application server failure, changes to detached versions will be lost along with any transactions that were in the middle of being processed. Most transactions, however, commit quickly and therefore do not represent a large potential for data loss.

The data lost from detached versions can usually be regenerated (from copy, import and blend operations).

6 Restart Modes

Restarting the Data Relationship Management system after a failure can be done in several configurations depending on the reason for the failure and the database backup/replication that is in place.

· Restart the Data Relationship Management system on the primary server and database

Shut down the Data Relationship Management service on the primary server
Kill any hung engine processes (may be necessary if the restart is due to a hung system)
Restart the service on the primary server

· Switch the primary application server to the secondary database server

Shut down the Data Relationship Management service on the primary server
Change the configuration file to point to the secondary database server
Ensure that the secondary database server is up to date (restore of backup or via replication)
Restart the service on the primary server

· Switch to the secondary application server using the primary database server

Shut down the Data Relationship Management service on the primary server (if it is still available)
Change the configuration file on the secondary application server to point to the primary database server
Change the DNS to point the Data Relationship Management URL to the secondary application server IP
Start the Data Relationship Management service on the secondary application server

· Switch to the secondary application server and secondary database server

Shut down the Data Relationship Management service on the primary server (if it is still available)
Change the DNS to point the Data Relationship Management URL to the secondary application server IP
Ensure that the secondary database server is up to date (restore of backup or via replication)
Start the Data Relationship Management service on the secondary application server
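The four restart configurations above can be kept as a simple lookup table so that scripts or operators pick the right checklist for a given combination of application server and database server. The sketch below only records the steps listed in this section; the names RESTART_MODES and restart_plan are illustrative and not part of the product.

```python
from typing import Dict, Tuple

# Keys are (application server, database server); values are the ordered steps
# taken from the restart modes described in this section.
RESTART_MODES: Dict[Tuple[str, str], Tuple[str, ...]] = {
    ("primary", "primary"): (
        "Shut down the DRM service on the primary server",
        "Kill any hung engine processes",
        "Restart the service on the primary server",
    ),
    ("primary", "secondary"): (
        "Shut down the DRM service on the primary server",
        "Change the configuration file to point to the secondary database server",
        "Ensure that the secondary database server is up to date",
        "Restart the service on the primary server",
    ),
    ("secondary", "primary"): (
        "Shut down the DRM service on the primary server (if still available)",
        "Change the configuration file on the secondary application server "
        "to point to the primary database server",
        "Change the DNS to point the DRM URL to the secondary application server IP",
        "Start the DRM service on the secondary application server",
    ),
    ("secondary", "secondary"): (
        "Shut down the DRM service on the primary server (if still available)",
        "Change the DNS to point the DRM URL to the secondary application server IP",
        "Ensure that the secondary database server is up to date",
        "Start the DRM service on the secondary application server",
    ),
}


def restart_plan(app_server: str, database: str) -> Tuple[str, ...]:
    """Return the ordered checklist for the chosen application/database pair."""
    try:
        return RESTART_MODES[(app_server, database)]
    except KeyError:
        raise ValueError(f"unknown restart mode: {app_server!r}, {database!r}") from None


for number, step in enumerate(restart_plan("secondary", "primary"), start=1):
    print(f"{number}. {step}")
```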

7 Failure Detection

Detecting a failure of the primary Data Relationship Management system can be done in multiple ways:

· Users attempt to log on or perform tasks via the Web Client
· Monitor processes like the Process and Event Managers on the application server
· Create a program that uses the web service API to ensure the system is running and to send out a notification if it does not respond

However, relying on users to perform manual actions is not proactive, and monitoring the application server processes will not detect a hung system, because the processes are still running. An API program combined with standard database tools for monitoring the health of the database is the most effective strategy for detecting system failures.

For the program to do a quick test of the Data Relationship Management system, it should perform the following operations through the web service API:

1. Call the List Versions operation (tests connectivity to the system and confirms that the system is responding)
2. Call the Log interface to get the transactions for the session (verifies connectivity to the database)
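The two-step test above might be structured as follows. This is a sketch under stated assumptions: the two callables stand in for whatever client code invokes the List Versions operation and the transaction log query through the web service API; their names are hypothetical and no actual Data Relationship Management API signatures are shown.

```python
from typing import Any, Callable, Tuple


def check_drm_health(
    list_versions: Callable[[], Any],
    get_session_transactions: Callable[[], Any],
) -> Tuple[bool, str]:
    """Run the two probes in order and report the first one that fails.

    Both arguments are caller-supplied placeholders for real web service calls.
    Any exception (timeout, connection refused, fault) counts as a failure.
    """
    try:
        list_versions()  # step 1: is the system reachable and responding?
    except Exception as exc:
        return False, f"List Versions failed: {exc}"
    try:
        get_session_transactions()  # step 2: can the system reach the database?
    except Exception as exc:
        return False, f"transaction log query failed: {exc}"
    return True, "ok"


# Usage with stand-in probes; a real program would call the web service here.
def failing_log_query() -> None:
    raise ConnectionError("database unreachable")


print(check_drm_health(lambda: ["Version A"], lambda: []))
print(check_drm_health(lambda: ["Version A"], failing_log_query))
```

The underlying calls should use a timeout so that a hung system is reported as a failure rather than blocking the check indefinitely.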

The program could be run on a periodic basis (or designed to loop) and be used as the basis for notification or to start off an automated switchover.

Note: Running the program too frequently will put additional load on the system. Running it too infrequently means that the time between failure and detection may be longer than desired.
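A looping monitor makes the interval trade-off explicit. The sketch below is illustrative only: monitor, check and notify are hypothetical names, and the check callable is assumed to wrap a health test such as the one described above. A failure that occurs just after a successful check goes undetected for roughly one full interval.

```python
import time
from typing import Callable, Optional


def monitor(
    check: Callable[[], bool],
    notify: Callable[[str], None],
    interval_seconds: float,
    max_checks: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until a check fails (returns False) or max_checks is reached (returns True).

    interval_seconds controls the trade-off: a smaller value detects failures
    sooner but adds load to the system being monitored.
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if not check():
            notify(f"DRM health check failed on attempt {checks}")
            return False
        sleep(interval_seconds)
    return True


# Usage with canned results and no real waiting.
results = iter([True, True, False])
messages = []
healthy = monitor(lambda: next(results), messages.append, 60, sleep=lambda s: None)
print(healthy, messages)
```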

8 Automating System Restart and Switchover

A program that is written to test the health of the Data Relationship Management system could also initiate an automated restart or switchover procedure. However, there are multiple modes in which a restart or switchover can be done, depending on the failure mode and the root cause of the failure. In addition, if the secondary database is not kept up to date via replication, some of the restart modes may cause users to lose data.

The automated restart or switchover process needs to be well thought out and fully tested to ensure its proper operation. It may be sufficient to prepare scripts for the different restart modes and have the detection routine notify an administrator, who then determines the best course of action and runs the appropriate script.

8.1 Automation Procedure Example

The following example illustrates the logic used for a scripted solution to automate system restart and switchover:

1. Primary application server is available and primary database server is available

o Yes

Restart the system on primary application server using primary database server

Wait for the application to initialize

Check for application availability (using failure detection program)

If the system is back up then notify the admin of the event and exit, if not then go to step 2

o No – Go to step 2

2. Primary database server is available and primary application server is not available

o Yes

Restart the system on secondary application server using the primary database server

Wait for the application to initialize

Check the application availability (using failure detection program)

If the system is back up then notify the admin of the event and exit, if not go to step 3

o No – Go to step 3

3. Primary application server is available and primary database server is not available

o Yes

Restart the system on primary application server using the secondary database server

Wait for the application to initialize

Check the application availability (using failure detection program)

If the system is back up then notify the admin of the event and exit, if not then go to step 4

o No – Go to step 4

4. Secondary application server is available and secondary database server is available

o Restart the system on secondary application server using secondary database server

o Wait for the application to initialize

o Check the application availability (using the failure detection program)

o If the system is back up then notify the admin of the event and exit, if not then notify the admin of the failure to restart the system

Note: If the database is being replicated then the process could be simplified down to steps 1 and 4 only, since there will be minimal data loss in switching to the secondary database server.
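The decision logic of the example above could be sketched as follows. All names are hypothetical: is_available, restart, system_is_up and notify are caller-supplied callables, restart is assumed to block until the application has initialized, and availability is sampled once at the start, which is a simplification.

```python
from typing import Callable, Optional, Tuple


def automated_switchover(
    is_available: Callable[[str, str], bool],  # (tier, role), e.g. ("app", "primary")
    restart: Callable[[str, str], None],       # (app server role, database role)
    system_is_up: Callable[[], bool],          # the failure detection program
    notify: Callable[[str], None],
    replicated: bool = False,
) -> Optional[Tuple[str, str]]:
    """Walk through steps 1 to 4 and return the (app, db) pair that came up, if any."""
    primary_app = is_available("app", "primary")
    primary_db = is_available("db", "primary")
    secondary_ok = is_available("app", "secondary") and is_available("db", "secondary")
    steps = [
        (primary_app and primary_db, "primary", "primary"),        # step 1
        (primary_db and not primary_app, "secondary", "primary"),  # step 2
        (primary_app and not primary_db, "primary", "secondary"),  # step 3
        (secondary_ok, "secondary", "secondary"),                  # step 4
    ]
    if replicated:
        # With replication the procedure reduces to steps 1 and 4.
        steps = [steps[0], steps[3]]
    for applies, app, db in steps:
        if not applies:
            continue
        restart(app, db)
        if system_is_up():
            notify(f"System restarted on {app} application server with {db} database")
            return app, db
    notify("Failed to restart the system")
    return None


# Usage with stand-ins: the primary application server is down.
state = {("app", "primary"): False, ("db", "primary"): True,
         ("app", "secondary"): True, ("db", "secondary"): True}
print(automated_switchover(lambda tier, role: state[(tier, role)],
                           lambda app, db: None, lambda: True, print))
```

As noted earlier, a fully automated version needs thorough testing; the same function can instead be used only to recommend a restart mode to an administrator.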

Abdul Muqeet - Oracle Hyperion Blog

Friday, May 18, 2012
