Overview

Availability Management is the practice of identifying levels of IT Service availability for use in Service Level Reviews with Customers.

All areas of a service must be measurable and defined within the Service Level Agreement (SLA).

To measure service availability, the following areas are usually included in the SLA:

  • Agreement statistics – such as what is included within the agreed service.
  • Availability – agreed service times, response times, etc.
  • Help Desk Calls – number of incidents raised, response times, resolution times.
  • Contingency – agreed contingency details, location of documentation, contingency site, 3rd party involvement, etc.
  • Capacity – performance timings for online transactions, report production, numbers of users, etc.
  • Costing Details – charges for the service, and any penalties should service levels not be met.

Availability is usually calculated based on a model involving the Availability Ratio and techniques such as Fault Tree Analysis, and includes the following elements:

  • Serviceability – where a service is provided by a 3rd party organisation, this is the expected availability of a component.
  • Reliability – the time for which a component can be expected to perform under specific conditions without failure.
  • Recoverability – the time it should take to restore a component back to its operational state after a failure.
  • Maintainability – the ease with which a component can be maintained; maintenance can be either remedial or preventative.
  • Resilience – the ability to withstand failure.
  • Security – the ability of components to withstand breaches of security.
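
The Availability Ratio and a much-simplified fault-tree style combination of components can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, the ratio is taken in its common form of (agreed service time minus downtime) divided by agreed service time, reliability and recoverability are assumed to be expressed as mean time between failures (MTBF) and mean time to restore (MTTR), and component failures are assumed to be independent, which real infrastructure may not satisfy.

```python
from math import prod


def availability_ratio(agreed_service_hours: float, downtime_hours: float) -> float:
    """Percentage of the agreed service time during which the service was available."""
    if agreed_service_hours <= 0:
        raise ValueError("agreed_service_hours must be positive")
    if not 0 <= downtime_hours <= agreed_service_hours:
        raise ValueError("downtime_hours must lie between 0 and agreed_service_hours")
    return (agreed_service_hours - downtime_hours) / agreed_service_hours * 100


def availability_from_mtbf(mtbf_hours: float, mttr_hours: float) -> float:
    """Long-run availability (0..1) from reliability (MTBF) and recoverability (MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("mtbf_hours must be positive and mttr_hours non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def series(*availabilities: float) -> float:
    """Service fails if ANY component fails (an OR gate in a fault tree).

    Assumes independent failures, so the availabilities multiply.
    """
    return prod(availabilities)


def parallel(*availabilities: float) -> float:
    """Service fails only if ALL redundant components fail (an AND gate).

    Assumes independent failures, so the unavailabilities multiply.
    """
    return 1 - prod(1 - a for a in availabilities)


if __name__ == "__main__":
    # Hypothetical figures: 200 agreed service hours in the period, 2 hours of downtime.
    print(f"Availability ratio: {availability_ratio(200, 2):.2f}%")

    # Hypothetical service: two redundant servers behind a single network link.
    server = availability_from_mtbf(mtbf_hours=990, mttr_hours=10)
    link = 0.999
    print(f"End-to-end availability: {series(parallel(server, server), link):.4%}")
```

The parallel case gives a rough feel for how resilience measures such as redundancy raise the availability of the whole service above that of any single component, provided the components really do fail independently.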

Availability Management and IT Security

IT Security is an integral part of Availability Management, whose primary focus is ensuring that the IT infrastructure continues to be available for the provision of IT Services.

Some of the above elements are really the outcome of a risk analysis, which identifies the resilience measures to be put in place, how reliable individual elements are, and how many problems have been caused by system failure.

The risk analysis also recommends controls to improve availability of IT infrastructure such as development standards, testing, physical security, the right skills in the right place at the right time, etc.

Service Level Agreements

Service Level Agreements are clearly of fundamental importance to Availability Management. For further information on SLAs, see our specific SLA Page.

This page (revision-4) was last changed on 16-Sep-2012 12:18 by jim.