Background#

Effective event correlation should be an integral part of modern infrastructure administration. Reducing alarm volume while improving information content is a key determinant of a successfully managed infrastructure.

Why it is a Problem#

Suppose monitoring is done at the following levels:
  • Application
  • Operating System
  • Network

If an outage happens at the Network level, then obviously the OS and Application levels will generate events as well, because they are no longer available. But we do not want trouble tickets to be generated for erroneous or redundant reasons.

An event correlation engine would be able to determine that the "root" cause is the Network, and so avoid sending a ticket to the application owner.
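The layered example above can be sketched in a few lines of Python. This is only a minimal illustration, not a real correlation engine. The `LAYERS` ordering and the `correlate` function are hypothetical names, and the sketch assumes each layer depends on every layer listed before it.

```python
# Hypothetical dependency order: each layer depends on the layers before it.
LAYERS = ["Network", "Operating System", "Application"]


def correlate(events):
    """Split outage events for one host into a root cause and suppressed layers.

    `events` is an iterable of layer names currently reporting an outage.
    Returns (root_cause_layer, suppressed_layers); (None, []) if nothing is down.
    """
    reported = set(events)
    # Keep the layers in dependency order, lowest layer first.
    down = [layer for layer in LAYERS if layer in reported]
    if not down:
        return None, []
    # The lowest failing layer is treated as the root cause; the rest are
    # assumed to be consequences and do not get their own ticket.
    return down[0], down[1:]


root, suppressed = correlate(["Application", "Network", "Operating System"])
print(root)        # Network
print(suppressed)  # ['Operating System', 'Application']
```

With this rule, only the Network owner receives a ticket. The OS and Application events are still recorded, but they are treated as symptoms of the root cause.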

Correlation is really the final piece in a network management environment. You get a great many events out there, whether polled events or trap events. Taking all that information, correlating it, and efficiently figuring out what’s important and what’s not (what are false positives, and so on) is critical.

If I get “xyz” it might mean one thing; if I get “xyz” and you add “abc” to it, then it’s something else entirely. So, the correlation engine really drives root cause analysis as well.
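A minimal sketch of that idea in Python, assuming a hypothetical rule table in which the most specific matching rule (the one with the largest set of events) wins. The `RULES` table, the `diagnose` function, and the diagnosis labels are illustrative only.

```python
# Hypothetical rule table: a set of event names mapped to a diagnosis.
RULES = [
    (frozenset({"xyz"}), "one thing"),
    (frozenset({"xyz", "abc"}), "something else entirely"),
]


def diagnose(observed):
    """Return the diagnosis of the most specific rule matching the observed events."""
    seen = set(observed)
    # A rule matches when all of its events have been observed.
    matches = [(pattern, label) for pattern, label in RULES if pattern <= seen]
    if not matches:
        return None
    # Prefer the rule that explains the most events.
    return max(matches, key=lambda match: len(match[0]))[1]


print(diagnose(["xyz"]))         # one thing
print(diagnose(["xyz", "abc"]))  # something else entirely
print(diagnose(["abc"]))         # None
```

Choosing the largest matching pattern is just one possible tie-breaking policy. A real engine would typically also consider time windows and event sources.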

Without correlation, you’re stuck with additional manual processes. If I can identify the event quickly with root cause analysis, and I can figure out what’s causing that event through automated correlation, then I’ve effectively driven down response time and troubleshooting time.

This page (revision-3) was last changed on 13-Apr-2014 09:19 by jim