Background
Effective event correlation should be an integral part of modern infrastructure administration. Reducing alarm volume while improving information content is a key determinant of a successfully managed infrastructure.
Why it is a Problem
Suppose monitoring is done at several levels (for example Network, Operating System, and Application). If an outage happens at the Network level, then the Operating System and Application levels will naturally generate availability events as well. However, we do not want trouble tickets to be generated for erroneous or redundant reasons, or routed to teams that cannot help fix the problem.
An Event Correlation engine would be able to determine that the "root" cause is the Network, and would not send a ticket to the application owner.
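The suppression described above can be sketched roughly as follows. This is a minimal illustration, assuming a simple three-layer dependency chain in which the application depends on the operating system, which in turn depends on the network. The layer names, the `Event` type, and the `root_cause_events` function are hypothetical and do not come from any particular correlation product.

```python
from dataclasses import dataclass

# Hypothetical dependency chain: each layer maps to the layer it depends on.
DEPENDS_ON = {"application": "os", "os": "network", "network": None}


@dataclass(frozen=True)
class Event:
    host: str
    layer: str      # "network", "os", or "application" (assumed names)
    message: str


def root_cause_events(events):
    """Keep only events that are not explained by a failed lower layer on the same host."""
    failed = {(e.host, e.layer) for e in events}
    roots = []
    for e in events:
        parent = DEPENDS_ON.get(e.layer)
        suppressed = False
        # Walk down the dependency chain; a failed lower layer explains this event.
        while parent is not None:
            if (e.host, parent) in failed:
                suppressed = True
                break
            parent = DEPENDS_ON.get(parent)
        if not suppressed:
            roots.append(e)
    return roots


events = [
    Event("web01", "network", "link down"),
    Event("web01", "os", "host unreachable"),
    Event("web01", "application", "service unavailable"),
]
tickets = root_cause_events(events)
```

With these three sample events, only the network event survives, so only the network team would receive a ticket. An application event with no failure beneath it would pass through unchanged.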
Event Correlation is really the final piece in a network management environment. There are a great many events out there, whether polled events or trap events. Taking all that information, correlating it, and efficiently figuring out what's important and what's not, which events are false positives, and so on, is critical.
If I get xyz, it might mean one thing; if I get xyz and you add abc to it, then it's something else entirely. So the Event Correlation engine really drives root cause analysis as well.
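One way to picture the "xyz alone versus xyz plus abc" idea is a small rule table in which the most specific matching rule wins. This is only a sketch under stated assumptions: the `RULES` table, the diagnosis labels, and the `diagnose` function are hypothetical placeholders, and real correlation engines usually also take time windows and topology into account.

```python
# Hypothetical rules: a set of required event names maps to a diagnosis label.
RULES = [
    (frozenset({"xyz", "abc"}), "diagnosis B"),
    (frozenset({"xyz"}), "diagnosis A"),
]


def diagnose(observed):
    """Return the label of the most specific rule whose events were all observed."""
    observed = set(observed)
    matches = [(required, label) for required, label in RULES if required <= observed]
    if not matches:
        return None
    # Prefer the rule that requires the most events (the most specific match).
    return max(matches, key=lambda m: len(m[0]))[1]
```

Here `diagnose(["xyz"])` selects the single-event rule, while `diagnose(["xyz", "abc"])` selects the two-event rule because it is more specific.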
Without Event Correlation, you're stuck with additional manual processes. If I can identify the event quickly with root cause analysis, and I can figure out what's causing that event through automated correlation, then I have effectively driven down response time and troubleshooting time.