Channel: THWACK: Message List

How do I detect if an alert action has failed?

We recently had a production-level outage that exposed a problem with one of the multi-action alerts that I set up.

 

Basically, the alert had three actions: send an email, log a NetPerfMon event, and run an imported action that sends a SOAP request to a third-party alerting system (PagerDuty), which pages our on-call technicians.

 

During this particular outage, the SOAP request failed, so the on-call technicians were not paged, and that resulted in HOURS of downtime.  Management now perceives SolarWinds as a 'single point of failure' as far as production alerting goes.  It had been reliable up until that point, so I would rather just DETECT that the alert action failed and take an alternative action.

 

Is this possible?  Basically, I want to show that we can 'monitor the monitor' itself, and provide a fail-safe way of confirming that the alert actions we assume worked actually did.  Perhaps I can set up a PowerShell script to monitor a specific DB table for failure codes?
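
To make the idea concrete, here is a rough sketch of the kind of watchdog I have in mind. It is written in Python rather than PowerShell just to keep it short and self-contained. Everything about the data is an assumption: the `alert_action_log` table, its columns, and the `fallback` callback are hypothetical stand-ins, not the real SolarWinds schema or any existing API, and an in-memory SQLite database takes the place of the real DB.

```python
import sqlite3
from typing import Callable, List, Tuple

# HYPOTHETICAL schema for illustration only. This is NOT the actual
# SolarWinds database layout; the real table and column names would
# have to be looked up.
SCHEMA = """
CREATE TABLE alert_action_log (
    id          INTEGER PRIMARY KEY,
    alert_name  TEXT    NOT NULL,
    action_type TEXT    NOT NULL,
    succeeded   INTEGER NOT NULL   -- 1 = action worked, 0 = action failed
)
"""


def find_failed_actions(
    conn: sqlite3.Connection, last_seen_id: int
) -> List[Tuple[int, str, str]]:
    """Return (id, alert_name, action_type) for failed rows newer than last_seen_id."""
    cur = conn.execute(
        "SELECT id, alert_name, action_type FROM alert_action_log "
        "WHERE succeeded = 0 AND id > ? ORDER BY id",
        (last_seen_id,),
    )
    return cur.fetchall()


def check_once(
    conn: sqlite3.Connection,
    last_seen_id: int,
    fallback: Callable[[str, str], None],
) -> int:
    """Call fallback for each new failure and return the highest failed id handled."""
    for row_id, alert_name, action_type in find_failed_actions(conn, last_seen_id):
        fallback(alert_name, action_type)  # e.g. page on-call through a second channel
        last_seen_id = row_id
    return last_seen_id


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO alert_action_log (alert_name, action_type, succeeded) "
        "VALUES (?, ?, ?)",
        [
            ("Prod node down", "email", 1),
            ("Prod node down", "event_log", 1),
            ("Prod node down", "soap_page", 0),  # the failed paging action
        ],
    )
    last_id = check_once(
        conn, 0, lambda alert, action: print(f"FALLBACK: {alert} / {action} failed")
    )
    print("last failed id handled:", last_id)
```

The script would run on a schedule from a machine other than the monitoring server, remember the last id it handled between runs, and use a notification path that does not depend on the same SOAP integration. Otherwise the watchdog shares the same single point of failure.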

 

If someone can provide suggestions or examples of how they would handle this, that would be great.  Thanks!

