Channel: THWACK: Message List

Re: NPM: Mass Outage Alerting- Over 50 nodes down, Send Alert


My reply to this thread is very late, but I ran into a similar scenario and needed a solution. Hopefully this helps anyone else who hits the same problem. I'm sure there is a much better way to do this. In my scenario I needed to be alerted only if 2 SMTP application monitors on different nodes were down simultaneously. No alert should be sent if only one application monitor was down, and only a single alert should be sent when both were. The same solution can also be applied to mass outages.

 

Terms:

Group - A collection of objects you are polling for a shared status. This is not related to groups in SolarWinds.

Polled Node - The single object the alert query returns. It can either be part of the group or be an independent node that just has to exist.

 

Here is an example based on my application monitor scenario:

SELECT APM_ApplicationAlertsData.ApplicationID AS NetObjectID, APM_ApplicationAlertsData.Name AS Name
FROM APM_ApplicationAlertsData
WHERE (
    (APM_ApplicationAlertsData.Availability = 'Down') AND
    (APM_ApplicationAlertsData.Name = 'SMTPTest') AND
    (APM_ApplicationAlertsData.ApplicationID = '323')
)
GROUP BY APM_ApplicationAlertsData.ApplicationID, APM_ApplicationAlertsData.Name
HAVING (
    SELECT COUNT(*) FROM APM_ApplicationAlertsData
    WHERE APM_ApplicationAlertsData.Availability = 'Down' AND
          APM_ApplicationAlertsData.Name = 'SMTPTest'
) = 2

 

I have not modified the SELECT statement from what the Advanced Alert Manager generates. Here my "Polled Node" has the ApplicationID 323 and is part of the "Group". I could just as easily have used a node that isn't part of the group. In that case the WHERE section should be written so that the query (ignoring the HAVING section) always returns the "Polled Node".

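The HAVING clause is where the magic happens. The SQL query won't return the "Polled Node" (even if it is down) unless exactly 2 application monitors named SMTPTest are in the down state. If the group had more than two monitors, the comparison would need to be >= 2 here (and < 2 in the reset) to mean "at least 2".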
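A quick way to sanity-check the trigger logic outside SolarWinds is to run the same query shape against a throwaway table. The sketch below uses Python's built-in sqlite3 module purely as a stand-in for the real SQL Server database. The table holds only the three columns the query touches, and ApplicationID 324 is a made-up second monitor, so treat it as an illustration of the logic rather than a copy of the real schema.

```python
import sqlite3

# Toy stand-in for APM_ApplicationAlertsData (assumption: only these 3 columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE APM_ApplicationAlertsData ("
    "ApplicationID INTEGER, Name TEXT, Availability TEXT)"
)
conn.executemany(
    "INSERT INTO APM_ApplicationAlertsData VALUES (?, ?, ?)",
    [(323, "SMTPTest", "Up"), (324, "SMTPTest", "Up")],  # 324 is hypothetical
)

# Same shape as the trigger query in the post, with table prefixes dropped.
TRIGGER_SQL = """
SELECT ApplicationID AS NetObjectID, Name
FROM APM_ApplicationAlertsData
WHERE Availability = 'Down' AND Name = 'SMTPTest' AND ApplicationID = 323
GROUP BY ApplicationID, Name
HAVING (SELECT COUNT(*) FROM APM_ApplicationAlertsData
        WHERE Availability = 'Down' AND Name = 'SMTPTest') = 2
"""


def set_availability(app_id, availability):
    """Flip one monitor's availability in the toy table."""
    conn.execute(
        "UPDATE APM_ApplicationAlertsData SET Availability = ? "
        "WHERE ApplicationID = ?",
        (availability, app_id),
    )


def trigger_rows():
    """Rows the trigger query returns; a non-empty result means 'alert'."""
    return conn.execute(TRIGGER_SQL).fetchall()


set_availability(323, "Down")
one_down = trigger_rows()   # only the polled monitor is down
set_availability(324, "Down")
both_down = trigger_rows()  # both monitors are down
print(one_down, both_down)
```

With only the polled monitor down the query returns nothing, and once the second monitor goes down it returns the single row for ApplicationID 323, which is what keeps the alert to one notification.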

 

The reset condition can't be left at the default. It needs its own query:

SELECT APM_ApplicationAlertsData.ApplicationID AS NetObjectID, APM_ApplicationAlertsData.Name AS Name
FROM APM_ApplicationAlertsData
WHERE (
    (APM_ApplicationAlertsData.Name = 'SMTPTest') AND
    (APM_ApplicationAlertsData.ApplicationID = '323')
)
GROUP BY APM_ApplicationAlertsData.ApplicationID, APM_ApplicationAlertsData.Name
HAVING (
    SELECT COUNT(*) FROM APM_ApplicationAlertsData
    WHERE APM_ApplicationAlertsData.Availability = 'Down' AND
          APM_ApplicationAlertsData.Name = 'SMTPTest'
) != 2

 

Here the WHERE clause matches as long as that application monitor exists, but the reset SQL query won't return the "Polled Node" until there are no longer exactly 2 SMTPTest application monitors down.
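The reset behaviour can be checked the same way. This is again only a sketch: sqlite3 stands in for the real database, the table is a three-column toy, and ApplicationID 324 is a made-up second monitor. It starts with both monitors down and then brings one of them back up.

```python
import sqlite3

# Toy stand-in for APM_ApplicationAlertsData, starting mid-outage.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE APM_ApplicationAlertsData ("
    "ApplicationID INTEGER, Name TEXT, Availability TEXT)"
)
conn.executemany(
    "INSERT INTO APM_ApplicationAlertsData VALUES (?, ?, ?)",
    [(323, "SMTPTest", "Down"), (324, "SMTPTest", "Down")],  # 324 is hypothetical
)

# Same shape as the reset query: no Availability filter in the outer WHERE,
# and the HAVING comparison is inverted.
RESET_SQL = """
SELECT ApplicationID AS NetObjectID, Name
FROM APM_ApplicationAlertsData
WHERE Name = 'SMTPTest' AND ApplicationID = 323
GROUP BY ApplicationID, Name
HAVING (SELECT COUNT(*) FROM APM_ApplicationAlertsData
        WHERE Availability = 'Down' AND Name = 'SMTPTest') != 2
"""


def reset_rows():
    """Rows the reset query returns; a non-empty result means 'reset'."""
    return conn.execute(RESET_SQL).fetchall()


during_outage = reset_rows()   # both monitors down: reset stays silent
conn.execute(
    "UPDATE APM_ApplicationAlertsData SET Availability = 'Up' "
    "WHERE ApplicationID = 324"
)
after_recovery = reset_rows()  # one monitor back up: reset fires
print(during_outage, after_recovery)
```

Note that the reset fires even though monitor 323 itself is still down. That is why the outer WHERE drops the Availability filter: the reset should depend on the group count, not on the polled object's own state.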

 

The concept is similar to what the original poster had, but OP is using TOP to select the first node for his alerts. What if that node comes back up while 50 nodes are still down? Another alert about 50 nodes being down will be generated for the new first node. Instead of selecting the first node, create a brand-new node named "Mass Outage" and poll that node in the WHERE section.

 

Trigger, where the NodeID of the "Mass Outage" node is 311 (a made-up number):

 

SELECT Nodes.NodeID AS NetObjectID, Nodes.Caption AS Name
FROM Nodes
WHERE Nodes.NodeID = '311'
GROUP BY NodeID, Caption
HAVING (
    SELECT COUNT(*)
    FROM Nodes
    WHERE StatusDescription LIKE 'Node Status is Down.'
) >= 50

 

Reset:

SELECT Nodes.NodeID AS NetObjectID, Nodes.Caption AS Name
FROM Nodes
WHERE Nodes.NodeID = '311'
GROUP BY NodeID, Caption
HAVING (
    SELECT COUNT(*)
    FROM Nodes
    WHERE StatusDescription LIKE 'Node Status is Down.'
) < 50
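For anyone who wants to try the mass-outage pair before putting it into an alert, here is a rough simulation. It uses Python's sqlite3 as a stand-in for the real database. The Nodes table is a three-column toy, the 60 node names are invented, and the 'Node Status is Up.' text and the placeholder node being up are assumptions made for the example.

```python
import sqlite3

UP, DOWN = "Node Status is Up.", "Node Status is Down."  # UP text is assumed

# Toy Nodes table: 60 invented nodes plus the "Mass Outage" placeholder (311).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Nodes ("
    "NodeID INTEGER PRIMARY KEY, Caption TEXT, StatusDescription TEXT)"
)
conn.executemany(
    "INSERT INTO Nodes VALUES (?, ?, ?)",
    [(i, f"node-{i}", UP) for i in range(1, 61)] + [(311, "Mass Outage", UP)],
)

# Trigger and reset share one shape; only the comparison operator differs.
QUERY = """
SELECT Nodes.NodeID AS NetObjectID, Nodes.Caption AS Name
FROM Nodes
WHERE Nodes.NodeID = 311
GROUP BY NodeID, Caption
HAVING (SELECT COUNT(*) FROM Nodes
        WHERE StatusDescription LIKE 'Node Status is Down.') {op} 50
"""


def rows(op):
    """Run the query with '>=' (trigger) or '<' (reset)."""
    return conn.execute(QUERY.format(op=op)).fetchall()


def mark_down(first, last):
    """Mark nodes with IDs first..last (inclusive) as down."""
    conn.execute(
        "UPDATE Nodes SET StatusDescription = ? WHERE NodeID BETWEEN ? AND ?",
        (DOWN, first, last),
    )


mark_down(1, 49)
below = (rows(">="), rows("<"))          # 49 down: (trigger, reset)
mark_down(50, 50)
at_threshold = (rows(">="), rows("<"))   # 50 down: (trigger, reset)
print(below, at_threshold)
```

At 49 nodes down only the reset query returns the placeholder row. At 50 only the trigger does, and it keeps returning the same single row no matter which individual nodes recover, as long as the count stays at 50 or above.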

 

Please let me know if I've got this wrong and it isn't working the way I think it is. I rarely touch SQL.

