Modeling a VPN network in nagios to suppress spurious notifications.

John P. Rouillard rouilj at cs.umb.edu
Mon Oct 10 23:09:15 CEST 2005


Hello all:

I have reached a bit of a fork in the road in a nagios deployment. I have
a physical network that has a VPN running on it across the Internet to
multiple sites. A large number of the hosts/services I need to monitor
are on the private network only.

Here is the legend for the ASCII art network diagrams below.

    P(h1)     - Private interface for host 1 (172.16...)
    U(h1)     - pUblic interface for host 1 (some public IP)
    P(h1..h3) - Private interfaces for host 1 to host 3.
    P(s1)     - Private traffic on switch 1
    U(s1)     - public traffic on switch 1
    UP(s1)    - public and private traffic on switch 1
    U(I)      - Internet (public)
    P(I)      - Internet (private VPN tunneled traffic)
    P(v1)     - private address for VPN box
    U(v1)     - public address for VPN box
    U(r1)     - public router 1
    U(nagios) - public interface for nagios 
    P(nagios) - private interface for nagios 

Hopefully you can make some sense of the ASCII art. BTW does anybody
have a tool to generate ASCII diagrams from some sort of file
specification like dot/graphviz?

So the question is: what is the most efficient way to represent the
dependencies (via parent or host dependencies) to prevent
notifications on the VPN when the public network goes down?

I was thinking of creating two separate networks. Each would have the
parents defined as seen within its topology. So:

  private:
 P(h1..h3) <- P(s1) <- P(v1) <- private_internet <- P(v2) <- P(s2) <- P(nagios)
                                       v
                                       |
        P(h14..h16) <- P(s3) <- P(v3) -+

with switches and the private VPN addresses going to a "private
internet" and creating all the parents as though the private network
was a regular network.
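
The parent relationships in the private diagram can be pictured as a small graph. The following is a minimal sketch, not Nagios code: it assumes the usual parents rule (a failed host is DOWN if a parent is still UP, and UNREACHABLE if every parent is DOWN or UNREACHABLE). The node names are simplified stand-ins for the hosts in the diagram, and `classify` and `PRIVATE_PARENTS` are hypothetical names chosen for illustration.

```python
# Child -> list of parents for the private topology, read off the diagram.
# Host ranges such as P(h1..h3) are collapsed into one node for brevity.
PRIVATE_PARENTS = {
    "P(nagios)": [],
    "P(s2)": ["P(nagios)"],
    "P(v2)": ["P(s2)"],
    "private_internet": ["P(v2)"],
    "P(v1)": ["private_internet"],
    "P(s1)": ["P(v1)"],
    "P(h1..h3)": ["P(s1)"],
    "P(v3)": ["private_internet"],
    "P(s3)": ["P(v3)"],
    "P(h14..h16)": ["P(s3)"],
}


def classify(parents, broken):
    """Return {host: 'UP' | 'DOWN' | 'UNREACHABLE'} for a set of broken hosts.

    Assumption: a host is reachable if it has no parents or at least one
    parent is UP; a reachable host that is broken is DOWN.
    """
    state = {}

    def visit(host):
        if host in state:
            return state[host]
        parent_states = [visit(p) for p in parents[host]]
        reachable = not parent_states or any(s == "UP" for s in parent_states)
        if not reachable:
            state[host] = "UNREACHABLE"
        elif host in broken:
            state[host] = "DOWN"
        else:
            state[host] = "UP"
        return state[host]

    for host in parents:
        visit(host)
    return state


# Example: the VPN box at site 1 fails.
states = classify(PRIVATE_PARENTS, {"P(v1)"})
```

In this model only P(v1) comes out as DOWN; P(s1) and P(h1..h3) behind it are UNREACHABLE, and the site 3 hosts stay UP.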

Also create a similar one for the public traffic portion of the net.

  public:
   U(h3..h5) <- U(s1) <- U(r1) <- U(internet) <- U(r2) <- U(s2) <- U(nagios)
        U(v1) <---+                    v                    v
                                       |                    +-> U(v2)
                           U(v3) <- U(s3)
      U(h17..h19) <- U(s4) <-+

Then at each VPN point create a host dependency of the private VPN
"host" on the public host. The problem is that I don't think that host
dependencies will work in the same way as the 'parents' directive
does when determining failure.
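
The difference between the two mechanisms can be sketched as follows. This is a simplified model under stated assumptions, not Nagios's implementation: a host dependency is treated as a pair (dependent, master) that suppresses notifications for the dependent host only, while the master is in one of the listed failure states (loosely mirroring the `notification_failure_criteria` directive of a hostdependency definition). The function name, the dependency list, and the hand-written states are all hypothetical.

```python
# Hypothetical dependency list: (dependent private VPN host, master public host).
HOST_DEPENDENCIES = [
    ("P(v1)", "U(v1)"),
    ("P(v2)", "U(v2)"),
    ("P(v3)", "U(v3)"),
]


def suppressed_by_dependency(host, states, dependencies,
                             failure_states=("DOWN", "UNREACHABLE")):
    """True if some master of `host` is in a failure state.

    Only hosts named as a dependent are affected; hosts further
    downstream are not covered unless they have their own entry.
    """
    return any(
        dependent == host and states[master] in failure_states
        for dependent, master in dependencies
    )


# States supplied by hand for illustration: the public side of site 1 is down,
# so the private VPN address and the switch behind it stop answering too.
states = {"U(v1)": "DOWN", "P(v1)": "DOWN", "P(s1)": "UNREACHABLE"}

vpn_quiet = suppressed_by_dependency("P(v1)", states, HOST_DEPENDENCIES)
switch_quiet = suppressed_by_dependency("P(s1)", states, HOST_DEPENDENCIES)
```

Here the dependency silences P(v1) but says nothing about P(s1). Whether P(s1) notifies is left to its own state and notification options, which is where the parents logic inside the private network would still have to do the work.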

Another way to represent these networks is to merge the two so that
each private VPN host [P(vN)] has its public host [U(vN)] as its
parent:

  private+public:

   U(h3..h5) <- U(s1) <- public_internet <- U(s2) <- U(nagios)
                  |            |              ^ U(v2)
            +--<--+            |                  ^P(v2) <- P(s2) <- P(nagios)
          U(v1)                |                    ** Note this
            |                  |
          P(v1)                |
            |                  |
          P(s1)                |
            |                  |
        P(h1..h3)              |
                               |
                    U(v3) <- U(s3)
                      |        ^----<  U(s4) <-- U(h17..h19)
                    P(v3)
                      |
     P(h14..h16) <- P(s3)


However, if I do this, then an outage of P(v2) due to failure of the
VPN software makes it look to Nagios like there is still connectivity
to the other VPN sites, since the routes/parent dependencies for
P(nagios) and U(nagios) are identical due to both having U(s2) (or
public_internet) as a common child. However, in reality you can't reach
any P host downstream of P(v2) if P(v2) or P(s2) is down. That's why
the parentage graph directions at the ** note are pointing away from
P(nagios) rather than away from U(nagios).
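
The failure mode described here can be illustrated with a trimmed slice of the merged graph. This is a sketch under assumptions: the public side is cut down to the three U(vN) hosts, treated as roots that keep answering, and a failing host is taken to be reported DOWN (rather than UNREACHABLE) whenever at least one of its parents still answers. `MERGED_PARENTS` and `reported_down` are hypothetical names.

```python
# Slice of the merged topology: each private VPN host hangs off its public host.
MERGED_PARENTS = {
    "U(v1)": [], "U(v2)": [], "U(v3)": [],  # public side trimmed to roots
    "P(v1)": ["U(v1)"],
    "P(v2)": ["U(v2)"],
    "P(v3)": ["U(v3)"],
    "P(s1)": ["P(v1)"],
    "P(s3)": ["P(v3)"],
}


def reported_down(parents, failing):
    """Failing hosts that still have an answering parent (or no parent)."""
    return sorted(
        host for host in failing
        if not parents[host] or any(p not in failing for p in parents[host])
    )


# The VPN software on P(v2) dies: from P(nagios) no private address answers,
# while every public address is still fine.
failing = {"P(v2)", "P(v1)", "P(v3)", "P(s1)", "P(s3)"}
down = reported_down(MERGED_PARENTS, failing)
# down == ['P(v1)', 'P(v2)', 'P(v3)']
```

In this model the remote VPN hosts P(v1) and P(v3) are flagged DOWN along with P(v2), because their public parents still respond, even though the only real fault is local.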

Also to make things a bit more complex, the switches usually have
internal IP addresses but are partitioned so that they have both public
and private traffic on them.

Well thanks for making it this far in the email.

Even if you don't have an answer, does what I am asking even make
sense to anybody? If you have any questions, asking them might help me
see things more clearly. If you have struggled with this and decided that
nagios isn't up to the task, that is good to know as well. The multiple
hosts for public and private interfaces bother me, but until nagios
becomes "network aware" in its outage determination for multi-homed
hosts, I think this is the only way to do it.

Also if anybody thinks I am nuts after reading this email, don't worry
about it; I think I am nuts too, but that's a good thing (TM) 8-) .

				-- rouilj
John Rouillard
===========================================================================
My employers don't acknowledge my existence much less my opinions.


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users