Bug/Array index out of bounds

Andreas Ericsson ae at op5.se
Thu Jun 4 11:07:50 CEST 2009


Tilo Renz wrote:
> Hi all,
> 
> Concerning the array index out of bounds in nagios-3.1.0/base/utils.c:4488 I investigated a bit further.
> Original code is:
> 4485:		/* get response */
> 4486:		recv_len=sizeof(recv_buf);
> 4487:		result=my_recvall(sd,recv_buf,&recv_len,2);
> 4488:		recv_buf[sizeof(recv_buf)]='\x0';
> 
> Not only in 4488 a Nullbyte is written beyond the limit of recv_buf, its also zero-padding at the wrong place. After calling my_recvall recv_len contains the number of received bytes. Zero-Padding at the buffer-end would leave undefined content beginning at recv_buf[recv_len].
> A fix should look something like this:
> 4485:		/* get response */
> 4486:		recv_len=sizeof(recv_buf)-1;
> 4487:		result=my_recvall(sd,recv_buf,&recv_len,2);
> 4488:		recv_buf[recv_len]='\x0';
> 
> By the way: before reading anything at all my_recvall needlessly zeroes
> out the complete recv_buf,

Right. It shouldn't touch the buffer at all before or after writing the
read data into it.

> but when reading, it tries to fill the buffer
> up to the end, whithout remaining space for the null-byte.

What makes you think it should save space for the nul byte (not null, that's
something else)? Since the size of the received data is returned as well,
we can safely access all the data we just read. We're reading a bytestream,
not a string. If the caller expects the bytestream to be nul-terminated,
that's a bug in the caller, not in the routine.

man 2 read

It works the same, and it's the only sensible way it *can* work. If it all
of a sudden starts adding nul bytes to the buffer you'll all of a sudden
end up unable to read binary byte sequences from files and sockets alike.

> Things like this are the place where sniffy C-programmers gamble away the
> negligible,  but always so upholded performance advantage of their error
> prone low level language.

Please refrain from cluttering the internet with such useless flame-inciting
statements. Especially in conjunction with such erroneous statements as
those above.

> Code smell, but positive after all: my_recvall is a not used anywhere else.
> 
> 
> During static code analysis of the nagios code I found two additional bugs in nagios:
> 
> * memory leak in nagios-3.1.0/cgi/cgiutils.c:1146
> If the function is left in line 1146, the Memory associated with new_mmapfile, allocated in line 1141 is lost. 
> One can argue memory leaks in cgi arent that harmful, because the program is
> short-running and the OS will do the garbage Collection for us. Also, if opening the file fails,
> results may be fubar anyway. 
> On the other hand leaking memory always causes bad karma.
> 

Wrong again. It's far more efficient to let the kernel release the allocated memory
than it is for the process to release it, since the libc allocator will utilize
arena allocation and just marking the allocated area in the arena as free when you
*do* call free(). The kernel will mark the entire memory page as free when the
process ends, which is a lot more efficient.

> * Same issue in nagios-3.1.0/cgi/statuswrl.c:932
> Memory associated with vrml_safe_hostname, allocated in line 921 will be lost.
> 
> 
> 
> Analyzing the code from ndoutils-1.4b7 I found another error.
> In ndoutils-1.4b7/src/ndo2db.c:625  _one_ childprocess-status is cleared.
> But before the signalhandler is executed another child may have finished its job. 
> There won't be a second signal for it, as a SIGCHLD is already pending. 
> One of the two child processes will remain an uncleared zombie until somebody terminates the ndo2db-daemon.
> Possible Fix:
> current code:
> 623		/* cleanup children that exit, so we don't have zombies */
> 624		if(sig==SIGCHLD){
> 625			waitpid(-1,NULL,WNOHANG);
> 626			return;
> 627		        }
> 
> should become something like:
> 623		/* cleanup children that exit, so we don't have zombies */
> 624		if(sig==SIGCHLD){
> 625			while( waitpid(-1,NULL,WNOHANG)>0 ) ;
> 626			return;
> 627		        }
> 
> Nagios itself does not contain this flaw. Most times waitpid is called with an explicit
> pid and without WNOHANG. In events.c:988 waitpid(-1,NULL,WNOHANG) is called, but protected
> with the suggested while-loop.)
> 

This is a real bug though. I'll make sure this comes to Ethan's attention.
Thanks for reporting this.

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Register now for Nordic Meet on Nagios, June 3-4 in Stockholm
 http://nordicmeetonnagios.op5.org/

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.

------------------------------------------------------------------------------
OpenSolaris 2009.06 is a cutting edge operating system for enterprises 
looking to deploy the next generation of Solaris that includes the latest 
innovations from Sun and the OpenSource community. Download a copy and 
enjoy capabilities such as Networking, Storage and Virtualization. 
Go to: http://p.sf.net/sfu/opensolaris-get




More information about the Developers mailing list