Re: Advice needed: Controlling the number of child processes



I solved this mystery; thanks to all who helped:

Since I get zombies, it is obvious that the parent process does not
wait for their status, which means that the signal handler is not
called for some children.  The real problem was that enough child
processes exited at the same time that not all SIGCHLD signals got
delivered.  It was not a race condition, and, yes, wait() is
reentrant...
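
The lost-signal effect can be reproduced in isolation.  The following is
only a sketch under stated assumptions, not code from the original
program: the names coalesce_demo, count_sigchld and handler_calls are
made up for illustration, and the limit of 64 children is arbitrary.  It
blocks SIGCHLD, lets several children exit, and then unblocks the
signal.  On systems that do not queue ordinary signals (Linux and the
BSDs behave this way, as far as I know), the handler runs fewer times
than there are zombies to reap.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical counter: how often the SIGCHLD handler actually ran. */
static volatile sig_atomic_t handler_calls = 0;

static void count_sigchld(int signo)
{
	(void)signo;
	++handler_calls;
}

/*
 * Fork n children that exit immediately while SIGCHLD is blocked, then
 * unblock it.  Stores the number of handler invocations in *calls and
 * returns the number of children reaped, or -1 on error.
 */
int coalesce_demo(int n, int *calls)
{
	struct sigaction sa;
	sigset_t block;
	siginfo_t info;
	pid_t pids[64];
	int i, reaped = 0;

	if (n < 1 || n > 64)
		return -1;

	handler_calls = 0;
	sa.sa_handler = count_sigchld;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = SA_RESTART;
	if (sigaction(SIGCHLD, &sa, NULL) == -1)
		return -1;

	/* Hold back SIGCHLD so every child exits while it is blocked. */
	sigemptyset(&block);
	sigaddset(&block, SIGCHLD);
	if (sigprocmask(SIG_BLOCK, &block, NULL) == -1)
		return -1;

	for (i = 0; i < n; i++) {
		pids[i] = fork();
		if (pids[i] == -1) {	/* fork failed: use what we have */
			n = i;
			break;
		}
		if (pids[i] == 0)
			_exit(0);
	}

	/* Wait until each child has exited, but leave it a zombie. */
	for (i = 0; i < n; i++)
		waitid(P_PID, (id_t)pids[i], &info, WEXITED | WNOWAIT);

	/* The pending SIGCHLD is delivered before this call returns. */
	sigprocmask(SIG_UNBLOCK, &block, NULL);

	/* Reap everything, however few signals were delivered. */
	while (waitpid(-1, NULL, WNOHANG) > 0)
		++reaped;

	*calls = handler_calls;
	return reaped;
}
```

Calling coalesce_demo(5, &calls) should reap all five children, while
calls stays below five.  That gap is why a handler that reaps exactly
one child per invocation leaves zombies behind.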

POSIX.1 states that when we establish a signal handler for SIGCHLD, and
there exists a terminated child who we have not yet waited for, it is
unspecified whether the signal is generated.

I suspect that this is what's really happening:  While I am in the signal
handler waiting for a child's status, more children terminate at that
very moment.  I must add that in my case many child processes
terminate at almost the same moment.  My solution was to increment a
variable in the signal handler and check in the program mainline whether
it has increased, but not to touch the variable otherwise.  When it
increases, I repeatedly call waitpid until there are no more terminated
child processes.  This seems to work very well because if a group of
children terminates, it is sufficient if only one SIGCHLD gets delivered.
Here is the code fragment, and, can you spot the flaw?:

int zombies;	/* a misnomer: "zombies" increases while the number of
actual zombies decreases... */
int nsenders = 10;
int time_to_quit = 0;
void
sigchld(int signo)
{
	++zombies;
}

int
work()
{
	int zombies_so_far;
	
	senders = 0;
	zombies = zombies_so_far = 0;
	
	signal(SIGCHLD, sigchld);
		
	while (!time_to_quit) {		
		if (senders < nsenders) {
			if (!fork()) {
				/* do work */
				_exit(0);
			}
			++senders;
		} 
		
		while (zombies_so_far < zombies) {
			++zombies_so_far;
			while (waitpid(0, 0, WNOHANG) > 0)
				--senders;
		}
	}

	/* Wait for all children to exit */
		
	while (wait(&status) != -1) ;
	
	time(&end_time);
	
	signal(SIGCHLD, SIG_IGN);
	
	return 0;
}
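
For comparison, here is a self-contained sketch of the same idea (the
handler only records that something happened, the mainline does all the
reaping).  It is not the program above and does not claim to answer the
question about its flaw.  The names run_workers, on_sigchld and
got_sigchld are hypothetical, the children do no real work, and the
sketch assumes the process has no other children.  It differs from the
fragment in that it uses sigaction() instead of signal(), declares the
flag as volatile sig_atomic_t, checks fork() for failure, reaps with
waitpid(-1, ...), and sleeps in sigsuspend() instead of spinning.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set by the handler; only ever read and cleared in the mainline. */
static volatile sig_atomic_t got_sigchld = 0;

static void on_sigchld(int signo)
{
	(void)signo;
	got_sigchld = 1;
}

/*
 * Run `total` children, at most `max_parallel` at a time.
 * Returns the number of children reaped, or -1 on setup failure.
 */
int run_workers(int total, int max_parallel)
{
	struct sigaction sa;
	sigset_t block, old, wait_mask;
	int started = 0, running = 0, reaped = 0;

	sa.sa_handler = on_sigchld;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;
	if (sigaction(SIGCHLD, &sa, NULL) == -1)
		return -1;

	/* Keep SIGCHLD blocked except while sleeping in sigsuspend(). */
	sigemptyset(&block);
	sigaddset(&block, SIGCHLD);
	if (sigprocmask(SIG_BLOCK, &block, &old) == -1)
		return -1;
	wait_mask = old;
	sigdelset(&wait_mask, SIGCHLD);

	while (started < total || reaped < started) {
		while (started < total && running < max_parallel) {
			pid_t pid = fork();
			if (pid == -1)
				break;		/* retry after reaping */
			if (pid == 0) {
				/* child: real work would go here */
				_exit(0);
			}
			++started;
			++running;
		}
		if (running == 0)
			break;	/* fork failed and nothing left to wait for */

		sigsuspend(&wait_mask);	/* sleep until a signal arrives */
		if (!got_sigchld)
			continue;	/* woken by some other signal */
		got_sigchld = 0;

		/* One SIGCHLD may stand for many children: reap them all. */
		while (waitpid(-1, NULL, WNOHANG) > 0) {
			--running;
			++reaped;
		}
	}

	sigprocmask(SIG_SETMASK, &old, NULL);
	return reaped;
}
```

A call such as run_workers(20, 4) should return 20.  Because SIGCHLD is
blocked outside sigsuspend(), a child that exits between the reaping
loop and the next sleep leaves a pending signal that wakes the next
sigsuspend() immediately, so no exit goes unnoticed.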


