2012-05-08  cgf-000004

The change for cgf-000003 introduced a new problem:
http://cygwin.com/ml/cygwin/2012-05/msg00154.html
http://cygwin.com/ml/cygwin/2012-05/msg00157.html

Since a handle associated with the parent is no longer being duplicated
into a non-cygwin "execed child", Windows is free to reuse the pid of
the parent when the parent exits.  However, since we *did* duplicate a
handle pointing to the pid's shared memory area into the "execed child",
the shared memory for the pid was still active.

Since the shared memory was still available, if a new process reused the
previous pid, Cygwin would detect that the shared memory had not been
newly created and had a "PID_REAPED" flag.  That was considered an
error, so it would set procinfo to NULL and pinfo::thisproc would die
since this situation is not supposed to occur.

I fixed this in two ways:

1) If a shared memory region has a PID_REAPED flag then zero it and
reuse it.  This should be safe since you are not really supposed to be
querying the shared memory region for anything after PID_REAPED has been
set.

2) Forgo duping a copy of myself_pinfo if we're starting a non-cygwin
child for exec.

It seems like 2) is a common theme and an audit of all of the handles
that are being passed to non-cygwin children is in order for 1.7.16.

The other minor modification that was made in this change was to add the
pid of the failing process to fork error output.  This helps slightly
when looking at strace output, even though in this case it was easy to
find what was failing by looking for '^---' when running the "stv"
strace dumper.  That found the offending exception quickly.
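Fix 1) in this entry can be illustrated with a small model.  This is
only a sketch in Python, not the Cygwin code: the ProcRecord class, the
attach() helper, the regions dictionary and the PID_REAPED value are all
hypothetical stand-ins for the per-pid shared memory region and for
whatever pinfo::init() really does.

```python
from dataclasses import dataclass

PID_REAPED = 0x0001  # hypothetical flag value, chosen for illustration only


@dataclass
class ProcRecord:
    """Toy stand-in for the per-pid shared memory region."""
    pid: int = 0
    flags: int = 0
    exitcode: int = 0


def attach(regions, pid):
    """Return (record, created) for pid, recycling a reaped record.

    `regions` is a dict that models shared memory regions which are
    still mapped somewhere, keyed by Windows pid.
    """
    record = regions.get(pid)
    if record is None:
        # No region exists for this pid yet: create a fresh one.
        record = ProcRecord(pid=pid)
        regions[pid] = record
        return record, True
    if record.flags & PID_REAPED:
        # Stale region kept alive by somebody else's handle.  Zero it
        # and reuse it rather than treating this as an error.
        record.pid, record.flags, record.exitcode = pid, 0, 0
        return record, True
    # The region belongs to a live process.
    return record, False
```

In this model a new process that lands on a recycled pid gets a clean
record back and is told that it "created" it, which is the behavior the
fix is after.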

2012-05-07  cgf-000003

<1.7.15>
Don't make Cygwin wait for all children of a non-cygwin child program.
Fixes: http://cygwin.com/ml/cygwin/2012-05/msg00063.html,
       http://cygwin.com/ml/cygwin/2012-05/msg00075.html
</1.7.15>

This problem is due to a recent change which added some robustness and
speed to Cygwin's exec/spawn handling by not trying to force inheritance
every time a process is started.  See ChangeLog entries starting on
2012-03-20, and multiple on 2012-03-21.

Making the handle inheritable meant that, as usual, there were problems
with non-Cygwin processes.  When Cygwin "execs" a non-Cygwin process N,
all of its N + 1, N + 2, ...  children will also inherit the handle.
That means that Cygwin will wait until all subprocesses have exited
before it returns.

I was willing to make this a restriction of starting non-Cygwin
processes but the problem with allowing that is that it can cause the
creation of a "limbo" pid when N exits and N + 1 and friends are still
around.  In this scenario, Cygwin dutifully notices that process N has
died and sets the exit code to indicate that but N's parent will wait on
rd_proc_pipe and will only return when every N + ...  Windows process
has exited.

The removal of cygheap::pid_handle was not related to the initial
problem that I set out to fix.  The change came from the realization
that we were duping the current process handle into the child twice and
only needed to do it once.  The current process handle is used by exec
to keep the Windows pid "alive" so that it will not be reused.  So, now
we just close parent in child_info_spawn::handle_spawn iff we're not
execing.

In debugging this it bothered me that 'ps' identified a nonactive pid as
active.  Part of the reason for this was the 'parent' handle in
child_info was opened in non-Cygwin processes, keeping the pid alive.
That has been kluged around (more changes after 1.7.15) but that didn't
fix the problem.  On further investigation, this seems to be caused by
the fact that the shared memory region pid handles were still being
passed to non-cygwin children, keeping the pid alive in a limbo-like
fashion.  This was easily fixed by having pinfo::init() consider a
memory region with PID_REAPED as not available.  A more robust fix
should be considered for 1.7.15+ where these handles are not passed
to non-cygwin processes.

This fixed the problem where a pid showed up in the list after a user
does something like: "bash$ cmd /c start notepad" but, for some reason,
it does not fix the same problem when the user runs
"bash$ setsid cmd /c start notepad".  That bears investigation after
1.7.15 is released but it is not a regression and so is not a blocker
for the release.
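The "wait until every N + ... process has exited" behavior in this entry
comes from a general property of pipes: the reader only sees
end-of-file once every copy of the write end is closed.  The sketch
below shows that property with an ordinary POSIX pipe in Python.  It is
an analogy only; it does not use Windows handles or rd_proc_pipe, and
the reader_sees_eof() and demo() names are made up for this example.

```python
import os


def reader_sees_eof(read_fd):
    """Return True if a non-blocking read reports end-of-file."""
    try:
        return os.read(read_fd, 1) == b""
    except BlockingIOError:
        # No data yet, but at least one write end is still open.
        return False


def demo():
    """Return (eof_while_copy_open, eof_after_all_closed)."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(read_fd, False)

    # Stands in for the copy of the handle inherited by process N + 1.
    inherited = os.dup(write_fd)

    os.close(write_fd)               # process N exits
    before = reader_sees_eof(read_fd)

    os.close(inherited)              # the last descendant exits
    after = reader_sees_eof(read_fd)

    os.close(read_fd)
    return before, after
```

On a POSIX system demo() should report no end-of-file while the
duplicated write end is open and end-of-file only after it is closed,
which mirrors the parent being stuck until the last inheriting process
goes away.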

2012-05-03  cgf-000002

<1.7.15>
Fix problem where too much input was attempted to be read from a
pty slave.  Fixes: http://cygwin.com/ml/cygwin/2012-05/msg00049.html
</1.7.15>

My change on 2012/04/05 reintroduced the problem first described by:
http://cygwin.com/ml/cygwin/2011-10/threads.html#00445

The problem then was, IIRC, due to the fact that bytes sent to the pty
pipe were not written as records.  Changing pipe to PIPE_TYPE_MESSAGE in
pipe.cc fixed the problem since writing lines to one side of the pipe
caused exactly that number of characters to be read on the other side
even if there were more characters in the pipe.

To debug this, I first replaced fhandler_tty.cc with the 1.258,
2012/04/05 version.  The test case started working when I did that.

So, then, I replaced individual functions, one at a time, in
fhandler_tty.cc with their previous versions.  I'd expected this to be a
problem with fhandler_pty_master::process_slave_output since that had
seen the most changes but was surprised to see that the culprit was
fhandler_pty_slave::read().

The reason was that I really needed the bytes_available() function to
return the number of bytes which would be read in the next operation
rather than the number of bytes available in the pipe.  That's because
there may be a number of lines available to be read but the number of
bytes which will be read by ReadFile should reflect the mode of the pty
and, if there is a line to read, only the number of bytes in the line
should be seen as available for the next read.

Having bytes_available() return the number of bytes which would be read
seemed to fix the problem but it could subtly change the behavior of
other callers of this function.  However, I actually think this is
probably a good thing since they probably should have been seeing the
line behavior.
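The difference between "bytes in the pipe" and "bytes the next read will
return" can be shown with a toy record-oriented pipe.  This is a sketch
in Python under the assumption that each write is kept as one record, as
a message-mode pipe would do; the RecordPipe class and its methods are
hypothetical and are not the fhandler code.

```python
from collections import deque


class RecordPipe:
    """Toy model of a pipe that preserves write boundaries."""

    def __init__(self):
        self._records = deque()

    def write(self, data):
        # Each non-empty write becomes one record.
        if data:
            self._records.append(bytes(data))

    def total_bytes(self):
        """Everything queued in the pipe, across all records."""
        return sum(len(record) for record in self._records)

    def bytes_available(self):
        """Bytes the next read would return: the first record only."""
        return len(self._records[0]) if self._records else 0

    def read(self, size):
        """Read at most `size` bytes, never crossing a record boundary."""
        if not self._records:
            return b""
        record = self._records.popleft()
        data, rest = record[:size], record[size:]
        if rest:
            # Put back the unread tail of a partially consumed record.
            self._records.appendleft(rest)
        return data
```

With two lines queued, total_bytes() counts both of them, while
bytes_available() reports only the length of the first line, so a caller
that sizes its read from bytes_available() gets one line per read.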

2012-05-02  cgf-000001

<1.7.15>
Fix problem setting parent pid to 1 when process with children execs
itself.  Fixes: http://cygwin.com/ml/cygwin/2012-05/msg00009.html
</1.7.15>

Investigating this problem with strace showed that ssh-agent was
checking the parent pid and getting a 1 when it shouldn't have.  Other
stuff looked ok so I chose to consider this a smoking gun.

Going back to the version that the OP said did not have the problem, I
worked forward until I found where the problem first occurred -
somewhere around 2012-03-19.  And, indeed, the getppid call returned the
correct value in the working version.  That means that this stopped
working when I redid the way the process pipe was inherited around
this time period.

It isn't clear why (and I suspect I may have to debug this further at
some point) this hasn't always been a problem but I made the obvious fix.
We shouldn't have been setting ppid = 1 when we're about to pass off to
an execed process.

As I was writing this, I realized that it was necessary to add some
additional checks.  Just checking for "have_execed" isn't enough.  If
we've execed a non-cygwin process then it won't know how to deal with
any inherited children.  So, always set ppid = 1 if we've execed a
non-cygwin process.
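
The resulting rule for cgf-000001 can be written down as a small
decision function.  This is a sketch in Python, not the Cygwin source:
the function name and both parameters are hypothetical, and the
"ordinary exit" branch is an assumption based on the usual convention
that orphaned children are handed to pid 1.

```python
def should_set_children_ppid_to_1(have_execed, execed_cygwin_process):
    """Decide whether the children of this process get ppid = 1.

    have_execed           -- this process is handing off via exec
    execed_cygwin_process -- the execed image is a Cygwin program
    """
    if not have_execed:
        # Assumption: an ordinary exit orphans the children.
        return True
    if not execed_cygwin_process:
        # A non-Cygwin image cannot deal with inherited children.
        return True
    # A Cygwin process took over via exec and keeps its children.
    return False
```

Only the exec-into-a-Cygwin-process case leaves the children's ppid
alone, which is the case ssh-agent was hitting in the report.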