174 lines
		
	
	
		
			7.0 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
			
		
		
	
	
			174 lines
		
	
	
		
			7.0 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
| <sect1 id="using-textbinary"><title>Text and Binary modes</title>
 | |
| 
 | |
| <sect2> <title>The Issue</title>
 | |
| 
 | |
| <para>On a UNIX system, when an application reads from a file it gets
 | |
| exactly what's in the file on disk and the converse is true for writing.
 | |
| The situation is different in the DOS/Windows world where a file can
 | |
| be opened in one of two modes, binary or text.  In the binary mode the
 | |
| system behaves exactly as in UNIX.  However on writing in text mode, a
 | |
| NL (\n, ^J) is transformed into the sequence CR (\r, ^M) NL.
 | |
| </para>
 | |
| 
 | |
| <para>This can wreak havoc with the seek/fseek calls since the number
 | |
| of bytes actually in the file may differ from that seen by the
 | |
| application.</para>
 | |
| 
 | |
| <para>The mode can be specified explicitly as explained in the Programming
 | |
| section below.  In an ideal DOS/Windows world, all programs using lines as
 | |
| records (such as <command>bash</command>, <command>make</command>,
 | |
| <command>sed</command> ...) would open files (and change the mode of their
 | |
| standard input and output) as text.  All other programs (such as
 | |
| <command>cat</command>, <command>cmp</command>, <command>tr</command> ...)
 | |
| would use binary mode.  In practice with Cygwin, programs that deal
 | |
| explicitly with object files specify binary mode (this is the case of
 | |
| <command>od</command>, which is helpful to diagnose CR problems).  Most
 | |
| other programs (such as <command>cat</command>, <command>cmp</command>,
 | |
| <command>tr</command>) use the default mode.</para>
 | |
| 
 | |
| </sect2>
 | |
| 
 | |
| <sect2><title>The default Cygwin behavior</title>
 | |
| 
 | |
| <para>The Cygwin system gives us some flexibility in deciding how files 
 | |
| are to be opened when the mode is not specified explicitly. 
 | |
| The rules are evolving, this section gives the design goals.</para>
 | |
| <orderedlist numeration="loweralpha">
 | |
| <listitem>
 | |
| <para>If the filename is specified as a POSIX path and it appears to
 | |
| reside on a file system that is mounted (i.e.  if its pathname starts
 | |
| with a directory displayed by <command>mount</command>), then the
 | |
| default is specified by the mount flag.  If the file is a symbolic link,
 | |
| the mode of the target file system applies.</para>
 | |
| </listitem>
 | |
| <listitem>
 | |
| <para>If the file is specified via a MS-DOS pathname (i.e., it contains a
 | |
| backslash or a colon), the default is binary.
 | |
| </para>
 | |
| </listitem>
 | |
| <listitem>
 | |
| <para>Pipes and non-file devices are opened in binary mode,
 | |
| except if the <envar>CYGWIN</envar> environment variable contains
 | |
| <literal>nobinmode</literal>.</para>
 | |
| <warning><title>Warning!</title><para>In b20.1 of 12/98, a file will be opened
 | |
| in binary mode if any of the following conditions hold:</para> 
 | |
| <orderedlist numeration="arabic" spacing="compact">
 | |
| <listitem><para>binary mode is specified in the open call</para>
 | |
| </listitem>
 | |
| <listitem><para>the filename is a MS-DOS filename</para>
 | |
| </listitem>
 | |
| <listitem><para>the file resides on a binary mounted partition</para>
 | |
| </listitem>
 | |
| <listitem><para><envar>CYGWIN</envar> contains <literal>binmode</literal></para>
 | |
| </listitem>
 | |
| <listitem><para>the file is not a disk file</para>
 | |
| </listitem>
 | |
| </orderedlist>
 | |
| </warning>
 | |
| </listitem>
 | |
| 
 | |
| <listitem>
 | |
| <para> When redirecting, the Cygwin shells uses rules (a-e). For
 | |
| these shells the relevant value of <envar>CYGWIN</envar> is that at the time
 | |
| the shell was launched and not that at the time the program is executed.
 | |
| Non-Cygwin shells always pipe and redirect with binary mode. With
 | |
| non-Cygwin shells the commands <command> cat filename | program </command>
 | |
| and <command> program < filename </command> are not equivalent when
 | |
| <filename>filename</filename> is on a text-mounted partition. </para>
 | |
| </listitem>
 | |
| </orderedlist>
 | |
| </sect2>
 | |
| 
 | |
| <sect2><title>Example</title>
 | |
| <para>To illustrate the various rules, we provide scripts to delete CRs
 | |
| from files by using the <command>tr</command> program, which can only write
 | |
| to standard output. 
 | |
| The script</para>
 | |
| <screen>
 | |
| <![CDATA[
 | |
| #!/bin/sh
 | |
| # Remove \r from the file given as argument
 | |
| tr -d '\r' < "$1" > "$1".nocr
 | |
| ]]>
 | |
| </screen>
 | |
| <para> will not work on a text mounted systems because the \r will be 
 | |
| reintroduced on writing. However scripts such as </para>
 | |
| <screen>
 | |
| <![CDATA[
 | |
| #!/bin/sh
 | |
| # Remove \r from the file given as argument
 | |
| tr -d '\r' | gzip | gunzip > "$1".nocr
 | |
| ]]>
 | |
| </screen>
 | |
| <para>and the .bat file</para>
 | |
| <screen>
 | |
| <![CDATA[
 | |
| REM Remove \r from the file given as argument
 | |
| @echo off
 | |
| tr -d \r < %1 > %1.nocr
 | |
| ]]>
 | |
| </screen>
 | |
| <para> work fine. In the first case (assuming the pipes are binary)
 | |
| we rely on <command>gunzip</command> to set its output to binary mode,
 | |
| possibly overriding the mode used by the shell.
 | |
| In the second case we rely on the DOS shell to redirect in binary mode.
 | |
| </para>
 | |
| </sect2>
 | |
| 
 | |
| <sect2><title>Binary or text?</title>
 | |
| 
 | |
| <para>UNIX programs that have been written for maximum portability
 | |
| will know the difference between text and binary files and act
 | |
| appropriately under Cygwin.  For those programs, the text mode default
 | |
| is a good choice.  Programs included in official Cygwin distributions
 | |
| should work well in the default mode. </para>
 | |
| 
 | |
| <para>Text mode makes it much easier to mix files between Cygwin and
 | |
| Windows programs, since Windows programs will usually use the CRLF
 | |
| format.  Unfortunately you may still have some problems with text
 | |
| mode.  First, some of the utilities included with Cygwin do not yet
 | |
| specify binary mode when they should.
 | |
| Second, you will introduce CRs in text
 | |
| files you write, which can cause problems when moving them back to a
 | |
| UNIX system. </para>
 | |
| 
 | |
| <para>If you are mounting a remote file system from a UNIX machine,
 | |
| or moving files back and forth to a UNIX machine, you may want to
 | |
| access the files in binary mode. The text files found there will normally
 | |
| be in UNIX NL format, and you would want any files put there by Cygwin
 | |
| programs to be stored in a format understood by UNIX.
 | |
| Be sure to remove CRs from all Makefiles and
 | |
| shell scripts and make sure that you only edit the files with
 | |
| DOS/Windows editors that can cope with and preserve NL terminated lines.
 | |
| </para>
 | |
| 
 | |
| <para>Note that you can decide this on a disk by disk basis (for
 | |
| example, mounting local disks in text mode and network disks in binary
 | |
| mode).  You can also partition a disk, for example by mounting
 | |
| <filename>c:</filename> in text mode, and <filename>c:\home</filename>
 | |
| in binary mode.</para>
 | |
| 
 | |
| </sect2>
 | |
| 
 | |
| <sect2><title>Programming</title>
 | |
| 
 | |
| <para>In the <function>open()</function> function call, binary mode can be
 | |
| specified with the flag <literal>O_BINARY</literal> and text mode with
 | |
| <literal>O_TEXT</literal>. These symbols are defined in
 | |
| <filename>fcntl.h</filename>.</para>
 | |
| 
 | |
| <para>In the <function>fopen()</function> function call, binary mode can be
 | |
| specified by adding a <literal>b</literal> to the mode string. Text mode is specified
 | |
| by adding a <literal>t</literal> to the mode string.</para>
 | |
| 
 | |
| <para>The mode of a file can be changed by the call
 | |
| <function>setmode(fd,mode)</function> where <literal>fd</literal> is a file
 | |
| descriptor (an integer) and <literal>mode</literal> is
 | |
| <literal>O_BINARY</literal> or <literal>O_TEXT</literal>. The function
 | |
| returns <literal>O_BINARY</literal> or <literal>O_TEXT</literal> depending
 | |
| on the mode before the call, and <literal>EOF</literal> on error.</para>
 | |
| 
 | |
| </sect2>
 | |
| 
 | |
| </sect1>
 |