Wednesday, September 19, 2007

File Access in C++

The goal of this presentation is to give a short introduction to disks, files, and file manipulation in C++. This presentation is not intended to be comprehensive. For more details on the C++ file system, I refer you to your Computer Science II text.

Disks

Before talking about files we will first discuss disk storage. I know that this will be a review for many of you, but the material is too important to treat lightly.

A disk drive contains a storage medium that consists of one or more circular disks in a stack. The surfaces are capable of holding information that is recorded magnetically. I will draw a picture in class. There is a mechanically controlled arm with a read/write head that is used to read and write information to the disk.

The data on the disk is recorded in concentric circles called tracks. The set of tracks that line up at the same position across all the surfaces is called a cylinder. Tracks are divided into sectors. The sector is the smallest unit of disk storage that may be accessed. These sectors contain addresses by which data may be referenced. The addresses are not permanently recorded on a disk. They are instead recorded when the disk is formatted.

Accessing data on a disk requires 3 steps.

  1. Seek - to position the read/write head over the proper cylinder.
  2. Latency - waiting for the desired sector to pass under the read/write head
  3. Transfer - the moving of data to and from the disk.

Disk access times have been reduced over the past 10 years by increasing the speed at which the disk rotates and the speed at which the read/write heads may be positioned. Originally, disks revolved at 3600 rpm; later speeds were 5400, 7200, 10,000, and 15,000 rpm. 7200 rpm is most typical. In fact, you would be making a mistake to get a 5400 rpm drive today. What is the average latency for a 7200 rpm disk?

Disk capacities have grown dramatically over the years. In the late 1960s, a disk drive would contain about 2 million bytes of data. This drive would look like a washing machine. In the 1980s, a washing-machine-sized drive would hold 128 meg. Today, we have internal drives for PCs that hold 1 terabyte. Question: how big is a terabyte? Why do we need such huge sizes? My first hard disk, in 1984, was 10 meg.

File Systems

No program accesses data from the disk directly. There is an organization imposed by the operating system on data stored on a disk. Each operating system provides a set of file system calls to access data on the disks that it controls. Why can't a program access the disk directly instead of having to go through the operating system?

Note: access to the disk via the operating system is organized by blocks or clusters rather than sectors. A block or cluster is a multiple of a sector. Why? A typical range is 8K-32K.

Each programming language also provides a way of accessing data from the disk. Why? When is it better to use the Operating System Calls? When is it better to access files through your programming language?

This presentation will provide information on how to access files through the C++ file system. Note: there is also a C file system. There are also WIN32, MFC, .NET and UNIX file systems. First, I want to make sure we agree on the definition of a few terms.

File - collection of data on the disk.
Directory - a file that contains a list of files. Sometimes referred to as a folder.
Sub-Directory - a directory that is found in the list of files within a directory. Also called child directory.
Parent Directory - the directory that contains the given one.
Root Directory - the directory that represents the starting point for the file system. That is the directory from which all others are derived.

Discuss that there is a directory structure within all computers.

Organization of files on the disk. Contiguous vs. block oriented. All modern systems are block oriented. Discuss fragmentation and why it could be expensive. Based on recent results published in PCWorld, it is not as expensive as it once was.

File format - ASCII (formatted) vs. binary (unformatted). Sequential vs. Direct.

Types of operations supported by a file system: create, open, read, write, delete, close, rewind, flush, seek. What do these terms mean? Mention file pointer.

The C++ File System

The C++ file system allows you to do similar access to what you did with "iostream". The only difference is that instead of reading from the keyboard and writing to the monitor, you are reading from and writing to files. There is also additional functionality that is provided specifically to access files.

There is a class hierarchy associated with I/O in C++. It looks like the following:

[Figure: C++ I/O class hierarchy]

cout is an object of ostream and cin is an object of istream. There are 3 classes used to create file objects. These are:

ifstream - use this when you only want to read from the file.
ofstream - use this when you only want to write to a file.
fstream - use this when you want to read from and write to a file.

Since these guys are classes, they have constructors and member functions that allow us to access files. This includes the operators "<<" for output and ">>" for input. Since read operations are most common I will concentrate on these. It will be easy for you to deal with file output if you understand input. You should read about this in your Computer Science II book. The following is the constructor for the ifstream class.

ifstream::ifstream

ifstream();

ifstream( const char* szName, ios_base::openmode nMode = ios::in );

Notes

  1. Solaris, our UNIX system, has an additional argument for permissions. Also, note that the specification changed in 1999.
  2. The default is usually fine for nMode.

Parameters

szName

The name of the file to be opened during construction.

nMode

An integer that contains mode bits defined as ios enumerators. The nMode parameter must be one of the following values, or several of them combined with the bitwise OR ( | ) operator:

  • app, to seek to the end of a stream before each insertion.
  • ate, to seek to the end of a stream when its controlling object is first created.
  • binary, to read a file as a binary stream, rather than as a text stream.
  • in, to permit extraction from a stream.
  • out, to permit insertion to a stream.
  • trunc, to delete contents of an existing file when its controlling object is created.

Note that if you open a file for input, the file must exist.

The ifstream constructor will create a file object. If you use the second form of the constructor, it will also open the file. If you do not use the second form, then you must use the open member function.

ifstream::open

void open( const char* szName, ios_base::openmode nMode = ios::in );

Parameters

szName

The name of the file to be opened.

nMode

An integer containing bits defined as ios enumerators that can be combined with the OR ( | ) operator. See the ifstream constructor for a list of the enumerators. The ios::in mode is implied.

A good question is how we detect a failure in opening the file. The answer is that the object maintains a bit map of events. ios::failbit is one of the bits. The following function may be used to check if the file was successfully opened.

ifstream::is_open

int is_open() const;

Return Value

Returns a nonzero value if this stream is attached to an open disk file identified by a file descriptor; otherwise 0.

When we are finished with a file, we may close it using:

ifstream::close

void close();

Other operations that we may do on a file are described in istream. Once we have opened the file, we can read it using the >> operator. We can also use the getline member function. It is described as follows:

istream::getline

istream& getline( char* pch, int nCount, char delim = '\n' );

istream& getline( unsigned char* puch, int nCount, char delim = '\n' );

istream& getline( signed char* psch, int nCount, char delim = '\n' );

Parameters

pch, puch, psch

A pointer to a character array.

nCount

The maximum number of characters to store, including the terminating NULL.

delim

The delimiter character (defaults to newline).

Remarks

Extracts characters from the stream until either the delimiter delim is found, the limit nCount–1 is reached, or end of file is reached. The characters are stored in the specified array followed by a null terminator. If the delimiter is found, it is extracted but not stored.

In order to read through the file twice, we need a way to set the file pointer back to the beginning of the file. The seekg function will do this.

istream::seekg

istream& seekg( streampos pos );

istream& seekg( streamoff off, ios::seek_dir dir );

Parameters

pos

The new position value; streampos is a typedef equivalent to long.

off

The new offset value; streamoff is a typedef equivalent to long.

dir

The seek direction. Must be one of the following enumerators:

  • ios::beg Seek from the beginning of the stream.

  • ios::cur Seek from the current position in the stream.

  • ios::end Seek from the end of the stream.

Remarks

Changes the get pointer for the stream.

The following function may be used to read unformatted data:

basic_istream::read

Reads a specified number of characters from the stream and stores them in an array.

basic_istream& read(
char_type *_Str,
streamsize _Count
);

Parameters

_Str
The array in which to read the characters.
_Count
The number of characters to read.

Return Value

The stream (*this).

Remarks

The unformatted input function extracts up to _Count elements and stores them in the array beginning at _Str. Extraction stops early on end of file, in which case the function calls setstate(eofbit | failbit). In any case, it returns *this.

The following function allows us to test for error conditions on our reads.

ios::operator !

int operator !() const;

Return Value

Returns a nonzero value if either failbit or badbit is set in the stream’s error state.

The following function may be used to determine if the end of a file is reached.

ios::eof

int eof() const;

Return Value

Returns a nonzero value if end of file has been reached. This is the same as testing whether the eofbit error flag is set.

C++ I/O records status with flags. It does not clear the flags unless we do it ourselves. This includes the end of file flags. The following function may be used to clear flags.

ios::clear

void clear( int nState = 0 );

Parameter

nState

If 0, all error bits are cleared; otherwise the error state is set to nState, which must be one of the following masks (ios enumerators) or several of them combined with the bitwise OR ( | ) operator:

  • ios::goodbit No error condition (no bits set).

  • ios::eofbit End of file reached.

  • ios::failbit A possibly recoverable formatting or conversion error.

  • ios::badbit A severe I/O error.

Note: there are other member functions and operators. For example, you can use the ">>" operator to read from a file.

basic_ostream::flush

Flushes the buffer.

basic_ostream& flush( );

Return Value

A reference to the basic_ostream object.

Example

// basic_ostream_flush.cpp
// compile with: /EHsc
#include <iostream>

int main( )
{
    using namespace std;
    cout << "test";   // written to the stream's buffer
    cout.flush();     // forces the buffered output to be written now
}

Output

test

Example:

The following is an example of a program which will read through a file twice and display its contents the second time. We will play with this example to demonstrate other features.

Please take a little time and look at the Deitel and Deitel book for more on the C++ file system.

istringstream and ostringstream classes

These classes provide tools for converting to and from ASCII strings. They are very similar to sprintf and sscanf in C. If the class doesn't know of these, give a small example.
