Softpanorama
May the source be with you, but remember the KISS principle ;-)
System calls are functions that a programmer can call to request services from the operating system. Several online books describe them at some length, for example Programming in C. Unix manual pages are probably the first stop. They are often referred to as "man pages" because they are accessed with the man command. The manual pages are divided into eight sections, organized as follows:
1. Commands This section contains information about user-level commands, such as ls and cat.
2. UNIX System Calls This section gives information about the library calls that interface with the UNIX operating system, such as open for opening a file, and exec for executing a program file. These are often accessed by C programmers.
3. Libraries This section contains the library routines that come with the system. An example library that comes with each system is the math library, containing such functions as fabs for absolute value. Like the system call section, this is relevant to programmers.
4. File Formats This section contains information on the file formats of system files, such as init, group, and passwd. This is useful for system administrators.
5. Miscellaneous This section contains information on various system characteristics. For example, a manual page exists here to display the complete ASCII character set (ascii).
6. Games This section usually contains directions for games that came with the system.
7. Device Drivers This section contains information on UNIX device drivers, such as scsi and floppy. These are usually pertinent to someone implementing a device driver, as well as the system administrator.
8. System Maintenance This section contains information on commands that are useful for the system administrator, such as how to format a disk.
Section 2 can be very useful as a reference. When you invoke the man command, the output is sent through what is known as a pager. This is a command that lets you view the text one page at a time. The default pager for most UNIX systems is the more command. You can, however, specify a different one by setting the PAGER environment variable.
The second source of information for a particular call is Google. It can usually get you some useful links to the information for a particular call:
Some of them involve access to data that users must not be permitted to corrupt or even change.
It's often difficult to determine what is a library routine (e.g., printf()) and what is a system call (e.g., read()). They are used in the same way, so the call itself gives no clue as to which is which.
To obtain information about a system call or library routine, how to use it, what it returns, what it does etc., you can read the on-line manual. If you are looking for the manual on read, you can read the manual by doing:
    % man 2 read    (if read is a system call)
    % man 3 read    (if read is a library routine)
All of the entries in Section 2 of the manuals are system calls, and all of the entries in Section 3 are library routines; so if you don't know whether something is a system call or a library routine, try looking it up in both Sections 2 and 3.
Here is an excerpt from Rochkind's book that introduces system calls and explains how to use them:
The subject of this book is UNIX system calls, which form the interface between the UNIX kernel and the user programs that run on top of it. Those who interact only with commands, like the shell, text editors, and other application programs, may have little need to know much about system calls, but a thorough knowledge of them is essential for UNIX programmers. System calls are the only way to access kernel facilities such as the file system, the multitasking mechanisms, and the interprocess communication primitives.
System calls define what UNIX is. Everything else -- subroutines and commands -- is built on this foundation. While the novelty of many of these higher-level programs has been responsible for much of UNIX's renown, they could as well have been programmed on any modern operating system. When one describes UNIX as elegant, simple, efficient, reliable, and portable, one is referring not to the commands (some of which are none of these things), but to the kernel. How hard is it to learn UNIX system calls? When I first started programming UNIX, in 1973, it wasn't very hard at all. UNIX -- and its programmer's manual -- was only a fraction of its present size and complexity. There weren't any programming examples in the manual, but all of the source code was on-line and it was easy to read through programs like the shell or the editor to see how system calls worked. Perhaps most important, there were more experienced people around to ask for help. Even Dennis Ritchie and Ken Thompson, the inventors of UNIX, took time out to help me.
Today's aspiring UNIX programmers have a tougher challenge than I did. UNIX is now so widely dispersed that an expert is unlikely to be nearby. Most computers running UNIX are licensed for the object code only, so the source code for commands is unavailable. There are twice as many system calls now as there were in 1973, and the quality of the manual has deteriorated markedly from the days when Ritchie and Thompson did all the system call write-ups. It's now full of grotesque paragraphs like this:
If the set-user-ID mode bit of the new process file is set (see chmod(2)), exec sets the effective user ID of the new process to the owner ID of the new process file. Similarly, if the set-group-ID mode bit of the new process file is set, the effective group ID of the new process is set to the group ID of the new process file. The real user ID and the real group ID of the new process remain the same as those of the calling process.
As an old-timer I understood what this meant when I first saw it, but a newcomer is sure to be completely baffled. And until now, there's been nowhere to turn. This book's goal is to allow any experienced programmer to learn UNIX system calls as easily as I did, and then to use them wisely and portably. It's packed with examples -- over 3500 lines of C code. Instead of just tactics (how the system calls are used), I've tried also to include strategies (why and when they're used). And there's lots of informal advice as well, based on my experiences programming UNIX over the past dozen years.
The number of different flavors of Unix is amazing, and what is worse, the system calls and their parameters change from flavor to flavor. One of the goals in writing Unix programs is to make them as portable as possible across all the flavors of Unix; complete portability is rarely achievable. However, most of the original Unix system calls haven't changed, so if you stick to these, you should be all right.
Currently, the most important flavors of Unix are:
How does a C programmer actually issue a system call? There is no difference between a system call and any other function call. For example, the read system call might be issued like this:
amt = read(fd, buf, numbytes);
The implementation of the subroutine read varies with the UNIX implementation. It is usually an assembly language program that uses a machine instruction designed specifically for system calls, which isn't directly executable from C. Either way, it's safe to treat system calls as if they were simply C subroutines. Remember, though, that since a system call involves a switch from user mode to kernel mode and back, it takes much longer than a simple subroutine call within a process's own address space. So avoiding excessive system calls might be a wise strategy for programs that need to be tightly optimized.
Most system calls return a value. In the read example above, the number of bytes read is returned. To indicate an error, a system call returns a value that can't be mistaken for valid data, namely -1 . Therefore, our read example should have been coded something like this:
    if ((amt = read(fd, buf, numbytes)) == -1)
    {
        printf("Read failed\n");
        exit(1);
    }
Note that exit is a system call too, but since it never returns, it can't report an error.
There are lots of reasons why a system call that returns -1 might have failed. The global integer errno contains a code that indicates the reason. These error codes are defined at the beginning of the system call chapter of the UNIX manual [the pages titled ``intro(2)'']. Note that errno contains valid data only if a system call actually returns -1; you can't use errno alone to determine whether an error occurred.
The library routine perror takes as its argument a string, and prints out the string, a colon, and a description of the error condition stored in errno. So, a way of handling the error above that gives the programmer more information is:
    if ((amt = read(fd, buf, numbytes)) == -1)
    {
        perror("read");
        exit(1);
    }
which might print out "read: Bad file descriptor" if, for example, fd does not refer to an open file.
The manual pages for all Unix system calls give a declaration for the system call. This shows you what type of value the system call returns, what types of arguments it takes, and what header files you need to include before you can use the system call. As an example, here is part of the man page for the read() system call.
SYNOPSIS
#include <unistd.h>
#include <sys/types.h>
#include <sys/uio.h>
int
read(int d, char *buf, int nbytes)
DESCRIPTION
Read() attempts to read nbytes of data from the object referenced by the file descriptor d into the buffer pointed to by buf.
RETURN VALUES
If successful, the number of bytes actually read is returned. Upon reading end-of-file, zero is returned. Otherwise, a -1 is returned and the global variable errno is set to indicate the error.
The first part shows what header files you need to include. Then the declaration of the system call is given.
int read(int d, char *buf, int nbytes)
read() takes three arguments: an int which is called d in the man page, a pointer to a character called buf (usually an array of characters), and another int called nbytes. read() returns an int as its result. (Current POSIX systems use the types ssize_t, void * and size_t in this declaration, but the roles of the arguments are the same.)
The names of the arguments given in the man pages need not be the same as the ones you use in your programs; they are only there to explain the function of the system call. For example, you could use the read() function in a program as follows:
    #include <unistd.h>

    int main(void)
    {
        int i, count, desc;
        char array[500];

        desc = 0;       /* file descriptor 0 is standard input */
        count = 500;    /* read at most 500 bytes */
        i = read(desc, array, count);
        return (i == -1) ? 1 : 0;
    }
Every process has a process-ID, which is a positive integer. At any instant this is guaranteed to be unique. Every process but one has a parent. The exception is process 0, which is created and used by the kernel itself, for swapping.
A process's system data also records its parent-process-ID, the process-ID of its parent. If a process is orphaned because its parent has terminated, its parent-process-ID is changed to 1. This is the process-ID of the initialization process (init), which is the ancestor of all other processes. In other words, the initialization process adopts all orphans.
Sometimes programmers choose to implement a subsystem as a group of related processes instead of as a single process. For example, a complex database management system might be broken down into several processes to gain additional concurrency of disk I/O. The UNIX kernel allows these related processes to be organized into a process group.
One of the group members is the group leader. Each member of the group has the group leader's process-ID as its process-group-ID. The kernel provides a system call to send a signal to each member of a designated process group. Typically, this would be used to terminate the entire group as a whole, but any signal can be broadcast in this way.
Any process can resign from its process group, become a leader of its own group (of one) by making its process-group-ID the same as its own process-ID, and then spawn child processes to round out the new group. Hence, a single user could be running, say, 10 processes formed into, say, three process groups.
A process group can have a control terminal, which is the first terminal device opened by the group leader. Normally, the control terminal for a user's processes is the terminal from which the user logged in. When a new process group is formed, the processes in the new group no longer have a control terminal.
The terminal device driver sends interrupt, quit, and hangup signals coming from a terminal to every process for which that terminal is the control terminal. Unless precautions are taken, hanging up a terminal, for example, will terminate all of the user's processes. To prevent this, a process can arrange to ignore hangups (this is what the nohup command does).
When a process group leader terminates for any reason, all processes with the same control terminal are sent a hangup signal, which, unless caught or ignored, terminates them too. This feature makes hard-wired terminals, which can't be physically hung up, behave like those that can. Thus, when a user logs off (terminating the shell, which is normally the process group leader), everything is cleaned up for the next user, just as it would be if the user actually hung up.
In summary, there are three process-IDs associated with each process: its own process-ID, its parent-process-ID, and its process-group-ID.
A user-ID is a positive integer that is associated with a user's login name in the password file (/etc/passwd). When a user logs in, the login command makes this ID the user-ID of the first process created, the login shell. Processes descended from the shell inherit this user-ID.
Users are also organized into groups (not to be confused with process groups), which have IDs too, called group-IDs. A user's login group-ID is taken from the password file and made the group-ID of his or her login shell.
Groups are defined in the group file (/etc/group). While logged in, a user can change to another group of which he or she is a member; this changes the group-ID of the process that handles the request (normally the shell, via the newgrp command), which then is inherited by all descendant processes.
These two IDs are called the real user-ID and the real group-ID because they are representative of the real user, the person who is logged in. Two other IDs are also associated with each process: the effective user-ID and the effective group-ID. These IDs are normally the same as the corresponding real IDs, but they can be different, as we shall see shortly. For now, we'll assume the real and effective IDs are the same.
The effective ID is always used to determine permissions; the real ID is used for accounting and user-to-user communication. One indicates the user's permissions; the other indicates the user's identity.
Each file (ordinary, directory, or special) has, in its i-node, an owner user-ID and an owner group-ID. The i-node also contains three sets of three permission bits (nine bits in all). Each set has one bit for read permission, one bit for write permission, and one bit for execute permission. A bit is 1 if the permission is granted and 0 if not. There is a set for the owner, for the owner group, and for others (the public). Here are the bit assignments (bit 0 is the rightmost bit):
    Bit   Meaning
    8     Read by owner
    7     Write by owner
    6     Execute by owner
    5     Read by group
    4     Write by group
    3     Execute by group
    2     Read by others
    1     Write by others
    0     Execute by others
Permission bits are frequently specified using an octal number. For example, octal 775 would mean read, write, and execute permission for the owner and the group, and only read and execute permission for others. The ls command would show this combination of permissions as rwxrwxr-x; in binary it would be 111111101.
The permission system determines whether a given process can perform a desired action (read, write, or execute) on a given file. For ordinary files the meaning of the actions is obvious. For directories the meaning of read is obvious, since directories are stored in ordinary files (the ls command reads a directory, for example). ``Write'' permission on a directory means the ability to issue a system call that would modify the directory (add or remove a link). ``Execute'' permission means the ability to use the directory in a path (sometimes called ``search'' permission). For special files, read and write permissions mean the ability to execute the read and write system calls. What, if anything, that implies is up to the designer of the device driver. Execute permission on a special file is meaningless.
The permission system determines whether permission will be granted using this algorithm:
Occasionally we want a user to temporarily take on the privileges of another user. For example, when we execute the passwd command to change our password, we would like the effective user-ID to be that of root (the traditional login name for the superuser), because only root can write into the password file. This is done by making root the owner of the passwd command (i.e., the ordinary file containing the passwd program), and then turning on another permission bit in the passwd command's i-node, called the set-user-ID bit. Executing a program with this bit on changes the effective user-ID to the owner of the file containing the program. Since it's the effective, rather than the real, user-ID that determines permissions, this allows a user to temporarily take on the permissions of someone else. The set-group-ID bit is used in a similar way.
Since both user-IDs (real and effective) are inherited from parent process to child process, it is possible to use the set-user-ID feature to run with a different effective user-ID for a very long time.
Here are the system calls to get the IDs mentioned above:
    int getuid()     /* Get the real user-ID */
                     /* Returns the ID */
    int getgid()     /* Get the real group-ID */
                     /* Returns the ID */
    int geteuid()    /* Get the effective user-ID */
                     /* Returns the ID */
    int getegid()    /* Get the effective group-ID */
                     /* Returns the ID */
    int getpid()     /* Get the process-ID */
                     /* Returns the ID */
    int getppid()    /* Get the parent process-ID */
                     /* Returns the ID */
    int getpgrp()    /* Get the process-group-ID */
                     /* Returns the ID */
Each of these system calls returns a single ID, as indicated by the comments following their function headers.
    long time(timep)    /* Get system time */
    long *timep;        /* Pointer to time */
time returns the time, in seconds, since January 1, 1970. If the argument timep is not NULL, the current time is also stored into the long integer to which it points. This is a carry-over from the days before the C language supported long integers. It is of no use now that a simple assignment statement can be used to capture the return value. The argument to time should always be NULL, i.e., value = time(NULL);
21 Mar 2007 (IBM Developerworks) Linux® system calls -- we use them every day. But do you know how a system call is performed from user-space to the kernel? Explore the Linux system call interface (SCI), learn how to add new system calls (and alternatives for doing so), and discover utilities related to the SCI.
A system call is an interface between a user-space application and a service that the kernel provides. Because the service is provided in the kernel, a direct call cannot be performed; instead, you must use a process of crossing the user-space/kernel boundary. The way you do this differs based on the particular architecture. For this reason, I'll stick to the most common architecture, i386.
In this article, I explore the Linux SCI, demonstrate adding a system call to the 2.6.20 kernel, and then use this function from user-space. I also investigate some of the functions that you'll find useful for system call development and alternatives to system calls. Finally, I look at some of the ancillary mechanisms related to system calls, such as tracing their usage from a given process.
The implementation of system calls in Linux varies with the architecture, but it can also differ within a given architecture. For example, older x86 processors used an interrupt mechanism to migrate from user-space to kernel-space, but newer IA-32 processors provide instructions that optimize this transition (the sysenter and sysexit instructions). Because so many options exist and the end result is so complicated, I'll stick to a surface-level discussion of the interface details. See the Resources at the end of this article for the gory details.
You needn't fully understand the internals of the SCI to amend it, so I explore a simple version of the system call process (see Figure 1). Each system call is multiplexed into the kernel through a single entry point. The eax register is used to identify the particular system call that should be invoked, which is specified in the C library (per the call from the user-space application). When the C library has loaded the system call index and any arguments, a software interrupt is invoked (interrupt 0x80), which results in execution (through the interrupt handler) of the system_call function. This function handles all system calls, as identified by the contents of eax. After a few simple tests, the actual system call is invoked using the system_call_table and the index contained in eax. Upon return from the system call, syscall_exit is eventually reached, and a call to resume_userspace transitions back to user-space. Execution resumes in the C library, which then returns to the user application.
If all the basics are the same, what has changed? Well, these things:
- Number of system calls
- Languages we use
- Subsystems we program
- Need for portability
- Relevance of UNIX standards
More System Calls
The number of system calls has quadrupled, more or less, depending on what you mean by "system call." The first edition of Advanced UNIX Programming focused on only about 70 genuine kernel system calls—for example, open, read, and write; but not library calls like fopen, fread, and fwrite. The second edition includes about 300. (There are about 1,100 standard function calls in all, but many of those are part of the Standard C Library or are obviously not kernel facilities.) Today's UNIX has threads, real-time signals, asynchronous I/O, and new interprocess-communication features (POSIX IPC), none of which existed 20 years ago. This has caused, or been caused by, the evolution of UNIX from an educational and research system to a universal operating system. It shows up in embedded systems (parking meters, digital video recorders); inside Macintoshes; on a few million web servers; and is even becoming a desktop system for the masses. All of these uses were unanticipated in 1984.
More Languages
In 1984, UNIX applications were usually programmed in C, occasionally mixed with shell scripts, Awk, and Fortran. C++ was just emerging; it was implemented as a front end to the C compiler. Today, C is no longer the principal UNIX application language, although it's still important for low-level programming and as a reference language. (All the examples in both books are written in C.) C++ is efficient enough to have replaced C when the application requirements justify the extra effort, but many projects use Java instead, and I've never met a programmer who didn't prefer it over C++. Computers are fast enough so that interpretive scripting languages have become important, too, led by Perl and Python. Then there are the web languages: HTML, JavaScript, and the various XML languages, such as XSLT.
Even if you're working in one of these modern languages, though, you still need to know what's going on "down below," because UNIX still defines, and to a degree limits, what the higher-level languages can do. This is a challenge for many students who want to learn UNIX, but don't want to learn C. And for their teachers, who tire of debugging memory problems and explaining the distinction between declarations and definitions.
TIP
To enable students to learn UNIX without first learning C, I developed a Java-to-UNIX system-call interface that I call Jtux. It allows almost all of the UNIX system calls to be executed from Java, using the same arguments and datatypes as the official C calls. You can find out more about Jtux and download its source code from http://basepath.com/aup/.
More Subsystems
The third area of change is that UNIX is both more visible than ever (sold by Wal-Mart!) and more hidden, underneath subsystems like J2EE and web servers, Apache, Oracle, and desktops such as KDE or GNOME. Many application programmers are programming for these subsystems, rather than for UNIX directly. What's more, the subsystems themselves are usually insulated from UNIX by a thin portability layer that has different implementations for different operating systems. Thus, many UNIX system programmers these days are working on middleware, rather than on the end-user applications that are several layers higher up.
More Portability
The fourth change is the requirement for portability between UNIX systems, including Linux and the BSD-derivatives, one of which is the Macintosh OS X kernel (Darwin). Portability was of some interest in 1984, but today it's essential. No developer wants to be locked into a commercial version of UNIX without the possibility of moving to Linux or BSD, and no Linux developer wants to be locked into only one distribution. Platforms like Java help a lot, but only serious attention to the kernel APIs, along with careful testing, will ensure that the code is really portable. Indeed, you almost never hear a developer say that he or she is writing for XYZ's UNIX. It's much more common to hear "UNIX and Linux," implying that the vendor choice will be made later. (The three biggest proprietary UNIX hardware companies—Sun, HP, and IBM—are all strong supporters of Linux.)
More Complete Standards
The requirement for portability is connected with the fifth area of change, the role of standards. In 1984, a UNIX standards effort was just starting. The IEEE's POSIX group hadn't yet been formed. Its first standard, which emerged in 1988, was a tremendous effort of exceptional quality and rigor, but it was of very little use to real-world developers because it left out too many APIs, such as those for interprocess communication and networking. That minimalist approach to standards changed dramatically when The Open Group was formed from the merger of X/Open and the Open Software Foundation in 1996. Its objective was to include all the APIs that the important applications were using, and to specify them as well as time allowed, which meant less precisely than POSIX did. They even named one of their standards Spec 1170, the number being the total of 926 APIs, 70 headers, and 174 commands. Quantity over quality, maybe, but the result meant that for the first time programmers would find in the standard the APIs they really needed. Today, The Open Group's Single UNIX Specification is the best guide for UNIX programmers who need to write portably.
The following tutorial describes various common methods for reading and writing files and directories on a Unix system. Part of the information is common C knowledge, and is repeated here for completeness. Other information is Unix-specific, although DOS programmers will find some of it similar to what they saw in various DOS compilers. If you are a proficient C programmer, and know everything about the standard I/O functions, its buffering operations, and know functions such as fseek() or fread(), you may skip the standard C library I/O functions section. If in doubt, at least skim through this section, to catch up on things you might not be familiar with, and at least look at the standard C library examples.
This document is copyright (c) 1998-2002 by guy keren.
The material in this document is provided AS IS, without any expressed or implied warranty, or claim of fitness for a particular purpose. Neither the author nor any contributors shall be liable for any damages incurred directly or indirectly by using the material contained in this document.
Permission to copy this document (electronically or on paper, for personal or organization internal use) or publish it on-line is hereby granted, provided that the document is copied as-is, this copyright notice is preserved, and a link to the original document is written in the document's body, or in the page linking to the copy of this document.
Permission to make translations of this document is also granted, under these terms - assuming the translation preserves the meaning of the text, the copyright notice is preserved as-is, and a link to the original document is written in the document's body, or in the page linking to the copy of this document.
For any questions about the document and its license, please contact the author.
The ps command can tell you quite a few things about each process running on your system. These include the process owner, memory use, accumulated time, the process status (e.g., waiting on resources) and many other things as well. But one thing that ps cannot tell you is what a process is doing - what files it is using, what ports it has opened, what libraries it is using and what system calls it is making. If you can't look at source code to determine how a program works, you can tell a lot about it by using a procedure called "tracing". When you trace a process (e.g., truss date), you get verbose commentary on the process' actions. For example, you will see a line like this each time the program opens a file:
open("/usr/lib/libc.so.1", O_RDONLY) = 4
The text on the left side of the equals sign clearly indicates what is happening. The program is trying to open the file /usr/lib/libc.so.1 and it's trying to open it in read-only mode (as you would expect, given that this is a system library). The right side is not nearly as self-evident. We have just the number 4. Open is not a Unix command, of course, but a system call. That means that you can only use it within a program. Due to the nature of Unix, however, system calls are documented in man pages just like ls and pwd.
To determine what this number represents, you can skip down in this column or you can read the man page. If you elect to read the man page, you will undoubtedly read a line that tells you that the open() function returns a file descriptor for the named file. In other words, the number, 4 in our example, is the number of the file descriptor referred to in this open call. If the process that you are tracing opens a number of files, you will see a sequence of open calls. With other activity removed, the list might look something like this:
open("/dev/zero", O_RDONLY) = 3
open("/var/ld/ld.config", O_RDONLY) Err#2 ENOENT
open("/usr/lib/libc.so.1", O_RDONLY) = 4
open("/usr/lib/libdl.so.1", O_RDONLY) = 4
open64("./../", O_RDONLY|O_NDELAY) = 3
open64("./../../", O_RDONLY|O_NDELAY) = 3
open("/etc/mnttab", O_RDONLY) = 4
Notice that the first file handle is 3 and that file handles 3 and 4 are used repeatedly. The first file a process opens normally gets handle 3, because that is the first file handle following those that are the same for every process that you will run: 0, 1 and 2. These represent standard in, standard out and standard error.
The file handles shown in the example truss output above are repeated only because the associated files are subsequently closed. When a file is closed, the file handle that was used to access it can be used again.
The close commands include only the file handle, since the location of the file is known. A close command would, therefore, be something like close(3). One of the lines shown above displays a different response: Err#2 ENOENT. This "error" (the word is put in quotes because this does not necessarily indicate that the process is defective in any way) indicates that the file the open call is attempting to open does not exist. Read "ENOENT" as "No such file".
Some open calls place multiple restrictions on the way that a file is opened. The open64 calls in the example output above, for example, specify both O_RDONLY and O_NDELAY. Again, reading the man page will help you to understand what each of these specifications means and will present you with a list of other options as well.
As you might expect, open is only one of many system calls that you will see when you run the truss command. Next week we will look at some additional system calls and determine what they are doing.
Exploring processes with Truss: part 2 By Sandra Henry-Stocker
While truss and its cousins on non-Solaris systems (e.g., strace on Linux and ktrace on many BSD systems) provide a lot of data on what a running process is doing, this information is only useful if you know what it means. Last week, we looked at the open call and the file handles that are returned by the call to open(). This week, we look at some other system calls and analyze what these system calls are doing. You've probably noticed that the nomenclature for system functions is to follow the name of the call with a set of empty parentheses - for example, open(). You will see this nomenclature in use whenever system calls are discussed.
The fstat() and fstat64() calls obtain information about open files - "fstat" refers to "file status". As you might expect, this information, which includes whether or not you are allowed to read the files' contents, is retrieved from the files' inodes. If you trace the ls command (i.e., truss ls), for example, your trace will start with lines that resemble these:
1 execve("/usr/bin/ls", 0x08047BCC, 0x08047BD4) argc = 1
2 open("/dev/zero", O_RDONLY) = 3
3 mmap(0x00000000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0xDFBFA000
4 xstat(2, "/usr/bin/ls", 0x08047934) = 0
5 open("/var/ld/ld.config", O_RDONLY) Err#2 ENOENT
6 sysconfig(_CONFIG_PAGESIZE) = 4096
7 open("/usr/lib/libc.so.1", O_RDONLY) = 4
8 fxstat(2, 4, 0x08047310) = 0
...
28 lstat64(".", 0x080478B4) = 0
29 open64(".", O_RDONLY|O_NDELAY) = 3
30 fcntl(3, F_SETFD, 0x00000001) = 0
31 fstat64(3, 0x0804787C) = 0
32 brk(0x08057208) = 0
33 brk(0x08059208) = 0
34 getdents64(3, 0x08056F40, 1048) = 424
35 getdents64(3, 0x08056F40, 1048) = 0
36 close(3) = 0
In line 31, we see a call to fstat64, but what file is it checking? The man page for fstat() and your intuition are probably both telling you that this fstat call is obtaining information on the file opened two lines before - "." or the current directory - and that it is referring to this file by the file handle (3) returned by the open64() call in line 29. Keep in mind that a directory is simply a file, though a different variety of file, so the same system calls are used as would be used to check a text file.
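The sequence on lines 29 through 36 of the trace (open the directory, set a descriptor flag, check its status, read its entries, close it) can be approximated in a few lines of Python. This is only a sketch of the pattern, not what ls itself does internally. The function name inspect_directory is hypothetical, O_NONBLOCK is used as a close relative of the O_NDELAY flag in the trace, and FD_CLOEXEC is assumed to be what the 0x00000001 argument on line 30 stands for.

```python
import fcntl
import os
import stat


def inspect_directory(path="."):
    """Open a directory, stat it through its descriptor, and list its entries."""
    # Assumption: O_NONBLOCK stands in for the O_NDELAY flag shown in the trace.
    fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    try:
        # Assumed equivalent of fcntl(3, F_SETFD, 0x00000001): close-on-exec.
        fcntl.fcntl(fd, fcntl.F_SETFD, fcntl.FD_CLOEXEC)
        info = os.fstat(fd)      # status is looked up by descriptor, not by name
        names = os.listdir(fd)   # directory entries are read through the descriptor
        return stat.S_ISDIR(info.st_mode), info.st_ino, names
    finally:
        os.close(fd)             # frees the descriptor, as close(3) does on line 36


if __name__ == "__main__":
    is_dir, inode, names = inspect_directory(".")
    print("is a directory:", is_dir)
    print("inode number:  ", inode)
    print("entries found: ", len(names))
```

Note that fstat is never given a file name. Everything after the open call refers to the directory by its descriptor alone.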
You will probably also notice that the first file opened is /dev/zero (see line 2). Most Unix sysadmins will immediately know that /dev/zero is a special kind of file - primarily because it is stored in /dev. And, if moved to look more closely at the file, they will confirm that the file that /dev/zero points to (it is itself a symbolic link) is a character special file. What /dev/zero provides to system programmers, and to sysadmins if they care to use it, is an endless stream of zeroes. This is more useful than might first appear.
To see how /dev/zero works, you can create a 10M-byte file full of zeroes with a command like this:
/bin/dd < /dev/zero > zerofile bs=1024 seek=10240 count=1
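This command works well because it creates the needed file with only a few read and write operations: dd seeks past the first 10,240 blocks of 1024 bytes in the output file and then copies a single block from /dev/zero. The skipped region is never written, yet it reads back as zeroes; in other words, the command is very efficient.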
You can verify that the file is zero-filled with od.
# od -x zerofile
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
50002000
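Each string of four zeros (0000) represents two bytes of data. The * on the second line of output indicates that all of the remaining lines are identical to the first. The final line, 50002000, is the length of the file expressed as an octal offset (10,486,784 bytes, which is the 10,240 skipped blocks plus the one block that was written).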
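The arithmetic behind the dd command and the od offset can be checked with a short Python sketch that performs the same seek-then-write steps. The names BLOCK, SKIP, make_zero_file and is_all_zero are hypothetical, and the sketch assumes a system that provides /dev/zero. It illustrates the technique and is not a description of how dd is implemented.

```python
import os
import tempfile

BLOCK = 1024   # corresponds to bs=1024
SKIP = 10240   # corresponds to seek=10240


def make_zero_file(path):
    """Skip SKIP blocks in the output, then copy one block from /dev/zero."""
    src = os.open("/dev/zero", os.O_RDONLY)
    dst = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.lseek(dst, BLOCK * SKIP, os.SEEK_SET)  # move the offset without writing
        os.write(dst, os.read(src, BLOCK))        # the only data actually written
    finally:
        os.close(src)
        os.close(dst)
    return os.stat(path).st_size


def is_all_zero(path, chunk=1 << 20):
    """Return True if every byte in the file is zero."""
    with open(path, "rb") as fh:
        while True:
            data = fh.read(chunk)
            if not data:
                return True
            if data.strip(b"\0"):   # anything left after stripping is non-zero
                return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        target = os.path.join(workdir, "zerofile")
        size = make_zero_file(target)
        print("size in bytes:", size)
        print("size in octal:", oct(size))
        print("all zeroes:   ", is_all_zero(target))
```

The octal size printed here can be compared directly with the last line of the od output.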
Looking back at the truss output above, we cannot help but notice that the first line of the truss output includes the name of the command that we are tracing. The execve() system call executes a process. The first argument to execve() is the name of the file from which the new process image is to be loaded. The mmap() call which follows maps memory into the process; its fifth argument (3) is the file handle just returned for /dev/zero, so it maps 4096 bytes of that file. In other words, it directly incorporates file data into the process address space. The getdents64() calls on lines 34 and 35 are extracting information from the directory file - "dents" refers to "directory entries".
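The mmap() call on line 3 of the trace takes file handle 3, which line 2 shows was returned by opening /dev/zero, along with a length of 4096 and the MAP_PRIVATE flag. A rough Python equivalent is sketched below. The names PAGE and map_zero_page are hypothetical, PROT_EXEC is left out for simplicity, and the sketch assumes a Unix system where /dev/zero can be mapped.

```python
import mmap
import os

PAGE = 4096  # the length passed to mmap() in the trace


def map_zero_page():
    """Map one private page of /dev/zero, then write to it."""
    fd = os.open("/dev/zero", os.O_RDONLY)
    try:
        page = mmap.mmap(fd, PAGE, flags=mmap.MAP_PRIVATE,
                         prot=mmap.PROT_READ | mmap.PROT_WRITE)
    finally:
        os.close(fd)             # the mapping remains valid after the close
    try:
        initial = page[:16]      # freshly mapped memory reads as zero bytes
        page[:5] = b"hello"      # a private mapping is writable; /dev/zero is untouched
        return initial, page[:5]
    finally:
        page.close()


if __name__ == "__main__":
    initial, after = map_zero_page()
    print("before write:", initial)
    print("after write: ", after)
```

Because the mapping is private, the write changes only this process's copy of the page, which is why a read-only file handle is sufficient.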
The sequence of steps that we see at the beginning of the truss output - executing the entered command, opening /dev/zero, mapping memory and so on - looks the same whether you are tracing ls, pwd, date or restarting Apache. In fact, the first dozen or so lines in your truss output will be nearly identical regardless of the command you are running. You should, however, expect to see some differences between different Unix systems and different versions of Solaris.
Viewing the output of truss, you can get a solid sense of how the operating system works. The same insights are available if you are tracing your own applications or troubleshooting third party executables.
-------------------
Sandra Henry-Stocker
Copyright © 1996-2009 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author's free time. This document is an industrial compilation designed and created exclusively for educational use and is placed under the copyright of the Open Content License (OPL). Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.
Last modified: August 10, 2009