UNIX Unleashed

17 — The C Programming Language

By James Armstrong

The History of C
Creating, Compiling, and Executing Your First Program
An Overview of the C Language

Elementary C Syntax
Expressions

Comparison Expressions
Mathematical Expressions
Bitwise Operations

Statement Controls

Creating a Simple Program

Writing the Code
Compiling the Program
Executing the Program

Building Large Applications

Making Libraries with ar
Building Large Applications with make

Debugging Tools
Summary


C is the programming language most frequently associated with UNIX. Since the 1970s, the bulk of the operating system and applications have been written in C. This is one of the major reasons why UNIX is a portable operating system.

The History of C

C was first designed by Dennis Ritchie for use with UNIX on DEC PDP-11 computers. The language evolved from Martin Richards's BCPL, and one of its earlier forms was the B language, which was written by Ken Thompson for the DEC PDP-7. The first book on C was The C Programming Language by Brian Kernighan and Dennis Ritchie, published in 1978.

In 1983, the American National Standards Institute established a committee to standardize the definition of C. Termed ANSI C, it is the recognized standard for the language grammar and a core set of libraries. The syntax is slightly different from the original C language, which is frequently called K&R—for Kernighan and Ritchie.

Creating, Compiling, and Executing Your First Program

The development of a C program is an iterative procedure. Many UNIX tools are involved in this four-step process. They are familiar to software developers:

Using an editor, write the code into a text file.
Compile the program.
Execute the program.
Debug the program.

The first two steps are repeated until the program compiles successfully. Then the execution and debugging begin. Many of the concepts presented may seem strange to non-programmers. This chapter endeavors to introduce C as a programming language.

The typical first C program is almost a cliché. It is the "Hello, World" program, and it prints the simple line Hello, World. Listing 17.1 is the source of the program.

Listing 17.1. Source of Hello World.

#include <stdio.h>

int main(void)

{

printf("Hello, World\n");

}

This program can be compiled and executed as follows:

$ cc hello.c

$ a.out

Hello, World

$

The program is compiled with the cc command, which creates a program named a.out if the code is correct. Typing a.out runs the program (type ./a.out if the current directory is not in your search path). The program includes only one function, main. Every C program must have a main function; it is where the program's execution begins. The only statement is a call to the printf library function, which is passed the string Hello, World\n. (Functions are described in detail later in this chapter.) The last two characters of the string, \n, represent the newline character.

An Overview of the C Language

As with all programming languages, C programs must follow rules. These rules describe how a program should appear and what its words and symbols mean. Together they make up the syntax of the language. Think of a program as a story. Each sentence must have a noun and a verb. Sentences form paragraphs, and the paragraphs tell the story. Similarly, C statements can build into functions and programs.

For more information about programming in C, I recommend the following books from Sams Publishing:

Teach Yourself C in 21 Days by Peter Aitken and Bradley Jones
Programming in ANSI C by Stephen G. Kochan

Elementary C Syntax

Like all languages, C deals primarily with the manipulation and presentation of data. BCPL is typeless: it treats every value as a machine word. C, however, goes one step further to use the concept of data types. The basic data types are character, integer, and floating point numbers. Other data types are built from these three basic types.

Integers are the basic mathematical data type. They can be classified as long and short integers, and the size is implementation-dependent. With a few exceptions, integers are four bytes in length, and they can range from -2,147,483,648 to 2,147,483,647. In ANSI C, these limits are defined in the header limits.h as INT_MIN and INT_MAX. The qualifier unsigned drops the sign and uses the extra bit for magnitude, so a four-byte unsigned integer ranges from 0 to 4,294,967,295 (UINT_MAX), which is the equivalent of INT_MAX-INT_MIN.

Floating point numbers are used for more complicated mathematics. Integer mathematics is limited to integer results. With integers, 3/2 equals 1. Floating point numbers give a greater amount of precision to mathematical calculations: 3.0/2.0 equals 1.5. Floating point numbers can be represented by a decimal number, such as 887.534, or with scientific notation: 8.87534E+2. For larger numbers, scientific notation is preferred. For even greater precision, the type double provides more significant digits and a greater range than float. Again, specific ranges are implementation-dependent.

Characters are usually implemented as single bytes, although some international character sets require two bytes. One common set of character representations is ASCII, and is found on most U.S. computers.

An array is used for a sequence of values that are often position-dependent. An array is useful when a range of values of a given type is needed. Related to the array is the pointer. Variables are stored in memory, and a pointer holds the address of that memory. A pointer and an array are used in similar ways, but they differ in how their storage is obtained. The space needed for the data of an array is allocated when the routine that declares the array is invoked. For a pointer, the space must be allocated by the programmer, or the pointer must be assigned the address of an existing variable. The ampersand (&) takes the address of a variable, and the asterisk (*) dereferences a pointer, that is, yields the value pointed at. Here are some sample declarations:

int i;	Declares an integer
char c;	Declares a character
char *ptr;	Declares a pointer to a character
double temp[16];	Declares an array of double-precision floating point numbers with 16 values

Listing 17.2 shows an example of a program with pointers.

Listing 17.2. An example of a program with pointers.

int i;

int *ptr;

i=5;

ptr = &i;

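printf("%d %p %d\n", i,(void *)ptr,*ptr);

The output resembles 5 0xf7fffa6c 5. The way the address is printed is implementation-defined, and its value varies from system to system.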

NOTE: A pointer is just a memory address. Applying the & operator to any variable yields its address, which can be stored in a pointer.

There is no specific type for a string. An array of characters, terminated by a null character (\0), is used to represent strings. They can be printed using the %s conversion, instead of %c.

Simple output is created by the printf function. printf takes a format string and a list of arguments to be printed. The format conversions are presented in Table 17.1. They can be modified with field widths and precisions. Check the documentation for the full specification.

Table 17.1. Format conversions for printf.

Conversion	Meaning
%%	Percentage sign
%E	Double (scientific notation)
%G	Double (format depends on value)
%X	Hexadecimal (letters are capitalized)
%c	Single character
%d	Integer
%e	Double (scientific notation)
%f	Double of the form mmm.ddd
%g	Double (format depends on value)
%i	Integer
%ld	Long integer
%n	Number of characters written so far by the current printf, stored through an integer pointer argument
%o	Octal
%p	Print as a pointer
%s	Character pointer (string)
%u	Unsigned integer
%x	Hexadecimal

Some characters cannot be included easily in a program. New lines, for example, require a special escape sequence, because there cannot be an unescaped newline in a string. Table 17.2 contains a complete list of escape sequences.

Table 17.2. Escape characters for strings.

Escape Sequence	Meaning
\"	Double quote
\'	Single quote
\?	Question mark
\\	Backslash
\a	Audible bell
\b	Backspace
\f	Form feed (new page)
\n	New line
\ooo	Octal number
\r	Carriage return
\t	Horizontal tab
\v	Vertical tab
\xhh	Hexadecimal number

A full program is a collection of statements. Statements are terminated by semicolons. They can be grouped in blocks of statements surrounded by curly braces. The simplest statement is an assignment. A variable on the left side is assigned the value of an expression on the right.

Expressions

At the heart of the C programming language are expressions. These are techniques to combine simple values into new values. There are three basic types of expressions: comparison, numerical, and bitwise expressions.

Comparison Expressions

The simplest expression is a comparison. A comparison evaluates to a TRUE or a FALSE value. In C, TRUE is a non-zero value, and FALSE is a zero value. Table 17.3 contains a list of comparison operators.

Table 17.3. Comparison operators.

Operator	Meaning	Operator	Meaning
<	Less than	>=	Greater than or equal to
>	Greater than	||	Or
==	Equal to	&&	And
<=	Less than or equal to	!=	Not equal to

Expressions can be built by combining simple comparisons with ANDs and ORs to make complex expressions. Consider the definition of a leap year. In words, it is any year divisible by 4, except a year divisible by 100 unless that year is divisible by 400. If year is the variable, a leap year can be defined with this expression.

((((year%4)==0)&&((year%100)!=0))||((year%400)==0))

On first inspection, this code might look complicated, but it isn't. The parentheses group the simple expressions with the ANDs and ORs to make a complex expression.

Mathematical Expressions

One convenient aspect of C is that expressions can be treated as mathematical values, and mathematical statements can be used in expressions. In fact, any expression, even a simple assignment, has a value that can be used in other places.

The mathematics of C is straightforward. Barring parenthetical groupings, multiplication and division have higher precedence than addition and subtraction. The operators are standard. They are listed in Table 17.4.

Table 17.4. Mathematical operators.

Operator	Meaning	Operator	Meaning
+	Addition	/	Division
-	Subtraction	%	Integer remainder
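*	Multiplication

C has no exponentiation operator; use the pow library function declared in math.h instead. The ^ character is the bitwise exclusive OR operator.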

There are also unary operators, which affect a single variable. These are ++ (increment by one) and -- (decrement by one). These shorthand versions are quite useful.

There are also shorthands for situations in which you want to change the value of a variable. For example, if you want to add an expression to a variable called a and assign the new value to a, the shorthand a+=expr is the same as a=a+expr. The expression can be as complex or as simple as required.

NOTE: Most UNIX functions take advantage of the truth values and return 0 for success. This enables a programmer to write code such as

if (function())
{
/* handle the error condition */
}

The return value of a function determines whether the function worked.

Bitwise Operations

Because a variable is just a string of bits, many operations work on those bit patterns. Table 17.5 lists the bit operators.

Table 17.5. Bit operators.

Operator	Meaning	Operator	Meaning
&	Bitwise AND	<<	Bit shift left
|	Bitwise OR	>>	Bit shift right

A bitwise AND compares the individual bits in place. If both are 1, the value 1 is assigned to that bit of the result. Otherwise, 0 is assigned. For a bitwise OR, 1 is assigned if either bit is a 1. Bit shift operations move the bits a number of positions to the right or left. For non-negative values, shifting by one position is the same as multiplying or dividing by 2, but circumstances exist where the bit shift is preferred.

Bit operations are often used for masking values and for comparisons. A simple way to determine whether a value is odd or even is to perform a bitwise AND with the integer value 1. If the result is non-zero (TRUE), the number is odd.

Statement Controls

With what you've seen so far, you can create a list of statements that are executed only once, after which the program terminates. To control the flow of commands, three types of loops exist in C. The simplest is the while loop. The syntax is

while (expression)

       statement

So long as the expression between parentheses evaluates as non-zero—or TRUE in C—the statement is executed. The statement actually can be a list of statements blocked off with curly braces. If the expression evaluates to zero the first time it is reached, the statement is never executed. To force at least one execution of the statement, use a do loop. The syntax for a do loop is

do

        statement

        while (expression);

The third type of control flow is the for loop. This is more complicated. The syntax is

for(expr1;expr2;expr3) statement

When the for statement is reached for the first time, expr1 is evaluated. Next, expr2 is evaluated. If expr2 is non-zero, the statement is executed, followed by expr3. Then, expr2 is tested again, followed by the statement and expr3, until expr2 evaluates to zero. Strictly speaking, this is a notational convenience, for a while loop can be structured to perform the same actions. For example,

expr1;

while (expr2) {

        statement;

        expr3;

        }

Loops can be interrupted in three ways. A break statement terminates execution in a loop and exits it. continue terminates the current iteration and retests the loop before possibly re-executing the statement. For an unconventional exit, you can use goto. goto changes the program's execution to a labelled statement. According to many programmers, goto is poor programming practice, and you should avoid using it.

Statements can also be executed conditionally. Again, there are three different formats for statement execution. The simplest is an if statement. The syntax is

if (expr) statement

If the expression expr evaluates to non-zero, the statement is executed. You can expand this with an else, the second type of conditional execution. The syntax for else is

if (expr) statement else statement

If the expression evaluates to zero, the second statement is executed.

NOTE: The statement following an if can itself be another if statement. This makes the structure

if (expr) if (expr) statement else statement

ambiguous to the reader, because the else could belong to either if.

As the code is written, the else is considered applicable to the second if. To make it applicable with the first if, surround the second if statement with curly braces. For example:

if (expr) {if (expr) statement} else statement

The third type of conditional execution is more complicated. The switch statement first evaluates an expression. Then it looks down a series of case statements to find a label that matches the expression's value and executes the statements following the label. A special label, default, is used if no other label matches. Execution continues from the matching label through any cases that follow it, so if you want only the statements for one label executed, you must use the break statement to leave the switch statement.

This covers the simplest building blocks of a C program. You can add more power by using functions and by declaring complex data types.

If your program requires different pieces of data to be grouped on a consistent basis, you can group them into structures. Listing 17.3 shows a structure for a California driver's license. Note that it includes integer, character, and character array (string) types.

Listing 17.3. An example of a structure.

struct license {

        char name[128];

        char address[3][128];

        int zipcode;

        int height, weight,month, day, year;

        char license_letter;

        int license_number;

        };

struct license licensee;

struct license *user;

Since California driver's license numbers consist of a single character followed by a seven-digit number, the license ID is broken into two components. Similarly, the licensee's address is broken into three lines, represented by three arrays of 128 characters.

Accessing individual fields of a structure requires two different techniques. To access a member of a structure variable, you append a dot to the variable name, then the field name. For example:

licensee.zipcode=94404;

To use a pointer to the structure, you need -> to reach the member. The pointer must first be set to point at a structure, for example with user=&licensee:

user->zipcode=94404;

Interestingly, if the structure pointer is incremented, the address is increased not by 1, but by the size of the structure.

Functions are an easy way to group statements and to give them a name. These are usually related statements that perform repetitive tasks such as I/O. printf, described above, is a function. It is provided with the standard C library. Listing 17.4 illustrates a function declaration, a function call, and a function definition.

NOTE: The three-dot ellipsis simply means that some lines of sample code are not shown here, in order to save space.

Listing 17.4. An example of a function.

int swapandmin( int *, int *);        /* Function declaration */

...

int i,j,lower;

i=2; j=4;

lower=swapandmin(&i, &j);            /* Function call */

...

int swapandmin(int *a,int *b)        /* Function definition */

{

int tmp;

tmp=(*a);

(*a)=(*b);

(*b)=tmp;

if ((*a)<(*b)) return(*a);

return(*b);

}

ANSI C and K&R differ most in function declarations and calls. ANSI requires that function arguments be prototyped when the function is declared. K&R required only the name and the type of the returned value. The declaration in Listing 17.4 states that a function swapandmin will take two pointers to integers as arguments and that it will return an integer. The function call takes the addresses of two integers and sets the variable named lower with the return value of the function.

When a function is called from a C program, the values of the arguments are passed to the function. Therefore, if the function is to change a variable that belongs to the calling function, you can't pass the variable itself; you must pass its address. Likewise, to change that variable from inside the function, you must assign the new value through the pointer.

In the function in Listing 17.4, the value pointed to by a is assigned to the tmp variable. The value pointed to by b is assigned to *a, and tmp is assigned to *b. *a is used instead of a to ensure that the change is reflected in the calling routine. Finally, the values of *a and *b are compared, and the lower of the two is returned.

If you included the line

printf("%d %d %d",lower,i,j);

after the function call, you would see 2 4 2 on the output.

This sample function is quite simple, and it is ideal for a macro. A macro is a technique used to replace a token with different text. You can use macros to make code more readable. For example, you might use EOF instead of (-1) to indicate the end of a file. You can also use macros to replace code. Listing 17.5 is the same as Listing 17.4 except that it uses macros.

Listing 17.5. An example of macros.

#define SWAP(X,Y) {int tmp; tmp=(X); (X)=(Y); (Y)=tmp; }

#define MIN(X,Y) (((X)<(Y)) ? (X) : (Y))

...

int i,j,lower;

i=2; j=4;

SWAP(i,j);

lower=MIN(i,j);

When a C program is compiled, macro replacement is one of the first steps performed. Listing 17.6 illustrates the result of the replacement.

Listing 17.6. An example of macro replacement.

int i,j,lower;

i=2; j=4;

{int tmp; tmp=(i); (i)=(j); (j)=tmp; };

lower= (((i)<(j)) ? (i) : (j));

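The macros make the code easier to read and understand. Because macro arguments are substituted as text, an argument with a side effect, such as i++, may be evaluated more than once.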

Creating a Simple Program

For your first program, write a program that prints a chart of the first ten integers and their squares, cubes, and square roots.

Writing the Code

Using the text editor of your choice, enter all the code in Listing 17.7 and save it in a file called sample.c.

Listing 17.7. Source code for sample.c.

#include <stdio.h>

#include <math.h>

int main(void)

{

int i;

double a;

for(i=1;i<11;i++)

        {

        a=i*1.0;

        printf("%2d. %3d %4d %7.5f\n",i,i*i,i*i*i,sqrt(a));

        }

return 0;

}

The first two lines include header files. The stdio.h file provides the function declarations and structures associated with the C input and output libraries. The math.h file includes the declarations of mathematical library functions. You need it for the square root function.

The main function is the only function that you need to write for this example. It takes no arguments. You define two variables. One is the integer i, and the other is a double-precision floating point number called a. You wouldn't have to use a, but you can for the sake of convenience.

The program is a simple for loop that starts at 1 and continues while i is less than 11. It increments i by 1 each time through. When i equals 11, the for loop stops executing. You could have also written i<=10, because for integers the two expressions have the same meaning.

First, you multiply i by 1.0 and assign the product to a. A simple assignment would also work, but the multiplication reminds you that you are converting the value to a double-precision floating point number.

Next, you call the printf function. The format string includes three integers of widths 2, 3, and 4. After the first integer is printed, you print a period. After the third integer is printed, you print a floating point number that is seven characters wide with five digits following the decimal point. The arguments after the format string show that you print the integer, the square of the integer, the cube of the integer, and the square root of the integer.

Compiling the Program

To compile this program using the C compiler, enter the following command:

cc sample.c -lm

This command produces an output file called a.out. The -lm flag links the math library, which contains the sqrt function. This is the simplest use of the C compiler, which is one of the most powerful and flexible commands on a UNIX system.

A number of different flags can change the compiler's output. These flags are often dependent on the system or compiler. Some flags are common to all C compilers. These are described in the following paragraphs.

The -o flag tells the compiler to write the output to the file named after the flag. The cc -o sample sample.c -lm command would put the program in a file named sample.

NOTE: The output discussed here is the compiler's output, not the sample program. Compiler output is usually the program, and in every example here, it is an executable program.

The -g flag tells the compiler to keep the symbol table (the data used by a program to associate variable names with memory locations), which is necessary for debuggers. The -O flag, by contrast, tells the compiler to optimize the code, that is, to make it more efficient. You can change the search path for header files with the -I flag, and you can add libraries with the -l and -L flags.

The compilation process takes place in several steps.

First, the C preprocessor parses the file. To parse the file, it sequentially reads the lines, includes header files, and performs macro replacement.
The compiler parses the modified code for correct syntax. This builds a symbol table and creates an intermediate object format. Most symbols have specific memory addresses assigned, although symbols defined in other modules, such as external variables, do not.
The last compilation stage, linking, ties together different files and libraries and links the files by resolving the symbols that have not been resolved yet.

Executing the Program

The output from this program appears in Listing 17.8.

Listing 17.8. Output from the sample.c program.

$ sample

 1.   1    1 1.00000

 2.   4    8 1.41421

 3.   9   27 1.73205

 4.  16   64 2.00000

 5.  25  125 2.23607

 6.  36  216 2.44949

 7.  49  343 2.64575

 8.  64  512 2.82843

 9.  81  729 3.00000

10. 100 1000 3.16228

NOTE: To execute a program, just type its name at a shell prompt. If the current directory is not in your search path, type ./ before the name. The output will immediately follow.

Building Large Applications

C programs can be broken into any number of files, so long as no function spans more than one file. To compile such a program, you compile each source file into an intermediate object before you link all the objects into a single executable. The -c flag tells the compiler to stop after producing the object file. During the link stage, all the object files should be listed on the command line. Object files are identified by the .o suffix.

Making Libraries with ar

If several different programs use the same functions, they can be combined in a single library archive. The ar command is used to build a library. When this library is included on the compile line, the archive is searched to resolve any external symbols. Listing 17.9 shows an example of building and using a library.

Listing 17.9. Building a large application.

cc -c sine.c

cc -c cosine.c

cc -c tangent.c

ar rc libtrig.a sine.o cosine.o tangent.o

cc -c mainprog.c

cc -o mainprog mainprog.o libtrig.a

Building Large Applications with make

Of course, managing the process of compiling large applications can be difficult. UNIX provides a tool that takes care of this for you. make looks for a makefile, which includes directions for building the application.

You can think of the makefile as being its own programming language. The syntax is

target: dependencies

        Commandlist

Dependencies can be targets declared elsewhere in the makefile, and they can have their own dependencies. Each line of the command list must begin with a tab character. When a make command is issued, the target on the command line is checked; if no targets are specified on the command line, the first target listed in the file is checked.

When make tries to build a target, first the dependencies list is checked. If any dependency requires rebuilding, it is rebuilt. Then, if the target is missing or older than any of its dependencies, the command list specified for the target itself is executed.

make has its own set of default rules, which are executed if no other rules are specified. One rule specifies that an object is created from a C source file using $(CC) $(CFLAGS) -c (source file). CFLAGS is a special variable; a list of flags that will be used with each compilation can be stored there. These flags can be specified in the makefile, on the make command line, or in an environment variable. make checks the dependencies to determine whether a file needs to be made. It uses the mtime field of a file's status. If a dependency has been modified more recently than the target, the target is remade.

Listing 17.10 shows an example of a makefile.

Listing 17.10. An example of a makefile.

CFLAGS= -g

igfl: igfl.o igflsubs.o

        cc -g -o igfl igfl.o igflsubs.o -lm

igflsubs.o: igfl.h

clean:

        rm -f *.o

Listing 17.10 uses several targets to make a single executable called igfl. The two C files are compiled into objects by implicit rules. Only igflsubs.o has an explicit dependency, on the header file igfl.h. If igfl.h has been modified more recently than igflsubs.o, a new igflsubs.o is compiled.

Note that there is a target called clean. Because there are no dependencies, the command is always executed when clean is specified. This command removes all the intermediate files. Listing 17.11 shows the output of make when it is executed for the first time.

Listing 17.11. Output of make.

cc -g  -target sun4 -c  igfl.c

cc -g  -target sun4 -c  igflsubs.c

cc -g -o igfl igfl.o igflsubs.o -lm

Debugging Tools

Debugging is a science and an art unto itself. Sometimes, the simplest tool, the code listing, is best. At other times, however, you need to use other tools. Three of these tools are lint, prof, and sdb. Other available tools include cscope, cxref, and cb. Many UNIX commands have debugging uses.

lint is a command that examines source code for possible problems. The code might meet the standards for C and compile cleanly, but it might not execute correctly. Two things checked by lint are type mismatches and incorrect argument counts on function calls. lint uses the C preprocessor, so you can use command-line options similar to those you would use for cc.

The prof command is used to study where a program is spending its time. If a program is compiled and linked with -p as a flag, when it executes, a mon.out file is created with data on how often each function is called and how much time is spent in each function. This data is parsed and displayed with prof. An analysis of the output generated by prof helps you determine where performance bottlenecks occur. Although optimizing compilers can speed your programs, acting on this analysis can improve program performance significantly.

The third tool is sdb—a symbolic debugger. When a program is compiled with -g, the symbol tables are retained, and a symbolic debugger can be used to track program bugs. The basic technique is to invoke sdb after a core dump and get a stack trace. This indicates the source line where the core dump occurred and the functions that were called to reach that line. Often, this is enough to identify the problem. It is not the limit of sdb, though.

sdb also provides an environment for debugging programs interactively. Invoking sdb with a program enables you to set breakpoints, examine variable values, and monitor variables. If you suspect a problem near a line of code, you can set a breakpoint at that line and run the program. When the line is reached, execution is interrupted. You can check variable values, examine the stack trace, and observe the program's environment. You can single-step through the program, checking values. You can resume execution at any point. By using breakpoints, you can discover many of the bugs in your code that you've missed.

cpp is another tool that can be used to debug programs. It performs macro replacements, includes headers, and parses the code. The output is the actual module to be compiled. Normally, though, cpp is not executed by the programmer directly. Instead it is invoked through cc with either a -E or -P option. -E writes the output to standard output, normally the terminal; -P makes a file with a .i suffix.

Summary

In this chapter, we've discussed the basics of the C language: building C programs, running them, and debugging them. While this overview isn't enough to make you an expert C programmer, you can now understand how programmers develop their products. You should also be able to read a C program and know what the program is doing.