C is a programming language developed by Dennis Ritchie, in the early 1970s, for use on the UNIX operating system. It is now used on practically every operating system, and is the most popular language for writing system software, though it is also used for writing applications. It is also commonly used in computer science education. The popular C++ programming language is based on C.
C is a relatively minimalist programming language. It is significantly lower-level than most other programming languages. Even though it is sometimes referred to as a "high level language", it is only really higher-level than the various assembly languages.
C has two important advantages over assembly. Firstly, code is generally easier to read and much less burdensome to write, especially for lengthy programs. Secondly, assembly code is usually applicable only to a specific computer architecture, whereas a C program can be ported to any architecture on which a C compiler and certain required libraries exist. (C code is almost always compiled, rather than interpreted.) On the other hand, the efficiency of C code is somewhat dependent on the ability of the compiler to optimize the resulting machine language, which is largely out of the programmer's control. In contrast, the efficiency of assembly code is precisely determined, since assembly is just human-readable notation for a machine language. For this reason, programs such as operating system kernels, though mostly written in C, may contain "hand-tuned" fragments of assembly language where performance is especially crucial.
Similar advantages and disadvantages distinguish C from higher-level languages: the efficiency of C code can be more closely controlled, at the cost of being generally more troublesome to read and write. Note, however, that C is at least as portable as higher-level languages, because nowadays most computer architectures are equipped with a C compiler and libraries; in fact, the compilers, libraries, and interpreters of higher-level languages are often implemented in C!
One of the most notable features of C is that it is up to the programmer to manage the contents of computer memory. Standard C provides no facilities for array bounds checking or automatic garbage collection. In contrast, the Java and C# languages, both descendants of C, provide automatic memory management, including garbage collection. While manual memory management provides the programmer with greater leeway in tuning the performance of a program, it also makes it easy to produce bugs involving erroneous memory operations, such as buffer overflows. Bugs of this sort have gained notoriety for their effects on computer security. Some tools have been created to help C programmers avoid memory errors, including libraries for performing array bounds checking and automatic garbage collection, and automated source code checkers such as Lint.
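Because the language performs no bounds checking itself, any check has to be written by hand. The following is a minimal sketch of one, assuming a hypothetical helper named copy_checked (it is not part of the standard library) that refuses to copy a string into a buffer that is too small:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst only if it fits, including the
 * terminating null character. Returns 0 on success, -1 if dst is too small. */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    size_t needed = strlen(src) + 1;   /* characters plus '\0' */

    if (needed > dst_size)
        return -1;                     /* would overflow dst: refuse */
    memcpy(dst, src, needed);
    return 0;
}
```

Calling strcpy directly in the same situation would write past the end of the buffer, which is the kind of buffer overflow described above.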
Some of the specific features of C are:
The initial development of C occurred at AT&T Bell Labs between 1969 and 1973; according to Ritchie, the most creative period occurred in 1972. It was named "C" because many of its features were derived from an earlier language called "B". Accounts differ regarding the origins of the name "B": Ken Thompson credits the BCPL programming language, but he had also created a language called Bon in honor of his wife Bonnie.
By 1973, the C language had become powerful enough that most of the UNIX kernel, originally written in PDP-11/20 assembly language, was rewritten in C. This made UNIX one of the first operating system kernels implemented in a language other than assembly; other early examples include the Multics system (written in PL/I) and Tripos (written in BCPL).
In 1978, Ritchie and Brian Kernighan published the first edition of The C Programming Language. This book, known to C programmers as "K&R", served for many years as an informal specification of the language. The version of C that it describes is commonly referred to as "K&R C." (The second edition of the book covers the later ANSI C standard, described below.)
K&R introduced the following features to the language:
In the years following the publication of K&R C, several "unofficial" features were added to the language, supported by compilers from AT&T and some other vendors. These included:
During the late 1970s, C began to replace BASIC as the leading microcomputer programming language. During the 1980s, it was adopted for use with the IBM PC, and its popularity began to increase significantly. At the same time, Bjarne Stroustrup and others at Bell Labs began work on adding object-oriented programming language constructs to C. The language they produced, called C++, is now the most common application programming language on the Microsoft Windows operating system; C remains more popular in the Unix world.
In 1983, the American National Standards Institute (ANSI) formed a committee, X3J11, to establish a standard specification of C. After a long and arduous process, the standard was completed in 1989 and ratified as ANSI X3.159-1989 "Programming Language C". This version of the language is often referred to as ANSI C. In 1990, the ANSI C standard (with a few minor modifications) was adopted by the International Organization for Standardization (ISO) as ISO/IEC 9899:1990.
One of the aims of the ANSI C standardization process was to produce a superset of K&R C, incorporating many of the unofficial features subsequently introduced. However, the standards committee also included several new features, such as function prototypes (borrowed from C++), and a more capable preprocessor.
ANSI C is now supported by almost all the widely used compilers. Most of the C code being written nowadays is based on ANSI C. A program written only in standard C, without relying on implementation-defined or undefined behavior, is expected to behave the same way on any platform with a conforming C implementation. However, many programs have been written that will only compile on a certain platform, or with a certain compiler, due to (i) the use of non-standard libraries, e.g. for graphical displays, and (ii) some compilers not adhering to the ANSI C standard, or its successor, in their default mode.
After the ANSI standardization process, the C language specification remained relatively static for some time, whereas C++ continued to evolve. (Normative Amendment 1 created a new version of the C language in 1995, but this version is rarely acknowledged.) However, the standard underwent revision in the late 1990s, leading to the publication of ISO 9899:1999 in 1999. This standard is commonly referred to as "C99". It was adopted as an ANSI standard in March 2000.
The new features in C99 include:
The following simple application prints out "Hello, World" to the standard output file (which is usually the screen, but might be a file or some other hardware device). A version of this program appeared for the first time in K&R.
A C program consists of functions and variables. C functions are like the subroutines and functions of Fortran or the procedures and functions of Pascal. The function
The
A function may return a value to the environment which called it. This is usually another C function. The
A C function consists of a return type (
Note: bracing style varies from programmer to programmer and can be the subject of great debate ("religious wars"). See Indent style for more details.
Compound statements in C have the form
A statement of the form
C has three types of selection statements: two kinds of
The two kinds of
The
C has three forms of iteration statement:
while (<expression>) <statement>
for (<expression>; <expression>; <expression>) <statement>
If all three expressions are present in a
Jump statements transfer control unconditionally. There are four types of jump statements in C:
The
A
do {
/* ... */
cont: ;
} while (expression);
for (optional-expr; optexp2; optexp3) {
/* ... */
cont: ;
}
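The cont: label in the loops above marks the point to which a continue statement transfers control. As a small sketch (the function names are illustrative only), the two functions below sum the odd numbers from 1 to n, once with for and once with while:

```c
/* Sums the odd numbers in 1..n, using continue to skip even values. */
int sum_odd(int n)
{
    int i, total = 0;

    for (i = 1; i <= n; i++) {
        if (i % 2 == 0)
            continue;          /* jumps to i++, then re-tests i <= n */
        total += i;
    }
    return total;
}

/* The same loop written with while. The increment has to happen before
 * the continue, otherwise the loop would never advance past an even value. */
int sum_odd_while(int n)
{
    int i = 1, total = 0;

    while (i <= n) {
        int current = i++;
        if (current % 2 == 0)
            continue;          /* jumps straight to the i <= n test */
        total += current;
    }
    return total;
}
```

The difference between the two shows why a for loop is not exactly a rewritten while loop once continue is involved: in the for loop the third expression still runs after a continue.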
The
A function returns to its caller by the
The values in the
If a declaration is suffixed by a number in square brackets (
Examples:
If a variable has an asterisk (*) in its declaration it is said to be a pointer.
Examples:
Another operator, the
Strings may be manipulated without using the standard library. However, the library contains many useful functions for working with both zero-terminated strings and unterminated arrays of
The most commonly used string functions are:
The following example demonstrates how a filter program is typically structured:
The parameters given on a command line are passed to a C program with two predefined variables - the count of the command line arguments in
(Note: there is no guarantee that the individual strings are contiguous.)
The individual values of the parameters may be accessed with
An interesting (though certainly not unique) aspect of the C standards is that the behavior of certain code is said to be "undefined". In practice, this means that the program produced from this code can do anything, from (accidentally) working as intended to crashing every time it is run.
For example, the following code produces undefined behavior, because the variable
Features
History
Early developments
K&R C
K&R C is often considered the most basic part of the language that is necessary for a C compiler to support. For many years, even after the introduction of ANSI C, it was considered the "lowest common denominator" that C programmers stuck to when maximum portability was desired, since not all compilers were updated to fully support ANSI C, and reasonably well-written K&R C code is also legal ANSI C.
Features introduced in K&R C:
struct data types
long int data type
unsigned int data type
the =+ operator was changed to +=, and so forth (=+ was confusing the C compiler's lexical analyzer)
Unofficial features added after K&R C:
void functions and the void * data type
functions returning struct or union types
struct field names in a separate name space for each struct type
assignment for struct data types
const qualifier to make an object read-only
float type
ANSI C and ISO C
C99
Interest in supporting the new C99 features appears to be mixed. Whereas GCC and several other compilers now support most of the new features of C99, the compilers maintained by Microsoft and Borland do not, and these two companies do not seem to be interested in adding such support.
New features in C99 include:
several new data types, including long long int (to reduce the pain of the looming 32-bit to 64-bit transition), an explicit boolean data type, and a complex type representing complex numbers
one-line comments beginning with //, borrowed from C++
new library functions, including snprintf()
new header files, including stdint.h
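Several of these additions can be seen together in a short sketch. The function name format_big is illustrative only, and the code needs a compiler running in C99 mode or later:

```c
#include <stdbool.h>   // C99: bool, true, false
#include <stdint.h>    // C99: fixed-width types such as int64_t
#include <stdio.h>     // snprintf(), added in C99

// Formats a 64-bit value as decimal text in buf.
// Returns true if the whole number fitted, false if it was truncated.
bool format_big(char *buf, size_t size, int64_t value)
{
    long long wide = value;   // long long int is at least 64 bits wide
    int written = snprintf(buf, size, "%lld", wide);

    return written >= 0 && (size_t)written < size;
}
```

Unlike sprintf(), snprintf() never writes more than size characters, and its return value reports how long the full text would have been, which is what makes the truncation check possible.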
"Hello, World!" in C
#include
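A minimal sketch of such a program follows, assuming the conventional form in which the greeting "Hello, World!" plus a newline (14 characters) is printed and 0 is returned to the environment. The printing is placed in a helper named greet, which is an illustrative name, so that the value returned by printf() can be inspected; the main() function that a complete program requires is shown in the trailing comment.

```c
#include <stdio.h>

/* Prints the greeting and returns printf()'s result: the number of
 * characters written, or a negative value if an output error occurred. */
int greet(void)
{
    return printf("Hello, World!\n");
}

/* A complete program would call the helper from main():
 *
 *     int main(void)
 *     {
 *         greet();
 *         return 0;    (0 is passed back to the operating system)
 *     }
 */
```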
Anatomy of a C Program
The function main() is special in that a C program always begins executing at the beginning of this function. This means that every C program must have a main() function.
The main() function will usually call other functions to help perform its job. Functions may be written by the programmer, or provided by existing libraries; the latter are accessed by including "standard headers" via the #include preprocessing directive. Certain library functions, such as printf() in the above example, are defined by the C standards; these are referred to as the standard library. (An implementation of C providing all of the standard library functions is called a "hosted implementation"; some implementations are not hosted, usually because they are not intended to be used with an operating system.) Other libraries can provide extra functionality, such as a graphical interface, advanced mathematical operations, or access to platform-specific features.
The main() function's calling environment is the operating system. Hence, in the "Hello, World!" example above, the operating system receives a value of 0 when the program terminates. (The printf function above returns how many characters were printed, 14 in the case above, but its value is effectively ignored.)
A C function consists of a return type (void if no value is returned), a unique name, a list of parameters in parentheses (void if there are none) and a function body delimited by braces. The syntax of the function body is equivalent to that of a compound statement.
Control structures
Compound statements
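Compound statements in C have the form
{ <optional-declaration-list> <optional-statement-list> }
and are used as the body of a function or anywhere that a single statement is expected.
Expression statements
A statement of the form
<optional-expression> ;
is an expression statement. If the expression is missing, the statement is called a null statement.
Selection statements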
C has three types of selection statements: two kinds of if and the switch statement.
The two kinds of if statement are
if (<expression>) <statement>
and
if (<expression>) <statement> else <statement>
In the if statement, if the expression in parentheses is nonzero or true, control passes to the statement following the if. If the else clause is present, control will pass to the statement following the else clause if the expression in parentheses is zero or false. The two are disambiguated by matching an else to the nearest preceding unmatched if at the same nesting level. Braces may be used to override this or for clarity.
The switch statement causes control to be transferred to one of several statements depending on the value of an expression, which must have integral type. The substatement controlled by a switch is typically compound. Any statement within the substatement may be labeled with one or more case labels, which consist of the keyword case followed by a constant expression and then a colon (:).
No two of the case constants associated with the same switch may have the same value. There may be at most one default label associated with a switch; control passes to the default label if none of the case labels are equal to the expression in the parentheses following switch.
Switches may be nested; a case or default label is associated with the smallest switch that contains it. Switch statements can "fall through": when one case section has completed its execution, statements continue to be executed downward until a break statement is encountered. This may prove useful in certain circumstances, although some newer programming languages forbid case statements to fall through.
Where there is no break to separate two case sections, execution of the first section continues into the second.
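The fall-through behavior can be sketched as follows. The function name statements_run is illustrative only; it counts how many of the labeled statements are executed for a given value:

```c
/* Counts how many labeled statements run for a given value. */
int statements_run(int value)
{
    int count = 0;

    switch (value) {
    case 1:
        count++;        /* no break: execution falls through to case 2 */
    case 2:
        count++;
        break;          /* leaves the switch */
    default:
        break;          /* no case label matched */
    }
    return count;
}
```

With a value of 2 only one statement runs, whereas with a value of 1 both run, because nothing separates the two case sections.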
Iteration statements
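C has three forms of iteration statement:
do <statement> while (<expression>);
while (<expression>) <statement>
for (<expression>; <expression>; <expression>) <statement>
In the while and do statements, the substatement is executed repeatedly so long as the value of the expression remains nonzero or true. With while, the test, including all side effects from the expression, occurs before each execution of the statement; with do, the test follows each iteration.
If all three expressions are present in a for, the statement
for (e1; e2; e3)
    s;
is equivalent to
e1;
while (e2) {
    s;
    e3;
}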
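As a concrete instance of this equivalence, the two functions below (illustrative names) compute the same sum, the first with for and the second with the corresponding while form. The equivalence holds as long as the loop body contains no continue statement.

```c
/* Sum of 0 .. n-1 using a for loop: e1 is i = 0, e2 is i < n, e3 is i++. */
int sum_for(int n)
{
    int i, total = 0;

    for (i = 0; i < n; i++)
        total += i;
    return total;
}

/* The same computation with the for loop expanded into a while loop. */
int sum_while(int n)
{
    int i, total = 0;

    i = 0;                 /* e1 */
    while (i < n) {        /* e2 */
        total += i;        /* s  */
        i++;               /* e3 */
    }
    return total;
}
```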
Any of the three expressions in the for loop may be omitted. A missing second expression makes the while test nonzero, creating an infinite loop.
Jump statements
Jump statements transfer control unconditionally. There are four types of jump statements in C: goto, continue, break, and return.
The goto statement looks like this:
goto <identifier>;
The identifier must be a label located in the current function. Control transfers to the labeled statement.
A continue statement may appear only within an iteration statement and causes control to pass to the loop-continuation portion of the smallest enclosing such statement. That is, within a statement such as
while (expression) {
    /* ... */
    cont: ;
}
(and likewise within a do or for statement), a continue not contained within a nested iteration statement is the same as goto cont.
The break statement is used to get out of a for loop, while loop, do loop, or switch statement. Control passes to the statement following the terminated statement.
A function returns to its caller by the return statement. When return is followed by an expression, the value is returned to the caller of the function. Flowing off the end of the function is equivalent to a return with no expression. In either case, the returned value is undefined.
Operator precedence in C89
() [] -> . ++ --                        postfix operators
++ -- * & ~ ! + - sizeof (cast)         unary operators
* / %                                   multiplicative operators
+ -                                     additive operators
<< >>                                   shift operators
< <= > >=                               relational operators
== !=                                   equality operators
&                                       bitwise and
^                                       bitwise exclusive or
|                                       bitwise inclusive or
&&                                      logical and
||                                      logical or
?:                                      conditional operator
= += -= *= /= %= <<= >>= &= |= ^=       assignment operators
,                                       comma operator
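One practical consequence of the table is that the equality operators bind more tightly than the bitwise operators, so an expression such as flags & 4 == 0 is grouped as flags & (4 == 0). The sketch below, with illustrative function names, spells out both groupings using explicit parentheses:

```c
/* What "flags & 4 == 0" actually means: 4 == 0 is evaluated first and
 * yields 0, so the result is always 0. */
int as_parsed(int flags)
{
    return flags & (4 == 0);
}

/* What was probably intended: test whether bit 2 of flags is clear. */
int as_intended(int flags)
{
    return (flags & 4) == 0;
}
```

Writing the parentheses explicitly, as in the second function, avoids depending on the reader's memory of the table.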
Data declaration
Elementary data types
The values in the <limits.h> and <float.h> headers determine the ranges of the fundamental data types. The ranges of the float, double, and long double types are typically those mentioned in the IEEE 754 Standard.
name            minimum range
char            -127..127 or 0..255
unsigned char   0..255
signed char     -127..127
int             -32767..32767
short int       -32767..32767
long int        -2147483647..2147483647
float           1e-37..1e+37 (positive range)
double          1e-37..1e+37 (positive range)
long double     1e-37..1e+37 (positive range)
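The actual ranges on a given implementation can be read from the macros in <limits.h> and <float.h>. As a sketch, the function below (an illustrative name) checks those macros against the minimum ranges listed in the table:

```c
#include <float.h>
#include <limits.h>

/* Returns 1 if this implementation provides at least the minimum ranges
 * listed in the table, 0 otherwise. */
int meets_minimum_ranges(void)
{
    return SCHAR_MIN <= -127 && SCHAR_MAX >= 127
        && UCHAR_MAX >= 255
        && SHRT_MIN <= -32767 && SHRT_MAX >= 32767
        && INT_MIN <= -32767 && INT_MAX >= 32767
        && LONG_MIN <= -2147483647L && LONG_MAX >= 2147483647L
        && FLT_MIN <= 1e-37f && FLT_MAX >= 1e37f
        && DBL_MIN <= 1e-37 && DBL_MAX >= 1e37
        && LDBL_MIN <= 1e-37L && LDBL_MAX >= 1e37L;
}
```

On a conforming implementation the function is expected to return 1; most current platforms exceed these minimums considerably, for example by providing a 32-bit int.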
Arrays
If a declaration is suffixed by a number in square brackets ([]), the declaration is said to be an array declaration. Strings are just character arrays. They are terminated by a character zero (represented in C by '\0', the null character). Array bounds are not checked, and if a memory location beyond the array is written to, the behavior is undefined; it may, for example, result in a segmentation fault or silently corrupt other data.
Examples:
int myvector [100];
char mystring [80];
float mymatrix [3] [2] = {2.0 , 10.0, 20.0, 123.0, 1.0, 1.0};
char lexicon [10000] [300] ; /* 10000 entries with max 300 chars each. */
int a[3][4];
The last example above creates an array of arrays, but can be thought of as a multidimensional array for most purposes. The 12 int values created could be accessed as follows:
a[0][0]
a[0][1]
a[0][2]
a[0][3]
a[1][0]
a[1][1]
a[1][2]
a[1][3]
a[2][0]
a[2][1]
a[2][2]
a[2][3]
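The same twelve elements are usually visited with a pair of nested loops rather than written out one by one. A small sketch, using an illustrative function name:

```c
/* Fills a 3x4 array so that a[row][col] == row * 4 + col and returns
 * the sum of all twelve elements. */
int fill_and_sum(int a[3][4])
{
    int row, col, total = 0;

    for (row = 0; row < 3; row++) {
        for (col = 0; col < 4; col++) {
            a[row][col] = row * 4 + col;
            total += a[row][col];
        }
    }
    return total;
}
```

The loop bounds are written as row < 3 and col < 4 because valid indices run from 0 up to one less than the declared size.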
Pointers
int *pi; /* pointer to int */
int *api[3]; /* array of 3 pointers to int */
char **argv; /* pointer to pointer to char */
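A common use of such declarations is to let a function modify variables that belong to its caller. The sketch below, with an illustrative function name, exchanges two int values through pointers; a caller would pass the addresses of its variables, obtained with the & operator, as in swap_ints(&x, &y).

```c
/* Exchanges the two int values that a and b point to. */
void swap_ints(int *a, int *b)
{
    int tmp = *a;   /* read the int that a points to */

    *a = *b;        /* write through a */
    *b = tmp;       /* write through b */
}
```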
The value at the address stored in a pointer variable can then be accessed in the program with an asterisk. For example, given the first example declaration above, *pi is an int. This is called "dereferencing" a pointer.
Another operator, the & (ampersand), called the address-of operator, returns the address of a variable, array, or function. Thus, given the following
int i, *pi; /* int and pointer to int */
pi = &i;
i and *pi could be used interchangeably (at least until pi is set to something else).
Strings
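Strings may be manipulated without using the standard library. However, the library contains many useful functions for working with both zero-terminated strings and unterminated arrays of char.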
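As a brief illustration of the library in use, the sketch below joins two strings with a space between them using strlen(), strcpy(), and strcat(). The function name join_names and its size check are assumptions made for the example, not part of the library.

```c
#include <string.h>

/* Builds "first last" in out. Returns the length of the result, or 0 if
 * out (out_size bytes) is too small to hold it. */
size_t join_names(char *out, size_t out_size,
                  const char *first, const char *last)
{
    /* both strings, one space, and the terminating '\0' */
    size_t needed = strlen(first) + 1 + strlen(last) + 1;

    if (needed > out_size)
        return 0;
    strcpy(out, first);
    strcat(out, " ");
    strcat(out, last);
    return strlen(out);
}
```

The explicit size check matters because strcpy() and strcat() themselves assume that the destination is large enough.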
The most commonly used string functions are:
strcat(dest, source) - appends the string source to the end of string dest
strchr(s, c) - finds the first instance of character c in string s and returns a pointer to it or a null pointer if c is not found
strcmp(a, b) - compares strings a and b (lexical ordering); returns negative if a is less than b, 0 if equal, positive if greater
strcpy(dest, source) - copies the string source to the string dest
strlen(st) - returns the length of string st
strncat(dest, source, n) - appends a maximum of n characters from the string source to the end of string dest; characters after the null terminator are not copied
strncmp(a, b, n) - compares a maximum of n characters from strings a and b (lexical ordering); returns negative if a is less than b, 0 if equal, positive if greater
strncpy(dest, source, n) - copies a maximum of n characters from the string source to the string dest; dest is not null-terminated if source holds n or more characters
strrchr(s, c) - finds the last instance of character c in string s and returns a pointer to it or a null pointer if c is not found
The less important string functions are:
strcoll(s1, s2) - compares two strings according to a locale-specific collating sequence
strcspn(s1, s2) - returns the index of the first character in s1 that matches any character in s2
strerror(err) - returns a string with an error message corresponding to the code in err
strpbrk(s1, s2) - returns a pointer to the first character in s1 that matches any character in s2 or a null pointer if not found
strspn(s1, s2) - returns the index of the first character in s1 that matches no character in s2
strstr(st, subst) - returns a pointer to the first occurrence of the string subst in st or a null pointer if no such substring exists
strtok(s1, s2) - returns a pointer to a token within s1 delimited by the characters in s2
strxfrm(s1, s2, n) - transforms s2 into s1 using locale-specific rules
File Input / Output
In C, input and output are performed via a group of functions in the standard library. In ANSI/ISO C, those functions are defined in the <stdio.h> header.
Standard I/O
Three standard I/O streams are predefined:
stdin           standard input
stdout          standard output
stderr          standard error
These streams are automatically opened and closed by the runtime environment; they need not and should not be opened explicitly.
#include
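A filter reads its input one character at a time, transforms it, and writes the result to its output. The sketch below shows one possible structure, assuming an upper-casing transformation and an illustrative function name; it takes the streams as parameters so that the same code works with stdin and stdout or with ordinary files.

```c
#include <ctype.h>
#include <stdio.h>

/* Copies everything from in to out, converting lower-case letters to
 * upper case. Returns the number of characters copied. */
long filter_upper(FILE *in, FILE *out)
{
    int c;              /* int, not char, so that EOF can be represented */
    long count = 0;

    while ((c = getc(in)) != EOF) {
        putc(toupper(c), out);
        count++;
    }
    return count;
}

/* Typical use from main(): filter_upper(stdin, stdout); */
```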
Passing command line arguments
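The parameters given on a command line are passed to a C program through two parameters of main(): the count of the command line arguments in argc and the individual arguments as character arrays in the pointer array argv.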
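A function that walks the argument list might be sketched as follows. The name list_args is illustrative only; a real program would typically call it from main() as list_args(argc, argv).

```c
#include <stdio.h>

/* Prints every argument after the program name, one per line, and
 * returns how many arguments were printed. */
int list_args(int argc, char *argv[])
{
    int i;

    for (i = 1; i < argc; i++)      /* argv[0] is skipped */
        printf("argv[%d] = %s\n", i, argv[i]);
    return argc > 0 ? argc - 1 : 0;
}
```

The loop starts at 1 because argv[0] conventionally holds the name under which the program was invoked rather than one of its parameters.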
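So the command
myFilt p1 p2 p3
typically results in argc being 4 and argv holding pointers to the strings "myFilt", "p1", "p2", and "p3". The individual values of the parameters may be accessed with argv[1], argv[2], and argv[3].
Undefined behaviors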
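For example, the following code produces undefined behavior, because the variable b is both modified and separately read, with no sequence point in between, in the expression a = b + b++:
#include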
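The undefined expression can be avoided by separating the read from the modification. The sketch below shows one well-defined rewrite, under the assumption that the intent was to add b to itself and then increment it; the function name is illustrative.

```c
/* Well-defined counterpart of "a = b + b++": computes the sum first,
 * then increments b in a separate statement. Returns the sum. */
int add_then_increment(int *b)
{
    int a = *b + *b;    /* b is only read here */

    (*b)++;             /* b is modified after the full expression above */
    return a;
}
```

Because each statement ends at a sequence point, the order of the read and the modification is no longer left to the compiler.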
Links and references
An early version of this article contained material from FOLDOC, used with permission.