Array Data Structures & Algorithms

Characters and Strings
Character and String definitions,
algorithms, library functions
Character and String Processing
 A common programming issue involves manipulation of
text, usually referred to as string, or text, processing
 Solutions typically require the ability to:
 perform input and output of characters and strings
 query what a single character is, or is not
 determine if a character, a substring, or any of a set of
characters is included, or not, in a string
 determine the attributes of a character (e.g. upper versus
lower case) or string (e.g. length)
 convert between character string and machine
representations of different data types
 break large strings into smaller substrings (tokens)
separated by delimiters
 join substrings into larger strings (catenation)
Characters and Strings in C
 The concept of a string refers to a sequence of items.
 The sequence, or string, may contain zero or more
elements, and a delimiter that denotes the end
(termination) of the string.
 A string of characters, in computer science terms,
usually refers to a vector, or list, of char values
 ASCII is a commonly used character encoding
 Unicode is another
 In the C language, the special delimiter character
‘\0’ (called the null character) is recognized by the
compiler and always has the integer value 0
 Strings of bits (or other encoded symbols) provide
abstraction possibilities for more general strings.
Fundamentals
(Figure: the characters H e l l o followed by \0. The sequence of characters is the value of the string and determines the string length; the final \0 is the delimiter, also called the null character or terminal.)
 Defining a string container
 Example:
#define STRLEN 256
char strName [ STRLEN ] ;
 Example:
char strName [ ] ; // Valid only as a function parameter, or when an initializer supplies the size
char * strPtr ;
 Initialization
 Example:
char strName1 [ ] = "My name is Bob!" ;
const char * strStatic = "String that cannot be changed!" ;
char strName2 [ ] = { 'H', 'e', 'l', 'l', 'o', '\0' } ;
 Example:
char strName [ 50 ] ;
int k ;
for( k=0; k<49; k++ ) strName[k] = '#' ; // Fill with # symbols
strName[49] = '\0' ;
 Consider a variation of the second example, using pointers:
char strName [ 50 ], * strPtr ;
int k ;
for( k=0, strPtr = strName; k<49; k++, strPtr++ ) *strPtr = '#' ;
*strPtr = '\0' ;
Character Handling Library
 The C language standard supports the notion of char
data type, and the delimiter character code ‘\0’.
 We do not need to know the details of how character data
is represented in bit form
 In programming and algorithm design it is useful to
know and use a wide variety of functions that query or
manipulate (transform) both individual character data as
well as strings of characters
 We will discuss functions from four libraries
 #include <ctype.h>
 #include <stdlib.h>
 #include <stdio.h>
 #include <string.h>
 We start with the character function library, <ctype.h>
Character Handling Library <ctype.h>
 Begin with character query functions
 General prototype form:
int isValueRange ( int c ) ; // Returns nonzero (true) if a match, or 0
 ValueRange refers to a single value or a range of values
 Note that the input argument c has the type int
 Intuition would suggest c should be type char
 Technical considerations (involving representation of non-ASCII
data encodings) recommend using int, recalling that char is a
compatible sub-type of int (and of short int). The value passed
must be representable as an unsigned char, or be EOF.
Function Prototype
Function Description
int isblank( int c );
Returns a nonzero value if c is a blank, that is a space ‘ ’
or a horizontal tab ‘\t’ ; otherwise 0
int isdigit( int c );
Returns a nonzero value if c is a base-10 digit in the
range ‘0’ to ‘9’ ; otherwise 0
int isalpha( int c );
Returns a nonzero value if c is an alphabetic character
in the range ‘a’ to ‘z’, or ‘A’ to ‘Z’ ; otherwise 0
int isalnum( int c );
Returns a nonzero value if c is an alphabetic
character, or a base-10 digit ; otherwise 0
int isxdigit( int c );
Returns a nonzero value if c is a base-16
(hexadecimal) digit in the range ‘0’ to ‘9’, or
‘a’ to ‘f’, or ‘A’ to ‘F’ ; otherwise 0
Character Handling Library
 Additional query functions provide information about
the nature of the character data
 Transformative functions modify the character data
Function Prototype
Function Description
int islower( int c );
Returns a nonzero value if c is a lower case alphabetic
character in the range ‘a’ to ‘z’ ; otherwise 0
int isupper( int c );
Returns a nonzero value if c is an upper case alphabetic
character in the range ‘A’ to ‘Z’ ; otherwise 0
int tolower( int c );
Returns the lower case variant of c if c is an upper case
alphabetic character; otherwise returns c unchanged
(Ex. tolower( ‘A’ ) returns ‘a’)
int toupper( int c );
Returns the upper case variant of c if c is a lower case
alphabetic character; otherwise returns c unchanged
(Ex. toupper( ‘e’ ) returns ‘E’)
Character Handling Library
 And still more query functions for non-alphanumeric
character data (eg. graphical, control signals,
punctuation)
Function Prototype
Function Description
int isspace( int c );
Returns nonzero if c is any valid white space character data
(including blank, newline, tab, etc); otherwise 0
int iscntrl( int c );
Returns nonzero if c is any valid control character data (including
‘\n’, ‘\b’, ‘\r’, ‘\a’ etc); otherwise 0
int ispunct( int c );
Returns nonzero if c is any valid, printable punctuation character
data (including ‘,’, ‘.’, ‘;’, ‘:’ etc.); otherwise 0
int isprint( int c );
Returns nonzero if c is any valid, printable character data,
including the space character; otherwise 0
int isgraph( int c );
Returns nonzero if c is any printable character other than the
space character, that is a character with a visible graphical
symbol (letters, digits, and symbols such as ‘<’, ‘>’, ‘#’, ‘$’);
otherwise 0
Example: Counting characters
 Problem: Determine the frequencies of occurrence for each alphabetic
character (ignoring case) in a text file.
 Solution:
#include <ctype.h>
#include <stdio.h>
int main ( ) {
int N=0, K, C[26] ; double F[26] ;
int Ch ; // int, not char, so that EOF can be detected
for( K=0; K<26; K++ ) { C[K]=0; F[K]=0.0; }
// N counts every character read, alphabetic or not
for( Ch=getchar(); Ch != EOF; N++, Ch=getchar() ) {
if( isalpha( Ch ) ) {
K = toupper( Ch ) - 'A' ; // Map 'A'..'Z' to 0..25 (assumes ASCII)
C[K]++ ;
}
}
for( K=0; K<26; K++) {
if( N > 0 ) F[K] = C[K] * 1.0 / N ; // Guard against empty input
printf( "Frequency of letter %c: %lf\n", (char) (K+'A'), F[K] ) ;
}
return 0 ;
}
String Conversion Functions: <stdlib.h>
 Purpose of these functions is to convert a string (or
portion) to (1) an integer or (2) a floating point type
 General prototype form:
resultType strtoOutputType ( const char * nPtr, char **endPtr [, int base ] ) ;
 nPtr points at the input string (protected as constant)
 resultType refers to one of double, long int, or unsigned long int
 OutputType refers to one of d, l, or ul
 base refers to the base of the input string (0, or 2..36)
 endPtr points at the position within the input string where a valid
numeric representation terminates
(Figure: the input string -123.895$bC followed by \0; nPtr points at the start of the string and endPtr at the $, the first character that is not part of the number.)
String Conversion Functions
Function Prototype
Function Description
double strtod( const char * nPtr,
char **endPtr );
If nPtr points at a valid string representation of a
signed real number (possibly followed by
additional character data), return a double value;
else return 0 if no part of the input string can be
converted. Return a pointer (through *endPtr) to
the character following the last convertible
character; if no part of the input string is
convertible then *endPtr is set to nPtr.
Example usage:
double D ;
const char * S = " -123.895 $bC" ;
char * EP ;
D = strtod( S, &EP ) ;
if( EP != S ) printf( "Value converted is %lf\n", D ) ;
else printf( "No value could be converted\n" ) ;
Note that one can also determine the size of the initial substring
used to determine the double value returned, namely:
int NumChars ;
NumChars = (int) ( EP - S ) ; // Pointer difference counts chars; sizeof(char) is always 1
(Figure: nPtr points at the start of the input string and endPtr at the character just after the 5 in 123.895.)
String Conversion Functions
Function Prototype
Function Description
long strtol( const char * nPtr,
char **endPtr,
int base );
If nPtr points at a valid string representation of a
signed integer number (possibly followed by
additional character data), return a long int value;
else return 0 if no part of the input string can be
converted. Return a pointer (through *endPtr) to
the character following the last convertible
character; if no part of the input string is
convertible then *endPtr is set to nPtr. The input
string may use any base digits in the range 0 to
(base-1).
unsigned long strtoul(
const char * nPtr,
char **endPtr, int base );
Performs analogously to strtol() for string to
unsigned long int conversion.
Example usage:
long int LI ;
const char * S = " -1234.$bC" ;
char * EP ;
LI = strtol( S, &EP, 0 ) ; // base 0 => base 8, 10, or 16
if( EP != S ) printf( "Value converted is %ld\n", LI ) ;
else printf( "No value could be converted\n" ) ;
(Figure: the input string -1234.$bC followed by \0; nPtr points at the start of the string and endPtr at the ‘.’, the first character that is not part of the integer.)
String Conversion Functions
 The base argument value (for integer conversions only!)
defines the base of the input string.
 For base=0, the input string digits may be in base
8, 10 or 16.
 The case base=1 is not used.
 For 2 <= base <= 36 the characters that are interpretable
as base digits lie in the range from 0 to (base-1)
Base    Base digits (upper or lower case alpha chars)
0       0, 1, … , F (base 8, 10 or 16, chosen from the prefix: 0 for base 8, 0x for base 16, otherwise base 10)
2       0, 1
10      0, 1, 2, … , 9
13      0, 1, … , 9, A, B, C
24      0, 1, … , 9, A, B, … , N
36      0, 1, … , 9, A, B, … , Z
Example usage:
long int LI ;
const char * S = " -Ab2$" ;
char * EP ;
LI = strtol( S, &EP, 13 ) ; // base = 13
if( EP != S ) printf( "Value converted is %ld\n", LI ) ;
else printf( "No value could be converted\n" ) ;
// Value outputted is the negative of:
// A*13*13 + b*13 + 2 = 1690+143+2 = 1835 (base-10)
String Conversion Functions
 The C standard utilities library <stdlib.h> also includes
two additional conversion functions for long long int,
both signed and unsigned.
Function Prototype
Function Description
long long strtoll(
const char * nPtr,
char **endPtr,
int base );
Performs analogously to strtol() for string to long
long int conversion, with identical treatment of
non-convertible strings, treatment of *endPtr and
base.
unsigned long long strtoull(
const char * nPtr,
char **endPtr, int base );
Performs analogously to strtoul() for string to
unsigned long long int conversion, with identical
treatment of non-convertible strings, treatment of
*endPtr and base.
Useful <stdio.h> Functions
 The C standard input/output library contains useful
functions
 I/O of characters and strings
 Conversion to and from character and internal data
representations
Useful <stdio.h> Functions
Function Prototype and Description
int getchar( void );
Fetches and returns a single character from the input stream (stdin); if end of file
is signalled then the return value is EOF
int putchar( int C );
Outputs a single character to the output stream (stdout). Returns the same
character if successful; otherwise returns EOF on failure
char * fgets( char * S, int N, FILE * stream);
Fetches all characters up to either (a) a new line ‘\n’, or (b) EOF, or (c) N-1
characters have been inputted, and then appends a delimiter ‘\0’ to make a
string. A new line, if read, is kept in the string. The pointer S points to the
inputted string. Input is from the input stream (typically stdin, but can be from
a text file). Returns a pointer to the input string, or NULL if failure occurs
(as with EOF).
int puts( const char * S );
Outputs the string of characters S, followed by a newline ‘\n’. Returns a
non-negative integer result, or EOF on failure.
CAUTION: When stdin is the keyboard, remember that pressing the Enter key
to signal input generates a ‘\n’ character and this must be accounted for.
Example:
#include <stdio.h>
int main () {
int C ; // Must be int, not char, so that EOF can be distinguished
while( (C = getchar() ) != EOF && C != '\n' )
putchar( C );
return 0 ;
}
Example:
#include <stdio.h>
#define MAX 256
int main () {
char S [ MAX ], * sPtr ;
while( (sPtr = fgets( S, MAX, stdin )) != NULL )
puts( S ) ; // S still holds its '\n', so puts() adds a second one
return 0 ;
}
Useful <stdio.h> Functions
 The functions sprintf() and sscanf() are used for processing of
character (string) data and machine representations of data
(according to different data types).
 All data processing is done in RAM; no I/O is involved!
Function Prototype and Description
int sprintf( char * S, const char * format [, …] );
Used in the same way as printf(), except that the string of characters
produced is directed to the string argument S, according to the format string
(and referenced parameters).
int sscanf( const char * S, const char * format [, …] );
Used in the same way as scanf(), except that the string S contains the “input”
data to be processed according to the format string (and referenced
parameters).
Example:
#include <stdio.h>
int main () {
int A ; float X ;
char S[100], M[100] ;
char FormatStr[7] = "%d%f%s" ;
scanf( FormatStr, &A, &X, S ); // Read from stdin
printf( FormatStr, A, X, S ) ;
fgets( M, 100, stdin ); // Reads the rest of the current input line into M
sscanf( M, FormatStr, &A, &X, S ); // Convert from the string M, not from stdin
sprintf( M, FormatStr, A, X, S ); // Write into the string M, not to stdout
puts( M );
return 0 ;
}
String Manipulation Functions
 Two functions are provided to perform copying of one
string into another string.
Function Prototype and Description
char * strcpy( char * Dest, const char * Src);
Copies the entire source string Src, including its delimiter ‘\0’, to the
destination Dest and returns Dest. No check is made on the capacity of
Dest: if Src (plus its delimiter) is longer than Dest, characters are written
past the end of Dest and the behaviour is undefined (a buffer overflow).
char * strncpy( char * Dest, const char * Src, size_t N);
Copies at most N characters of the source string Src to the destination
Dest and returns Dest. If the length of Src is less than N then the entire
Src string is copied and as many ‘\0’ as needed are appended to fill up to N
characters. If the length of Src is N or more, exactly N characters are
copied and no delimiter ‘\0’ is appended (which fails to define a proper
string). N must not exceed the capacity of Dest, otherwise characters are
written past the end of Dest. Remember that strncpy() does not always
append the delimiter automatically!
String Manipulation Functions
 Joining together of two strings is called string
catenation (also called concatenation).
 For instance, one might combine various words and phrases
to form sentences and paragraphs.
Function Prototype and Description
char * strcat( char * S1, const char * S2);
Copy string S2 to a position in S1, following the string already in S1. Note that
the original ‘\0’ delimiter in S1 is overwritten by the first character in the S2
string, so that only one delimiter occurs at the end of the modified S1 string. If
the total number of characters (plus the delimiter) is greater than the capacity of
S1 then characters are written past the end of S1 and the behaviour is undefined.
char * strncat( char * S1, const char * S2, size_t N);
Copy at most the first N characters of the string S2 to a position in S1, following
the string already in S1. The original ‘\0’ delimiter in S1 is overwritten by the first
character in the S2 string, and a single delimiter is appended at the end of the
modified S1 string by strncat() automatically. S1 must therefore have room for up
to N additional characters plus the delimiter; otherwise characters are written
past the end of S1 and the behaviour is undefined.
String Comparison Functions
 Comparison of two strings is based on the notion of
lexical ordering.
 All characters are encoded (e.g. ASCII) and the numeric values
of the characters define the possible orderings.
 String comparisons are done based on both (a) character by
character comparison, and (b) use of relative length of each string.
Function Prototype and Description
int strcmp( const char * S1, const char * S2);
Compares strings S1 and S2. Returns 0 if S1 is fully equivalent to S2, a
positive number if S1 is lexically greater than S2, and a negative number if
S1 is lexically less than S2.
int strncmp( const char * S1, const char * S2, size_t N);
Compares up to the first N characters of the strings S1 and S2. Returns 0
if the compared characters are fully equivalent, a positive number if S1 is
lexically greater than S2, and a negative number if S1 is lexically less than
S2. Note that if the length of either S1 or S2 is less than N, the comparison
is done only for the characters present in each string.
Strings - Search Functions
 C provides functions for searching for various
characters and substrings within a string
 This is a huge advantage in text processing
Function Prototype and Description
char * strchr( const char * S, int C);
Locates the position in S of the first occurrence of C. Returns the pointer
value to where C is first located; otherwise returns NULL.
size_t strspn( const char * S1, const char * S2 );
String S1 is searched, and returns the length of the initial substring
segment in S1 that contains characters only found in S2.
size_t strcspn( const char * S1, const char * S2 );
String S1 is searched, and returns the length of the initial substring
segment in S1 that contains characters not found in S2.
Strings - Search Functions
Function Prototype and Description
char * strpbrk( const char * S1, const char * S2 );
Locates the first occurrence in S1 of any character found in S2, and
returns a pointer to that position in S1. Otherwise a NULL value is
returned.
char * strrchr( const char * S1, int C );
Locates the last occurrence in S1 of the character C, and
returns a pointer to that position in S1. Otherwise a NULL value is
returned.
char * strstr( const char * S1, const char * S2 );
Locates the first occurrence in S1 of the entire string S2, and returns a
pointer to that position in S1. Otherwise a NULL value is returned.
Strings - Search Functions
 Consider the problem of a string of text S1 that contains various
words (substrings) separated by specially designated characters
used as delimiters (and contained in a string S2). The objective is
to extract the words from the text. This can be accomplished using
the function strtok() repeatedly.
 Each identified substring in S1, delimited by a character in S2, is
called a token. Thus, strtok() is called the string tokenizer function.
Function Prototype and Description
char * strtok( char * S1, const char * S2 );
The first call to strtok() states the argument S1 and provides the string of
delimiters S2. Returns a pointer to the next token found in S1.
Each subsequent call to strtok() uses NULL as the first argument (instead
of the string S1), and the function remembers where it left off from the last
time it was called.
Each time strtok() is called, it points to the next token found and also
replaces the delimiter character by ‘\0’. Thus, S1 is modified!
Thus, a sequence of calls to strtok() breaks S1 into token substrings.
Strings - Search Functions
#include <stdio.h>
#include <string.h>
int main () {
int N = 0 ;
char S[] = "This is a sentence with tokens separated by blanks." ;
char * tokenPtr ;
printf( "The following tokens were found in S.\n" ) ;
tokenPtr = strtok( S, " " ) ; // First time use S; ' ' is the only delimiter
while( tokenPtr != NULL ) {
N++ ;
printf( "%s\n", tokenPtr ) ;
tokenPtr = strtok( NULL, " " ) ; // Use NULL in successive calls
}
printf( "Number of tokens found = %d\n", N ) ;
return 0 ;
}
Strings - Search Functions
#include <stdio.h>
#include <string.h>
int main () {
int N = 0 ;
char S[] = "This is a sentence with tokens separated by various characters." ;
char * tokenPtr ;
const char * DelimList = " .,;:$" ; // The delimiter list itself is never modified
printf( "The following tokens were found in S.\n" ) ;
tokenPtr = strtok( S, DelimList ) ; // First time use S; various delimiters
while( tokenPtr != NULL ) {
N++ ;
printf( "%s\n", tokenPtr ) ;
tokenPtr = strtok( NULL, DelimList ) ; // Use NULL in successive calls
}
printf( "Number of tokens found = %d\n", N ) ;
return 0 ;
}
Memory Functions in <string.h>
 C also provides functions for dealing with blocks of data
in RAM
 The blocks may be characters, or other data types, hence
the functions typically return a void * pointer value.
 A void * pointer value can be assigned to any other pointer type, and
vice versa.
 However, void * pointers cannot be dereferenced, thus the size of the
block must be specified as an argument.
 None of the functions discussed perform checks for terminating null
characters (delimiters).
Function Prototype and Description
void * memcpy( void * S1, const void * S2, size_t N );
Copies N characters (bytes) from the object S2 into the object S1. A pointer to
the resulting object (S1) is always returned; no failure is reported.
Note: The result of this function is undefined if S1 and S2 overlap!
void * memmove( void * S1, const void * S2, size_t N );
Copies N characters (bytes) from the object S2 into the object S1. A pointer to
the resulting object (S1) is always returned; no failure is reported.
Note: This function behaves as if the bytes were first copied to a temporary
memory space, hence the operation is defined even if S1 and S2 overlap.
Memory Functions in <string.h>
Function Prototype and Description
int memcmp( const void * S1, const void * S2, size_t N );
Compares the first N characters (bytes) of S1 and S2. Returns 0 if S1==S2,
>0 if S1>S2, and <0 if S1<S2.
void * memchr( const void * S1, int C, size_t N );
Locates the first occurrence of the character C in the first N characters (bytes)
of S1. If C is found, a pointer to C in S1 is returned. Otherwise, NULL is
returned.
void * memset( void * S1, int C, size_t N );
Copies the character (byte) C to the first N positions of S1. A pointer to S1 is
always returned; no failure is reported.
Note: the value of C is converted to unsigned char to enable copying to blocks
of arbitrary data type.
Other Functions in <string.h>
Function Prototype and Description
size_t strlen( const char * S );
Determines and returns the number of characters in S, not including the
‘\0’ delimiter.
char * strerror( int errornum );
Returns a pointer to a string containing an error message (defined by the
implementation as standard messages) referenced by an error number code.
The function itself produces no output. For instance, the statement
printf( "%s\n", strerror( 2 ) ) ;
might generate the output string:
No such file or directory
Secure C programming
 C11 standard with Annex K
 Addresses issues related to robustness of array
based manipulation of character data (and other data
containers)
 Stack overflow detection
 Array overflow detection
 Read more:
 CERT guideline INT05-C
 www.securecoding.cert.org
 Additional online Appendices E-H for the textbook
 www.pearsonhighered.com/deitel/
Summary
Concepts of character and strings, query functions, transformation
functions, search functions, generalization to abstract strings
(memory functions).
Topic Summary
 Characters and Strings in the C language
 Multiple library sources
 Query functions
 Transformative functions
 Conversion functions
 Memory functions
Practice, practice, practice !
 Reading – Chapter 8
 Review Pointers as well, especially the const qualifier, and also
the use of ** for modifying pointer values on return (through
arguments) from functions.
 Reading – Chapter 9: Formatted Input and Output
 This chapter is straightforward and is assigned for self-directed
independent study and learning – it will be tested!