cstring management

Upload: rodrigo-a-saucedo

Post on 04-Apr-2018


  • 7/30/2019 CString Management


    Licence CPOL

    First Posted 16 May 2000


    CString Management

    By Joseph M. Newcomer | 17 May 2000

    VC6 Visual-Studio MFC Dev Beginner

    Learn how to effectively use CStrings.


    Introduction

    CStrings are a useful data type. They greatly simplify a lot of operations in MFC, making it much more

    convenient to do string manipulation. However, there are some special techniques to using CStrings,

    particularly hard for people coming from a pure-C background to learn. This essay discusses some of

    these techniques.

    Much of what you need to do is pretty straightforward. This is not a complete tutorial on CStrings, but

    captures the most common basic questions.

    CString concatenation

    Formatting (including integer-to-CString)

    Converting CStrings to integers

    Converting between char * and CString

    char * to CString

    CString to char * I: Casting to LPCTSTR

    CString to char * II: Using GetBuffer

    CString to char * III: Interfacing to a control

    CString to BSTR

    BSTR to CString (New 30-Jan-01)

    VARIANT to CString (New 24-Feb-01)

    Loading STRINGTABLE resources (New 22-Feb-01)

    CStrings and temporary objects

    CString efficiency

    String Concatenation

    One of the very convenient features of CString is the ability to concatenate two strings. For example,
    writing

    CString gray("Gray");
    CString cat("Cat");
    CString graycat = gray + cat;

    is a lot nicer than having to do something like:

    char gray[] = "Gray";

    char cat[] = "Cat";

    char * graycat = malloc(strlen(gray) + strlen(cat) + 1);

    strcpy(graycat, gray);

    strcat(graycat, cat);

    Formatting (including integer-to-CString)

    Rather than using sprintf or wsprintf, you can do formatting for a CString by using the Format method:

    CString s;

    s.Format(_T("The total is %d"), total);

    The advantage here is that you don't have to worry about whether or not the buffer is large enough to

    hold the formatted data; this is handled for you by the formatting routines.

    Use of formatting is the most common way of converting from non-string data types to a CString, for

    example, converting an integer to a CString:


    CString s;

    s.Format(_T("%d"), total);

    I always use the _T( ) macro because I design my programs to be at least Unicode-aware, but that's a

    topic for some other essay. The purpose of _T( ) is to compile a string for an 8-bit-character
    application as:

    #define _T(x) x // non-Unicode version

    whereas for a Unicode application it is defined as

    #define _T(x) L##x // Unicode version

    so in Unicode the effect is as if I had written

    s.Format(L"%d", total);

    If you ever think you might ever possibly use Unicode, start coding in a Unicode-aware fashion. For

    example, never, ever use sizeof( ) to get the size of a character buffer, because it will be off by a

    factor of 2 in a Unicode application. We cover Unicode in some detail in Win32 Programming. When I

    need a size, I have a macro called DIM, which is defined in a file dim.h that I include everywhere:

    #define DIM(x) ( sizeof((x)) / sizeof((x)[0]) )

    This is not only useful for dealing with Unicode buffers whose size is fixed at compile time, but also
    for any compile-time defined table:

    class Whatever { ... };

    Whatever data[] = {

    { ... },

    ...

    { ... },

    };

    for(int i = 0; i < DIM(data); i++) // scan the table looking for a match

    Beware of those API calls that want genuine byte counts; using a character count will not work.

    TCHAR data[20];
    lstrcpyn(data, longstring, sizeof(data) - 1); // WRONG!
    lstrcpyn(data, longstring, DIM(data) - 1); // RIGHT
    WriteFile(f, data, DIM(data), &bytesWritten, NULL); // WRONG!
    WriteFile(f, data, sizeof(data), &bytesWritten, NULL); // RIGHT

    This is because lstrcpyn wants a character count, but WriteFile wants a byte count.
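The difference between the two counts can be seen without a Windows build. The following is a minimal portable sketch, not MFC code: wchar_t stands in for a Unicode TCHAR (an assumption; it is 2 bytes on Windows and often 4 elsewhere), and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Same macro as in the text: number of elements in a compile-time array.
#define DIM(x) ( sizeof((x)) / sizeof((x)[0]) )

// Hypothetical helper: the character count, which is what lstrcpyn wants.
std::size_t char_count_of_buffer()
{
    wchar_t data[20];          // wchar_t stands in for a Unicode TCHAR
    return DIM(data);
}

// Hypothetical helper: the byte count, which is what WriteFile wants.
std::size_t byte_count_of_buffer()
{
    wchar_t data[20];
    return sizeof(data);
}

// Usage: the two counts differ by a factor of sizeof(wchar_t).
void demo_counts()
{
    assert(char_count_of_buffer() == 20);
    assert(byte_count_of_buffer() == 20 * sizeof(wchar_t));
}
```

Passing one count where the other is expected either truncates the data or runs past the end of the buffer, which is why the two must be kept apart.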

    Using _T does not create a Unicode application. It creates a Unicode-aware application. When you
    compile in the default 8-bit mode, you get a "normal" 8-bit program; when you compile in Unicode
    mode, you get a Unicode (16-bit-character) application. Note that a CString in a Unicode application is
    a string that holds 16-bit characters.

    Converting a CString to an integer

    The simplest way to convert a CString to an integer value is to use one of the standard string-

    to-integer conversion routines.

    While generally you will suspect that atoi is a good choice, it is rarely the right choice. If you plan to
    be Unicode-ready, you should call the function _ttoi, which compiles into atoi in ANSI code and
    _wtoi in Unicode code. You can also consider using _tcstoul (for unsigned conversion to any radix,
    such as 2, 8, 10 or 16) or _tcstol (for signed conversion to any radix). Here are some
    examples:

    CString hex = _T("FAB");

    CString decimal = _T("4011");

    ASSERT(_tcstoul(hex, 0, 16) == _ttoi(decimal));
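One thing the atoi family cannot do is report bad input. The sketch below is a portable illustration using std::strtoul, the standard C function that _tcstoul maps to in an 8-bit build; the helper name parse_unsigned is hypothetical and the error policy is only one reasonable choice.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>

// Hypothetical helper: convert text in the given radix and report failure,
// instead of silently returning 0 the way atoi-style functions do.
bool parse_unsigned(const char * text, int radix, unsigned long & value)
{
    assert(text != nullptr);
    char * end = nullptr;
    errno = 0;
    unsigned long v = std::strtoul(text, &end, radix);
    if(end == text)        // no digits were consumed
        return false;
    if(*end != '\0')       // trailing characters that are not digits
        return false;
    if(errno == ERANGE)    // value does not fit in unsigned long
        return false;
    value = v;
    return true;
}
```

With this helper, "FAB" in radix 16 and "4011" in radix 10 both succeed and yield the same value, while "12x" in radix 10 is rejected.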

    Converting between char * and CString

    This is the most common set of questions beginners have on the CString data type. Due largely to

    serious C++ magic, you can largely ignore many of the problems. Things just "work right". The

    problems come about when you don't understand the basic mechanisms and then don't understand why

    something that seems obvious doesn't work.

    For example, having noticed the above example you might wonder why you can't write


    CString graycat = "Gray" + "Cat";

    or


    CString graycat("Gray" + "Cat");

    In fact the compiler will complain bitterly about these attempts. Why? Because the + operator is defined

    as an overloaded operator on various combinations of the CString and LPCTSTR data types, but not

    between two LPCTSTR data types, which are underlying data types. You can't overload C++ operators

    on base types like int and char, or char *. What will work is

    CString graycat = CString("Gray") + CString("Cat");

    or even


    CString graycat = CString("Gray") + "Cat";

    If you study these, you will see that the + always applies to at least one CString; the other operand may be a CString or an LPCTSTR.
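The same rule holds for any C++ string class, so it can be tried outside MFC. In this sketch std::string is used as a stand-in for CString (an assumption made purely for portability), and the function names are hypothetical.

```cpp
#include <cassert>
#include <string>

// Both operands are string objects: the overloaded + applies.
std::string graycat_both_objects()
{
    return std::string("Gray") + std::string("Cat");
}

// One string object and one literal: the overloaded + still applies.
std::string graycat_one_object()
{
    return std::string("Gray") + "Cat";
}

// std::string bad = "Gray" + "Cat";   // does not compile: both operands are
//                                     // plain pointers, so no overload exists

// Usage: either accepted form produces the concatenated text.
void demo_concat()
{
    assert(graycat_both_objects() == "GrayCat");
    assert(graycat_one_object() == "GrayCat");
}
```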

    char * to CString

    So you have a char *, or a string. How do you create a CString? Here are some examples. Given

    char * p = "This is a test";

    or, in Unicode-aware applications

    TCHAR * p = _T("This is a test");

    or

    LPTSTR p = _T("This is a test");

    you can write any of the following:

    CString s = "This is a test"; // 8-bit only
    CString s = _T("This is a test"); // Unicode-aware
    CString s("This is a test"); // 8-bit only
    CString s(_T("This is a test")); // Unicode-aware
    CString s = p;
    CString s(p);

    Any of these readily convert the constant string or the pointer to a CString value. Note that the

    characters assigned are always copied into the CString so that you can do something like


    TCHAR * p = _T("Gray");

    CString s(p);

    p = _T("Cat");

    s += p;

    and be sure that the resulting string is "GrayCat".

    There are several other methods for CString constructors, but we will not consider most of these here;

    you can read about them on your own.

    CString to char * I: Casting to LPCTSTR

    This is a slightly harder transition to find out about, and there is lots of confusion about the "right" way

    to do it. There are quite a few right ways, and probably an equal number of wrong ways.

    The first thing you have to understand about a CString is that it is a special C++ object which contains

    three values: a pointer to a buffer, a count of the valid characters in the buffer, and a buffer length. The

    count of the number of characters can be any size from 0 up to the maximum length of the buffer minus

    one (for the NUL byte). The character count and buffer length are cleverly hidden.

    Unless you do some special things, you know nothing about the size of the buffer that is associated with

    the CString. Therefore, even if you can get the address of the buffer, you cannot change its contents. You

    cannot shorten the contents, and you absolutely must not lengthen the contents. This leads to some

    at-first-glance odd workarounds.

    The operator LPCTSTR (or more specifically, the operator const TCHAR *), is overloaded for CString.

    The definition of the operator is to return the address of the buffer. Thus, if you need a string pointer to

    the CString you can do something like


    CString s("GrayCat");

    LPCTSTR p = s;

    and it works correctly. This is because of the rules about how casting is done in C++; when a cast is
    required, C++ rules allow a user-defined conversion operator to be selected. For example, you could
    define (float) as a cast on a complex number (a pair of floats) and define it to return only the first
    float (called the "real part") of the complex number so you could say

    Complex c(1.2f, 4.8f);
    float realpart = c;

    and expect to see, if the (float) operator is defined properly, that the value of realpart is now 1.2.
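For readers who have not written a conversion operator before, here is one way the Complex example could be defined. This is a minimal sketch: the class and the helper functions are hypothetical and exist only to show the mechanism that CString's operator LPCTSTR relies on.

```cpp
#include <cassert>

// Hypothetical Complex type, only to illustrate a conversion operator.
class Complex {
public:
    Complex(float re, float im) : m_re(re), m_im(im) { }
    operator float() const { return m_re; }   // the "(float) cast"
    float imag() const { return m_im; }
private:
    float m_re;
    float m_im;
};

// A function that wants a float; passing a Complex triggers the conversion,
// just as passing a CString to an LPCTSTR parameter does.
float twice(float x) { return 2.0f * x; }

float real_part_demo()
{
    Complex c(1.2f, 4.8f);
    float realpart = c;       // operator float() is selected implicitly
    return realpart;
}

// Usage: the implicit conversion yields the real part in both contexts.
void demo_conversion()
{
    assert(real_part_demo() == 1.2f);
    assert(twice(Complex(1.5f, 0.0f)) == 3.0f);
}
```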

    This works for you in all kinds of places. For example, any function that takes an LPCTSTR parameter

    will force this coercion, so that you can have a function (perhaps in a DLL you bought):

    BOOL DoSomethingCool(LPCTSTR s);

    and call it as follows

    CString file("c:\\myfiles\\coolstuff");
    BOOL result = DoSomethingCool(file);

    This works correctly because the DoSomethingCool function has specified that it wants an LPCTSTR

    and therefore the LPCTSTR operator is applied to the argument, which in MFC means that the address of

    the string is returned.

    But what if you want to format it?


    CString graycat("GrayCat");

    CString s;

    s.Format("Mew! I love %s", graycat);

    Note that because the value appears in the variable-argument list (the list designated by "..." in the
    specification of the function), there is no implicit coercion operator applied. What are you going to get?

    Well, surprise, you actually get the string

    "Mew! I love GrayCat"

    because the MFC implementers carefully designed the CString data type so that an expression of type

    CString evaluates to the pointer to the string, so in the absence of any casting, such as in a Format

    or sprintf, you will still get the correct behavior. The additional data that describes a CString actually

    lives in the addresses below the nominal CString address.

    What you can't do is modify the string. For example, you might try to do something like replace the "."
    by a "," (don't do it this way, you should use the National Language Support features for decimal
    conversions if you care about internationalization, but this makes a simple example):

    CString v("1.00"); // currency amount, 2 decimal places

    LPCTSTR p = v;

    p[lstrlen(p) - 3] = ',';

    If you try to do this, the compiler will complain that you are assigning to a constant string. This is the

    correct message. It would also complain if you tried


    strcat(p, "each");

    because strcat wants an LPTSTR as its first argument and you gave it an LPCTSTR.

    Don't try to defeat these error messages. You will get yourself into trouble!

    The reason is that the buffer has a count, which is inaccessible to you (it's in that hidden area that sits

    below the CString address), and if you change the string, you won't see the change reflected in the

    character count for the buffer. Furthermore, if the string happens to be just about as long as the buffer

    physical limit (more on this later), an attempt to extend the string will overwrite whatever is beyond the

    buffer, which is memory you have no right to write (right?) and you'll damage memory you don't own.

    Sure recipe for a dead application.

    CString to char * II: Using GetBuffer

    A special method is available for a CString if you need to modify it. This is the operation GetBuffer.

    What this does is return to you a pointer to the buffer which is considered writeable. If you are only
    going to change characters or shorten the string, you are now free to do so:

    CString s(_T("File.ext"));
    LPTSTR p = s.GetBuffer();
    LPTSTR dot = _tcschr(p, _T('.')); // OK, should have used s.Find...
    if(dot != NULL)
        *dot = _T('\0');
    s.ReleaseBuffer();

    This is the first and simplest use of GetBuffer. You don't supply an argument, so the default of 0 is
    used, which means "give me a pointer to the string; I promise to not extend the string". When you call
    ReleaseBuffer, the actual length of the string is recomputed and stored in the CString. Within the
    scope of a GetBuffer/ReleaseBuffer sequence, and I emphasize this: You Must Not, Ever, Use
    Any Method Of CString on the CString whose buffer you have! The reason for this is that the
    integrity of the CString object is not guaranteed until the ReleaseBuffer is called. Study the code
    below:

    CString s(...);

    LPTSTR p = s.GetBuffer();

    //... lots of things happen via the pointer p

    int n = s.GetLength(); // BAD!!!!! PROBABLY WILL GIVE WRONG ANSWER!!!

    s.TrimRight(); // BAD!!!!! NO GUARANTEE IT WILL WORK!!!!

    s.ReleaseBuffer(); // Things are now OK

    int m = s.GetLength(); // This is guaranteed to be correct

    s.TrimRight(); // Will work correctly

    Suppose you want to actually extend the string. In this case you must know how large the string will

    get. This is just like declaring


    char buffer[1024];

    knowing that 1024 is more than enough space for anything you are going to do. The equivalent in the

    CString world is

    LPTSTR p = s.GetBuffer(1024);

    This call gives you not only a pointer to the buffer, but guarantees that the buffer will be (at least) 1024
    characters in length.

    Also, note that if you have a pointer to a const string, the string value itself is stored in read-only
    memory; even if you've done GetBuffer, you then have a pointer to read-only memory, so an attempt
    to store into the string will fail with an access error. I haven't verified this for
    CString, but I've seen ordinary C programmers make this error frequently.

    A common "bad idiom" left over from C programmers is to allocate a buffer of fixed size, do a sprintf

    into it, and assign it to a CString:


    char buffer[256];

    sprintf(buffer, "%......", args, ...); // ... means "lots of stuff here"

    CString s = buffer;

    while the better form is to do

    CString s;
    s.Format(_T("%...."), args, ...);

    Note that this always works; if your string happens to end up longer than 256 bytes you don't clobber

    the stack!

    Another common error is to be clever and realize that a fixed size won't work, so the programmer

    allocates bytes dynamically. This is even sillier:


    int len = lstrlen(parm1) + 13 + lstrlen(parm2) + 10 + 100;

    char * buffer = new char[len];

    sprintf(buffer, "%s is equal to %s, valid data", parm1, parm2);

    CString s = buffer;

    ....

    delete [] buffer;


    Where it can be easily written as


    CString s;

    s.Format(_T("%s is equal to %s, valid data"), parm1, parm2);

    Note that the sprintf examples are not Unicode-ready (although you could use _stprintf and put
    _T() around the formatting string), but the basic idea is still that you are doing far more work than is
    necessary, and it is error-prone.
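To make concrete the work that a Format-style routine saves you, here is a portable sketch of the "measure first, then allocate" approach. It is not MFC's implementation (that is an assumption-free statement only about this sketch): it uses std::snprintf with a null buffer to measure the result, std::string as a stand-in for CString, and a hypothetical function name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper, not MFC code: format two strings into a std::string
// whose size is measured first, so no fixed-size buffer can overflow.
std::string format_equal(const char * parm1, const char * parm2)
{
    assert(parm1 != nullptr && parm2 != nullptr);
    // First pass: ask how many characters are needed (excluding the NUL).
    int needed = std::snprintf(nullptr, 0,
                               "%s is equal to %s, valid data", parm1, parm2);
    if(needed < 0)
        return std::string();                      // formatting error
    // Second pass: format into a buffer of exactly the right size.
    std::string result(static_cast<std::size_t>(needed) + 1, '\0');
    std::snprintf(&result[0], result.size(),
                  "%s is equal to %s, valid data", parm1, parm2);
    result.resize(static_cast<std::size_t>(needed)); // drop the NUL
    return result;
}
```

No length arithmetic of the "13 + 10 + 100" kind appears anywhere, which is the point: the formatting routine, not the caller, decides how much space is required.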

    CString to char * III: Interfacing to a control

    A very common operation is to pass a CString value in to a control, for example, a CTreeCtrl. While
    MFC provides a number of convenient overloads for the operation, in the most general situation you
    use the "raw" form of the update, and therefore you need to store a pointer to a string in the TVITEM
    which is included within the TVINSERTSTRUCT:

    TVINSERTSTRUCT tvi;

    CString s;

    // ... assign something to s

    tvi.item.pszText = s; // Compiler yells at you here

    // ... other stuff

    HTREEITEM ti = c_MyTree.InsertItem(&tvi);

    Now why did the compiler complain? It looks like a perfectly good assignment! But in fact if you look at

    the structure, you will see that the member is declared in the TVITEM structure as shown below:

    LPTSTR pszText;
    int cchTextMax;

    Therefore, the assignment is not assigning to an LPCTSTR and the compiler has no idea how to cast the
    right hand side of the assignment to an LPTSTR.

    OK, you say, I can deal with that, and you write


    tvi.item.pszText = (LPCTSTR)s; // compiler still complains!

    What the compiler is now complaining about is that you are attempting to assign an LPCTSTR to an

    LPTSTR, an operation which is forbidden by the rules of C and C++. You may not use this technique to

    accidentally alias a constant pointer to a non-constant alias so you can violate the assumptions of

    constancy. If you could, you could potentially confuse the optimizer, which trusts what you tell it when

    deciding how to optimize your program. For example, if you do


    const int i = ...;

    //... do lots of stuff

    ... = a[i]; // usage 1

    // ... lots more stuff

    ... = a[i]; // usage 2

    Then the compiler can trust, because you said const, that the value of i at "usage 1" and "usage 2"
    is the same value, and it can even precompute the address of a[i] at usage 1 and keep the value
    around for later use at usage 2, rather than computing it each time. If you were able to write

    const int i = ...;

    int * p = &i;

    //... do lots of stuff


    ... = a[i]; // usage 1

    // ... lots more stuff

    (*p)++; // mess over compiler's assumption

    // ... and other stuff

    ... = a[i]; // usage 2

    Then the compiler would believe in the constancy of i, and consequently the constancy of the location of

    a[i], and the place where the indirection is done destroys that assumption. Thus, the program would

    exhibit one behavior when compiled in debug mode (no optimizations) and another behavior when

    compiled in release mode (full optimization). This Is Not Good. Therefore, the attempt to assign the

    pointer to i to a modifiable reference is diagnosed by the compiler as being bogus. This is why the

    (LPCTSTR) cast won't really help.

    Why not just declare the member as an LPCTSTR? Because the structure is used both for reading and

    writing to the control. When you are writing to the control, the text pointer is actually treated as an

    LPCTSTR but when you are reading from the control you need a writeable string. The structure cannot

    distinguish its use for input from its use for output.

    Therefore, you will often find in my code something that looks like


    tvi.item.pszText = (LPTSTR)(LPCTSTR)s;

    This casts the CString to an LPCTSTR, thus giving me the address of the string, which I then force to
    be an LPTSTR so I can assign it. Note that this is valid only if you are using the value as data to a
    Set or Insert style method! You cannot do this when you are trying to retrieve data!

    You need a slightly different method when you are trying to retrieve data, such as the value stored in a
    control, for example, from a CTreeCtrl using the GetItem method. Here, I want to get the text of the
    item. I know that the text is no more than MY_LIMIT in size. Therefore, I can write something like

    TVITEM tvi;

    // ... assorted initialization of other fields of tvi

    tvi.pszText = s.GetBuffer(MY_LIMIT);

    tvi.cchTextMax = MY_LIMIT;

    c_MyTree.GetItem(&tvi);

    s.ReleaseBuffer();

    Note that the code above works for any type of Set method also, but is not needed because for a

    Set-type method (including Insert) you are not writing the string. But when you are writing the

    CString you need to make sure the buffer is writeable. That's what the GetBuffer does. Again, note

    that once you have done the GetBuffer call, you must not do anything else to the CString until the

    ReleaseBuffer call.

    CString to BSTR

    When programming with ActiveX, you will sometimes need a value represented as a type BSTR. A BSTR
    is a counted string, a wide-character (Unicode) string on Intel platforms, and can contain embedded
    NUL characters.

    You can convert a CString to a BSTR by calling the CString method AllocSysString:

    CString s;
    s = ... ; // whatever
    BSTR b = s.AllocSysString();

    The pointer b points to a newly-allocated BSTR object which is a copy of the CString, including the

    terminal NUL character. This may now be passed to whatever interface you are calling that requires a

    BSTR. Normally, a BSTR is disposed of by the component receiving it. If you should need to dispose of a
    BSTR, you must use the call

    ::SysFreeString(b);

    to free the string.

    The story is that the decision of how to represent strings sent to ActiveX controls resulted in some

    serious turf wars within Microsoft. The Visual Basic people won, and the string type BSTR (acronym for

    "Basic String") was the result.

    BSTR to CString

    Since a BSTR is a counted Unicode string, you can use standard conversions to make an 8-bit CString.

    Actually, this is built-in; there are special constructors for converting ANSI strings to Unicode and

    vice-versa. You can also get BSTRs as results in a VARIANT type, which is a type returned by various
    COM and Automation calls.

    For example, in an ANSI application,

    BSTR b;
    b = ...; // whatever
    CString s(b == NULL ? L"" : b);

    works just fine for a single-string BSTR, because there is a special constructor that takes an LPCWSTR

    (which is what a BSTR is) and converts it to an ANSI string. The special test is required because a BSTR

    could be NULL, and the constructors Don't Play Well with NULL inputs (thanks to Brian Ross for pointing

    this out!). This also only works for a BSTR that contains only a single string terminated with a NUL; you

    have to do more work to convert strings that contain multiple NUL characters. Note that embedded NUL

    characters generally don't work well in CStrings and generally should be avoided.

    Remember, according to the rules of C/C++, if you have an LPWSTR it will match a parameter type of

    LPCWSTR (it doesn't work the other way!).

    In UNICODE mode, this is just the constructor

    CString::CString(LPCTSTR);

    As indicated above, in ANSI mode there is a special constructor

    CString::CString(LPCWSTR);

    which calls an internal function to convert the Unicode string to an ANSI string. (In Unicode mode there

    is a special constructor that takes an LPCSTR, a pointer to an 8-bit ANSI string, and widens it to a

    Unicode string!). Again, note the limitation imposed by the need to test for a BSTR value which is NULL.

    There is an additional problem as pointed out above: BSTRs can contain embedded NUL characters;

    CString constructors can only handle single NUL characters in a string. This means that CStrings will

    compute the wrong length for a string which contains embedded NUL bytes. You need to handle this

    yourself. If you look at the constructors in strcore.cpp, you will see that they all do an lstrlen or

    equivalent to compute the length.
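The length mismatch is easy to reproduce in portable code. In the sketch below a std::wstring with an explicit count stands in for a BSTR (an assumption for portability; a real BSTR would come from a call such as SysAllocStringLen), and std::wcslen plays the role of the lstrlen-style scan. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// A counted wide string with an embedded NUL, standing in for a BSTR.
std::wstring make_counted()
{
    return std::wstring(L"Gray\0Cat", 8);   // 8 characters, one of them NUL
}

// Length as a counted string sees it.
std::size_t counted_length(const std::wstring & s)
{
    return s.size();
}

// Length as an lstrlen-style scan sees it: stops at the first NUL.
std::size_t scanned_length(const std::wstring & s)
{
    return std::wcslen(s.c_str());
}

// Usage: the scan loses everything after the embedded NUL.
void demo_embedded_nul()
{
    std::wstring b = make_counted();
    assert(counted_length(b) == 8);
    assert(scanned_length(b) == 4);
}
```

Any constructor that measures its input by scanning for a terminator will therefore copy only "Gray" from such a value.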

    Note that the conversion from Unicode to ANSI uses the ::WideCharToMultiByte conversion with

    specific arguments that you may not like. If you want a different conversion than the default, you have

    to write your own.

    If you are compiling as UNICODE, then it is a simple assignment:

    CString convert(BSTR b)
    {
        if(b == NULL)
            return CString(_T(""));
        CString s(b); // in UNICODE mode
        return s;
    }

    If you are in ANSI mode, you need to convert the string in a more complex fashion. This will accomplish

    it. Note that this code uses the same argument values to ::WideCharToMultiByte that the implicit

    constructor for CString uses, so you would use this technique only if you wanted to change these

    parameters to do the conversion in some other fashion, for example, specifying a different default

    character, a different set of flags, etc.

    CString convert(BSTR b)
    {
        CString s;
        if(b == NULL)
            return s; // empty for NULL BSTR
    #ifdef UNICODE
        s = b;
    #else
        LPSTR p = s.GetBuffer(SysStringLen(b) + 1);
        ::WideCharToMultiByte(CP_ACP,            // ANSI Code Page
                              0,                 // no flags
                              b,                 // source widechar string
                              -1,                // assume NUL-terminated
                              p,                 // target buffer
                              SysStringLen(b)+1, // target buffer length
                              NULL,              // use system default char
                              NULL);             // don't care if default used
        s.ReleaseBuffer();
    #endif
        return s;
    }

    Note that I do not worry about what happens if the BSTR contains Unicode characters that do not map to

    the 8-bit character set, because I specify NULL as the last two parameters. This is the sort of thing you

    might want to change.

    VARIANT to CString

    Actually, I've never done this; I don't work in COM/OLE/ActiveX where this is an issue. But I saw a

    posting by Robert Quirk on the microsoft.public.vc.mfc newsgroup on how to do this, and it

    seemed silly not to include it in this essay, so here it is, with a bit more explanation and elaboration. Any

    errors relative to what he wrote are my fault.

    A VARIANT is a generic parameter/return type in COM programming. You can write methods that return
    a type VARIANT, and which type the function returns may (and often does) depend on the input
    parameters to your method (for example, in Automation, depending on which method you call,
    IDispatch::Invoke may return (via one of its parameters) a VARIANT which holds a BYTE, a WORD,
    a float, a double, a date, a BSTR, and about three dozen other types; see the specifications of the
    VARIANT structure in the MSDN). In the example below, it is assumed that the type is known to be a
    variant of type BSTR, which means that the value is found in the string referenced by bstrVal. This
    takes advantage of the fact that there is a constructor which, in an ANSI application, will convert a value
    referenced by an LPCWSTR to a CString (see BSTR-to-CString). In Unicode mode, this turns out to be
    the normal CString constructor.
    See the caveats about the default ::WideCharToMultiByte conversion and whether or not you find
    these acceptable (mostly, you will).

    VARIANT vaData;
    vaData = m_com.YourMethodHere();

    ASSERT(vaData.vt == VT_BSTR);

    CString strData(vaData.bstrVal);

    Note that you could also make a more generic conversion routine that looked at the vt field. In this

    case, you might consider something like:

    CString VariantToString(VARIANT * va)
    {
        CString s;
        switch(va->vt)
        { /* vt */
        case VT_BSTR:
            return CString(va->bstrVal);
        case VT_BSTR | VT_BYREF:
            return CString(*va->pbstrVal);
        case VT_I4:
            s.Format(_T("%d"), va->lVal);
            return s;
        case VT_I4 | VT_BYREF:
            s.Format(_T("%d"), *va->plVal);
            return s;
        case VT_R8:
            s.Format(_T("%f"), va->dblVal);
            return s;
        // ... remaining cases left as an Exercise For The Reader
        default:
            ASSERT(FALSE); // unknown VARIANT type (this ASSERT is optional)
            return CString("");
        } /* vt */
    }

    Loading STRINGTABLE values

    If you want to create a program that is easily ported to other languages, you must not include native-

    language strings in your source code. (For these examples, I'll use English, since that is my native

    language, aber ich kann ein bisschen Deutsch sprechen: "but I can speak a little German".) So it is

    very bad practice to write

    CString s = "There is an error";

    Instead, you should put all your language-specific strings (except, perhaps, debug strings, which are

    never in a product deliverable) in a STRINGTABLE resource. This means that it is fine to write

    s.Format(_T("%d - %s"), code, text);

    in your program; that literal string is not language-sensitive. However, you must be very careful to not

    use strings like

    // fmt is "Error in %s file %s"

    // readorwrite is "reading" or "writing"

    s.Format(fmt, readorwrite, filename);

    I speak of this from experience. In my first internationalized application I made this error, and in spite of

    the fact that I know German, and that German word order places the verb at the end of a sentence, I

    had done this. Our German distributor complained bitterly that he had to come up with truly weird error

    messages in German to get the format codes to do the right thing. It is much better (and what I do

    now) to have two strings, one for reading and one for writing, and load the appropriate one, making

    them string parameter-insensitive, that is, instead of loading the strings "reading" or "writing", load

    the whole format:

    // fmt is "Error in reading file %s"

    // "Error in writing file %s"

    s.Format(fmt, filename);

    Note that if you have more than one substitution, you should make sure that the word order of the

    substitutions does not matter; the subject-object, subject-verb, or verb-object order that is natural

    in English may not carry over to other languages.

    For now, I won't talk about FormatMessage, which actually is better than sprintf/Format, but is

    poorly integrated into the CString class. It solves this by naming the parameters by their position in

    the parameter list and allows you to rearrange them in the output string.
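
To make the idea of position-named parameters concrete, here is a minimal sketch in portable C++. FormatPositional is a hypothetical helper written for this illustration, not part of MFC or Win32. It only imitates the %1, %2 style of insert that FormatMessage uses and ignores the rest of FormatMessage's syntax.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (an assumption for illustration, not an MFC or
// Win32 call): replaces %1..%9 in fmt with the matching entry of args,
// so a translator can reorder the inserts freely.
std::string FormatPositional(const std::string& fmt,
                             const std::vector<std::string>& args)
{
    std::string out;
    for (std::size_t i = 0; i < fmt.size(); ++i) {
        const char c = fmt[i];
        if (c == '%' && i + 1 < fmt.size()) {
            const char next = fmt[i + 1];
            if (next >= '1' && next <= '9') {
                const std::size_t index = static_cast<std::size_t>(next - '1');
                if (index < args.size())
                    out += args[index];   // missing arguments expand to nothing
                ++i;                      // skip the digit
                continue;
            }
            if (next == '%') {            // "%%" produces a literal percent sign
                out += '%';
                ++i;
                continue;
            }
        }
        out += c;
    }
    return out;
}
```

With this scheme a translator can turn "Error %1 file %2" into a format that mentions %2 before %1, and the calling code does not change.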

    So how do we accomplish all this? By storing the string values in the resource known as the

    STRINGTABLE in the resource segment. To do this, you must first create the string, using the Visual

    Studio resource editor. A string is given a string ID, typically starting with IDS_. So if you have a message,

    you create the string and call it IDS_READING_FILE and another called IDS_WRITING_FILE. They

    appear in your .rc file as

    STRINGTABLE
    BEGIN
        IDS_READING_FILE "Reading file %s"
        IDS_WRITING_FILE "Writing file %s"
    END

    Note: these resources are always stored as Unicode strings, no matter what your program is compiled

    as. They are even Unicode strings on Win9x platforms, which otherwise have no real grasp of Unicode

    (but they do for resources!). Then you go to where you had stored the strings

    // previous code

    CString fmt;

    if(...)

    fmt = "Reading file %s";

    else

    fmt = "Writing file %s";

    ...

    // much later

    CString s;

    s.Format(fmt, filename);

    and instead do

    // revised code

    CString fmt;

    if(...)

    fmt.LoadString(IDS_READING_FILE);

    else

    fmt.LoadString(IDS_WRITING_FILE);

    ...

    // much later

    CString s;

    s.Format(fmt, filename);

    Now your code can be moved to any language. The LoadString method takes a string ID and

    retrieves the STRINGTABLE value it represents, and assigns that value to the CString.
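
The effect of looking strings up by ID can be sketched without MFC. The code below is only an analogy in portable C++: StringTable, LoadStringFrom and FormatFileMessage are hypothetical names invented for this sketch, the numeric ID values are arbitrary, and a std::map stands in for the resource data that a real program keeps in its .rc file.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical string IDs, standing in for the IDS_ constants in resource.h
enum : unsigned { IDS_READING_FILE = 101, IDS_WRITING_FILE = 102 };

// One table per language (assumption: a map replaces the .rc resource)
using StringTable = std::map<unsigned, std::string>;

// Stand-in for CString::LoadString: returns false if the ID is not present
bool LoadStringFrom(const StringTable& table, unsigned id, std::string& out)
{
    const auto it = table.find(id);
    if (it == table.end())
        return false;
    out = it->second;
    return true;
}

// Loads the whole format by ID, then substitutes the single %s insert
std::string FormatFileMessage(const StringTable& table, bool reading,
                              const std::string& filename)
{
    std::string fmt;
    if (!LoadStringFrom(table, reading ? IDS_READING_FILE : IDS_WRITING_FILE, fmt))
        return std::string();
    const std::size_t pos = fmt.find("%s");
    if (pos != std::string::npos)
        fmt.replace(pos, 2, filename);
    return fmt;
}
```

The calling code never mentions a word of English; swapping in a table for another language changes the output without touching the logic.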

    There is a clever feature of the CString constructor that simplifies the use of STRINGTABLE entries. It

    is not explicitly documented in the CString::CString specification, but is obscurely shown in the

    example usage of the constructor! (Why this couldn't be part of the formal documentation and has to be

    shown in an example escapes me!). The feature is that if you cast a STRINGTABLE ID to an LPCTSTR it

    will implicitly do a LoadString. Thus the following two examples of creating a string value produce the

    same effect, and the ASSERT will not trigger in debug mode compilations:

    CString s;

    s.LoadString(IDS_WHATEVER);

    CString t( (LPCTSTR)IDS_WHATEVER);

    ASSERT(s == t);

    Now, you may say, how can this possibly work? How can it tell a valid pointer from a STRINGTABLE ID?

    Simple: all string IDs are in the range 1..65535. This means that the high-order bits of the pointer will

    be 0. Sounds good, but what if I have valid data in a low address? Well, the answer is, you can't. The

    lower 64K of your address space will never, ever, exist. Any attempt to access a value in the address

    range 0x00000000 through 0x0000FFFF (0..65535) will always and forever give an access fault. These

    addresses are never, ever valid addresses. Thus a value in that range (other than 0) must necessarily

    represent a STRINGTABLE ID.

    I tend to use the MAKEINTRESOURCE macro to do the casting. I think it makes the code clearer

    regarding what is going on. It is a standard macro which doesn't have much applicability otherwise in

    MFC. You may have noted that many methods take either a UINT or an LPCTSTR as parameters, using

    C++ overloading. This gets us around the ugliness of pure C where the "overloaded" methods (which

    aren't really overloaded in C) required explicit casts. This is also useful in assigning resource names to

    various other structures.

    CString s;

    s.LoadString(IDS_WHATEVER);

    CString t( MAKEINTRESOURCE(IDS_WHATEVER));

    ASSERT(s == t);
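
The pointer-versus-ID test described above can be sketched in portable C++. The two functions below are hypothetical stand-ins that imitate what the Win32 macros IS_INTRESOURCE and MAKEINTRESOURCE do; the real macros operate on Windows pointer types, and the guarantee that no valid address lies below 64K comes from the platform, not from C++.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for MAKEINTRESOURCE: packs a 16-bit ID into
// a pointer-sized value whose high-order bits are all zero.
const char* MakeIntResource(std::uint16_t id)
{
    return reinterpret_cast<const char*>(static_cast<std::uintptr_t>(id));
}

// Hypothetical stand-in for the check a constructor could make:
// a value in 1..65535 cannot be a valid address, so it must be an ID.
// (Zero is excluded here, following the text's "other than 0".)
bool LooksLikeResourceId(const void* p)
{
    const std::uintptr_t value = reinterpret_cast<std::uintptr_t>(p);
    return value != 0 && (value >> 16) == 0;
}
```

A constructor that accepts either a string pointer or a packed ID can branch on this kind of test before deciding whether to copy text or load a resource.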

    Just to give you an idea: I practice what I preach here. You will rarely if ever find a literal string in my

    program, other than the occasional debug output messages, and, of course, any language-independent

    string.

    CStrings and temporary objects

    Here's a little problem that came up on the microsoft.public.vc.mfc newsgroup a while ago. I'll

    simplify it a bit. The basic problem was the programmer wanted to write a string to the Registry. So he

    wrote:

    I am trying to set a registry value using RegSetValueEx() and it is the value that I am having trouble

    with. If I declare a variable of char[] it works fine. However, I am trying to convert from a CString

    and I get garbage. "..." to be exact. I have tried GetBuffer, typecasting to char*,

    LPCSTR. The return of GetBuffer (from debug) is the correct string but when I assign it to a char*

    (or LPCSTR) it is garbage. Following is a piece of my code:

    char* szName = GetName().GetBuffer(20);

    RegSetValueEx(hKey, "Name", 0, REG_SZ,

    (CONST BYTE *) szName,

    strlen (szName + 1));

    The Name string is less than 20 chars long, so I don't think the GetBuffer parameter is to blame. It is

    very frustrating and any help is appreciated.

    Dear Frustrated,

    You have been done in by a fairly subtle error, caused by trying to be a bit too clever. What happened

    was that you fell victim to knowing too much. The correct code is shown below:

    CString Name = GetName();

    RegSetValueEx(hKey, _T("Name"), 0, REG_SZ,

    (CONST BYTE *) (LPCTSTR)Name,

    (Name.GetLength() + 1) * sizeof(TCHAR));

    Here's why my code works and yours didn't. When your function GetName returned a CString, it

    returned a "temporary object". See the C++ Reference manual 12.2.

    In some circumstances it may be necessary or convenient for the compiler to generate a temporary

    object. Such introduction of temporaries is implementation dependent. When a compiler introduces a

    temporary object of a class that has a constructor it must ensure that a constructor is called for the

    temporary object. Similarly, the destructor must be called for a temporary object of a class where a

    destructor is declared.

    The compiler must ensure that a temporary object is destroyed. The exact point of destruction is

    implementation dependent....This destruction must take place before exit from the scope in which the

    temporary is created.

    Most compilers implement the implicit destructor for a temporary at the next program sequencing point

    following its creation, that is, for all practical purposes, the next semicolon. Hence the CString existed

    when the GetBuffer call was made, but was destroyed following the semicolon. (As an aside, there was

    no reason to provide an argument to GetBuffer, and the code as written is incorrect since there is no

    ReleaseBuffer performed). So what GetBuffer returned was a pointer to storage for the text of the

    CString. When the destructor was called at the semicolon, the basic CString object was freed, along

    with the storage that had been allocated to it. The MFC debug storage allocator then rewrites this freed

    storage with 0xDD, which is the symbol "Ý". By the time you do the write to the Registry, the string

    contents have been destroyed.

    There is no particular reason to need to cast the result to a char * immediately. Storing it as a CString

    means that a copy of the result is made, so after the temporary CString is destroyed, the string still

    exists in the variable's CString. The casting at the time of the Registry call is sufficient to get the value

    of a string which already exists.
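
The same lifetime rule can be shown without MFC. The sketch below uses std::string as a stand-in for CString, and its GetName is a hypothetical function invented for the illustration; the point is only that a pointer taken from a temporary dangles, while a pointer taken from a named copy stays valid for as long as that copy is in scope.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical stand-in for the GetName() of the newsgroup question.
// It returns by value, so every call produces a temporary object.
std::string GetName()
{
    return "Frustrated";
}

// Returns the length of the text as seen through a raw pointer that
// was obtained the safe way.
std::size_t NameLengthViaPointer()
{
    // WRONG: const char* p = GetName().c_str();
    // The temporary is destroyed at the end of that statement, so p
    // would dangle before the next line runs.

    const std::string name = GetName(); // named object owns the text
    const char* p = name.c_str();       // valid while 'name' is in scope
    return std::strlen(p);
}
```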

    In addition, my code is Unicode-ready. The Registry call wants a byte count. Note also that the call

    lstrlen(Name+1) returns a value that is too small by 2 for an ANSI string, since it doesn't start until the

    second character of the string. What you meant to write was lstrlen(Name) + 1 (OK, I admit it, I've

    made the same error!). However, in Unicode, where all characters are two bytes long, we need to cope

    with this. The Microsoft documentation is surprisingly silent on this point: is the value given for REG_SZ

    values a byte count or a character count? I'm assuming that their specification of "byte count" means

    exactly that, and you have to compensate.
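
The arithmetic behind the byte count is easy to check in isolation. This is a minimal sketch in portable C++ with hypothetical function names; std::basic_string stands in for CString, and char versus wchar_t stands in for the ANSI and Unicode meanings of TCHAR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Byte count for a NUL-terminated string of any character width:
// (characters + terminator) * bytes per character.
template <typename Char>
std::size_t ByteCountWithTerminator(const std::basic_string<Char>& s)
{
    return (s.length() + 1) * sizeof(Char);
}

// The miscount from the question: the "+ 1" landed inside the call, so
// counting starts at the second character and the terminator is left
// out. The argument must not be an empty string.
std::size_t MisplacedPlusOne(const char* text)
{
    return std::strlen(text + 1);
}
```

For the four-character string "Name" the correct ANSI count is 5, while the misplaced version yields 3, which is the shortfall of 2 mentioned above.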

    CString Efficiency

    One problem of CString is that it hides certain inefficiencies from you. On the other hand, it also means

    that it can implement certain efficiencies. You may be tempted to say of the following code

    CString s = SomeCString1;

    s += SomeCString2;

    s += SomeCString3;

    s += ",";

    s += SomeCString4;

    that it is horribly inefficient compared to, say

    char s[1024];

    lstrcpy(s, SomeString1);

    lstrcat(s, SomeString2);

    lstrcat(s, SomeString3);

    lstrcat(s, ",");

    lstrcat(s, SomeString4);

    After all, you might think, first it allocates a buffer to hold SomeCString1, then copies SomeCString1 to

    it, then detects it is doing a concatenate, allocates a new buffer large enough to hold the current string

    plus SomeCString2, copies the contents to the buffer and concatenates the SomeCString2 to it, then

    discards the first buffer and replaces the pointer with a pointer to the new buffer, then repeats this for

    each of the strings, being horribly inefficient with all those copies.

    The truth is, it probably never copies the source strings (the left side of the +=) for most cases.

    In VC++ 6.0, in Release mode, all CString buffers are allocated in predefined quanta. These are defined

    as 64, 128, 256, and 512 bytes. This means that unless the strings are very long, the creation of the

    concatenated string is an optimized version of a strcat operation (since it knows the location of the end

    of the string it doesn't have to search for it, as strcat would; it just does a memcpy to the correct place)

    plus a recomputation of the length of the string. So it is about as efficient as the clumsier pure-C code,

    and one whole lot easier to write. And maintain. And understand.
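
The quantum scheme can be modeled in a few lines. This is a simplified sketch, not the code in strcore.cpp: the function names are hypothetical, the model ignores the bookkeeping header and character width that the real allocator must account for, and treating requests above 512 bytes as exact-size allocations is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Rounds a request up to the next quantum named in the text
// (64, 128, 256 or 512 bytes). Assumption: anything larger is
// allocated at exactly the requested size.
std::size_t QuantizedSize(std::size_t needed)
{
    const std::size_t quanta[] = {64, 128, 256, 512};
    for (std::size_t q : quanta) {
        if (needed <= q)
            return q;
    }
    return needed;
}

// True when an append fits in the existing buffer (including the
// terminating NUL), so += can copy to the end without reallocating.
bool FitsInPlace(std::size_t currentLen, std::size_t appendLen,
                 std::size_t capacity)
{
    return currentLen + appendLen + 1 <= capacity;
}
```

Under this model a short string starts in a 64-byte buffer, so a run of small += operations keeps landing in the same storage until the total length crosses the quantum.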

    Those of you who aren't sure this is what is really happening, look in the source code for CString,

    strcore.cpp, in the mfc\src subdirectory of your vc98 installation. Look for the method ConcatInPlace

    which is called from all the += operators.

    Aha! So CString isn't really "efficient!" For example, if I create

    CString cat("Mew!");

    then I don't get a nice, tidy little buffer 5 bytes long (4 data bytes plus the terminal NUL). Instead the

    system wastes all that space by giving me 64 bytes and wasting 59 of them.

    If this is how you think, be prepared to reeducate yourself. Somewhere in your career somebody taught

    you that you always had to use as little space as possible, and this was a Good Thing.

    This is incorrect. It ignores some seriously important aspects of reality.

    If you are used to programming embedded applications with 16K EPROMs, you have a particular mindset

    for doing such allocation. For that application domain, this is healthy. But for writing Windows

    applications on 500MHz, 256MB machines, it actually works against you, and creates programs that

    perform far worse than what you would think of as "less efficient" code.

    For example, size of strings is thought to be a first-order effect. It is Good to make this small, and Bad

    to make it large. Nonsense. The effect of precise allocation is that after a few hours of the program

    running, the heap is cluttered up with little tiny pieces of storage which are useless for anything, but

    they increase the storage footprint of your application, increase paging traffic, can actually slow down

    the storage allocator to unacceptable performance levels, and eventually allow your application to grow

    to consume all of available memory. Storage fragmentation, a second-order or third-order effect,

    actually dominates system performance. Eventually, it compromises reliability, which is completely

    unacceptable.

    Note that in Debug mode compilations, the allocation is always exact. This helps shake out bugs.

    Assume your application is going to run for months at a time. For example, I bring up VC++, Word,

    PowerPoint, FrontPage, Outlook Express, Fort Agent, Internet Explorer, and a few other applications,

    and essentially never close them. I've edited using PowerPoint for days on end (on the other hand, if

    you've had the misfortune to have to use something like Adobe FrameMaker, you begin to appreciate

    reliability; I've rarely been able to use this application without it crashing four to six times a day! And

    always because it has run out of space, usually by filling up my entire massive swap space!) Precise

    allocation is one of the misfeatures that will compromise reliability and lead to application crashes.

    By making CStrings be multiples of some quantum, the memory allocator will end up cluttered with

    chunks of memory which are almost always immediately reusable for another CString, so the

    fragmentation is minimized, allocator performance is enhanced, application footprint remains almost as

    small as possible, and you can run for weeks or months without problem.

    Aside: Many years ago, at CMU, we were writing an interactive system. Some studies of the storage

    allocator showed that it had a tendency to fragment memory badly. Jim Mitchell, now at Sun

    Microsystems, created a storage allocator that maintained running statistics about allocation size, such

    as the mean and standard deviation of all allocations. If a chunk of storage would be split into a size that

    was smaller than the prevailing mean allocation minus one standard deviation (σ), he didn't split it at all, thus

    avoiding cluttering up the allocator with pieces too small to be usable. He actually used floating point

    inside an allocator! His observation was that the long-term saving in instructions by not having to ignore

    unusable small storage chunks far and away exceeded the additional cost of doing a few floating point

    operations on an allocation operation. He was right.

    Never, ever think about "optimization" in terms of small-and-fast analyzed on a per-line-of-code basis.

    Optimization should mean small-and-fast analyzed at the complete application level (if you like New Age

    buzzwords, think of this as the holistic approach to program optimization, a whole lot better than the

    per-line basis we teach new programmers). At the complete application level, minimum-chunk string

    allocation is about the worst method you could possibly use.

    If you think optimization is something you do at the code-line level, think again. Optimization at this

    level rarely matters. Read my essay on Optimization: Your Worst Enemy for some thought-provoking

    ideas on this topic.

    Note that the += operator is special-cased; if you were to write:

    CString s = SomeCString1 + SomeCString2 + SomeCString3 + "," + SomeCString4;

    then each application of the + operator causes a new string to be created and a copy to be done

    (although it is an optimized version, since the length of the string is known and the inefficiencies of

    strcat do not come into play).

    Summary

    These are just some of the techniques for using CString. I use these every day in my programming.

    CString is not a terribly difficult class to deal with, but generally the MFC materials do not make all of

    this apparent, leaving you to figure it out on your own.

    Acknowledgements

    Special thanks to Lynn Wallace for pointing out a syntax error in one of the examples, Brian Ross for his

    comments on BSTR conversions, and Robert Quirk for his example of VARIANT-to-BSTR conversion.

    The views expressed in these essays are those of the author, and in no way represent, nor are they

    endorsed by, Microsoft.

    License

    This article, along with any associated source code and files, is licensed under The Code Project Open

    License (CPOL)

    About the Author

    Joseph M. Newcomer

    United States

    Last Updated 18 May 2000

    Article Copyright 2000 by Joseph M. Newcomer

    Everything else Copyright CodeProject, 1999-2011
