Monday, 6 August 2012

Special Characters In Programming

Program code and markup are written using alphanumeric and other characters. However, each language or technology normally has a set of characters with a specific meaning or usage. Special care is therefore needed when those characters are used for other purposes, for example in text strings that are output for users to read. The special characters vary between languages, as do the techniques for handling them.

Coding

Whether a language is for programming or markup, there are almost always some characters that cannot be used in the normal way. Characters that have a specific meaning separate from their everyday typographic use are generally considered special characters.

For example, in HTML the tags surrounding Web page content are delimited by the "less than" and "greater than" characters:

<p>

This means that if those characters are to appear as text within an HTML page, they need to be written as character references: either a numeric reference giving the character's code, or a named entity. For "less than" there are two options:

&#60; &lt;
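Rather than typing references by hand, most languages offer a helper that performs the substitution (PHP's is htmlspecialchars). As a minimal sketch, the snippet below uses Python's standard-library html.escape as a stand-in and builds a numeric reference from a character's code point with ord(). The function to_numeric_reference is a hypothetical name used only for this illustration.

```python
import html


def to_numeric_reference(ch: str) -> str:
    """Hypothetical helper: return the decimal numeric character reference for one character."""
    return f"&#{ord(ch)};"


# html.escape replaces &, <, > and, with quote=True (the default), " and ' as well.
escaped = html.escape('Use <p> for paragraphs & "quotes"')
print(escaped)  # Use &lt;p&gt; for paragraphs &amp; &quot;quotes&quot;

# Both forms of "less than" from the text above.
print(to_numeric_reference("<"), html.escape("<"))  # &#60; &lt;

# html.unescape turns either form back into the original character.
print(html.unescape("&#60;") == html.unescape("&lt;") == "<")  # True
```

Note that the ampersand is escaped too: once "&" introduces a reference, a literal ampersand becomes a special character itself.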

Problems

When code does not handle special characters properly, a number of problems may arise. In programming languages, the code may fail to compile, or, if it does compile, it may produce runtime errors.

For Web code such as HTML markup, pages may not render correctly, and their appearance may differ between Web browsers. Validation tools, which check whether HTML and other Web markup is well-formed, will also report errors when special characters are not included properly.
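To see how an unescaped character can change what a page displays, the sketch below feeds two versions of the same sentence to Python's standard-library html.parser.HTMLParser. It is used here only as a convenient stand-in for a browser's parser; real browsers follow the HTML specification's more elaborate error handling. The class TextAndTags and the helper parse are hypothetical names for this illustration.

```python
from html.parser import HTMLParser


class TextAndTags(HTMLParser):
    """Collects text content and start-tag names so two inputs can be compared."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &lt; arrives as "<"
        self.text: list[str] = []
        self.tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


def parse(markup: str) -> tuple[str, list[str]]:
    parser = TextAndTags()
    parser.feed(markup)
    parser.close()
    return "".join(parser.text), parser.tags


# Unescaped: "<y and y>" is read as a start tag named "y", so that text vanishes.
print(parse("Use x<y and y>z"))        # ('Use xz', ['y'])
# Escaped: the same characters survive as ordinary text.
print(parse("Use x&lt;y and y&gt;z"))  # ('Use x<y and y>z', [])
```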

Types

The types of special character depend on the language being used. In general, punctuation and other non-alphanumeric characters may need special handling, and some characters have a specific meaning within a particular language. For instance, many languages enclose text strings in quotation marks, as in PHP:

$text="Here is some text";

If the text string itself contains a quotation mark, this can cause a problem: the parser will interpret the second quotation mark it encounters as the end of the string, so the code that follows cannot be processed correctly.
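The same failure is easy to reproduce in other languages. As a minimal sketch, the snippet below translates the PHP line into Python (the variable names and the helper compiles are illustrative assumptions) and asks Python's built-in compile() whether the source is valid. The broken version is rejected before any of it runs.

```python
# The PHP line from the text, translated to Python, with the inner quotes left unescaped.
broken_source = 'text = "Here is "some" text"'


def compiles(source: str) -> bool:
    """Return True if the source parses, False if the parser reports a syntax error."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


print(compiles(broken_source))                 # False: the string ends after "Here is "
print(compiles('text = "Here is some text"'))  # True
```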

Coping

The main technique for coping with special characters is to "escape" them. An escape is a short piece of syntax that tells the compiler or interpreter that a character should not be treated as it normally would. For example, to include quotation marks in the PHP string:

$text="Here is \"some\" text";

The backslash is a common escape character in many languages. It is typically placed immediately before the special character, so that the compiler or interpreter reads that character literally rather than giving it its special meaning.
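Because the escape character is itself special, a literal backslash must be escaped too, and the order of replacements matters. The Python sketch below shows what such a helper might look like. The function escape_for_double_quotes is hypothetical and covers only backslashes and double quotes, not a complete implementation of any language's literal syntax; PHP's addslashes() does something similar for PHP strings, also covering single quotes and NUL bytes.

```python
def escape_for_double_quotes(s: str) -> str:
    """Hypothetical helper: escape a string so it can sit inside a double-quoted literal.

    The backslash is escaped first; otherwise the backslashes added before quotes
    would themselves be doubled. Other characters (e.g. newlines) are not handled.
    """
    return s.replace("\\", "\\\\").replace('"', '\\"')


original = 'Here is "some" text with a \\ backslash'
literal = '"' + escape_for_double_quotes(original) + '"'
print(literal)  # "Here is \"some\" text with a \\ backslash"

# Evaluating the generated source gives back the original string.
namespace: dict = {}
exec("text = " + literal, namespace)
print(namespace["text"] == original)  # True
```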
