Why the use of quotation marks in programming languages is a bad idea

Quotation marks are widely used in many programming languages for a multitude of reasons.

In most programming languages, quotation marks delimit strings and characters. In SQL, they also delimit irregular identifiers, such as table and column names that collide with SQL keywords. Quotation marks are so widespread that it’s hard to imagine a programming language that doesn’t use them.

But sadly, a feature (flaw?) in the design of the keyboard means that we use the same key for the opening and the closing quotation mark. And this is what makes quotation marks evil for me. Imagine, for example, if the founders of HTML had decided to use quotation marks to identify tags:

Here’s what an HTML document would look like.

"html"
"head"
"title" My title "/title"
"/head"
"body"
"p"nospacestodifferentiatetagsfromcontent"/p"
"p" This is my body "/p"
This makes parsing even more difficult.
"/body"
"/html"

Now imagine if someone wanted to write a regular expression to get all the tags:

For actual HTML tags, the work is easy. All you have to do is match a tag stored in a string against the regular expression

^<([a-z][a-z0-9]*)/?>$

Let’s break it down into parts to better understand the logic behind the regex

^< //start of the string, then the start of the tag
(
[a-z]  //tags start with a letter, so closing tags like </p> are not matched
[a-z0-9]* //tag may have letters and numbers later
)
/? //optional slash for singleton tags like <br/>
> //end tag
$ //end of the string; drop ^ and $ to search inside a longer document

A more practical regular expression would obviously be more complex (it would have to handle attributes, for a start), but bear with me for a moment.
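To make this concrete, here is a minimal Python sketch. It assumes a cleaned-up, unanchored form of the pattern, `<([a-z][a-z0-9]*)/?>`, so that it can scan a whole document instead of a single tag. The sample document is made up for illustration, and attributes are deliberately ignored.

```python
import re

# Unanchored form of the tag pattern so it can scan a whole document.
# Requiring a letter right after "<" already rules out closing tags
# such as </p>, because "/" is not in [a-z].
OPENING_TAG = re.compile(r"<([a-z][a-z0-9]*)/?>")

# Hypothetical sample document, not taken from any real page.
document = "<html><body><h1>Title</h1><p>Some text<br/>more text</p></body></html>"

# findall returns the captured group (the tag name) for every match.
tag_names = OPENING_TAG.findall(document)
print(tag_names)  # ['html', 'body', 'h1', 'p', 'br']
```

Because `<` and `>` are different characters, the pattern never has to guess whether it is looking at the start or the end of a tag.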

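Imagine what you would have to do if HTML used quotation marks. Because the opening and closing marks are the same character, a pattern cannot tell which one it is looking at without counting every quotation mark from the start of the document. The regular expression would become much more complex, so complex that I don’t know how to write it with my current knowledge of regular expressions.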
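A small Python sketch of the ambiguity, under the assumption that we naively pair up quotation marks from wherever the scan happens to start. The two patterns and the one-line documents are illustrative only.

```python
import re

# Naive patterns: whatever sits between a pair of delimiters is a "tag".
QUOTED = re.compile(r'"([^"]*)"')
ANGLED = re.compile(r"<([^<>]*)>")

quoted_doc = '"p" This is my body "/p"'
angled_doc = "<p> This is my body </p>"

# Scanning from the very beginning, both notations give the tags.
quoted_from_start = QUOTED.findall(quoted_doc)   # ['p', '/p']
angled_from_start = ANGLED.findall(angled_doc)   # ['p', '/p']

# Now lose the first character, as a truncated input or one stray
# quotation mark would. The quote version flips open and close and
# reports the body text as a tag; the angle version still finds a tag.
quoted_shifted = QUOTED.findall(quoted_doc[1:])  # [' This is my body ']
angled_shifted = ANGLED.findall(angled_doc[1:])  # ['/p']

print(quoted_from_start, quoted_shifted)
print(angled_from_start, angled_shifted)
```

With identical delimiters, whether a quotation mark opens or closes depends on how many came before it, so one mistake shifts every pair after it.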

The problem grows even worse if we don’t have valid XHTML. XML would not have been a quarter of the standard it has become if it used quotation marks instead of the less than and greater than signs.

If we had separate keys for the opening and closing quotation marks on the keyboard, the use of quotation marks in programming would have been a much better idea.

It’s a difficult reality to grasp because we are so used to the software making sure that quotation marks are matched for us. To fully understand this, try to imagine your keyboard with a single key for the opening and closing braces.

Word processors, IDEs and almost every piece of software that involves typing would make sure that every opening brace has a corresponding closing brace, but at a cost: they would suck at nesting braces.

The designers of the keyboard had to provide different opening and closing braces because they are commonly nested in algebra. If we needed to type only in English, we wouldn’t need separate opening and closing braces (unless you have swag (#yolo, right?)).

The problem with quotation marks already shows up in the programming languages that we use. In most of them, we can’t put a string inside another string without escaping the inner quotation marks or switching to a different kind of quote.

Many programmers spend hours and hours trying to figure out what combination works. Do single quotes within double quotes work? Do double quotes within single quotes work? Do quotes within quotes within quotes work?
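Here is how that plays out in Python, as one example of a language with two kinds of quotes. This is only a sketch; `json.dumps` stands in for "embed this text as a string literal inside more text", and the sentences are made up.

```python
import json

# One level of nesting works without escapes, in either direction.
single_in_double = "She said 'hello'"
double_in_single = 'She said "hello"'

# A third level has run out of quote kinds, so escaping begins.
three_levels = "She said 'he said \"hi\"'"

# Each time the text is wrapped in another double-quoted literal,
# every quote gains a backslash and every backslash is doubled.
wrapped_once = json.dumps(three_levels)
wrapped_twice = json.dumps(wrapped_once)

print(three_levels)
print(wrapped_once)
print(wrapped_twice)

# Count the backslashes needed at each level: 0, then 2, then 8.
backslash_counts = [s.count("\\") for s in (three_levels, wrapped_once, wrapped_twice)]
print(backslash_counts)
```

The text itself never changes; only the pile of escapes around it grows.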

It is very difficult to get rid of a time-honored tradition. But when tradition challenges rationality, we need to change. We need a different combination of delimiters for strings, and for all the purposes where quotation marks are used; a combination which has a separate opening and closing key on the keyboard.

It is very important to tackle this question of what combination of delimiters should be chosen with care because we have limited choices and most programming languages have already called dibs on them. We can use:

  1. Keys which are found in pairs on the keyboard: {}, [], (), /\, <>
  2. One key as the starting delimiter and another one as the ending delimiter: ^ and $ (which mark the start and end of a match in regular expressions), ^ and ~, etc.
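As a rough sketch of the first option, here is how a language might read a string delimited by square brackets. The function name `read_bracket_string` and the choice of `[` and `]` are assumptions made for illustration; no existing language is being described.

```python
from typing import Tuple


def read_bracket_string(source: str, start: int = 0) -> Tuple[str, int]:
    """Read a [ ... ] delimited string that begins at source[start].

    Returns the contents and the index just past the closing bracket.
    Nested brackets are kept verbatim as part of the contents.
    """
    if start >= len(source) or source[start] != "[":
        raise ValueError("expected '[' at position %d" % start)
    depth = 0
    for i in range(start, len(source)):
        ch = source[i]
        if ch == "[":
            depth += 1          # an opener is never mistaken for a closer
        elif ch == "]":
            depth -= 1
            if depth == 0:      # this bracket closes the outermost string
                return source[start + 1:i], i + 1
    raise ValueError("unterminated string")


text, end = read_bracket_string("[say [hello [world]]] + rest")
print(text)  # say [hello [world]]
print(end)   # 21
```

Strings nest to any depth with a single counter. The sketch does not handle a lone unbalanced bracket inside a string; that case would still need some form of escaping.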

In any case, the world would be a better place if the creator of a programming language read this article and used something other than quotation marks to delimit strings.

Change won’t come suddenly. I like how Microsoft has handled the problem with T-SQL. Square brackets are the convention for delimiting identifiers, but quotation marks are allowed too. Maybe someday in the near future, square brackets will become the SQL standard. I will be very happy when that day comes.
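The two identifier styles can be sketched side by side in Python. The helper names are hypothetical. The escaping rules shown are the usual ones: T-SQL doubles a closing bracket inside a bracketed identifier, and standard SQL doubles a quotation mark inside a quoted identifier. Whether SQL Server accepts the quoted form depends on its QUOTED_IDENTIFIER setting.

```python
def bracket_identifier(name: str) -> str:
    """Delimit an identifier T-SQL style; only the closing bracket needs doubling."""
    return "[" + name.replace("]", "]]") + "]"


def quoted_identifier(name: str) -> str:
    """Delimit an identifier standard-SQL style; embedded quotes are doubled."""
    return '"' + name.replace('"', '""') + '"'


# "order" and "select" are SQL keywords, so they must be delimited
# before they can be used as column or table names.
bracket_query = "SELECT " + bracket_identifier("order") + " FROM " + bracket_identifier("select")
quoted_query = "SELECT " + quoted_identifier("order") + " FROM " + quoted_identifier("select")

print(bracket_query)  # SELECT [order] FROM [select]
print(quoted_query)   # SELECT "order" FROM "select"
```

In the bracket style, an opening bracket inside a name needs no special treatment, which is exactly the advantage of having two different delimiters.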

And if I ever write a programming language of my own, I promise to do the same: use something different, but allow quotation marks for the people who are not used to change. Backward compatibility for a better world.
