A Programming Primer for Counting and Other Unconventional Tasks

Strings

Letters, words, paragraphs, books. And quotes.

In the numbers chapter, we learned how Ruby deals with numbers as Fixnum and Float objects.

For text, whether it be a single letter or a book, Ruby has the String class. This is an entirely different data object that won't mix with the data objects for numbers.

Text, words, characters

Strings are a sequence of characters denoted by single or double quotes. Here are some examples of strings:

  • "a"
  • "puts"
  • "John's book"
  • "12+100"
  • 'To be or not to be, that is the question...'

A string can contain just a single character, a few words, or the entire works of Shakespeare.

In fact, they can contain numbers themselves:

"42"

And these quoted numbers may look like numbers, but they are not. Try adding these together:

"42" + 42
#=> TypeError: can't convert Fixnum into String
    

Those quotation marks are vital. They tell the Ruby interpreter: "Please treat these characters as a string and not a Ruby command."


puts "Please treat these characters as a string and not a Ruby command"
#=> Please treat these characters as a string and not a Ruby command

puts Please treat these characters as a string and not a Ruby command
#=> NameError: undefined local variable or method `string' for #<Object:0x1001dd2a0>    

Either double or single quotes can be used to denote a string. Without any quotes, Ruby treats the same characters as code:


puts 100 + 10   #=>   110
puts "100 + 10"   #=>   100 + 10
puts "puts"      #=>   puts
puts puts      #   [a blank line] This is a double call of the puts method, but no string was passed in to print out
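
As a quick check that either kind of quote produces the same thing, here is a minimal sketch that writes the same text both ways and compares the results. The variable names are made up for illustration.

```ruby
# The same text, once in single quotes and once in double quotes.
single_quoted = 'puts'
double_quoted = "puts"

# Both are String objects holding identical characters.
puts single_quoted.class              # prints String
puts double_quoted.class              # prints String
puts single_quoted == double_quoted   # prints true
```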
      

Close your quotes

One of the most common show-stopping and confusing mistakes you'll make as a novice is not wrapping your strings in matching quotes. It's an easy typo to make early on.

If you start a string with a single quote ', you must end it with a single quote. If you start with ", end with ".

Here's the mistake in action; copy-and-paste this into irb to try it out:


puts "The following code is Dan's special math formula:
puts 42* 6+ 80 - 100^100*3.141
   

The programmer intends for the program to first print out a descriptive sentence and then output the result of a mathematical expression. But because he didn't close the string in the first line with a ", Ruby interprets the second line as part of the initial string.

Try it out for yourself in irb. Until you put in that closing double-quote, the Ruby interpreter won't do anything no matter how many other characters you type in.

Students will sometimes miss the closing quote mark when copying code for practice. They then will complain their program doesn't do anything when they hit the Enter key. Without that closing quote, Ruby will interpret that Enter (carriage return/line break) character as just another part of the string, instead of an actual command.

This is another reason why we use a specialized text-editor such as TextWrangler or SciTE. For easy readability, these text editors color the content of the strings differently than the code.

String operations and methods

You can't subtract one string from another or multiply two strings together, but you can add them. This is sometimes referred to as concatenation:


puts "abc" + "def"   #=> abcdef
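
To see why addition works but subtraction doesn't, the sketch below asks a string whether it has a + method and a - method. It uses respond_to?, a standard Ruby method that this chapter hasn't introduced; treat it as a peek ahead rather than something you need to know yet.

```ruby
# Concatenation builds a new string from the two operands.
joined = "abc" + "def"
puts joined                    # prints abcdef

# String defines a + method but no - method.
puts "abc".respond_to?(:+)     # prints true
puts "abc".respond_to?(:-)     # prints false
```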
      

As I've mentioned in the numbers chapter, all the Ruby classes of data – such as Fixnum and Float – have a set of methods that can be called to do something, usually upon the object that invokes it.

As with numbers, we invoke a particular string's method with the dot operator .


puts (100.99).round 
#=> 101

puts "the quick brown fox jumps over the lazy dog".capitalize
#=> The quick brown fox jumps over the lazy dog

Here are a few of the methods; the length method, which returns the number of characters in the string, will be one of the more frequently used String methods:


"abc".upcase   #=> "ABC"
"DEF".downcase   #=> "def"
"abcdef".reverse   #=> "fedcba"
"ABCdef".capitalize   #=> "Abcdef"
"dog park".length   #=> 8            
      

You can see a list of all String methods in the Ruby documentation.
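
Because each of these methods hands back a new String, the calls can be chained with more dots. The following is a small sketch; the variable names are only for illustration.

```ruby
# A hypothetical starting string.
park = "dog park"

# upcase returns a new String, and reverse is then called on that result.
shouted_backwards = park.upcase.reverse
puts shouted_backwards   # prints KRAP GOD

# The original string is left unchanged by these methods.
puts park                # prints dog park
puts park.length         # prints 8
```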

Mixing datatypes

So a word wrapped in quotes is a String. What is a number wrapped in quotes? It is also a String.

What happens when you add together two numbers that have been wrapped in quotes?


puts 2 + 2   #   4
puts "2" + "2"   #   22
         

If this is confusing, your problem is that you are still thinking of "2" as a number. Ruby does not think that. It sees the quote marks and treats the "2" as it would treat any other quote-wrapped character:


puts "2" + "abc"   #   2abc
         
Adding a string to a number

Try adding a number to a String, like this:

"1" + 1

You should get an error. This is because the Ruby interpreter can't make sense of what you intend to do. Neither could a human being, if you asked her: "What's the sum of 42 and the letter D?"

We saw in the numbers chapter that Ruby can mix integers and decimals, even though they are technically different datatypes. But for some (actually, most) cases, Ruby won't allow you to mix objects of different classes.
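
If you'd like to observe the error without it halting your script, here is a rough sketch using begin and rescue, which are Ruby's error-handling keywords and have not been covered yet in this book. The variable names are assumptions for illustration.

```ruby
# Try the mixed addition and catch the error instead of crashing.
result = nil
error_name = nil
begin
  result = "1" + 1
rescue TypeError => e
  error_name = e.class.name
end

puts error_name        # prints TypeError
puts result.inspect    # prints nil, because the addition never finished

# The reverse order fails as well: a number can't absorb a String either.
reverse_error = nil
begin
  1 + "1"
rescue TypeError => e
  reverse_error = e.class.name
end
puts reverse_error     # prints TypeError
```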

Conversion methods

There are times when you'll take in input from an outside source – such as scraping from a webpage or taking in user input – and then need to combine it with existing data. You need to convert it to a common datatype in order to avoid errors.

Ruby gives us several methods to convert data:


"42".to_i + 42   #=> 84
"42" + 42.to_s   #=> "4242"
         

We used to_i in the previous chapter to turn a Float into an integer. You can guess what to_s stands for. You'll have to decide what's the appropriate conversion, or rather, if you should be converting anything at all.

Sometimes it's better to raise an error and stop the program. Unexpected types of input can indicate that there is a design flaw that should be fixed rather than let the program go on happily.
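
One way to get that stop-on-bad-input behavior is Ruby's built-in Integer() conversion, which is stricter than to_i. This is only a sketch of the difference, not a recommendation for every case; the strict variable and the :rejected marker are made up for illustration.

```ruby
# to_i is forgiving: text that isn't a number quietly becomes 0.
puts "42".to_i       # prints 42
puts "forty".to_i    # prints 0

# Integer() is strict: it raises an ArgumentError on input it can't read.
begin
  strict = Integer("forty")
rescue ArgumentError
  strict = :rejected
end
puts strict          # prints rejected

# Well-formed input converts normally.
puts Integer("42") + 42   # prints 84
```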

Classes and the class method

How can we tell how Ruby classifies 42? Or "Mexico"?

Every Ruby object has a class method that returns the Class of that object.


42.class   #   Fixnum
42.0.class   #   Float 
"42".class   #   String
String.class   #   Class         
Fixnum.class   #   Class
            

This is dipping our toes into object-oriented programming (which I cover in the Supplementals section). The last two lines in the above code make an important point: String, which is the class of "42", is itself an object with a class. That object's class is Class. Likewise, the number 42's class is Fixnum, and Fixnum's class is Class. This is getting more meta than we need to, but didn't you want to see if every Ruby object really has its own class?

And once again, don't mix datatypes:

            
"world" + "hello".class   #=> [i.e. "world" + String, two different classes]
#=> TypeError: can't convert Class into String               
            

The main takeaway is that, yes, a class appears to be just a name describing something's datatype. But a Class object, as we'll see later, is the name of the data structure that defines how a datatype behaves. This is why "42" – a String – will have different properties and methods (such as upcase) than 42, which belongs to the Fixnum class.
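
Here is a small sketch of that last point, asking each object which methods it supports. It relies on is_a? and respond_to?, two standard Ruby methods not otherwise covered in this chapter, and the variable names are hypothetical.

```ruby
text = "42"
number = 42

# Each object knows which class it belongs to.
puts text.is_a?(String)            # prints true
puts number.is_a?(Integer)         # prints true (Fixnum is a kind of Integer)

# Only the String knows how to upcase itself.
puts text.respond_to?(:upcase)     # prints true
puts number.respond_to?(:upcase)   # prints false
```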

Combining strings with interpolation

Combining strings with non-strings using the plus sign + can be hard to read, and prone to error:

puts "The result of 7 + 7 is " + (7 + 7).to_s   
#=>   The result of 7 + 7 is 14

It's very easy to miss a plus sign or closing quote when combining strings. Luckily, Ruby has a special notation that allows us to evaluate Ruby code and output the result into a String, from within that String:


puts "The result of 7 + 7 is #{7+7}"   
#=>   The result of 7 + 7 is 14

puts "#{10 * 10} is greater than #{9 * 11}"   
#=>   100 is greater than 99
   

Notice how the expressions inside the #{ } are evaluated before being included in the string.
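
Interpolation also takes care of the to_s conversion for you. The sketch below builds the same sentence both ways and compares them; the variable names and values are invented for the example.

```ruby
count = 14     # an integer
price = 2.5    # a Float

# Concatenation needs an explicit to_s on every non-string.
with_plus = "Count: " + count.to_s + ", price: " + price.to_s

# Interpolation converts each result to a string on its own.
with_interpolation = "Count: #{count}, price: #{price}"

puts with_interpolation                # prints Count: 14, price: 2.5
puts with_plus == with_interpolation   # prints true
```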

Two requirements here:

  • The string must be enclosed in double-quotes
  • Use a pound sign # followed by curly braces {} to enclose the Ruby code.

Exercise: String interpolation

Write the following strings using interpolation:

  • "1 + 1 is: " + (1+1).to_s
  • "There were 12 cases of a dozen eggs each (" + (12 * 12).to_s + ")"
  • "His name is " + "jon".capitalize

Solution
  • "1 + 1 is: #{1+1}"
  • "There were 12 cases of a dozen eggs each (#{12 * 12})"
  • "His name is #{"jon".capitalize}"

Earlier in this chapter, I said anything within quotes is treated by Ruby as just text characters. But the #{} notation essentially creates a Ruby code-interpreting environment within the curly braces.

So yes, this means you can put strings inside a string:


puts "No interpolation here.upcase"   
#=> No interpolation here.upcase

puts "This is interpolation #{"here".upcase}"   
#=> This is interpolation HERE

puts "This is not useful interpolation #{"here.upcase"}"   
#=> This is not useful interpolation here.upcase

puts "This is an error #{here.upcase}"   
#=> NameError: undefined local variable or method `here'
   

The last two puts statements didn't work quite as expected. Why?

In the third puts statement, the "here.upcase" inside the curly brackets is just one String. A String, when evaluated, is just a String.

In the fourth puts statement, Ruby sees here as the name of a Ruby object, but since here does not actually exist as such, we get an error.
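
The error goes away once the name actually refers to something. As a sketch, suppose we first create a variable called here (variables are covered properly in a later chapter, so take this on faith for now):

```ruby
# Give the name here something to refer to (a hypothetical variable).
here = "here"

# Now the code inside the curly braces can find here and call upcase on it.
message = "This is interpolation #{here.upcase}"
puts message   # prints This is interpolation HERE
```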

Exercise: String interpolation II

Convert the following set of string concatenations and conversions to just a single string using interpolation:


puts "The answer to 12387 * 345 is: " + (12387 * 345).to_s + " Which is a " + "very".upcase + " big number! Even bigger than 42 * 98 * 12, which is: " + (42 * 98 * 12).to_s   

Solution

puts "The answer to 12387 * 345 is: #{12387 * 345} Which is a #{"very".upcase} big number! Even bigger than 42 * 98 * 12, which is: #{42 * 98 * 12}"
    
#=> The answer to 12387 * 345 is: 4273515 Which is a VERY big number! Even bigger than 42 * 98 * 12, which is: 49392     

"Escape" with backslash

If quote marks are special Ruby notation for starting and ending a String, what happens when we want regular quotation marks as part of the text?

In the context of a String, the backslash character \ tells Ruby that the character immediately following it is special. The backslash is frequently referred to as the escape character because it...escapes (I guess?) a character from being interpreted as normal.


"He asked Jeeves, \"Where is the local patisserie?\" To which Jeeves responded, \"Over hither.\""

So in the above example, the \ tells Ruby that the following " is not to be treated as quotation marks typically are: as a symbol that denotes the beginning or end of a string. This means, then, that \" is treated just as any other character in a string. Thus, the quoted String continues on.
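
To make this concrete, the sketch below prints an escaped string and then checks that the backslash itself is not stored as part of the text. The variable names are made up for illustration.

```ruby
quoted = "Jeeves responded, \"Over hither.\""
puts quoted                # prints Jeeves responded, "Over hither."

# The backslash is not kept: \" counts as a single character.
puts "\"".length           # prints 1

# The same text with single quotes on the outside needs no escaping.
same_text = 'Jeeves responded, "Over hither."'
puts quoted == same_text   # prints true
```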

Literally!

Rather than think of the \" as representing a "normal" quotation mark, the more technical term is: a literal quotation mark. It literally is just a quotation mark that does nothing but catch your eye. A non-literal quotation mark is the kind that has the special function of denoting a string.

I will try to stick to this terminology throughout the book, even though the overuse of "literal" and "literally" in everyday conversation and rhetoric needs to stop, literally.

Also, numbers are sometimes referred to as literals. Because 9 is, well, literally the number that represents the value of 9. Consequently, "9" is not a literal 9.

The newline character

Inside a string, the character n is just a literal n. However, prepending a backslash like so – \n – tells Ruby to insert a newline character there:


puts "Doe, a deer, a female deer.\nRay, a piece of the\nsun."

#=> Doe, a deer, a female deer.
#=> Ray, a piece of the
#=> sun.
   

Note that in the sequence "the\nsun", only the n has its meaning affected by the backslash. The characters preceding the backslash (the) and the characters following the n (sun) remain literal.

You'll use the backslash character so often that you'll quickly memorize this list of common uses:

  • \n    a newline
  • \t    a tab
  • \"    a literal quotation mark
  • \'    a literal apostrophe
  • \\    a literal backslash

Stick with double quotation marks

As with string interpolation, Ruby will only interpret most of the above uses of backslashes inside strings with double quotes. Single-quoted strings will only make use of \\ and \'.

So as you begin to learn Ruby, I recommend always using double-quoted strings, just so you avoid bugs in which you wrongly assume an escape sequence to work in a single-quoted string. There's an extremely slight performance uptick in using single-quoted strings because Ruby doesn't have to check for any interpolation or all the other backslash combinations. But it'll be a while before you're writing a program where that fraction of a fraction of a second is noticeable.
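
Counting characters is a simple way to see the difference between the two kinds of quotes. In this sketch (variable names are hypothetical), the same characters are typed inside each kind of quote:

```ruby
# In double quotes, \n becomes one newline character.
double_newline = "\n"
# In single quotes, it stays a backslash followed by the letter n.
single_newline = '\n'

puts double_newline.length   # prints 1
puts single_newline.length   # prints 2

# Interpolation follows the same rule.
double_sum = "#{2 + 2}"
single_sum = '#{2 + 2}'

puts double_sum              # prints 4
puts single_sum              # prints #{2 + 2}
```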

Exercise: Practice your strings

Let's review the many characteristics of the String class by predicting the output of the following code. If you can't do the Ruby interpretation in your head, just type these out in irb (no copy and pasting, or you'll gloss over the finer points).

  1. puts "He's a good doctor, and thorough."
  2. puts '"I\'ve been at sea."'
  3. puts 'Maude said to him: "He's a good doctor, and thorough"'
  4. puts 'Out of order'.upcase
  5. puts "We're going to #{'sea world'.upcase}"
  6. puts "There were #{12*2/4} sheep and #{"three" + " sheepdogs"} out over at #{"Cherry".upcase} Creek."
  7. puts '#{2*2}score and #{"7"} years ago'

Solution

The expected output:

  1. He's a good doctor, and thorough.
  2. "I've been at sea."
  3. There is an unterminated String here. Because the original String is single-quoted, it terminates at the apostrophe in He's. A new, unterminated String begins at the " after thorough.
  4. OUT OF ORDER
  5. We're going to SEA WORLD
  6. There were 6 sheep and three sheepdogs out over at CHERRY Creek.
  7. #{2*2}score and #{"7"} years ago [string interpolation isn't done in single-quoted strings]

String substitution

One of the most common String operations is replacing certain characters with others. The method for that is called sub, short for substitution.

Unlike the few methods we've learned so far, sub takes in two arguments.

"Arguments" is a kind-of-confusing-term for the values/operating parameters that a method needs to do its work. We'll cover this more in detail in the methods chapter. Just play along for now.


puts "The cat and the hat".sub("hat", "rat")   #=> The cat and the rat
puts "Another brick in the wall".sub("brick in the ", "")   #=>   Another wall
      

The first argument is the character(s) to replace. The second argument is what to replace it with.

The sub method replaces only the first occurrence of the string to match. If you want to replace all occurrences of a given string, use gsub, which is short for global substitution:


puts "I own an iPad, iPhone and an iPod".gsub('i', 'my')
#=>   I own an myPad, myPhone and an myPod
      

Note that character case matters: the capital I at the start of the sentence was left alone.
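
Here is a side-by-side sketch of sub and gsub on the same text, which also shows that neither method changes the original string. The phrase and the variable names are invented for the example.

```ruby
phrase = "the cat and the hat"

first_only = phrase.sub("the", "a")    # replaces the first match only
every_one  = phrase.gsub("the", "a")   # replaces every match

puts first_only   # prints a cat and the hat
puts every_one    # prints a cat and a hat
puts phrase       # prints the cat and the hat (the original is untouched)

# Case matters: a capitalized "The" is not matched by "the".
puts "The cat".sub("the", "a")   # prints The cat
```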

Methods and regular expressions

Remember these replacement methods for later, because we'll use them more after we've formally covered methods.

After you learn about regular expressions, you'll find yourself using them rather than strings with gsub. I don't consider regular expressions a programming fundamental because you don't have to be a programmer to use them. But they're so important to anyone who works with text (e.g. you, the reader) that I've written a separate chapter for them (skipping to it now isn't such a bad idea).
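
As a small taste of what that looks like, gsub also accepts a pattern written between forward slashes instead of a quoted string. This is only a sketch reusing the earlier iPad sentence; the trailing i in /i/i is a regular expression option that tells Ruby to ignore case.

```ruby
devices = "I own an iPad, iPhone and an iPod"

# A plain pattern behaves like the string version: lowercase i only.
puts devices.gsub(/i/, "my")    # prints I own an myPad, myPhone and an myPod

# With the ignore-case option, the capital I is replaced as well.
puts devices.gsub(/i/i, "my")   # prints my own an myPad, myPhone and an myPod
```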

Moving on

If you're a data-enthusiast, strings will be one of your most frequently-used types of data. Get used to using interpolation as it'll make your code more readable and less error-prone.