A Programming Primer for Counting and Other Unconventional Tasks

Enumerables

Transforming and sorting collections

The Enumerable module provides a set of methods to traverse, search, sort and manipulate collections.

If you understand loops, arrays and hashes, there's nothing conceptually new here. But you'll learn to do more with fewer, prettier lines of code.

The each

Here's a for loop used to print out every element of an array:


ark = ['cat', 'dog', 'pig', 'goat']
for animal in ark
   puts animal
end   
      

As we learned in the collections chapter, the each method is Ruby's preferred way to iterate across a collection:


ark = ['cat', 'dog', 'pig', 'goat'] 
ark.each do |animal|
   puts animal
end
         

To review:

each
A method belonging to Array and other collection classes. It's also known as an iterator: it cycles through the collection, running a block of code that acts on or depends on each element.
do
A Ruby keyword that signifies the beginning of a block of code
|animal|
The variable names within the pipe characters are the arguments passed into the block. In the case of each, the single argument refers to the element in the current iteration.
end
Closes the block of code used by each.
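As a quick illustration of the "other collection classes" mentioned above, here is a minimal sketch of each running on an Array, a Range and a Hash. The helper name collect_with_each is made up for this example; it simply gathers whatever each passes into the block so the results are easy to inspect.

```ruby
# collect_with_each is a hypothetical helper, not part of Ruby:
# it records every value that each passes into the block.
def collect_with_each(collection)
  seen = []
  collection.each do |element|
    seen << element
  end
  seen
end

p collect_with_each(['cat', 'dog'])      # an Array yields its elements
p collect_with_each(1..3)                # a Range yields each number in turn
p collect_with_each({'dog' => 'Fido'})   # a Hash yields [key, value] pairs
```

The do/end block is the same in all three calls; only the collection that invokes each changes.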

This is what it looks like with curly braces notation, which is typically used for single-line code blocks:


['cat', 'dog', 'pig', 'goat'].each{ |animal| puts animal }
#=>   cat
#=>   dog
#=>   pig
#=>   goat

# as opposed to:
['cat', 'dog', 'pig', 'goat'].each do |animal|
   puts animal
end
      

each_with_index

So we've seen how each can result in tighter code compared to for. But sometimes it's useful to have a reference to each element and that element's numerical position (i.e. index) in its collection.

The Enumerable module has the convenient each_with_index, which passes an extra argument to the block to refer to the index:


#   print out every other element in the array
['cat', 'dog', 'pig', 'goat'].each_with_index do |animal, idx|
   puts animal if idx % 2 == 0
end
# prints:
#=>   cat
#=>   pig
      
Exercise: Practice each_with_index

Using a collection of alphabetical letters:

('A'..'Z')

Use each_with_index to only print out every third letter.

Solution

('A'..'Z').each_with_index do |letter, idx|
   puts letter if idx % 3 == 2
end
      

The output

C
F
I
L
O
R
U
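X

each_index

If you just need the index of each element, arrays also have each_index. Note that each_index is a method of Array rather than of Enumerable, so a Range such as ('A'..'Z') has to be converted with to_a first:

('A'..'Z').to_a.each_index{ |idx| puts idx }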
#=> 0
#=> 1
#=> 2
#=> 3
# ...
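Because each_index is defined on Array rather than in Enumerable, other collections need a different route to their indexes. One option, sketched below, is to lean on each_with_index and ignore the element. The helper name indexes_of is invented for illustration only.

```ruby
# indexes_of is a hypothetical helper for this example.
# It works for any Enumerable because each_with_index comes from Enumerable.
def indexes_of(collection)
  indexes = []
  collection.each_with_index do |_element, idx|
    indexes << idx   # keep the position, ignore the element itself
  end
  indexes
end

p indexes_of('A'..'E')         #=> [0, 1, 2, 3, 4]
p indexes_of(['cat', 'dog'])   #=> [0, 1]
```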

Transforming arrays with map

What if you want to transform each element in the array? It's perfectly possible with a for loop:


arr = [1,2,3,4,5]
for x in 0..arr.length-1
   arr[x] = -arr[x]
end
puts arr.join(", ")   #=>   -1, -2, -3, -4, -5         

But what if you don't want to modify arr? What if you want a transformed copy of arr without altering the original? This is where the Enumerable method map (also referred to as collect) comes in:


arr = [1,2,3,4,5]
brr = arr.map{|x| -x}

puts arr.join(", ")   #=>   1, 2, 3, 4, 5   
puts brr.join(", ") #=> -1, -2, -3, -4, -5
   

What's going on here? The map method returns a new array containing the transformed values, rather than modifying the collection that invoked it. Here that new array is assigned to brr, and we can call join on it like any other array. Meanwhile, the original arr retains its unaltered values.

Like each (in fact, map and all the other Enumerable iterating methods are built on each), map is invoked by a collection and accepts a block of code that acts upon each element in that collection. The main difference is that map returns the transformed collection:


ark2 = ['cat', 'dog', 'pig', 'goat'].map{ |animal| animal.capitalize}   
   

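No need to initialize ark2 on its own line. As with methods, the value of the last expression in the block is what ends up in the transformed collection. In the following example, the result of animal.capitalize is discarded and only animal.upcase is kept: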


ark2 = ['cat', 'dog', 'pig', 'goat'].map do |animal| 
   animal.capitalize
   animal.upcase   
end
         
puts ark2.join(", ")   #=>   CAT, DOG, PIG, GOAT
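To make the remark about map being based on each more concrete, here is a rough sketch of how a map-like method could be written with nothing but each. The name my_map is invented for this example, and Ruby's real map is implemented differently under the hood.

```ruby
# my_map is a hypothetical stand-in for map, written using only each.
def my_map(collection)
  results = []
  collection.each do |element|
    # yield runs the caller's block; the block's last expression is what we keep
    results << yield(element)
  end
  results
end

ark = ['cat', 'dog', 'pig', 'goat']
shouting = my_map(ark) { |animal| animal.upcase }

puts shouting.join(", ")   #=>   CAT, DOG, PIG, GOAT
puts ark.join(", ")        #=>   cat, dog, pig, goat
```

The results array is built fresh on every call, which is why the original ark is left untouched.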
      
Why each doesn't cut it

Why can't we transform the array using each, like so:


arr = [1,2,3,4,5]
arr.each{ |x| x = -x }
puts arr.join(", ")   #=>   1, 2, 3, 4, 5         
   

Why not? Because the assignment inside the block doesn't alter the element in the array. In each iteration, x is a reference to the current element in the collection. Reassigning x to another object, as we do above with x = -x, does not change what the array holds. It just points x at something else, and that reference is discarded at the end of the iteration.
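The difference between reassigning the block variable and changing the object it refers to can be sketched with strings, which (unlike integers) have methods that modify them in place. Both helper names below are invented for this illustration.

```ruby
# Both helpers are hypothetical names for this example.

# Reassigning the block variable: the array never sees the new object.
def reassign_each(words)
  words.each { |w| w = w.upcase }
  words
end

# Calling a mutating method: the object held by the array itself changes.
def mutate_each(words)
  words.each { |w| w.upcase! }
  words
end

p reassign_each(['cat', 'dog'])   #=> ["cat", "dog"]
p mutate_each(['cat', 'dog'])     #=> ["CAT", "DOG"]
```

Integers have no in-place methods like upcase!, so for an array of numbers the choice is between assigning by index and building a new array.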

You could, however, use each_index:


arr = [1,2,3,4,5]
arr.each_index{ |x| arr[x] = -arr[x] }
puts arr.join(", ")   #=>   -1, -2, -3, -4, -5         
   

By addressing the array by the index x, we're actually modifying arr and its elements upon assignment.

But again, there's an important distinction. The above code modifies arr. Using map does not.

map!

However, if you do want map to modify the array in place, you can use the map! method:


arr = [1,2,3,4,5]
arr.map!{|a| a - a}    
puts arr.join(',')
#=> 0,0,0,0,0
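A side-by-side sketch of the two methods may help. The helper names negated_copy and negate_in_place are invented for this example; they only wrap map and map! so the effect on the original array is easy to compare.

```ruby
# Hypothetical helpers wrapping map and map!.
def negated_copy(arr)
  arr.map { |x| -x }    # builds and returns a new array
end

def negate_in_place(arr)
  arr.map! { |x| -x }   # overwrites the elements of arr itself
end

original = [1, 2, 3]
copy = negated_copy(original)
puts original.join(',')   #=> 1,2,3
puts copy.join(',')       #=> -1,-2,-3

negate_in_place(original)
puts original.join(',')   #=> -1,-2,-3
```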
Exercise: Practice map

Using map (and one other method), create an array that lists the numbers 0 to -100 in descending order. Without typing all the numbers manually, of course.

Solution

101.times.map{|x| -x}
   

Or:


(0..100).map{|x| -x}
   
Combining map and each_with_index

If you chain the each_with_index method with another Enumerable method, such as map, you can have access to the current index inside your map block:


arr2 = ['a', 'b', 'c', 'd'].each_with_index.map do |letter, idx|
   "Letter #{letter.capitalize} is in position #{idx+1} of the alphabet"
end

puts arr2.join("\n")

#=>   Letter A is in position 1 of the alphabet
#=>   Letter B is in position 2 of the alphabet
#=>   Letter C is in position 3 of the alphabet
#=>   Letter D is in position 4 of the alphabet
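A closely related spelling is to chain with_index onto map instead. with_index accepts an optional starting offset, which removes the need for the idx+1 arithmetic. This is only a sketch of that alternative, assuming Ruby 1.9 or later; the method name describe_letters is invented here.

```ruby
# describe_letters is a hypothetical helper for this example.
def describe_letters(letters)
  # with_index(1) starts counting at 1 instead of 0
  letters.map.with_index(1) do |letter, position|
    "Letter #{letter.capitalize} is in position #{position} of the alphabet"
  end
end

puts describe_letters(['a', 'b', 'c', 'd']).join("\n")
#=>   Letter A is in position 1 of the alphabet
#=>   Letter B is in position 2 of the alphabet
#=>   Letter C is in position 3 of the alphabet
#=>   Letter D is in position 4 of the alphabet
```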
   
Exercise: Practice each_with_index and map

Using the previously used ark array (['cat', 'dog', 'pig', 'goat']), create a new array in which every second element is capitalized and reversed.

Solution

ark = ['cat', 'dog', 'pig', 'goat']
ark2 = ark.each_with_index.map do |a, i|
   if i % 2 == 1
      a.capitalize.reverse
   else
      a
   end
end   
puts ark2.join(', ')
#=> cat, goD, pig, taoG

The select and inject methods

Here are two more specialized Enumerable methods:

select

This useful method takes a block with one argument. The block should be some kind of true/false test: if it evaluates to true for an element in the collection, that element is kept as part of the returned collection.


puts [1,'a', 2, 'dog', 'cat', 5, 6].select{ |x| x.class==String}.join(", ")            
#=>   a, dog, cat
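The same pattern works for any test that comes out true or false. A small sketch, with the helper name long_names invented for the example:

```ruby
# long_names is a hypothetical helper: keep only names longer than min_length.
def long_names(names, min_length)
  names.select { |name| name.length > min_length }
end

ark = ['cat', 'dog', 'pig', 'goat']
puts long_names(ark, 3).join(", ")   #=>   goat
puts ark.join(", ")                  #=>   cat, dog, pig, goat
```

Like map, select returns a new collection and leaves the original alone.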
         
Exercise: Filter a list of tweets

At the end of the previous chapter, we used crack to parse a sample XML file of tweets.

Using the same general code, fetch and loop through the tweets again. However, use the select method to select only tweets that have more than 10 retweets.

Print the number of tweets that meet the criteria and print the content of each tweet.

Here's an excerpt from the tweets file:


   <status>
      <created_at>Tue Sep 06 17:01:06 +0000 2011</created_at>
      <id>111121760346312704</id>
      <text>If you purchase an energy-efficient product for your home, you may be eligible for a federal tax credit. Learn more: http://t.co/TCZrLTr</text>
      <source>
         <a href="http://app.measuredvoice.com/" rel="nofollow">Measured Voice</a>
      </source>
      <truncated>false</truncated>
      <favorited>false</favorited>
      <in_reply_to_status_id/>
      <in_reply_to_user_id/>
      <in_reply_to_screen_name/>
      <retweet_count>14</retweet_count>
      <retweeted>false</retweeted>
   </status>

Solution

require 'rubygems'
require 'rest-client'
require 'crack'

URL = "http://ruby.bastardsbook.com/files/tweet-fetcher/tweets-data/USAGov-tweets-page-2.xml"
response = RestClient.get(URL)   
xml = Crack::XML.parse(response.body)

statuses = xml["statuses"].select{|status| status["retweet_count"].to_i > 10}

puts "There are #{statuses.length} statuses that have more than 10 retweets"
statuses.each do |status_el|
   puts status_el["text"]
   puts status_el["created_at"]
   puts "--- \n"
end
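If the URL above is unreachable, the filtering step can still be tried offline. The sketch below hand-builds an array of hashes in the shape the solution appears to rely on (string keys, with retweet_count stored as a string). That shape is an assumption, the first entry reuses the id and retweet count from the excerpt, and the second entry is made up purely so that something gets filtered out. The helper name popular_statuses is also invented.

```ruby
# popular_statuses is a hypothetical helper wrapping the same select test.
def popular_statuses(statuses, min_retweets)
  statuses.select { |status| status["retweet_count"].to_i > min_retweets }
end

# Assumed shape: an array of hashes with string keys and string values.
sample_statuses = [
  # values copied from the XML excerpt
  { "id" => "111121760346312704", "retweet_count" => "14" },
  # invented entry, only here so that the filter has something to reject
  { "id" => "made-up-id", "retweet_count" => "2" }
]

popular = popular_statuses(sample_statuses, 10)
puts "Statuses with more than 10 retweets: #{popular.length}"
popular.each { |status| puts status["id"] }
```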
The inject method

The inject method takes a collection and reduces it to a single value, such as a sum of values:


val = [1,3,5,7].inject(0) do |total, num|
   total + num   # the block's return value becomes total in the next iteration
end   
puts val   #=> 16
      

inject is probably the least intuitive Enumerable method. So don't worry if you don't get it right away. It is best understood by looking at examples.

While inject's functionality can be replicated with a more verbose use of each, it's a handy option for tighter code.

In the above example:

0
The inject method takes an initial value as its argument. In the above example we are finding the sum of the numbers in the array, so we start at 0.
total
The block passed to inject has two arguments. The first holds the running result. Initially, total is equal to the starting value, 0. After each iteration, total is set to the return value of the block.
num
The second argument of the block refers to the current iteration's element. In the routine above, num is added to the value in total.

When the loop finishes, inject returns the value in total.
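The bookkeeping that inject does can be spelled out with each. The sketch below is a hand-rolled equivalent; my_inject is an invented name, and Ruby's real inject also supports forms not covered here, such as omitting the initial value.

```ruby
# my_inject is a hypothetical stand-in for inject, written using only each.
def my_inject(collection, initial)
  total = initial
  collection.each do |num|
    # whatever the caller's block returns becomes the new running value
    total = yield(total, num)
  end
  total
end

# Running value after each step: 0 -> 1 -> 4 -> 9 -> 16
puts my_inject([1,3,5,7], 0) { |total, num| total + num }   #=> 16
```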

Enumerables have built-in min and max methods to find the minimum and maximum values in a collection.
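For reference, this is what the built-in versions look like in use. The wrapper name smallest_and_largest is invented for the example so that both results come back together.

```ruby
# smallest_and_largest is a hypothetical helper around the built-in min and max.
def smallest_and_largest(numbers)
  [numbers.min, numbers.max]
end

p smallest_and_largest([65, 3, 100, 42, -7])   #=> [-7, 100]
```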

If you were to write your own minimum-finding method using only each, it might look like this:


min_num = nil
[65,3,100,42,-7].each do |num|
   min_num = num if min_num == nil || min_num > num
end
      

In each iteration of that loop, min_num is replaced if it is larger than num, or if this is the loop's first iteration (when min_num still has its initial value of nil).

If you want to be fancy, you can use the ternary operator:


min_num = nil
[65,3,100,42,-7].each do |num|
   min_num = (min_num.nil? || min_num > num) ? num : min_num
end
puts min_num    #=> -7
      

Using inject, we can make things even cleaner. When no argument for the initial value is passed into inject, it uses the first element of the collection as the initial value and starts iterating from the second element:


min_answer = [65,3,100,42,-7].inject do |min_num, num| 
   min_num > num ? num : min_num
end      
puts min_answer #=> -7
      
Create an array of the Fibonacci sequence with inject

The Fibonacci sequence consists of a sequence of integers in which each number is the sum of the previous two numbers in the sequence. By definition, the first two numbers are 0 and 1.

0,1,1,2,3,5,8,13...

The Fibonacci sequence is one of the most famous in mathematics as its properties have been observed in numerous fields, including the Golden Ratio used in the arts and the natural arrangement of a plant's leaves.

Here's one way of creating an array with the first 20 Fibonacci numbers using a loop:


arr = [0,1]
18.times do
    arr << arr[-2] + arr[-1]
end        
    

Remember that negative indexes count backwards from the end of the array. So arr[-2] – when arr = [0,1] – points to the value of 0. At the end of each iteration, arr increases in length by 1.
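A tiny sketch of the negative indexes at work; last_two is an invented helper name that returns the two values the loop adds together.

```ruby
# last_two is a hypothetical helper: the second-to-last and last elements.
def last_two(arr)
  [arr[-2], arr[-1]]
end

p last_two([0, 1])            #=> [0, 1]
p last_two([0, 1, 1, 2, 3])   #=> [2, 3]
```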

Now build the same array arr using inject (hint: you'll use one less line).

Solution

arr = 18.times.inject([0,1]) do |a, idx|
    a << a[-2] + a[-1]
end       

puts arr.join(', ')
#=> 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181

    

Note that the idx variable isn't needed since we know that no matter what iteration we're on, we always add the last two values of the array to get the new value.

Use inject to convert an array into a hash

Given a two-dimensional array (i.e. an array of two-element arrays), convert it into a hash in which the keys and values are the first and second elements, respectively, of each sub-array.

This is easier to show than describe. Here's how you would do it with each:


data_arr = [['dog', 'Fido'], ['cat', 'Whiskers'], ['fish', 'Fluffy']]

data_hash = {}
data_arr.each do |d|
    data_hash[d[0]] = d[1]
end
    

Now do the same using inject.

Solution

data_arr = [['dog', 'Fido'], ['cat', 'Whiskers'], ['fish', 'Fluffy']]

data_hash = data_arr.inject({}) do |hsh, v|
    hsh[v[0]] = v[1]
    hsh
end
    

The main difference here is that you don't need to initialize data_hash as an empty hash on its own line. One mistake that I make all the time is not making hsh the last line in the block. If you forget, the block returns the value of the assignment hsh[v[0]] = v[1], which in this case is just a String (i.e. 'Fido'), and that String becomes hsh in the next iteration. This will result in an error:


data_arr = [['dog', 'Fido'], ['cat', 'Whiskers'], ['fish', 'Fluffy']]

data_hash = data_arr.inject({}) do |hsh, v|
    hsh[v[0]] = v[1]
end

#=> IndexError: string not matched
#=>   from (irb):50:in `[]='
#=>   from (irb):50
#=>   from (irb):51:in `inject'

An alternative is to use the merge method of Hash, which returns a new hash on every iteration. That makes it slower computationally, although this is only noticeable if you're doing a loop in the magnitude of tens of thousands of iterations:


data_arr = [['dog', 'Fido'], ['cat', 'Whiskers'], ['fish', 'Fluffy']]
data_hash = data_arr.inject({}) do |hsh, v|
    hsh.merge({v[0]=>v[1]})
end
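Two other spellings are worth knowing, both shown here as a sketch rather than as this chapter's recommended approach. each_with_object (Ruby 1.9 and later) hands the same hash to every iteration, so there is no return value to forget; note that its block arguments come in the opposite order from inject's. Array#to_h (Ruby 2.1 and later) converts an array of pairs directly. The two wrapper method names are invented for this example.

```ruby
# Hypothetical helper: same job as the inject version, via each_with_object.
def pairs_to_hash_with_object(pairs)
  pairs.each_with_object({}) do |pair, hsh|   # element first, accumulator second
    hsh[pair[0]] = pair[1]
  end
end

# Hypothetical helper: let Array#to_h do the conversion.
def pairs_to_hash_direct(pairs)
  pairs.to_h
end

data_arr = [['dog', 'Fido'], ['cat', 'Whiskers'], ['fish', 'Fluffy']]
first  = pairs_to_hash_with_object(data_arr)
second = pairs_to_hash_direct(data_arr)

puts first['cat']      #=> Whiskers
puts first == second   #=> true
```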
    
Use inject to convert a hash into an array

Convert the result of the above exercise back into an array, but with the first and second elements' positions swapped from the original data_arr


data_hash = {"cat"=>"Whiskers", "fish"=>"Fluffy", "dog"=>"Fido"} 
        
Solution

data_arr = data_hash.inject([]) do |arr, v|
    arr << [v[1], v[0]]
end
        

Two notes here:

  1. When the key=>value pairs are traversed, each one is passed to the block as a two-element array, i.e.: [key,value]
  2. Because << modifies arr in place and returns arr itself, there's no need to devote another line to making sure that arr is the return value of the block.
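Since each pair arrives as a two-element array, the block can also unpack it into named variables by wrapping them in parentheses. Here is a sketch of the same swap written that way; swap_pairs is an invented name.

```ruby
# swap_pairs is a hypothetical helper for this example.
def swap_pairs(hash)
  hash.inject([]) do |arr, (key, value)|   # (key, value) unpacks the [key, value] pair
    arr << [value, key]                    # << returns arr, so arr is the block's return value
  end
end

data_hash = {"cat"=>"Whiskers", "fish"=>"Fluffy", "dog"=>"Fido"}
p swap_pairs(data_hash)
#=> [["Whiskers", "cat"], ["Fluffy", "fish"], ["Fido", "dog"]]
```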

Understanding the Enumerable methods requires nothing more than your previous understanding of loops and collections; they simply give you more ways to write clean, compact code. It's important to be able to read and understand them in code; I frequently use them in examples to cut down on my own typing.

The best way to get used to seeing them is to use them whenever you need to create or transform a collection. So instead of:


arr = [2.5, 6.4, 4.2, 12.9]
arr.each_index do |idx|
    arr[idx] = arr[idx].round
end   
puts arr.join(',')
#=> 3,6,4,13

    

Compact that clunky code:


puts [2.5, 6.4, 4.2, 12.9].map{ |v| v.round }.join(',')
#=> 3,6,4,13