Chris Zetter

Blogocube

Keyword Arguments in Ruby 2.0

One of the new features of Ruby 2.0 is keyword arguments.

Keyword arguments make it easier to create methods that take optional named arguments. You can now do this:

def exclaim(text, exclamation: '!', number: 7)
  text + exclamation * number
end

exclaim('hello', number: 4) #=> 'hello!!!!'

# equivalent in Ruby 2.x (Ruby 3.0 and later no longer treat a braced hash as keywords):
exclaim('hello', {:number => 4}) #=> 'hello!!!!'

Keyword arguments in the method definition must be symbols given in the new-style hash syntax, but in the method call you may use the old-style hash syntax.

Compare this to the current common pattern of using a hash to represent options:

def old_exclaim(text, options = {})
  exclamation = options[:exclamation] || '!'
  number = options[:number] || 7

  text + exclamation * number
end

Take care when refactoring old hash-style arguments to the new keyword arguments. In most cases only the method definition needs to change, but the new syntax will raise an ArgumentError when you call the method with a keyword that isn’t defined in the method signature. This catches spelling mistakes that would otherwise let the call succeed with an incorrect set of arguments.

exclaim('hello', start_exclamation: '¡')  #=> raises: unknown keyword: start_exclamation (ArgumentError)
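
A misspelt keyword is a quick way to see the difference between the two styles. The sketch below repeats exclaim and old_exclaim from this post and passes numbr, a deliberately made-up typo for number. The exact wording of the error message varies between Ruby versions, so only its prefix is noted:

```ruby
def exclaim(text, exclamation: '!', number: 7)
  text + exclamation * number
end

def old_exclaim(text, options = {})
  exclamation = options[:exclamation] || '!'
  number = options[:number] || 7

  text + exclamation * number
end

# Keyword arguments: the misspelt option is rejected.
begin
  exclaim('hello', numbr: 4)
rescue ArgumentError => error
  keyword_error = error.message # starts with "unknown keyword"
end

# Options hash: the same typo is silently ignored and the default of 7 is used.
old_result = old_exclaim('hello', numbr: 4) #=> 'hello!!!!!!!'
```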

You can stop an ArgumentError being raised and deal with arbitrary keyword arguments by using a double-splat (double asterisk), which captures all keywords not already assigned and puts them in a hash:

def keyword_test(matched: '1', **additional_arguments)
  return additional_arguments
end

keyword_test(matched: 1, unmatched: 2) #=> {unmatched: 2}
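
One use for the captured hash is passing options through to another method. This is only a sketch: shout is a hypothetical wrapper invented for illustration, and it forwards whatever it doesn’t recognise to exclaim using the double-splat at the call site:

```ruby
def exclaim(text, exclamation: '!', number: 7)
  text + exclamation * number
end

# shout is a hypothetical wrapper: it handles :upcase itself and
# forwards every other keyword to exclaim untouched.
def shout(text, upcase: true, **exclaim_options)
  text = text.upcase if upcase
  exclaim(text, **exclaim_options)
end

loud  = shout('hello', number: 2)                #=> 'HELLO!!'
quiet = shout('hello', upcase: false, number: 1) #=> 'hello!'
```

A misspelt keyword is still caught, just one level down: it ends up in exclaim_options and exclaim raises the ArgumentError.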

As well as needing fewer lines of boilerplate code, another win of using keyword arguments is that it avoids the ‘false trap’ common when dealing with option hashes: when you default an argument you must do so on its presence, not on its truthiness, otherwise you will never be able to set an argument to false:

options = {to_uppercase:  false}

# incorrect (always overriding nil and false):
options[:to_uppercase] || true # => true

# correct (allowing nil and false):
options.fetch(:to_uppercase, true) # => false
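
For comparison, here is a minimal sketch of the same option written as a keyword argument. format_text is a made-up method name; the point is that the default only applies when the keyword is absent, so an explicit false survives:

```ruby
# format_text is a hypothetical example method.
def format_text(text, to_uppercase: true)
  to_uppercase ? text.upcase : text
end

default_result  = format_text('hello')                      #=> 'HELLO'
explicit_result = format_text('hello', to_uppercase: false) #=> 'hello'
```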

So the result is a useful pattern that replaces a more verbose and error-prone one. Just remember kids: don’t refactor for the sake of it. Using keyword arguments will mean your code can’t be used with Ruby 1.9.x anymore and could cause API breaks if users are calling methods with unexpected options.

For more reading about Ruby 2.0 I recommend reading:

HTML CV Template

Last time I was looking for a job, amending my CV in Apple Pages was a pain. I realised that when working on content & presentation my brain thinks best in HTML & CSS, not in the WYSIWYG terms of a word processor.

Not wanting to re-learn LaTeX, I wrote my CV like a webpage. Compared to a .pages or .doc this is great. As well as viewing it as a web page, you can easily print it to PDF from a web browser.

A generic version of my CV is now on GitHub for anyone to use.

Preview the template here.

The ‘bad scan arg format’ Bug

Last week I upgraded an Engine Yard hosted app to run on Ruby 1.9.3. As with most Engine Yard configuration, it was as easy as pressing a button.

All was going well for the app except a cryptic “fatal: bad scan arg format: 1” exception in the logs coming from the Net::HTTP library.

Searching the Ruby source I found that the method rb_scan_args in class.c was causing the error. rb_scan_args is a helper method that can parse optional arguments.

It turns out that Engine Yard doesn’t restart Nginx on deploy, instance rebuild or even Ruby upgrade. This meant that Nginx & Passenger were still running Ruby 1.9.2 with gems installed for Ruby 1.9.3.

You can restart Nginx on Engine Yard by:

$ sudo /etc/init.d/nginx restart

Problem solved.
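
A cheap way to spot this kind of mismatch is to have the app report the interpreter it is actually running under. This is a rough sketch rather than anything Engine Yard specific: expected_series is a hypothetical value you would set yourself, and in a real app you would send the line to your logger instead of puts:

```ruby
# expected_series is an assumption for illustration: the Ruby version
# you believe the app server should be using.
expected_series = '1.9.3'

running = "#{RUBY_VERSION}-p#{RUBY_PATCHLEVEL}"
status  = RUBY_VERSION.start_with?(expected_series) ? 'ok' : 'MISMATCH'
report  = "ruby #{running} (expected #{expected_series}): #{status}"

puts report
```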

Ruby Parsing Bug

A parsing bug in ruby 1.9.2 allowed you to pass blocks to methods preceded by a comma. The bug only affected blocks with do/end and was fixed in 1.9.3.

$ rvm use ruby 1.8.7

[1].inject :+, do |x| 1 end
# => SyntaxError: compile error
# => (irb):1: syntax error, unexpected kDO_BLOCK

$ rvm use ruby 1.9.2 # tested on ruby-1.9.2-p290

[1].inject :+, do |x| 1 end
# => 1

[1].inject :+, {|x| 1}
# => SyntaxError: (irb):2: syntax error, unexpected '|', expecting '}'

$ rvm use ruby 1.9.3

[1].inject :+, do |x| 1 end
# => SyntaxError: (irb):1: syntax error, unexpected keyword_do_block

therubygame Deconstruct

This is a deconstruction of matematikaadit’s submission to therubygame challenge 5: ‘Roman numerals. What are they good IV?’. The goal of the challenge is to take a string representing a roman numeral as input and return the integer that the numeral represents.

matematikaadit currently has the honour of the shortest (by character count) submission for this challenge. At first glance I didn’t understand how it worked so I re-wrote and analyzed it until I did.

Making it readable

matematikaadit’s original submission:

def to_arabic_numeral(roman)
  n=s=0;roman.bytes{|c|s+=n-2*n%n=10**(205558%c%7)%9995};s+n
end

That’s pretty unreadable to me. Let’s apply some formatting:

def to_arabic_numeral(roman)
  n = s = 0
  roman.bytes { |c|
    s += n - 2 * n % n = 10 ** (205558 % c % 7) % 9995
  }
  s + n
end

and some bracketing:

def to_arabic_numeral(roman)
  n = s = 0
  roman.bytes { |c|
    s += n - ((2 * n) % (n = ((10 ** (205558 % c % 7)) % 9995)))
  }
  s + n
end

and introduce some variables:

def to_arabic_numeral(roman)
  n = s = 0
  roman.bytes { |c|
    last_n = n
    n = (10 ** (205558 % c % 7)) % 9995
    i = last_n - ((2 * last_n) % n)
    s += i
  }
  s + n
end

and some better named variables:

def to_arabic_numeral(roman)
  value = sum = 0
  roman.bytes { |char_code|
    last_value = value
    value = (10 ** (205558 % char_code % 7)) % 9995
    increment = last_value - ((2 * last_value) % value)
    sum += increment
  }
  sum + value
end

and lastly add some logging:

def to_arabic_numeral(roman)
  value = sum = 0
  roman.bytes { |char_code|
    last_value = value
    value = (10 ** (205558 % char_code % 7)) % 9995
    increment = last_value - ((2 * last_value) % value)
    sum += increment

    puts [
           "char:#{char_code.chr}",
           "char_code:#{char_code}",
           "value:#{value.to_s.ljust(4)}",
           "last_value:#{last_value.to_s.ljust(4)}",
           "increment:#{increment.to_s.ljust(4)}",
           "sum:#{sum}"
         ].join('  ')
  }
  sum + value
end

to_arabic_numeral('MCMXCIX') #=> 1999

When run this prints:

char:M  char_code:77  value:1000  last_value:0     increment:0     sum:0
char:C  char_code:67  value:100   last_value:1000  increment:1000  sum:1000
char:M  char_code:77  value:1000  last_value:100   increment:-100  sum:900
char:X  char_code:88  value:10    last_value:1000  increment:1000  sum:1900
char:C  char_code:67  value:100   last_value:10    increment:-10   sum:1890
char:I  char_code:73  value:1     last_value:100   increment:100   sum:1990
char:X  char_code:88  value:10    last_value:1     increment:-1    sum:1989

We see that the sum is always one iteration behind and the last value is added to sum after the loop is finished.

There are two separate complicated lines here: the conversion of char_code into value and the calculation of increment.

Converting from numeral to integer

First let’s look at the calculation of value, which converts the ASCII character code of a numeral to the integer the numeral represents:

value = (10 ** (205558 % char_code % 7)) % 9995

Let’s wrap this in a function that takes a character:

def roman_numeral_to_integer(char)
  char_code = char.ord
  (10 ** ((205558 % char_code) % 7)) % 9995
end

remind ourselves of the expected mappings:

I = 1, V = 5, X = 10, L = 50, C = 100, D = 500, M = 1000

and see if it works:

roman_numeral_to_integer('I') # => 1
roman_numeral_to_integer('C') # => 100

roman_numeral_to_integer('i') # => 1000
roman_numeral_to_integer('Z') # => 5

It works for all of the roman numerals. For other characters the result is meaningless: the function still returns a number, as ‘i’ and ‘Z’ show. Let’s plug some more values into it:

numerals = "IVXLCDM"
('A'..'Z').each { |c|
  value = roman_numeral_to_integer(c)
  puts "char:#{c}  value:#{value.to_s.ljust(4)}  #{'*NUMERAL*' if numerals.include?(c)}"
}

Which prints:

char:A  value:1     
char:B  value:500   
char:C  value:100   *NUMERAL*
char:D  value:500   *NUMERAL*
char:E  value:1     
char:F  value:1000  
char:G  value:500   
char:H  value:1     
char:I  value:1     *NUMERAL*
char:J  value:5     
char:K  value:100   
char:L  value:50    *NUMERAL*
char:M  value:1000  *NUMERAL*
char:N  value:1     
char:O  value:1     
char:P  value:1000  
char:Q  value:50    
char:R  value:1000  
char:S  value:10    
char:T  value:1000  
char:U  value:1     
char:V  value:5     *NUMERAL*
char:W  value:10    
char:X  value:10    *NUMERAL*
char:Y  value:10    
char:Z  value:5

There’s no simple pattern to exploit in the relationship between ASCII character codes and the distribution or magnitude of roman numerals.

[Figure: numeral value vs. ASCII code]

Instead roman_numeral_to_integer is a cleverly constructed function that fits all the points on the above graph, mapping all the roman numerals to integers. A lookup hash is certainly easier to construct, read and maintain, but this wins on cleverness and character count.
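
To make the comparison concrete, here is a small sketch that checks the formula against a plain lookup hash. NUMERAL_VALUES is a name introduced here for illustration; the hash contents are the expected mappings listed earlier:

```ruby
# The straightforward alternative: a lookup hash (name chosen for this example).
NUMERAL_VALUES = {
  'I' => 1, 'V' => 5, 'X' => 10, 'L' => 50,
  'C' => 100, 'D' => 500, 'M' => 1000
}.freeze

def roman_numeral_to_integer(char)
  char_code = char.ord
  (10 ** ((205558 % char_code) % 7)) % 9995
end

# Collect any numerals where the formula disagrees with the hash.
mismatches = NUMERAL_VALUES.reject do |char, expected|
  roman_numeral_to_integer(char) == expected
end

mismatches #=> {}
```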

For more about a similar ‘Magic Formula’ for the same purpose, and how you could construct it using brute force methods see Golf: Magic Formula for Roman Numerals.

If you want to play around with this function try Wolfram Alpha: plot (10 ^ ((205558 mod floor(x)) mod 7)) mod 9995, x=65 to 90
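
One way to read the formula is as two steps. First, 205558 % char_code % 7 gives each of the seven numerals its own exponent between 0 and 6. Then taking the power of ten modulo 9995 leaves 1, 10, 100 and 1000 alone and folds 10,000, 100,000 and 1,000,000 down to 5, 50 and 500. A short sketch that prints nothing and just collects the intermediate values:

```ruby
# For each numeral: [character, exponent, resulting value]
breakdown = 'IVXLCDM'.each_char.map do |char|
  exponent = 205558 % char.ord % 7
  [char, exponent, (10 ** exponent) % 9995]
end

# breakdown is:
# [["I", 0, 1], ["V", 4, 5], ["X", 1, 10], ["L", 5, 50],
#  ["C", 2, 100], ["D", 6, 500], ["M", 3, 1000]]
```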

Calculating the increment

Now let’s look at increment. Increment is the value that should be added to sum. Because the sum calculation is one iteration behind, it is last_value and not value that should be added.

increment = last_value - ((2 * last_value) % value)

When a smaller numeral comes before a larger one it should be subtracted from the total rather than added, so the increment will be -last_value when last_value < value, and last_value otherwise.

The code works because, for the natural numbers x and y:

x % y ≡ x, when x < y 
x % y ≡ 0, when x >= y and y is a factor of x

Note that for any numeral value x all smaller numeral values are factors. For example, L (50) is greater than I (1), V (5), X (10) and all of these are factors.

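So when last_value >= value it is just added to sum (the right hand side of the subtraction will be 0) and when last_value < value it is turned negative (the right hand side of the subtraction will be last_value * 2). The second case relies on 2 * last_value still being smaller than value, which holds for every valid subtractive pair (IV, IX, XL, XC, CD and CM).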
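
The claim can be checked by brute force. The sketch below compares the modulus expression with an explicit conditional over every pair that can occur in a well-formed numeral: the six subtractive pairs (IV, IX, XL, XC, CD, CM) and every pair where last_value >= value. The method and variable names are made up for this example:

```ruby
NUMERAL_INTEGERS = [1, 5, 10, 50, 100, 500, 1000].freeze

def increment_with_modulus(last_value, value)
  last_value - ((2 * last_value) % value)
end

def increment_with_conditional(last_value, value)
  last_value < value ? -last_value : last_value
end

# [last_value, value] pairs: IV, IX, XL, XC, CD, CM
subtractive_pairs = [[1, 5], [1, 10], [10, 50], [10, 100], [100, 500], [100, 1000]]

# Every pair where the earlier numeral is at least as large as the later one.
additive_pairs = NUMERAL_INTEGERS.product(NUMERAL_INTEGERS).select do |last_value, value|
  last_value >= value
end

disagreements = (subtractive_pairs + additive_pairs).reject do |last_value, value|
  increment_with_modulus(last_value, value) == increment_with_conditional(last_value, value)
end

disagreements #=> []

# A malformed pair such as VX (5 before 10) is where the two differ,
# because 2 * 5 is no longer smaller than 10:
malformed_modulus     = increment_with_modulus(5, 10)     #=> 5
malformed_conditional = increment_with_conditional(5, 10) #=> -5
```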

Going back to the original program it looks like we only save 1 character doing it this way over a ternary if:

l-2*l%n
# vs.
n>l?-l:l

But remember that in the original program last_value and value are both stored in the single variable n. The program relies on the order of evaluation to use the correct value of n whereas a ternary if will evaluate the conditional first so would require a new variable to be used.
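
The evaluation order is easy to see in isolation. In this sketch n starts as 100 (playing the part of last_value) and is reassigned to 1000 in the middle of the expression, using the bracketed form from earlier in the post. The second half shows one naive single-variable ternary, written here purely for illustration, and why it goes wrong:

```ruby
# Modulus version: both reads of n on the left happen before the assignment.
n = 100
increment = n - ((2 * n) % (n = 1000))
increment #=> -100  (100 - (200 % 1000))
n         #=> 1000

# Naive ternary with a single variable: the condition performs the assignment,
# so by the time n is compared and returned the old value has been lost.
m = 100
ternary_result = (m = 1000) > m ? -m : m
ternary_result #=> 1000, not the -100 we wanted
```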

Summing up

The method makes clever use of a mathematical function for mapping numerals and of the modulus operation to avoid an if statement. Most of this is done in a single expression that trusts the evaluation order semantics of Ruby.
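
For contrast, here is a sketch of what a more conventional version might look like, checked against the bracketed version of the golfed method. readable_to_arabic and ROMAN_VALUES are names invented for this example, and each_byte is used in place of bytes with a block, which newer Rubies treat as deprecated:

```ruby
ROMAN_VALUES = {
  'I' => 1, 'V' => 5, 'X' => 10, 'L' => 50,
  'C' => 100, 'D' => 500, 'M' => 1000
}.freeze

# A readable alternative: look each numeral up, and subtract it
# when it is followed by a larger one.
def readable_to_arabic(roman)
  values = roman.each_char.map { |char| ROMAN_VALUES.fetch(char) }
  values.each_with_index.reduce(0) do |sum, (value, index)|
    next_value = values[index + 1]
    next_value && value < next_value ? sum - value : sum + value
  end
end

# The bracketed form of the golfed submission, for comparison.
def golfed_to_arabic(roman)
  n = s = 0
  roman.each_byte { |c|
    s += n - ((2 * n) % (n = ((10 ** (205558 % c % 7)) % 9995)))
  }
  s + n
end

samples = %w[IV XLII CDXLIV MCMXCIX MMXIII]
readable_results = samples.map { |roman| readable_to_arabic(roman) } #=> [4, 42, 444, 1999, 2013]
golfed_results   = samples.map { |roman| golfed_to_arabic(roman) }   #=> [4, 42, 444, 1999, 2013]
```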