Diving into Hash#merge, Hash#merge! and Hash#update in Ruby


Recently I had the opportunity to use and explore Hash#merge and Hash#merge! (aka Hash#update).

These methods are surprisingly powerful, and if you work with hashes that need to be merged into other hashes, their flexibility is worth knowing about.

Possibilities

The official documentation has more thorough explanations, but here is the gist.

Both these methods, as their names imply, let you merge one or more hashes into an initial one.

Hash#merge! mutates the original hash in place, whereas Hash#merge returns a copy and leaves the original untouched.

Otherwise, both methods behave similarly: each can take a list of hashes and an optional block.

  • If no hash is given: Hash#merge returns a copy of the receiver and Hash#merge! returns self, in both cases unchanged.
  • If hashes are given, they are merged into the original (or the copy) from left to right, with the newest key-value pair replacing the oldest one, unless:
  • a block is given: when a key is present in more than one hash (including self), the block is called and its return value is used.
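The rules above can be seen in a short example (merging more than one hash in a single call needs Ruby 2.6+; the config-style keys are made up for illustration):

```ruby
defaults = { timeout: 5, retries: 1 }
config   = { retries: 3 }
override = { timeout: 10 }

# Without a block: the right-most value for a duplicated key wins.
merged = defaults.merge(config, override)
# => { timeout: 10, retries: 3 }

# With a block: the block's return value is used for duplicated keys.
summed = defaults.merge(config, override) { |key, old_val, new_val| old_val + new_val }
# => { timeout: 15, retries: 4 }
```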

Caveats (discovered painfully)

  • Merge does not take the hashes' default values into account
  • The time complexity of these methods (as discussed below) can grow quickly if you are not careful
(Screenshot from the Ruby docs: Hash#merge! with a block)
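The first caveat is easy to trip over: a default value belongs to the hash object itself, not to its key-value pairs, so merging never carries it over from the argument hashes. A small sketch:

```ruby
with_default = Hash.new(0) # default value is 0
with_default[:x] = 1
plain = { y: 2 }

from_plain   = plain.merge(with_default)  # copy of plain: no default
from_default = with_default.merge(plain)  # copy of with_default: keeps 0

from_plain[:missing]   # => nil — the argument's default was not merged
from_default[:missing] # => 0   — the receiver's default survives the copy
```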

Dissecting the methods

Hash#merge

The Hash class, like most of the Ruby core, is implemented in C. Here is the current Hash#merge implementation:

static VALUE
rb_hash_merge(int argc, VALUE *argv, VALUE self)
{
    return rb_hash_update(argc, argv, rb_hash_dup(self));
}

As you can see, this will create a copy of the hash the method is called on:

rb_hash_dup(self)

and then call rb_hash_update, keeping the original arguments but changing self to be the copy.

If we were to pseudo-Ruby it (yes, I made that term up, but hey, I am the one writing, so I get to do that):

def merge(*other_hashes, &block)
  copy = self.dup
  copy.merge!(*other_hashes, &block)
end
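The copy semantics are easy to observe from plain Ruby:

```ruby
original = { a: 1 }

# Hash#merge works on a duplicate and leaves the receiver untouched.
copy = original.merge(b: 2)
copy     # => { a: 1, b: 2 }
original # => { a: 1 }

# Hash#merge! mutates the receiver in place.
original.merge!(b: 2)
original # => { a: 1, b: 2 }
```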

As you can see, Hash#merge calls Hash#merge! so let us take a deep dive into:

Hash#merge!

The current implementation of Hash#merge! in C is:

static VALUE
rb_hash_update(int argc, VALUE *argv, VALUE self)
{
    int i;
    bool block_given = rb_block_given_p();

    rb_hash_modify(self);
    for (i = 0; i < argc; i++){
        VALUE hash = to_hash(argv[i]);
        if (block_given) {
            rb_hash_foreach(hash, rb_hash_update_block_i, self);
        }
        else {
            rb_hash_foreach(hash, rb_hash_update_i, self);
        }
    }
    return self;
}

This code looks complicated, so let’s dive in and dissect what is happening:

int i;                                 /* declares an integer for the later loop */
bool block_given = rb_block_given_p(); /* has a block been passed or not? */

These two variables are pretty straightforward. From here on, the fun stuff starts:

rb_hash_modify(self) checks whether the hash has been frozen, raising an error if it has.
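That check is observable from Ruby. A sketch (FrozenError assumes Ruby 2.5+; older versions raise a plain RuntimeError):

```ruby
frozen = { a: 1 }.freeze

# Hash#merge is fine: it operates on a fresh, unfrozen copy.
frozen.merge(b: 2) # => { a: 1, b: 2 }

# Hash#merge! fails the rb_hash_modify check before touching anything.
begin
  frozen.merge!(b: 2)
rescue FrozenError
  # the receiver is left untouched
end
```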

If that passes, we get to the meat of the method:

for (i = 0; i < argc; i++){        /* loops through the argument hashes */
    VALUE hash = to_hash(argv[i]); /* ensures we are dealing with a hash */
    if (block_given) {
        /*
         * Loops through the new hash and appends new keys at the
         * end of the original hash; if a key is duplicated, the
         * block is executed to pick the value.
         */
        rb_hash_foreach(hash, rb_hash_update_block_i, self);
    }
    else {
        /*
         * Loops through the new hash and appends new keys at the
         * end of the original hash; if a key is duplicated, its
         * value is replaced.
         */
        rb_hash_foreach(hash, rb_hash_update_i, self);
    }
}

Before we start discussing how cool this is, let’s pseudo-Ruby this method.

def merge!(*other_hashes, &block)
  raise FrozenError if frozen?
  other_hashes.each do |new_hash|
    new_hash.each_key do |key|
      if block_given? && key?(key)
        self[key] = yield(key, self[key], new_hash[key])
      else
        self[key] = new_hash[key]
      end
    end
  end
  self
end

def update(*other_hashes, &block)
  merge!(*other_hashes, &block)
end

This implementation is clever: it only iterates through the argument hashes, never the receiver, since only the duplicated keys need replacing. If a key does not exist yet, it is simply appended at the end of the hash.

By doing that, Ruby avoids extra per-key checks and only calls the block when it is needed. As discussed below, this also keeps the time complexity down, since you only iterate through the hashes you actually need to.
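We can verify that the block only fires for duplicated keys by counting its invocations:

```ruby
calls = []
result = { a: 1, b: 2 }.merge(b: 20, c: 30) do |key, old_val, new_val|
  calls << key
  new_val
end

calls  # => [:b] — the block ran once, for the duplicated key only
result # => { a: 1, b: 20, c: 30 }
```

The fresh key :c was appended without the block ever being consulted.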

Time complexity

This Stack Overflow answer does an excellent job explaining the time complexity of the Hash#merge method for merging two hashes, but let's try to work out what the time complexity of merging multiple hashes would be.

We are here only looking at big-O notation (the longest theoretical time this function can take; more on that here).

For my mental clarity, I will do it based on our pseudo-Ruby:

def merge(*other_hashes, &block)
  copy = self.dup                      # => constant
  copy.merge!(*other_hashes, &block)   # => we shall see below!
end

def merge!(*other_hashes, &block)
  raise FrozenError if frozen?         # => constant
  other_hashes.each do |new_hash|      # => number of hashes = m
    new_hash.each_key do |key|         # => number of keys in the longest hash = n
      if block_given? && key?(key)     # => constant
        self[key] = yield(key, self[key], new_hash[key]) # => assumed constant here*
      else
        self[key] = new_hash[key]      # => constant
      end
    end
  end
  self
end

If we write out a formula for the longest path, it gives us:

constant + (constant + (m * (constant + (n * constant))))

We can ignore the constants, which leaves us with O(m * n).

I keep m and n separate since the number of hashes should in most cases be small compared to the number of keys, but for the purists out there, our worst-case time complexity is O(n²).

We can learn from this analysis that if we want to optimize for time complexity (i.e., the lowest number of iterations), we should make the largest hash the receiver (the hash the method is called on), since only the argument hashes' keys are iterated.
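A rough way to see this is a quick benchmark sketch (the sizes are arbitrary, picked only to make the gap visible; exact timings will vary by machine and Ruby version):

```ruby
require "benchmark"

large = (1..100_000).to_h { |i| [i, i] }
small = { 0 => :zero }
large_receiver = large.dup
small_receiver = small.dup

Benchmark.bm(22) do |x|
  x.report("large.merge!(small)") { large_receiver.merge!(small) } # walks 1 key
  x.report("small.merge!(large)") { small_receiver.merge!(large) } # walks 100_000 keys
end
```

Both calls produce the same pairs in the end; only the amount of iteration differs.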

However, we would then need to consider the consequences of adding a block to change the default behavior (i.e., the new values overwriting the old ones).
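For instance, a block can flip the default "newest wins" rule entirely, e.g. to keep existing values and only pick up brand-new keys (the settings-style keys below are made up for illustration):

```ruby
saved    = { theme: "dark", lang: "en" }
incoming = { theme: "light", font: "mono" }

# Keep the old value on collisions; fresh keys are still appended.
kept = saved.merge(incoming) { |_key, old_val, _new_val| old_val }
kept # => { theme: "dark", lang: "en", font: "mono" }
```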

Conclusion

Hash#merge and Hash#merge! (aka Hash#update) are surprisingly powerful, and if you work with hashes that need to be merged into other hashes, they are a great tool to have in your tool belt.

Be careful not to overdo it. The time complexity can grow pretty rapidly. However, you can try to optimize it.

Please do not hesitate to correct me or ask any questions you might have, and thank you for your attention :)

Pablo Curell Mompo

Full Stack developer @ Seraphin, learned how to code @ Le Wagon. I love coding and plan to keep learning as much as I can about it :)