Move data into 3 Separate Hashes inside loop in ruby

It's only my second post and I'm still learning Ruby. I'm trying to figure this out based on my Java knowledge, but I can't seem to get it right.

What I need to do is this: I have a function that reads a file line by line and extracts different car features from each line, for example:

def convertListings2Catalogue(fileName)
  f = File.open(fileName, "r")
  f.each_line do |line|
    km = line[/[0-9]+km/]
    t = line[(Regexp.union(/sedan/i, /coupe/i, /hatchback/i, /station/i, /suv/i))]
    trans = ....
  end
end

Now for each line I need to store the extracted features into separate hashes that I can access later in my program.

The issues I'm facing: 1) I'm overwriting the features in the same hash 2) Can't access the hash outside my function

This is what's in my file:

65101km,Sedan,Manual,2010,18131A,FWD,Used,5.5L/100km,Toyota,camry,SE,{AC, Heated Seats, Heated Mirrors, Keyless Entry}

coupe,1100km,auto,RWD, Mercedec,CLK,LX ,18FO724A,2017,{AC, Heated Seats, Heated Mirrors, Keyless Entry, Power seats},6L/100km,Used

AWD,SUV,0km,auto,new,Honda,CRV,8L/100km,{Heated Seats, Heated Mirrors, Keyless Entry},19BF723A,2018,LE

Now my function extracts the features of each car model, but I need to store these features in 3 different hashes with the same keys but different values.

listing = Hash.new(0)
  listing = { kilometers: km, type: t, transmission: trans, drivetrain: dt, status: status, car_maker: car_maker }

I tried moving the data from one hash to another, and I even tried storing the data in an array first and then moving it to the hash, but I still can't figure out how to create separate hashes inside a loop. Thanks

I don't fully understand the question but I thought it was important to suggest how you might deal with a more fundamental issue: extracting the desired information from each line of the file in an effective and Ruby-like manner. Once you have that information, in the form of an array of hashes, one hash per line, you can do with it what you want. Alternatively, you could loop through the lines in the file, constructing a hash for each line and performing any desired operations before going on to the next line.

Being new to Ruby you will undoubtedly find some of the code below difficult to understand. If you persevere, however, I think you will be able to understand all of it, and in the process learn a lot about Ruby. I've made some suggestions in the last section of my answer to help you decipher the code.

Code

words_by_key = {
  type:         %w| sedan coupe hatchback station suv |,
  transmission: %w| auto manual steptronic |,
  drivetrain:   %w| fwd rwd awd |,
  status:       %w| used new |,
  car_maker:    %w| honda toyota mercedes bmw lexus |,
  model:        %w| camry clk crv |
}
  #=> {:type=>["sedan", "coupe", "hatchback", "station", "suv"],
  #    :transmission=>["auto", "manual", "steptronic"],
  #    :drivetrain=>["fwd", "rwd", "awd"],
  #    :status=>["used", "new"],
  #    :car_maker=>["honda", "toyota", "mercedes", "bmw", "lexus"],
  #    :model=>["camry", "clk", "crv"]}

WORDS_TO_KEYS = words_by_key.each_with_object({}) { |(k,v),h| v.each { |s| h[s] = k } }
  #=> {"sedan"=>:type, "coupe"=>:type, "hatchback"=>:type, "station"=>:type, "suv"=>:type,
  #    "auto"=>:transmission, "manual"=>:transmission, "steptronic"=>:transmission,
  #    "fwd"=>:drivetrain, "rwd"=>:drivetrain, "awd"=>:drivetrain,
  #    "used"=>:status, "new"=>:status,
  #    "honda"=>:car_maker, "toyota"=>:car_maker, "mercedes"=>:car_maker,
  #      "bmw"=>:car_maker, "lexus"=>:car_maker,
  #    "camry"=>:model, "clk"=>:model, "crv"=>:model}

module ExtractionMethods
  def km(str)
    str[/\A\d+(?=km\z)/]
  end

  def year(str)
    str[/\A\d{4}\z/]
  end

  def stock(str)
    return nil if str.end_with?('km')
    str[/\A\d+\p{Alpha}\p{Alnum}*\z/]
  end

  def trim(str)
    str[/\A\p{Alpha}{2}\z/]
  end

  def fuel_consumption(str)
    str.to_f if str[/\A\d+(?:\.\d+)?(?=l\/100km\z)/]
  end
end

class K
  include ExtractionMethods      
  def extract_hashes(fname)
    File.foreach(fname).with_object([]) do |line, arr|
      line = line.downcase
      idx_left = line.index('{')
      idx_right = line.index('}')
      if idx_left && idx_right    
        g = { set_of_features: line[idx_left..idx_right] }
        line[idx_left..idx_right] = ''
        line.squeeze!(',')
      else
        g = {}
      end
      arr << line.split(',').each_with_object(g) do |word, h|
        word.strip!
        if WORDS_TO_KEYS.key?(word)
          h[WORDS_TO_KEYS[word]] = word
        else
          ExtractionMethods.instance_methods.find do |m|
            v = public_send(m, word)
            (h[m] = v) unless v.nil?
            v
          end
        end
      end
    end
  end
end

Example

data =<<BITTER_END
65101km,Sedan,Manual,2010,18131A,FWD,Used,5.5L/100km,Toyota,camry,SE,{AC, Heated Seats, Heated Mirrors, Keyless Entry}
coupe,1100km,auto,RWD, Mercedec,CLK,LX ,18FO724A,2017,{AC, Heated Seats, Heated Mirrors, Keyless Entry, Power seats},6L/100km,Used
AWD,SUV,0km,auto,new,Honda,CRV,8L/100km,{Heated Seats, Heated Mirrors, Keyless Entry},19BF723A,2018,LE
BITTER_END

FILE_NAME = 'temp'
File.write(FILE_NAME, data)
  #=> 353 (characters written to file)

k = K.new
  #=> #<K:0x00000001c257d348>
k.extract_hashes(FILE_NAME)
  #=> [{:set_of_features=>"{ac, heated seats, heated mirrors, keyless entry}",
  #     :km=>"65101", :type=>"sedan", :transmission=>"manual", :year=>"2010",
  #     :stock=>"18131a", :drivetrain=>"fwd", :status=>"used", :fuel_consumption=>5.5,
  #     :car_maker=>"toyota", :model=>"camry", :trim=>"se"},
  #    {:set_of_features=>"{ac, heated seats, heated mirrors, keyless entry, power seats}",
  #     :type=>"coupe", :km=>"1100", :transmission=>"auto", :drivetrain=>"rwd",
  #     :model=>"clk", :trim=>"lx", :stock=>"18fo724a", :year=>"2017",
  #     :fuel_consumption=>6.0, :status=>"used"},
  #    {:set_of_features=>"{heated seats, heated mirrors, keyless entry}",
  #     :drivetrain=>"awd", :type=>"suv", :km=>"0", :transmission=>"auto",
  #     :status=>"new", :car_maker=>"honda", :model=>"crv", :fuel_consumption=>8.0,
  #     :stock=>"19bf723a", :year=>"2018", :trim=>"le"}]

Explanation

Firstly, note that the HEREDOC needs to be un-indented before being executed.

You will see that the instance method K#extract_hashes uses IO::foreach to read the file line by line (see note 1 at the end of this answer).

The first step in processing each line of the file is to downcase it. You will then want to split the string on commas to form an array of words. There is a problem, however, in that you don't want to split on commas that lie between a left and right brace ({ and }), which enclose the value for the key :set_of_features. I decided to deal with that by determining the indices of the two braces, creating a hash with the single key :set_of_features, deleting that substring from the line and lastly replacing the resulting pair of adjacent commas with a single comma:

  idx_left = line.index('{')
  idx_right = line.index('}')
  if idx_left && idx_right    
    g = { set_of_features: line[idx_left..idx_right] }
    line[idx_left..idx_right] = ''
    line.squeeze!(',')
  else
    g = {}
  end

See String for the documentation of the String methods used here and elsewhere.
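
As a quick check of this step, here is a minimal sketch that applies the same brace handling to one shortened sample line. The helper name split_off_features is hypothetical and is not part of the code above; it only isolates the step so it can be tried on its own.

```ruby
# Hypothetical helper (not in the answer's code) isolating the brace step.
def split_off_features(line)
  line = line.downcase.chomp
  idx_left  = line.index('{')
  idx_right = line.index('}')
  return [nil, line] unless idx_left && idx_right

  features = line[idx_left..idx_right]
  line[idx_left..idx_right] = ''   # remove the braced substring in place
  line.squeeze!(',')               # collapse the ",," left behind
  [features, line]
end

features, rest = split_off_features("AWD,SUV,0km,{Heated Seats, Keyless Entry},2018,LE\n")
puts features  # {heated seats, keyless entry}
puts rest      # awd,suv,0km,2018,le
```

Note that squeeze!(',') collapses every run of commas, so an empty field elsewhere in the line would also disappear.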

We can now convert the resulting line to an array of words by splitting on the commas. If any capitalization is desired in the output this should be done after the hashes have been constructed.

We will build on the hash { set_of_features: line[idx_left..idx_right] } just created. When complete, it will be appended to the array being returned.

Each element (word) in the array is then processed. If it is a key of the hash WORDS_TO_KEYS we set

h[WORDS_TO_KEYS[word]] = word

and are finished with that word. If not, we call each of the instance methods m in the module ExtractionMethods in turn until one is found for which public_send(m, word) is not nil. When that is found, its return value v is added to the hash h under the key m:

h[m] = v

Notice that the name of each instance method in ExtractionMethods, which is a symbol (e.g., :km), is a key in the hash h. Having separate methods facilitates debugging and testing.
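
To see the two branches in isolation, here is a small self-contained sketch. The lookup table and the module are cut down to one entry each, and the names LOOKUP, MiniMethods and MiniExtractor are hypothetical stand-ins for WORDS_TO_KEYS, ExtractionMethods and K.

```ruby
LOOKUP = { 'sedan' => :type }.freeze   # trimmed-down stand-in for WORDS_TO_KEYS

module MiniMethods                      # hypothetical, mirrors ExtractionMethods
  def km(str)
    str[/\A\d+(?=km\z)/]
  end
end

class MiniExtractor
  include MiniMethods

  def classify(words)
    words.each_with_object({}) do |word, h|
      if LOOKUP.key?(word)
        h[LOOKUP[word]] = word              # branch 1: a known word
      else
        MiniMethods.instance_methods.find do |m|
          v = public_send(m, word)          # branch 2: try each extractor
          h[m] = v unless v.nil?
          v
        end
      end
    end
  end
end

result = MiniExtractor.new.classify(%w[65101km sedan unknown])
puts result[:km]    # 65101
puts result[:type]  # sedan
puts result.size    # 2
```

The word "unknown" matches neither branch, so it is silently skipped, which is also what happens to the misspelled "mercedec" in the example data.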

I could have written:

if    (s = km(word))
  s
elsif (s = year(word))
  s
elsif (s = stock(word))
  s
elsif (s = trim(word))
  s
elsif (s = fuel_consumption(word))
  s
end

but since all these methods take the same argument, word, we can instead use Object#public_send:

a = [:km, :year, :stock, :trim, :fuel_consumption]

a.find do |m|
  v = public_send(m, word)
  (h[m] = v) unless v.nil?
  v 
end

A final tweak is to put all the methods in the array a in a module ExtractionMethods and include that module in the class K. We can then replace a in the find expression above with ExtractionMethods.instance_methods. (See Module#instance_methods.)

Suppose now that the data are changed so that additional fields are added (e.g., for "colour" or "price"). Then the only modifications to the code required are changes to words_by_key and/or the addition of methods to ExtractionMethods.
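
A rough sketch of such an extension follows. It assumes a hypothetical price field written like $18500 and a hypothetical colour word list; neither appears in the original data, so treat both formats as assumptions.

```ruby
# Assumption: prices look like "$18500" and colours come from a fixed list.
# Both are illustrative only and do not appear in the original data.
module ExtractionMethods
  def price(str)
    str[/\A\$\d+\z/]&.delete('$')&.to_i
  end
end

words_by_key = {
  status: %w| used new |,
  colour: %w| red blue black white |   # hypothetical new key
}

# Same inversion as WORDS_TO_KEYS, rebuilt for the extended word list.
words_to_keys = words_by_key.each_with_object({}) do |(k, v), h|
  v.each { |s| h[s] = k }
end

class Extractor
  include ExtractionMethods
end

puts Extractor.new.price('$18500')   # 18500
p    Extractor.new.price('2018')     # nil
p    words_to_keys['blue']           # :colour
```

Nothing in the extraction loop itself needs to change, since it discovers the new method through ExtractionMethods.instance_methods.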

Understanding the code

It may be helpful to run the code with puts statements inserted. For example,

idx_left = line.index('{')
idx_right = line.index('}')
puts "idx_left=#{idx_left}, idx_right=#{idx_right}"

Where code is chained it may be helpful to break it up with temporary variables and insert puts statements. For example, change

arr << line.split(',').each_with_object(g) do |word, h|
  ...

to

a = line.split(',')
puts "line.split(',')=#{a}"
enum = a.each_with_object(g)
puts "enum.to_a=#{enum.to_a}"
arr << enum.each do |word, h|
  ...

The second puts here is merely to see what elements the enumerator enum will generate and pass to the block.

Another way of doing that is to use the handy method Object#tap, which is inserted between two methods:

arr << line.split(',').tap { |a| puts "line.split(',')=#{a}"}.
            each_with_object(g) do |word, h|
              ...

tap (great name, eh?), as used here, simply returns its receiver after displaying its value.
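
Here is a small runnable illustration of the same idea, using a made-up string rather than a line from the file:

```ruby
# tap yields the receiver to the block and then returns the receiver unchanged,
# so it can be dropped into a chain purely for inspection.
words = "awd, suv ,0km".split(',')
                       .tap { |a| puts "after split: #{a.inspect}" }
                       .map(&:strip)
                       .tap { |a| puts "after strip: #{a.inspect}" }

puts words.length  # 3
```

Removing the two tap calls leaves the result in words exactly the same.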

Lastly, I've used the method Enumerable#each_with_object in a couple of places. It may seem complex but it's actually quite simple. For example,

arr << line.split(',').each_with_object(g) do |word, h|
  ...
end

is effectively equivalent to:

h = g
line.split(',').each do |word|
  ...
end
arr << h
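
A runnable comparison of the two forms, using a made-up word list and lookup table rather than the real ones:

```ruby
words = %w[fwd used]
keys  = { 'fwd' => :drivetrain, 'used' => :status }

# Version 1: each_with_object passes the hash into the block and returns it.
a = words.each_with_object({}) { |word, h| h[keys[word]] = word }

# Version 2: the same result with a plain each and an explicitly created hash.
b = {}
words.each { |word| b[keys[word]] = word }

p a == b  # true
```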

1 Many IO methods are typically invoked on File. This is acceptable because File.superclass #=> IO.

You could take advantage of the fact that your file instance is enumerable. This lets you use the inject method, seeded with an empty hash. collector in this case is the hash that gets passed along as the iteration continues. Be sure to return collector from the block (implicitly, by making it the last line of the block), since inject feeds the block's return value into the next iteration. It's some pretty powerful stuff!

I think this is roughly what you're going for. I used model as the key in the hash, and set_of_features as your data.

def convertListings2Catalogue(fileName)
  # The block form of File.open closes the file once the block finishes.
  File.open(fileName, "r") do |f|
    f.inject({}) do |collector, line|
      km = line[/[0-9]+km/]
      t = line[(Regexp.union(/sedan/i, /coupe/i, /hatchback/i, /station/i, /suv/i))]
      trans = line[(Regexp.union(/auto/i, /manual/i, /steptronic/i))]
      dt = line[(Regexp.union(/fwd/i, /rwd/i, /awd/i))]
      status = line[(Regexp.union(/used/i, /new/i))]
      car_maker = line[(Regexp.union(/honda/i, /toyota/i, /mercedes/i, /bmw/i, /lexus/i))]
      stock = line.scan(/(\d+[a-z0-9]+[a-z](?<!km\b))(?:,|$)/i).first
      year = line.scan(/(\d{4}(?<!km\b))(?:,|$)/).first
      trim = line.scan(/\b[a-zA-Z]{2}\b/).first
      fuel = line.scan(/[\d.]+L\/\d*km/).first
      set_of_features = line.scan(/\{(.*?)\}/).first
      model = line[(Regexp.union(/camry/i, /clk/i, /crv/i))]
      collector[model] = set_of_features   # key: model, value: its features
      collector                            # returned so inject passes it on
    end
  end
end
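
Since the follow-up comments ask for one hash per line rather than a single hash, the same inject pattern can be seeded with an array instead. A minimal sketch, assuming an in-memory list of shortened lines in place of the file and keeping only two of the fields:

```ruby
lines = [
  "65101km,Sedan,Manual,2010,18131A,FWD,Used,5.5L/100km,Toyota,camry,SE",
  "AWD,SUV,0km,auto,new,Honda,CRV,8L/100km,19BF723A,2018,LE"
]

listings = lines.inject([]) do |collector, line|
  km = line[/[0-9]+km/]
  t  = line[Regexp.union(/sedan/i, /coupe/i, /hatchback/i, /station/i, /suv/i)]
  collector << { kilometers: km, type: t }   # a fresh hash for every line
  collector                                  # value handed to the next iteration
end

puts listings.length           # 2
puts listings[0][:kilometers]  # 65101km
puts listings[1][:type]        # SUV
```

Because the hash literal is evaluated inside the block, each iteration creates a new hash, so earlier listings are not overwritten.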

Hope I understood your question correctly. I would do it like below. Every time you call this method it will return an array containing one hash per listing.

    def convertListings2Catalogue (fileName)
      listings = []

      f = File.open(fileName, "r")
      f.each_line do |line|

        km=line[/[0-9]+km/]
        t = line[(Regexp.union(/sedan/i, /coupe/i, /hatchback/i, /station/i, /suv/i))]
        trans = line[(Regexp.union(/auto/i, /manual/i, /steptronic/i))]
        dt = line[(Regexp.union(/fwd/i, /rwd/i, /awd/i))]
        status = line[(Regexp.union(/used/i, /new/i))]
        car_maker = line[(Regexp.union(/honda/i, /toyota/i, /mercedes/i, /bmw/i, /lexus/i))]  
        stock = line.scan(/(\d+[a-z0-9]+[a-z](?<!km\b))(?:,|$)/i).first
        year = line.scan(/(\d{4}(?<!km\b))(?:,|$)/).first
        trim = line.scan(/\b[a-zA-Z]{2}\b/).first
        fuel = line.scan(/[\d.]+L\/\d*km/).first
        set_of_features = line.scan(/\{(.*?)\}/).first
        model = line[(Regexp.union(/camry/i, /clk/i, /crv/i))]

        listing = { kilometers: km, type: t, transmission: trans, drivetrain: dt, status: status, car_maker: car_maker }

        listings.push listing
      end

      # Return only after every line has been read; a return inside the loop
      # would exit after the first line.
      listings
    end

Then wherever you use this you could just do:

listings = convertListings2Catalogue("somefile.txt")
listings.first # to get the first listing
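
To make the "access it outside the function" part concrete, here is a self-contained sketch that writes two shortened sample lines to a temporary file and then uses the returned array at the top level. The method name listings_from and the two-field hash are hypothetical simplifications of the method above.

```ruby
require 'tempfile'

# Hypothetical trimmed-down version of the method, keeping two fields only.
def listings_from(file_name)
  File.foreach(file_name).map do |line|
    { kilometers: line[/[0-9]+km/], status: line[/used|new/i] }
  end
end

file = Tempfile.new('listings')
file.write("65101km,Sedan,Used\nAWD,SUV,0km,new\n")
file.close

listings = listings_from(file.path)   # the return value is usable out here
listings.each_with_index do |listing, i|
  puts "#{i}: #{listing[:kilometers]} #{listing[:status]}"
end
file.unlink
```

The local variables inside the method stay private to it; only the array it returns is visible to the caller.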

Comments
  • Can you add an example of the expected or preferred output?
  • Yeah of course!
  • Thank you for this very detailed answer, very helpful
  • Hey, thanks for the answer, will this create separate hashes ? In my case I need 3 hashes all with the same keys but different values
  • @SaraMoufarrej Ah, so you want an array of hashes, where all the values for each hash are constructed in each iteration of the loop? So, in other words, one hash for each line?
  • Yes, that's it exactly!
  • Hey, i'm getting a syntax error q3.rb:22: syntax error, unexpected ':', expecting '}' listings.push { kilometers: km, type: t, transmission: t...
  • Try :kilometers => km etc.!
  • Yeah i'm still getting the same error q3.rb:22: syntax error, unexpected =>, expecting &. or :: or '[' or '.' ...h { :kilometers => km, :type => t, :transmission => trans, :... ...
  • ah my fault do it like this: listing = { kilometers: km, transmission: tm ........} listings.push listing
  • updated my answer, this doesnt produce the error on my end, so hopefully it works for u!