Ayende Rahien works for Hibernating Rhinos LTD, an Israel-based company producing developer productivity tools for OLTP applications such as NHibernate Profiler (nhprof.com), Linq to SQL Profiler (l2sprof.com), Entity Framework Profiler (efprof.com) and more. Ayende is a DZone MVB, is not an employee of DZone, and has posted 485 posts at DZone.

Fail, Fail, Fail: More Job Candidate Fails

02.24.2014

Sometimes, reading candidates' answers is just something that I know is going to piss me off.

We have a question that goes something like this (the actual question is much more detailed):

We have a 15TB CSV file that contains web logs; the entries are sorted by date (since this is how they were entered). Find all the log entries within a given date range. You may not read more than 32MB.

A candidate replied with an answer that had the following code:

    string line = string.Empty;
    StreamReader file;

    try
    {
        file = new StreamReader(filename);
    }
    catch (FileNotFoundException ex)
    {
        Console.WriteLine("The file is not found.");
        Console.ReadLine();
        return;
    }

    while ((line = file.ReadLine()) != null)
    {
        var values = line.Split(',');
        DateTime date = Convert.ToDateTime(values[0]);
        if (date.Date >= startDate && date.Date <= endDate)
            output.Add(line);

        // Results size in MB
        double size = (GetObjectSize(output) / 1024f) / 1024f;
        if (size >= 32)
        {
            Console.WriteLine("Results size exceeded 32MB, the search will stop.");
            break;
        }
    }

My reply was:

The data file is 15TB in size; if the data is beyond the first 32MB, it won't be found.

The candidate then fixed his code. It now includes:

    var lines = File.ReadLines(filename);

Yep, this is on a 15TB file.

Now I’m going to have to lie down for a bit; I am not feeling so good.

Published at DZone with permission of Ayende Rahien, author and DZone MVB. (source)


Comments

Deepu Roy replied on Mon, 2014/02/24 - 5:33am

I remember asking a candidate the following question:

Given a method:

    public static int fibonacci(int n) {
        return n + fibonacci(n - 1);
    }

what is the value of fibonacci(3)?

The answer I got was 5. Needless to say, the discussion stopped there.

philippe tseyen replied on Mon, 2014/02/24 - 11:02am in response to: Deepu Roy

I hope you didn't hire the candidate that answered 7 :-)

Hugo Ribeiro replied on Mon, 2014/02/24 - 12:47pm in response to: Deepu Roy

http://youtu.be/5VYtiyjqx7E?t=36s

LOL :)

Abdul Habra replied on Mon, 2014/02/24 - 4:37pm in response to: Deepu Roy

Please note that a Fibonacci number is defined as:

f(n) = f(n-1) + f(n-2)

To compute that, the function Deepu Roy provided should contain this:

return fibonacci(n-1) + fibonacci(n-2);

According to the implementation Deepu provided:

assume f(0) = 0
f(1) = 1 + f(0) = 1
f(2) = 2 + f(1) = 3
f(3) = 3 + f(2) = 6

Robert Greathouse replied on Mon, 2014/02/24 - 5:23pm

How is this a constructive conversation/article?

Goran Magdic replied on Tue, 2014/02/25 - 3:25am

Question: How would the best solution be implemented?
Check the first and last dates in the file, calculate where the given dates would be, and recalculate if they are not there?

Martin Tunzer replied on Wed, 2014/03/05 - 9:26am in response to: Abdul Habra

Man, you would fail the test too. The trick with the Fibonacci function used in the test is that it is wrong: it has no base case, so the recursion never ends.
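For reference, here is a minimal C# sketch (C# being the language of the article's code) of the recurrence quoted earlier in this thread. The base cases f(0) = 0 and f(1) = 1 are an assumption, and the class name `FibonacciSketch` is made up for illustration. Without a base case, as in the snippet from the interview question, the recursion only stops when the stack overflows.

```csharp
using System;

public static class FibonacciSketch
{
    // f(n) = f(n-1) + f(n-2), with assumed base cases f(0) = 0 and f(1) = 1.
    public static int Fibonacci(int n)
    {
        if (n < 0) throw new ArgumentOutOfRangeException(nameof(n));
        if (n < 2) return n;                       // base cases stop the recursion
        return Fibonacci(n - 1) + Fibonacci(n - 2);
    }
}
```

With these base cases, `FibonacciSketch.Fibonacci(3)` evaluates to 2.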

Martin Tunzer replied on Wed, 2014/03/05 - 9:33am in response to: Goran Magdic

How would you calculate it? The log can have an arbitrary number of records per day. A binary search over the file, opened for random access, must be used to find any entry in the time range. Then the file must be searched in both directions from that entry's position to find the first and the last entry in the interval.
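A rough C# sketch of that idea, under stated assumptions, might look like the following. It is a variant of the approach described above: instead of finding any entry in the range and walking in both directions, it binary-searches byte offsets for the first entry at or after the start date and then reads forward. Halving a 15TB byte range takes roughly 44 probes (log2 of 15 × 2^40 is about 43.9), and each probe reads only a line or two. The class and method names (`LogRangeSearch`, `Find`, and the helpers) are hypothetical, and the code assumes the date is the first comma-separated field (as in the candidate's code), in a format `DateTime.Parse` accepts with the invariant culture, with `\n` line endings, no blank lines, and inclusive timestamp bounds.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Text;

public static class LogRangeSearch
{
    // Reads the line starting at 'offset' (without its line ending); null at end of stream.
    private static string ReadLineAt(Stream s, long offset)
    {
        s.Seek(offset, SeekOrigin.Begin);
        var bytes = new List<byte>();
        int b;
        while ((b = s.ReadByte()) != -1 && b != '\n')
            bytes.Add((byte)b);
        if (b == -1 && bytes.Count == 0) return null;
        return Encoding.UTF8.GetString(bytes.ToArray()).TrimEnd('\r');
    }

    // Offset of the first line that starts at or after 'offset' (stream length if none).
    private static long LineStartAtOrAfter(Stream s, long offset)
    {
        if (offset == 0) return 0;
        s.Seek(offset - 1, SeekOrigin.Begin);
        int b;
        while ((b = s.ReadByte()) != -1 && b != '\n') { }
        return s.Position;
    }

    // Assumption: the date is the first CSV field.
    private static DateTime DateOf(string line) =>
        DateTime.Parse(line.Split(',')[0], CultureInfo.InvariantCulture);

    // Binary search over byte offsets for the first line whose date is >= target.
    public static long FirstOffsetAtOrAfter(Stream s, DateTime target)
    {
        long lo = 0, hi = s.Length;
        while (lo < hi)
        {
            long mid = lo + (hi - lo) / 2;
            string line = ReadLineAt(s, LineStartAtOrAfter(s, mid));
            if (line == null || DateOf(line) >= target) hi = mid;
            else lo = mid + 1;
        }
        return LineStartAtOrAfter(s, lo);
    }

    // Lazily yields every line whose date falls within [from, to].
    public static IEnumerable<string> Find(Stream s, DateTime from, DateTime to)
    {
        s.Seek(FirstOffsetAtOrAfter(s, from), SeekOrigin.Begin);
        using (var reader = new StreamReader(s, Encoding.UTF8, false, 4096, leaveOpen: true))
        {
            string line;
            while ((line = reader.ReadLine()) != null && DateOf(line) <= to)
                yield return line;
        }
    }
}
```

It could be called with a seekable stream, for example `LogRangeSearch.Find(File.OpenRead(path), from, to)`, where `path`, `from` and `to` are placeholders. The forward scan reads as much data as the matching range contains; how the 32MB limit applies to the results themselves is not stated in the post, so the sketch leaves that open.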
