I'm a software architect/consultant in Boulder, Colorado. I started blogging at: http://wayne-adams.blogspot.com/, but have since started the new blog, Data Sceintist in Training which will cover the areas I am learning as part of my own "big data" training, including the Hadoop family, frameworks like OpenStack, the R programming language, column-oriented databases and the art of examining data sets for useful patterns. The original blog will be kept alive with Java-specific posts. Wayne is a DZone MVB and is not an employee of DZone and has posted 35 posts at DZone. You can read more from them at their website. View Full User Profile

Haskell From an OO Developer's Perspective, Part 3

11.07.2011
| 2998 views |
  • submit to reddit

Today I'm going to process a set of structured data using Haskell, tainted by years of Smalltalk, C++, Java and C# experience. I've been following the book Real World Haskell, which I chose for a couple of reasons:

  • From the little I know of functional programming, Haskell appears to be a relatively "pure" FP language, and
  • The book is very comprehensive, covering FP topics of interest in depth.
While there are shorter introductions to FP, or good books on hybrid OO/FP languages, my preference is to make a clean break with OO and quickly become proficient in FP.

The structured data

I'm going to process a set of events generated by a Java program (they could be generated by any program, but this choice allows me to leverage one of my other ongoing projects). Each event consists of:
  • A timestamp, the epoch time measured in the usual Java milliseconds
  • A fully-qualified Java class name
  • The line number in the class at which the event is generated
  • A short descriptive message
  • A list of key-value pairs
The key-value pairs are analogous to the properties of a JMS message. In my application, I'll assume that some keys are present in every message, for the purposes of filtering messages during processing.

If you, too, are coming to Haskell or FP in general from an OO background, you don't need to see how this data structure would be implemented as an OO object, so I will focus instead of implementing it in Haskell. Here is my first cut:
    -- file:  c:/HaskellDev/eventProcessor/eventProcessor.hs

    type Timestamp = Integer
    type ClassName = String
    type LineNumber = Integer
    type Message = String
    type Key = String
    type Value = String
    type Property = (Key, Value)
    type Properties = [Property]

    data Event= Event Timestamp ClassName LineNumber Message Properties
      deriving (Show)

I have defined Properties to be a list of Property structures, where a Property is a 2-tuple consisting of a String Key and a String Value. Hopefully this structure will be interesting enough to provide a non-trivial example of an event processor in Haskell. Note that I included deriving (Show) in the type definition so that ghci will know how to output a value of my new type.

Now that we've defined the data type, let's create an Event value:
    *Main> let prop1 = ("sessionId", "ABCD1234")
    *Main> :type prop1
    prop1 :: ([Char], [Char])
    *Main> let props = [prop1]
    *Main> let evt1 = Event 1320512200548 "java.lang.String" 1293 "NPE in substring()" props
    *Main> evt1
    Event 1320512200548 "java.lang.String" 1293 "NPE in substring()" [("sessionId","ABCD1234")]
Next, we'll look at what we might want to do with this kind of data, and how we could process it in Haskell.

Generating the data

Suppose for a moment that we have a large application running, with many concurrent user sessions, producing hundreds or thousands of these values (our Events above) per minute. Occasionally a session will "come off the rails", in which case it would be really useful to be able to extract all messages specific to that user's session to see what is going on. We may want to do this in real time, or we may just process the data "after the fact", restricting ourselves to a time window during which the issues arose

In this example, I'm going to concentrate on the post-processing case. One reason is that I don't want to get into event-driven Haskell programming just yet, and a second reason is that I can arrange to skip (for now) the "non-FP-pure" code used to get the Events (file I/O, for example) and focus only on the "pure-FP" code. So I'll assume that we have a ready-made Haskell List of Event values to process in our example. I'll set the data up in my eventProcessor.hs file, entering as many values as I can before being overcome by boredom. I will put two properties in each property list: a session ID and a user ID (on the assumption that sessions come and go for the same user, and a user may even have multiple sessions open simultaneously). In my example data set there will be three user sessions going at any one time. Note that the let in variable assignment is a ghci artifact, so it isn't necessary if you make your assignments in the Haskell source file, below my Event data type definition:
 prop1 = ("sessionId", "ABCD1234")
    prop2 = ("sessionId", "EFGH5678")
    prop3 = ("sessionId", "WXYZ9876")
    prop4 = ("userId", "smith")
    prop5 = ("userId", "adams")
    prop6 = ("userId", "jobim")

    propList1 = [prop4, prop1]
    propList2 = [prop5, prop2]
    propList3 = [prop6, prop3]

    eventList = [
      Event 1320512200548 "java.lang.String" 1293 "NPE in substring()" propList1,
      Event 1320512200699 "javax.swing.JPanel" 388 "initialized" propList3,
      Event 1320512203699 "javax.swing.JList" 1255 "model replaced" propList3,
      Event 1320513130333 "com.adamsresearch.jarview.JarView" 388 "fileNotFound" propList2,
      Event 1320513255342 "com.adamsresearch.jarview.FileFilter" 79 "initialized" propList1,
      Event 1320513257324 "com.adamsresearch.jarview.ArchiveSearch" 193 "search started13255342" propList2,
      Event 1320512259333 "javax.lang.Integer" 133 "number format exception" propList3,
      Event 1320512260122 "com.adamsresearch.jarview.JarView" 725 "search started" propList2,
      Event 1320512263122 "com.adamsresearch.jarview.JarView" 779 "search completed" propList2,
      Event 1320512265147 "javax.swing.JPanel" 388 "initialized" propList1
      ]
Note a common beginner mistake, which I made initially -- I know that indentation is important in Haskell, but I still made the old-style indentation mistake of putting the final, closing square bracket in column 1. This results in an error because the closing bracket is itself part of the List assignment statement. Hence the indentation, above, on the last line of the eventList assignment. The indentation means we're still in the assignment statement.

Processing the data

Given this List of sample data, in imperative-style programming you would probably resort to a loop statement to process the individual Events. Haskell does not have a loop construct; instead, learn to think about loop-style processing of a List as
  • Recursively operating on each element of the List, or
  • Using Haskell library functions which operate on all elements of a List.
Although we'll settle on the second case, most FP books discuss the first case first, as it seems a more natural transition from imperative programming, so we will follow that approach here, too.

To introduce the idea of processing a List of data with a recursive function, let's first aim low: let's write a function that just returns a new List of the same data. This will be a simple function and looks as follows:
-- file:  c:/HaskellDev/eventProcessor/clone.hs

    clone :: [a] -> [a]

    clone (x:xs) = x : clone xs
    clone [] = []
clone takes as an argument a List, and returns a List. The first equation of clone performs a pattern match on the incoming List using the List constructor (:). If this pattern matches, the head of the List (x) is extracted from the list, and clone is called on the tail of the List. The head of the list and the return of clone will be used, as the constructor indicates, to form a new List. So we're just cloning the input List here. As recursion continues, eventually the tail will be an empty List, at which point the second equation will match and the function will return an empty List (appended to the List it has been building on each recursive call).

To see this function in action, we load it into ghci, then inspect the value of the List returned when we pass a List to clone:
*Main> :load clone.hs
    [1 of 1] Compiling Main             ( clone.hs, interpreted )
    Ok, modules loaded: Main.
    *Main> let clone1 = clone [1,2,3,4,5]
    *Main> clone1
    [1,2,3,4,5]
Now, on to something non-trivial: I want to extract, from our sample data, all Event values with a specific user ID. In clone, we took each value indiscriminately because we wanted all of them; in this case, we'll need to pattern match on the attributes of the Event. Not only that, but we'll have to find the user ID in the List of properties nested within the Event. These requirements will allow me to show an example with a little more substance than a "Hello, World" example.

Recall the format of our Event:
Event 1320512200548 "java.lang.String" 1293 "NPE in substring()" [("userId","smith"),("sessionId","ABCD1234")]
Let's say we are interested in extracting every Event with a specified user ID, and let's call the function getMessagesForUser. For each Event, once we determine that the user ID matches, we append the itemit to a List which will be the function's return value.

The way I approached this is to recognize that I first need a function that will iterate over a list of key-value pairs, searching for a key of userId and a value (actually, the second element of a 2-tuple) matching the passed-in user ID. I'll call this function matchesUserId.

Before I get started, though, I'm going to revisit my original algebraic data type Event and make it a little more user-friendly. The reason: I want to easily access the fields of an Event without having to write boilerplate accessor functions. By boilerplate I mean something like the following, to access the timestamp of an Event:
    timestamp (Event ts _ _ _ _) = ts
where you would have to supply one of these for each field you want to access. Instead, we can use Haskell record syntax to supply default accessors while defining the data type, which I modify here. While I'm at it, I'll perform a similar change to Property:
data Property = Property {
      key :: Key,
      value:: Value }
      deriving (Show)

    data Event= Event {
      timestamp :: Timestamp,
      className :: ClassName,
      lineNumber :: LineNumber,
      message :: Message,
      properties :: Properties }
      deriving (Show)
Having done this, I'll need to change my initialization of my Property values, as they are now no longer simple 2-tuples:
    prop1 = Property "sessionId" "ABCD1234"
    prop2 = Property "sessionId" "EFGH5678"
    prop3 = Property "sessionId" "WXYZ9876"
    prop4 = Property "userId" "smith"
    prop5 = Property "userId" "adams"
    prop6 = Property "userId" "jobim"
Remember that my eventProcessor.hs file creates a List of sample Event objects. Now I can use the head function to retrieve the first Event in the list, and then use the new timestamp accessor function to retrieve the timestamp, as in the following. Note also that timestamp's type signature ("takes an Event; returns a Timestamp") can be displayed in ghci:
 *Main> let evt1 = head eventList
    *Main> evt1
    Event {timestamp = 1320512200548, className = "java.lang.String", lineNumber = 1293, message = "NPE in substring()", properties = [("userId","smith"),("sessionId","ABCD1234")]}
    *Main> :type timestamp
    timestamp :: Event -> Timestamp
    *Main> timestamp evt1
    1320512200548
Now we can go back to our function matchesUserId. I decided that before I create this function, I need an additional small function that looks at a single Property and returns True if the key is userId and the value is the passed-in user ID. I called this function isUserId and placed it in the eventProcessor.hs file, directly below the Event data type definition:
 isUserWithId :: Property -> String -> Bool
    isUserWithId prop id =
      if (key prop == "userId" && value prop == id)
        then True
        else False
Note the indentation. I have defined the type signature in a way that we'll need to discuss later, when we talk about partial functions and currying. Suffice to say the function takes a Property and a String and returns a Bool. We can test this function against some of the properties I create in this file:
 *Main> prop1
    Property {key = "sessionId", value = "ABCD1234"}
    *Main> isUserWithId prop1 "smith"
    False
    *Main> prop4
    Property {key = "userId", value = "smith"}
    *Main> isUserWithId prop4 "smith"
    True
    *Main> prop6
    Property {key = "userId", value = "jobim"}
    *Main> isUserWithId prop6 "smith"
    False
With that function completed, we can work on containsUserPropertyWithValue, which looks for the user ID in a List of properties. Here's what I came up with:
    containsUserPropertyWithValue :: [Property] -> String -> Bool
    containsUserPropertyWithValue (x:xs) id =
      if (isUserWithId x id)
        then True
        else containsUserPropertyWithValue xs id
    containsUserPropertyWithValue [] _ = False 
and we can test this against a couple of the Property Lists that we created in our sample data:
*Main> containsUserPropertyWithValue propList3 "jobim"
    True
    *Main> containsUserPropertyWithValue propList3 "smith"
    False
Before we move on, note that Haskell has a where clause, in which I can add my isUserWithId function directly to this function. It makes sense in this case; I am unlikely to need this function anywhere else besides the containsUserPropertyWithValue function. Let's try this:
  containsUserPropertyWithValue :: [Property] -> String -> Bool
    containsUserPropertyWithValue (x:xs) id =
      if (isUserWithId x id)
        then True
        else containsUserPropertyWithValue xs id
      where
        isUserWithId prop id =
          if (key prop == "userId" && value prop == id)
            then True
            else False
    containsUserPropertyWithValue [] _ = False
and I can verify that everything works as before, except that I now no longer have an isUserWithId function in scope:
*Main> isUserWithId prop4 "smith"

    :1:1: Not in scope: `isUserWithId'

This function was defined within another function only and so is no longer available to me as a standalone function.

The final step is to use this function while iterating over the entire list of Events. As I described earlier, we won't be iterating in the usual sense. First I'll experiment with using a recursive function. On each call, the function will add an Event to the list if it matches the specified user ID. Here's what my first cut looks like:
listEventsForUser :: [Event] -> String -> [Event]
    listEventsForUser (x:xs) id =
      if (containsUserPropertyWithValue (properties x) id)
        then x : listEventsForUser xs id
        else listEventsForUser xs id
    listEventsForUser [] _ = []


If I run this against the sample set of events I generate in this file, I get exactly what I expected:
Prelude> :load eventProcessor.hs
    [1 of 1] Compiling Main             ( eventProcessor.hs, interpreted )
    Ok, modules loaded: Main.
    *Main> let evtList1 = listEventsForUser eventList "adams"
    *Main> evtList1
    [Event {timestamp = 1320513130333, className = "com.adamsresearch.jarview.JarView", lineNumber = 388, message = "fileNotFound",
properties = [Property {key = "userId", value = "adams"},Property {key = "sessionId", value = "EFGH5678"}]},
    Event {timestamp = 1320513257324, className = "com.adamsresearch.jarview.ArchiveSearch", lineNumber = 193, message = "search started13255342",
properties = [Property {key = "userId", value = "adams"},Property {key = "sessionId", value = "EFGH5678"}]},
    Event {timestamp = 1320512260122, className = "com.adamsresearch.jarview.JarView", lineNumber = 725, message = "search started",
properties = [Property {key = "userId", value = "adams"},Property {key = "sessionId", value = "EFGH5678"}]},
    Event {timestamp = 1320512263122, className = "com.adamsresearch.jarview.JarView", lineNumber = 779, message = "search completed",
properties = [Property {key = "userId", value = "adams"},Property {key = "sessionId", value = "EFGH5678"}]}]
    *Main>
(with a little bit of output formatting to make it easier to see that the correct Events have been selected).

This post has become chapter-length (although note how compact the actual working code is!), but I have one more topic to discuss. FP languages like Haskell recognize the above pattern occurs so often that there is built-in support in the language to perform operations on every element in a List, leading to extremely compact but very readable code. For example, there is the Haskell map function, which takes a function as an argument and applies that function to each element in a list. Some such functions return Lists, while others reduces Lists to a scalar value. What would be helpful in our case is the filter function, which applies a function to each List element and returns a List of items for which the filter is true.

Here is how I would have used filter in this example. First, I would like to define a function that would apply to an entire event. To make things a little more compact, I will include (in nested where clauses) my previously-defined functions, as follows:
eventGeneratedByUser :: String -> Event -> Bool
    eventGeneratedByUser id event =
      if (containsUserPropertyWithValue (properties event) id)
        then True
        else False
      where
        containsUserPropertyWithValue (x:xs) id =
          if (isUserWithId x id)
            then True
            else containsUserPropertyWithValue xs id
          where
            isUserWithId prop id =
              if (key prop == "userId" && value prop == id)
                then True
                else False
        containsUserPropertyWithValue [] _ = False

This is a good time for a "teaser" on partial functions. Note how I defined my function signature -- I put the String user ID first, then the Event. The reason for this organization is that I want to use Haskell's filter, which takes a predicate. In most examples, the predicate is something simple like the Haskell odd function, which needs no arguments. If you say filter odd [1,2,3,4,5,6], you get back [1,3,5].

What do you do if you need to pass an argument to a function (in this case, the user ID) to create the predicate? The answer lies in that odd signature, in this case

eventGeneratedByUser :: String -> Event -> Bool

I pass in a user ID and an Event. In that order, specifically. You've probably noticed by now that the signature (whether I define it, or request it from ghci) is not
    eventGeneratedByUser :: String  Event -> Bool
as you would expect from a function taking two arguments. The function appears (from the actual signature) to be a series of single-argument function invocations. The reason: all functions in Haskell actually take only one argument! Someone might correct my description, but I look on it this way: this signature defines a function that takes a String, and that newly defined function is itself a function that takes an Event. In my case, this helps a lot, because in Haskell you can take a multi-argument function and create a partial function for which you have defined some but not all of the arguments. Hence the choice for my ordering of my arguments. That ordering allows me to create a predicate which is a partial function where "user ID" has already been defined:
    *Main> let filterPred = eventGeneratedByUser "adams"

Since I have not fully specified the arguments to this function, what I get back is a function which requires only an Event to return True or False. Now I have the predicate I need for filter, and here is how I use it:
 *Main> let eventsForAdams = filter filterPred eventList
    *Main> eventsForAdams
    [Event {timestamp = 1320513130333, className = "com.adamsresearch.jarview.JarView", lineNumber = 388, 
message = "fileNotFound", properties = [Property {key = "userId", value = "adams"},Property {key = "sessionId", value = "EFGH5678"}]}, Event {timestamp = 1320513257324, className = "com.adamsresearch.jarview.ArchiveSearch", lineNumber = 193,
message = "search started13255342", properties = [Property {key = "userId", value = "adams"},Property {key = "sessionId", value = "EFGH5678"}]}, Event {timestamp = 1320512260122, className = "com.adamsresearch.jarview.JarView", lineNumber = 725,
message = "search started", properties = [Property {key = "userId", value = "adams"},Property {key = "sessionId", value = "EFGH5678"}]}, Event {timestamp = 1320512263122, className = "com.adamsresearch.jarview.JarView", lineNumber = 779,
message = "search completed", properties = [Property {key = "userId", value = "adams"},Property {key = "sessionId", value = "EFGH5678"}]}]
If I could not define a partial function in Haskell, I could not have used the Haskell filter function for this exercise, as it takes a single argument -- the predicate. This predicate (by now a single-argument function looking for an Event) is applied to each element of the Event List, and I get exactly what I wanted.

I really enjoyed creating this post. For those of you who are at the same level of FP understanding as I am, I hope you found this post both interesting and helpful.

 

From http://wayne-adams.blogspot.com/2011/11/haskell-from-oo-developers-perspective.html

Published at DZone with permission of Wayne Adams, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Tags:

Comments

Damien Lepage replied on Tue, 2011/11/08 - 7:18pm

Thanks for this post Wayne. I just wanted to glance at it but you caught me and I read entirely ;o)

I come from a Java background and started learning Clojure a few months ago. So I was familiar with all the FP basics you mentioned but I still believe any pure OO programmer would easily understand too. For my part, I now have an idea what Haskell code looks like and it's interesting to see. But I'll stick to Clojure because of the Java interop and the significant amount of time I already invested.

Anyway, great job! Keep posting your experiments.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.