Big Data/Analytics Zone is brought to you in partnership with:

Mark is a graph advocate and field engineer for Neo Technology, the company behind the Neo4j graph database. As a field engineer, Mark helps customers embrace graph data and Neo4j building sophisticated solutions to challenging data problems. When he's not with customers Mark is a developer on Neo4j and writes his experiences of being a graphista on a popular blog at http://markhneedham.com/blog. He tweets at @markhneedham. Mark is a DZone MVB and is not an employee of DZone and has posted 543 posts at DZone. You can read more from them at their website. View Full User Profile

Thoughts on Software Development R: Apply a Custom Function Across Multiple Lists

07.17.2014
| 3466 views |
  • submit to reddit

In my continued playing around with R I wanted to map a custom function over two lists comparing each item with its corresponding items.

If we just want to use a built in function such as subtraction between two lists it’s quite easy to do:

> c(10,9,8,7,6,5,4,3,2,1) - c(5,4,3,4,3,2,2,1,2,1)
 [1] 5 5 5 3 3 3 2 2 0 0

I wanted to do a slight variation on that where instead of returning the difference I wanted to return a text value representing the difference e.g. ’5 or more’, ’3 to 5′ etc.

I spent a long time trying to figure out how to do that before finding an excellent blog post which describes all the different ‘apply’ functions available in R.

As far as I understand ‘apply’ is the equivalent of ‘map’ in Clojure or other functional languages.

In this case we want the mapply variant which we can use like so:

> mapply(function(x, y) { 
    if((x-y) >= 5) {
        "5 or more"
    } else if((x-y) >= 3) {
        "3 to 5"
    } else {
        "less than 5"
    }    
  }, c(10,9,8,7,6,5,4,3,2,1),c(5,4,3,4,3,2,2,1,2,1))
 [1] "5 or more"   "5 or more"   "5 or more"   "3 to 5"      "3 to 5"      "3 to 5"      "less than 5"
 [8] "less than 5" "less than 5" "less than 5"

We could then pull that out into a function if we wanted:

summarisedDifference <- function(one, two) {
  mapply(function(x, y) { 
    if((x-y) >= 5) {
      "5 or more"
    } else if((x-y) >= 3) {
      "3 to 5"
    } else {
      "less than 5"
    }    
  }, one, two)
}

which we could call like so:

> summarisedDifference(c(10,9,8,7,6,5,4,3,2,1),c(5,4,3,4,3,2,2,1,2,1))
 [1] "5 or more"   "5 or more"   "5 or more"   "3 to 5"      "3 to 5"      "3 to 5"      "less than 5"
 [8] "less than 5" "less than 5" "less than 5"

I also wanted to be able to compare a list of items to a single item which was much easier than I expected:

> summarisedDifference(c(10,9,8,7,6,5,4,3,2,1), 1)
 [1] "5 or more"   "5 or more"   "5 or more"   "5 or more"   "5 or more"   "3 to 5"      "3 to 5"     
 [8] "less than 5" "less than 5" "less than 5"

If we wanted to get a summary of the differences between the lists we could plug them into ddply like so:

> library(plyr)
> df = data.frame(x=c(10,9,8,7,6,5,4,3,2,1), y=c(5,4,3,4,3,2,2,1,2,1))
> ddply(df, .(difference=summarisedDifference(x,y)), summarise, count=length(x))
   difference count
1      3 to 5     3
2   5 or more     3
3 less than 5     4
Published at DZone with permission of Mark Needham, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)