Big Data/Analytics Zone is brought to you in partnership with:

Mark is a graph advocate and field engineer for Neo Technology, the company behind the Neo4j graph database. As a field engineer, Mark helps customers embrace graph data and Neo4j building sophisticated solutions to challenging data problems. When he's not with customers Mark is a developer on Neo4j and writes his experiences of being a graphista on a popular blog at http://markhneedham.com/blog. He tweets at @markhneedham. Mark is a DZone MVB and is not an employee of DZone and has posted 522 posts at DZone. You can read more from them at their website. View Full User Profile

R: Order by Data Frame Column and Take Top 10 Rows

07.07.2014
| 2145 views |
  • submit to reddit

I’ve been doing some ad-hoc analysis of the Neo4j London meetup group using R and Neo4j and having worked out how to group by certain keys the next step was to order the rows of the data frame.

I wanted to drill into the days on which people join the group and see whether they join it at a specific time of day. My feeling was that most people would join on a Monday morning.

The first step was to run the query using RNeo4j and then group by day and hour:

library(Rneo4j)
 
query = "MATCH (:Person)-[:HAS_MEETUP_PROFILE]->()-[:HAS_MEMBERSHIP]->(membership)-[:OF_GROUP]->(g:Group {name: \"Neo4j - London User Group\"})
         RETURN membership.joined AS joinDate"
 
timestampToDate <- function(x) as.POSIXct(x / 1000, origin="1970-01-01")
 
meetupMembers = cypher(graph, query)
meetupMembers$joined <- timestampToDate(meetupMembers$joinDate)
 
groupBy = function(dates, format) {
  dd = aggregate(dates, by= list(format(dates, format)), function(x) length(x))
  colnames(dd) = c("key", "count")
  dd
}
 
byDayTime = groupBy(meetupMembers$joined, "%A %H:00")

This returned quite a few rows so we’ll just display a subset of them:

> byDayTime[12:25,]
            key count
12 Friday 14:00    12
13 Friday 15:00     8
14 Friday 16:00    11
15 Friday 17:00    10
16 Friday 18:00     3
17 Friday 19:00     1
18 Friday 20:00     3
19 Friday 21:00     4
20 Friday 22:00     7
21 Friday 23:00     2
22 Monday 00:00     3
23 Monday 01:00     1
24 Monday 03:00     1
25 Monday 05:00     3

The next step was to order by the ‘count’ column which wasn’t too difficult:

> byDayTime[order(byDayTime$count),][1:10,]
              key count
2    Friday 03:00     1
3    Friday 04:00     1
4    Friday 05:00     1
5    Friday 07:00     1
17   Friday 19:00     1
23   Monday 01:00     1
24   Monday 03:00     1
46 Saturday 03:00     1
66   Sunday 06:00     1
67   Sunday 07:00     1

If we run the order function on its own we’ll see that it returns the order in which the current rows in the data frame should appear:

> order(byDayTime$count)
  [1]   2   3   4   5  17  23  24  46  66  67 109 128 129   1  21  44  47  48  81  86  87  88 108 130  16  18  22  25  45  53  64  71  75 107  19  26  49  51  55  56  58  59  61
 [44]  65  68  77  79  85 106 110 143  50  52  54  82  84 101 127 146  27  57  60  62  63  69  70  73  99 103 126 145   6  20  76  83  89 105 122 131 144   7  13  40  43  72  80
 [87] 102  39  78 100 132 147  15  94 121 123 142  14  42  74 104 137 140  12  38  92  93 111 124   8   9  11  90  96 125 139  10  32  34  36  95  97  98  28 135 136  33  35 112
[130] 113 116 134  91 141  41 115 120 133  37 119 138  31 117 118  30 114  29

The first 4 rows in our sorted data frame will be rows 2-5 from the initial data frame, which are:

           key count
2 Friday 03:00     1
3 Friday 04:00     1
4 Friday 05:00     1
5 Friday 07:00     1

So that makes sense! In our case we want to sort in descending order which we can do by prefixing the sorting variable with a minus sign:

> byDayTime[order(-byDayTime$count),][1:10,]
                key count
29     Monday 09:00    34
30     Monday 10:00    28
114   Tuesday 11:00    28
31     Monday 11:00    27
117   Tuesday 14:00    27
118   Tuesday 15:00    27
138 Wednesday 14:00    23
119   Tuesday 16:00    22
37     Monday 17:00    21
115   Tuesday 12:00    20

As expected Monday morning makes a strong showing although Tuesday afternoon is also popular which is unexpected. We’ll need to do some more investigation to figure out what’s going on there.

Published at DZone with permission of Mark Needham, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)