
Davy Suvee is the founder of Datablend. He is currently working as an IT Lead/Software Architect in the Research and Development division of a large pharmaceutical company. Required to work with big and unstructured scientific data sets, Davy gathered hands-on expertise and insights into best practices for Big Data and NoSQL. Through Datablend, Davy aims to share his practical experience within a broader IT environment. Davy is a DZone MVB, is not an employee of DZone, and has posted 27 posts at DZone.

Running along the graph using Neo4J Spatial and Gephi

01.05.2012

When I started running some years ago, I bought a Garmin Forerunner 405. It’s a nifty little device that tracks GPS coordinates while you are running. After a run, the device can be synchronized by uploading your data to the Garmin Connect website. Based upon the tracked time and GPS coordinates, the Garmin Connect website provides you with a detailed overview of your run, including distance, average pace, elevation loss/gain and lap splits. It also visualizes your run, by overlaying the tracked course on Bing and/or Google maps. Pretty cool! One of my last runs can be found here.

Apart from simple aggregations such as total distance and average speed, the Garmin Connect website provides little or no support for gaining deeper insights into all of my runs. As I often run the same course, it would be interesting to calculate my average pace at specific locations. By combining the data of all of my courses, I could deduce which locations I encounter most frequently. Finally, could there be a correlation between my average pace and my distance from home? In order to come up with answers to these questions, I will import my running data into a Neo4J Spatial datastore. Neo4J Spatial extends the Neo4J Graph Database with the necessary tools and utilities to store and query spatial data in your graph models. For visualizing my running data, I will make use of Gephi, an open-source visualization and manipulation tool that allows users to interactively browse and explore graphs.

 

1. Extracting GPX data

The Garmin Connect website lets you download running data in various formats, including KML, TCX and GPX. GPX (the GPS Exchange Format) is a lightweight XML data format used for interchanging GPS data (waypoints, routes, and tracks) between applications and web services. Below, you can find a GPX extract enumerating several tracked points. Each of these points contains the GPS location, the elevation and the corresponding timestamp.

<trkpt lon="4.723870977759361" lat="51.075748661533">
    <ele>29.799999237060547</ele>
    <time>2011-11-08T19:18:39.000Z</time>
</trkpt>
<trkpt lon="4.724105251953006" lat="51.075623352080584">
    <ele>29.799999237060547</ele>
    <time>2011-11-08T19:18:45.000Z</time>
</trkpt>
<trkpt lon="4.724143054336309" lat="51.07560558244586">
    <ele>29.799999237060547</ele>
    <time>2011-11-08T19:18:46.000Z</time>
</trkpt>
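To make the structure of such an extract concrete, here is a minimal sketch that reads trkpt elements with the XML parser that ships with the JDK. The class GpxExtractSketch and its TrackPoint type are hypothetical names introduced for illustration only, and the extract is assumed to be wrapped in a single root element so that it is well-formed XML.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only; not the parser used elsewhere in this article.
final class GpxExtractSketch {

    // One tracked point: location, elevation and timestamp.
    static final class TrackPoint {
        final double lat;
        final double lon;
        final double elevation;
        final String time;

        TrackPoint(double lat, double lon, double elevation, String time) {
            this.lat = lat;
            this.lon = lon;
            this.elevation = elevation;
            this.time = time;
        }
    }

    // Parses every <trkpt> element found in the given (well-formed) XML string.
    static List<TrackPoint> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList trkpts = doc.getElementsByTagName("trkpt");
            List<TrackPoint> points = new ArrayList<>();
            for (int i = 0; i < trkpts.getLength(); i++) {
                Element trkpt = (Element) trkpts.item(i);
                points.add(new TrackPoint(
                        Double.parseDouble(trkpt.getAttribute("lat")),
                        Double.parseDouble(trkpt.getAttribute("lon")),
                        Double.parseDouble(childText(trkpt, "ele")),
                        childText(trkpt, "time")));
            }
            return points;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse GPX extract", e);
        }
    }

    // Returns the trimmed text of the first child element with the given tag name.
    private static String childText(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        // The extract is wrapped in a root element (an assumption made to keep the XML well-formed).
        String xml = "<trkseg>"
                + "<trkpt lon=\"4.723870977759361\" lat=\"51.075748661533\">"
                + "<ele>29.799999237060547</ele><time>2011-11-08T19:18:39.000Z</time></trkpt>"
                + "</trkseg>";
        for (TrackPoint p : parse(xml)) {
            System.out.println(p.lat + ", " + p.lon + " at " + p.time + " (elevation " + p.elevation + ")");
        }
    }
}
```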

Based upon this data, one is able to calculate various metrics, including pace. For this, we will use GPSdings, a Java library that provides the required functionality to extract and analyze GPX data. We start by reading in a GPX file. Afterwards, we analyze the content using the GPSdings TrackAnalyzer which, amongst other metrics, calculates the pace for each point that was tracked during a run. The information we need is stored in the first segment of the first track.

// Start by reading the file and analyzing its contents
Gpx gpx = GPSDings.readGPX(new FileInputStream(file));
TrackAnalyzer analyzer = new TrackAnalyzer();
analyzer.addAllTracks(gpx);
// The garmin GPX running data contains only one track containing one segment
Trkseg track = gpx.getTrk(0).getTrkseg(0);
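For readers who want to see what a pace calculation between two tracked points could look like, here is a rough sketch using the first two points of the GPX extract. It is not how GPSdings computes its metrics (I am not reproducing its internals here); it simply uses the haversine great-circle distance and the time difference. PaceSketch and its method names are made up for this illustration.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper for illustration; not part of GPSdings.
final class PaceSketch {

    private static final double EARTH_RADIUS_METERS = 6_371_000.0; // mean Earth radius

    // Great-circle (haversine) distance in meters between two lat/lon pairs given in degrees.
    static double haversineMeters(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_METERS * Math.asin(Math.sqrt(a));
    }

    // Speed in meters per second over the given distance and time interval.
    static double speedMetersPerSecond(double meters, Instant from, Instant to) {
        double seconds = Duration.between(from, to).toMillis() / 1000.0;
        return meters / seconds;
    }

    // Pace in minutes per kilometer for a given speed in meters per second.
    static double paceMinutesPerKm(double speedMetersPerSecond) {
        return (1000.0 / speedMetersPerSecond) / 60.0;
    }

    public static void main(String[] args) {
        // First two points of the GPX extract.
        double meters = haversineMeters(51.075748661533, 4.723870977759361,
                51.075623352080584, 4.724105251953006);
        Instant from = Instant.parse("2011-11-08T19:18:39.000Z");
        Instant to = Instant.parse("2011-11-08T19:18:45.000Z");
        double speed = speedMetersPerSecond(meters, from, to);
        System.out.printf("distance=%.1f m, speed=%.2f m/s, pace=%.2f min/km%n",
                meters, speed, paceMinutesPerKm(speed));
    }
}
```

Note that speed and pace are inverses of each other: a higher speed in meters per second means a lower pace in minutes per kilometer.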

2. Importing GPS data in Neo4J Spatial

Neo4J Spatial is built on top of Neo4J and provides support for spatial data. Once your data is stored, you can execute spatial operations, for instance to search for data within a specified region or within a specified distance of a particular point of interest. We start by setting up a Neo4J EmbeddedGraphDatabase. We then wrap it as a SpatialDatabaseService, which allows us to create an EditableLayer. The layer is Neo4J Spatial’s main abstraction and defines a collection of geometries. Each layer needs to be initialized with a specific GeometryEncoder, which acts as a kind of adapter that maps from the graph to the geometries and vice versa. In our case, we will employ the SimplePointEncoder.

// Create the graph db
graphDb = new EmbeddedGraphDatabase("var/geo");
// Wrap it as a spatial db service
spatialDb = new SpatialDatabaseService(graphDb);
// Create the layer to store our spatial data
runningLayer = (EditableLayer) spatialDb.getOrCreateLayer("running", SimplePointEncoder.class, EditableLayerImpl.class, "lon:lat");

Adding spatial data to the running layer is very easy. We start by creating a Coordinate for each point that is parsed by GPSdings. Next, we add this new coordinate to the running layer. This operation returns a SpatialDatabaseRecord which, under the hood, is just a regular Neo4J node. Hence, we can add any property we want to this node. In our case, we add two properties: speed, which indicates the (average) pace and is stored as the horizontal speed reported by GPSdings, and occurences (spelled this way in the code), which indicates the number of times this particular coordinate was encountered in the overall data set. Once the new coordinate is created, we connect the previous node to the newly created node through the NEXT relationship type. Hence, our graph is a chain of the encountered coordinates, interlinked through NEXT edges.

// Create a new coordinate for this point
Coordinate to = new Coordinate(track.getTrkpt(i).getLon().doubleValue(),track.getTrkpt(i).getLat().doubleValue());

// Add the new coordinate
SpatialDatabaseRecord torecord = runningLayer.add(runningLayer.getGeometryFactory().createPoint(to));
// Set the data accordingly
torecord.setProperty("speed", analyzer.getHorizontalSpeed(track.getTrkpt(i).getTime()));
torecord.setProperty("occurences", 1);

// Add relationship
Relationship next = fromrecord.getGeomNode().createRelationshipTo(torecord.getGeomNode(), RelTypes.NEXT);

In case a coordinate is encountered multiple times, we recalculate the average speed and increment the number of encounters.

// Recalculate average speed
double previousspeed  =  (Double)torecord.getProperty("speed");
int previousoccurences =  (Integer)torecord.getProperty("occurences");
double currentspeed = analyzer.getHorizontalSpeed(track.getTrkpt(i).getTime());
double denormalizespeed = previousspeed * previousoccurences;
double newspeed = ((denormalizespeed + currentspeed) / (previousoccurences + 1));
// Update the data accordingly
torecord.setProperty("speed",newspeed);
torecord.setProperty("occurences",previousoccurences+1);
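The update above is an incremental mean: the stored average is multiplied by the number of occurrences to recover the sum, the new value is added, and the result is divided by the new count. A small stand-alone sketch of the same arithmetic, without any Neo4J types, could look as follows. The class and method names are hypothetical, and the speeds are made-up sample values.

```java
// Hypothetical helper for illustration; mirrors the arithmetic of the snippet above.
final class RunningAverageSketch {

    // Returns the new mean after adding one value to a mean built from previousCount values.
    static double updatedMean(double previousMean, int previousCount, double newValue) {
        double denormalized = previousMean * previousCount; // recover the sum of earlier values
        return (denormalized + newValue) / (previousCount + 1);
    }

    public static void main(String[] args) {
        double[] speeds = {3.0, 4.0, 5.0}; // made-up speeds at one coordinate
        double mean = speeds[0];
        int count = 1;
        for (int i = 1; i < speeds.length; i++) {
            mean = updatedMean(mean, count, speeds[i]);
            count++;
        }
        System.out.println("average speed " + mean + " over " + count + " occurrences");
    }
}
```

Storing only the mean and the count keeps each node small: the individual speeds never need to be kept around.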

Unfortunately, the chances of encountering an already existing coordinate are low, as coordinates in a GPX file are recorded with up to 15 digits after the decimal point. Instead of trying to round these coordinates ourselves, we will use the Neo4J Spatial querying API. A simple nearest-neighbor search limited to 20 meters (passed as 0.02, i.e. in kilometers, in the code below) allows us to find matching coordinates. (I chose 20 meters because it is a little above the average distance between two consecutive coordinates.) If we find a coordinate within this 20-meter range, we reuse it. Otherwise, we just create a new coordinate. The full algorithm for importing multiple GPX datasets can be found below.

// Import the data from a GPX file. firsttime is true when no data has been imported yet
public void addData(File file, boolean firsttime) throws IOException, FunctionEvaluationException {

    // Start by reading the file and analyzing its contents
    Gpx gpx = GPSDings.readGPX(new FileInputStream(file));
    TrackAnalyzer analyzer = new TrackAnalyzer();
    analyzer.addAllTracks(gpx);
    // The Garmin GPX running data contains only one track containing one segment
    Trkseg track = gpx.getTrk(0).getTrkseg(0);

    // Start a new transaction
    Transaction tx = graphDb.beginTx();
    try {
        // Contains the record that was added previously (in order to create a relation between the new and the previous node)
        SpatialDatabaseRecord fromrecord = null;

        // Iterate all points
        for (int i = 0; i < track.getTrkptCount(); i++) {

            // Create a new coordinate for this point
            Coordinate to = new Coordinate(track.getTrkpt(i).getLon().doubleValue(),track.getTrkpt(i).getLat().doubleValue());

            // Check whether we can find a node that is located within a distance of 20 meters (0.02 km)
            List<GeoPipeFlow> closests = 
                GeoPipeline.startNearestNeighborLatLonSearch(runningLayer, to, 0.02).sort("OrthodromicDistance").getMin("OrthodromicDistance").toList();
            SpatialDatabaseRecord torecord = null;

            // If first time, we add all nodes. Otherwise, we check whether we find a node that is close enough to the current location
            if (!firsttime && (closests.size() == 1)) {
                // Retrieve the node
                System.out.println("Using existing: " + closests.get(0).getProperty("OrthodromicDistance"));
                torecord = closests.get(0).getRecord();
                // Recalculate average speed
                double previousspeed  =  (Double)torecord.getProperty("speed");
                int previousoccurences =  (Integer)torecord.getProperty("occurences");
                double currentspeed = analyzer.getHorizontalSpeed(track.getTrkpt(i).getTime());
                double denormalizespeed = previousspeed * previousoccurences;
                double newspeed = ((denormalizespeed + currentspeed) / (previousoccurences + 1));
                // Update the data accordingly
                torecord.setProperty("speed",newspeed);
                torecord.setProperty("occurences",previousoccurences+1);
            }
            else {
                // New node, add it
                torecord = runningLayer.add(runningLayer.getGeometryFactory().createPoint(to));
                // Set the data accordingly
                torecord.setProperty("speed", analyzer.getHorizontalSpeed(track.getTrkpt(i).getTime()));
                torecord.setProperty("occurences", 1);
            }

            // If a previous node is available (and they are not identical), add a directed relationship between both
            if (fromrecord != null && (!fromrecord.equals(torecord)))  {
                fromrecord.getGeomNode().createRelationshipTo(torecord.getGeomNode(), RelTypes.NEXT);
            }
            // The current record becomes the previous record for the next iteration
            fromrecord = torecord;
        }

        // Mark the transaction as successful
        tx.success();
    }
    finally {
        // Commit (or roll back if an exception occurred) and release the transaction
        tx.finish();
    }

}
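To show the matching logic in isolation, here is a simplified in-memory sketch of the same idea. It deliberately swaps the Neo4J Spatial nearest-neighbor query for a plain linear scan with a haversine distance, so it only illustrates the algorithm and says nothing about how the spatial index works. Unlike addData, it always tries to merge (there is no firsttime flag) and it does not create NEXT edges. All names in it are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for the running layer; a linear scan replaces the spatial index.
final class CoordinateMergeSketch {

    // A stored coordinate with its average speed and number of occurrences.
    static final class Node {
        final double lat;
        final double lon;
        double speed;
        int occurrences = 1;

        Node(double lat, double lon, double speed) {
            this.lat = lat;
            this.lon = lon;
            this.speed = speed;
        }
    }

    private final List<Node> nodes = new ArrayList<>();
    private final double thresholdMeters;

    CoordinateMergeSketch(double thresholdMeters) {
        this.thresholdMeters = thresholdMeters;
    }

    // Reuses the nearest node within the threshold (updating its average speed), or adds a new one.
    Node addPoint(double lat, double lon, double speed) {
        Node nearest = null;
        double best = Double.MAX_VALUE;
        for (Node candidate : nodes) {
            double distance = distanceMeters(candidate.lat, candidate.lon, lat, lon);
            if (distance <= thresholdMeters && distance < best) {
                nearest = candidate;
                best = distance;
            }
        }
        if (nearest == null) {
            nearest = new Node(lat, lon, speed);
            nodes.add(nearest);
        } else {
            nearest.speed = (nearest.speed * nearest.occurrences + speed) / (nearest.occurrences + 1);
            nearest.occurrences++;
        }
        return nearest;
    }

    int size() {
        return nodes.size();
    }

    // Haversine distance in meters, using a mean Earth radius of 6,371 km.
    static double distanceMeters(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * 6_371_000.0 * Math.asin(Math.sqrt(a));
    }

    public static void main(String[] args) {
        CoordinateMergeSketch layer = new CoordinateMergeSketch(20.0);
        layer.addPoint(51.0757, 4.7238, 3.0);
        layer.addPoint(51.07571, 4.72381, 4.0); // roughly a meter away: merged into the first node
        layer.addPoint(51.0800, 4.7238, 5.0);   // several hundred meters away: becomes a new node
        System.out.println("stored nodes: " + layer.size());
    }
}
```

A linear scan is fine for a handful of runs, but it visits every stored coordinate for every new point, which is exactly the cost a spatial index is meant to avoid.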

3. Visualizing running data

By using the Neo4J Spatial querying API, we are able to retrieve the set of coordinates that satisfy a particular condition. However, a raw list of coordinates is hard to interpret. Instead, we will use the excellent Gephi graph visualization and exploration tool. By installing the Gephi Neo4J plugin, we are able to load and explore graphs that are stored in a Neo4J (Spatial) datastore. Let’s start by importing our dataset into Gephi.

[Screenshot: the imported dataset in Gephi]

The displayed graph contains other types of nodes and edges (i.e. Layer and RTree index information), in addition to the coordinates and NEXT edges that we added ourselves. Let’s get rid of those by filtering our graph on the NEXT relationship-type.

[Screenshot: the graph in Gephi, filtered on the NEXT relationship type]

 

Only half of the edges remain … However, we will still not gain novel insights from this mess. Let’s lay out our graph using the Gephi GeoLayout plugin. This layout takes a geocoded graph as input and positions its nodes according to their geocoded attributes. Make sure to increase the scaling, as our coordinates are located close together. Cool! This view clearly outlines the courses I’m running.

[Screenshot: the graph in Gephi after applying the GeoLayout]

Let’s visualize the coordinates that were frequently encountered during the 4 runs that are imported in the Neo4J Spatial datastore. For this, we will use the InDegree node property, which indicates the number of incoming edges for each coordinate. We rank node weight (i.e. node size) through this property. Hence, frequently encountered nodes will show up bigger. In my case, frequently encountered coordinates are found around the place where I live (and hence start my runs) and on street intersections.

[Screenshot: the graph in Gephi, with nodes sized by InDegree]
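Gephi computes the InDegree for us, but the underlying calculation is simple enough to sketch: count, for every node, how many NEXT edges point at it. The snippet below does this for a plain list of edges. The class name and the node ids are hypothetical, and parallel edges between the same two nodes are counted separately.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; Gephi provides this metric out of the box.
final class InDegreeSketch {

    // Each edge is a {fromId, toId} pair; returns the number of incoming edges per target node.
    static Map<Long, Integer> inDegrees(long[][] nextEdges) {
        Map<Long, Integer> result = new HashMap<>();
        for (long[] edge : nextEdges) {
            result.merge(edge[1], 1, Integer::sum); // edge[1] is the node the edge points to
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up ids: two different courses both pass through node 2 before continuing to node 4.
        long[][] edges = {{1, 2}, {3, 2}, {2, 4}};
        System.out.println(inDegrees(edges));
    }
}
```

Nodes that are reached from several different predecessors, such as street intersections, end up with a higher in-degree than nodes in the middle of a single stretch.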

Let’s do one final analysis, namely a visualization that illustrates the average pace throughout all runs. For this, we rank both node weight and node color through the speed property. Hence, coordinates with a high average speed are colored green and show up bigger. Coordinates with a low average speed are colored red and show up smaller. In the blink of an eye, I can now interpret my average pace, taking into account my overall running data set!

[Screenshot: the graph in Gephi, with nodes sized and colored by the speed property]

 

4. Conclusion

This article describes the use of the Neo4J Spatial datastore and Gephi to analyze Garmin running data. As always, the complete source code can be found on the Datablend public GitHub repository. Any ideas for other types of analysis that could be performed on the dataset?


Source:  http://datablend.be/?p=1255

Published at DZone with permission of Davy Suvee, author and DZone MVB.
