Mark is a graph advocate and field engineer for Neo Technology, the company behind the Neo4j graph database. As a field engineer, Mark helps customers embrace graph data and Neo4j, building sophisticated solutions to challenging data problems. When he's not with customers, Mark is a developer on Neo4j and writes about his experiences of being a graphista on a popular blog at http://markhneedham.com/blog. He tweets at @markhneedham. Mark is a DZone MVB and is not an employee of DZone and has posted 492 posts at DZone.

Java/Scala: Runtime.exec hanging/in ‘pipe_w’ state

11.24.2011

On the system that I’m currently working on we have a data ingestion process which needs to take zip files, unzip them and then import their contents into the database.

As a result we delegate from Scala code to the system unzip command like so:

def extract(): Unit = {
  val command = "unzip %s -d %s" format("/file/to/unzip.zip", "/place/to/unzip/to")
  var process: Process = null

  try {
    process = Runtime.getRuntime.exec(command)
    // nothing reads the output of unzip here, so waitFor can block forever
    val exitCode = process.waitFor
  } catch {
    case e : Exception => // do some stuff
  } finally {
    // close the stream here
  }
}

 

We ran into a problem where the unzipping process was hanging, and executing ‘ps’ showed us that the ‘unzip’ process was stuck in the ‘pipe_w’ (pipe waiting) state, which suggested that it was waiting for some sort of input.

After a bit of googling Duncan found a blog post which explained that we needed to consume the output stream from our process, otherwise it might end up hanging.

a.k.a. RTFM:

The Runtime.exec methods may not work well for special processes on certain native platforms, such as native windowing processes, daemon processes, Win16/DOS processes on Microsoft Windows, or shell scripts.

The created subprocess does not have its own terminal or console. All its standard io (i.e. stdin, stdout, stderr) operations will be redirected to the parent process through three streams (Process.getOutputStream(), Process.getInputStream(), Process.getErrorStream()).

The parent process uses these streams to feed input to and get output from the subprocess.

Because some native platforms only provide limited buffer size for standard input and output streams, failure to promptly write the input stream or read the output stream of the subprocess may cause the subprocess to block, and even deadlock.

For most of the zip files we presumably hadn’t been reaching the limit of the buffer because the list of files being sent to STDOUT by ‘unzip’ wasn’t that long.

In order to get around the problem we needed to gobble up the output stream from unzip like so:

import org.apache.commons.io.IOUtils

def extract(): Unit = {
  val command = "unzip %s -d %s" format("/file/to/unzip.zip", "/place/to/unzip/to")
  var process: Process = null

  try {
    process = Runtime.getRuntime.exec(command)
    // read stdout to the end before waiting so that unzip never blocks on a full pipe
    val thisVariableIsNeededToSuckDataFromUnzipDoNotRemove = "Output: " + IOUtils.readLines(process.getInputStream)
    val exitCode = process.waitFor
  } catch {
    case e : Exception => // do some stuff
  } finally {
    // close the stream here
  }
}

We need to do the same thing with the error stream in case ‘unzip’ ends up overflowing that buffer as well.
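One way to cover the error stream without reading it separately is to merge it into standard output, so that a single read loop drains everything. The following is only a rough sketch of that idea, not what we ended up running: it swaps Runtime.exec for java.lang.ProcessBuilder and its redirectErrorStream option, and the object name MergedOutputExample and the method run are made up for illustration.

```scala
import java.io.{BufferedReader, InputStreamReader}

object MergedOutputExample {
  // Hypothetical helper: runs a command with stderr merged into stdout and
  // drains the combined stream before waiting for the exit code.
  def run(command: String*): (Int, String) = {
    val builder = new ProcessBuilder(command: _*)
    builder.redirectErrorStream(true) // stderr now arrives on getInputStream
    val process = builder.start()
    process.getOutputStream.close()   // we have nothing to send to the child's stdin

    val reader = new BufferedReader(new InputStreamReader(process.getInputStream))
    val output = new StringBuilder
    try {
      var line = reader.readLine()
      while (line != null) {          // readLine returns null once the child closes its output
        output.append(line).append('\n')
        line = reader.readLine()
      }
    } finally {
      reader.close()
    }
    (process.waitFor(), output.toString)
  }
}
```

Something like MergedOutputExample.run("unzip", "/file/to/unzip.zip", "-d", "/place/to/unzip/to") would then return the exit code together with everything ‘unzip’ printed. The trade-off is that normal output and error output can no longer be told apart.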

On a couple of blog posts that we came across it was suggested that we should ‘gobble up’ the output and error streams on separate threads but we weren’t sure why exactly that was considered necessary…

If anyone knows then please let me know in the comments.

 

From http://www.markhneedham.com/blog/2011/11/20/javascala-runtime-exec-hangingin-pipe_w-state

Published at DZone with permission of Mark Needham, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Silvio Bierman replied on Thu, 2011/11/24 - 3:13am

One reason is that you should read the data from both stdout and stderr. Since there is no way in general to know in what order, or how much, you need to read from each stream, the process could block writing to one stream while you are trying to read from the other. So you would need two threads for that.
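A minimal sketch of the two-thread approach described in this reply, assuming plain java.lang.Thread and scala.io.Source are acceptable. The names StreamGobblerExample, Gobbler and run are hypothetical and do not come from the article or from any library.

```scala
import java.io.InputStream
import scala.io.Source

object StreamGobblerExample {
  // Hypothetical helper: reads one stream to the end on its own thread.
  final class Gobbler(in: InputStream) extends Thread {
    @volatile private var text = ""
    override def run(): Unit = {
      val source = Source.fromInputStream(in)
      try text = source.mkString finally source.close()
    }
    def result: String = text
  }

  // Runs a command and returns (exit code, stdout, stderr).
  def run(command: String*): (Int, String, String) = {
    val process = new ProcessBuilder(command: _*).start()
    process.getOutputStream.close() // nothing to send to the child's stdin

    // Both streams are drained concurrently, so the child cannot block on
    // one pipe while we are busy reading the other.
    val out = new Gobbler(process.getInputStream)
    val err = new Gobbler(process.getErrorStream)
    out.start()
    err.start()

    val exitCode = process.waitFor()
    out.join() // wait until both readers have seen end of stream
    err.join()
    (exitCode, out.result, err.result)
  }
}
```

Reading stdout to the end and only then reading stderr on a single thread would still hang if the child filled the stderr pipe first, which is the case the two threads are meant to avoid.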

Arnaud Des_vosges replied on Thu, 2011/11/24 - 3:58am

The classic StreamGobbler trick: http://www.javaworld.com/javaworld/jw-12-2000/jw-1229-traps.html?page=4

 See also Commons Exec: http://commons.apache.org/exec/apidocs/org/apache/commons/exec/StreamPumper.html

 

 

Biju Scaria replied on Thu, 2011/11/24 - 5:57am

An alternative would be to use the zip library available in java itself: http://java.sun.com/developer/technicalArticles/Programming/compression/
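As a rough illustration of this suggestion, the following sketch unzips an archive with java.util.zip.ZipInputStream from Scala, so no external process is involved. The object name UnzipExample and the method unzip are hypothetical, and the check against entries that point outside the target directory is an added precaution rather than something the article discusses.

```scala
import java.io.{File, FileInputStream, FileOutputStream}
import java.util.zip.ZipInputStream

object UnzipExample {
  // Hypothetical helper: extracts every entry of zipFile below targetDir.
  def unzip(zipFile: File, targetDir: File): Unit = {
    val root = targetDir.getCanonicalFile
    val zip = new ZipInputStream(new FileInputStream(zipFile))
    try {
      var entry = zip.getNextEntry
      while (entry != null) {
        val target = new File(root, entry.getName).getCanonicalFile
        // Refuse entries such as "../x" that would land outside the target directory.
        require(target.toPath.startsWith(root.toPath), "bad entry: " + entry.getName)
        if (entry.isDirectory) {
          target.mkdirs()
        } else {
          target.getParentFile.mkdirs()
          val out = new FileOutputStream(target)
          try {
            // Copy the bytes of the current entry to the target file.
            val buffer = new Array[Byte](8192)
            var read = zip.read(buffer)
            while (read != -1) {
              out.write(buffer, 0, read)
              read = zip.read(buffer)
            }
          } finally {
            out.close()
          }
        }
        entry = zip.getNextEntry
      }
    } finally {
      zip.close()
    }
  }
}
```

A call such as UnzipExample.unzip(new File("/file/to/unzip.zip"), new File("/place/to/unzip/to")) would take the place of the Runtime.exec call, and with it the need to drain any process streams.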

Christian Schli... replied on Thu, 2011/11/24 - 8:40am

I wouldn't call an external process for such a simple task but instead use TrueZIP. Here's your use case: http://truezip.java.net/usecases/sbt.html 

 
