Several years of experience in designing and building products. My roles have spanned from writing code to leading development for core technical features of Java EE App Server (Container Development), Business Intelligence Server, Web Service Stack, and various collaboration tools. Presently work on development a BI Search Engine for Cognos Software at IBM. Committer on the open source Apache Web Services Stack – Apache CXF. Member on the JSR 303 expert group - Bean Validation spec. Bharath has posted 3 posts at DZone. View Full User Profile

Before You Say "It's Not Working"

07.14.2009
| 10841 views |
  • submit to reddit

A few days back a couple of interns working in my office came up to me and said "It's not working". They were talking about a part of an application they had built. Their application was invoking an API given out by my team and it was throwing an error. They were invoking one of our Web Services through a client side Java API provided by us. Their app was running on Tomcat and that was accessing our product that runs on another Tomcat server. So they have access to both these servers. After we exchanged a few IM messages we solved the issue - but I am sure that if they once again face any issue, they would not immediately ping and ask "Why is it not working?".


After we figured out the issue, I was thinking of sending a note to the interns on the first step they need to do when they face any such issue and I ended up writing this.

Check the Logs. Repeat- Check the Logs.

From what I have seen 99.99% issues get solved the moment you Check the Logs. And Yes, many people don't do that before they report the issue. In the present case, the WS client in the SDK was throwing an exception and it said "An Error occurred with the client". So when they were seeing this, they actually could not make out anything. But that was at the client side. The first thing anyone should do is check the server logs. Server logs in this case would mean the Application logs on the App Server where the service is hosted and the Application server's own logs.

Many times you see that the logs are way too much to deduce anything. They run into pages. And I have seen, this very fact demotivates any newbie who looks at it. In that case, If the issue can be replicated I suggest -

  • Shutdown the server and client
  • Backup your logs
  • Cleanup the logs directory
  • Restart the servers and reproduce the issue


Now the logs are much lesser and comfortable to read. Also it would be helpful if you can clearly separate the server start-up logs from the subsequent logging that happened during the issue replication. Now read that line by line and most of the times the issue would be solved. If not, read every line of the logs few more times. Yes, Few more times.

Still unable to figure out?

Check the logging level

Every logging framework provides you various logging levels, that you can configure. Most of them have a "debug" logging mode, which when enabled- logs the most granular details. Enable this level of logging on the application and the server. Well written applications log their code flow on to the debug level. Hence with this level of logging set, there is a very high chance that you get to reach the root of the problem.
Even if you are not able to figure out the issue even after looking at the debug level logs, you still have helped the person who has to take a look at them. You have reduced the turnaround time. These debug logs will definitely help the API/Service developer who is going to look at your issue. He now has most of the information with him.

Not everything on the Server - There's client too

Most Apps are like this - You write a client program that accesses a service running on a server. The Application that runs on the server would also provide client side API's that would do the task of marshalling your arguments and passing it on the wire through SOAP/RMI etc. A whole lot of things happen between the time you invoke the client API and the time the message is sent on the wire (Goes out of this VM). The same is the case when you get the return message - from the time the message comes through the wire and till the time it's handed over to your client program, in the format it accepts.  Typically these are marshalling/unmarshalling tasks carried out by the client side API's. Now the concern is - what happens if something goes wrong here?

Most times we ignore the logging/trace on the client side. Yes, on the client side too you could configure the logging level. The way you do this depends on the logging framework that the API you are accessing uses. Most times you would have to place a logging config properties file in the client classpath. The logging framework on the client API that you use, loads this file and determines the level of logging and the appender/handler to which logs have to be written.
So once you have the client side logs, you would be able to figure out what really was sent from the client, what it received and other info related to marshalling/unmarshalling etc. One suggestion that I have here  - In case you have access to both the client and server ensure that both of them have the same system time. With this, you could easily track the flow from client to server and the way back with respect to time.

All these really help even if you are debugging an issue on something that you have developed or are developing - Something that can be done before debugging your code by placing breakpoints or even worse - by adding those System.out statements.

Too many threads?

Imagine the case where you have too many threads in your application that are doing their work simultaneously. The log records might be jumbled together. In such a scenario, to be able to uniquely identify and track the flow of one particular thread, you could have an identifier in the log record that contains the thread name.

Logging only failures?

A lot of times when your application is running on production, it would not be a very good idea to log all of the threads on the server side. You would be interested in logging only the failures. In such a case, what you typically do is - Accumulate the log records for each thread and at a later point of time depending on the success or failure of that thread, you decide whether to write its records to the log files or not. Note that these records are in the memory for the whole duration of the operation - and that could be quite expensive.

Beware

Most application servers would have their own logging framework which logs the tasks performed by the server. In certain logging frameworks you need to tell the framework when to start and stop logging on a thread. So when the server assigns a particular thread from a pool to your request it would do the job of telling the framework to start the logging. So If you explicitly create threads on an application server, these threads might not be logging information at all, though your code might have log statements.

Another important thing to note is not all issues that you encounter will get replicated even one more time. A lot of times I have seen that the moment you change the logging level or add a new log statement the issue goes away. Often this the case with issues related to multi-threaded programs, where even one new instruction would change the code dynamics or timing.

Solved. What should I do?

 If a log information really helped you to better understand an issue, you then have a very important learning. You too should "Log" in your application. Many times I end up spending more time determining what should be logged and at what levels than the time I take to write the code. After all, logging is an art.

Source: http://thoughts.bharathganesh.com

Published at DZone with permission of its author, Bharath Ganesh.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Developer Dude replied on Tue, 2009/07/14 - 1:18pm

Some logging frameworks like Log4J allow you to setup your logging such that you can have different statements go to different logs. This is helpful as it is easier to find things. I personally like to have security events go to a separate log from say debug events. So, a stable deployed server app (or suite of apps, or services, or whatever) could have separate log files for status, security events, debug, errors - and these will be much easier to parse (manually or automatically) - it just takes a little configuration and maybe writing a simple filter or two. Especially true if your debug logging statements are going to contain dumps of XML, or JSON, or whatever - it can get messy (and large) real quick.

I have noticed some projects (like JasperServer) don't make use of even the basic built-in features of a log framework. Something as simple as adding the date to the statement really helps, as does having a 'rolling' appender where each day gets a new log file. Simple things like that make it much easier to find problems. Fortunately it was easy for me to change the log configuration, but it should have been the default.

As for client/server interactions, I am an advocate of attaching a GUID to an error log statement that gets reported to a client. That GUID is displayed in the client with the error message so that the user can report this back to their admin and the admin or dev can look up which statement is tied to the error and not just guess based on the poor recollection of the user as to what date/time the problem happened. Even if they have the exact correct time, a lot of stuff could be going on and the log statement for a given time might not at all be related to the error the client sees.

Andreou Dimitris replied on Wed, 2009/07/15 - 4:06am

Or the service could send back a legible error, instead of hiding the cause of the problem in the logs. At least with jax-ws, a stack trace can be sent back (not one that necessarily contains db access info that is :)).

Developer Dude replied on Fri, 2009/08/07 - 6:55pm

Andreou,

There is a tension between giving the user such info as a stack trace and just telling them something went wrong. It depends a lot on your audience. If they are technical then they may appreciate a stack trace. If not, then not - some users panic when they see anything beyond a simple error. I have more than once been asked to really dumb it down to the point of telling them nothing except to see their admin - and I can see their point.

Either way, having a GUID attached that can allow the person diagnosing the problem to look for a specific log entry can really help, whether that person is the user of some poor person in operations at the server backend.

As for a legible error, yes, you can often map some error code to some kind of user friendly message. That is another desirable thing to do, but the GUID is still helpful in these cases for drilling down to get more information when needed.

Unfortunately, I rarely see dev teams go to this extent - these kinds of niceties are low priority for them. But it is like unit testing - if you spend the time to provide this kind of traceability and diagnostic info, you can spend less time running in circles trying to figure out a problem when it is encountered and therefore less time fixing it, and more time writing code. Laying down this infrastructure pays dividends in the long run, but few dev teams (or businesses) are managed that way - so we muddle along. I mean this isn't rocket science or something extremely controversial here - it isn't like we are talking about AOP v. OOP v. DDD here, just a simple robust logging setup.

*sigh*

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.