Before You Say "It's Not Working"
A few days back a couple of interns working in my office came up to me and said "It's not working". They were talking about a part of an application they had built. Their application was invoking an API given out by my team and it was throwing an error. They were invoking one of our Web Services through a client side Java API provided by us. Their app was running on Tomcat and that was accessing our product that runs on another Tomcat server. So they have access to both these servers. After we exchanged a few IM messages we solved the issue - but I am sure that if they once again face any issue, they would not immediately ping and ask "Why is it not working?".
After we figured out the issue, I was thinking of
sending a note to the interns on the first step they need to do when
they face any such issue and I ended up writing this.
Check the Logs. Repeat- Check the Logs.
From
what I have seen 99.99% issues get solved the moment you Check the
Logs. And Yes, many people don't do that before they report the issue.
In the present case, the WS client in the SDK was throwing an exception
and it said "An Error occurred with the client". So when they
were seeing this, they actually could not make out anything. But that
was at the client side. The first thing anyone should do is check the
server logs. Server logs in this case would mean the Application logs
on the App Server where the service is hosted and the Application
server's own logs.
Many times you see that the logs are way
too much to deduce anything. They run into pages. And I have seen, this
very fact demotivates any newbie who looks at it. In that case, If the
issue can be replicated I suggest -
- Shutdown the server and client
- Backup your logs
- Cleanup the logs directory
- Restart the servers and reproduce the issue
Now
the logs are much lesser and comfortable to read. Also it would be
helpful if you can clearly separate the server start-up logs from the
subsequent logging that happened during the issue replication. Now read
that line by line and most of the times the issue would be solved. If
not, read every line of the logs few more times. Yes, Few more times.
Still unable to figure out?
Check the logging level
Every
logging framework provides you various logging levels, that you can
configure. Most of them have a "debug" logging mode, which when
enabled- logs the most granular details. Enable this level of logging
on the application and the server. Well written applications log their
code flow on to the debug level. Hence with this level of logging set,
there is a very high chance that you get to reach the root of the
problem.
Even if you are not able to figure out the issue even
after looking at the debug level logs, you still have helped the person
who has to take a look at them. You have reduced the turnaround time.
These debug logs will definitely help the API/Service developer who is
going to look at your issue. He now has most of the information with
him.
Not everything on the Server - There's client too
Most
Apps are like this - You write a client program that accesses a service
running on a server. The Application that runs on the server would also
provide client side API's that would do the task of marshalling your
arguments and passing it on the wire through SOAP/RMI etc. A whole lot
of things happen between the time you invoke the client API and the
time the message is sent on the wire (Goes out of this VM). The same is
the case when you get the return message - from the time the message
comes through the wire and till the time it's handed over to your
client program, in the format it accepts. Typically these are
marshalling/unmarshalling tasks carried out by the client side API's.
Now the concern is - what happens if something goes wrong here?
Most
times we ignore the logging/trace on the client side. Yes, on the
client side too you could configure the logging level. The way you do
this depends on the logging framework that the API you are accessing
uses. Most times you would have to place a logging config properties
file in the client classpath. The logging framework on the client API
that you use, loads this file and determines the level of logging and
the appender/handler to which logs have to be written.
So once you
have the client side logs, you would be able to figure out what really
was sent from the client, what it received and other info related to
marshalling/unmarshalling etc. One suggestion that I have here - In
case you have access to both the client and server ensure that both of
them have the same system time. With this, you could easily track the
flow from client to server and the way back with respect to time.
All
these really help even if you are debugging an issue on something that
you have developed or are developing - Something that can be done
before debugging your code by placing breakpoints or even worse - by
adding those System.out statements.
Too many threads?
Imagine the case where you have too many threads in your application that are doing their work simultaneously. The log records might be jumbled together. In such a scenario, to be able to uniquely identify and track the flow of one particular thread, you could have an identifier in the log record that contains the thread name.
Logging only failures?
A lot of times when your application is running on production, it would not be a very good idea to log all of the threads on the server side. You would be interested in logging only the failures. In such a case, what you typically do is - Accumulate the log records for each thread and at a later point of time depending on the success or failure of that thread, you decide whether to write its records to the log files or not. Note that these records are in the memory for the whole duration of the operation - and that could be quite expensive.
Beware
Most
application servers would have their own logging framework which logs
the tasks performed by the server. In certain logging frameworks you
need to tell the framework when to start and stop logging on a thread.
So when the server assigns a particular thread from a pool to your
request it would do the job of telling the framework to start the
logging. So If you explicitly create threads on an application server,
these threads might not be logging information at all, though your code
might have log statements.
Another important thing to note is not all issues that you encounter will get
replicated even one more time. A lot of times I have seen that the
moment you change the logging level or add a new log statement the
issue goes away. Often this the case with issues related to
multi-threaded programs, where even one new instruction would change the
code dynamics or timing.
Solved. What should I do?
If a log information really helped you to better understand an issue, you then have a very important learning. You too should "Log" in your application. Many times I end up spending more time determining what should be logged and at what levels than the time I take to write the code. After all, logging is an art.
(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)





Comments
Developer Dude replied on Tue, 2009/07/14 - 1:18pm
Some logging frameworks like Log4J allow you to setup your logging such that you can have different statements go to different logs. This is helpful as it is easier to find things. I personally like to have security events go to a separate log from say debug events. So, a stable deployed server app (or suite of apps, or services, or whatever) could have separate log files for status, security events, debug, errors - and these will be much easier to parse (manually or automatically) - it just takes a little configuration and maybe writing a simple filter or two. Especially true if your debug logging statements are going to contain dumps of XML, or JSON, or whatever - it can get messy (and large) real quick.
I have noticed some projects (like JasperServer) don't make use of even the basic built-in features of a log framework. Something as simple as adding the date to the statement really helps, as does having a 'rolling' appender where each day gets a new log file. Simple things like that make it much easier to find problems. Fortunately it was easy for me to change the log configuration, but it should have been the default.
As for client/server interactions, I am an advocate of attaching a GUID to an error log statement that gets reported to a client. That GUID is displayed in the client with the error message so that the user can report this back to their admin and the admin or dev can look up which statement is tied to the error and not just guess based on the poor recollection of the user as to what date/time the problem happened. Even if they have the exact correct time, a lot of stuff could be going on and the log statement for a given time might not at all be related to the error the client sees.
Andreou Dimitris replied on Wed, 2009/07/15 - 4:06am
Developer Dude replied on Fri, 2009/08/07 - 6:55pm
Andreou,
There is a tension between giving the user such info as a stack trace and just telling them something went wrong. It depends a lot on your audience. If they are technical then they may appreciate a stack trace. If not, then not - some users panic when they see anything beyond a simple error. I have more than once been asked to really dumb it down to the point of telling them nothing except to see their admin - and I can see their point.
Either way, having a GUID attached that can allow the person diagnosing the problem to look for a specific log entry can really help, whether that person is the user of some poor person in operations at the server backend.
As for a legible error, yes, you can often map some error code to some kind of user friendly message. That is another desirable thing to do, but the GUID is still helpful in these cases for drilling down to get more information when needed.
Unfortunately, I rarely see dev teams go to this extent - these kinds of niceties are low priority for them. But it is like unit testing - if you spend the time to provide this kind of traceability and diagnostic info, you can spend less time running in circles trying to figure out a problem when it is encountered and therefore less time fixing it, and more time writing code. Laying down this infrastructure pays dividends in the long run, but few dev teams (or businesses) are managed that way - so we muddle along. I mean this isn't rocket science or something extremely controversial here - it isn't like we are talking about AOP v. OOP v. DDD here, just a simple robust logging setup.
*sigh*