Cameron programs mostly in high-reliability, high-security, and/or high-performance areas for Phaseit, Inc. While enthusiastic about technologies ranging from SVG to pervasive computing, his most passionate advocacy is for end-users and the developers who serve them.

How to repair a full Unix directory

07.08.2010

A reader wrote me this week that his bash scripts were complaining "out of memory"; what should he do? It didn't take long to get him moving again.

While my colleague Sandra Henry-Stocker usually covers this territory in her "Unix as a Second Language", the ideas involved in this episode apply nicely in common situations developers and Windows administrators encounter, so I think there is value in reporting them here. My correspondent knew that he wanted to run

    find . -type f -exec grep -i -l -H "keyword" '{}' + | xargs rm -rf

but he was getting "out of memory" because he had millions (!) of files in his directory tree, and, if I understood him correctly, was operating with an older host that only had 256 megabytes of main memory. What should he do?

My first thought:

    INTERMEDIATE_FILE=/tmp/xyz.txt
    # Caution: this coding is fragile, in that it mishandles filenames which
    # embed blanks. Accommodating those is a story for another day.
    find . -type f -exec grep -i -l -H "$keyword" {} \; > $INTERMEDIATE_FILE
    for NAME in `cat $INTERMEDIATE_FILE`
    do
        rm -rf $NAME
    done
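For readers who cannot wait for that other day, here is a minimal sketch of one way to keep the intermediate file while tolerating blanks in filenames. It is an illustration, not the code the questioner actually ran: the function name remove_matching, its two arguments, and the use of mktemp for the list file are all assumptions. Names that embed newlines would still be mishandled.

```shell
# Hypothetical helper (not from the original exchange): delete every regular
# file under directory $1 whose contents match keyword $2, ignoring case.
remove_matching() {
    dir=$1
    keyword=$2
    # Assumption: mktemp is available and the list lives outside $dir.
    list=$(mktemp) || return 1
    # One grep per file, as in the unrolled version above.
    find "$dir" -type f -exec grep -i -l -e "$keyword" {} \; > "$list"
    # IFS= and -r keep blanks and backslashes in each name intact;
    # one name is read per line, so embedded newlines are still a problem.
    while IFS= read -r name
    do
        rm -f -- "$name"
    done < "$list"
    rm -f -- "$list"
}
```

A call such as remove_matching . "keyword" would stand in for the find-and-loop pair above.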

Did that help? "Yes!", the report came back--well, "yes and no." As I'm a big believer that long journeys begin with small steps, I found more encouragement than discouragement in the answer. Apparently the questioner needed to do several waves of cleanup, and "unrolling" the one-liner with an $INTERMEDIATE_FILE helped with some of the out-of-memory situations, but not all.

"One step at a time", I thought. After a little more negotiation, we reduced his symptoms to "out of memory" faults with

    ls -1 >> $INTERMEDIATE_FILE

and

    find ./ -size -6k -type f >> $INTERMEDIATE_FILE

Did I have any tricks left for those?

Sure; in fact, I have a history of creating this situation for myself. I often use temporary files for various test automations I run, and, unless I'm scrupulous about cleaning up after the tests, it's easy to find myself with tens of thousands of files named, for example, /tmp/tmp${RANDOM}.log. I've often had so many of these that trying to clean up the mess with rm /tmp/tmp*log does just what my questioner described: complains "out of memory". In a case like this, it's time to "eat the elephant one bite at a time", which translates, in this case, to something like

    rm /tmp/tmp*a*.log
    rm /tmp/tmp*b*.log
    ...
    rm /tmp/tmp*[g-j]*.log
    ...
    rm /tmp/tmp*[A-H]*.log
    ...

In English, the idea is to specify a subset of /tmp/tmp*.log small enough to fit in memory, but large enough to nibble away at the whole list. After slicing out a few "chunks", we quickly reduce the whole collection of remaining /tmp/tmp*.log to a manageable size, where more traditional bash programming can take over.
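The slicing can also be written as a loop. The sketch below is one possible arrangement, not a definitive recipe: since $RANDOM expands to a decimal integer, it slices on the last digit before the suffix, which gives ten non-overlapping chunks. The function name remove_by_last_digit and its prefix and suffix arguments are hypothetical.

```shell
# Hypothetical helper: remove PREFIX<anything><digit>SUFFIX, one digit at a time.
remove_by_last_digit() {
    prefix=$1   # for example /tmp/tmp
    suffix=$2   # for example .log
    for digit in 0 1 2 3 4 5 6 7 8 9
    do
        # The unquoted * is expanded by the shell, one slice per pass.
        # rm -f stays quiet when a slice matches nothing.
        rm -f -- "$prefix"*"$digit""$suffix"
    done
}
```

With the example names above, the call would be remove_by_last_digit /tmp/tmp .log. If a single slice is still too large, the same idea applies one level down, for instance by slicing on the last two digits.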

For find, a homologous approach would be something like

    find . -name "*a*" -size -6k -type f >> $INTERMEDIATE_FILE
    find . -name "*[bc]*" -size -6k -type f >> $INTERMEDIATE_FILE
    ...
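One wrinkle with the find version: patterns such as "*a*" and "*[bc]*" overlap, so a file named abc.txt would be appended to the list twice. A possible variation, sketched here under assumptions, slices on the first character of the name so that each file lands in exactly one slice. The function name list_small_files, its arguments, and the three example patterns are hypothetical; a real run would use more slices than these three.

```shell
# Hypothetical helper: list regular files under $1 smaller than 6k into $2,
# one non-overlapping name slice at a time.
list_small_files() {
    dir=$1
    out=$2
    : > "$out"   # start with an empty list
    # Each name begins with a, with b or c, or with anything else.
    for pattern in 'a*' '[bc]*' '[!abc]*'
    do
        find "$dir" -name "$pattern" -size -6k -type f >> "$out"
    done
}
```

The output file should live outside the tree being searched, so that find does not list it.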

The excitement wasn't quite over yet, of course; situations like this seem always to have "loose ends". In the case of my questioner, he had many files whose names included non-ASCII Unicode characters. I've got plenty of tricks for dealing with those, too, including switching to Tcl for my scripting. This time, though, we started with the files whose names were easy to express, processed all of them, and then determined, to my non-surprise, that the residuum which remained was small enough that the questioner could use his usual bash coding skills. Mission accomplished.

What's the conclusion? I don't have a particularly polished aphorism to summarize what happened. I do know, though, that many cases that look like "show-stoppers" the first time encountered turn out to be easy to solve for someone with just a little more experience. If you're feeling stuck, be clear with yourself what your true requirements are, what you're getting, and what appears to constrain you. Ask for help; someone else, with a different perspective, might quickly see a way to fit together all the elements of your problem to make a solution.

There's also a lesson here about craft-work that I don't yet know how to put into words. Part of the difference between "textbook learning" and the kind of professional training that diesel mechanics, physicians, lawyers, and plumbers all practice has to do with learning how to handle novel situations. It involves thorough apprenticeship in the basics, followed by exposure to progressively more challenging variations. If rm * doesn't give you what you want, break down the * part into pieces small enough to handle.





References
Published at DZone with permission of its author, Cameron Laird. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Alessandro Santini replied on Thu, 2010/07/08 - 7:48am

Although I appreciate your effort - what does this have to do with Java?
