Lexicographically Sorting Large Files in Linux
When I hear the word “sort”, my first thought is usually “Hadoop”! Yes, sorting is one thing that Hadoop does well, but if you’re working with large files in Linux, the built-in sort command is often all you need:
LC_COLLATE=C sort --buffer-size=1G --temporary-directory=./tmp --unique bigfile.txt
Let’s break this command down and examine each part in detail.
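As a minimal sketch of that breakdown, the script below runs the same command on a tiny sample file and annotates each part in comments. The sample data and the output name sorted.txt are assumptions made for illustration, and the option descriptions reflect GNU sort behaviour as I understand it, so check them against your own sort manual.

```shell
# Hypothetical sample data standing in for a real bigfile.txt.
printf 'banana\nApple\ncherry\nbanana\napple\n' > bigfile.txt

# sort does not create the temporary directory itself, so make it first.
mkdir -p ./tmp

# LC_COLLATE=C                  compare lines by raw byte value rather than
#                               locale rules (note: LC_ALL, if set in your
#                               environment, takes precedence over LC_COLLATE)
# --buffer-size=1G              allow up to about 1 GB of memory for the
#                               in-memory sort buffer before spilling to disk
# --temporary-directory=./tmp   write intermediate spill files under ./tmp
#                               instead of the default temporary directory
# --unique                      emit only one copy of each duplicate line
LC_COLLATE=C sort --buffer-size=1G --temporary-directory=./tmp --unique bigfile.txt > sorted.txt

cat sorted.txt
```

With byte-order collation in effect (that is, when LC_ALL is not overriding it), this should print Apple, apple, banana, cherry, one per line: uppercase letters sort before lowercase ones, and the duplicate banana line appears only once.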