Java: What is the limit to the number of threads you can create?
I have seen a number of tests where a JVM has 10K threads. However, what happens if you go beyond this?
My recommendation is to consider adding more servers once your total thread count reaches 10K. You can get a decent server for $2K and a powerful one for $10K.
Creating threads gets slower
The time it takes to create a thread increases as you create more threads. For the 32-bit JVM, the stack size appears to limit the number of threads you can create. This may be due to the limited address space. In any case, the memory used by each thread's stack adds up. If you have a stack of 128 KB and 20K threads, they will use 2.5 GB of virtual memory.

| Bitness | Stack Size | Max threads |
|---|---|---|
| 32-bit | 64K | 32,073 |
| 32-bit | 128K | 20,549 |
| 32-bit | 256K | 11,216 |
| 64-bit | 64K | stack too small |
| 64-bit | 128K | 32,072 |
| 64-bit | 512K | 32,072 |
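
The stack arithmetic above can be checked with a few lines of Java. This is only a sketch: the class and method names are made up for illustration, and it assumes every thread reserves its full configured stack size as virtual memory. Applied to the three 32-bit rows of the table, it gives totals of roughly 2 to 2.7 GiB, which is consistent with the 32-bit limits being an address-space effect.

```java
public class StackMemoryEstimate {

    /** Virtual memory reserved for thread stacks, in bytes (assumes 1 KB = 1024 bytes). */
    static long totalStackBytes(long stackSizeKb, long threadCount) {
        return stackSizeKb * 1024L * threadCount;
    }

    public static void main(String[] args) {
        // {stack size in KB, max threads} taken from the 32-bit rows of the table above
        long[][] rows = { {64, 32_073}, {128, 20_549}, {256, 11_216} };
        for (long[] row : rows) {
            double gib = totalStackBytes(row[0], row[1]) / (1024.0 * 1024.0 * 1024.0);
            System.out.printf("%dK stack x %,d threads = %.2f GiB%n", row[0], row[1], gib);
        }
    }
}
```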
Java 6 update 26 32-bit, -XX:ThreadStackSize=64
4,000 threads: Time to create 4,000 threads was 0.522 seconds
8,000 threads: Time to create 4,000 threads was 1.281 seconds
12,000 threads: Time to create 4,000 threads was 1.874 seconds
16,000 threads: Time to create 4,000 threads was 2.725 seconds
20,000 threads: Time to create 4,000 threads was 3.333 seconds
24,000 threads: Time to create 4,000 threads was 4.151 seconds
28,000 threads: Time to create 4,000 threads was 5.293 seconds
32,000 threads: Time to create 4,000 threads was 6.636 seconds
After creating 32,073 threads,
java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:640)
    at com.google.code.java.core.threads.MaxThreadsMain.addThread(MaxThreadsMain.java:46)
    at com.google.code.java.core.threads.MaxThreadsMain.main(MaxThreadsMain.java:16)
Java 6 update 26 32-bit, -XX:ThreadStackSize=128
4,000 threads: Time to create 4,000 threads was 0.525 seconds
8,000 threads: Time to create 4,000 threads was 1.239 seconds
12,000 threads: Time to create 4,000 threads was 1.902 seconds
16,000 threads: Time to create 4,000 threads was 2.529 seconds
20,000 threads: Time to create 4,000 threads was 3.165 seconds
After creating 20,549 threads,
java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:640)
    at com.google.code.java.core.threads.MaxThreadsMain.addThread(MaxThreadsMain.java:46)
    at com.google.code.java.core.threads.MaxThreadsMain.main(MaxThreadsMain.java:16)
Java 6 update 26 32-bit, -XX:ThreadStackSize=256
4,000 threads: Time to create 4,000 threads was 0.526 seconds
8,000 threads: Time to create 4,000 threads was 1.212 seconds
After creating 11,216 threads,
java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:640)
    at com.google.code.java.core.threads.MaxThreadsMain.addThread(MaxThreadsMain.java:46)
    at com.google.code.java.core.threads.MaxThreadsMain.main(MaxThreadsMain.java:16)
Java 6 update 26 64-bit, -XX:ThreadStackSize=128
4,000 threads: Time to create 4,000 threads was 0.577 seconds
8,000 threads: Time to create 4,000 threads was 1.292 seconds
12,000 threads: Time to create 4,000 threads was 1.995 seconds
16,000 threads: Time to create 4,000 threads was 2.653 seconds
20,000 threads: Time to create 4,000 threads was 3.456 seconds
24,000 threads: Time to create 4,000 threads was 4.663 seconds
28,000 threads: Time to create 4,000 threads was 5.818 seconds
32,000 threads: Time to create 4,000 threads was 6.792 seconds
After creating 32,072 threads,
java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:640)
    at com.google.code.java.core.threads.MaxThreadsMain.addThread(MaxThreadsMain.java:46)
    at com.google.code.java.core.threads.MaxThreadsMain.main(MaxThreadsMain.java:16)
Java 6 update 26 64-bit, -XX:ThreadStackSize=512
4,000 threads: Time to create 4,000 threads was 0.577 seconds
8,000 threads: Time to create 4,000 threads was 1.292 seconds
12,000 threads: Time to create 4,000 threads was 1.995 seconds
16,000 threads: Time to create 4,000 threads was 2.653 seconds
20,000 threads: Time to create 4,000 threads was 3.456 seconds
24,000 threads: Time to create 4,000 threads was 4.663 seconds
28,000 threads: Time to create 4,000 threads was 5.818 seconds
32,000 threads: Time to create 4,000 threads was 6.792 seconds
After creating 32,072 threads,
java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:640)
    at com.google.code.java.core.threads.MaxThreadsMain.addThread(MaxThreadsMain.java:46)
    at com.google.code.java.core.threads.MaxThreadsMain.main(MaxThreadsMain.java:16)
The Code
MaxThreadsMain.java
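
The source of MaxThreadsMain.java is not reproduced on this page. As a rough stand-in, here is a minimal sketch of how such a test might look. It is not the author's code: the class name `ThreadCreationSketch`, the `createThreads` method, and the batch-timing logic are all assumptions made for illustration. Each thread simply parks on a latch so it stays alive while more threads are created, and the loop stops at a caller-supplied cap or at the first `OutOfMemoryError`.

```java
import java.util.concurrent.CountDownLatch;

public class ThreadCreationSketch {

    /**
     * Starts up to maxThreads idle daemon threads, printing the time taken
     * for each batch. Returns how many threads were actually started.
     */
    static int createThreads(int maxThreads, int batchSize) {
        final CountDownLatch release = new CountDownLatch(1);
        int created = 0;
        long batchStart = System.nanoTime();
        try {
            while (created < maxThreads) {
                Thread t = new Thread(() -> {
                    try {
                        release.await(); // stay alive until the test is over
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                t.setDaemon(true); // do not keep the JVM alive on exit
                t.start();
                created++;
                if (created % batchSize == 0) {
                    long now = System.nanoTime();
                    System.out.printf("%,d threads: Time to create %,d threads was %.3f seconds%n",
                            created, batchSize, (now - batchStart) / 1e9);
                    batchStart = now;
                }
            }
        } catch (OutOfMemoryError e) {
            // Typically "unable to create new native thread" when the OS refuses
            System.out.printf("After creating %,d threads, %s%n", created, e);
        } finally {
            release.countDown(); // let all parked threads finish
        }
        return created;
    }

    public static void main(String[] args) {
        // Hypothetical defaults; pass a larger cap on the command line to probe the limit
        int max = args.length > 0 ? Integer.parseInt(args[0]) : 1_000;
        createThreads(max, 250);
    }
}
```

To probe limits like those in the table, it could be run with a large cap and an explicit stack size, for example `java -XX:ThreadStackSize=128 ThreadCreationSketch 40000`. Operating system limits on processes and threads may stop it earlier than the JVM does.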
From http://vanillajava.blogspot.com/2011/07/java-what-is-limit-to-number-of-threads.html
Comments
Wojciech Kudla replied on Tue, 2011/07/26 - 5:48am
When you're using thread pools the cost of creating a thread becomes insignificant. Just out of curiosity, what is the point of measuring how the JVM performs when handling this number of threads?
Jose Maria Arranz replied on Tue, 2011/07/26 - 8:36am
I think it's worth mentioning that if you're handling several thousands of threads in your application there's obviously something wrong with your design. This will seriously degrade performance because of frequent context switches
I'm sorry but this statement is FALSE.
More info here and here.
The number of concurrently live threads that can run without significantly degraded performance is so large that, by the time you reach it, your system will already be painfully slow because you are trying to serve too many concurrent users or tasks. For instance, imagine many thousands of concurrent database operations on the same server: the small penalty of context switching is not the real problem in that scenario.
By the way.
Xmx + MaxPermSize + (Xss * number of threads) = max memory of a process in the OS (2 GB on 32-bit Windows)
If you set Xmx to a lower value you can get more threads (Xmx has a default value).
Wojciech Kudla replied on Tue, 2011/07/26 - 12:18pm
in response to:
Jose Maria Arranz
If your threads are executing CPU-intensive jobs then large number of threads and frequent context switching will degrade the performance severely. I will be showing the impact of context switching for such scenarios in my upcoming article on the matter.
For instance, I managed to boost the performance of the disruptor four-fold by assigning fixed CPU affinity to worker threads. The gain comes mainly from avoiding context switching.
However, I agree that for use cases with a lot of I/O-based waiting involved it makes perfect sense to employ a larger number of threads.
So basically, it depends on the problem you're dealing with; neither a large number of threads nor pinning threads to specific cores can be treated as a generic solution for all sorts of concurrency problems.
So we are both partially right and partially wrong :)
Jose Maria Arranz replied on Tue, 2011/07/26 - 2:32pm
in response to:
Wojciech Kudla
If your threads are executing CPU-intensive jobs then large number of threads and frequent context switching will degrade the performance severely
As you can read in my tests (TheServerSide.com article), I've evaluated the cost of context switching in extremely shared-none tests, and the cost of thread switching is basically NONE: the results with thousands of threads are the same as with one thread per CPU core.
In my opinion the cost of context switching is a myth in modern OSs and JVMs. It may be significant with a very high number of threads, but in that case no system can support such a big load with a decent time to serve each unit of work; a lower number of threads does not solve the primary problem (too many concurrent users for too few cores/hardware threads).
I don't know your case (disruptor), but attributing the performance problem to context switching is only plausible in extremely shared-none code where the thread scheduler is working intensively.
Nicolas Bousquet replied on Tue, 2011/07/26 - 3:12pm
I think the main advantage of NIO is that you lower the memory cost of an open connection to the client. The context switch problem is directly linked to memory anyway, as loading a context that is not in the cache will be costly no matter what.
A few years ago, the maximum number of threads was severely limited by available memory. So if the cost of one active connection could be reduced to a small session footprint (say 1 KB), the gain was very noticeable.
Now that a $1,000 PC can come with 4 cores and 16 GB of RAM, dealing with a few thousand threads is no longer a problem.
The thing is, not many applications in production need to keep 10K users connected at a time, and when this need really arises, the cost of an additional or more powerful server might be more sustainable than spending all your engineering time optimizing your manual context-switching code.
Wojciech Kudla replied on Tue, 2011/07/26 - 9:13pm
in response to:
Jose Maria Arranz
On top of that, saying the cost of context switching is a myth in modern OSs and JVMs is a bold statement to make. I'd suggest more scientific and organised approach to measuring the cost of context switching:
Revisiting the Cache Interference Costs of Context Switching
Quantifying The Cost of Context Switch
Unnecessary Context Switches & the Myth of Multitasking
Also can you explain the "extremely shared-none tests" in more detail? I read your articles but could not find any information on those.
Loren Kratzke replied on Tue, 2011/07/26 - 9:19pm
in response to:
Nicolas Bousquet
Jose Maria Arranz replied on Wed, 2011/07/27 - 2:37am
in response to:
Wojciech Kudla
focusing on io-related threading problems
The second article has nothing to do with IO.
I'd suggest more scientific and organised approach
I have no time at the moment to read the articles you cited. In any case, be careful with their conclusions: they are old stuff (around 2001). Thread schedulers were crap some time ago, for instance in the Linux kernel and old JVMs, but this is no longer true.
Also can you explain the "extremely shared-none tests" in more detail?
Read it as "extremely non-blocking tests". The TheServerSide.com article shows an example of a simple non-shared variable used as a counter, where apparently no blocking happens. I've also tried mathematical calculations (again non-blocking) with the same results: the cost of context switching is almost none, and more threads means better use of the cores, because this reduces the probability of a CPU core sitting idle while threads are stopped or blocked. Of course everything has a limit, but the limit is very high.