Peter is a DZone MVB, is not an employee of DZone, and has posted 155 posts at DZone.

Java: What is the limit to the number of threads you can create?

07.25.2011

I have seen a number of tests where a JVM has 10K threads. However, what happens if you go beyond this?

My recommendation is to consider adding more servers once your total thread count reaches 10K. You can get a decent server for $2K and a powerful one for $10K.

Creating threads gets slower

The time it takes to create a thread increases as you create more threads. For the 32-bit JVM, the stack size appears to limit the number of threads you can create. This may be due to the limited address space. In any case, the memory used by each thread's stack adds up. If you have a stack of 128KB and you have 20K threads, it will use 2.5 GB of virtual memory.

Bitness   Stack Size   Max threads
32-bit    64K          32,073
32-bit    128K         20,549
32-bit    256K         11,216
64-bit    64K          stack too small
64-bit    128K         32,072
64-bit    512K         32,072
Note: in the last case, the thread stacks total 16 GB of virtual memory.
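
The stack totals quoted above are simple multiplication, and a few lines of Java make the arithmetic easy to repeat for other settings. This is only a sketch: the class and method names are made up for illustration, and it assumes every thread reserves exactly the configured stack size. Depending on whether a GB is counted as 10^9 or 2^30 bytes, the results land slightly above or below the rounded figures in the text.

```java
// Back-of-the-envelope estimate of the virtual memory reserved for thread stacks.
// Assumption: every thread reserves exactly the configured stack size.
class StackMemoryEstimate {

    /** Total bytes reserved by `threads` stacks of `stackSizeKB` kilobytes each. */
    static long totalStackBytes(long stackSizeKB, long threads) {
        return stackSizeKB * 1024L * threads;
    }

    /** Converts bytes to binary gigabytes (2^30 bytes). */
    static double toGiB(long bytes) {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        // 128 KB stacks, 20,000 threads
        System.out.printf("128K x 20,000: %.2f GiB%n", toGiB(totalStackBytes(128, 20_000)));
        // 512 KB stacks, 32,072 threads (the last row of the table)
        System.out.printf("512K x 32,072: %.2f GiB%n", toGiB(totalStackBytes(512, 32_072)));
    }
}
```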

Java 6 update 26 32-bit,-XX:ThreadStackSize=64
4,000 threads: Time to create 4,000 threads was 0.522 seconds 
8,000 threads: Time to create 4,000 threads was 1.281 seconds 
12,000 threads: Time to create 4,000 threads was 1.874 seconds 
16,000 threads: Time to create 4,000 threads was 2.725 seconds 
20,000 threads: Time to create 4,000 threads was 3.333 seconds 
24,000 threads: Time to create 4,000 threads was 4.151 seconds 
28,000 threads: Time to create 4,000 threads was 5.293 seconds 
32,000 threads: Time to create 4,000 threads was 6.636 seconds 
After creating 32,073 threads, java.lang.OutOfMemoryError: unable to create new native thread
 at java.lang.Thread.start0(Native Method)
 at java.lang.Thread.start(Thread.java:640)
 at com.google.code.java.core.threads.MaxThreadsMain.addThread(MaxThreadsMain.java:46)
 at com.google.code.java.core.threads.MaxThreadsMain.main(MaxThreadsMain.java:16)

Java 6 update 26 32-bit,-XX:ThreadStackSize=128
4,000 threads: Time to create 4,000 threads was 0.525 seconds 
8,000 threads: Time to create 4,000 threads was 1.239 seconds 
12,000 threads: Time to create 4,000 threads was 1.902 seconds 
16,000 threads: Time to create 4,000 threads was 2.529 seconds 
20,000 threads: Time to create 4,000 threads was 3.165 seconds 
After creating 20,549 threads, java.lang.OutOfMemoryError: unable to create new native thread
 at java.lang.Thread.start0(Native Method)
 at java.lang.Thread.start(Thread.java:640)
 at com.google.code.java.core.threads.MaxThreadsMain.addThread(MaxThreadsMain.java:46)
 at com.google.code.java.core.threads.MaxThreadsMain.main(MaxThreadsMain.java:16)

Java 6 update 26 32-bit,-XX:ThreadStackSize=256
4,000 threads: Time to create 4,000 threads was 0.526 seconds 
8,000 threads: Time to create 4,000 threads was 1.212 seconds 
After creating 11,216 threads, java.lang.OutOfMemoryError: unable to create new native thread
 at java.lang.Thread.start0(Native Method)
 at java.lang.Thread.start(Thread.java:640)
 at com.google.code.java.core.threads.MaxThreadsMain.addThread(MaxThreadsMain.java:46)
 at com.google.code.java.core.threads.MaxThreadsMain.main(MaxThreadsMain.java:16)

Java 6 update 26 64-bit,-XX:ThreadStackSize=128
4,000 threads: Time to create 4,000 threads was 0.577 seconds 
8,000 threads: Time to create 4,000 threads was 1.292 seconds 
12,000 threads: Time to create 4,000 threads was 1.995 seconds 
16,000 threads: Time to create 4,000 threads was 2.653 seconds 
20,000 threads: Time to create 4,000 threads was 3.456 seconds 
24,000 threads: Time to create 4,000 threads was 4.663 seconds 
28,000 threads: Time to create 4,000 threads was 5.818 seconds 
32,000 threads: Time to create 4,000 threads was 6.792 seconds 
After creating 32,072 threads, java.lang.OutOfMemoryError: unable to create new native thread
 at java.lang.Thread.start0(Native Method)
 at java.lang.Thread.start(Thread.java:640)
 at com.google.code.java.core.threads.MaxThreadsMain.addThread(MaxThreadsMain.java:46)
 at com.google.code.java.core.threads.MaxThreadsMain.main(MaxThreadsMain.java:16)


Java 6 update 26 64-bit,-XX:ThreadStackSize=512
4,000 threads: Time to create 4,000 threads was 0.577 seconds 
8,000 threads: Time to create 4,000 threads was 1.292 seconds 
12,000 threads: Time to create 4,000 threads was 1.995 seconds 
16,000 threads: Time to create 4,000 threads was 2.653 seconds 
20,000 threads: Time to create 4,000 threads was 3.456 seconds 
24,000 threads: Time to create 4,000 threads was 4.663 seconds 
28,000 threads: Time to create 4,000 threads was 5.818 seconds 
32,000 threads: Time to create 4,000 threads was 6.792 seconds 
After creating 32,072 threads, java.lang.OutOfMemoryError: unable to create new native thread
 at java.lang.Thread.start0(Native Method)
 at java.lang.Thread.start(Thread.java:640)
 at com.google.code.java.core.threads.MaxThreadsMain.addThread(MaxThreadsMain.java:46)
 at com.google.code.java.core.threads.MaxThreadsMain.main(MaxThreadsMain.java:16)

The Code

MaxThreadsMain.java
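
The original listing is not reproduced here. As a stand-in, the following is a hypothetical sketch of how such a test could be written; it is not the author's MaxThreadsMain, and the class name, method name, batch size, and default upper bound are all assumptions. It starts idle threads in batches, prints the time taken per batch, and stops either at the requested bound or when the JVM throws OutOfMemoryError. It uses lambdas, so it needs Java 8 or later rather than the Java 6 used for the measurements above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch, NOT the author's MaxThreadsMain: times thread creation
// in batches until a bound is reached or the JVM refuses to start a thread.
class ThreadCreationSketch {

    /** Starts up to maxThreads idle daemon threads; returns how many were started. */
    static int createThreads(int maxThreads, int batchSize, CountDownLatch release) {
        List<Thread> threads = new ArrayList<>();
        long batchStart = System.nanoTime();
        try {
            while (threads.size() < maxThreads) {
                Thread t = new Thread(() -> {
                    try {
                        release.await(); // stay alive, doing nothing, until released
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                t.setDaemon(true);
                t.start(); // may throw OutOfMemoryError when no native thread can be created
                threads.add(t);
                if (threads.size() % batchSize == 0) {
                    long now = System.nanoTime();
                    System.out.printf("%,d threads: Time to create %,d threads was %.3f seconds%n",
                            threads.size(), batchSize, (now - batchStart) / 1e9);
                    batchStart = now;
                }
            }
        } catch (OutOfMemoryError e) {
            System.out.printf("After creating %,d threads, %s%n", threads.size(), e);
        }
        return threads.size();
    }

    public static void main(String[] args) {
        // The default bound is an assumption; pass a larger value to probe the real limit.
        int max = args.length > 0 ? Integer.parseInt(args[0]) : 1_000;
        CountDownLatch release = new CountDownLatch(1);
        int created = createThreads(max, 250, release);
        release.countDown(); // let all the idle threads exit
        System.out.printf("Created %,d threads%n", created);
    }
}
```

Running it with a stack-size flag such as -XX:ThreadStackSize=128 and a large bound should show where the limit falls on a given machine; the default bound of 1,000 is deliberately small so the sketch is safe to run as is.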

 

From http://vanillajava.blogspot.com/2011/07/java-what-is-limit-to-number-of-threads.html

Published at DZone with permission of Peter Lawrey, author and DZone MVB.


Comments

Wojciech Kudla replied on Tue, 2011/07/26 - 5:48am

I think it's worth mentioning that if you're handling several thousands of threads in your application there's obviously something wrong with your design. This will seriously degrade performance because of frequent context switches, the need to handle many TLABs, CPU cache misses or many other more or less obvious reasons.
When you're using thread pools the cost of creating a thread becomes insignificant. Just out of curiosity, what is the point of measuring how the JVM performs when handling this number of threads?
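
To illustrate the thread pool point, here is a minimal sketch using the standard java.util.concurrent executor API. The class name, pool size, and the trivial task are assumptions chosen for illustration; the idea is only that a small fixed set of threads is created once and then reused for many tasks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: many small tasks share a few pooled threads, so the cost of
// creating a thread is paid poolSize times rather than once per task.
class PoolSketch {

    /** Runs `tasks` trivial jobs on a fixed pool and returns the sum of their results. */
    static long runOnPool(int poolSize, int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                final int n = i;
                futures.add(pool.submit(() -> n * 2)); // submitted as a Callable<Integer>
            }
            long sum = 0;
            for (Future<Integer> f : futures) {
                sum += f.get(); // blocks until that task has finished
            }
            return sum;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int poolSize = Runtime.getRuntime().availableProcessors();
        System.out.println("Sum of results: " + runOnPool(poolSize, 10_000));
    }
}
```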

Jose Maria Arranz replied on Tue, 2011/07/26 - 8:36am

"I think it's worth mentioning that if you're handling several thousands of threads in your application there's obviously something wrong with your design. This will seriously degrade performance because of frequent context switches"

I'm sorry but this statement is FALSE.

More info here and here.

The number of concurrently alive threads that can run without significantly degraded performance is SO big that, by the time you reach it, your system will already be painfully slow because you are trying to serve too many concurrent users/tasks. For instance, try to imagine many thousands of concurrent database operations on the same server; the small penalty of context switching is not the real problem in this scenario.

By the way.

Xmx + MaxPermSize + (Xss * number of threads) = Max Memory of a process in OS (2 GB in 32-bit Windows)

If you set Xmx to a lower value, you leave room for more threads (Xmx has a default value).
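
Rearranged for the number of threads, the formula above gives threads = (process limit - Xmx - MaxPermSize) / Xss. The sketch below just evaluates that expression. The class and method names are made up, and the MaxPermSize and Xss values are illustrative assumptions rather than measured settings; a real process also needs address space for native libraries, the code cache, and so on, so treat the result as an upper bound at best.

```java
// Rough estimate only: the figures in main are illustrative assumptions, and a real
// process also needs address space for native libraries, the code cache, and so on.
class MaxThreadsEstimate {

    /** Solves Xmx + MaxPermSize + (Xss * threads) = processLimit for threads. All sizes in KB. */
    static long estimateMaxThreads(long processLimitKB, long xmxKB, long maxPermKB, long xssKB) {
        long leftForStacksKB = processLimitKB - xmxKB - maxPermKB;
        return Math.max(0, leftForStacksKB / xssKB);
    }

    public static void main(String[] args) {
        long limitKB = 2L * 1024 * 1024; // 2 GB process limit
        long permKB = 64L * 1024;        // assumed -XX:MaxPermSize=64m
        long xssKB = 128;                // assumed -Xss128k
        System.out.println("Xmx=512m:  " + estimateMaxThreads(limitKB, 512L * 1024, permKB, xssKB));
        System.out.println("Xmx=1024m: " + estimateMaxThreads(limitKB, 1024L * 1024, permKB, xssKB));
    }
}
```

With these assumed values the estimate is 11,776 threads for a 512 MB heap and 7,680 for a 1 GB heap: every megabyte given to the heap is a megabyte no longer available for stacks.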

 

 

Wojciech Kudla replied on Tue, 2011/07/26 - 12:18pm in response to: Jose Maria Arranz

The statement is not entirely FALSE.
If your threads are executing CPU-intensive jobs then large number of threads and frequent context switching will degrade the performance severely. I will be showing the impact of context switching for such scenarios in my upcoming article on the matter.
For instance, I managed to boost performance of the disruptor 4 times by assigning fixed CPU affinity to worker threads. The gain comes mainly from avoiding context switching.
However, I agree that for use cases with a lot of I/O-based waiting involved it makes perfect sense to employ a larger number of threads.
So basically, it depends on the problem you're dealing with; a large number of threads or pinning threads to specific cores cannot be treated as a generic solution for all sorts of concurrency problems.
So we are both partially right and partially wrong :)

Jose Maria Arranz replied on Tue, 2011/07/26 - 2:32pm in response to: Wojciech Kudla

"If your threads are executing CPU-intensive jobs then large number of threads and frequent context switching will degrade the performance severely"

As you can read in my tests (TheServerSide.com article), I've evaluated the cost of context switching in extremely shared-none tests, and the cost of thread switching is basically NONE: the results with thousands of threads are the same as with 1 thread per CPU core.

In my opinion the cost of context switching is a myth in modern OSs and JVMs. It may be significant with a very high number of threads, but in that case no system can support such a big load with a decent time to serve any unit of work, and a lower number of threads does not solve the primary problem (too many concurrent users for too few cores/hardware threads).

I don't know your case (disruptor), but attributing the performance problem to context switching is only possible in extremely shared-none code where the thread scheduler is working intensively.

 

Nicolas Bousquet replied on Tue, 2011/07/26 - 3:12pm

I think the main advantage of NIO is that you lower the memory cost of an open connection to the client. The context switch problem is directly linked to memory anyway, as loading a context that is not in the cache will be costly no matter what.

A few years ago, the maximum number of threads was severely limited by available memory. So if the cost of one active connection could be reduced to a small session footprint (say 1KB), the gain was very noticeable.

Now that a $1000 PC can come with 4 cores and 16GB of RAM, dealing with a few thousand threads is no longer a problem.

The thing is, not many applications in production need to maintain 10K users connected at a time, and when this need really arises, the cost of an additional or more powerful server might be more sustainable than spending all your engineering time optimizing your manual context switching code.

Wojciech Kudla replied on Tue, 2011/07/26 - 9:13pm in response to: Jose Maria Arranz

Your articles are focusing on io-related threading problems, which as I mentioned may be a good opportunity for employing a larger number of threads. Unfortunately the approach taken to conduct the experiment seems debatable. What sense does it make to test IO vs NIO with Tomcat using the former and Glassfish using the latter?
On top of that, saying the cost of context switching is a myth in modern OSs and JVMs is a bold statement to make. I'd suggest more scientific and organised approach to measuring the cost of context switching:
Revisiting the Cache Interference Costs of Context Switching
Quantifying The Cost of Context Switch
Unnecessary Context Switches & the Myth of Multitasking
Also can you explain the "extremely shared-none tests" in more detail? I read your articles but could not find any information on those.

Loren Kratzke replied on Tue, 2011/07/26 - 9:19pm in response to: Nicolas Bousquet

Agreed about the 10K users. But it sure is nice knowing I can fire up 10K threads without tipping over just because of the fact. Nice to know about the overhead per thread too. And of course, much depends upon what the threads are doing, and the context switching, and general contention for resources between threads. But given a problem that would otherwise result in caching or iterative processing to solve, I think I could find uses for 10,000 threads. Very enlightening article.

Jose Maria Arranz replied on Wed, 2011/07/27 - 2:37am in response to: Wojciech Kudla

"focusing on io-related threading problems"

The second article has nothing to do with IO.

"I'd suggest more scientific and organised approach"

I have no time at this moment to read your cited articles. Anyway, be careful with their conclusions: they are old stuff (around 2001). Thread schedulers were crap some time ago, for instance in the Linux kernel and old JVMs; this is no longer true.

Also can you explain the "extremely shared-none tests" in more detail?

Read it as "extremely non-blocking tests". The TheServerSide.com article shows an example of a simple non-shared variable used as a counter, where apparently no blocking happens. I've tried with mathematical calculations (again non-blocking) with the same results: the cost of context switching is almost none, and more threads means more use of the cores, because this reduces the probability of some CPU core sitting idle while threads are stopped/blocked. Of course everything has a limit, but the limit is very high.
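
For readers who want to try this kind of comparison, here is a hedged sketch of a test with a private counter per thread. It is not the code from the TheServerSide.com article; the class name, thread counts, and increment total are assumptions. It is also not a rigorous benchmark, since the JIT compiler may optimize the counting loop heavily and thread start-up time is included in the measurement, so a harness such as JMH would be a better choice for serious numbers.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a test with no shared mutable state: each thread increments its own
// local counter. Hypothetical code, not the test from the article under discussion.
class PrivateCounterSketch {

    /** Splits the work across `threads` threads; returns { total increments, elapsed nanos }. */
    static long[] run(int threads, long totalIncrements) {
        long perThread = totalIncrements / threads;
        long[] results = new long[threads];
        List<Thread> workers = new ArrayList<>();
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            final int slot = i;
            Thread t = new Thread(() -> {
                long counter = 0; // private to this thread, never shared while counting
                for (long n = 0; n < perThread; n++) {
                    counter++;
                }
                results[slot] = counter; // published once, at the end
            });
            workers.add(t);
            t.start();
        }
        long total = 0;
        try {
            for (int i = 0; i < threads; i++) {
                workers.get(i).join(); // join makes the worker's write visible here
                total += results[i];
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new long[] { total, System.nanoTime() - start };
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        long work = 400_000_000L; // assumed workload size
        long[] few = run(cores, work);
        long[] many = run(2_000, work);
        System.out.printf("%d threads: %.3f s%n", cores, few[1] / 1e9);
        System.out.printf("2000 threads: %.3f s%n", many[1] / 1e9);
    }
}
```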

 
