DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Please enter at least three characters to search
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

The software you build is only as secure as the code that powers it. Learn how malicious code creeps into your software supply chain.

Apache Cassandra combines the benefits of major NoSQL databases to support data management needs not covered by traditional RDBMS vendors.

Generative AI has transformed nearly every industry. How can you leverage GenAI to improve your productivity and efficiency?

Modernize your data layer. Learn how to design cloud-native database architectures to meet the evolving demands of AI and GenAI workloads.

Related

  • The Perceptron Algorithm and the Kernel Trick
  • Running PyTorch on GPUs
  • Building Intelligent AI Agents Using Semantic Kernels and Azure OpenAI Models: A Step-By-Step Guide
  • Seccomp, eBPF, and the Importance of Kernel System Call Filtering

Trending

  • The Future of Java and AI: Coding in 2025
  • Beyond Code Coverage: A Risk-Driven Revolution in Software Testing With Machine Learning
  • Agentic AI for Automated Application Security and Vulnerability Management
  • FIPS 140-3: The Security Standard That Protects Our Federal Data

OOM relation to vm.swappiness=0 in new kernel

By 
Peter Zaitsev user avatar
Peter Zaitsev
·
Apr. 30, 14 · Interview
Likes (0)
Comment
Save
Tweet
Share
10.4K Views

Join the DZone community and get the full member experience.

Join For Free

This article was originally written by Ovais Tariq for the MySQL blog.

I have recently been involved in diagnosing the reasons behind OOM invocation that would kill the MySQL server process. Of course these servers were primarily running MySQL. As such the MySQL server process was the one with the largest amount of memory allocated.

But the strange thing was that in all the cases, there was no swapping activity seen and there were enough pages in the page cache. Ironically all of these servers were CentOS 6.4 running kernel version 2.6.32-358. Another commonality was the fact that vm.swappiness was set to 0. This is a pretty much standard practice and one that is applied on nearly every server that runs MySQL.

Looking into this further I realized that there was a change introduced in kernel 3.5-rc1 that altered the swapping behavior when “vm.swappiness=0″.

Below is the description of the commit that changed “vm.swappiness=0″ behavior, together with the diff:

$ git show fe35004fbf9eaf67482b074a2e032abb9c89b1dd
commit fe35004fbf9eaf67482b074a2e032abb9c89b1dd
Author: Satoru Moriya <satoru.moriya@hds.com>
Date: Tue May 29 15:06:47 2012 -0700
 
mm: avoid swapping out with swappiness==0
 
Sometimes we'd like to avoid swapping out anonymous memory. In
particular, avoid swapping out pages of important process or process
groups while there is a reasonable amount of pagecache on RAM so that we
can satisfy our customers' requirements.
 
OTOH, we can control how aggressive the kernel will swap memory pages with
/proc/sys/vm/swappiness for global and
/sys/fs/cgroup/memory/memory.swappiness for each memcg.
 
But with current reclaim implementation, the kernel may swap out even if
we set swappiness=0 and there is pagecache in RAM.
 
This patch changes the behavior with swappiness==0. If we set
swappiness==0, the kernel does not swap out completely (for global reclaim
until the amount of free pages and filebacked pages in a zone has been
reduced to something very very small (nr_free + nr_filebacked < high
watermark)).
 
Signed-off-by: Satoru Moriya <satoru.moriya@hds.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 67a4fd4..ee97530 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1761,10 +1761,10 @@ static void get_scan_count(struct mem_cgroup_zone *mz, struct scan_control *sc,
* proportional to the fraction of recently scanned pages on
* each list that were recently referenced and in active use.
*/
- ap = (anon_prio + 1) * (reclaim_stat->recent_scanned[0] + 1);
+ ap = anon_prio * (reclaim_stat->recent_scanned[0] + 1);
ap /= reclaim_stat->recent_rotated[0] + 1;
 
- fp = (file_prio + 1) * (reclaim_stat->recent_scanned[1] + 1);
+ fp = file_prio * (reclaim_stat->recent_scanned[1] + 1);
fp /= reclaim_stat->recent_rotated[1] + 1;
spin_unlock_irq(&mz->zone->lru_lock);
 
@@ -1777,7 +1777,7 @@ out:
unsigned long scan;
scan = zone_nr_lru_pages(mz, lru);
- if (priority || noswap) {
+ if (priority || noswap || !vmscan_swappiness(mz, sc)) {
scan >>= priority;
if (!scan && force_scan)
scan = SWAP_CLUSTER_MAX;

This change was merged into the RHEL kernel 2.6.32-303:

* Mon Aug 27 2012 Jarod Wilson <jarod@redhat.com> [2.6.32-303.el6]
...
- [mm] avoid swapping out with swappiness==0 (Satoru Moriya) [787885]

This obviously changed the way we think about “vm.swappiness=0″. Previously, setting this to 0 was thought to reduce the tendency to swap userland processes but not disable that completely. As such it was expected to see little swapping instead of OOM.

This applies to all RHEL/CentOS kernels > 2.6.32-303 and to other distributions that provide newer kernels such as Debian and Ubuntu. Or any other distribution where this change has been backported as in RHEL.

Let me share with you memory zones related statistics that were logged to the system log from one of the OOM event.

Mar 11 11:01:45 db01 kernel: Node 0 DMA free:15680kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15284kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Mar 11 11:01:45 db01 kernel: Node 0 DMA32 free:45448kB min:25140kB low:31424kB high:37708kB active_anon:1741812kB inactive_anon:520348kB active_file:4792kB inactive_file:462576kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3072160kB mlocked:0kB dirty:386328kB writeback:76268kB mapped:936kB shmem:0kB slab_reclaimable:20420kB slab_unreclaimable:6964kB kernel_stack:0kB pagetables:572kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:142592 all_unreclaimable? no
Mar 11 11:01:45 db01 kernel: Node 0 Normal free:42436kB min:42316kB low:52892kB high:63472kB active_anon:3041852kB inactive_anon:643624kB active_file:340156kB inactive_file:1003512kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:5171200kB mlocked:0kB dirty:979444kB writeback:22040kB mapped:15616kB shmem:180kB slab_reclaimable:41052kB slab_unreclaimable:35996kB kernel_stack:2720kB pagetables:19912kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:31552 all_unreclaimable? no

As can be seen the amount of free memory and the amount of memory in the page cache was greater than the high watermark, which prevented any swapping activity. Yet unnecessary memory pressure caused OOM to be invoked which killed the MySQL server process.

MySQL getting OOM’ed is bad for many reasons and can have an undesirable impact such as causing loss of uncommitted transactions or transactions not yet flushed to the log because of innodb_flush_log_at_trx_commit=0, or a much more heavy impact because of cold caches upon restart.

I prefer the old behavior of vm.swappiness and as such I now set it to a value of “1″. Setting vm.swappiness=0 would mean that you will now have to be much more accurate in how you configure the size of various global and session buffers.

Kernel (operating system)

Published at DZone with permission of Peter Zaitsev, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

  • The Perceptron Algorithm and the Kernel Trick
  • Running PyTorch on GPUs
  • Building Intelligent AI Agents Using Semantic Kernels and Azure OpenAI Models: A Step-By-Step Guide
  • Seccomp, eBPF, and the Importance of Kernel System Call Filtering

Partner Resources

×

Comments
Oops! Something Went Wrong

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends:

Likes
There are no likes...yet! 👀
Be the first to like this post!
It looks like you're not logged in.
Sign in to see who liked this post!