Strange ColdFusion Issue, JRUN Eating Up to 50% of CPU

For one of our project we recently move to Amazon EC2 to make system easily scalable. This is kind of seasonal website which has high traffic during September and October and rest of the year traffic reduced to 70% with compare to October month. I think Amazon’s usage based model is perfect for this client as he doesn’t need to pay much on his off season also we can easily scale and add new instances during peak season and cover cost that we saved during off season.

Finally we happily moved everything on cloud EC2 and working fine. Suddenly after 1 hour of so CPU usage increased to 50% and remain constance 50% for hour or so. While looking into Task manager we realize that JRUN is eating up that CPU but there wan’t much traffic at that time (average 1 user/sec) and system was happily handling this much traffic in our dedicated server. Tried to install Fusion Reactor to see realtime users and other stats but it seem that even Fusion Reactor can not figure out who is eating up CPU. For a day we keep running server as our all sites working correctly and wasn’t making much performance issue.

On next day get CPU utilization graph provided by EC2 and found strange output. It seems that CPU usage remain 50% or UP for around 2 hours or so then came back to normal and again after 1 hour to went high to 50%, and repeating same schedule for whole day.

Wait schedule???? We must be running some schedule file which causing this issue and I check for schedule task but there is no schedule task for this website. Well, ColdFusion also running some schedule tasks for its own purpose (like sending mail, running schedule task etc.). Meanwhile got busy with other assigned task and another day is over with same issue but did not bother more about that as we were still happy with site performance even though 50% of CPU was used by some hidden thread.

Day 2, come on morning and get same CPU utilization graph from EC2 but today it remain high for more than two and half hour and remain normal for one hour or so.

Day 3, Go high more than 3 hours but remain normal for one hour or so..

It seems that everyday hidden thread’s require more and more execution time but every one hour it restarted. So I am too confident that there is some scheduled thread causing this issue which running after every one hour after previous thread stop. I have submitted question on stackoverflow.com hoping someone has already face same kind of issue and give solution for this. And as I expected Mark Brunt has replied me on this post with some really useful link to find out running thread on JRUN. But it seems those steps I had already performed before a day and didn’t get any useful information. Although it provide running thread count but not thread detailed information so I look for tool which can give me list of running threads but this time I search for tool which can monitor Java application instead of ColdFusion as I know that my application wasn’t causing this issue but any ColdFusion internal scheduler causing whole issue.

My search stop at VisualVM and jConsole, as VisualVM has very good GUI and also plugin which allow you to integrate JConsole as separate tab within VisualVM (Click here to know more about VisualVm and ColdFusion configuration). After configuring ColdFusion with VisualVM I have started reviewing for next hour (I know after an hour evil thread going to wake up) and just after an hour and 10 minutes CPU usage reach to more than 50% and when I look into thread list found many threads running at that time and can’t figure out which one is taking more CPU time. Started hunting for plugin which can give me CPU usage by thread but can’t find any for VisualVM but there is one (jTop) for jConsole. So added jConsole warper for visualVM and installed jTop (Click here for more info).

Here I got whole list of threads with high usage on top and that was ‘scheduler-1’ thread which was eating up CPU. Now I need to figure it out what was running under scheduler-1 thread and another VisualVM plugin "thread inspector" help me to figure it out and below is stacktrace for ‘Scheduler-1’

"scheduler-1" – Thread t@128

java.lang.Thread.State: RUNNABLE
at coldfusion.tagext.NativeCfx.processRequest(Native Method)
at coldfusion.tagext.lang.RegistryTag.doStartTag(RegistryTag.java:96)
at coldfusion.runtime.ClientScopeServiceImpl.PurgeRegistry(ClientScopeServiceImpl.java:598)
at coldfusion.runtime.ClientScopeServiceImpl.PurgeExpiredClients(ClientScopeServiceImpl.java:618)
at coldfusion.runtime.ClientScopeServiceImpl$ClientScopePurger.run(ClientScopeServiceImpl.java:787)
at coldfusion.scheduling.ThreadPool.run(ThreadPool.java:201)
at coldfusion.scheduling.WorkerThread.run(WorkerThread.java:71)

It seems like PurgeRegistry class causing this issue but where ColdFusion using registry and why we require to purge it. Yes, you catch it, that’s CLIENT variable, for window ColdFusion by default client variables stored in registry and every 1 hour and 7 minutes ColdFusion run schedule file to purge older client variables (have you notice schedule time 1 hour and 7 mins). Server has around 5 sites hosted so not sure any website is really using client variables or not so didn’t disable the client variable. So I plan to change client variable storage to datasource instead of registry as this is most preferable option to store CLIENT variables. After changing it to datasource restarted ColdFusion service to make sure changes has been applied and thought that issue has been resolved now. After one hour again CPU usage rise to 50%, oh… now what, I had already disabled registry option and we are storing it to database. Check VisualVM and same thread "scheduler-1" with class PurgeRegistry eating up CPU and this is because we disabled registry option but registry still have old client variables stored and ColdFusion still trying to purge those variables. Finally I went to regedit and open up node HKEY_LOCAL_MACHINE/SOFTWARE/Macromedia/ColdFusion/CurrentVersion and delete node Client which contains all my older client variables (my sites were not using client variables anywhere so it is safe to delete for me but you may need to export it before you delete). and check the CPU usage after 1 hour, 2 hour, 3 hour …. and CPU usage is normal.

Finally, my server is running NORMAL… Keep in mind do not store your client variables in registry