The AdminServer on Oracle WebLogic Server 11g hung indefinitely during startup at the log entry "Initializing self-tuning thread pool". Planned network maintenance was under way at the time, so we suspected the two might be related.
This is the snippet from the output of startWebLogic.sh:
. . .
<Aug 24, 2019 3:55:50 AM EDT> <Notice> <WebLogicServer> <BEA-000395> <Following extensions directory contents added to the end of the classpath: /u01/app/oracle/admin/aserver/soa_domain/lib/CSFUtil.jar>
<Aug 24, 2019 3:55:51 AM EDT> <Info> <WebLogicServer> <BEA-000377> <Starting WebLogic Server with Java HotSpot(TM) 64-Bit Server VM Version 24.161-b13 from Oracle Corporation>
<Aug 24, 2019 3:55:52 AM EDT> <Info> <Management> <BEA-141107> <Version: WebLogic Server Temporary Patch for BUG29800003 Mon May 20 03:48:59 PDT 2019
WebLogic Server 10.3.6.0.190416 PSU Patch for BUG29204678 Mon Feb 4 02:06:33 PST 2019
WebLogic Server Temporary Patch for BUG14339868 Thu Jun 27 00:39:43 CDT 2013
WebLogic Server 10.3.6.0 Tue Nov 15 08:52:36 PST 2011 1441050 >
<Aug 24, 2019 3:55:57 AM EDT> <Info> <Management> <BEA-141227> <Making a backup copy of the configuration at /u01/app/oracle/admin/aserver/soa_domain/config-original.jar.>
<Aug 24, 2019 3:55:58 AM EDT> <Notice> <WebLogicServer> <BEA-000365> <Server state changed to STARTING>
<Aug 24, 2019 3:55:58 AM EDT> <Info> <WorkManager> <BEA-002900> <Initializing self-tuning thread pool>
At that last line, startup would hang. Forever.
No further output appeared.
We checked and confirmed all of the following; none helped with the issue:
- Confirmed using the nc command that connectivity to other hosts, such as the database server and the second administration server, was fine.
- Confirmed no OS load issues, neither CPU nor I/O, by checking top, vmstat, and iostat.
- Deleted boot.properties and started the AdminServer directly using startWebLogic.sh.
- Verified that "-Djava.security.egd=file:///dev/./urandom" is already set.
- Since the binaries resided on NFS, checked and found no NFS errors in the OS logs, and copied a 300 MB file between NFS and local storage in both directions (took under 1 second).
- Cleared the ~/tmp and ~/cache folders on the AdminServer.
- Cleared the entire ~/data folder on the AdminServer.
- Added "securerandom.source=file:/tmp/big.random.file" to java.security.
- Changed the listen address in config.xml to "<listen-address>localhost</listen-address>", so that it uses localhost instead of the server hostname.
- Removed the Oracle Unified Directory external authentication provider from config.xml.
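The connectivity check in the list above can be scripted. What follows is a minimal Python sketch, not the nc invocation we actually ran: the function name port_reachable and the 3-second default timeout are assumptions made for illustration.

```python
import socket


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper, roughly comparable to a port probe with nc.
    """
    try:
        # create_connection resolves the host (if it is a name) and connects.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

One caveat with any check of this kind: when it is given an IP address rather than a hostname, it never touches name resolution, so it can report healthy connectivity while hostname lookups are broken.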
As noted, there was some major planned network maintenance going on, so we anticipated issues related to that.
What was the issue in the end?
A problem with the DNS server.
Once that was resolved, everything was fine.
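In hindsight, timing name resolution directly would have been a quick check to add to the list. Below is a minimal Python sketch of such a check. It assumes the forward and reverse lookups of the local hostname are the ones worth timing; we did not confirm which lookup WebLogic was actually waiting on. The names timed and check_dns are hypothetical.

```python
import socket
import time


def timed(label, fn, *args):
    """Run fn(*args) and return (label, elapsed_seconds, result_or_error)."""
    start = time.monotonic()
    try:
        outcome = fn(*args)
    except OSError as exc:
        # socket.gaierror and socket.herror are both subclasses of OSError.
        outcome = exc
    return label, time.monotonic() - start, outcome


def check_dns(hostname=None):
    """Time a forward lookup of hostname and a reverse lookup of its address.

    Defaults to the local hostname, which is an assumption about what matters.
    """
    hostname = hostname or socket.gethostname()
    results = [timed("forward " + hostname, socket.gethostbyname, hostname)]
    address = results[0][2]
    if isinstance(address, str):
        # Only attempt the reverse lookup if the forward lookup succeeded.
        results.append(timed("reverse " + address, socket.gethostbyaddr, address))
    return results
```

Calling check_dns() on the AdminServer host and printing each tuple shows how long each lookup took. A lookup that takes many seconds or ends in an error points toward the resolver rather than WebLogic itself.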