Welcome to "Failing 101 class"

Since the start of Project 42, I always wanted to have a "section" or "articles series" where I show some of those mistakes we all make and focus on how to try to avoid them instead of just mentioning the specific error and how to resolve it, or those times when you learn "how not to do something"

Not sure if this will end up as the beginning of the "bundle", or just "one off", but here it is:

I got recently into an issue where a Database was reporting ORA-20 errors since the system was trying to create more processes than the current maximum allowed.
The solution, after reducing the sessions created from one of the applications, was to increase the processes parameter.
During the process, I actually failed to restart the first of the instances.
During the "srvctl start" execution, the system reported "ORA-01078: failure in processing system parameters" and was not able to start.

Since the system was stable, the processes change was reverted and was actually someone else who needed to execute the change and actually found the issue that was making impossible to change the parameter and restart the system.

After my colleague shared the issue and its solution with the team and to try to avoid the issue in the future, I decided to recreate the issue on one of my Databases at home.

What I did was increase the system processes parameter as I mentioned.
In this test, I will increase it from 640 to 12000:

[oracle@rac1-node1 ~]$ sqlplus / as sysdba

SQL> show parameter processes

NAME                     TYPE    VALUE
------------------------------------ ----------- ------------------------------
[....]
processes                integer     640

SQL> alter system set processes=12000 scope=spfile;

System altered.

SQL> exit

Now, if I restart the Database instance cdb1221, I will have the same error I mentioned, but not much more clues why the system is not able to start

[oracle@rac1-node1 ~]$ srvctl stop instance -d cdb122 -i cdb1221
[oracle@rac1-node1 ~]$ srvctl start instance -d cdb122 -i cdb1221
PRCR-1013 : Failed to start resource ora.cdb122.db
PRCR-1064 : Failed to start resource ora.cdb122.db on node rac1-node1
CRS-5017: The resource action "ora.cdb122.db start" encountered the following error: 
ORA-01078: failure in processing system parameters
. For details refer to "(:CLSN00107:)" in "/u01/app/grid/diag/crs/rac1-node1/crs/trace/crsd_oraagent_oracle.trc".

CRS-2674: Start of 'ora.cdb122.db' on 'rac1-node1' failed
[oracle@rac1-node1 ~]$ 

Checking the CRS logs mentioned, we won’t be able to see the actual issue, just the fact that system is not able to start the instance:

021-02-22 18:31:01.329 : USRTHRD:1293846272: [     INFO] {1:62087:1895} Thread:InstSta
rtThread: InstStartThread::didExceptionOccur excp:0x7f71100a3060
2021-02-22 18:31:01.329 : USRTHRD:1293846272: [     INFO] {1:62087:1895} Thread:InstStartThread: InstStartThread::didExceptionOccur excp:0x7f71100a3060
2021-02-22 18:31:01.329 :CLSDYNAM:1293846272: [ora.cdb122.db]{1:62087:1895} [start] InstAgent::start 380 thread exception:ORA-01078: failure in processing system parameters

2021-02-22 18:31:01.329 :CLSDYNAM:1293846272: [ora.cdb122.db]{1:62087:1895} [start] clsnUtils::error Exception type=2 string=
CRS-5017: The resource action "ora.cdb122.db start" encountered the following error: 
ORA-01078: failure in processing system parameters
. For details refer to "(:CLSN00107:)" in "/u01/app/grid/diag/crs/rac1-node1/crs/trace/crsd_oraagent_oracle.trc".

2021-02-22 18:31:01.330 :    AGFW:1327466240: [     INFO] {1:62087:1895} Agent sending reply for: RESOURCE_START[ora.cdb122.db 1 1] ID 4098:5986
[....]
.0/dbhome_1 oracleSid:cdb1221 instanceType:1 instanceVersion:12.2.0.1.0 
2021-02-22 18:31:01.396 :CLSDYNAM:1293846272: [ora.cdb122.db]{1:62087:1895} [check] clsnInstConnection::makeConnectStr UsrOraEnv ,ORA_SERVER_THREAD_ENABLED=FALSE,ORACLE_BASE=
 m_oracleHome /u01/app/oracle/product/12.2.0/dbhome_1 Crshome /u01/app/19.3.0/grid
2021-02-22 18:31:01.396 :CLSDYNAM:1293846272: [ora.cdb122.db]{1:62087:1895} [check] clsnInstConnection::makeConnectStr = (DESCRIPTION=(ADDRESS=(PROTOCOL=beq)(PROGRAM=/u01/app
/oracle/product/12.2.0/dbhome_1/bin/oracle)(ARGV0=oraclecdb1221)(ENVS='ORACLE_HOME=/u01/app/oracle/product/12.2.0/dbhome_1,ORACLE_SID=cdb1221,LD_LIBRARY_PATH=,ORA_SERVER_THRE
AD_ENABLED=FALSE,ORACLE_BASE=')(ARGS='(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))')(CONNECT_DATA=(SID=cdb1221))))
2021-02-22 18:31:01.397 :CLSDYNAM:1293846272: [ora.cdb122.db]{1:62087:1895} [check] InstConnection::connectInt 020 server not attached
2021-02-22 18:31:01.403 : USRTHRD:3552564992: [     INFO] {1:62087:2} Thread:UpgradeCheckThread UpgradeCheckThread running 0
2021-02-22 18:31:01.430 :CLSDYNAM:1293846272: [ora.cdb122.db]{1:62087:1895} [check] ORA-01034: ORACLE not available
ORA-27101: shared memory realm does not exist
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3701
Additional information: -1151133049
Process ID: 0
Session ID: 0 Serial number: 0

Regarding the Database alertlog, didn’t register anything since it was stopped

[oracle@rac1-node1 trace]$ tail -f /u01/app/oracle/diag/rdbms/cdb122/cdb1221/trace/alert_cdb1221.log
Archiving is disabled
2021-02-22T18:16:55.285384+00:00
JIT: pid 15083 requesting stop
2021-02-22T18:16:56.674745+00:00
freeing rdom 4
freeing rdom 3
freeing rdom 2
freeing rdom 0
2021-02-22T18:16:58.375228+00:00
Instance shutdown complete (OS id: 15083)

However, if we try to start the Instance using sqlplus, we will be able to see what is the problem:

SQL> startup
ORA-00093: pga_aggregate_limit must be between 36000M and 100000G   <<<<<<<<<<<<<<
ORA-01078: failure in processing system parameters
SQL> 

We can see the current value of that parameter and change it from the other Instance still running

SQL> show parameter pga

NAME                     TYPE    VALUE
------------------------------------ ----------- ------------------------------
pga_aggregate_limit          big integer 3154M
pga_aggregate_target             big integer 1577M

SQL> alter system set pga_aggregate_limit=36000M scope=spfile;

System altered.

SQL> 

If we try to start the system again, we will be able to see how it starts correctly:

SQL> startup
ORACLE instance started.

Total System Global Area 7365197824 bytes
Fixed Size          8809808 bytes
Variable Size        3170894512 bytes
Database Buffers     4177526784 bytes
Redo Buffers            7966720 bytes
Database mounted.
Database opened.
SQL> 

So, why was pga_aggregate_limit a problem here?
The answer is because pga_aggregate_limit needs to be at least 3 MB times the PROCESSES parameter
Since the process parameter was changed to 12000, we need 12000*3M = 36000M

Oracle Documentation pga_aggregate_limit
Oracle Documentation pga_aggregate_limit

Is obvious that some parameters will have direct influence on others and these parameters management has been simplified with 21c and future versions, since we will be able to use "Expression Based Parameters" as explained on this fantastic video by Daniel Overby Hansen for Oracle Database Upgrades and Migrations Youtube Channel



Some good lessons to remember here is to follow some basic rules:

  • If Clusterware fails to start an instance/database, always try to do it from sqlplus
  • Make sure you create a pfile from your spfile in case you need a quick change and restart the DB with it
  • Always try to do changes in rolling mode to avoid having the whole Database affected by something like this
  • If possible, have a session ready on another Instance for any quick change
Last modified: 26 February 2021

Author

Comments

Write a Reply or Comment

Your email address will not be published.