Sometimes, CICS statistics collection goes awry and loops endlessly. When this happens, CICS writes statistics to SMF for the same group of resources over and over. The CICS region becomes unusable and has to be cancelled. In a recent occurrence, however, these never-ending writes to SMF caused SMF to deplete ESQA to the point that an LPAR entered a wait state and had to be IPLed.
Often, this statistics-gathering loop happens while CICS is gathering terminal statistics. Let's say there are 10 terminals named A through J. CICS would normally access these terminals' control blocks one at a time, A through J, gathering the stats and writing them out to SMF. After gathering stats for the last terminal, CICS would move on to a different resource class. But when the problem happens, CICS never gets to the end of the terminals. Midway through the terminal resources, something causes CICS to start over at the beginning of the terminal resources. The result is that CICS writes statistics for, say, terminals A, B, C, and D over and over. The code only stops when it reaches the end of the terminals, and because it restarts from the beginning partway through every time, it never gets there.
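The behavior described above can be illustrated with a toy model. This is only a sketch of the observed symptom, not CICS's actual code: the function name `gather_terminal_stats`, the `corrupted` set, and the `max_steps` cap (added so the demonstration terminates) are all assumptions made for illustration.

```python
def gather_terminal_stats(terminals, corrupted, max_steps):
    """Toy model of the walk over terminal control blocks.

    Hypothetical behavior: when a control block looks wrong, the walk
    restarts from the first terminal instead of skipping or failing.
    max_steps is an artificial cap so that this demo always ends.
    """
    written = []          # stands in for the records written to SMF
    index = 0
    for _ in range(max_steps):
        if index >= len(terminals):
            break         # normal end of the terminal resource class
        name = terminals[index]
        if name in corrupted:
            index = 0     # start over at the beginning of the class
            continue
        written.append(name)
        index += 1
    return written


terminals = list("ABCDEFGHIJ")

# Healthy case: each terminal is written once, then the walk ends.
print(gather_terminal_stats(terminals, corrupted=set(), max_steps=12))

# Terminal E is overlaid: A-D are written repeatedly and J is never reached.
print("".join(gather_terminal_stats(terminals, corrupted={"E"}, max_steps=12)))
# prints ABCDABCDAB
```

Without the artificial `max_steps` cap, the second call would never return, which corresponds to the endless stream of SMF records described above.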
We have seen this behavior when there is an overlay of a CICS control block having to do with the resource where CICS loops back to the beginning. The overlay has *not* caused the control blocks to be incorrectly chained in a loop. Rather, the overlay just makes something not right about the control block. CICS behaves as if its design is to start over at the beginning of that resource class when something is not right with the control block whose stats are being gathered.
We are finding it is not possible to ensure that CICS control blocks will never be overlaid. Not all CICS regions can run with STGPROT=YES. And when STGPROT is active, some application programs need to be defined as CICSKEY. Sometimes application programs are defined as CICSKEY by mistake. Some vendor programs need to be defined as CICSKEY, or they simply run in CICSKEY. All Task Related User Exits and Global User Exits are invoked in CICS key. All of these programs can have bugs that result in overlays of CICS control blocks.
Given that overlays of CICS control blocks remain a practical reality, and given that such overlays can result in a statistics gathering loop that can bring down an LPAR, we request that CICS' statistics gathering code be enhanced as follows:
- It detects when CICS statistics gathering is mired in a never-ending loop.
- When it detects such a loop, it either fixes the problem or brings down CICS with an appropriate message.
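One possible shape for such a check is sketched below, under stated assumptions. It reuses a toy model in which a walk over terminals restarts from the beginning whenever it meets a damaged control block. The names `gather_with_loop_guard` and `StatisticsLoopError` are hypothetical, and nothing here reflects how CICS is actually implemented. The guard relies on two observations: within one collection pass no resource should be written twice, and a healthy pass over N resources needs no more than N steps.

```python
class StatisticsLoopError(RuntimeError):
    """Hypothetical error raised when a collection pass appears to loop."""


def gather_with_loop_guard(terminals, corrupted):
    """Toy statistics pass that refuses to loop forever.

    Assumed faulty behavior: a damaged control block sends the walk back
    to the first terminal. The guard stops the pass as soon as a terminal
    is about to be written a second time, or when the pass has taken more
    steps than there are terminals.
    """
    seen = set()
    written = []
    index = 0
    steps = 0
    while index < len(terminals):
        steps += 1
        if steps > len(terminals):
            raise StatisticsLoopError("step budget exceeded for this pass")
        name = terminals[index]
        if name in corrupted:
            index = 0          # the faulty restart being guarded against
            continue
        if name in seen:
            raise StatisticsLoopError(f"terminal {name} visited twice in one pass")
        seen.add(name)
        written.append(name)
        index += 1
    return written


try:
    gather_with_loop_guard(list("ABCDEFGHIJ"), corrupted={"E"})
except StatisticsLoopError as err:
    # In the request above, this is the point where the region would
    # issue a message and either recover or shut down.
    print("loop detected:", err)
```

The step budget covers the case where the very first control block is the damaged one, so that nothing is ever written and the repeated-resource check alone would never fire.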
For more information see https://www.ibm.com/docs/en/cics-ts/6.1?topic=whats-new
See Announcement letter https://www.ibm.com/common/ssi/ShowDoc.wss?docURL=/common/ssi/rep_ca/2/897/ENUS222-092/index.html&request_locale=en
This is something we would like to address. The RFE is being moved into 'Planned for Future release' status. Please note:
IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.
This is a candidate for a future release