IBM Z Software

Status: Not under consideration
Categories: C/C++
Created by: Guest
Created on: Jul 3, 2018

Improve code generation for simple functions

Improve code generation for simple functions in METAL C:

#include <string.h>

int func(const void* a, const void* b) {
    return memcmp(a,b,8);
}

In ILP32 mode, the following code is generated:
FUNC DS 0F
STM 14,15,12(13)
LR 15,13
L 13,8(,13)
ST 15,4(,13)
@@BGN@2 DS 0H
USING @@AUTO@2,13
* return memcmp(a,b,8);
USING @@PARMD@2,1
L 14,@2a
L 1,@3b
LA 15,0
CLC 0(8,14),0(1)
BRE @2L2
LA 15,1
BRH @2L2
LHI 15,-1
* }
@2L2 DS 0H
DROP
L 13,4(,13)
L 14,12(,13)
BR 14

A shorter, branch-free sequence with the same effect (inspired by GCC; like the code above, it does not preserve GPR 1) would be:
USING @@PARMD@2,1
L 15,@a
L 1,@b
CLC 0(8,1),0(15)
IPM 15
SLL 15,2
SRA 15,30
BR 14
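
To see why the shorter sequence produces a valid memcmp-style result, the IPM/SLL/SRA steps can be modelled in portable C. This is only a sketch under stated assumptions: the function name cc_to_result and its parameters are hypothetical, and the register layout assumed for IPM (bits 0-1 of the 32-bit register cleared, condition code in bits 2-3, program mask in bits 4-7, low 24 bits unchanged, with bit 0 the most significant) is my reading of the z/Architecture definition. Note that CLC is issued with b as its first operand, so CC 1 means b is low (a > b) and CC 2 means b is high (a < b).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of IPM 15 / SLL 15,2 / SRA 15,30 on a 32-bit register.
   cc  : condition code set by CLC (0..3)
   pm  : 4-bit program mask (assumed irrelevant to the result)
   old : previous register contents (only the low 24 bits survive IPM) */
static int32_t cc_to_result(unsigned cc, unsigned pm, uint32_t old)
{
    assert(cc <= 3u && pm <= 0xFu);

    /* IPM: top byte becomes 00 | CC | program mask; low 24 bits unchanged. */
    uint32_t r = (old & 0x00FFFFFFu) | ((uint32_t)cc << 28) | ((uint32_t)pm << 24);

    /* SLL 15,2: logical shift left, CC now occupies the two top bits. */
    r <<= 2;

    /* SRA 15,30: arithmetic shift right by 30. Modelled explicitly so the
       sketch does not rely on implementation-defined signed shifts. */
    int32_t v = (int32_t)(r >> 30);   /* 2-bit value 0..3 */
    if (v >= 2)
        v -= 4;                       /* sign-extend the 2-bit value */
    return v;
}
```

Under this model, CC 0 maps to 0, CC 1 to 1, CC 2 to -2 and CC 3 to -1, regardless of the program mask or the old register contents. With the operands swapped as in the listing, that gives zero for equal, a positive value for a > b and a negative value for a < b, which is all memcmp requires.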

In LP64 mode, the currently generated code is even less efficient:
FUNC DS 0FD
STMG 14,4,8(13) << 7 registers saved
LGR 15,13
LG 13,136(,13)
STG 15,128(,13)
@@BGN@2 DS 0H
LLILH 4,X'C6F4' << DSA established and not used
OILL 4,X'E2C1' << GPR 4 used to fill in an eyecatcher
ST 4,4(,13) << even though GPR 15 could have been used
USING @@AUTO@2,13
* return memcmp(a,b,8);
USING @@PARMD@2,1
LG 14,@2a
LG 15,@3b
LGHI 0,0
CLC 0(8,14),0(15)
BRE @2L4
LGHI 0,1
BRH @2L4
LGHI 0,-1
@2L4 DS 0H
LGFR 15,0
* }
@2L2 DS 0H
DROP
LG 13,128(,13)
LG 14,8(,13)
LMG 1,4,32(13)

If LP64 linkage actually requires GPR 1 to be preserved, the following code could be generated:
USING @@PARMD@2,1
LGR 0,1
LG 15,@a
LG 1,@b
CLC 0(8,1),0(15)
LGR 1,0
IPM 15
SLLG 15,15,34
SRAG 15,15,62
BR 14
If GPR 1 may be changed as well, the two LGR instructions could be avoided.

The inefficiency of the generated code in LP64 mode is easy to see in the following example:
void g() {}
The generated code is:
G DS 0FD
STMG 14,4,8(13)
LGR 15,13
LG 13,136(,13)
STG 15,128(,13)
@@BGN@1 DS 0H
LLILH 4,X'C6F4'
OILL 4,X'E2C1'
ST 4,4(,13)
USING @@AUTO@1,13
* }
@1L3 DS 0H
DROP
LG 13,128(,13)
LG 14,8(,13)
LMG 1,4,32(13)
BR 14 <<< only this instruction should have been generated

Idea priority: Medium
  • Guest
    May 21, 2020

    Hi, after reviewing this RFE again, we have determined that it is not in line with our near-term roadmap.

    In addition, regarding point 1, currently there is no evidence that the suggested sequence would perform better.

    Also, after reviewing points 2 and 3, we do not think this is something that can be done within the compiler.

    As a result, this RFE is being rejected.

  • Guest
    Apr 24, 2020

    Thank you for the latest response. We are currently reviewing it and will update the RFE once we have a response.

  • Guest
    Oct 30, 2018

    re 1) I understand that the branching code will be faster when branch prediction is accurate, but I am thinking more of comparison functions, where the result is essentially random.
    re 2) My concern was not just the eyecatcher but the entire process of establishing a new save area. I understand that once the new save area is created the eyecatcher must be filled in; what I question is the creation of a new stack entry in a leaf function that does not need or use it.
    re 3) This is essentially an extreme case of point 2.
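
The comparison-function case raised in point 1 can be sketched in C. This is an illustration under assumptions, not compiler output: the names cmp_branchy, cmp_branchless, sort_keys and KEY_LEN are hypothetical, and writing the source without if/else does not guarantee that a given compiler emits branch-free machine code. cmp_branchy mirrors the shape of the currently generated sequence (two conditional branches), while cmp_branchless derives the result arithmetically, in the spirit of the IPM-based sequence. When keys arrive in random order the outcome of each comparison is hard to predict, which is where a branch-free form might help; whether it actually does would have to be measured on the target machine.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical 8-byte keys compared as raw bytes, as in memcmp(a,b,8). */
enum { KEY_LEN = 8 };

/* Branchy form: same shape as the generated BRE/BRH sequence. */
static int cmp_branchy(const void *a, const void *b)
{
    int c = memcmp(a, b, KEY_LEN);
    if (c == 0)
        return 0;
    if (c > 0)
        return 1;
    return -1;
}

/* Branch-free form at source level: the result is the difference of
   two 0/1 comparison results, giving -1, 0 or 1. */
static int cmp_branchless(const void *a, const void *b)
{
    int c = memcmp(a, b, KEY_LEN);
    return (c > 0) - (c < 0);
}

/* Sort n keys of KEY_LEN bytes each with the chosen comparator. */
static void sort_keys(unsigned char (*keys)[KEY_LEN], size_t n,
                      int (*cmp)(const void *, const void *))
{
    assert(cmp != NULL);
    qsort(keys, n, sizeof keys[0], cmp);
}
```

Both comparators return the same sign for any pair of keys, so either can be passed to sort_keys; timing the two on a large array of random keys would be one way to gather the evidence discussed in this thread.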

  • Guest
    Oct 19, 2018

    It looks like there are 3 issues being covered in this RFE:

    1) The (current) Branch sequence vs. the branch-less sequence with the IPM/SLL/SRA

    - We are not entirely convinced that the suggested sequence would perform better;
    i.e., fewer instructions do not necessarily mean faster code.

    2) The eyecatcher (ie. the setup of the function save area)

    - We don't think this is something we can omit.
    It is part of the MVS linkage convention for 64-bit.

    Please refer to the following docs:
    - Metal C Programming Guide and Reference (Function save areas)
    - MVS Programming: Assembler Service Guide (Using a Caller-Provided Save Area)

    3) Saving/restoring of 7 registers

    - On the surface, we would agree r4 is probably not the best choice.

    However, we would have to dig further to understand why r4 is picked.

    #3 is the only part of the RFE that we may consider to accept.

    Please let us know your thoughts. Thanks!

  • Guest
    Sep 13, 2018

    This RFE is still being investigated and requires more time.