Improve code generation for simple functions in METAL C:
int func(const void* a, const void* b) {
    return memcmp(a,b,8);
}
In ILP32 mode, the following code is generated:
FUNC DS 0F
STM 14,15,12(13)
LR 15,13
L 13,8(,13)
ST 15,4(,13)
@@BGN@2 DS 0H
USING @@AUTO@2,13
* return memcmp(a,b,8);
USING @@PARMD@2,1
L 14,@2a
L 1,@3b
LA 15,0
CLC 0(8,14),0(1)
BRE @2L2
LA 15,1
BRH @2L2
LHI 15,-1
* }
@2L2 DS 0H
DROP
L 13,4(,13)
L 14,12(,13)
BR 14
Equivalent branch-free code (also not preserving GPR 1, inspired by GCC) could be generated instead:
USING @@PARMD@2,1
L 15,@a
L 1,@b
CLC 0(8,1),0(15)
IPM 15
SLL 15,2
SRA 15,30
BR 14
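The IPM/SLL/SRA sequence above turns the condition code set by CLC into the return value without branching. The C model below is a rough sketch of that bit manipulation, not compiler output. The helper names `model_ipm`, `model_sll2_sra30` and `cc_to_result` are hypothetical, and the model assumes the architected IPM behavior on the low 32 bits of the register (bits 0-1 zeroed, condition code in bits 2-3, program mask in bits 4-7, with bit 0 being the most significant bit).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of IPM on the low 32 bits of a register.
 * Assumption: bits 0-1 are zeroed, the condition code lands in bits 2-3,
 * the program mask in bits 4-7, and bits 8-31 are left unchanged
 * (bit 0 is the most significant bit). */
static uint32_t model_ipm(uint32_t old, unsigned cc, unsigned pm)
{
    assert(cc <= 3u && pm <= 15u);
    return (old & 0x00FFFFFFu) | ((uint32_t)cc << 28) | ((uint32_t)pm << 24);
}

/* Hypothetical model of "SLL r,2" followed by "SRA r,30". */
static int32_t model_sll2_sra30(uint32_t r)
{
    /* After the left shift the condition code occupies the top two bits. */
    uint32_t top2 = (r << 2) >> 30;
    /* The arithmetic right shift sign-extends that 2-bit field;
     * written here without relying on signed right-shift behavior. */
    return (int32_t)(top2 ^ 2u) - 2;
}

/* Maps a condition code to the value left in the result register.
 * The old register contents and program mask are arbitrary on purpose. */
static int32_t cc_to_result(unsigned cc)
{
    return model_sll2_sra30(model_ipm(0xDEADBEEFu, cc, 0xFu));
}
```

Under this model, condition code 0 (equal) gives 0, condition code 1 gives 1, and condition code 2 gives -2. Because the CLC in the proposed sequence names `b` as its first operand, condition code 1 (first operand low) corresponds to `a` comparing greater than `b`, so the sign matches what `memcmp` requires; only the sign of the result is specified, so -2 is as valid as -1.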
In LP64 mode, the currently generated code is even less efficient:
FUNC DS 0FD
STMG 14,4,8(13) << 7 registers saved
LGR 15,13
LG 13,136(,13)
STG 15,128(,13)
@@BGN@2 DS 0H
LLILH 4,X'C6F4' << DSA established and not used
OILL 4,X'E2C1' << GPR 4 used to fill in an eyecatcher
ST 4,4(,13) << even though GPR 15 could have been used
USING @@AUTO@2,13
* return memcmp(a,b,8);
USING @@PARMD@2,1
LG 14,@2a
LG 15,@3b
LGHI 0,0
CLC 0(8,14),0(15)
BRE @2L4
LGHI 0,1
BRH @2L4
LGHI 0,-1
@2L4 DS 0H
LGFR 15,0
* }
@2L2 DS 0H
DROP
LG 13,128(,13)
LG 14,8(,13)
LMG 1,4,32(13)
If LP64 linkage actually requires GPR 1 to be preserved, the following code could be generated:
USING @@PARMD@2,1
LGR 0,1
LG 15,@a
LG 1,@b
CLC 0(8,1),0(15)
LGR 1,0
IPM 15
SLLG 15,15,34
SRAG 15,15,62
BR 14
If GPR 1 may be changed as well, the two LGR instructions could be avoided.
The inefficiency of the generated code in LP64 mode can be easily seen in the following example:
void g() {}
The generated code is:
G DS 0FD
STMG 14,4,8(13)
LGR 15,13
LG 13,136(,13)
STG 15,128(,13)
@@BGN@1 DS 0H
LLILH 4,X'C6F4'
OILL 4,X'E2C1'
ST 4,4(,13)
USING @@AUTO@1,13
* }
@1L3 DS 0H
DROP
LG 13,128(,13)
LG 14,8(,13)
LMG 1,4,32(13)
BR 14 <<< only this instruction should have been generated
Idea priority | Medium |
Hi, after reviewing this RFE again, we have determined that this is not in line with our near-term roadmap.
In addition, regarding point 1, currently there is no evidence that the suggested sequence would perform better.
Also, after reviewing points 2 and 3, we do not think this is something that can be done within the compiler.
As a result, this RFE is being rejected.
Thank you for the latest response. We are currently reviewing it and will update the RFE once we have a response.
re 1) I understand that when the branch prediction is accurate, the code with branching will be faster, but I am thinking more about the case of comparison functions, where the result is essentially random.
re 2) My concern was not just about the eyecatcher, but the entire process of establishing a new savearea. I understand that once the new savearea is created, the eyecatcher must be filled in. What I question is the creation of the new stack entry in a leaf function that does not need or use it.
re 3) This is essentially an extreme case of point 2.
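To illustrate the kind of caller point 1 has in mind, here is a small sketch in which `func` from the request serves as a `qsort` comparator over 8-byte keys. The helper `sort_keys` and the key layout are assumptions made for illustration only. With arbitrary key data the sign of each comparison is hard to predict, which is the situation where a branch inside the comparator could be mispredicted.

```c
#include <stdlib.h>
#include <string.h>

/* The function from the request: compares the first 8 bytes of each operand. */
int func(const void *a, const void *b)
{
    return memcmp(a, b, 8);
}

/* Hypothetical helper: sorts an array of 8-byte keys into memcmp order.
 * qsort calls func once per comparison, so the comparator's cost and the
 * predictability of its result both matter for large inputs. */
void sort_keys(unsigned char (*keys)[8], size_t count)
{
    qsort(keys, count, sizeof keys[0], func);
}
```

This sketch makes no claim about which instruction sequence is faster; it only shows a caller for which the comparison outcome depends entirely on the data.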
It looks like there are 3 issues being covered in this RFE:
1) The (current) branch sequence vs. the branch-less sequence with IPM/SLL/SRA
- We are not entirely convinced the suggested sequence would perform better,
i.e., fewer instructions do not necessarily mean faster code.
2) The eyecatcher (i.e., the setup of the function save area)
- We don't think this is something we can omit.
It is part of the MVS linkage convention for 64-bit.
Please refer to the following docs:
- Metal C Programming Guide and Reference (Function save areas)
- MVS Programming: Assembler Services Guide (Using a Caller-Provided Save Area)
3) Saving/restoring of 7 registers
- On the surface, we would agree r4 is probably not the best choice.
However, we would have to dig further to understand why r4 is picked.
#3 is the only part of the RFE that we may consider accepting.
Please let us know your thoughts. Thanks!
This RFE is still being investigated and requires more time.