Improve code generation for simple functions

Improve code generation for simple functions in Metal C, for example:

#include <string.h>

int func(const void* a, const void* b) {
    return memcmp(a,b,8);
}

In ILP32 mode the following code is generated:
FUNC     DS    0F
         STM   14,15,12(13)
         LR    15,13
         L     13,8(,13)
         ST    15,4(,13)
@@BGN@2  DS    0H
         USING @@AUTO@2,13
* return memcmp(a,b,8);
         USING @@PARMD@2,1
         L     14,@2a
         L     1,@3b
         LA    15,0
         CLC   0(8,14),0(1)
         BRE   @2L2
         LA    15,1
         BRH   @2L2
         LHI   15,-1
* }
@2L2     DS    0H
         DROP
         L     13,4(,13)
         L     14,12(,13)
         BR    14

Equivalent branch-free code (also not preserving GPR 1, inspired by GCC) could be generated instead:
         USING @@PARMD@2,1
         L     15,@a
         L     1,@b
         CLC   0(8,1),0(15)
         IPM   15
         SLL   15,2
         SRA   15,30
         BR    14
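
To illustrate why the IPM/SLL/SRA sequence yields a memcmp-style result, here is a small C model. It is only a sketch: the helper names (clc_cc, ipm_sll_sra, func_branchless) are made up for this illustration, and the CLC condition codes (0 = equal, 1 = first operand low, 2 = first operand high) and the IPM bit layout are assumed from the z/Architecture definitions of those instructions, not taken from compiler output.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Condition code as CLC sets it: 0 = equal, 1 = first operand low,
   2 = first operand high. Modeled here with memcmp. */
static unsigned clc_cc(const void *first, const void *second, size_t len)
{
    int r = memcmp(first, second, len);
    return r == 0 ? 0u : (r < 0 ? 1u : 2u);
}

/* Model of IPM 15 / SLL 15,2 / SRA 15,30 on a 32-bit register.
   prev is whatever the register held before IPM (here: an address). */
static int32_t ipm_sll_sra(unsigned cc, unsigned program_mask, uint32_t prev)
{
    assert(cc <= 3u && program_mask <= 0xFu);
    /* IPM: bits 0-1 = 0, bits 2-3 = CC, bits 4-7 = program mask,
       bits 8-31 unchanged (bit 0 is the most significant bit). */
    uint32_t reg = (prev & 0x00FFFFFFu)
                 | ((uint32_t)cc << 28)
                 | ((uint32_t)program_mask << 24);
    reg <<= 2;                       /* SLL 15,2: CC is now the top two bits */
    uint32_t top2 = reg >> 30;       /* the two bits that SRA 15,30 keeps */
    return (int32_t)(top2 ^ 2u) - 2; /* sign-extend them, as SRA does */
}

/* Hypothetical C equivalent of the proposed sequence. CLC is issued with
   b as the first operand, so "b low" (cc 1) means a > b and yields +1. */
static int func_branchless(const void *a, const void *b)
{
    return ipm_sll_sra(clc_cc(b, a, 8), 0u, 0x00ABCDEFu);
}
```

In this model func_branchless returns 0, 1 or -2 where the branching code returns 0, 1 or -1. Both satisfy the memcmp contract, which only specifies the sign of the result.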

In LP64 mode the currently generated code is even less efficient:
FUNC     DS    0FD
         STMG  14,4,8(13)          << 7 registers saved
         LGR   15,13
         LG    13,136(,13)
         STG   15,128(,13)
@@BGN@2  DS    0H
         LLILH 4,X'C6F4'           << DSA established and not used
         OILL  4,X'E2C1'           << GPR 4 used to fill in an eyecatcher
         ST    4,4(,13)            << even though GPR 15 could have been used
         USING @@AUTO@2,13
* return memcmp(a,b,8);
         USING @@PARMD@2,1
         LG    14,@2a
         LG    15,@3b
         LGHI  0,0
         CLC   0(8,14),0(15)
         BRE   @2L4
         LGHI  0,1
         BRH   @2L4
         LGHI  0,-1
@2L4     DS    0H
         LGFR  15,0
* }
@2L2     DS    0H
         DROP
         LG    13,128(,13)
         LG    14,8(,13)
         LMG   1,4,32(13)

If LP64 linkage actually requires GPR 1 to be preserved, the following code could be generated:
         USING @@PARMD@2,1
         LGR   0,1
         LG    15,@a
         LG    1,@b
         CLC   0(8,1),0(15)
         LGR   1,0
         IPM   15
         SLLG  15,15,34
         SRAG  15,15,62
         BR    14
If GPR 1 may be changed as well, the two LGR instructions could be avoided.
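
The shift counts 34 and 62 in the 64-bit sequence can be sanity-checked with a small C model. This is a sketch under assumptions: the function name ipm_sllg_srag is made up for this illustration, and the IPM bit layout (only bits 32-39 of the 64-bit register are changed) is assumed from the z/Architecture definition of the instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Model of IPM 15 / SLLG 15,15,34 / SRAG 15,15,62 on a 64-bit register.
   prev is the register content before IPM (here: a leftover address). */
static int64_t ipm_sllg_srag(unsigned cc, unsigned program_mask, uint64_t prev)
{
    assert(cc <= 3u && program_mask <= 0xFu);
    /* IPM touches only bits 32-39 (bit 0 is the most significant bit):
       bits 32-33 = 0, bits 34-35 = CC, bits 36-39 = program mask. */
    uint64_t reg = (prev & 0xFFFFFFFF00FFFFFFull)
                 | ((uint64_t)cc << 28)
                 | ((uint64_t)program_mask << 24);
    reg <<= 34;                      /* SLLG: CC moves to bits 0-1 */
    uint64_t top2 = reg >> 62;       /* the two bits that SRAG by 62 keeps */
    return (int64_t)(top2 ^ 2u) - 2; /* sign-extend them, as SRAG does */
}
```

Condition codes 0, 1 and 2 map to 0, 1 and -2 regardless of what the register held before IPM, so the stale address left in GPR 15 does not affect the result.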

The inefficiency of the code generated in LP64 mode is easy to see in the following example:
void g() {}
The generated code is:
G        DS    0FD
         STMG  14,4,8(13)
         LGR   15,13
         LG    13,136(,13)
         STG   15,128(,13)
@@BGN@1  DS    0H
         LLILH 4,X'C6F4'
         OILL  4,X'E2C1'
         ST    4,4(,13)
         USING @@AUTO@1,13
* }
@1L3     DS    0H
         DROP
         LG    13,128(,13)
         LG    14,8(,13)
         LMG   1,4,32(13)
         BR    14                  <<< only this instruction should have been generated

Idea priority

Medium

Guest

May 21, 2020

Hi, after reviewing this RFE again, we have determined that it is not in line with our near-term roadmap.

In addition, regarding point 1, currently there is no evidence that the suggested sequence would perform better.

Also, after reviewing points 2 and 3, we do not think this is something that can be done within the compiler.

As a result, this RFE is being rejected.

Guest

Apr 24, 2020

Thank you for the latest response. We are currently reviewing it and will update the RFE once we have a response.

Guest

Oct 30, 2018

Re 1) I understand that the branching code will be faster when branch prediction is accurate, but I am thinking more of comparison functions, where the result is essentially random.
Re 2) My concern was not just the eyecatcher but the entire process of establishing a new save area. I understand that once the new save area is created the eyecatcher must be filled in; what I question is creating a new stack entry in a leaf function that does not need or use it.
Re 3) This is essentially an extreme case of point 2.
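
As an illustration of the kind of caller meant in point 1, here is a hypothetical example (the names KEY_LEN, cmp_key and sort_keys are made up and do not come from the RFE). A memcmp-based comparator passed to qsort is called on many different pairs of keys, and for unsorted input the sign of its result is hard to predict:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { KEY_LEN = 8 };  /* hypothetical fixed 8-byte key, as in func() */

/* Comparator with the same body as func(): its result depends on the
   data, so a branch on that result is hard to predict for random keys. */
static int cmp_key(const void *a, const void *b)
{
    return memcmp(a, b, KEY_LEN);
}

/* Sort n fixed-length keys in place; cmp_key is called for many pairs. */
static void sort_keys(unsigned char (*keys)[KEY_LEN], size_t n)
{
    assert(keys != NULL || n == 0);
    qsort(keys, n, KEY_LEN, cmp_key);
}
```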

Guest

Oct 19, 2018

It looks like there are 3 issues being covered in this RFE:

1) The (current) Branch sequence vs. the branch-less sequence with the IPM/SLL/SRA

- We are not entirely convinced the suggested sequence would perform better,
i.e., fewer instructions does not necessarily mean faster code.

2) The eyecatcher (ie. the setup of the function save area)

- We don't think this is something we can avoid generating.
It is part of the MVS linkage convention for 64-bit.

Please refer to the following docs:
- Metal C Programming Guide and Reference (Function save areas)
- MVS Programming: Assembler Services Guide (Using a Caller-Provided Save Area)

3) Saving/restoring of 7 registers

- On the surface, we would agree r4 is probably not the best choice.

However, we would have to dig further to understand why r4 is picked.

#3 is the only part of the RFE that we may consider accepting.

Please let us know your thoughts. Thanks!

Guest

Sep 13, 2018

This RFE is still being investigated and requires more time.
