Sun Studio 10 Compilers - EARLY ACCESS FAQ

Last updated: 4 November 2004


Resources:

Note that HTML documentation (man pages and readme files) installed with the Sun Studio 10 Early Access bits can be found at file:/opt/SUNWspro/docs (if the product is installed in /opt). Additional information has also been provided:


FAQ:


General Topics



What is in Sun Studio 10 and why should I care about it?

Sun Studio 10 provides several major new features:

The x86 Solaris 10 platform not only provides 64-bit addressing, but also provides improved performance for many applications that would otherwise work well in 32-bit mode.

Sun Studio compiler features available on Solaris OS SPARC platforms are now available on Solaris OS x86 platforms, for both 32 and 64 bits. These include:


Is the entire toolset being offered on 64-bit x86 Solaris platforms?

Yes. The C, C++, and Fortran compilers offer the -xarch=amd64 mode for compiling for the AMD64 platform. Additionally, dbx and the performance tools can analyze 64-bit binaries. The math and performance libraries are specifically tuned for the AMD64 architecture. The assembler and disassembler tools have also been extended to understand new instructions and exploit new hardware. The rest of the toolset remains largely unchanged.

How do I know if the platform I'm using is running the 64-bit kernel?
At a Solaris 10 shell prompt, run the command isainfo. You should see:
amd64 i386 
amd64 appears in the output only if you are running the 64-bit kernel.

Getting Started on Migration to 64-bit x86 Platforms



I don't even know where to start. Can you help?

Here are some quick references to get ready/motivated: Also, see the Solaris 10 64-bit Developer's Guide

What is the best way to port to 64-bit x86?

If you are moving an application to Solaris 10 for x86 platforms for the first time (in 32-bit mode), use Sun Studio 9 to compile and develop on Solaris 8, 9, or 10 platforms. Do the final build with Sun Studio 10 compilers. You might see a substantial performance improvement with Sun Studio 10 compilers, both for 32-bit and 64-bit code.

If you are moving from 64-bit SPARC V9, it's a straight recompile (with some of the caveats listed here).

Read Chapter 8 Converting Applications for a 64-Bit Environment in the Sun Studio 9 C User's Guide. This will be updated for 64-bit x86 with the final release of Sun Studio 10.

Also, see the Solaris 10 64-bit Developer's Guide.


Will I need to recompile my SPARC code for the new 64-bit Solaris x86 systems?

Yes. The AMD Opteron 64-bit instruction set is very different from the SPARC instruction set.

What options do I use to compile 64-bit on my Opteron?

Use -xarch=amd64. You can also use -xarch=generic64, which is available on SPARC as well, so the same option works in your makefiles for compiling code for both 64-bit x86 and 64-bit SPARC V9 processors.

For the latest Sun Studio 10 compiler option information, see the combined readme.


Will I need to recompile my 32-bit Linux application for 64-bit Solaris x86 platforms?

Janus will enable Linux applications to run on Solaris platforms, unchanged.

Will I be able to link Linux and 32-bit Solaris x86 code together?

No.


What does -fast expand to when compiling on x86 platforms compared with SPARC platforms?

The -fast option is a macro that serves as a starting point for tuning an executable for maximum runtime performance. Its expansion is specific to the target platform and can change from one release of the compiler to the next. Compile with the -# option or -xdryrun to examine the expansion of -fast, and incorporate the appropriate options into the ongoing process of tuning the executable.

Note that to compile a 64-bit x86 object with -fast you need to follow the -fast option with -xarch=amd64 on the command line. (Why? See the next item.)

cc
    x86                             SPARC
    -D__MATHERR_ERRNO_DONTCARE      -D__MATHERR_ERRNO_DONTCARE
    -dalign                         -fns
    -fns                            -fsimple=2
    -nofstore                       -fsingle
    -fsimple=2                      -xalias_level=basic
    -fsingle                        -xarch=v8plusa
    -xarch=sse2                     -xbuiltin=%all
    -xbuiltin=%all                  -xcache=16/32/1:4096/64/1
    -xcache=64/64/2:1024/64/8       -xchip=ultra2
    -xchip=opteron                  -xdepend
    -xlibmil                        -xlibmil
    -xlibmopt                       -xlibmopt
    -xO5                            -xmemalign=8s
                                    -xO5
                                    -xprefetch=auto,explicit

CC
    x86                             SPARC
    -xO5                            -xO5
    -xarch=sse2                     -xarch=v8plusa
    -xcache=64/64/2:1024/64/8       -xcache=16/32/1:4096/64/1
    -xchip=opteron                  -xchip=ultra2
    -fsimple=2                      -xmemalign=8s
    -fns=yes                        -fsimple=2
    -ftrap=%none                    -fns=yes
    -xlibmil                        -ftrap=%none
    -xlibmopt                       -xlibmil
    -xbuiltin=%all                  -xlibmopt
    -nofstore                       -xbuiltin=%all

f95
    x86                             SPARC
    -xO5                            -xO5
    -xarch=sse2                     -xarch=v8plusa
    -xcache=64/64/2:1024/64/8       -xcache=16/32/1:4096/64/1
    -xchip=opteron                  -xchip=ultra2
    -dalign                         -xdepend=yes
    -fsimple=2                      -xpad=local
    -fns=yes                        -xvector=yes
    -ftrap=common                   -xprefetch=auto,explicit
    -xlibmil                        -dalign
    -xlibmopt                       -fsimple=2
    -nofstore                       -fns=yes
                                    -ftrap=common
                                    -xlibmil
                                    -xlibmopt
                                    -fround=nearest


Why must -xarch=amd64 also be specified following -fast?

Compiling with -fast on a 64-bit x86 (AMD64) platform is not sufficient to generate 64-bit code. You must also specify -xarch=amd64. Here's why:

The -xarch option is evaluated from left to right on the command line, so the last specification of -xarch appearing on the command line determines which value of -xarch will be used.

-fast is a macro option whose expansion includes -xtarget=native. However, even on an AMD64 platform, -xtarget=native will expand to -xarch=sse2, which is a 32-bit architecture. You also need to explicitly follow -fast on the command line with -xarch=amd64 to signal 64-bit code generation.

Be aware that the order of these two options is important. Specifying -xarch=amd64 -fast would expand to -xarch=amd64 -xarch=sse2 which still would result in 32-bit code generation. Specifying -fast -xarch=amd64 would expand to -xarch=sse2 -xarch=amd64, which would correctly signal 64-bit code generation.



AMD64 ABI Questions



Where can I find the AMD64 ABI?

http://www.x86-64.org/documentation (currently at version 0.92)

What, in short, is unique to this ABI that I should care about?

To summarize:
 Frame pointers can be optimized away, so they are optional. There is a separate eh_frame mechanism to deal with stack unwinding.
 Finally, a note: at the compiler level, the ABI will be common between Solaris OS and Linux, thereby allowing a greater level of interoperability than in the past.

Will I need to recompile my 64-bit Linux code for 64-bit Solaris 10 x86 platforms?

Our goal is binary compatibility between Linux and Solaris for the 64-bit AMD Opteron instruction set over a useful range of programs.

We are not yet at our goal, but we are working closely with AMD, Linux, and Solaris developers to produce a common Application Binary Interface (ABI). This document will likely result in changes to Linux, so you may need to upgrade to a newer version of Linux to get binary compatibility.

Note, however, that ABI compatibility has limitations when files appear in different places within the file system. Furthermore, Solaris is POSIX compliant and Linux is not. So, binary compatibility will only be effective if programmers code to the common subset of Linux and Solaris.


Will I be able to link Linux and 64-bit Solaris x86 code together?

Yes, but with the caveats of the previous question.

What are the C data types differences between 32-bit (ILP32) and 64-bit (LP64) x86?

See the table below:

    Size and alignment of C types for AMD64 Architecture

                                       ILP32                   LP64
    C Type                       sizeof    Alignment     sizeof    Alignment
                                 (bytes)   (bytes)       (bytes)   (bytes)

    Integral
      _Bool                         1         1             1         1
      char, signed char             1         1             1         1
      unsigned char                 1         1             1         1
      short, signed short           2         2             2         2
      unsigned short                2         2             2         2
      int, signed int, enum         4         4             4         4
      unsigned int                  4         4             4         4
      long, signed long             4         4             8         8
      unsigned long                 4         4             8         8
      long long, signed long long   8         4             8         8
      unsigned long long            8         4             8         8

    Pointer
      any-type *                    4         4             8         8
      any-type (*) ()               4         4             8         8

    Floating Point
      float                         4         4             4         4
      double                        8         4             8         8
      long double                  12         4            16        16

    Complex Types
      float _Complex                8         4             8         4
      double _Complex              16         4            16         8
      long double _Complex         24         4            32        16

    Imaginary Types
      float _Imaginary              4         4             4         4
      double _Imaginary             8         4             8         8
      long double _Imaginary       12         4            16        16

For more information, including data type sizes and alignment on SPARC platforms, see Appendix F of the Sun Studio 9 C Compiler User's Guide.


Will my binary data files be the same between SPARC and 64-bit x86?

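If they are, you got lucky. SPARC is big-endian and x86 is little-endian, so multi-byte values are stored in the opposite byte order. Also look at the 64-bit x86 data types table above and compare it with the 64-bit SPARC data types.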

Will my binary data files be the same between 32-bit and 64-bit x86?

If your data files consist of an array of one integer type, except long and unsigned long, then the answer is that they probably will be the same. If your data files contain floating point types, structures or unions, they probably won't be the same.

Will I need to recompile my 32-bit x86 code to run it on 64-bit Solaris 10 x86 platforms?

No. 64-bit Solaris 10 x86 OS will run existing 32-bit Solaris x86 binaries without change.

While recompiling is not necessary, many customers will experience a boost in performance when recompiling to 64-bit x86 code.


So why will I get a performance boost when recompiling for 64-bit x86?

There are several reasons, mostly from performance techniques that the industry has developed after the 32-bit ABI was frozen.

The AMD64 architecture has twice as many registers as 32-bit x86: 16 general registers versus 8, and 16 XMM registers versus 8. The ability of the compiler to keep data in the fastest available location is much improved.

The AMD64 ABI requires types to be aligned on their size, which enables fast loads and stores.

Rather than passing parameters in memory on the stack, the AMD64 ABI passes integer and pointer parameters in general registers and floating-point parameters in XMM registers.

The AMD64 ABI passes and returns small structures in registers. This feature will mostly benefit C++ codes.


Is it possible I would lose performance recompiling my application from 32-bit to 64-bit x86?

Yes. The potential loss of performance comes from three things: heavy use of pointers, heavy use of varargs, and heavy use of stack walkback. It is generally hard to predict whether a specific application will gain or lose performance. Your best bet is to measure the performance of both the 32-bit and 64-bit x86 builds, and then choose the better one.

What is the problem with pointers?

Pointers are larger. If your application data is mostly pointers to other data, and you spend most of your execution time waiting on main memory, the increased size of pointers decreases the number of pointers that fit in the cache, and will more likely saturate the bandwidth to memory, thus reducing performance.

What is the problem with varargs?

Varargs processing is relatively slow on 64-bit x86 because arguments are really packed into registers and one needs to track a fair amount of information to get the next parameter from the proper place. Normal non-varargs functions should be faster because of this approach, but the varargs functions themselves will be slower. There's not much you can do about it, so don't worry about it.

What is the problem with stack walkback?

The calling convention needs a lot of information about each function to walk back up the stack. Much of this information is stored in the executable as auxiliary information, separate from the actual code. The result is that object files are much larger, often as much as twice as large as they would be on 32-bit x86. Pulling together all the information necessary to walk back up the stack means that C++ exception processing, Java exception processing, POSIX thread cancellation, etc., will be relatively slow.

What is this I hear about the frame pointer?

The AMD64 ABI permits the compiler to reuse the register that normally contains the frame pointer. The reason is that one extra register can sometimes make a significant difference in the speed of loops. Unfortunately, without the frame pointer available and in a consistent location, debugging and performance analysis tools cannot easily follow the chain of function calls. In particular, when the compiler reuses the frame pointer register, dtrace will not work. Dtrace is a Solaris 10 OS facility for whole-system performance analysis. It can help you identify the big problems in system performance. Because this facility is so important, Sun Studio compilers will not reuse the frame pointer by default.

For some applications, particularly benchmarks, the higher-level performance problems that dtrace will help you find have already been eliminated. In these circumstances, reusing the frame pointer register will provide an extra boost of speed. To make this boost more easily available, we reuse the frame pointer register with the -fast option.


How will varargs be different on 64-bit x86? After all, isn't all that stuff invisible to the users?

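The AMD64 ABI requires parameters to be passed in specific registers: integer and pointer arguments in general registers, and floating-point arguments in XMM registers.

So if you pass a double where a printf format specifier such as %llX expects an integer, it won't work: printf looks for the value in a general register, but the double was passed in an XMM register. Example: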

#include <stdio.h>

/* Reinterpret the storage of d as an unsigned long long */
#define   L(d)   (((unsigned long long *) &(d))[0])

int main(void) {
    double dval = 132.674;

    /* This technique won't work on AMD64 */
    printf("dval = %5.2f (%llX)\n", dval, dval);

    /* This technique will work on ILP32 and LP64,
       SPARC or x86 */
    printf("dval = %5.2f (%llX)\n", dval, L(dval));

    return 0;
}

amd64% /set/vulcan/lang/intel-S2/bin/cc t.c -xarch=amd64
amd64% ./a.out
dval = 132.67 (FFFFFD7FFFFFF5B8)
dval = 132.67 (406095916872B021)
amd64%

See also: Size and alignment of C types on AMD64



Performance Questions



What kind of performance improvement will I see from Sun Studio 10 compilers?

It varies with the kind of application you have and the hotspots it presents to the compiler to optimize. With SPEC, we expect to see a 10% improvement in SPEC INT and about 40% in SPEC FP. Taking advantage of AMD64 hardware, instructions, and memory model can yield a 7-20% improvement over 32-bit applications. With the improvements we have added to Sun Studio 10, you might see significant improvements above that threshold. To be fair, the compiler will be in a constant state of improvement up until the final release of Sun Studio 10, so exact numbers are hard to give. Here's some competitive information on SPEC:

Compilers      SPEC INT            SPEC FP
GCC/g77        1369                1001 (estimated)
Studio9        1160                1110
Studio10 EA    1301 (estimated)    1365 (estimated)

Notes:


I heard there was stunning improvement on the STREAM benchmark. How much was it and how did you get it?

We added microvectorization and prefetching to the code generator and it boosted performance by 2x. Here are some competitive numbers; they are roughly on the same kind of box.

STREAM Numbers      Copy    Scale   Add     Triad
GCC                 2140    2318    2487    2197
Studio9             2031    2089    237     1913
Studio10/V65x       2586    2454    2495    2517
Studio10/v20z       4717    4635    4275    4349
Studio10/autopar    7905    7396    7169    7220




Debugging on 64-bit Solaris 10 x86 Platforms



dbx

dbx is changing rapidly. For the latest information, see the combined readme.

Is there a special version of dbx for 64-bit x86 platforms?

As with Sun Studio compilers on SPARC platforms, we ship two dbx binaries, a 32-bit dbx that can debug 32-bit programs only, and a 64-bit dbx that can debug both 32-bit and 64-bit binaries. On an x86 Solaris system running a 64-bit kernel, the 64-bit dbx is the default.

What works today in dbx?

See the latest information in the combined readme.


What can't the 64-bit dbx do yet?

Again, for the latest information, see the combined readme.


How do I use the 32-bit dbx on a 64-bit x86 system?

dbx -x exec32 ....

See the combined readme.



Tuning 64-bit x86 Applications



Are there tools for tuning 64-bit x86 applications?

The Sun Studio Performance Tools can help find bottlenecks in C, C++, Fortran, and Java applications. In many ways, these tools are more flexible and detailed than prof and gprof. They can help answer the following kinds of questions:

For more information about the performance tools in Sun Studio, see the Developer Portal.


How do I use the Performance Tools?

First, record an application's run with Collect, then view and analyze the results with Analyzer. More details can be found at http://developers.sun.com/tools/cc/articles/perftools_tip.html


What are new features for Sun Studio 10?

See the combined readme.


Are there limitations for 64-bit x86?

See the combined readme.


Do I need to compile my application differently?

In general, you don't need to recompile your application. However, the ability to show full call stacks depends on the use of frame pointers. For AMD64 processors, frame pointers are used in C++, but they are disabled for C at higher levels of optimization. You may ensure use of frame pointers by compiling your application with the following options:



Porting from 64-bit SPARC V9 to AMD64

Programs that are already LP64 clean can, for the most part, just be compiled with -xarch=amd64 and should run. Makefiles with SPARC-specific compiler options may need to be adjusted.

Why does passing an int where a long was expected work on SPARC V9 but not AMD64?

Prototypes should match the function signature:

    With wrong prototype          With correct prototype
    --------------------          ----------------------
    void insert_stc(int);         void insert_stc(long);

    void string_append() {        void string_append() {
        insert_stc(-1);               insert_stc(-1);
    }                             }

On SPARC V9, the call to insert_stc appears to sign-extend the argument from int to long even where the wrong prototype has been used. This allows the incorrect program to function as if a correct prototype were in scope. On AMD64, a 4-byte -1 is passed as specified by the prototype, which results in zero extension and incorrect or undefined execution of the program.



Copyright © 2004 Sun Microsystems, Inc. All rights reserved. Use is subject to license terms.