Aaron Gutierrez
amgutier@andrew
Carnegie Mellon University
ISPC++ adds support for numeric polymorphic functions to the Intel SPMD Program Compiler. Programmers can write a single function definition and use the same function on any appropriate numeric type.
ISPC++ is a fork of the Intel SPMD Program Compiler (ISPC). ISPC aims to make writing programs that take advantage of SIMD vector instructions easy for a C or C++ programmer. Because of the target application, most functions perform computations on numeric data. Based off of the C language, an ISPC program must include multiple different definitions for each appropriate datatype.
ISPC++ removes that burden by providing polymorphic numeric types and automatically compiling multiple versions of functions for use with different datatypes. The programmer now needs to only write and maintain a single version of each function.
number pow(number b, int a) {
number out = b;
for (int i = 1; i<a; i++) {
out *= b;
}
return out;
}
export void square(uniform int N, uniform number b[], uniform number out[]) {
foreach (i = 0 ... N) {
out[i] = pow(b[i], 2);
}
}
Modifications to ISPC fall into two broad categories: adding support for polymorphic type, and expanding functions with polymorphic arguments.
To add support for polymorphic types, I created a new type class PolyType. There are three varieties: integer, floating, and number. As with the other basic types in ISPC, we can have uniform or varying, in addition to const, versions of each type. With the new types added, the parser was modified to produce the new types. Finally, the new cases produced by the new types are handled in typechecking.
After typechecking a function, we check for polymorphic parameters. If a function has any polymorphic parameters, we create all of the permutations of the function prototype. These prototypes are stored to produce the proper header file later.
When adding definitions to the declarations, we traverse the abstract syntax tree replacing the polymorphic types with the appropriate type from the function definition. After this step is done, we are left with overloaded functions and no polymorphic types. The rest of compilation occurs normally.
When writing the final output, we add wrappers around all of the polymorphic functions to enable convenient use from our C++ program.
For example, if we have a function void foo(uniform int N, uniform floating A[]), we'd have the following header file:
/* boilerplate from ISPC */
extern void foo_uni_un_3C_und_3E_(int32_t N, double * X);
extern void foo_uni_un_3C_unf_3E_(int32_t N, float * X);
#if defined(__cplusplus)
extern void foo(int32_t N, double * X) {
return foo_uni_un_3C_und_3E_(N, X);
}
extern void foo(int32_t N, float * X) {
return foo_uni_un_3C_unf_3E_(N, X);
}
#endif // __cplusplus
/* more boilerplate */
Most of the language is unchanged from ISPC. The specification can be found in the ISPC documentation.
The only change is the 3 new types. The semantics follow if you treat the types as their representative atomic types.
<type> := <variability> <typename><quant>;
| /* all the other ISPC types
;
<variability> := uniform
| varying
| /* empty, varying */
;
<typename> := integer
| floating
| number
;
<quant> := $[0-9]+
| /* empty, default quantifier */
;
integer is expanded to all of the integer types (signed and unsigned, 8, 16, 32, and 64 bits). floating is expanded to both float and double. number is expanded to the union of the set of types for integer and floating.
The quantifier is used to distinguish independent polymorphic types. If no quantifier is specified, all similar types are considered identical. Otherwise, types with different quantifiers are expanded independently. The syntax for quantifiers has changed since the start of my project to avoid conflict with vector types.
In the saxpy function bellow, we use type quantifiers to specify that the input types can vary independent of the output type.
export void saxpy(uniform int N,
uniform floating$0 scale,
uniform floating$1 X[],
uniform floating$1 Y[],
uniform floating$2 result[])
{
foreach (i = 0 ... N) {
floating$2 tmp = scale * X[i] + Y[i];
result[i] = tmp;
}
}
A function may only return a polymorphic type if that type appears in the parameters. Any polymorphic type found in a function body not found in the parameters is cause for a typechecking error.
At present, my compiler works as intended on all of my test cases.
To my first goal of maintaining compatibility with ISPC, out of 1330 test cases, I fail 1 due to a variable named number which is now the name of a type, and then 3 others that involve an implicit struct definition, which I'll accept as minimally disruptive.
Compatibility with C++ is excellent: the produced header file defines a single overloaded function for each exported polymorphic function that can be used from the namespace ispc.
Compatibility with C is worse, but functional. Each version of a polymorphic function is exported, but with a mangled name. Given that C does not support polymorphic functions, this is as best as we can hope to do, but working in C++ should be greatly preferred.
Given that the backend of the compiler is not modified, the resulting object file should be identical to non-polymorphic code. I do not have extensive benchmarks, as I assumed performance would not be an issue, but from my testing performance is not a concern.
Within ISPC, polymorphic functions are supported natively through function overloading, including with concurrency through the launch semantic.
Overall, I would consider ISPC++ a success. Aside from the issues with implicitly defined structs, my finished product matches my vision coming into this project and from my proposal.
We are able to easily make some academic observations using the new functionality. For example, I modified our square root function from the first assignment to observe single vs. double precision performance.
[sqrt float serial]: [779.727] ms
[sqrt double serial]: [785.532] ms
[sqrt float ispc]: [117.996] ms
[sqrt double ispc]: [270.264] ms
[sqrt float task ispc]: [27.195] ms
[sqrt double task ispc]: [56.185] ms
(6.61x speedup from ISPC float)
(2.91x speedup from ISPC double)
(28.67x speedup from task ISPC float)
(13.98x speedup from task ISPC double)
We can see that the speedup from vectorization is much lower with doubles, while the speedup from tasks is comparable regardless of type. I was able to run this test without duplicating any ISPC code. This conclusion isn't surprising, given that we operate on half as many values when we double the precision, but as a curious student, I can try any datatype or modify the function without duplicating definitions.
In the same way ISPC enables a programmer to easily produce programs that take advantage of modern CPU's, ISPC++ furthers that mission by reducing code duplication. The easier it is for programmers to write high performance code, the more programmers will write high performance code.
I have a proof of concept implementation working. Using aggressive textual replacement, it produces valid ISPC, with a different function for each combination of polymorphic types.
export void saxpy(uniform int N,
uniform floating<0> scale,
uniform floating<1> X[],
uniform floating<1> Y[],
uniform floating<2> result[])
{
foreach (i = 0 ... N) {
floating<2> tmp = scale * X[i] + Y[i];
result[i] = tmp;
}
}
This simple SAXPY program produces a header file with multiple overloaded definitions for the saxpy function. From the example, you can see the syntax used for polymorphic types: floating<0> represents an arbitrary floating point type, where the number is used to differentiate independent polymorphic types. The only polymorphic types allowed are floating, integer, and number. This restriction solves one of the more complicated issues with separate compilation of polymorphic code: we always generate all permutations of polymorphic functions. We end up with an exponential blow up in code size, but practical functions will likely only have a few polymorphic parameters.
The next big step will be to modify the ISPC parser and elaboration phases. My goals are largely unchanged, and I feel on track to produce a complete, working solution:
Due to illness and a realization that a preprocessor based implementation would result in redundant work, I am no longer aiming to produce a complete preprocessor based solution.
ISPC++ is a fork of the Intel SPMD Program Compiler (ISPC) project that includes support of polymorphic datatypes and functions.
ISPC aims to make writing single program multiple data (SPMD) programs easy and relatively target-agnostic. The programmer does not need to write vector intrinsics for each target platform, but can still closely reason about a programs' mapping to hardware. The language closely resembles C, and produces object files compatible with conventional C or C++ programs.
With roots in C, ISPC faces the same limitation to code reuse as C; in particular, both languages lack polymorphism. Given that ISPC functions are primarily arithmetic in nature, adding polymorphism will enable programmers to write a single procedure and use it with any vector-supported datatype. A single function definition can be used with different sized datatypes or even both floating point and integer versions.
Similarly to how ISPC frees the programmer from re-writing functions using different versions of intrinsics, ISPC++ will extend the abstraction to include different operands.
The root of the challenge stems from maintaining compatibility with ISPC and with C or C++. ISPC is a large application with support for a wide range of targets, and a successful project will not inhibit that flexibility.
This is a direct fork of the ISPC project. Hopefully, ISPC++ will share the same compatibility as ISPC, able to run across platforms and with any number of vector targets.
This project aims to
and hopefully fully supports
The first three goals are pretty hard requirements. For the last two there is some ability to have partial success. This project will be completely successful if a programmer can write a single ISPC++ function and use that function with any appropriate datatypes via overloading in C++. Any restrictions placed on expressiveness should be avoided. For example, if the solution makes use of regular expressions rather than integrating with the AST, I would consider the project only a partial success.