Introduction
Planning document for refactoring the R module in SWIG - the simplified wrapper interface generator. The purpose of this document, which doubles as a report for phase 1 of the project, is to catalog the inappropriate use of string comparisons in the R module. The string comparisons are typically applied to variable and method names of that are, in turn, derived from the code being interfaced to. This leads to a loss of information about the methods/variables and many problems.
One example is the use of specific suffixes, that are likely to clash with names in user code. For example, SWIG generates functions to allow set/get access (aka accessors) to structure members, and these functions have a specific naming scheme that parts of the R module depend on. However this naming scheme - a set suffix - frequently clashes with names used in the code for which interfaces are being generated, leading to a range of problems. The critical information being lost in this case is the fact that a function was generated by SWIG. The module should not depend on naming for this.
The next phase will replace these occurences with the appropriate use of the swig parse tree.
Background
SWIG generates C/C++ code and interpreter code to interface between an interpreted language, such as R, and C/C++ code. Each target language supported by SWIG has a swig C++ module providing the required language specific functions. There is also an extensive set of typemap configuration files that provide default behaviour that can be overridden on a per-project basis. This project is refactoring the C++ module.
Report structure
This report is a catalog of problem areas, which will be displayed as hyperlinks into a github hosted SWIG repository. The C++ module file is I’ll be examining is:
String comparisons
String comparison functions
The R module uses Strcmp, Strncmp, strcmp and strncmp for comparing strings.
Function catalog
I’ll begin with a list of functions using comparisons, before going into detail of whether the cases represent something that should be changed.
There are repeats, with different comparisons being used in the same function.
Strcmp
- getRTypeName
- getRClassNameCopyStruct
- createFunctionPointerHandler
- OutputArrayMethod
- enumDeclaration
- addAccessor
- Swig_overload_rank
- dispatchFunction
- functionWrapper
- registerClass
- classDeclaration
- generateCopyRoutines
Strncmp
- replaceInitialDash
- getRTypeName
- getRClassNameCopyStruct
- expandTypedef
- addCopyParameter
- functionWrapper
- typedefHandler
strcmp
strncmp
Call graphs
SWIGs build system allows tests to be run via callgrind. We can thus test which of the functions in the catalog are used by the current suite of R tests.
# run r test suite with callgrind
make check-r-test-suite SWIGTOOL="valgrind --tool=callgrind"
The resulting set of profile files are in Examples/test-suite/r
,
with names like callgrind.out.12345
.
KCachegrind can be used to examine them in detail. The header of each profile file lists the command used to create it, and can be used to identify the associated test.
Grepping for the function names identified above indicates that the following functions are not tested during the R test suite. (confirmed with low-tech Printf).
- OutputArrayMethod
- OutputClassMemberTable
- getRTypeName
- getRClassNameCopyStruct
Also confirmed that SimpleITK doesn’t use these functions.
TODO - write some tests or remove them.
Note that there is a “debugMode” flag in the R module. Requires recompilation to enable.
Deeper description of the functions with string comparisons
getRTypeName (not covered by tests)
Called from processType
, which is labelled as needs to be
reworked. Seems to be something of a backup if the swig type can’t be
resolved.
String comparisons used to detect “struct “, “p.” and “a(“.
No “struct “ in R code, so this must be C, and probably swig generated C?
processType
is called from a few places
getRClassNameCopyStruct (not covered by tests)
Called from generateCopyRoutines
. Need to confirm that
generateCopyRoutines is covered by tests - looks important.
Has a bunch of #if
macros. The remaining part is using comparisons
to look for, “struct “, “p.” and “a(“, same as getRTypeName.
createFunctionPointerHandler
Checks for “void” return type. Elsewhere in the function the Cmp option is used, which appears to be a more appropriate option.
OutputArrayMethod (not covered by tests)
Calls to this function are all commented out - thus may not be used. However most likely it is called from the parent class.
Comments say that is is creating R bracket operators for arrays.
Checking for “setitem” and “getitem”.
enumDeclaration
Called from the parent class, presumably.
Comparisons used to check something about typedef name. Then in bunch of hacks to deal with complex enum cases.
Note: attempting to fix this function a few years back was what got me into this mess!.
addAccessor
Tests for names ending in “_set”.
Called from functionWrapper
.
Swig_overload_rank
Complex set of tests about precedence.
Called from dispatchFunction
.
dispatchFunction
Called from functionWrapper
.
Comparisons used to test R type names.
functionWrapper
Tests for “_set”, “void” return, leading “arg”.
registerClass
Checks for “class”.
classDeclaration
Checks for “typedef”, matching tdname,
generateCopyRoutines
Checks for character typemap
replaceInitialDash
Testing for the initial dash.
expandTypedef
checks for “f” and “p.f” prefix.
addCopyParameter
Checking type for “struct”, “p.struct”, “p.”.
typedefHandler
Tests for “struct “.
OutputMemberReferenceMethod
Looks for “get” (note - not “_get”), and various forms of “operator”.
Plans
Test coverage
Need to extend tests to include those functions not currently tested. If not possible, then the functions may be redundant.
Accessors - closer investigation
Now looking in detail at a simple case that causes accessor functions to be created. The test “struct_value.i” features a pair of nested structures:
struct Foo {
int x;
};
struct Bar {
Foo a;
struct Foo b;
};
The generate R code includes accessor functions named Bar_a_get, Bar_a_set etc.
How does this happen?
membervariableHandler sets a flag indicating that it is processing class or structure members, calls its superclass, then resets the flag. The Java module uses a similar approach.
The superclass (common to all languages) sets an attribute called “memberset”, which is deleted later. The code in R::functionWrapper should check this attribute instead of the symbol name. Similar approach for read accessors. Same approach to be used across all situations in which symbol names are tested for “set/get” suffixes.
addAccessor is called later on, also from functionWrapper, and builds the R function code. It can also use the attributes. Need to be a bit careful here as it seems to handle set and get methods, but only tests for _set suffixes.
OutputMemberReferenceMethod is called twice from classDeclaration - each time with explicit values for the isSet argument, depending on settings of class_member_functions and class_member_set_functions, the two lists which are set up in addAccessor. Also called from OutputClassMemberTable.
Accessor refactoring plans
Extended tests
Include tests that break the current implementation by including member variables with “_set” and “_get” suffixes.
Changes
Use memberset and memberget attributes everywhere.
Function return type refactoring
The R::createFunctionPointerHandler uses Strcmp to check for a void return type. The first comes into play at the end of the function, indicating that a return typename needs to be processed. There’s another use, checking for void parameters.
The java code doesn’t do things in the same way. In general it uses “Cmp” instead of Strcmp. This is a subtle detail that I’ll need to work on. There doesn’t appear to be a special SWIG way of checking for void return.
Actions
Further searching and queries to developers.
Name mangling
A moderate amount of code is used to construct names that encode types. For example, getRClassNameCopyStruct constructs R class names based on C types. There are string operations use to do this. However SWIG provides operations for this purpose - e.g SwigType_ispointer. The correct tests are used intermittently in the module.
SWIG generate R code
General comments
Classes and and structures have an R side class definition, usually with
R classes are prefixed with “_p”, and inherit from a C++ reference base class. The class names gets used in attributes of functions associated with that class.
SWIG generated R functions have attributes describing input and return types.
There is extra complexity around enumerated types, which have a related structure. There’s a typemap that uses the R_class attribute to decide which enumeration is being looked up.
Thus, changing name generation procedures has to be done carefully. This is need to do things like implement the %nspace feature, which toggles inclusion of namespaces in mangled names.
Action
Search and make sure all mangling is done to SWIG standards.
Implement the %nspace feature.