View on GitHub

Refactoring and updating the SWIG R module

Planning documents and updates on this R Infrastructure Steering Committee funded project

Introduction

Planning document for refactoring the R module in SWIG - the simplified wrapper interface generator. The purpose of this document, which doubles as a report for phase 1 of the project, is to catalog the inappropriate use of string comparisons in the R module. The string comparisons are typically applied to variable and method names of that are, in turn, derived from the code being interfaced to. This leads to a loss of information about the methods/variables and many problems.

One example is the use of specific suffixes, that are likely to clash with names in user code. For example, SWIG generates functions to allow set/get access (aka accessors) to structure members, and these functions have a specific naming scheme that parts of the R module depend on. However this naming scheme - a set suffix - frequently clashes with names used in the code for which interfaces are being generated, leading to a range of problems. The critical information being lost in this case is the fact that a function was generated by SWIG. The module should not depend on naming for this.

The next phase will replace these occurences with the appropriate use of the swig parse tree.

Background

SWIG generates C/C++ code and interpreter code to interface between an interpreted language, such as R, and C/C++ code. Each target language supported by SWIG has a swig C++ module providing the required language specific functions. There is also an extensive set of typemap configuration files that provide default behaviour that can be overridden on a per-project basis. This project is refactoring the C++ module.

Report structure

This report is a catalog of problem areas, which will be displayed as hyperlinks into a github hosted SWIG repository. The C++ module file is I’ll be examining is:

swig/Source/Modules/r.cxx

String comparisons

String comparison functions

The R module uses Strcmp, Strncmp, strcmp and strncmp for comparing strings.

Function catalog

I’ll begin with a list of functions using comparisons, before going into detail of whether the cases represent something that should be changed.

There are repeats, with different comparisons being used in the same function.

Strcmp

  1. getRTypeName
  2. getRClassNameCopyStruct
  3. createFunctionPointerHandler
  4. OutputArrayMethod
  5. enumDeclaration
  6. addAccessor
  7. Swig_overload_rank
  8. dispatchFunction
  9. functionWrapper
  10. registerClass
  11. classDeclaration
  12. generateCopyRoutines

Strncmp

  1. replaceInitialDash
  2. getRTypeName
  3. getRClassNameCopyStruct
  4. expandTypedef
  5. addCopyParameter
  6. functionWrapper
  7. typedefHandler

strcmp

  1. OutputClassMemberTable
  2. OutputMemberReferenceMethod
  3. main

strncmp

  1. getRClassNameCopyStruct

Call graphs

SWIGs build system allows tests to be run via callgrind. We can thus test which of the functions in the catalog are used by the current suite of R tests.

# run r test suite with callgrind
make check-r-test-suite SWIGTOOL="valgrind --tool=callgrind"

The resulting set of profile files are in Examples/test-suite/r, with names like callgrind.out.12345.

KCachegrind can be used to examine them in detail. The header of each profile file lists the command used to create it, and can be used to identify the associated test.

Grepping for the function names identified above indicates that the following functions are not tested during the R test suite. (confirmed with low-tech Printf).

  1. OutputArrayMethod
  2. OutputClassMemberTable
  3. getRTypeName
  4. getRClassNameCopyStruct

Also confirmed that SimpleITK doesn’t use these functions.

TODO - write some tests or remove them.

Note that there is a “debugMode” flag in the R module. Requires recompilation to enable.

Deeper description of the functions with string comparisons

getRTypeName (not covered by tests)

Called from processType, which is labelled as needs to be reworked. Seems to be something of a backup if the swig type can’t be resolved.

String comparisons used to detect “struct “, “p.” and “a(“.

No “struct “ in R code, so this must be C, and probably swig generated C?

processType is called from a few places

getRClassNameCopyStruct (not covered by tests)

Called from generateCopyRoutines. Need to confirm that generateCopyRoutines is covered by tests - looks important.

Has a bunch of #if macros. The remaining part is using comparisons to look for, “struct “, “p.” and “a(“, same as getRTypeName.

createFunctionPointerHandler

Checks for “void” return type. Elsewhere in the function the Cmp option is used, which appears to be a more appropriate option.

OutputArrayMethod (not covered by tests)

Calls to this function are all commented out - thus may not be used. However most likely it is called from the parent class.

Comments say that is is creating R bracket operators for arrays.

Checking for “setitem” and “getitem”.

enumDeclaration

Called from the parent class, presumably.

Comparisons used to check something about typedef name. Then in bunch of hacks to deal with complex enum cases.

Note: attempting to fix this function a few years back was what got me into this mess!.

addAccessor

Tests for names ending in “_set”.

Called from functionWrapper.

Swig_overload_rank

Complex set of tests about precedence.

Called from dispatchFunction.

dispatchFunction

Called from functionWrapper.

Comparisons used to test R type names.

functionWrapper

Tests for “_set”, “void” return, leading “arg”.

registerClass

Checks for “class”.

classDeclaration

Checks for “typedef”, matching tdname,

generateCopyRoutines

Checks for character typemap

replaceInitialDash

Testing for the initial dash.

expandTypedef

checks for “f” and “p.f” prefix.

addCopyParameter

Checking type for “struct”, “p.struct”, “p.”.

typedefHandler

Tests for “struct “.

OutputMemberReferenceMethod

Looks for “get” (note - not “_get”), and various forms of “operator”.

Plans

Test coverage

Need to extend tests to include those functions not currently tested. If not possible, then the functions may be redundant.

Accessors - closer investigation

Now looking in detail at a simple case that causes accessor functions to be created. The test “struct_value.i” features a pair of nested structures:


struct Foo {
   int x;
};

struct Bar {
   Foo   a;
   struct Foo b;
};

The generate R code includes accessor functions named Bar_a_get, Bar_a_set etc.

How does this happen?

membervariableHandler sets a flag indicating that it is processing class or structure members, calls its superclass, then resets the flag. The Java module uses a similar approach.

The superclass (common to all languages) sets an attribute called “memberset”, which is deleted later. The code in R::functionWrapper should check this attribute instead of the symbol name. Similar approach for read accessors. Same approach to be used across all situations in which symbol names are tested for “set/get” suffixes.

addAccessor is called later on, also from functionWrapper, and builds the R function code. It can also use the attributes. Need to be a bit careful here as it seems to handle set and get methods, but only tests for _set suffixes.

OutputMemberReferenceMethod is called twice from classDeclaration - each time with explicit values for the isSet argument, depending on settings of class_member_functions and class_member_set_functions, the two lists which are set up in addAccessor. Also called from OutputClassMemberTable.

Accessor refactoring plans

Extended tests

Include tests that break the current implementation by including member variables with “_set” and “_get” suffixes.

Changes

Use memberset and memberget attributes everywhere.

Function return type refactoring

The R::createFunctionPointerHandler uses Strcmp to check for a void return type. The first comes into play at the end of the function, indicating that a return typename needs to be processed. There’s another use, checking for void parameters.

The java code doesn’t do things in the same way. In general it uses “Cmp” instead of Strcmp. This is a subtle detail that I’ll need to work on. There doesn’t appear to be a special SWIG way of checking for void return.

Actions

Further searching and queries to developers.

Name mangling

A moderate amount of code is used to construct names that encode types. For example, getRClassNameCopyStruct constructs R class names based on C types. There are string operations use to do this. However SWIG provides operations for this purpose - e.g SwigType_ispointer. The correct tests are used intermittently in the module.

SWIG generate R code

General comments

Classes and and structures have an R side class definition, usually with

R classes are prefixed with “_p”, and inherit from a C++ reference base class. The class names gets used in attributes of functions associated with that class.

SWIG generated R functions have attributes describing input and return types.

There is extra complexity around enumerated types, which have a related structure. There’s a typemap that uses the R_class attribute to decide which enumeration is being looked up.

Thus, changing name generation procedures has to be done carefully. This is need to do things like implement the %nspace feature, which toggles inclusion of namespaces in mangled names.

Action

Search and make sure all mangling is done to SWIG standards.

Implement the %nspace feature.